[jira] [Updated] (SPARK-10646) Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. categorical
[ https://issues.apache.org/jira/browse/SPARK-10646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jihong MA updated SPARK-10646: -- Issue Type: Sub-task (was: New Feature) Parent: SPARK-10385 > Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. > categorical > > > Key: SPARK-10646 > URL: https://issues.apache.org/jira/browse/SPARK-10646 > Project: Spark > Issue Type: Sub-task >Reporter: Jihong MA > -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-10646) Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. categorical
Jihong MA created SPARK-10646: - Summary: Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. categorical Key: SPARK-10646 URL: https://issues.apache.org/jira/browse/SPARK-10646 Project: Spark Issue Type: New Feature Reporter: Jihong MA
[jira] [Commented] (SPARK-10320) Kafka Support new topic subscriptions without requiring restart of the streaming context
[ https://issues.apache.org/jira/browse/SPARK-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14790843#comment-14790843 ] Cody Koeninger commented on SPARK-10320: I don't think there's much benefit to multiple dstreams with the direct api, because it's straightforward to filter or match on the topic on a per-partition basis. I'm not sure that adding entirely new dstreams after the streaming context has been started makes sense. As far as defaults go... I don't see a clearly reasonable default like messageHandler has. Maybe an example implementation of a function that maintains just a list of topic names and handles the offset lookups. The other thing is, in order to get much use out of this, the api for communicating with the kafka cluster would need to be made public, and there had been some reluctance on that point previously. [~tdas] Any thoughts on making the KafkaCluster api public? > Kafka Support new topic subscriptions without requiring restart of the > streaming context > > > Key: SPARK-10320 > URL: https://issues.apache.org/jira/browse/SPARK-10320 > Project: Spark > Issue Type: New Feature > Components: Streaming >Reporter: Sudarshan Kadambi > > Spark Streaming lacks the ability to subscribe to newer topics or unsubscribe > from current ones once the streaming context has been started. Restarting the > streaming context increases the latency of update handling. > Consider a streaming application subscribed to n topics. Let's say 1 of the > topics is no longer needed in streaming analytics and hence should be > dropped. We could do this by stopping the streaming context, removing that > topic from the topic list and restarting the streaming context. Since with > some DStreams such as DirectKafkaStream, the per-partition offsets are > maintained by Spark, we should be able to resume uninterrupted (I think?) > from where we left off with a minor delay.
However, in instances where > expensive state initialization (from an external datastore) may be needed for > datasets published to all topics, before streaming updates can be applied to > it, it is more convenient to only subscribe to or unsubscribe from the incremental > changes to the topic list. Without such a feature, updates go unprocessed for > longer than they need to be, thus affecting QoS.
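Cody's suggestion of "a function that maintains just a list of topic names and handles the offset lookups" can be sketched in plain Python. This is a hypothetical helper, not part of Spark or Kafka: `TopicTracker`, `lookup_start` and `next_ranges` are illustrative names, and the starting-offset lookup (which in a real job would go through the currently private KafkaCluster API) is passed in as an assumed user-supplied function. Each batch yields [from, until) offset ranges per (topic, partition), so topics can be added or dropped between batches.

```python
from typing import Callable, Dict, List, Tuple

# (topic, partition) pair used as a key throughout.
TopicPartition = Tuple[str, int]


class TopicTracker:
    """Hypothetical helper: tracks the subscribed topics and the next offset to
    read for each (topic, partition). Not part of the Spark or Kafka APIs."""

    def __init__(self, lookup_start: Callable[[str], Dict[int, int]]) -> None:
        # lookup_start(topic) -> {partition: starting offset}. Assumed to be
        # supplied by the user; a real job would ask the Kafka cluster.
        self._lookup_start = lookup_start
        self._offsets: Dict[TopicPartition, int] = {}

    def subscribe(self, topic: str) -> None:
        if topic in self.topics():
            return  # already subscribed; keep the current offsets
        for partition, offset in self._lookup_start(topic).items():
            self._offsets[(topic, partition)] = offset

    def unsubscribe(self, topic: str) -> None:
        for tp in [tp for tp in self._offsets if tp[0] == topic]:
            del self._offsets[tp]

    def topics(self) -> List[str]:
        return sorted({topic for topic, _ in self._offsets})

    def next_ranges(
        self, latest: Dict[TopicPartition, int]
    ) -> Dict[TopicPartition, Tuple[int, int]]:
        """Return [from, until) offset ranges for one batch and advance the
        stored offsets. Only subscribed partitions with new data are included."""
        ranges: Dict[TopicPartition, Tuple[int, int]] = {}
        for tp, start in list(self._offsets.items()):
            until = latest.get(tp, start)
            if until > start:
                ranges[tp] = (start, until)
                self._offsets[tp] = until
        return ranges
```

Adding a topic only triggers an offset lookup for that topic, which is the incremental behaviour the reporter asks for; whether a running DStream could actually switch to a new topic set is the open question in this thread.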
[jira] [Resolved] (SPARK-1304) Job fails with spot instances (due to IllegalStateException: Shutdown in progress)
[ https://issues.apache.org/jira/browse/SPARK-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-1304. --- Resolution: Won't Fix > Job fails with spot instances (due to IllegalStateException: Shutdown in > progress) > -- > > Key: SPARK-1304 > URL: https://issues.apache.org/jira/browse/SPARK-1304 > Project: Spark > Issue Type: Bug > Components: EC2 >Affects Versions: 0.9.0 >Reporter: Alex Boisvert >Priority: Minor > > We had a job running smoothly with spot instances until one of the spot > instances got terminated ... which led to a series of "IllegalStateException: > Shutdown in progress" and the job failed afterwards. > 14/03/24 06:07:52 WARN scheduler.TaskSetManager: Loss was due to > java.lang.IllegalStateException > java.lang.IllegalStateException: Shutdown in progress > at > java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:66) > at java.lang.Runtime.addShutdownHook(Runtime.java:211) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1441) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:256) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187) > at > org.apache.hadoop.mapred.LineRecordReader.(LineRecordReader.java:77) > at > org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:51) > at org.apache.spark.rdd.HadoopRDD$$anon$1.(HadoopRDD.scala:156) > at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149) > at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:232) > at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:232) > at > org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:90) > at > 
org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:89) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327) > at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371) > at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:57) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95) > at > org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:94) > at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471) > at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:232) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161) > at > org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102) > at org.apache.spark.scheduler.Task.run(Task.scala:53) > at > org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213) > at > org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49) > at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:724)
[jira] [Resolved] (SPARK-3603) InvalidClassException on a Linux VM - probably problem with serialization
[ https://issues.apache.org/jira/browse/SPARK-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-3603. --- Resolution: Cannot Reproduce Resolving as "Cannot Reproduce", since this is an old issue that hasn't received any updates since 1.1.0. Please re-open and update if this is still a problem. > InvalidClassException on a Linux VM - probably problem with serialization > - > > Key: SPARK-3603 > URL: https://issues.apache.org/jira/browse/SPARK-3603 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 1.0.0, 1.1.0 > Environment: Linux version 2.6.32-358.32.3.el6.x86_64 > (mockbu...@x86-029.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red > Hat 4.4.7-3) (GCC) ) #1 SMP Fri Jan 17 08:42:31 EST 2014 > java version "1.7.0_25" > OpenJDK Runtime Environment (rhel-2.3.10.4.el6_4-x86_64) > OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode) > Spark (either 1.0.0 or 1.1.0) >Reporter: Tomasz Dudziak >Priority: Critical > Labels: scala, serialization, spark > > I have a Scala app connecting to a standalone Spark cluster. It works fine on > Windows or on a Linux VM; however, when I try to run the app and the Spark > cluster on another Linux VM (the same Linux kernel, Java and Spark - tested > for versions 1.0.0 and 1.1.0) I get the below exception. This looks kind of > similar to the Big-Endian (IBM Power7) Spark Serialization issue > (SPARK-2018), but... my system is definitely little endian and I understand > the big endian issue should be already fixed in Spark 1.1.0 anyway. I'd > appreciate your help.
> 01:34:53.251 WARN [Result resolver thread-0][TaskSetManager] Lost TID 2 > (task 1.0:2) > 01:34:53.278 WARN [Result resolver thread-0][TaskSetManager] Loss was due to > java.io.InvalidClassException > java.io.InvalidClassException: scala.reflect.ClassTag$$anon$1; local class > incompatible: stream classdesc serialVersionUID = -4937928798201944954, local > class serialVersionUID = -8102093212602380348 > at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617) > at > java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620) > at > java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1769) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989) > at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989) > at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) > at scala.collection.immutable.$colon$colon.readObject(List.scala:362) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at > java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017) > at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891) 
> at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989) > at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348) > at > java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989) > at > java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913) > at > java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796) > at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348) > at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370) > at
[jira] [Resolved] (SPARK-3949) Use IAMRole in lieu of static access key-id/secret
[ https://issues.apache.org/jira/browse/SPARK-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-3949. --- Resolution: Incomplete As of SPARK-8576, spark-ec2 can launch instances with IAM instance roles. I'm going to close this issue as "Incomplete" since it's underspecified; it would be helpful to know more specifically where you think we need support for IAM roles instead of keys (i.e. while launching the cluster? for configuring access to S3?). > Use IAMRole in lieu of static access key-id/secret > -- > > Key: SPARK-3949 > URL: https://issues.apache.org/jira/browse/SPARK-3949 > Project: Spark > Issue Type: Improvement > Components: EC2 >Affects Versions: 1.1.0 >Reporter: Rangarajan Sreenivasan > > Spark currently supports AWS resource access through user-specific > key-id/secret. While this works, the AWS recommended way is to use IAM Roles > instead of specific key-id/secrets. > http://docs.aws.amazon.com/IAM/latest/UserGuide/IAMBestPractices.html#use-roles-with-ec2 > http://docs.aws.amazon.com/IAM/latest/UserGuide/IAM_Introduction.html
[jira] [Updated] (SPARK-10602) Univariate statistics as UDAFs: single-pass continuous stats
[ https://issues.apache.org/jira/browse/SPARK-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-10602: -- Assignee: Seth Hendrickson > Univariate statistics as UDAFs: single-pass continuous stats > > > Key: SPARK-10602 > URL: https://issues.apache.org/jira/browse/SPARK-10602 > Project: Spark > Issue Type: Sub-task > Components: ML, SQL >Reporter: Joseph K. Bradley >Assignee: Seth Hendrickson > > See parent JIRA for more details. This subtask covers statistics for > continuous values requiring a single pass over the data, such as min and max. > This JIRA is an umbrella. For individual stats, please create and link a new > JIRA.
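As a rough illustration of what "single pass" means here, the buffer below keeps count, mean, sum of squared deviations, min and max, and supports both a per-row update and a merge of two partial buffers. That mirrors the update/merge split of Spark SQL's UserDefinedAggregateFunction, but the class itself is a hypothetical plain-Python sketch; it uses Welford's online update and the Chan et al. pairwise merge, which is one common choice rather than necessarily what Spark adopts.

```python
import math
from dataclasses import dataclass


@dataclass
class ContinuousStats:
    """Hypothetical aggregation buffer for single-pass continuous statistics."""
    count: int = 0
    mean: float = 0.0
    m2: float = 0.0  # sum of squared deviations from the running mean
    minimum: float = math.inf
    maximum: float = -math.inf

    def update(self, x: float) -> None:
        # Welford's online update: one pass, numerically stable.
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (x - self.mean)
        self.minimum = min(self.minimum, x)
        self.maximum = max(self.maximum, x)

    def merge(self, other: "ContinuousStats") -> None:
        # Combine two partial buffers (e.g. from two partitions).
        if other.count == 0:
            return
        if self.count == 0:
            self.count, self.mean, self.m2 = other.count, other.mean, other.m2
            self.minimum, self.maximum = other.minimum, other.maximum
            return
        n = self.count + other.count
        delta = other.mean - self.mean
        self.mean += delta * other.count / n
        self.m2 += other.m2 + delta * delta * self.count * other.count / n
        self.count = n
        self.minimum = min(self.minimum, other.minimum)
        self.maximum = max(self.maximum, other.maximum)

    def variance(self, sample: bool = True) -> float:
        # Sample variance divides by n - 1, population variance by n.
        d = self.count - 1 if sample else self.count
        return self.m2 / d if d > 0 else float("nan")
```

For example, updating one buffer with 1 and 2, another with 3 and 4, and merging them gives count 4, min 1, max 4, mean 2.5 and population variance 1.25, the same as a single pass over all four values.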
[jira] [Commented] (SPARK-10642) Crash in rdd.lookup() with "java.lang.Long cannot be cast to java.lang.Integer"
[ https://issues.apache.org/jira/browse/SPARK-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14790884#comment-14790884 ] Thouis Jones commented on SPARK-10642: -- Simpler cases. Fails: {code} sc.parallelize([(('a', 'b'), 'c')]).groupByKey().lookup(('a', 'b')) {code} Works: {code} sc.parallelize([(('a', 'b'), 'c')]).groupByKey().map(lambda x: x).lookup(('a', 'b')) {code} > Crash in rdd.lookup() with "java.lang.Long cannot be cast to > java.lang.Integer" > --- > > Key: SPARK-10642 > URL: https://issues.apache.org/jira/browse/SPARK-10642 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 1.5.0 > Environment: OSX >Reporter: Thouis Jones > > Running this command: > {code} > sc.parallelize([(('a', 'b'), > 'c')]).groupByKey().partitionBy(20).cache().lookup(('a', 'b')) > {code} > gives the following error: > {noformat} > 15/09/16 14:22:23 INFO SparkContext: Starting job: runJob at > PythonRDD.scala:361 > Traceback (most recent call last): > File "", line 1, in > File "/usr/local/Cellar/apache-spark/1.5.0/libexec/python/pyspark/rdd.py", > line 2199, in lookup > return self.ctx.runJob(values, lambda x: x, [self.partitioner(key)]) > File > "/usr/local/Cellar/apache-spark/1.5.0/libexec/python/pyspark/context.py", > line 916, in runJob > port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, > partitions) > File > "/usr/local/Cellar/apache-spark/1.5.0/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py", > line 538, in __call__ > File > "/usr/local/Cellar/apache-spark/1.5.0/libexec/python/pyspark/sql/utils.py", > line 36, in deco > return f(*a, **kw) > File > "/usr/local/Cellar/apache-spark/1.5.0/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py", > line 300, in get_return_value > py4j.protocol.Py4JJavaError: An error occurred while calling > z:org.apache.spark.api.python.PythonRDD.runJob. 
> : java.lang.ClassCastException: java.lang.Long cannot be cast to > java.lang.Integer > at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106) > at > org.apache.spark.scheduler.DAGScheduler$$anonfun$submitJob$1.apply(DAGScheduler.scala:530) > at scala.collection.Iterator$class.find(Iterator.scala:780) > at scala.collection.AbstractIterator.find(Iterator.scala:1157) > at scala.collection.IterableLike$class.find(IterableLike.scala:79) > at scala.collection.AbstractIterable.find(Iterable.scala:54) > at > org.apache.spark.scheduler.DAGScheduler.submitJob(DAGScheduler.scala:530) > at > org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:558) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1813) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1826) > at org.apache.spark.SparkContext.runJob(SparkContext.scala:1839) > at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:361) > at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala) > at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:606) > at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231) > at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379) > at py4j.Gateway.invoke(Gateway.java:259) > at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133) > at py4j.commands.CallCommand.execute(CallCommand.java:79) > at py4j.GatewayConnection.run(GatewayConnection.java:207) > at java.lang.Thread.run(Thread.java:745) > {noformat}
[jira] [Resolved] (SPARK-4389) Set akka.remote.netty.tcp.bind-hostname="0.0.0.0" so driver can be located behind NAT
[ https://issues.apache.org/jira/browse/SPARK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-4389. --- Resolution: Won't Fix I'm going to resolve this as "Won't Fix": - Akka 2.4 has not shipped yet, so this feature isn't supported in any released version. - Akka 2.4 requires Java 8 and Scala 2.11 or 2.12, meaning that we can't use it in Spark as long as we need to continue to support Java 7 and Scala 2.10. We'll replace Akka RPC with a custom RPC layer long before we'll be able to drop support for these platforms. > Set akka.remote.netty.tcp.bind-hostname="0.0.0.0" so driver can be located > behind NAT > - > > Key: SPARK-4389 > URL: https://issues.apache.org/jira/browse/SPARK-4389 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 1.2.0 >Reporter: Josh Rosen >Priority: Minor > > We should set {{akka.remote.netty.tcp.bind-hostname="0.0.0.0"}} in our Akka > configuration so that Spark drivers can be located behind NATs / work with > weird DNS setups. > This is blocked by upgrading our Akka version, since this configuration is > not present in Akka 2.3.4. There might be a different approach / workaround > that works on our current Akka version, though. > EDIT: this is blocked by Akka 2.4, since this feature is only available in > the 2.4 snapshot release.
[jira] [Updated] (SPARK-10645) Bivariate Statistics for continuous vs. continuous
[ https://issues.apache.org/jira/browse/SPARK-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jihong MA updated SPARK-10645: -- Issue Type: Sub-task (was: New Feature) Parent: SPARK-10385 > Bivariate Statistics for continuous vs. continuous > -- > > Key: SPARK-10645 > URL: https://issues.apache.org/jira/browse/SPARK-10645 > Project: Spark > Issue Type: Sub-task >Reporter: Jihong MA > > This is an umbrella JIRA which covers bivariate statistics for continuous > vs. continuous columns, including covariance, Pearson's correlation and > Spearman's correlation (for both continuous & categorical).
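The three statistics named in this umbrella can be stated compactly. The sketch below is plain Python over two in-memory columns, not a Spark API; it assumes the sample (n - 1) covariance and computes Spearman's rho as Pearson's correlation of average ranks, which is one standard way to handle ties.

```python
import math
from typing import List, Sequence


def covariance(x: Sequence[float], y: Sequence[float]) -> float:
    """Sample covariance with an n - 1 denominator."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length columns with at least 2 rows")
    mx, my = sum(x) / n, sum(y) / n
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (n - 1)


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson's r = cov(x, y) / (sd(x) * sd(y))."""
    return covariance(x, y) / math.sqrt(covariance(x, x) * covariance(y, y))


def ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values receive the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho, computed as Pearson's r on the ranks."""
    return pearson(ranks(x), ranks(y))
```

Because Spearman only looks at ranks, a monotonic but non-linear pair such as x = [1, 2, 3, 4] and y = [1, 8, 27, 64] has rho = 1 while Pearson's r stays below 1.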
[jira] [Updated] (SPARK-10646) Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. categorical
[ https://issues.apache.org/jira/browse/SPARK-10646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jihong MA updated SPARK-10646: -- Description: Pearson's chi-squared goodness of fit test for observed against the expected distribution. > Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. > categorical > > > Key: SPARK-10646 > URL: https://issues.apache.org/jira/browse/SPARK-10646 > Project: Spark > Issue Type: Sub-task >Reporter: Jihong MA > > Pearson's chi-squared goodness of fit test for observed against the expected > distribution.
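For reference, both forms of Pearson's test use the statistic sum((O - E)^2 / E); only the expected counts E differ. MLlib's RDD-based Statistics.chiSqTest already covers both forms; the plain-Python sketch below only illustrates the arithmetic for the goodness-of-fit case named in the description and the categorical-vs-categorical (independence) case named in the title. Rescaling the expected vector to the observed total is an assumption of this sketch, and p-values (which need the chi-squared CDF) are omitted.

```python
from typing import List, Sequence, Tuple


def chi_squared_gof(observed: Sequence[float], expected: Sequence[float]) -> Tuple[float, int]:
    """Goodness of fit: sum((O - E)^2 / E) with k - 1 degrees of freedom."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    # Assumption: rescale expected (which may be proportions) to the observed total.
    scale = sum(observed) / sum(expected)
    stat = sum((o - e * scale) ** 2 / (e * scale) for o, e in zip(observed, expected))
    return stat, len(observed) - 1


def chi_squared_independence(table: List[List[float]]) -> Tuple[float, int]:
    """Independence of two categorical columns given an r x c contingency table.
    Expected count E_ij = row_total_i * col_total_j / N; df = (r - 1)(c - 1)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, o in enumerate(row):
            e = row_totals[i] * col_totals[j] / n
            stat += (o - e) ** 2 / e
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return stat, df
```

For instance, observed counts [10, 20, 30] against a uniform expectation give a statistic of 10.0 with 2 degrees of freedom, and the table [[10, 20], [20, 10]] gives 20/3 with 1 degree of freedom.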
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10650: Description: In 1.5.0 there are some extra classes in the Spark docs - including a bunch of test classes. We need to figure out what commit introduced those and fix it. The obvious things like genJavadoc version have not changed. http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ [before] http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ [after] > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Andrew Or > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after]
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-10650: Priority: Critical (was: Major) > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Andrew Or >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after]
[jira] [Commented] (SPARK-10626) Create a Java friendly method for randomRDD & RandomDataGenerator on RandomRDDs.
[ https://issues.apache.org/jira/browse/SPARK-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791128#comment-14791128 ] Apache Spark commented on SPARK-10626: -- User 'holdenk' has created a pull request for this issue: https://github.com/apache/spark/pull/8782 > Create a Java friendly method for randomRDD & RandomDataGenerator on > RandomRDDs. > > > Key: SPARK-10626 > URL: https://issues.apache.org/jira/browse/SPARK-10626 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: holdenk >Priority: Minor > > SPARK-3136 added a large number of functions for creating Java RandomRDDs, > but for people that want to use custom RandomDataGenerators we should make a > Java friendly method.
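Spark's Scala RandomDataGenerator trait exposes nextValue() and copy(), with setSeed() coming from Pseudorandom, so that each partition can work on its own seeded copy of a custom generator. The sketch below is a hypothetical Python analogue of that contract, meant only to show why copy() and setSeed() matter for reproducible per-partition generation; the names, the uniform generator and the seeding scheme (per-partition seeds drawn from a master RNG) are assumptions, not Spark's implementation.

```python
import random
from typing import List, Protocol


class DataGenerator(Protocol):
    """Hypothetical contract: produce values, accept a seed, copy itself."""

    def next_value(self) -> float: ...

    def set_seed(self, seed: int) -> None: ...

    def copy(self) -> "DataGenerator": ...


class UniformGenerator:
    """Example custom generator drawing from U(low, high)."""

    def __init__(self, low: float = 0.0, high: float = 1.0) -> None:
        self.low, self.high = low, high
        self._rng = random.Random()

    def next_value(self) -> float:
        return self._rng.uniform(self.low, self.high)

    def set_seed(self, seed: int) -> None:
        self._rng.seed(seed)

    def copy(self) -> "UniformGenerator":
        # A fresh instance with the same parameters but independent RNG state.
        return UniformGenerator(self.low, self.high)


def random_partitions(
    gen: DataGenerator, size: int, num_partitions: int, seed: int
) -> List[List[float]]:
    """Split `size` values over partitions; each partition gets its own copy
    of the generator, seeded from a master RNG so output is reproducible."""
    master = random.Random(seed)
    out: List[List[float]] = []
    for p in range(num_partitions):
        # Spread the remainder over the first partitions.
        n = size // num_partitions + (1 if p < size % num_partitions else 0)
        g = gen.copy()
        g.set_seed(master.getrandbits(63))
        out.append([g.next_value() for _ in range(n)])
    return out
```

With size=10 and num_partitions=3 the partitions hold 4, 3 and 3 values, and calling it again with the same seed reproduces the same output.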
[jira] [Assigned] (SPARK-10626) Create a Java friendly method for randomRDD & RandomDataGenerator on RandomRDDs.
[ https://issues.apache.org/jira/browse/SPARK-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10626: Assignee: (was: Apache Spark) > Create a Java friendly method for randomRDD & RandomDataGenerator on > RandomRDDs. > > > Key: SPARK-10626 > URL: https://issues.apache.org/jira/browse/SPARK-10626 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: holdenk >Priority: Minor > > SPARK-3136 added a large number of functions for creating Java RandomRDDs, > but for people that want to use custom RandomDataGenerators we should make a > Java friendly method.
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-10650: - Assignee: Michael Armbrust (was: Andrew Or) > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Michael Armbrust >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs - including a bunch > of test classes. We need to figure out what commit introduced those and fix > it. The obvious things like genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after]
[jira] [Created] (SPARK-10647) Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir
Alan Braithwaite created SPARK-10647: Summary: Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir Key: SPARK-10647 URL: https://issues.apache.org/jira/browse/SPARK-10647 Project: Spark Issue Type: New Feature Reporter: Alan Braithwaite Priority: Minor This property doesn't match up with the other properties surrounding it, namely: spark.mesos.deploy.zookeeper.url and spark.mesos.deploy.recoveryMode. Since it's also a property specific to Mesos, it makes sense for it to be under that hierarchy as well.
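If the property were renamed, a common migration pattern is to read the new key first and fall back to the old one with a deprecation warning, so existing deployments keep working. The helper below is a hypothetical plain-Python sketch of that pattern over a dict of settings; spark.mesos.deploy.zookeeper.dir is the name proposed in this ticket, not an existing setting, and `get_conf` is not a Spark API.

```python
import warnings
from typing import Dict, Optional, Sequence


def get_conf(
    conf: Dict[str, str],
    key: str,
    deprecated: Sequence[str] = (),
    default: Optional[str] = None,
) -> Optional[str]:
    """Read `key`; if absent, fall back to older names in order, warning on use."""
    if key in conf:
        return conf[key]
    for old in deprecated:
        if old in conf:
            warnings.warn(f"{old} is deprecated; use {key} instead", DeprecationWarning)
            return conf[old]
    return default
```

A deployment that still sets spark.deploy.zookeeper.dir would then keep working under the new name, while one that sets both gets the new value.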
[jira] [Resolved] (SPARK-6086) Exceptions in DAGScheduler.updateAccumulators
[ https://issues.apache.org/jira/browse/SPARK-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-6086.
Resolution: Cannot Reproduce

Resolving as "cannot reproduce" for now, pending updates.

> Exceptions in DAGScheduler.updateAccumulators
>
> Key: SPARK-6086
> URL: https://issues.apache.org/jira/browse/SPARK-6086
> Project: Spark
> Issue Type: Bug
> Components: Scheduler, Spark Core, SQL
> Affects Versions: 1.3.0
> Reporter: Kai Zeng
> Priority: Critical
>
> ClassCastExceptions are thrown in DAGScheduler.updateAccumulators while the
> DAGScheduler is collecting status from tasks. These exceptions happen
> occasionally, especially when there are many stages in a job.
>
> Application code:
> https://github.com/kai-zeng/spark/blob/accum-bug/examples/src/main/scala/org/apache/spark/examples/sql/hive/SQLSuite.scala
>
> Script used:
> ./bin/spark-submit --class org.apache.spark.examples.sql.hive.SQLSuite examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar benchmark-cache 6
>
> There are two types of error messages:
> {code}
> java.lang.ClassCastException: scala.None$ cannot be cast to scala.collection.TraversableOnce
>   at org.apache.spark.GrowableAccumulableParam.addInPlace(Accumulators.scala:188)
>   at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
>   at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
>   at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
>   at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>   at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
>   at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
> {code}
> java.lang.ClassCastException: scala.None$ cannot be cast to java.lang.Integer
>   at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
>   at org.apache.spark.AccumulatorParam$IntAccumulatorParam$.addInPlace(Accumulators.scala:263)
>   at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
>   at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
>   at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
>   at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>   at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
>   at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
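The second trace fails inside BoxesRunTime.unboxToInt, which is where Scala ends up when a value statically typed as Any is cast to Int. The sketch below reproduces that failure mode in plain Scala and is not Spark code. addInPlace and applyAll are hypothetical stand-ins for IntAccumulatorParam.addInPlace and the update loop in Accumulators.add. The assumption that a None reached the accumulator through a map of updates typed as Any is only suggested by the trace, since the issue was closed as Cannot Reproduce.

```scala
object AccumulatorCastDemo {
  // Hypothetical stand-in for an Int accumulator's addInPlace. The update
  // arrives statically typed as Any, so the cast compiles to
  // BoxesRunTime.unboxToInt, which throws ClassCastException for a None.
  def addInPlace(acc: Int, update: Any): Int =
    acc + update.asInstanceOf[Int]

  // Folds all updates into `initial`; returns Left(message) if a cast fails.
  def applyAll(initial: Int, updates: Seq[Any]): Either[String, Int] =
    try Right(updates.foldLeft(initial)(addInPlace))
    catch { case e: ClassCastException => Left(e.getMessage) }
}
```

With well-typed updates the fold succeeds; a single None among them yields the same "scala.None$ cannot be cast to java.lang.Integer" message seen above (newer JVMs add module details to the text).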
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-10650:
Affects Version/s: 1.5.0

> Spark docs include test and other extra classes
>
> Key: SPARK-10650
> URL: https://issues.apache.org/jira/browse/SPARK-10650
> Project: Spark
> Issue Type: Bug
> Components: Documentation
> Affects Versions: 1.5.0
> Reporter: Patrick Wendell
> Assignee: Andrew Or
[jira] [Created] (SPARK-10650) Spark docs include test and other extra classes
Patrick Wendell created SPARK-10650:

Summary: Spark docs include test and other extra classes
Key: SPARK-10650
URL: https://issues.apache.org/jira/browse/SPARK-10650
Project: Spark
Issue Type: Bug
Components: Documentation
Reporter: Patrick Wendell
Assignee: Andrew Or
[jira] [Commented] (SPARK-3978) Schema change on Spark-Hive (Parquet file format) table not working
[ https://issues.apache.org/jira/browse/SPARK-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791150#comment-14791150 ]

Alex Rovner commented on SPARK-3978:

[~barge.nilesh] What version of Spark have you tested with?

> Schema change on Spark-Hive (Parquet file format) table not working
>
> Key: SPARK-3978
> URL: https://issues.apache.org/jira/browse/SPARK-3978
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.1.0
> Reporter: Nilesh Barge
> Assignee: Alex Rovner
> Fix For: 1.5.0
>
> On the following releases:
> Spark 1.1.0 (built using sbt/sbt -Dhadoop.version=2.2.0 -Phive assembly),
> Apache HDFS 2.2
>
> A Spark job is able to create, insert into, and read Parquet-formatted Hive
> tables using HiveContext. But after the schema is changed, the Spark job is
> not able to read the data and throws the following exception:
> java.lang.ArrayIndexOutOfBoundsException: 2
>   at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:127)
>   at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:284)
>   at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:278)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
>   at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
>   at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
>   at scala.collection.AbstractIterator.to(Iterator.scala:1157)
>   at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
>   at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
>   at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
>   at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
>   at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774)
>   at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774)
>   at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
>   at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
>   at org.apache.spark.scheduler.Task.run(Task.scala:54)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
>   at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:744)
>
> Code snippet in short:
> hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS people_table (name String, age INT) ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'");
> hiveContext.sql("INSERT INTO TABLE people_table SELECT name, age FROM temp_table_people1");
> hiveContext.sql("SELECT * FROM people_table"); // Here, the data is read successfully.
> hiveContext.sql("ALTER TABLE people_table ADD COLUMNS (gender STRING)");
> hiveContext.sql("SELECT * FROM people_table"); // The existing data cannot be read; ArrayIndexOutOfBoundsException is thrown.
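An ArrayIndexOutOfBoundsException of 2 is consistent with rows written before the ALTER TABLE (two fields, name and age) being read positionally against the new three-column table schema. A minimal sketch, assuming that is the mechanism, contrasts positional access with resolving columns by name and filling missing ones with null. readPositional and readByName are hypothetical helpers. They are not Hive's ParquetHiveSerDe, nor the fix that shipped in 1.5.0.

```scala
object SchemaEvolutionDemo {
  // Positional read: table column i is taken from field i of the stored row.
  // Rows written before ALTER TABLE ... ADD COLUMNS have fewer fields, so the
  // new column's index is out of bounds.
  def readPositional(row: Array[Any], tableColumns: Seq[String]): Seq[Any] =
    tableColumns.indices.map(i => row(i))

  // Name-based read: each table column is looked up among the columns the
  // row was written with; columns absent from the file come back as null.
  def readByName(row: Array[Any], fileColumns: Seq[String], tableColumns: Seq[String]): Seq[Any] = {
    val fieldIndex: Map[String, Int] = fileColumns.zipWithIndex.toMap
    tableColumns.map(col => fieldIndex.get(col).map(i => row(i)).orNull)
  }
}
```

For an old row ("alice", 30), the name-based read returns ("alice", 30, null) for the new (name, age, gender) schema, while the positional read fails on index 2.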
[jira] [Commented] (SPARK-3978) Schema change on Spark-Hive (Parquet file format) table not working
[ https://issues.apache.org/jira/browse/SPARK-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791141#comment-14791141 ]

Nilesh Barge commented on SPARK-3978:

Thanks for resolving this. I also verified it on my end, and it is now working fine.
[jira] [Updated] (SPARK-10645) Bivariate Statistics for continuous vs. continuous
[ https://issues.apache.org/jira/browse/SPARK-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jihong MA updated SPARK-10645:
Component/s: SQL, ML

> Bivariate Statistics for continuous vs. continuous
>
> Key: SPARK-10645
> URL: https://issues.apache.org/jira/browse/SPARK-10645
> Project: Spark
> Issue Type: Sub-task
> Components: ML, SQL
> Reporter: Jihong MA
>
> This is an umbrella JIRA that covers bivariate statistics for continuous
> vs. continuous columns, including covariance, Pearson's correlation, and
> Spearman's correlation (for both continuous & categorical).
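For reference, the three statistics named in this umbrella can be sketched on local sequences. The sketch computes sample covariance, Pearson's r as covariance normalised by both standard deviations, and Spearman's rho as Pearson's r applied to average ranks, with ties sharing the mean of their positions. BivariateStats is a hypothetical object. A Spark implementation would compute the same quantities with distributed aggregations rather than on in-memory Seq[Double] values.

```scala
object BivariateStats {
  private def mean(xs: Seq[Double]): Double = xs.sum / xs.size

  // Sample covariance: sum((x - mean(x)) * (y - mean(y))) / (n - 1).
  def covariance(xs: Seq[Double], ys: Seq[Double]): Double = {
    require(xs.size == ys.size && xs.size > 1, "need two samples of equal size n > 1")
    val (mx, my) = (mean(xs), mean(ys))
    xs.zip(ys).map { case (x, y) => (x - mx) * (y - my) }.sum / (xs.size - 1)
  }

  // Pearson's r: covariance divided by the product of the standard deviations.
  def pearson(xs: Seq[Double], ys: Seq[Double]): Double =
    covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))

  // 1-based ranks; tied values all receive the average of their positions.
  def ranks(xs: Seq[Double]): Seq[Double] = {
    val sorted = xs.zipWithIndex.sortBy(_._1).toVector
    val out = new Array[Double](xs.size)
    var i = 0
    while (i < sorted.size) {
      var j = i
      while (j + 1 < sorted.size && sorted(j + 1)._1 == sorted(i)._1) j += 1
      val avgRank = (i + j) / 2.0 + 1.0
      for (k <- i to j) out(sorted(k)._2) = avgRank
      i = j + 1
    }
    out.toSeq
  }

  // Spearman's rho: Pearson's r computed on the ranks of each column.
  def spearman(xs: Seq[Double], ys: Seq[Double]): Double =
    pearson(ranks(xs), ranks(ys))
}
```

Because Spearman's rho only looks at ranks, a strictly increasing but nonlinear transform of one column (for example cubing it) keeps rho at 1.0, while Pearson's r drops below 1.0.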
[jira] [Assigned] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-10623:
Assignee: Zhan Zhang (was: Apache Spark)

> turning on predicate pushdown throws nonsuch element exception when RDD is empty
>
> Key: SPARK-10623
> URL: https://issues.apache.org/jira/browse/SPARK-10623
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Reporter: Ram Sriharsha
> Assignee: Zhan Zhang
>
> Turning on predicate pushdown for ORC datasources results in a
> NoSuchElementException:
> scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15")
> df: org.apache.spark.sql.DataFrame = [name: string]
> scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
> scala> df.explain
> == Physical Plan ==
> java.util.NoSuchElementException
>
> Disabling the pushdown makes things work again:
> scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false")
> scala> df.explain
> == Physical Plan ==
> Project [name#6]
> Filter (age#7 < 15)
> Scan OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7]
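The report does not say where the NoSuchElementException comes from. A common way such an exception appears only for empty input is code that calls head (or Iterator.next) on a collection of input files, for example to read a schema from the first one. The sketch below is hypothetical and assumes that kind of cause. It shows the headOption guard that turns an empty input into None instead of an exception. firstSchema, firstSchemaUnsafe and schemaOf are illustrative names, not Spark's ORC code.

```scala
object EmptyInputGuard {
  // Fragile: assumes at least one input file. On an empty input, `head`
  // throws java.util.NoSuchElementException.
  def firstSchemaUnsafe(files: Seq[String], schemaOf: String => Seq[String]): Seq[String] =
    schemaOf(files.head)

  // Guarded: an empty input yields None, so the caller can skip building the
  // pushed-down filter (or fall back to the table's declared schema).
  def firstSchema(files: Seq[String], schemaOf: String => Seq[String]): Option[Seq[String]] =
    files.headOption.map(schemaOf)
}
```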
[jira] [Updated] (SPARK-10647) Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Alan Braithwaite updated SPARK-10647:
Issue Type: Improvement (was: New Feature)

> Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir
>
> Key: SPARK-10647
> URL: https://issues.apache.org/jira/browse/SPARK-10647
> Project: Spark
> Issue Type: Improvement
> Reporter: Alan Braithwaite
> Priority: Minor
>
> This property doesn't match up with the other properties surrounding it,
> namely:
> spark.mesos.deploy.zookeeper.url
> and
> spark.mesos.deploy.recoveryMode
> Since it is also specific to Mesos, it makes sense for it to live under that
> hierarchy as well.
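If the property is renamed, existing deployments that set the old key would presumably need to keep working for a while. Below is a minimal sketch of a lookup that prefers the new key and falls back to the old one. It uses a plain Map[String, String] in place of SparkConf. getWithFallback is a hypothetical helper, and the "/spark" values in the usage are illustrative.

```scala
object RenamedConf {
  val NewKey = "spark.mesos.deploy.zookeeper.dir"
  val OldKey = "spark.deploy.zookeeper.dir"

  // Hypothetical helper: prefer the new key and fall back to the old one,
  // so configurations written for the old name keep working.
  def getWithFallback(conf: Map[String, String], newKey: String, oldKey: String): Option[String] =
    conf.get(newKey).orElse(conf.get(oldKey))
}
```

When both keys are set, the new one wins. A real migration would likely also log a deprecation warning when only the old key is found.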
[jira] [Updated] (SPARK-10050) Support collecting data of MapType in DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman updated SPARK-10050:
Assignee: Sun Rui

> Support collecting data of MapType in DataFrame
>
> Key: SPARK-10050
> URL: https://issues.apache.org/jira/browse/SPARK-10050
> Project: Spark
> Issue Type: Sub-task
> Components: SparkR
> Reporter: Sun Rui
> Assignee: Sun Rui
> Fix For: 1.6.0
[jira] [Created] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread
Tathagata Das created SPARK-10649:

Summary: Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread
Key: SPARK-10649
URL: https://issues.apache.org/jira/browse/SPARK-10649
Project: Spark
Issue Type: Bug
Components: Streaming
Affects Versions: 1.5.0, 1.4.1, 1.3.1
Reporter: Tathagata Das
Assignee: Tathagata Das

The job group, job description and scheduler pool information are passed through thread-local properties, which are inherited by child threads. In the case of Spark Streaming, the streaming jobs inherit these properties from the thread that called streamingContext.start(). This may not make sense.

1. Job group: This is mainly used for cancelling a group of jobs together. It does not make sense to cancel streaming jobs like this, as the effect will be unpredictable. Nor is it a valid use case anyway: to cancel a streaming context, call streamingContext.stop().

2. Job description: This is used to pass nice text descriptions for jobs to show up in the UI. The job description of the thread that calls streamingContext.start() is not useful for the streaming jobs, as it does not make sense for all of the streaming jobs to have the same description, and the description may or may not be related to streaming.
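The inheritance described above matches the behaviour of java.lang.InheritableThreadLocal, where a thread created while the parent holds a value starts with a copy of it. The sketch below assumes the job properties live in such a thread local (the report says only that they are thread-local and inherited by child threads). It shows both the inherited value and a child that explicitly clears its copy, which is one way streaming job threads could avoid picking up the caller's job group. localProps, seenByChild and the "spark.jobGroup.id" key are hypothetical.

```scala
object InheritedPropsDemo {
  // Stand-in for per-thread job properties (job group, description, pool).
  // InheritableThreadLocal copies the parent's value into each new thread.
  val localProps: InheritableThreadLocal[Map[String, String]] =
    new InheritableThreadLocal[Map[String, String]] {
      override protected def initialValue(): Map[String, String] = Map.empty
    }

  // Starts a child thread, waits for it, and returns the properties it saw.
  // If clearInChild is true, the child resets its copy before reading it.
  def seenByChild(clearInChild: Boolean): Map[String, String] = {
    var seen = Map.empty[String, String]
    val child = new Thread(new Runnable {
      def run(): Unit = {
        if (clearInChild) localProps.set(Map.empty)
        seen = localProps.get()
      }
    })
    child.start()
    child.join() // join() makes the child's write to `seen` visible here
    seen
  }
}
```

Clearing the value in the child does not affect the parent's copy, so the caller of start() keeps its own job group.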
[jira] [Commented] (SPARK-3978) Schema change on Spark-Hive (Parquet file format) table not working
[ https://issues.apache.org/jira/browse/SPARK-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791171#comment-14791171 ]

Nilesh Barge commented on SPARK-3978:

I tested with the latest Spark 1.5 release. I got the source (http://www.apache.org/dyn/closer.lua/spark/spark-1.5.0/spark-1.5.0.tgz), built it with the "mvn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive -Phive-thriftserver -DskipTests clean package" command, and then ran my original tests.
[jira] [Updated] (SPARK-9794) ISO DateTime parser is too strict
[ https://issues.apache.org/jira/browse/SPARK-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin updated SPARK-9794:
Assignee: Kevin Cox

> ISO DateTime parser is too strict
>
> Key: SPARK-9794
> URL: https://issues.apache.org/jira/browse/SPARK-9794
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.2.2, 1.3.1, 1.4.1, 1.5.0
> Reporter: Alex Angelini
> Assignee: Kevin Cox
> Fix For: 1.6.0
>
> The DateTime parser requires 3 millisecond digits, but that is not part of
> the official ISO 8601 spec.
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L132
> https://en.wikipedia.org/wiki/ISO_8601
> This results in the following exception when trying to parse datetime columns:
> {code}
> java.text.ParseException: Unparseable date: "0001-01-01T00:00:00GMT-00:00"
> {code}
> [~joshrosen] [~rxin]
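ISO 8601 does not fix the number of fractional-second digits. Below is a sketch of a more lenient parser, assuming Java 8's java.time is available (Spark 1.x still supported Java 7). DateTimeFormatter.ISO_OFFSET_DATE_TIME treats the fraction as optional and accepts 0 to 9 digits. IsoParse is a hypothetical wrapper and not the parser in DateTimeUtils.

```scala
import java.time.OffsetDateTime
import java.time.format.DateTimeFormatter

object IsoParse {
  // ISO_OFFSET_DATE_TIME makes the fraction of a second optional and accepts
  // anywhere from 0 to 9 digits, followed by "Z" or an offset like +02:00.
  def parse(s: String): OffsetDateTime =
    OffsetDateTime.parse(s, DateTimeFormatter.ISO_OFFSET_DATE_TIME)
}
```

This formatter expects "Z" or a numeric offset such as +02:00. A zone written as GMT-00:00, as in the exception above, is still rejected, so accepting that form would need a separate rule.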
[jira] [Resolved] (SPARK-9794) ISO DateTime parser is too strict
[ https://issues.apache.org/jira/browse/SPARK-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Reynold Xin resolved SPARK-9794.
Resolution: Fixed
Fix Version/s: 1.6.0
[jira] [Resolved] (SPARK-10050) Support collecting data of MapType in DataFrame
[ https://issues.apache.org/jira/browse/SPARK-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Shivaram Venkataraman resolved SPARK-10050.
Resolution: Fixed
Fix Version/s: 1.6.0

Issue resolved by pull request 8711
[https://github.com/apache/spark/pull/8711]

> Support collecting data of MapType in DataFrame
>
> Key: SPARK-10050
> URL: https://issues.apache.org/jira/browse/SPARK-10050
> Project: Spark
> Issue Type: Sub-task
> Components: SparkR
> Reporter: Sun Rui
> Fix For: 1.6.0
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Patrick Wendell updated SPARK-10650:
Target Version/s: 1.5.1

> Spark docs include test and other extra classes
>
> Key: SPARK-10650
> URL: https://issues.apache.org/jira/browse/SPARK-10650
> Project: Spark
> Issue Type: Bug
> Components: Documentation
> Affects Versions: 1.5.0
> Reporter: Patrick Wendell
> Assignee: Andrew Or
> Priority: Critical
>
> In 1.5.0 there are some extra classes in the Spark docs, including a bunch
> of test classes. We need to figure out which commit introduced them and fix
> it. The obvious suspects, such as the genJavadoc version, have not changed.
> http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ [before]
> http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ [after]
[jira] [Resolved] (SPARK-6513) Add zipWithUniqueId (and other RDD APIs) to RDDApi
[ https://issues.apache.org/jira/browse/SPARK-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-6513.
Resolution: Won't Fix

> Add zipWithUniqueId (and other RDD APIs) to RDDApi
>
> Key: SPARK-6513
> URL: https://issues.apache.org/jira/browse/SPARK-6513
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 1.3.0
> Environment: Windows 7 64bit, Scala 2.11.6, JDK 1.7.0_21 (though I don't think it's relevant)
> Reporter: Eran Medan
> Priority: Minor
>
> It would be nice if we could treat a DataFrame just like an RDD (wherever it
> makes sense).
>
> *Worked in 1.2.1*
> {code}
> val sqlContext = new HiveContext(sc)
> import sqlContext._
> val jsonRDD = sqlContext.jsonFile(jsonFilePath)
> jsonRDD.registerTempTable("jsonTable")
> val jsonResult = sql(s"select * from jsonTable")
> val foo = jsonResult.zipWithUniqueId().map {
>   case (Row(...), uniqueId) => // do something useful
>   ...
> }
> foo.registerTempTable("...")
> {code}
>
> *Stopped working in 1.3.0*
> {code}
> jsonResult.zipWithUniqueId() // since RDDApi doesn't implement that method
> {code}
>
> **Not working workaround:**
> Although this might give me an {{RDD\[Row\]}}:
> {code}
> jsonResult.rdd.zipWithUniqueId()
> {code}
> this obviously won't work, since {{RDD\[Row\]}} does not have a
> {{registerTempTable}} method:
> {code}
> foo.registerTempTable("...")
> {code}
>
> (See the related SO question:
> http://stackoverflow.com/questions/29243186/is-this-a-regression-bug-in-spark-1-3)
>
> EDIT: changed from issue to enhancement request
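RDD.zipWithUniqueId documents the following id scheme: items in the k-th of n partitions get ids k, n+k, 2n+k, and so on. Because of this, no extra pass over the data is needed, which is why the ids are unique but not consecutive. The sketch below simulates the scheme on in-memory partitions. UniqueIdDemo is hypothetical and does not use Spark. For the workaround in the report, mapping each (Row, Long) pair back to a Row and rebuilding a DataFrame with sqlContext.createDataFrame(rowRDD, schema) would presumably restore registerTempTable, though that is untested here.

```scala
object UniqueIdDemo {
  // Mirrors the scheme documented for RDD.zipWithUniqueId: the i-th item of
  // partition k (out of n partitions) gets id i * n + k. Each partition can
  // assign ids independently, so no counting pass is needed, but ids can
  // have gaps when partitions differ in size.
  def zipWithUniqueId[T](partitions: Seq[Seq[T]]): Seq[(T, Long)] = {
    val n = partitions.size.toLong
    partitions.zipWithIndex.flatMap { case (part, k) =>
      part.zipWithIndex.map { case (item, i) => (item, i * n + k) }
    }
  }
}
```

With partitions [a, b, c], [d], [e, f], the ids are a=0, b=3, c=6, d=1, e=2, f=5. Id 4 is never used.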
[jira] [Commented] (SPARK-7841) Spark build should not use lib_managed for dependencies
[ https://issues.apache.org/jira/browse/SPARK-7841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791063#comment-14791063 ]

Josh Rosen commented on SPARK-7841:

I agree that we can probably fix this, but note that we'll have to do something about how lib_managed is used in the dev/mima script.

> Spark build should not use lib_managed for dependencies
>
> Key: SPARK-7841
> URL: https://issues.apache.org/jira/browse/SPARK-7841
> Project: Spark
> Issue Type: Bug
> Components: Build
> Affects Versions: 1.3.1
> Reporter: Iulian Dragos
> Labels: easyfix, sbt
>
> - unnecessary duplication (I will have those libraries under ~/.m2, via Maven, anyway)
> - every time I call make-distribution I lose lib_managed (via mvn clean install) and have to wait to download all the jars again the next time I use sbt
> - Eclipse does not handle relative paths very well (source attachments from lib_managed don’t always work)
> - it's not the default configuration. If we stray from defaults, I think there should be a clear advantage.
>
> Digging through history, the only reference to `retrieveManaged := true` I found was in f686e3d, from July 2011 ("Initial work on converting build to SBT 0.10.1"). My guess is that this is purely an accident of porting the build from sbt 0.7.x and trying to keep the old project layout.
> If there are reasons for keeping it, please comment (I didn't get any answers on the [dev mailing list|http://apache-spark-developers-list.1001551.n3.nabble.com/Why-use-quot-lib-managed-quot-for-the-Sbt-build-td12361.html])
[jira] [Resolved] (SPARK-6504) Cannot read Parquet files generated from different versions at once
[ https://issues.apache.org/jira/browse/SPARK-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Michael Armbrust resolved SPARK-6504.
Resolution: Fixed
Fix Version/s: 1.3.1

This should be fixed. Please reopen if you are still having problems.

> Cannot read Parquet files generated from different versions at once
>
> Key: SPARK-6504
> URL: https://issues.apache.org/jira/browse/SPARK-6504
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.2.1
> Reporter: Marius Soutier
> Fix For: 1.3.1
>
> When trying to read Parquet files generated by Spark 1.1.1 and 1.2.1 at the
> same time via
> `sqlContext.parquetFile("fileFrom1.1.parquet,fileFrom1.2.parquet")`, an
> exception occurs:
> could not merge metadata: key org.apache.spark.sql.parquet.row.metadata has conflicting values:
> [{"type":"struct","fields":[{"name":"date","type":"string","nullable":true,"metadata":{}},{"name":"account","type":"string","nullable":true,"metadata":{}},{"name":"impressions","type":"long","nullable":false,"metadata":{}},{"name":"cost","type":"double","nullable":false,"metadata":{}},{"name":"clicks","type":"long","nullable":false,"metadata":{}},{"name":"conversions","type":"long","nullable":false,"metadata":{}},{"name":"orderValue","type":"double","nullable":false,"metadata":{}}]},
> StructType(List(StructField(date,StringType,true), StructField(account,StringType,true), StructField(impressions,LongType,false), StructField(cost,DoubleType,false), StructField(clicks,LongType,false), StructField(conversions,LongType,false), StructField(orderValue,DoubleType,false)))]
> Both values describe exactly the same schema.
[jira] [Assigned] (SPARK-10648) Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.
[ https://issues.apache.org/jira/browse/SPARK-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-10648:
Assignee: Apache Spark

> Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.
>
> Key: SPARK-10648
> URL: https://issues.apache.org/jira/browse/SPARK-10648
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 1.5.0
> Environment: using oracle 11g, ojdbc7.jar
> Reporter: Travis Hegner
> Assignee: Apache Spark
>
> I am using Oracle 11g as a data source with ojdbc7.jar. When importing data
> into a Scala app, I get the exception "Overflowed precision". Sometimes I
> get the exception "Unscaled value too large for precision" instead.
> This issue likely affects older versions as well, but this is the version I
> verified it on.
> I narrowed it down to the fact that the schema detection system was trying to
> set the precision to 0 and the scale to -127.
> I have a proposed pull request to follow.
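One way to handle this, sketched under assumptions, is to treat a reported precision of 0 as "unknown" and fall back to a bounded default, instead of passing 0 and -127 through to a decimal type. resolve is a hypothetical helper. 38 is Spark's maximum decimal precision. The default scale of 10 is an arbitrary illustrative choice and not necessarily what the proposed pull request does.

```scala
object OracleDecimalFallback {
  // Spark's maximum decimal precision.
  val MaxPrecision = 38

  // Hypothetical mapping from JDBC column metadata to (precision, scale).
  // Per the report, columns without a declared precision come through with
  // precision 0 and scale -127; treat precision 0 as "unknown" and use a
  // bounded default instead. defaultScale = 10 is an illustrative choice.
  def resolve(precision: Int, scale: Int, defaultScale: Int = 10): (Int, Int) =
    if (precision == 0) (MaxPrecision, defaultScale)
    else (precision, scale)
}
```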
[jira] [Commented] (SPARK-10648) Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.
[ https://issues.apache.org/jira/browse/SPARK-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791081#comment-14791081 ]

Apache Spark commented on SPARK-10648:

User 'travishegner' has created a pull request for this issue:
https://github.com/apache/spark/pull/8780
[jira] [Assigned] (SPARK-10648) Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.
[ https://issues.apache.org/jira/browse/SPARK-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10648: Assignee: (was: Apache Spark) > Spark-SQL JDBC fails to set a default precision and scale when they are not > defined in an oracle schema. > > > Key: SPARK-10648 > URL: https://issues.apache.org/jira/browse/SPARK-10648 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 > Environment: using oracle 11g, ojdbc7.jar >Reporter: Travis Hegner > > I am using Oracle 11g as a data source with ojdbc7.jar. When importing data into a > Scala app, I am getting an exception "Overflowed precision". Sometimes I > get the exception "Unscaled value too large for precision" instead. > This issue likely affects older versions as well, but this was the version I > verified it on. > I narrowed it down to the fact that the schema detection system was trying to > set the precision to 0 and the scale to -127. > I have a proposed pull request to follow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread
[ https://issues.apache.org/jira/browse/SPARK-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10649: Assignee: Tathagata Das (was: Apache Spark) > Streaming jobs unexpectedly inherits job group, job descriptions from context > starting thread > - > > Key: SPARK-10649 > URL: https://issues.apache.org/jira/browse/SPARK-10649 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Tathagata Das > > The job group, job description, and scheduler pool information are passed > through thread-local properties and are inherited by child threads. In the case > of Spark Streaming, the streaming jobs inherit these properties from the > thread that called streamingContext.start(). This may not make sense. > 1. Job group: This is mainly used for cancelling a group of jobs together. It > does not make sense to cancel streaming jobs like this, as the effect will be > unpredictable. Nor is it a valid use case anyway: to cancel a streaming > context, call streamingContext.stop(). > 2. Job description: This is used to pass on nice text descriptions for jobs > to show up in the UI. The job description of the thread that calls > streamingContext.start() is not useful for all the streaming jobs, as it does > not make sense for all of the streaming jobs to have the same description, > and the description may or may not be related to streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
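The inheritance described above is the standard behavior of java.lang.InheritableThreadLocal: a thread created while a value is set starts with a copy of it. A minimal sketch, assuming the properties live in an InheritableThreadLocal[Properties] whose childValue clones the parent's map; InheritedProps and groupSeenByChild are hypothetical, and the key "spark.jobGroup.id" is used only for illustration. It shows a child thread seeing the starting thread's job group unless the child explicitly clears it.

```scala
import java.util.Properties

object InheritedProps {
  // Per-thread properties; a newly created thread starts with a copy of its creator's values.
  val local: InheritableThreadLocal[Properties] = new InheritableThreadLocal[Properties] {
    override protected def initialValue(): Properties = new Properties()
    override protected def childValue(parent: Properties): Properties = {
      val copy = new Properties()
      copy.putAll(parent) // copy so changes in the child never leak back to the parent
      copy
    }
  }

  val JobGroupKey = "spark.jobGroup.id" // illustrative key name

  /** Starts a thread (standing in for a streaming job thread) and returns the job group it sees. */
  def groupSeenByChild(clearInherited: Boolean): Option[String] = {
    var seen: Option[String] = None
    val worker = new Thread(new Runnable {
      def run(): Unit = {
        if (clearInherited) local.get().remove(JobGroupKey) // drop what was inherited
        seen = Option(local.get().getProperty(JobGroupKey))
      }
    })
    worker.start()
    worker.join() // join makes the write to `seen` visible to this thread
    seen
  }
}
```

Clearing the inherited entries when the job-running thread starts is one way to avoid the surprise; because the child holds a copy, the starting thread's own settings are unaffected.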
[jira] [Commented] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread
[ https://issues.apache.org/jira/browse/SPARK-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791109#comment-14791109 ] Apache Spark commented on SPARK-10649: -- User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/8781 > Streaming jobs unexpectedly inherits job group, job descriptions from context > starting thread > - > > Key: SPARK-10649 > URL: https://issues.apache.org/jira/browse/SPARK-10649 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Tathagata Das > > The job group, job description, and scheduler pool information are passed > through thread-local properties and are inherited by child threads. In the case > of Spark Streaming, the streaming jobs inherit these properties from the > thread that called streamingContext.start(). This may not make sense. > 1. Job group: This is mainly used for cancelling a group of jobs together. It > does not make sense to cancel streaming jobs like this, as the effect will be > unpredictable. Nor is it a valid use case anyway: to cancel a streaming > context, call streamingContext.stop(). > 2. Job description: This is used to pass on nice text descriptions for jobs > to show up in the UI. The job description of the thread that calls > streamingContext.start() is not useful for all the streaming jobs, as it does > not make sense for all of the streaming jobs to have the same description, > and the description may or may not be related to streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10646) Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. categorical
[ https://issues.apache.org/jira/browse/SPARK-10646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jihong MA updated SPARK-10646: -- Component/s: SQL ML > Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. > categorical > > > Key: SPARK-10646 > URL: https://issues.apache.org/jira/browse/SPARK-10646 > Project: Spark > Issue Type: Sub-task > Components: ML, SQL >Reporter: Jihong MA > > Pearson's chi-squared goodness of fit test for observed against the expected > distribution. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
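For reference, Pearson's statistic is the sum over cells of (observed - expected)^2 / expected. For a goodness-of-fit test over k categories it is compared against a chi-squared distribution with k - 1 degrees of freedom; for an r x c contingency table (categorical vs. categorical), the expected counts come from the row and column totals and there are (r - 1)(c - 1) degrees of freedom. A minimal local sketch in plain Scala: the ChiSquared object is hypothetical, not an MLlib or Spark SQL API, and it computes only the statistic and degrees of freedom, not the p-value.

```scala
object ChiSquared {
  /** Goodness of fit: returns (statistic, degreesOfFreedom). */
  def goodnessOfFit(observed: Array[Double], expected: Array[Double]): (Double, Int) = {
    require(observed.nonEmpty && observed.length == expected.length, "need matching, non-empty inputs")
    require(expected.forall(_ > 0), "expected counts must be positive")
    val stat = observed.zip(expected).map { case (o, e) => (o - e) * (o - e) / e }.sum
    (stat, observed.length - 1)
  }

  /** Independence test on a rectangular r x c table of counts with no all-zero rows or columns. */
  def independence(table: Array[Array[Double]]): (Double, Int) = {
    val rowTotals = table.map(_.sum)
    val colTotals = table.transpose.map(_.sum)
    val total = rowTotals.sum
    require(total > 0, "table must contain at least one count")
    var stat = 0.0
    for (i <- table.indices; j <- table(i).indices) {
      val expected = rowTotals(i) * colTotals(j) / total // count expected under independence
      val diff = table(i)(j) - expected
      stat += diff * diff / expected
    }
    (stat, (table.length - 1) * (colTotals.length - 1))
  }
}
```

For example, observed counts (10, 20, 30) against a uniform expectation of 20 each give (100 + 0 + 100) / 20 = 10 with 2 degrees of freedom.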
[jira] [Assigned] (SPARK-10626) Create a Java friendly method for randomRDD & RandomDataGenerator on RandomRDDs.
[ https://issues.apache.org/jira/browse/SPARK-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10626: Assignee: Apache Spark > Create a Java friendly method for randomRDD & RandomDataGenerator on > RandomRDDs. > > > Key: SPARK-10626 > URL: https://issues.apache.org/jira/browse/SPARK-10626 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: holdenk >Assignee: Apache Spark >Priority: Minor > > SPARK-3136 added a large number of functions for creating Java RandomRDDs, > but for people that want to use custom RandomDataGenerators we should make a > Java friendly method. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
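Scala methods that rely on an implicit ClassTag are awkward to call from Java, because Java callers cannot supply the implicit. One common pattern is a separate overload that takes a Class[T] and builds the ClassTag itself. A minimal sketch under that assumption: ValueGenerator is a hypothetical stand-in for MLlib's RandomDataGenerator, and sample/sampleJava are illustrative names, not the RandomRDDs API.

```scala
import scala.reflect.ClassTag

// Hypothetical stand-in for MLlib's RandomDataGenerator trait.
trait ValueGenerator[T] extends Serializable {
  def nextValue(): T
}

object RandomSamples {
  // Scala-facing API: the element type comes from an implicit ClassTag.
  def sample[T: ClassTag](gen: ValueGenerator[T], n: Int): Array[T] =
    Array.fill(n)(gen.nextValue())

  // Java-facing overload: Java cannot pass implicits, so the caller supplies a Class[T].
  def sampleJava[T](gen: ValueGenerator[T], n: Int, cls: Class[T]): java.util.List[T] = {
    implicit val tag: ClassTag[T] = ClassTag[T](cls)
    val out = new java.util.ArrayList[T](n)
    sample(gen, n).foreach(v => out.add(v))
    out
  }
}
```

From Java this would be called as RandomSamples.sampleJava(gen, 3, String.class), with no knowledge of ClassTag required.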
[jira] [Created] (SPARK-10648) Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.
Travis Hegner created SPARK-10648: - Summary: Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema. Key: SPARK-10648 URL: https://issues.apache.org/jira/browse/SPARK-10648 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Environment: using oracle 11g, ojdbc7.jar Reporter: Travis Hegner I am using Oracle 11g as a data source with ojdbc7.jar. When importing data into a Scala app, I am getting an exception "Overflowed precision". Sometimes I get the exception "Unscaled value too large for precision" instead. This issue likely affects older versions as well, but this was the version I verified it on. I narrowed it down to the fact that the schema detection system was trying to set the precision to 0 and the scale to -127. I have a proposed pull request to follow. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread
[ https://issues.apache.org/jira/browse/SPARK-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10649: Assignee: Apache Spark (was: Tathagata Das) > Streaming jobs unexpectedly inherits job group, job descriptions from context > starting thread > - > > Key: SPARK-10649 > URL: https://issues.apache.org/jira/browse/SPARK-10649 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Apache Spark > > The job group, job description, and scheduler pool information are passed > through thread-local properties and are inherited by child threads. In the case > of Spark Streaming, the streaming jobs inherit these properties from the > thread that called streamingContext.start(). This may not make sense. > 1. Job group: This is mainly used for cancelling a group of jobs together. It > does not make sense to cancel streaming jobs like this, as the effect will be > unpredictable. Nor is it a valid use case anyway: to cancel a streaming > context, call streamingContext.stop(). > 2. Job description: This is used to pass on nice text descriptions for jobs > to show up in the UI. The job description of the thread that calls > streamingContext.start() is not useful for all the streaming jobs, as it does > not make sense for all of the streaming jobs to have the same description, > and the description may or may not be related to streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791179#comment-14791179 ] Apache Spark commented on SPARK-10623: -- User 'zhzhan' has created a pull request for this issue: https://github.com/apache/spark/pull/8783 > turning on predicate pushdown throws nonsuch element exception when RDD is > empty > - > > Key: SPARK-10623 > URL: https://issues.apache.org/jira/browse/SPARK-10623 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ram Sriharsha >Assignee: Zhan Zhang > > Turning on predicate pushdown for ORC datasources results in a > NoSuchElementException: > scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15") > df: org.apache.spark.sql.DataFrame = [name: string] > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true") > scala> df.explain > == Physical Plan == > java.util.NoSuchElementException > Disabling the pushdown makes things work again: > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false") > scala> df.explain > == Physical Plan == > Project [name#6] > Filter (age#7 < 15) > Scan > OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
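The report does not say where the exception is raised. One common way a java.util.NoSuchElementException appears only when the input is empty is calling .head (or .next() on an iterator) on an empty collection, for example when picking the first data file to read a schema from. A minimal sketch of the defensive alternative, assuming that pattern: PushdownSketch and its methods are hypothetical and are not the ORC data source's code or the fix in the linked pull request.

```scala
object PushdownSketch {
  // Fragile: throws java.util.NoSuchElementException when there are no data files.
  def schemaSourceUnsafe(files: Seq[String]): String = files.head

  // Defensive: an empty input yields None instead of an exception.
  def schemaSource(files: Seq[String]): Option[String] = files.headOption

  // Combines pushed-down predicates; an empty list means "no filter" rather than an error.
  def combine(preds: Seq[String]): Option[String] =
    preds.reduceOption((a, b) => s"($a AND $b)")
}
```

With this shape, an empty relation simply produces no pushed-down filter, and the planner can fall back to the plain Filter shown in the "pushdown disabled" plan above.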
[jira] [Assigned] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10623: Assignee: Apache Spark (was: Zhan Zhang) > turning on predicate pushdown throws nonsuch element exception when RDD is > empty > - > > Key: SPARK-10623 > URL: https://issues.apache.org/jira/browse/SPARK-10623 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ram Sriharsha >Assignee: Apache Spark > > Turning on predicate pushdown for ORC datasources results in a > NoSuchElementException: > scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15") > df: org.apache.spark.sql.DataFrame = [name: string] > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true") > scala> df.explain > == Physical Plan == > java.util.NoSuchElementException > Disabling the pushdown makes things work again: > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false") > scala> df.explain > == Physical Plan == > Project [name#6] > Filter (age#7 < 15) > Scan > OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty
[ https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-10623: Target Version/s: 1.6.0, 1.5.1 > turning on predicate pushdown throws nonsuch element exception when RDD is > empty > - > > Key: SPARK-10623 > URL: https://issues.apache.org/jira/browse/SPARK-10623 > Project: Spark > Issue Type: Bug > Components: SQL >Reporter: Ram Sriharsha >Assignee: Zhan Zhang > > Turning on predicate pushdown for ORC datasources results in a > NoSuchElementException: > scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15") > df: org.apache.spark.sql.DataFrame = [name: string] > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true") > scala> df.explain > == Physical Plan == > java.util.NoSuchElementException > Disabling the pushdown makes things work again: > scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false") > scala> df.explain > == Physical Plan == > Project [name#6] > Filter (age#7 < 15) > Scan > OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7] -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10651) Flaky test: BroadcastSuite
[ https://issues.apache.org/jira/browse/SPARK-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10651: -- Description: Saw many failures recently in master build. See attached CSV for a full list. Most of the error messages are: {code} Can't find 2 executors before 1 milliseconds elapsed {code} . was: Saw many failures recently in master build. See attached CSV for a full list. Most of the error messages are: Can't find 2 executors before 1 milliseconds elapsed . > Flaky test: BroadcastSuite > -- > > Key: SPARK-10651 > URL: https://issues.apache.org/jira/browse/SPARK-10651 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng >Assignee: Shixiong Zhu >Priority: Blocker > Attachments: BroadcastSuiteFailures.csv > > > Saw many failures recently in master build. See attached CSV for a full list. > Most of the error messages are: > {code} > Can't find 2 executors before 1 milliseconds elapsed > {code} > . -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
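The failure message suggests a test that polls for executors to register and gives up after a fixed timeout, which tends to be flaky on slow or heavily loaded build machines. A minimal sketch of such a deadline-based polling helper, assuming nothing about BroadcastSuite's actual code: WaitSketch and waitUntil are hypothetical names.

```scala
object WaitSketch {
  /** Polls `cond` every `pollMs` until it holds or `timeoutMs` elapses; returns the final result. */
  def waitUntil(timeoutMs: Long, pollMs: Long = 10L)(cond: => Boolean): Boolean = {
    val deadline = System.nanoTime() + timeoutMs * 1000000L
    var ok = cond
    while (!ok && System.nanoTime() < deadline) {
      Thread.sleep(pollMs)
      ok = cond
    }
    ok
  }
}
```

Using System.nanoTime keeps the deadline immune to wall-clock adjustments; the timeout value itself is the knob that decides how tolerant the test is of a slow cluster startup.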
[jira] [Commented] (SPARK-10381) Infinite loop when OutputCommitCoordination is enabled and OutputCommitter.commitTask throws exception
[ https://issues.apache.org/jira/browse/SPARK-10381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791354#comment-14791354 ] Apache Spark commented on SPARK-10381: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/8790 > Infinite loop when OutputCommitCoordination is enabled and > OutputCommitter.commitTask throws exception > -- > > Key: SPARK-10381 > URL: https://issues.apache.org/jira/browse/SPARK-10381 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Critical > Fix For: 1.6.0, 1.5.1 > > > When speculative execution is enabled, consider a scenario where the > authorized committer of a particular output partition fails during the > OutputCommitter.commitTask() call. In this case, the OutputCommitCoordinator > is supposed to release that committer's exclusive lock on committing once > that task fails. However, due to a unit mismatch the lock will not be > released, causing Spark to go into an infinite retry loop. > This bug was masked by the fact that the OutputCommitCoordinator does not > have enough end-to-end tests (the current tests use many mocks). Other > factors contributing to this bug are the fact that we have many > similarly-named identifiers that have different semantics but the same data > types (e.g. attemptNumber and taskAttemptId, with inconsistent variable > naming which makes them difficult to distinguish). -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
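The description attributes the bug to a unit mismatch between similarly typed identifiers (attemptNumber vs. taskAttemptId). A minimal sketch of one way to turn that mismatch into a compile error, assuming a simplified single-threaded coordinator: AttemptNumber, TaskAttemptId, and CommitLocks are hypothetical and are not Spark's OutputCommitCoordinator.

```scala
import scala.collection.mutable

// Hypothetical wrappers: distinct types for identifiers that share a JVM representation.
final case class AttemptNumber(value: Int)  // retry count of one task
final case class TaskAttemptId(value: Long) // globally unique attempt id

/** Simplified, single-threaded sketch of per-partition commit authorization. */
class CommitLocks {
  private val authorized = mutable.Map.empty[Int, AttemptNumber] // partition -> committer

  def canCommit(partition: Int, attempt: AttemptNumber): Boolean =
    authorized.get(partition) match {
      case Some(holder) => holder == attempt
      case None =>
        authorized(partition) = attempt // first attempt to ask wins the lock
        true
    }

  // Only an AttemptNumber can release the lock; passing a TaskAttemptId does not compile.
  def taskFailed(partition: Int, attempt: AttemptNumber): Unit =
    if (authorized.get(partition).contains(attempt)) authorized.remove(partition)
}
```

If the release path compared against the wrong kind of id, the lock would never be freed and every retry would be denied, which matches the infinite retry loop described above.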
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791370#comment-14791370 ] Saisai Shao commented on SPARK-10644: - Do your jobs have dependencies? That is, does the 4th job rely on the first 3 jobs finishing and returning results? > Applications wait even if free executors are available > -- > > Key: SPARK-10644 > URL: https://issues.apache.org/jira/browse/SPARK-10644 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.5.0 > Environment: RHEL 6.5 64 bit >Reporter: Balagopal Nair > > Number of workers: 21 > Number of executors: 63 > Steps to reproduce: > 1. Run 4 jobs, each with max cores set to 10. > 2. The first 3 jobs run with 10 each (30 executors consumed so far). > 3. The 4th job waits even though there are 33 idle executors. > The reason is that a job will not get executors unless > the total number of EXECUTORS in use < the number of WORKERS > If there are executors available, resources should be allocated to the > pending job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10653) Remove unnecessary things from SparkEnv
[ https://issues.apache.org/jira/browse/SPARK-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791388#comment-14791388 ] Josh Rosen commented on SPARK-10653: Note that SparkEnv is technically a developer API, but all of its fields point to things which are non-developer-API. Thus I feel that there's not a compatibility concern here, but others might disagree. > Remove unnecessary things from SparkEnv > --- > > Key: SPARK-10653 > URL: https://issues.apache.org/jira/browse/SPARK-10653 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.0.0 >Reporter: Andrew Or > > As of the writing of this message, there are at least two things that can be > removed from it: > {code} > @DeveloperApi > class SparkEnv ( > val executorId: String, > private[spark] val rpcEnv: RpcEnv, > val serializer: Serializer, > val closureSerializer: Serializer, > val cacheManager: CacheManager, > val mapOutputTracker: MapOutputTracker, > val shuffleManager: ShuffleManager, > val broadcastManager: BroadcastManager, > val blockTransferService: BlockTransferService, // this one can go > val blockManager: BlockManager, > val securityManager: SecurityManager, > val httpFileServer: HttpFileServer, > val sparkFilesDir: String, // this one maybe? It's only used in 1 place. > val metricsSystem: MetricsSystem, > val shuffleMemoryManager: ShuffleMemoryManager, > val executorMemoryManager: ExecutorMemoryManager, // this can go > val outputCommitCoordinator: OutputCommitCoordinator, > val conf: SparkConf) extends Logging { > ... > } > {code} > We should avoid adding to this infinite list of things in SparkEnv's > constructors if they're not needed. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10647) Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be documented
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10647: --- Affects Version/s: 1.4.1 1.5.0 > Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be > documented > --- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Improvement > Components: Mesos >Affects Versions: 1.4.1, 1.5.0 >Reporter: Alan Braithwaite >Assignee: Timothy Chen >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode > Since it's also a property specific to Mesos, it makes sense for it to be under that > hierarchy as well. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10640) Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied
[ https://issues.apache.org/jira/browse/SPARK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791438#comment-14791438 ] Thomas Graves commented on SPARK-10640: --- Yes, 1.5 history server reading 1.5.0 logs. I'm not as worried about forward compatibility, but it would be nice if we handled values like this by showing them as blank or unknown, so the application is at least viewable. > Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied > -- > > Key: SPARK-10640 > URL: https://issues.apache.org/jira/browse/SPARK-10640 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0, 1.4.0, 1.5.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Critical > > I'm seeing an exception from the Spark history server when it tries to read a > history file: > scala.MatchError: TaskCommitDenied (of class java.lang.String) > at > org.apache.spark.util.JsonProtocol$.taskEndReasonFromJson(JsonProtocol.scala:775) > at > org.apache.spark.util.JsonProtocol$.taskEndFromJson(JsonProtocol.scala:531) > at > org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:488) > at > org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > 
scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) > at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745) -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
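The stack trace shows scala.MatchError, which is what a Scala match expression throws when no case matches; here the unmatched value is the end-reason string "TaskCommitDenied". A minimal sketch of the tolerant handling suggested in the comment, assuming a catch-all case that keeps unrecognized names so they can be displayed as unknown: EndReason, EndReasonParser, and the case names are hypothetical simplifications, not JsonProtocol's actual code.

```scala
sealed trait EndReason
case object TaskSucceeded extends EndReason
case object TaskResubmitted extends EndReason
final case class UnrecognizedReason(raw: String) extends EndReason

object EndReasonParser {
  // Strict version: an unexpected string such as "TaskCommitDenied" throws scala.MatchError.
  def parseStrict(reason: String): EndReason = reason match {
    case "Success"     => TaskSucceeded
    case "Resubmitted" => TaskResubmitted
  }

  // Tolerant version: unknown names are kept so the UI can show them as "unknown".
  def parseTolerant(reason: String): EndReason = reason match {
    case "Success"     => TaskSucceeded
    case "Resubmitted" => TaskResubmitted
    case other         => UnrecognizedReason(other)
  }
}
```

The catch-all trades precision for availability: the replay keeps going and the page renders, at the cost of losing the details of reasons the reader does not know about.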
[jira] [Comment Edited] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791434#comment-14791434 ] Balagopal Nair edited comment on SPARK-10644 at 9/17/15 1:51 AM: - No. These are independent jobs running under different SparkContexts. Sorry for not being clear enough before. I'm trying to share the same cluster between various applications. This issue is about scheduling across applications, not within the same application. was (Author: nbalagopal): No. These are independent jobs running under different SparkContexts > Applications wait even if free executors are available > -- > > Key: SPARK-10644 > URL: https://issues.apache.org/jira/browse/SPARK-10644 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.5.0 > Environment: RHEL 6.5 64 bit >Reporter: Balagopal Nair > > Number of workers: 21 > Number of executors: 63 > Steps to reproduce: > 1. Run 4 jobs, each with max cores set to 10. > 2. The first 3 jobs run with 10 each (30 executors consumed so far). > 3. The 4th job waits even though there are 33 idle executors. > The reason is that a job will not get executors unless > the total number of EXECUTORS in use < the number of WORKERS > If there are executors available, resources should be allocated to the > pending job. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10625) Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties
[ https://issues.apache.org/jira/browse/SPARK-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791252#comment-14791252 ] Peng Cheng commented on SPARK-10625: A pull request has been sent that contains 2 extra unit tests and a simple fix: https://github.com/apache/spark/pull/8785 Can you help me validate it and merge it into 1.5.1? > Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds > unserializable objects into connection properties > -- > > Key: SPARK-10625 > URL: https://issues.apache.org/jira/browse/SPARK-10625 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.5.0 > Environment: Ubuntu 14.04 >Reporter: Peng Cheng > Labels: jdbc, spark, sparksql > > Some JDBC drivers (e.g. SAP HANA) try to optimize connection pooling by > adding new objects into the connection properties, which are then reused by > Spark and shipped to workers. When some of these new objects are not > serializable, this triggers an org.apache.spark.SparkException: Task not > serializable. The following test code snippet demonstrates this problem by > using a modified H2 driver: > test("INSERT to JDBC Datasource with UnserializableH2Driver") { > object UnserializableH2Driver extends org.h2.Driver { > override def connect(url: String, info: Properties): Connection = { > val result = super.connect(url, info) > info.put("unserializableDriver", this) > result > } > override def getParentLogger: Logger = ??? 
> } > import scala.collection.JavaConversions._ > val oldDrivers = > DriverManager.getDrivers.filter(_.acceptsURL("jdbc:h2:")).toSeq > oldDrivers.foreach{ > DriverManager.deregisterDriver > } > DriverManager.registerDriver(UnserializableH2Driver) > sql("INSERT INTO TABLE PEOPLE1 SELECT * FROM PEOPLE") > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", properties).count) > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", > properties).collect()(0).length) > DriverManager.deregisterDriver(UnserializableH2Driver) > oldDrivers.foreach{ > DriverManager.registerDriver > } > } -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
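The failure comes from Java serialization: java.util.Properties is a Hashtable, so serializing it throws NotSerializableException as soon as a driver stores a non-serializable value in it. A minimal sketch of a defensive pattern, assuming Spark only needs the String entries: ship a copy that keeps only String keys and values, so whatever a driver adds to its own Properties instance never reaches the serialized closure. JdbcPropsSketch is hypothetical and not necessarily what the linked pull request does.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import java.util.Properties

object JdbcPropsSketch {
  /** True if `obj` survives Java serialization (the mechanism behind "Task not serializable"). */
  def isSerializable(obj: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try { out.writeObject(obj); true }
    catch { case _: NotSerializableException => false }
    finally out.close()
  }

  /** Copies only entries whose key and value are both Strings, dropping anything a driver added. */
  def stringOnlyCopy(props: Properties): Properties = {
    val copy = new Properties()
    val names = props.stringPropertyNames().iterator()
    while (names.hasNext) {
      val key = names.next()
      copy.setProperty(key, props.getProperty(key))
    }
    copy
  }
}
```

Properties.stringPropertyNames() already skips entries whose key or value is not a String, which is what makes the copy safe to serialize.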
[jira] [Updated] (SPARK-10651) Flaky test: BroadcastSuite
[ https://issues.apache.org/jira/browse/SPARK-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10651: -- Labels: flaky-test (was: ) > Flaky test: BroadcastSuite > -- > > Key: SPARK-10651 > URL: https://issues.apache.org/jira/browse/SPARK-10651 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng >Assignee: Shixiong Zhu >Priority: Blocker > Labels: flaky-test > Attachments: BroadcastSuiteFailures.csv > > > Saw many failures recently in master build. See attached CSV for a full list. > Most of the error messages are: > {code} > Can't find 2 executors before 1 milliseconds elapsed > {code} > . -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread
[ https://issues.apache.org/jira/browse/SPARK-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-10649: -- Description: The job group and job description information is passed through thread-local properties and inherited by child threads. In the case of Spark Streaming, the streaming jobs inherit these properties from the thread that called streamingContext.start(). This may not make sense. 1. Job group: This is mainly used for cancelling a group of jobs together. It does not make sense to cancel streaming jobs like this, as the effect will be unpredictable. Nor is it a valid use case anyway: to cancel a streaming context, call streamingContext.stop(). 2. Job description: This is used to pass on nice text descriptions for jobs to show up in the UI. The job description of the thread that calls streamingContext.start() is not useful for all the streaming jobs, as it does not make sense for all of the streaming jobs to have the same description, and the description may or may not be related to streaming. was: The job group, job description, and scheduler pool information are passed through thread-local properties and are inherited by child threads. In the case of Spark Streaming, the streaming jobs inherit these properties from the thread that called streamingContext.start(). This may not make sense. 1. Job group: This is mainly used for cancelling a group of jobs together. It does not make sense to cancel streaming jobs like this, as the effect will be unpredictable. Nor is it a valid use case anyway: to cancel a streaming context, call streamingContext.stop(). 2. Job description: This is used to pass on nice text descriptions for jobs to show up in the UI. The job description of the thread that calls streamingContext.start() is not useful for all the streaming jobs, as it does not make sense for all of the streaming jobs to have the same description, and the description may or may not be related to streaming. 
> Streaming jobs unexpectedly inherits job group, job descriptions from context > starting thread > - > > Key: SPARK-10649 > URL: https://issues.apache.org/jira/browse/SPARK-10649 > Project: Spark > Issue Type: Bug > Components: Streaming >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Tathagata Das > > The job group and job description information is passed through thread-local > properties and inherited by child threads. In the case of Spark > Streaming, the streaming jobs inherit these properties from the thread that > called streamingContext.start(). This may not make sense. > 1. Job group: This is mainly used for cancelling a group of jobs together. It > does not make sense to cancel streaming jobs like this, as the effect will be > unpredictable. Nor is it a valid use case anyway: to cancel a streaming > context, call streamingContext.stop(). > 2. Job description: This is used to pass on nice text descriptions for jobs > to show up in the UI. The job description of the thread that calls > streamingContext.start() is not useful for all the streaming jobs, as it does > not make sense for all of the streaming jobs to have the same description, > and the description may or may not be related to streaming. -- This message was sent by Atlassian JIRA (v6.3.4#6332) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-10381) Infinite loop when OutputCommitCoordination is enabled and OutputCommitter.commitTask throws exception
[ https://issues.apache.org/jira/browse/SPARK-10381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791347#comment-14791347 ] Apache Spark commented on SPARK-10381: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/8789 > Infinite loop when OutputCommitCoordination is enabled and > OutputCommitter.commitTask throws exception > -- > > Key: SPARK-10381 > URL: https://issues.apache.org/jira/browse/SPARK-10381 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.3.1, 1.4.1, 1.5.0 >Reporter: Josh Rosen >Assignee: Josh Rosen >Priority: Critical > Fix For: 1.6.0, 1.5.1 > > > When speculative execution is enabled, consider a scenario where the > authorized committer of a particular output partition fails during the > OutputCommitter.commitTask() call. In this case, the OutputCommitCoordinator > is supposed to release that committer's exclusive lock on committing once > that task fails. However, due to a unit mismatch the lock will not be > released, causing Spark to go into an infinite retry loop. > This bug was masked by the fact that the OutputCommitCoordinator does not > have enough end-to-end tests (the current tests use many mocks). Another > factor contributing to this bug is that we have many > similarly-named identifiers that have different semantics but the same data > types (e.g. attemptNumber and taskAttemptId, with inconsistent variable > naming which makes them difficult to distinguish).
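A hypothetical Python sketch of how a unit mismatch like the one described can keep a commit lock from ever being released. It assumes the coordinator records the authorized committer by attempt number, while the failure path is handed a task attempt id; the class and method names are illustrative and are not the OutputCommitCoordinator's real code. NewType lets a static checker such as mypy reject passing one identifier where the other is expected, although both are plain ints at runtime.

```python
from typing import Dict, NewType, Optional, Tuple

# Same runtime type (int), different meanings.
AttemptNumber = NewType("AttemptNumber", int)  # 0, 1, 2, ... per task (retries, speculation)
TaskAttemptId = NewType("TaskAttemptId", int)  # unique across all tasks in the application

Key = Tuple[int, int]  # (stage, partition)


class CommitCoordinator:
    """Hypothetical sketch: at most one authorized committer per (stage, partition)."""

    def __init__(self) -> None:
        self._authorized: Dict[Key, AttemptNumber] = {}

    def can_commit(self, stage: int, partition: int, attempt: AttemptNumber) -> bool:
        holder: Optional[AttemptNumber] = self._authorized.get((stage, partition))
        if holder is None:
            self._authorized[(stage, partition)] = attempt
            return True
        return holder == attempt

    def task_failed_buggy(self, stage: int, partition: int, task_id: TaskAttemptId) -> None:
        # Unit mismatch: an attempt number is compared with a task attempt id,
        # so the lock is (almost) never released.
        if self._authorized.get((stage, partition)) == task_id:
            del self._authorized[(stage, partition)]

    def task_failed(self, stage: int, partition: int, attempt: AttemptNumber) -> None:
        # Compare like with like: release only if the failed attempt holds the lock.
        if self._authorized.get((stage, partition)) == attempt:
            del self._authorized[(stage, partition)]


coordinator = CommitCoordinator()
coordinator.can_commit(1, 0, AttemptNumber(0))           # attempt 0 (task id 17) is authorized
coordinator.task_failed_buggy(1, 0, TaskAttemptId(17))   # 0 != 17, so the lock is kept
print(coordinator.can_commit(1, 0, AttemptNumber(1)))    # False: the retry is denied again and again
coordinator.task_failed(1, 0, AttemptNumber(0))          # correct release
print(coordinator.can_commit(1, 0, AttemptNumber(1)))    # True
```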
[jira] [Created] (SPARK-10653) Remove unnecessary things from SparkEnv
Andrew Or created SPARK-10653: - Summary: Remove unnecessary things from SparkEnv Key: SPARK-10653 URL: https://issues.apache.org/jira/browse/SPARK-10653 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 1.0.0 Reporter: Andrew Or As of the writing of this message, there are at least two things that can be removed from SparkEnv:
{code}
@DeveloperApi
class SparkEnv (
    val executorId: String,
    private[spark] val rpcEnv: RpcEnv,
    val serializer: Serializer,
    val closureSerializer: Serializer,
    val cacheManager: CacheManager,
    val mapOutputTracker: MapOutputTracker,
    val shuffleManager: ShuffleManager,
    val broadcastManager: BroadcastManager,
    val blockTransferService: BlockTransferService, // this one can go
    val blockManager: BlockManager,
    val securityManager: SecurityManager,
    val httpFileServer: HttpFileServer,
    val sparkFilesDir: String, // this one maybe? It's only used in 1 place.
    val metricsSystem: MetricsSystem,
    val shuffleMemoryManager: ShuffleMemoryManager,
    val executorMemoryManager: ExecutorMemoryManager, // this can go
    val outputCommitCoordinator: OutputCommitCoordinator,
    val conf: SparkConf) extends Logging {
  ...
}
{code}
We should avoid adding to this infinite list of things in SparkEnv's constructor if they're not needed.
[jira] [Assigned] (SPARK-10652) Set good job descriptions for streaming related jobs
[ https://issues.apache.org/jira/browse/SPARK-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10652: Assignee: Tathagata Das (was: Apache Spark) > Set good job descriptions for streaming related jobs > > > Key: SPARK-10652 > URL: https://issues.apache.org/jira/browse/SPARK-10652 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Tathagata Das > > Job descriptions will help distinguish the jobs of one batch from those of other batches in > the Jobs and Stages pages in the Spark UI.
[jira] [Assigned] (SPARK-10652) Set good job descriptions for streaming related jobs
[ https://issues.apache.org/jira/browse/SPARK-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10652: Assignee: Apache Spark (was: Tathagata Das) > Set good job descriptions for streaming related jobs > > > Key: SPARK-10652 > URL: https://issues.apache.org/jira/browse/SPARK-10652 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Apache Spark > > Job descriptions will help distinguish the jobs of one batch from those of other batches in > the Jobs and Stages pages in the Spark UI.
[jira] [Commented] (SPARK-10652) Set good job descriptions for streaming related jobs
[ https://issues.apache.org/jira/browse/SPARK-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791357#comment-14791357 ] Apache Spark commented on SPARK-10652: -- User 'tdas' has created a pull request for this issue: https://github.com/apache/spark/pull/8791 > Set good job descriptions for streaming related jobs > > > Key: SPARK-10652 > URL: https://issues.apache.org/jira/browse/SPARK-10652 > Project: Spark > Issue Type: Improvement > Components: Streaming >Affects Versions: 1.4.1, 1.5.0 >Reporter: Tathagata Das >Assignee: Tathagata Das > > Job descriptions will help distinguish the jobs of one batch from those of other batches in > the Jobs and Stages pages in the Spark UI.
[jira] [Commented] (SPARK-10635) pyspark - running on a different host
[ https://issues.apache.org/jira/browse/SPARK-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791389#comment-14791389 ] Josh Rosen commented on SPARK-10635: [~davies], do you think we should support this? This seems like a hard-to-support feature, so I'm inclined to say that this issue is "Won't Fix" as currently described. > pyspark - running on a different host > - > > Key: SPARK-10635 > URL: https://issues.apache.org/jira/browse/SPARK-10635 > Project: Spark > Issue Type: Improvement >Reporter: Ben Duffield > > At various points we assume we only ever talk to a driver on the same host. > e.g. > https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala#L615 > We use pyspark to connect to an existing driver (i.e. do not let pyspark > launch the driver itself, but instead construct the SparkContext with the > gateway and jsc arguments). > There are a few reasons for this, but essentially it's to allow more > flexibility when running in AWS. > Before 1.3.1 we were able to monkeypatch around this:
> {code}
> import socket
>
> import pyspark.rdd
>
> def _load_from_socket(port, serializer):
>     sock = socket.socket()
>     sock.settimeout(3)
>     try:
>         # `host` is a module-level global naming the remote driver.
>         sock.connect((host, port))
>         rf = sock.makefile("rb", 65536)
>         for item in serializer.load_stream(rf):
>             yield item
>     finally:
>         sock.close()
>
> pyspark.rdd._load_from_socket = _load_from_socket
> {code}
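The reporter's patch reads `host` from a module-level global. A sketch that binds the host explicitly through a closure instead; make_load_from_socket and the placeholder hostname are hypothetical. pyspark.rdd._load_from_socket is a private function whose signature can change between releases, so a patch like this may not apply to every version.

```python
import socket
from typing import Any, Callable, Iterator


def make_load_from_socket(host: str,
                          timeout: float = 3.0) -> Callable[[int, Any], Iterator[Any]]:
    """Build a _load_from_socket-style generator bound to an explicit driver host."""

    def _load_from_socket(port: int, serializer: Any) -> Iterator[Any]:
        sock = socket.socket()
        sock.settimeout(timeout)
        try:
            sock.connect((host, port))
            # Close the file wrapper as well as the socket once the stream is consumed.
            with sock.makefile("rb", 65536) as rf:
                for item in serializer.load_stream(rf):
                    yield item
        finally:
            sock.close()

    return _load_from_socket


# Hypothetical usage; "driver.example.internal" is a placeholder hostname:
# import pyspark.rdd
# pyspark.rdd._load_from_socket = make_load_from_socket("driver.example.internal")
```

Binding the host at construction time avoids depending on a global that must be set before the first collect().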
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791434#comment-14791434 ] Balagopal Nair commented on SPARK-10644: No. These are independent jobs running under different SparkContexts. > Applications wait even if free executors are available > -- > > Key: SPARK-10644 > URL: https://issues.apache.org/jira/browse/SPARK-10644 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.5.0 > Environment: RHEL 6.5 64 bit >Reporter: Balagopal Nair > > Number of workers: 21 > Number of executors: 63 > Steps to reproduce: > 1. Run 4 jobs each with max cores set to 10 > 2. The first 3 jobs run with 10 each. (30 executors consumed so far) > 3. The 4th job waits even though there are 33 idle executors. > The reason is that a job will not get executors unless > the total number of EXECUTORS in use < the number of WORKERS > If there are executors available, resources should be allocated to the > pending job.
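The reporter's arithmetic and hypothesis, restated in code as a sketch. This is not the scheduling logic of any Spark cluster manager (the thread has not yet established which one is in use); ClusterState and both functions are hypothetical and only encode the condition the reporter describes versus the behaviour they expect.

```python
from dataclasses import dataclass


@dataclass
class ClusterState:
    workers: int
    total_executors: int
    executors_in_use: int

    @property
    def idle_executors(self) -> int:
        return self.total_executors - self.executors_in_use


def can_allocate_as_reported(state: ClusterState) -> bool:
    # The gating condition the reporter describes: nothing more is granted once
    # the number of executors in use reaches the number of workers.
    return state.executors_in_use < state.workers


def can_allocate_expected(state: ClusterState, requested: int) -> bool:
    # The behaviour the reporter expects: grant the request if enough are idle.
    return state.idle_executors >= requested


# Numbers from the report: 21 workers, 63 executors, 3 jobs x 10 already running.
state = ClusterState(workers=21, total_executors=63, executors_in_use=30)
print(state.idle_executors)              # 33
print(can_allocate_as_reported(state))   # False: 30 is not < 21, so job 4 waits
print(can_allocate_expected(state, 10))  # True: 33 idle >= 10 requested
```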
[jira] [Updated] (SPARK-10651) Flaky test: BroadcastSuite
[ https://issues.apache.org/jira/browse/SPARK-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10651: -- Attachment: BroadcastSuiteFailures.csv > Flaky test: BroadcastSuite > -- > > Key: SPARK-10651 > URL: https://issues.apache.org/jira/browse/SPARK-10651 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng >Assignee: Shixiong Zhu >Priority: Blocker > Attachments: BroadcastSuiteFailures.csv > > > Saw many failures recently in master build. See attached CSV for a full list.
[jira] [Updated] (SPARK-10651) Flaky test: BroadcastSuite
[ https://issues.apache.org/jira/browse/SPARK-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10651: -- Description: Saw many failures recently in master build. See attached CSV for a full list. Most of the error messages are: Can't find 2 executors before 1 milliseconds elapsed . was: Saw many failures recently in master build. See attached CSV for a full list. > Flaky test: BroadcastSuite > -- > > Key: SPARK-10651 > URL: https://issues.apache.org/jira/browse/SPARK-10651 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Affects Versions: 1.6.0 >Reporter: Xiangrui Meng >Assignee: Shixiong Zhu >Priority: Blocker > Attachments: BroadcastSuiteFailures.csv > > > Saw many failures recently in master build. See attached CSV for a full list. > Most of the error messages are: Can't find 2 executors before 1 > milliseconds elapsed > .
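The error message suggests the suite waits for a fixed number of executors to register before a deadline. A hypothetical sketch of such a deadline-based wait; the helper name and parameters are illustrative and are not Spark's test utilities. A wait like this fails whenever executors register more slowly than the timeout allows (for example on a heavily loaded build machine), which is one common way a test of this kind turns flaky.

```python
import time
from typing import Callable


def wait_until_executors_up(executor_count: Callable[[], int], num_executors: int,
                            timeout_ms: int, poll_ms: int = 10) -> None:
    """Poll executor_count() until it reaches num_executors or the timeout expires."""
    deadline = time.monotonic() + timeout_ms / 1000.0
    while True:
        if executor_count() >= num_executors:
            return
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Can't find {num_executors} executors before {timeout_ms} milliseconds elapsed")
        time.sleep(poll_ms / 1000.0)
```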
[jira] [Created] (SPARK-10651) Flaky test: BroadcastSuite
Xiangrui Meng created SPARK-10651: - Summary: Flaky test: BroadcastSuite Key: SPARK-10651 URL: https://issues.apache.org/jira/browse/SPARK-10651 Project: Spark Issue Type: Bug Components: Spark Core, Tests Affects Versions: 1.6.0 Reporter: Xiangrui Meng Assignee: Shixiong Zhu Priority: Blocker Saw many failures recently in master build. See attached CSV for a full list.
[jira] [Updated] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat
[ https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10058: -- Component/s: Tests > Flaky test: HeartbeatReceiverSuite: normal heartbeat > > > Key: SPARK-10058 > URL: https://issues.apache.org/jira/browse/SPARK-10058 > Project: Spark > Issue Type: Test > Components: Spark Core, Tests >Reporter: Davies Liu >Assignee: Andrew Or >Priority: Blocker > Labels: flaky-test > > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/ > {code} > Error Message > 3 did not equal 2 > Stacktrace > sbt.ForkMain$ForkError: 3 did not equal 2 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) > at > org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at > 
org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) > at sbt.ForkMain$Run$2.call(ForkMain.java:294) > at sbt.ForkMain$Run$2.call(ForkMain.java:284) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at >
[jira] [Assigned] (SPARK-10639) Need to convert UDAF's result from scala to sql type
[ https://issues.apache.org/jira/browse/SPARK-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10639: Assignee: Apache Spark > Need to convert UDAF's result from scala to sql type > > > Key: SPARK-10639 > URL: https://issues.apache.org/jira/browse/SPARK-10639 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yin Huai >Assignee: Apache Spark >Priority: Blocker > > We are missing a conversion at > https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala#L427.
[jira] [Commented] (SPARK-10639) Need to convert UDAF's result from scala to sql type
[ https://issues.apache.org/jira/browse/SPARK-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791302#comment-14791302 ] Apache Spark commented on SPARK-10639: -- User 'yhuai' has created a pull request for this issue: https://github.com/apache/spark/pull/8788 > Need to convert UDAF's result from scala to sql type > > > Key: SPARK-10639 > URL: https://issues.apache.org/jira/browse/SPARK-10639 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yin Huai >Priority: Blocker > > We are missing a conversion at > https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala#L427.
[jira] [Assigned] (SPARK-10639) Need to convert UDAF's result from scala to sql type
[ https://issues.apache.org/jira/browse/SPARK-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10639: Assignee: (was: Apache Spark) > Need to convert UDAF's result from scala to sql type > > > Key: SPARK-10639 > URL: https://issues.apache.org/jira/browse/SPARK-10639 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.5.0 >Reporter: Yin Huai >Priority: Blocker > > We are missing a conversion at > https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala#L427.
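A simplified Python model of the missing step, as a sketch: the value a UDAF's evaluate returns is in the user-facing (external) form and has to be converted into the engine's internal representation before it is stored. The internal forms below (UTF-8 bytes for strings, days since the epoch for dates, microseconds since the epoch for timestamps) are loosely modeled on Spark SQL's internal formats; in Spark itself this job belongs to CatalystTypeConverters, and to_internal and finish_aggregate are hypothetical names.

```python
import datetime
from typing import Any, Callable, Union

SqlType = Union[str, tuple]  # e.g. "string", "date", ("array", "string")

EPOCH_DATE = datetime.date(1970, 1, 1)
EPOCH_TS = datetime.datetime(1970, 1, 1, tzinfo=datetime.timezone.utc)


def to_internal(value: Any, sql_type: SqlType) -> Any:
    """Convert an external (user-facing) value to a simplified internal form."""
    if value is None:
        return None
    if isinstance(sql_type, tuple) and sql_type[0] == "array":
        return [to_internal(v, sql_type[1]) for v in value]
    if sql_type == "string":
        return value.encode("utf-8")            # stands in for an internal UTF-8 string
    if sql_type == "date":
        return (value - EPOCH_DATE).days        # days since 1970-01-01
    if sql_type == "timestamp":                 # expects a timezone-aware datetime
        delta = value - EPOCH_TS
        return (delta.days * 86_400 + delta.seconds) * 1_000_000 + delta.microseconds
    return value                                # numbers and booleans pass through


def finish_aggregate(evaluate: Callable[[Any], Any], buffer: Any, result_type: SqlType) -> Any:
    # The step the report says is missing: convert the UDAF's result before returning it.
    return to_internal(evaluate(buffer), result_type)
```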
[jira] [Updated] (SPARK-10640) Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied
[ https://issues.apache.org/jira/browse/SPARK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10640: --- Affects Version/s: 1.3.0 1.4.0 Target Version/s: 1.5.1 Priority: Critical (was: Major) > Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied > -- > > Key: SPARK-10640 > URL: https://issues.apache.org/jira/browse/SPARK-10640 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0, 1.4.0, 1.5.0 >Reporter: Thomas Graves >Assignee: Thomas Graves >Priority: Critical > > I'm seeing an exception from the spark history server trying to read a > history file: > scala.MatchError: TaskCommitDenied (of class java.lang.String) > at > org.apache.spark.util.JsonProtocol$.taskEndReasonFromJson(JsonProtocol.scala:775) > at > org.apache.spark.util.JsonProtocol$.taskEndFromJson(JsonProtocol.scala:531) > at > org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:488) > at > org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) > at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) > at > 
org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745)
[jira] [Commented] (SPARK-10640) Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied
[ https://issues.apache.org/jira/browse/SPARK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791416#comment-14791416 ] Josh Rosen commented on SPARK-10640: This is a 1.5.0 history server reading 1.5.0 logs? In principle we also have this bug when trying to read 1.5.x logs with a 1.4.x history server. I'm going to mark this as a 1.5.1 critical bug to make sure it gets fixed there. This probably affects 1.3.x and 1.4.x, too, so I'm going to update the affected versions. > Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied > -- > > Key: SPARK-10640 > URL: https://issues.apache.org/jira/browse/SPARK-10640 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.3.0, 1.4.0, 1.5.0 >Reporter: Thomas Graves >Assignee: Thomas Graves > > I'm seeing an exception from the spark history server trying to read a > history file: > scala.MatchError: TaskCommitDenied (of class java.lang.String) > at > org.apache.spark.util.JsonProtocol$.taskEndReasonFromJson(JsonProtocol.scala:775) > at > org.apache.spark.util.JsonProtocol$.taskEndFromJson(JsonProtocol.scala:531) > at > org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:488) > at > org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251) > at > scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59) > at 
scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47) > at > scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251) > at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105) > at > org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289) > at > org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615) > at java.lang.Thread.run(Thread.java:745)
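The scala.MatchError indicates that the reader's pattern match has no case for the "TaskCommitDenied" reason string. A Python sketch of the same kind of dispatch, assuming each task end reason is a JSON object whose "Reason" field names its type. The set of known reasons is an illustrative subset, and falling back to an "UnknownReason" result instead of failing is this sketch's own choice, not a description of the eventual fix.

```python
import json
from typing import Any, Callable, Dict, Tuple

Parsed = Tuple[str, Dict[str, Any]]


def _keep_fields(name: str) -> Callable[[Dict[str, Any]], Parsed]:
    # Return the reason name plus every field except the "Reason" discriminator.
    return lambda obj: (name, {k: v for k, v in obj.items() if k != "Reason"})


# An illustrative subset of reasons this reader understands.
KNOWN_REASONS: Dict[str, Callable[[Dict[str, Any]], Parsed]] = {
    name: _keep_fields(name)
    for name in ("Success", "Resubmitted", "TaskKilled", "TaskCommitDenied")
}


def task_end_reason_from_json(line: str) -> Parsed:
    obj = json.loads(line)
    reason = obj.get("Reason")
    parser = KNOWN_REASONS.get(reason)
    if parser is None:
        # A reader with no case for this reason would fail here (the analogue of
        # the scala.MatchError above); this sketch degrades instead.
        return ("UnknownReason", {"original": reason})
    return parser(obj)
```

A tolerant fallback also helps the cross-version case raised in the comment, where an older history server reads logs written by a newer release.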
[jira] [Created] (SPARK-10655) Enhance DB2 dialect to handle XML, and DECIMAL , and DECFLOAT
Suresh Thalamati created SPARK-10655: Summary: Enhance DB2 dialect to handle XML, and DECIMAL , and DECFLOAT Key: SPARK-10655 URL: https://issues.apache.org/jira/browse/SPARK-10655 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 1.5.0 Reporter: Suresh Thalamati The default type mapping does not work when reading from a DB2 table that contains XML or DECFLOAT columns, or when writing DECIMAL columns.
[jira] [Commented] (SPARK-10655) Enhance DB2 dialect to handle XML, and DECIMAL , and DECFLOAT
[ https://issues.apache.org/jira/browse/SPARK-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791440#comment-14791440 ] Suresh Thalamati commented on SPARK-10655: -- I am working on a pull request for this issue. > Enhance DB2 dialect to handle XML, and DECIMAL , and DECFLOAT > - > > Key: SPARK-10655 > URL: https://issues.apache.org/jira/browse/SPARK-10655 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 1.5.0 >Reporter: Suresh Thalamati > > The default type mapping does not work when reading from a DB2 table that contains > XML or DECFLOAT columns, or when writing DECIMAL columns.
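In Spark, a dialect customizes these mappings by overriding JdbcDialect's getCatalystType (read side) and getJDBCType (write side). A Python sketch of the shape of such an override, in which returning None means "fall back to the default mapping". The concrete target types chosen here for XML, DECFLOAT and DECIMAL are assumptions for illustration, not what the pull request uses.

```python
import re
from typing import Dict, Optional

# Assumed read-side overrides: DB2 column type name -> Spark SQL type.
DB2_READ_OVERRIDES: Dict[str, str] = {
    "XML": "string",
    "DECFLOAT": "decimal(38,18)",
}

_DECIMAL = re.compile(r"decimal\((\d+),\s*(\d+)\)")


def db2_read_type(db2_type_name: str) -> Optional[str]:
    """Spark SQL type for a DB2 column, or None to use the default mapping."""
    return DB2_READ_OVERRIDES.get(db2_type_name.strip().upper())


def db2_write_type(spark_type: str) -> Optional[str]:
    """DB2 column definition for a Spark SQL type, or None to use the default mapping."""
    match = _DECIMAL.fullmatch(spark_type.strip().lower())
    if match:
        precision, scale = int(match.group(1)), int(match.group(2))
        return f"DECIMAL({precision},{scale})"
    return None
```

Keeping the overrides small and deferring everything else to the default mapping mirrors how a dialect only needs to handle the types the generic mapping gets wrong.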
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791439#comment-14791439 ] Saisai Shao commented on SPARK-10644: - Which cluster manager are you using: standalone, Mesos or YARN? As far as I know, there shouldn't be such a problem if resources are sufficient. > Applications wait even if free executors are available > -- > > Key: SPARK-10644 > URL: https://issues.apache.org/jira/browse/SPARK-10644 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.5.0 > Environment: RHEL 6.5 64 bit >Reporter: Balagopal Nair > > Number of workers: 21 > Number of executors: 63 > Steps to reproduce: > 1. Run 4 jobs each with max cores set to 10 > 2. The first 3 jobs run with 10 each. (30 executors consumed so far) > 3. The 4th job waits even though there are 33 idle executors. > The reason is that a job will not get executors unless > the total number of EXECUTORS in use < the number of WORKERS > If there are executors available, resources should be allocated to the > pending job.
[jira] [Updated] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat
[ https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10058: -- Priority: Blocker (was: Critical) > Flaky test: HeartbeatReceiverSuite: normal heartbeat > > > Key: SPARK-10058 > URL: https://issues.apache.org/jira/browse/SPARK-10058 > Project: Spark > Issue Type: Test > Components: Spark Core >Reporter: Davies Liu >Assignee: Andrew Or >Priority: Blocker > Labels: flaky-test > > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/ > {code} > Error Message > 3 did not equal 2 > Stacktrace > sbt.ForkMain$ForkError: 3 did not equal 2 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) > at > org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at > 
org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) > at sbt.ForkMain$Run$2.call(ForkMain.java:294) > at sbt.ForkMain$Run$2.call(ForkMain.java:284) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at
[jira] [Assigned] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10650: Assignee: Apache Spark (was: Michael Armbrust) > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Apache Spark >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs, including a bunch > of test classes. We need to figure out which commit introduced them and fix > it. Obvious suspects such as the genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after]
[jira] [Commented] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat
[ https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791286#comment-14791286 ] Xiangrui Meng commented on SPARK-10058: --- Changed the priority to Blocker since this failed master builds frequently. > Flaky test: HeartbeatReceiverSuite: normal heartbeat > > > Key: SPARK-10058 > URL: https://issues.apache.org/jira/browse/SPARK-10058 > Project: Spark > Issue Type: Test > Components: Spark Core >Reporter: Davies Liu >Assignee: Andrew Or >Priority: Blocker > Labels: flaky-test > > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/ > {code} > Error Message > 3 did not equal 2 > Stacktrace > sbt.ForkMain$ForkError: 3 did not equal 2 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) > at > org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > 
org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at > org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) > at sbt.ForkMain$Run$2.call(ForkMain.java:294) > at sbt.ForkMain$Run$2.call(ForkMain.java:284) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at >
[jira] [Assigned] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10650: Assignee: Michael Armbrust (was: Apache Spark) > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Michael Armbrust >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs, including a bunch > of test classes. We need to figure out which commit introduced them and fix > it. Obvious suspects such as the genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after]
[jira] [Commented] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791285#comment-14791285 ] Apache Spark commented on SPARK-10650: -- User 'marmbrus' has created a pull request for this issue: https://github.com/apache/spark/pull/8787 > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Michael Armbrust >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs, including a bunch > of test classes. We need to figure out which commit introduced them and fix > it. Obvious suspects such as the genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after]
[jira] [Updated] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat
[ https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-10058: -- Issue Type: Bug (was: Test) > Flaky test: HeartbeatReceiverSuite: normal heartbeat > > > Key: SPARK-10058 > URL: https://issues.apache.org/jira/browse/SPARK-10058 > Project: Spark > Issue Type: Bug > Components: Spark Core, Tests >Reporter: Davies Liu >Assignee: Andrew Or >Priority: Blocker > Labels: flaky-test > > https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/ > {code} > Error Message > 3 did not equal 2 > Stacktrace > sbt.ForkMain$ForkError: 3 did not equal 2 > at > org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500) > at > org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555) > at > org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97) > at > org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22) > at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85) > at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > at org.scalatest.Transformer.apply(Transformer.scala:22) > at org.scalatest.Transformer.apply(Transformer.scala:20) > at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166) > at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42) > at > org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163) > at > org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at > 
org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175) > at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255) > at > org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413) > at > org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401) > at scala.collection.immutable.List.foreach(List.scala:318) > at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401) > at > org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396) > at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483) > at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208) > at org.scalatest.FunSuite.runTests(FunSuite.scala:1555) > at org.scalatest.Suite$class.run(Suite.scala:1424) > at > org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at > org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212) > at org.scalatest.SuperEngine.runImpl(Engine.scala:545) > at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212) > at > org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257) > at > org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256) > at > 
org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671) > at sbt.ForkMain$Run$2.call(ForkMain.java:294) > at sbt.ForkMain$Run$2.call(ForkMain.java:284) > at java.util.concurrent.FutureTask.run(FutureTask.java:262) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145) > at
[jira] [Commented] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix
[ https://issues.apache.org/jira/browse/SPARK-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791425#comment-14791425 ] Apache Spark commented on SPARK-10654: -- User 'rezazadeh' has created a pull request for this issue: https://github.com/apache/spark/pull/8792 > Add columnSimilarities to IndexedRowMatrix > -- > > Key: SPARK-10654 > URL: https://issues.apache.org/jira/browse/SPARK-10654 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Reza Zadeh > > Add columnSimilarities to IndexedRowMatrix. > Adding rowSimilarities to IndexedRowMatrix is tracked separately in > SPARK-4823.
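MLlib's RowMatrix already exposes columnSimilarities() (exact cosine similarity, or DIMSUM sampling when a threshold is passed). Because row indices play no part in column-to-column similarity, one plausible approach is to drop the indices and delegate to RowMatrix. The brute-force, pure-Scala sketch below shows what the exact result contains: cosine similarity for each column pair with i < j. `IndexedRowSketch` is a hypothetical stand-in for MLlib's IndexedRow, and this is not the distributed implementation in PR 8792.

```scala
object ColumnSimilaritiesSketch {
  // Hypothetical stand-in for MLlib's IndexedRow: (row index, dense row values).
  final case class IndexedRowSketch(index: Long, values: Array[Double])

  /** Exact cosine similarity for every column pair (i < j). Row indices are ignored.
    * Pairs involving an all-zero column are omitted. */
  def columnSimilarities(rows: Seq[IndexedRowSketch]): Map[(Int, Int), Double] = {
    val numCols = rows.headOption.map(_.values.length).getOrElse(0)
    // L2 norm of each column.
    val norms = Array.tabulate(numCols) { j =>
      math.sqrt(rows.map(r => r.values(j) * r.values(j)).sum)
    }
    val pairs = for {
      i <- 0 until numCols
      j <- (i + 1) until numCols
      if norms(i) > 0 && norms(j) > 0
    } yield {
      val dot = rows.map(r => r.values(i) * r.values(j)).sum
      (i, j) -> dot / (norms(i) * norms(j))
    }
    pairs.toMap
  }
}
```

Only the upper triangle is produced, since cosine similarity is symmetric and the diagonal is trivially 1.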
[jira] [Assigned] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix
[ https://issues.apache.org/jira/browse/SPARK-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10654: Assignee: Apache Spark > Add columnSimilarities to IndexedRowMatrix > -- > > Key: SPARK-10654 > URL: https://issues.apache.org/jira/browse/SPARK-10654 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Reza Zadeh >Assignee: Apache Spark > > Add columnSimilarities to IndexedRowMatrix. > Adding rowSimilarities to IndexedRowMatrix is tracked separately in > SPARK-4823.
[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available
[ https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791443#comment-14791443 ] Balagopal Nair commented on SPARK-10644: Standalone cluster manager. I've verified this behaviour again now. > Applications wait even if free executors are available > -- > > Key: SPARK-10644 > URL: https://issues.apache.org/jira/browse/SPARK-10644 > Project: Spark > Issue Type: Bug > Components: Scheduler >Affects Versions: 1.5.0 > Environment: RHEL 6.5 64 bit >Reporter: Balagopal Nair > > Number of workers: 21 > Number of executors: 63 > Steps to reproduce: > 1. Run 4 jobs, each with max cores set to 10. > 2. The first 3 jobs run with 10 each (30 executors consumed so far). > 3. The 4th job waits even though there are 33 idle executors. > The reason is that a job will not get executors unless > the total number of EXECUTORS in use is less than the number of WORKERS. > If there are executors available, resources should be allocated to the > pending job.
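The report's arithmetic can be checked with a small pure-Scala model. It encodes the gating condition exactly as the reporter states it (executors in use must be fewer than workers) next to the behaviour the reporter expects. It is not derived from Spark's Master scheduling code, and "max cores" is assumed to mean `spark.cores.max`.

```scala
object StandaloneAllocationSketch {
  // Cluster shape taken from the SPARK-10644 report.
  val Workers = 21
  val TotalExecutors = 63

  /** Gate as the reporter describes it; this models the report, not Spark's Master source. */
  def reportedGateAllows(executorsInUse: Int, workers: Int): Boolean =
    executorsInUse < workers

  /** Behaviour the reporter expects: keep allocating while any executor is idle. */
  def expectedGateAllows(executorsInUse: Int, totalExecutors: Int): Boolean =
    executorsInUse < totalExecutors

  def main(args: Array[String]): Unit = {
    // Three applications capped at 10 each; the report counts each core as one executor.
    val inUse = 3 * 10
    val idle = TotalExecutors - inUse
    println(s"idle=$idle reported=${reportedGateAllows(inUse, Workers)} " +
      s"expected=${expectedGateAllows(inUse, TotalExecutors)}")
  }
}
```

Running `main` prints `idle=33 reported=false expected=true`: under the reported gate the fourth application is blocked (30 is not less than 21) even though 33 executors are idle.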
[jira] [Updated] (SPARK-10647) Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be documented
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10647: --- Issue Type: Bug (was: Improvement) > Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be > documented > --- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Bug > Components: Mesos >Affects Versions: 1.4.1, 1.5.0 >Reporter: Alan Braithwaite >Assignee: Timothy Chen >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode. > Since it is also a Mesos-specific property, it makes sense for it to live under that > hierarchy as well.
[jira] [Updated] (SPARK-10647) Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be documented
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10647: --- Summary: Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be documented (was: Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir) > Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be > documented > --- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Alan Braithwaite >Assignee: Timothy Chen >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode. > Since it is also a Mesos-specific property, it makes sense for it to live under that > hierarchy as well.
[jira] [Updated] (SPARK-10647) Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10647: --- Component/s: Mesos > Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir > -- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Alan Braithwaite >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode. > Since it is also a Mesos-specific property, it makes sense for it to live under that > hierarchy as well.
[jira] [Commented] (SPARK-10647) Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791387#comment-14791387 ] Josh Rosen commented on SPARK-10647: The spark.deploy.zookeeper.* properties are used by the standalone mode's HA recovery features (https://spark.apache.org/docs/latest/spark-standalone.html#high-availability). I think the correct fix here is to update the Mesos code to use spark.deploy.mesos.zookeeper.dir (https://github.com/apache/spark/pull/5144/files#diff-3c5e5516915ada1d89f1259de069R97). We should also update the Mesos documentation to mention these configurations, since they don't appear to be documented anywhere. [~tnachen], I'm going to assign the doc updates and bugfixes to you. > Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir > -- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Alan Braithwaite >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode. > Since it is also a Mesos-specific property, it makes sense for it to live under that > hierarchy as well.
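The comment and the original issue title name different Mesos-scoped keys (`spark.deploy.mesos.zookeeper.dir` vs `spark.mesos.deploy.zookeeper.dir`). A minimal sketch of a backward-compatible lookup follows, using the title's name as an assumption: prefer the Mesos-scoped key, fall back to the shared standalone key, then to a default. A plain `Map` stands in for SparkConf, and the default directory is hypothetical.

```scala
object MesosZooKeeperDirSketch {
  // Key the Mesos HA code reads today, according to SPARK-10647 (shared with standalone HA).
  val SharedKey = "spark.deploy.zookeeper.dir"
  // Mesos-scoped key from the issue's original title; the exact name is an assumption.
  val MesosKey = "spark.mesos.deploy.zookeeper.dir"
  // Hypothetical default, used only for this sketch.
  val DefaultDir = "/spark_mesos_dispatcher"

  /** Prefer the Mesos-scoped key, then the legacy shared key, then the default. */
  def zooKeeperDir(conf: Map[String, String]): String =
    conf.get(MesosKey).orElse(conf.get(SharedKey)).getOrElse(DefaultDir)
}
```

The fallback keeps existing Mesos deployments that set only the shared key working. Dropping it would make the separation from standalone HA strict, at the cost of a breaking change.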
[jira] [Updated] (SPARK-10647) Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir
[ https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-10647: --- Assignee: Timothy Chen > Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir > -- > > Key: SPARK-10647 > URL: https://issues.apache.org/jira/browse/SPARK-10647 > Project: Spark > Issue Type: Improvement > Components: Mesos >Reporter: Alan Braithwaite >Assignee: Timothy Chen >Priority: Minor > > This property doesn't match up with the other properties surrounding it, > namely: > spark.mesos.deploy.zookeeper.url > and > spark.mesos.deploy.recoveryMode. > Since it is also a Mesos-specific property, it makes sense for it to live under that > hierarchy as well.
[jira] [Commented] (SPARK-10625) Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties
[ https://issues.apache.org/jira/browse/SPARK-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791420#comment-14791420 ] Apache Spark commented on SPARK-10625: -- User 'tribbloid' has created a pull request for this issue: https://github.com/apache/spark/pull/8785 > Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds > unserializable objects into connection properties > -- > > Key: SPARK-10625 > URL: https://issues.apache.org/jira/browse/SPARK-10625 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.5.0 > Environment: Ubuntu 14.04 >Reporter: Peng Cheng > Labels: jdbc, spark, sparksql > > Some JDBC drivers (e.g. SAP HANA) try to optimize connection pooling by > adding new objects into the connection properties, which Spark then reuses and > ships to the workers. When any of these new objects is not > serializable, this triggers an org.apache.spark.SparkException: Task not > serializable. The following test code snippet demonstrates this problem by > using a modified H2 driver: > test("INSERT to JDBC Datasource with UnserializableH2Driver") { > object UnserializableH2Driver extends org.h2.Driver { > override def connect(url: String, info: Properties): Connection = { > val result = super.connect(url, info) > info.put("unserializableDriver", this) > result > } > override def getParentLogger: Logger = ???
> } > import scala.collection.JavaConversions._ > val oldDrivers = > DriverManager.getDrivers.filter(_.acceptsURL("jdbc:h2:")).toSeq > oldDrivers.foreach{ > DriverManager.deregisterDriver > } > DriverManager.registerDriver(UnserializableH2Driver) > sql("INSERT INTO TABLE PEOPLE1 SELECT * FROM PEOPLE") > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", properties).count) > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", > properties).collect()(0).length) > DriverManager.deregisterDriver(UnserializableH2Driver) > oldDrivers.foreach{ > DriverManager.registerDriver > } > }
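The failure described in SPARK-10625 comes down to Java serialization of a shared `java.util.Properties` object that the driver has mutated. The pure-JDK sketch below reproduces that in isolation and shows one possible mitigation: hand the driver a defensive copy so its additions never reach the object Spark ships to workers. `PoolHandle` and `mutatingConnect` are hypothetical stand-ins, and this is not necessarily what PR 8785 does.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}
import java.util.Properties

object PropertiesSerializationSketch {
  // Hypothetical driver-internal object; Scala classes are not java.io.Serializable by default.
  final class PoolHandle

  // Simulates a driver whose connect() stores an unserializable object in the caller's Properties.
  def mutatingConnect(info: Properties): Unit = {
    info.put("unserializableDriver", new PoolHandle)
  }

  // True if Java serialization succeeds (roughly what shipping a task closure requires).
  def isSerializable(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }

  // Defensive copy handed to the driver, so its mutations do not leak back to the shared object.
  def copyOf(props: Properties): Properties = {
    val copy = new Properties()
    copy.putAll(props)
    copy
  }
}
```

Note that `putAll` copies only the entries stored directly in the object, not any `Properties` defaults chain.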
[jira] [Assigned] (SPARK-10625) Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties
[ https://issues.apache.org/jira/browse/SPARK-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10625: Assignee: (was: Apache Spark) > Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds > unserializable objects into connection properties > -- > > Key: SPARK-10625 > URL: https://issues.apache.org/jira/browse/SPARK-10625 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.5.0 > Environment: Ubuntu 14.04 >Reporter: Peng Cheng > Labels: jdbc, spark, sparksql > > Some JDBC drivers (e.g. SAP HANA) try to optimize connection pooling by > adding new objects into the connection properties, which Spark then reuses and > ships to the workers. When any of these new objects is not > serializable, this triggers an org.apache.spark.SparkException: Task not > serializable. The following test code snippet demonstrates this problem by > using a modified H2 driver: > test("INSERT to JDBC Datasource with UnserializableH2Driver") { > object UnserializableH2Driver extends org.h2.Driver { > override def connect(url: String, info: Properties): Connection = { > val result = super.connect(url, info) > info.put("unserializableDriver", this) > result > } > override def getParentLogger: Logger = ???
> } > import scala.collection.JavaConversions._ > val oldDrivers = > DriverManager.getDrivers.filter(_.acceptsURL("jdbc:h2:")).toSeq > oldDrivers.foreach{ > DriverManager.deregisterDriver > } > DriverManager.registerDriver(UnserializableH2Driver) > sql("INSERT INTO TABLE PEOPLE1 SELECT * FROM PEOPLE") > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", properties).count) > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", > properties).collect()(0).length) > DriverManager.deregisterDriver(UnserializableH2Driver) > oldDrivers.foreach{ > DriverManager.registerDriver > } > }
[jira] [Created] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix
Reza Zadeh created SPARK-10654: -- Summary: Add columnSimilarities to IndexedRowMatrix Key: SPARK-10654 URL: https://issues.apache.org/jira/browse/SPARK-10654 Project: Spark Issue Type: Improvement Components: MLlib Reporter: Reza Zadeh Add columnSimilarities to IndexedRowMatrix. Adding rowSimilarities to IndexedRowMatrix is tracked separately in SPARK-4823.
[jira] [Assigned] (SPARK-10625) Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties
[ https://issues.apache.org/jira/browse/SPARK-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10625: Assignee: Apache Spark > Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds > unserializable objects into connection properties > -- > > Key: SPARK-10625 > URL: https://issues.apache.org/jira/browse/SPARK-10625 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 1.4.1, 1.5.0 > Environment: Ubuntu 14.04 >Reporter: Peng Cheng >Assignee: Apache Spark > Labels: jdbc, spark, sparksql > > Some JDBC drivers (e.g. SAP HANA) try to optimize connection pooling by > adding new objects into the connection properties, which Spark then reuses and > ships to the workers. When any of these new objects is not > serializable, this triggers an org.apache.spark.SparkException: Task not > serializable. The following test code snippet demonstrates this problem by > using a modified H2 driver: > test("INSERT to JDBC Datasource with UnserializableH2Driver") { > object UnserializableH2Driver extends org.h2.Driver { > override def connect(url: String, info: Properties): Connection = { > val result = super.connect(url, info) > info.put("unserializableDriver", this) > result > } > override def getParentLogger: Logger = ???
> } > import scala.collection.JavaConversions._ > val oldDrivers = > DriverManager.getDrivers.filter(_.acceptsURL("jdbc:h2:")).toSeq > oldDrivers.foreach{ > DriverManager.deregisterDriver > } > DriverManager.registerDriver(UnserializableH2Driver) > sql("INSERT INTO TABLE PEOPLE1 SELECT * FROM PEOPLE") > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", properties).count) > assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", > properties).collect()(0).length) > DriverManager.deregisterDriver(UnserializableH2Driver) > oldDrivers.foreach{ > DriverManager.registerDriver > } > }
[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes
[ https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-10650: -- Target Version/s: 1.6.0, 1.5.1 (was: 1.5.1) > Spark docs include test and other extra classes > --- > > Key: SPARK-10650 > URL: https://issues.apache.org/jira/browse/SPARK-10650 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 1.5.0 >Reporter: Patrick Wendell >Assignee: Michael Armbrust >Priority: Critical > > In 1.5.0 there are some extra classes in the Spark docs, including a bunch > of test classes. We need to figure out which commit introduced them and fix > it. Obvious suspects such as the genJavadoc version have not changed. > http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ > [before] > http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ > [after]
[jira] [Created] (SPARK-10652) Set good job descriptions for streaming related jobs
Tathagata Das created SPARK-10652: - Summary: Set good job descriptions for streaming related jobs Key: SPARK-10652 URL: https://issues.apache.org/jira/browse/SPARK-10652 Project: Spark Issue Type: Improvement Components: Streaming Affects Versions: 1.5.0, 1.4.1 Reporter: Tathagata Das Assignee: Tathagata Das Job descriptions will help distinguish the jobs of one batch from those of other batches on the Jobs and Stages pages of the Spark UI.
[jira] [Assigned] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix
[ https://issues.apache.org/jira/browse/SPARK-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10654: Assignee: (was: Apache Spark) > Add columnSimilarities to IndexedRowMatrix > -- > > Key: SPARK-10654 > URL: https://issues.apache.org/jira/browse/SPARK-10654 > Project: Spark > Issue Type: Improvement > Components: MLlib >Reporter: Reza Zadeh > > Add columnSimilarities to IndexedRowMatrix. > Adding rowSimilarities to IndexedRowMatrix is tracked separately in > SPARK-4823.
[jira] [Created] (SPARK-10656) select(df(*)) fails when a column has special characters
Nick Pritchard created SPARK-10656: -- Summary: select(df(*)) fails when a column has special characters Key: SPARK-10656 URL: https://issues.apache.org/jira/browse/SPARK-10656 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 1.5.0 Reporter: Nick Pritchard Best explained with this example:
{code}
val df = sqlContext.read.json(sqlContext.sparkContext.makeRDD(
  """{"a.b": "c", "d": "e" }""" :: Nil))
df.select("*").show() // successful
df.select(df("*")).show() // throws exception
df.withColumnRenamed("d", "f").show() // also fails, possibly related
{code}
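A likely reason `df("*")` and `withColumnRenamed` trip over `a.b` is that an unquoted dot in a column reference is read as nested-field access (column `a`, then field `b`), while backticks keep the dot literal. The simplified, hypothetical parser below models only that splitting rule. It is not Spark's resolver, and whether backtick quoting avoids the failure in 1.5.0 is an assumption to verify.

```scala
object AttributeNameSketch {
  /** Splits a column reference on dots, except inside backtick-quoted segments. */
  def parseAttributeName(name: String): List[String] = {
    val parts = scala.collection.mutable.ListBuffer.empty[String]
    val current = new StringBuilder
    var inBackticks = false
    var i = 0
    while (i < name.length) {
      val c = name.charAt(i)
      if (inBackticks) {
        if (c == '`') {
          // A doubled backtick inside quotes stands for one literal backtick.
          if (i + 1 < name.length && name.charAt(i + 1) == '`') { current.append('`'); i += 1 }
          else inBackticks = false
        } else current.append(c)
      } else if (c == '`') {
        inBackticks = true
      } else if (c == '.') {
        // Unquoted dot: ends one name part (e.g. a struct column, then its field).
        parts += current.toString
        current.clear()
      } else current.append(c)
      i += 1
    }
    require(!inBackticks, s"unclosed backtick in: $name")
    parts += current.toString
    parts.toList
  }
}
```

Under this rule `a.b` becomes `List("a", "b")`, a lookup of field `b` inside a column `a` that does not exist, whereas `` `a.b` `` becomes `List("a.b")`, the actual top-level column.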
[jira] [Commented] (SPARK-10657) Remove legacy SCP-based Jenkins log archiving code
[ https://issues.apache.org/jira/browse/SPARK-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791549#comment-14791549 ] Apache Spark commented on SPARK-10657: -- User 'JoshRosen' has created a pull request for this issue: https://github.com/apache/spark/pull/8793 > Remove legacy SCP-based Jenkins log archiving code > -- > > Key: SPARK-10657 > URL: https://issues.apache.org/jira/browse/SPARK-10657 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Reporter: Josh Rosen >Assignee: Josh Rosen > > As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to > use our custom SSH-based mechanism for archiving Jenkins logs on the master > machine; this has been superseded by the use of a Jenkins plugin which > archives the logs and provides public viewing of them. > We should remove the legacy log syncing code, since this is a blocker to > disabling Worker -> Master SSH on Jenkins.
[jira] [Assigned] (SPARK-10657) Remove legacy SCP-based Jenkins log archiving code
[ https://issues.apache.org/jira/browse/SPARK-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-10657: Assignee: Apache Spark (was: Josh Rosen) > Remove legacy SCP-based Jenkins log archiving code > -- > > Key: SPARK-10657 > URL: https://issues.apache.org/jira/browse/SPARK-10657 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Reporter: Josh Rosen >Assignee: Apache Spark > > As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to > use our custom SCP-based mechanism for archiving Jenkins logs on the master > machine; this has been superseded by the use of a Jenkins plugin which > archives the logs and provides public viewing of them. > We should remove the legacy log syncing code, since this is a blocker to > disabling Worker -> Master SSH on Jenkins.