date:20150916

[jira] [Updated] (SPARK-10646) Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. categorical

2015-09-16 Thread Jihong MA (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated SPARK-10646:
--
Issue Type: Sub-task  (was: New Feature)
Parent: SPARK-10385

> Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. 
> categorical
> 
>
> Key: SPARK-10646
> URL: https://issues.apache.org/jira/browse/SPARK-10646
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Jihong MA
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org

[jira] [Created] (SPARK-10646) Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. categorical

2015-09-16 Thread Jihong MA (JIRA)

Jihong MA created SPARK-10646:
-

 Summary: Bivariate Statistics: Pearson's Chi-Squared Test for 
categorical vs. categorical
 Key: SPARK-10646
 URL: https://issues.apache.org/jira/browse/SPARK-10646
 Project: Spark
  Issue Type: New Feature
Reporter: Jihong MA







[jira] [Commented] (SPARK-10320) Kafka Support new topic subscriptions without requiring restart of the streaming context

2015-09-16 Thread Cody Koeninger (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14790843#comment-14790843
 ] 

Cody Koeninger commented on SPARK-10320:


I don't think there's much benefit to multiple dstreams with the direct api, 
because it's straightforward to filter or match on the topic on a per-partition 
basis.  I'm not sure that adding entirely new dstreams after the streaming 
context has been started makes sense.
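
A minimal sketch of the per-partition matching described above, written in plain Python so it runs without a cluster. It assumes, as with the direct API, that partition i of a batch corresponds to the i-th offset range. `OffsetRange` here is a hypothetical stand-in tuple rather than Spark's class, and `route_by_topic` and the sample topics are illustrative names only.

```python
from collections import namedtuple

# Hypothetical stand-in for the offset-range metadata of one Kafka partition.
OffsetRange = namedtuple(
    "OffsetRange", ["topic", "partition", "from_offset", "until_offset"]
)


def route_by_topic(offset_ranges, partitions, handlers):
    """Apply a per-topic handler to each partition's records.

    Assumes partitions[i] holds the records described by offset_ranges[i].
    Partitions whose topic has no handler are skipped entirely.
    """
    results = {}
    for rng, records in zip(offset_ranges, partitions):
        handler = handlers.get(rng.topic)
        if handler is None:
            continue  # topic not of interest: drop the whole partition
        results.setdefault(rng.topic, []).extend(handler(r) for r in records)
    return results


# Illustrative data: three partitions spread over two topics.
offset_ranges = [
    OffsetRange("clicks", 0, 0, 2),
    OffsetRange("orders", 0, 0, 1),
    OffsetRange("clicks", 1, 0, 1),
]
partitions = [["a", "b"], ["o1"], ["c"]]
handlers = {"clicks": str.upper}  # only "clicks" is handled

routed = route_by_topic(offset_ranges, partitions, handlers)
```

Because the topic is known per partition, the decision to keep or drop is made once per partition rather than once per record.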

As far as defaults go... I don't see a clearly reasonable default like 
messageHandler has.  Maybe an example implementation of a function that 
maintains just a list of topic names and handles the offset lookups.

The other thing is, in order to get much use out of this, the api for 
communicating with the kafka cluster would need to be made public, and there 
had been some reluctance on that point previously.

[~tdas] Any thoughts on making the KafkaCluster api public?

> Kafka Support new topic subscriptions without requiring restart of the 
> streaming context
> 
>
> Key: SPARK-10320
> URL: https://issues.apache.org/jira/browse/SPARK-10320
> Project: Spark
>  Issue Type: New Feature
>  Components: Streaming
>Reporter: Sudarshan Kadambi
>
> Spark Streaming lacks the ability to subscribe to new topics or unsubscribe 
> from current ones once the streaming context has been started. Restarting the 
> streaming context increases the latency of update handling.
> Consider a streaming application subscribed to n topics. Let's say 1 of the 
> topics is no longer needed in streaming analytics and hence should be 
> dropped. We could do this by stopping the streaming context, removing that 
> topic from the topic list and restarting the streaming context. Since with 
> some DStreams such as DirectKafkaStream, the per-partition offsets are 
> maintained by Spark, we should be able to resume uninterrupted (I think?) 
> from where we left off with a minor delay. However, in instances where 
> expensive state initialization (from an external datastore) may be needed for 
> datasets published to all topics, before streaming updates can be applied to 
> it, it is more convenient to only subscribe or unsubscribe to the incremental 
> changes to the topic list. Without such a feature, updates go unprocessed for 
> longer than they need to be, thus affecting QoS.




[jira] [Resolved] (SPARK-1304) Job fails with spot instances (due to IllegalStateException: Shutdown in progress)

2015-09-16 Thread Josh Rosen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-1304?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-1304.
---
Resolution: Won't Fix

> Job fails with spot instances (due to IllegalStateException: Shutdown in 
> progress)
> --
>
> Key: SPARK-1304
> URL: https://issues.apache.org/jira/browse/SPARK-1304
> Project: Spark
>  Issue Type: Bug
>  Components: EC2
>Affects Versions: 0.9.0
>Reporter: Alex Boisvert
>Priority: Minor
>
> We had a job running smoothly with spot instances until one of the spot 
> instances got terminated ... which led to a series of "IllegalStateException: 
> Shutdown in progress" and the job failed afterwards.
> 14/03/24 06:07:52 WARN scheduler.TaskSetManager: Loss was due to 
> java.lang.IllegalStateException
> java.lang.IllegalStateException: Shutdown in progress
>   at 
> java.lang.ApplicationShutdownHooks.add(ApplicationShutdownHooks.java:66)
>   at java.lang.Runtime.addShutdownHook(Runtime.java:211)
>   at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1441)
>   at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:256)
>   at org.apache.hadoop.fs.Path.getFileSystem(Path.java:187)
>   at 
> org.apache.hadoop.mapred.LineRecordReader.<init>(LineRecordReader.java:77)
>   at 
> org.apache.hadoop.mapred.TextInputFormat.getRecordReader(TextInputFormat.java:51)
>   at org.apache.spark.rdd.HadoopRDD$$anon$1.<init>(HadoopRDD.scala:156)
>   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:149)
>   at org.apache.spark.rdd.HadoopRDD.compute(HadoopRDD.scala:64)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>   at 
> org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:90)
>   at 
> org.apache.spark.rdd.CoalescedRDD$$anonfun$compute$1.apply(CoalescedRDD.scala:89)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at org.apache.spark.Aggregator.combineValuesByKey(Aggregator.scala:57)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:95)
>   at 
> org.apache.spark.rdd.PairRDDFunctions$$anonfun$1.apply(PairRDDFunctions.scala:94)
>   at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471)
>   at org.apache.spark.rdd.RDD$$anonfun$3.apply(RDD.scala:471)
>   at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:34)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:241)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:232)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:161)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:102)
>   at org.apache.spark.scheduler.Task.run(Task.scala:53)
>   at 
> org.apache.spark.executor.Executor$TaskRunner$$anonfun$run$1.apply$mcV$sp(Executor.scala:213)
>   at 
> org.apache.spark.deploy.SparkHadoopUtil.runAsUser(SparkHadoopUtil.scala:49)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:178)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:724)




[jira] [Resolved] (SPARK-3603) InvalidClassException on a Linux VM - probably problem with serialization

2015-09-16 Thread Josh Rosen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-3603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-3603.
---
Resolution: Cannot Reproduce

Resolving as "Cannot Reproduce", since this is an old issue that hasn't 
received any updates since 1.1.0. Please re-open and update if this is still a 
problem.

> InvalidClassException on a Linux VM - probably problem with serialization
> -
>
> Key: SPARK-3603
> URL: https://issues.apache.org/jira/browse/SPARK-3603
> Project: Spark
>  Issue Type: Bug
>  Components: Deploy
>Affects Versions: 1.0.0, 1.1.0
> Environment: Linux version 2.6.32-358.32.3.el6.x86_64 
> (mockbu...@x86-029.build.eng.bos.redhat.com) (gcc version 4.4.7 20120313 (Red 
> Hat 4.4.7-3) (GCC) ) #1 SMP Fri Jan 17 08:42:31 EST 2014
> java version "1.7.0_25"
> OpenJDK Runtime Environment (rhel-2.3.10.4.el6_4-x86_64)
> OpenJDK 64-Bit Server VM (build 23.7-b01, mixed mode)
> Spark (either 1.0.0 or 1.1.0)
>Reporter: Tomasz Dudziak
>Priority: Critical
>  Labels: scala, serialization, spark
>
> I have a Scala app connecting to a standalone Spark cluster. It works fine on 
> Windows or on a Linux VM; however, when I try to run the app and the Spark 
> cluster on another Linux VM (the same Linux kernel, Java and Spark - tested 
> for versions 1.0.0 and 1.1.0) I get the below exception. This looks kind of 
> similar to the Big-Endian (IBM Power7) Spark Serialization issue 
> (SPARK-2018), but... my system is definitely little endian and I understand 
> the big endian issue should be already fixed in Spark 1.1.0 anyway. I'd 
> appreciate your help.
> 01:34:53.251 WARN  [Result resolver thread-0][TaskSetManager] Lost TID 2 
> (task 1.0:2)
> 01:34:53.278 WARN  [Result resolver thread-0][TaskSetManager] Loss was due to 
> java.io.InvalidClassException
> java.io.InvalidClassException: scala.reflect.ClassTag$$anon$1; local class 
> incompatible: stream classdesc serialVersionUID = -4937928798201944954, local 
> class serialVersionUID = -8102093212602380348
> at java.io.ObjectStreamClass.initNonProxy(ObjectStreamClass.java:617)
> at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1620)
> at 
> java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1515)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1769)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at scala.collection.immutable.$colon$colon.readObject(List.scala:362)
> at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
> at java.lang.reflect.Method.invoke(Method.java:606)
> at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1891)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1989)
> at 
> java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1913)
> at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1796)
> at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1348)
> at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
> at

[jira] [Resolved] (SPARK-3949) Use IAMRole in lieu of static access key-id/secret

2015-09-16 Thread Josh Rosen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-3949?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-3949.
---
Resolution: Incomplete

As of SPARK-8576, spark-ec2 can launch instances with IAM instance roles. I'm 
going to close this issue as "Incomplete" since it's underspecified; it would 
be helpful to know more specifically where you think we need support for IAM 
roles instead of keys (e.g. while launching the cluster? for configuring access 
to S3?).

> Use IAMRole in lieu of static access key-id/secret
> --
>
> Key: SPARK-3949
> URL: https://issues.apache.org/jira/browse/SPARK-3949
> Project: Spark
>  Issue Type: Improvement
>  Components: EC2
>Affects Versions: 1.1.0
>Reporter: Rangarajan Sreenivasan
>
> Spark currently supports AWS resource access through user-specific 
> key-id/secret. While this works, the AWS recommended way is to use IAM Roles 
> instead of specific key-id/secrets. 
> http://docs.aws.amazon.com/IAM/latest/UserGuide/IAMBestPractices.html#use-roles-with-ec2
> http://docs.aws.amazon.com/IAM/latest/UserGuide/IAM_Introduction.html




[jira] [Updated] (SPARK-10602) Univariate statistics as UDAFs: single-pass continuous stats

2015-09-16 Thread Joseph K. Bradley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10602?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-10602:
--
Assignee: Seth Hendrickson

> Univariate statistics as UDAFs: single-pass continuous stats
> 
>
> Key: SPARK-10602
> URL: https://issues.apache.org/jira/browse/SPARK-10602
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, SQL
>Reporter: Joseph K. Bradley
>Assignee: Seth Hendrickson
>
> See parent JIRA for more details.  This subtask covers statistics for 
> continuous values requiring a single pass over the data, such as min and max.
> This JIRA is an umbrella.  For individual stats, please create and link a new 
> JIRA.
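
As a rough illustration of a single-pass statistic of the kind this subtask covers, the sketch below keeps a running min and max with an update step and a merge step, the general shape an aggregate function needs. The `MinMax` class and `summarize` helper are hypothetical names in plain Python, not Spark's UDAF API.

```python
class MinMax:
    """Running min and max over a stream of values, seen one at a time."""

    def __init__(self):
        self.count = 0
        self.minimum = None
        self.maximum = None

    def update(self, x):
        """Fold one value into the running state."""
        if self.count == 0:
            self.minimum = self.maximum = x
        else:
            self.minimum = min(self.minimum, x)
            self.maximum = max(self.maximum, x)
        self.count += 1
        return self

    def merge(self, other):
        """Combine with the state computed on another chunk of the data."""
        if other.count == 0:
            return self
        if self.count == 0:
            self.minimum, self.maximum = other.minimum, other.maximum
        else:
            self.minimum = min(self.minimum, other.minimum)
            self.maximum = max(self.maximum, other.maximum)
        self.count += other.count
        return self


def summarize(values):
    """Single pass over one chunk of values."""
    acc = MinMax()
    for v in values:
        acc.update(v)
    return acc


# Two chunks processed independently, then merged.
left = summarize([3.0, -1.5, 7.25])
right = summarize([10.0, 0.0])
total = left.merge(right)
```

Each value is visited exactly once, and the merge step is what lets separately processed chunks be combined without a second pass.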




[jira] [Commented] (SPARK-10642) Crash in rdd.lookup() with "java.lang.Long cannot be cast to java.lang.Integer"

2015-09-16 Thread Thouis Jones (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14790884#comment-14790884
 ] 

Thouis Jones commented on SPARK-10642:
--

Simpler cases.

Fails:
{code}
sc.parallelize([(('a', 'b'), 'c')]).groupByKey().lookup(('a', 'b'))
{code}

Works:
{code}
sc.parallelize([(('a', 'b'), 'c')]).groupByKey().map(lambda x: x).lookup(('a', 
'b'))
{code}
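
The traceback in the quoted report shows `lookup()` submitting a job for the single partition selected by `self.partitioner(key)`. The plain-Python sketch below illustrates that strategy only; it does not reproduce or explain the exception. The helper names are hypothetical, and Python's built-in `hash` stands in for Spark's partitioner.

```python
def partition_index(key, num_partitions):
    """Map a key to a plain int index in [0, num_partitions)."""
    return int(hash(key) % num_partitions)


def partition_by(pairs, num_partitions):
    """Distribute (key, value) pairs into hash partitions."""
    parts = [[] for _ in range(num_partitions)]
    for key, value in pairs:
        parts[partition_index(key, num_partitions)].append((key, value))
    return parts


def lookup(parts, key):
    """Scan only the one partition the key hashes to."""
    idx = partition_index(key, len(parts))
    return [v for k, v in parts[idx] if k == key]


parts = partition_by([(("a", "b"), "c"), (("x", "y"), "z")], 20)
found = lookup(parts, ("a", "b"))
```

With a known partitioner only one of the 20 partitions is scanned, which is why the partition index has to arrive on the JVM side as a valid integer.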

> Crash in rdd.lookup() with "java.lang.Long cannot be cast to 
> java.lang.Integer"
> ---
>
> Key: SPARK-10642
> URL: https://issues.apache.org/jira/browse/SPARK-10642
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.5.0
> Environment: OSX
>Reporter: Thouis Jones
>
> Running this command:
> {code}
> sc.parallelize([(('a', 'b'), 
> 'c')]).groupByKey().partitionBy(20).cache().lookup(('a', 'b'))
> {code}
> gives the following error:
> {noformat}
> 15/09/16 14:22:23 INFO SparkContext: Starting job: runJob at 
> PythonRDD.scala:361
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/usr/local/Cellar/apache-spark/1.5.0/libexec/python/pyspark/rdd.py", 
> line 2199, in lookup
> return self.ctx.runJob(values, lambda x: x, [self.partitioner(key)])
>   File 
> "/usr/local/Cellar/apache-spark/1.5.0/libexec/python/pyspark/context.py", 
> line 916, in runJob
> port = self._jvm.PythonRDD.runJob(self._jsc.sc(), mappedRDD._jrdd, 
> partitions)
>   File 
> "/usr/local/Cellar/apache-spark/1.5.0/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/java_gateway.py",
>  line 538, in __call__
>   File 
> "/usr/local/Cellar/apache-spark/1.5.0/libexec/python/pyspark/sql/utils.py", 
> line 36, in deco
> return f(*a, **kw)
>   File 
> "/usr/local/Cellar/apache-spark/1.5.0/libexec/python/lib/py4j-0.8.2.1-src.zip/py4j/protocol.py",
>  line 300, in get_return_value
> py4j.protocol.Py4JJavaError: An error occurred while calling 
> z:org.apache.spark.api.python.PythonRDD.runJob.
> : java.lang.ClassCastException: java.lang.Long cannot be cast to 
> java.lang.Integer
>   at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
>   at 
> org.apache.spark.scheduler.DAGScheduler$$anonfun$submitJob$1.apply(DAGScheduler.scala:530)
>   at scala.collection.Iterator$class.find(Iterator.scala:780)
>   at scala.collection.AbstractIterator.find(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.find(IterableLike.scala:79)
>   at scala.collection.AbstractIterable.find(Iterable.scala:54)
>   at 
> org.apache.spark.scheduler.DAGScheduler.submitJob(DAGScheduler.scala:530)
>   at 
> org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:558)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1813)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1826)
>   at org.apache.spark.SparkContext.runJob(SparkContext.scala:1839)
>   at org.apache.spark.api.python.PythonRDD$.runJob(PythonRDD.scala:361)
>   at org.apache.spark.api.python.PythonRDD.runJob(PythonRDD.scala)
>   at sun.reflect.GeneratedMethodAccessor49.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:231)
>   at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:379)
>   at py4j.Gateway.invoke(Gateway.java:259)
>   at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:133)
>   at py4j.commands.CallCommand.execute(CallCommand.java:79)
>   at py4j.GatewayConnection.run(GatewayConnection.java:207)
>   at java.lang.Thread.run(Thread.java:745)
> {noformat}




[jira] [Resolved] (SPARK-4389) Set akka.remote.netty.tcp.bind-hostname="0.0.0.0" so driver can be located behind NAT

2015-09-16 Thread Josh Rosen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-4389?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-4389.
---
Resolution: Won't Fix

I'm going to resolve this as "Won't Fix":

- Akka 2.4 has not shipped yet, so this feature isn't supported in any released 
version.
- Akka 2.4 requires Java 8 and Scala 2.11 or 2.12, meaning that we can't use it 
in Spark as long as we need to continue to support Java 7 and Scala 2.10. We'll 
replace Akka RPC with a custom RPC layer long before we'll be able to drop 
support for these platforms.

> Set akka.remote.netty.tcp.bind-hostname="0.0.0.0" so driver can be located 
> behind NAT
> -
>
> Key: SPARK-4389
> URL: https://issues.apache.org/jira/browse/SPARK-4389
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Josh Rosen
>Priority: Minor
>
> We should set {{akka.remote.netty.tcp.bind-hostname="0.0.0.0"}} in our Akka 
> configuration so that Spark drivers can be located behind NATs / work with 
> weird DNS setups.
> This is blocked by upgrading our Akka version, since this configuration is 
> not present in Akka 2.3.4.  There might be a different approach / workaround 
> that works on our current Akka version, though.
> EDIT: this is blocked by Akka 2.4, since this feature is only available in 
> the 2.4 snapshot release.




[jira] [Updated] (SPARK-10645) Bivariate Statistics for continuous vs. continuous

2015-09-16 Thread Jihong MA (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated SPARK-10645:
--
Issue Type: Sub-task  (was: New Feature)
Parent: SPARK-10385

> Bivariate Statistics for continuous vs. continuous
> --
>
> Key: SPARK-10645
> URL: https://issues.apache.org/jira/browse/SPARK-10645
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Jihong MA
>
> This is an umbrella JIRA, which covers Bivariate Statistics for continuous 
> vs. continuous columns, including covariance, Pearson's correlation, 
> Spearman's correlation (for both continuous & categorical).
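
For reference, the three statistics named here can be sketched in a few lines of plain Python. This is an illustration of the definitions under simple assumptions (sample covariance with an n - 1 denominator, average ranks for ties), not the proposed Spark implementation; all function names are hypothetical.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson(xs, ys):
    """Pearson's correlation: covariance scaled by both standard deviations."""
    return covariance(xs, ys) / math.sqrt(covariance(xs, xs) * covariance(ys, ys))


def ranks(xs):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with xs[order[i]].
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's correlation: Pearson's correlation of the ranks."""
    return pearson(ranks(xs), ranks(ys))


x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 4.0, 9.0, 16.0, 25.0]  # y = x^2: monotone but not linear
```

On this sample Spearman's correlation is 1 because the relationship is monotone, while Pearson's falls slightly below 1 because it is not linear.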




[jira] [Updated] (SPARK-10646) Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. categorical

2015-09-16 Thread Jihong MA (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated SPARK-10646:
--
Description: Pearson's chi-squared goodness of fit test for observed 
against the expected distribution.

> Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. 
> categorical
> 
>
> Key: SPARK-10646
> URL: https://issues.apache.org/jira/browse/SPARK-10646
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Jihong MA
>
> Pearson's chi-squared goodness of fit test for observed against the expected 
> distribution.
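
A small worked sketch of the goodness-of-fit statistic described above: the sum over categories of (observed - expected)^2 / expected, with degrees of freedom equal to the number of categories minus one. The function name and the die-roll counts are made up for illustration, and the p-value lookup is left out because it needs a chi-squared distribution function.

```python
def chi_squared_gof(observed, expected_probs):
    """Return (statistic, degrees_of_freedom) for a goodness-of-fit test.

    observed: counts per category.
    expected_probs: hypothesized probability of each category (sums to 1).
    """
    if len(observed) != len(expected_probs):
        raise ValueError("observed and expected_probs must have equal length")
    total = sum(observed)
    statistic = 0.0
    for obs, p in zip(observed, expected_probs):
        expected = total * p  # expected count under the hypothesis
        statistic += (obs - expected) ** 2 / expected
    return statistic, len(observed) - 1


# Illustrative data: 120 rolls of a die, tested against a fair die.
observed = [18, 22, 20, 25, 15, 20]
fair = [1.0 / 6] * 6
stat, dof = chi_squared_gof(observed, fair)
```

Here each expected count is 20, so the statistic is (4 + 4 + 0 + 25 + 25 + 0) / 20 = 2.9 with 5 degrees of freedom.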




[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes

2015-09-16 Thread Patrick Wendell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-10650:

Description: 
In 1.5.0 there are some extra classes in the Spark docs - including a bunch of 
test classes. We need to figure out what commit introduced those and fix it. 
The obvious things like genJavadoc version have not changed.

http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ [before]
http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ [after]


> Spark docs include test and other extra classes
> ---
>
> Key: SPARK-10650
> URL: https://issues.apache.org/jira/browse/SPARK-10650
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Patrick Wendell
>Assignee: Andrew Or
>
> In 1.5.0 there are some extra classes in the Spark docs - including a bunch 
> of test classes. We need to figure out what commit introduced those and fix 
> it. The obvious things like genJavadoc version have not changed.
> http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ 
> [before]
> http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ 
> [after]




[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes

2015-09-16 Thread Patrick Wendell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-10650:

Priority: Critical  (was: Major)

> Spark docs include test and other extra classes
> ---
>
> Key: SPARK-10650
> URL: https://issues.apache.org/jira/browse/SPARK-10650
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Patrick Wendell
>Assignee: Andrew Or
>Priority: Critical
>
> In 1.5.0 there are some extra classes in the Spark docs - including a bunch 
> of test classes. We need to figure out what commit introduced those and fix 
> it. The obvious things like genJavadoc version have not changed.
> http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ 
> [before]
> http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ 
> [after]




[jira] [Commented] (SPARK-10626) Create a Java friendly method for randomRDD & RandomDataGenerator on RandomRDDs.

2015-09-16 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791128#comment-14791128
 ] 

Apache Spark commented on SPARK-10626:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/8782

> Create a Java friendly method for randomRDD & RandomDataGenerator on 
> RandomRDDs.
> 
>
> Key: SPARK-10626
> URL: https://issues.apache.org/jira/browse/SPARK-10626
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: holdenk
>Priority: Minor
>
> SPARK-3136 added a large number of functions for creating Java RandomRDDs, 
> but for people that want to use custom RandomDataGenerators we should make a 
> Java friendly method.




[jira] [Assigned] (SPARK-10626) Create a Java friendly method for randomRDD & RandomDataGenerator on RandomRDDs.

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10626:


Assignee: (was: Apache Spark)

> Create a Java friendly method for randomRDD & RandomDataGenerator on 
> RandomRDDs.
> 
>
> Key: SPARK-10626
> URL: https://issues.apache.org/jira/browse/SPARK-10626
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: holdenk
>Priority: Minor
>
> SPARK-3136 added a large number of functions for creating Java RandomRDDs, 
> but for people that want to use custom RandomDataGenerators we should make a 
> Java friendly method.




[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes

2015-09-16 Thread Michael Armbrust (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust updated SPARK-10650:
-
Assignee: Michael Armbrust  (was: Andrew Or)

> Spark docs include test and other extra classes
> ---
>
> Key: SPARK-10650
> URL: https://issues.apache.org/jira/browse/SPARK-10650
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Patrick Wendell
>Assignee: Michael Armbrust
>Priority: Critical
>
> In 1.5.0 there are some extra classes in the Spark docs - including a bunch 
> of test classes. We need to figure out what commit introduced those and fix 
> it. The obvious things like genJavadoc version have not changed.
> http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ 
> [before]
> http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ 
> [after]




[jira] [Created] (SPARK-10647) Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir

2015-09-16 Thread Alan Braithwaite (JIRA)

Alan Braithwaite created SPARK-10647:


 Summary: Rename property spark.deploy.zookeeper.dir to 
spark.mesos.deploy.zookeeper.dir
 Key: SPARK-10647
 URL: https://issues.apache.org/jira/browse/SPARK-10647
 Project: Spark
  Issue Type: New Feature
Reporter: Alan Braithwaite
Priority: Minor


This property doesn't match up with the other properties surrounding it, namely:

spark.mesos.deploy.zookeeper.url
and
spark.mesos.deploy.recoveryMode

Since it's also a property specific to Mesos, it makes sense for it to be under 
that hierarchy as well.




[jira] [Resolved] (SPARK-6086) Exceptions in DAGScheduler.updateAccumulators

2015-09-16 Thread Josh Rosen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-6086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-6086.
---
Resolution: Cannot Reproduce

Resolving as "cannot reproduce" for now, pending updates.

> Exceptions in DAGScheduler.updateAccumulators
> -
>
> Key: SPARK-6086
> URL: https://issues.apache.org/jira/browse/SPARK-6086
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler, Spark Core, SQL
>Affects Versions: 1.3.0
>Reporter: Kai Zeng
>Priority: Critical
>
> Class Cast Exceptions in DAGScheduler.updateAccumulators, when DAGScheduler 
> is collecting status from tasks. These exceptions happen occasionally, 
> especially when there are many stages in a job.
> Application code: 
> https://github.com/kai-zeng/spark/blob/accum-bug/examples/src/main/scala/org/apache/spark/examples/sql/hive/SQLSuite.scala
> Script used: ./bin/spark-submit --class 
> org.apache.spark.examples.sql.hive.SQLSuite 
> examples/target/scala-2.10/spark-examples-1.3.0-SNAPSHOT-hadoop1.0.4.jar 
> benchmark-cache 6
> There are two types of error messages:
> {code}
> java.lang.ClassCastException: scala.None$ cannot be cast to scala.collection.TraversableOnce
>   at org.apache.spark.GrowableAccumulableParam.addInPlace(Accumulators.scala:188)
>   at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
>   at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
>   at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
>   at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>   at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
>   at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}
> {code}
> java.lang.ClassCastException: scala.None$ cannot be cast to java.lang.Integer
>   at scala.runtime.BoxesRunTime.unboxToInt(BoxesRunTime.java:106)
>   at org.apache.spark.AccumulatorParam$IntAccumulatorParam$.addInPlace(Accumulators.scala:263)
>   at org.apache.spark.Accumulable.$plus$plus$eq(Accumulators.scala:82)
>   at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:340)
>   at org.apache.spark.Accumulators$$anonfun$add$2.apply(Accumulators.scala:335)
>   at scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>   at scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at org.apache.spark.Accumulators$.add(Accumulators.scala:335)
>   at org.apache.spark.scheduler.DAGScheduler.updateAccumulators(DAGScheduler.scala:892)
>   at org.apache.spark.scheduler.DAGScheduler.handleTaskCompletion(DAGScheduler.scala:1000)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1398)
>   at org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:1362)
>   at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:48)
> {code}




[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes

2015-09-16 Thread Patrick Wendell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-10650:

Affects Version/s: 1.5.0

> Spark docs include test and other extra classes
> ---
>
> Key: SPARK-10650
> URL: https://issues.apache.org/jira/browse/SPARK-10650
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Patrick Wendell
>Assignee: Andrew Or
>





[jira] [Created] (SPARK-10650) Spark docs include test and other extra classes

2015-09-16 Thread Patrick Wendell (JIRA)

Patrick Wendell created SPARK-10650:
---

 Summary: Spark docs include test and other extra classes
 Key: SPARK-10650
 URL: https://issues.apache.org/jira/browse/SPARK-10650
 Project: Spark
  Issue Type: Bug
  Components: Documentation
Reporter: Patrick Wendell
Assignee: Andrew Or







[jira] [Commented] (SPARK-3978) Schema change on Spark-Hive (Parquet file format) table not working

2015-09-16 Thread Alex Rovner (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791150#comment-14791150
 ] 

Alex Rovner commented on SPARK-3978:


[~barge.nilesh] What version of Spark have you tested with?

> Schema change on Spark-Hive (Parquet file format) table not working
> ---
>
> Key: SPARK-3978
> URL: https://issues.apache.org/jira/browse/SPARK-3978
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Nilesh Barge
>Assignee: Alex Rovner
> Fix For: 1.5.0
>
>
> On the following releases: 
> Spark 1.1.0 (built using sbt/sbt -Dhadoop.version=2.2.0 -Phive assembly), 
> Apache HDFS 2.2 
> A Spark job is able to create, add to, and read data in Parquet-formatted 
> Hive tables using HiveContext. 
> But after changing the schema, the Spark job is no longer able to read the 
> data and throws the following exception: 
> java.lang.ArrayIndexOutOfBoundsException: 2 
> at org.apache.hadoop.hive.ql.io.parquet.serde.ArrayWritableObjectInspector.getStructFieldData(ArrayWritableObjectInspector.java:127)
> at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:284)
> at org.apache.spark.sql.hive.HadoopTableReader$$anonfun$fillObject$1.apply(TableReader.scala:278)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
> at scala.collection.Iterator$class.foreach(Iterator.scala:727)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
> at scala.collection.generic.Growable$class.$plus$plus$eq(Growable.scala:48)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:103)
> at scala.collection.mutable.ArrayBuffer.$plus$plus$eq(ArrayBuffer.scala:47)
> at scala.collection.TraversableOnce$class.to(TraversableOnce.scala:273)
> at scala.collection.AbstractIterator.to(Iterator.scala:1157)
> at scala.collection.TraversableOnce$class.toBuffer(TraversableOnce.scala:265)
> at scala.collection.AbstractIterator.toBuffer(Iterator.scala:1157)
> at scala.collection.TraversableOnce$class.toArray(TraversableOnce.scala:252)
> at scala.collection.AbstractIterator.toArray(Iterator.scala:1157)
> at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774)
> at org.apache.spark.rdd.RDD$$anonfun$16.apply(RDD.scala:774)
> at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
> at org.apache.spark.SparkContext$$anonfun$runJob$4.apply(SparkContext.scala:1121)
> at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:62)
> at org.apache.spark.scheduler.Task.run(Task.scala:54)
> at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:177)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:744)
> Code snippet, in short: 
> hiveContext.sql("CREATE EXTERNAL TABLE IF NOT EXISTS people_table (name String, age INT) ROW FORMAT SERDE 'parquet.hive.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'parquet.hive.DeprecatedParquetInputFormat' OUTPUTFORMAT 'parquet.hive.DeprecatedParquetOutputFormat'"); 
> hiveContext.sql("INSERT INTO TABLE people_table SELECT name, age FROM temp_table_people1"); 
> hiveContext.sql("SELECT * FROM people_table"); // Here, the data read was successful.
> hiveContext.sql("ALTER TABLE people_table ADD COLUMNS (gender STRING)"); 
> hiveContext.sql("SELECT * FROM people_table"); // Not able to read existing data; ArrayIndexOutOfBoundsException is thrown.




[jira] [Commented] (SPARK-3978) Schema change on Spark-Hive (Parquet file format) table not working

2015-09-16 Thread Nilesh Barge (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791141#comment-14791141
 ] 

Nilesh Barge commented on SPARK-3978:
-

Thanks for resolving this. I also verified on my end, and it is now working 
fine.


> Schema change on Spark-Hive (Parquet file format) table not working
> ---
>
> Key: SPARK-3978
> URL: https://issues.apache.org/jira/browse/SPARK-3978
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Nilesh Barge
>Assignee: Alex Rovner
> Fix For: 1.5.0
>
>




[jira] [Updated] (SPARK-10645) Bivariate Statistics for continuous vs. continuous

2015-09-16 Thread Jihong MA (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated SPARK-10645:
--
Component/s: SQL
 ML

> Bivariate Statistics for continuous vs. continuous
> --
>
> Key: SPARK-10645
> URL: https://issues.apache.org/jira/browse/SPARK-10645
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, SQL
>Reporter: Jihong MA
>
> This is an umbrella JIRA that covers bivariate statistics for continuous 
> vs. continuous columns, including covariance, Pearson's correlation, and 
> Spearman's correlation (for both continuous & categorical).




[jira] [Assigned] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10623:


Assignee: Zhan Zhang  (was: Apache Spark)

> turning on predicate pushdown throws nonsuch element exception when RDD is 
> empty 
> -
>
> Key: SPARK-10623
> URL: https://issues.apache.org/jira/browse/SPARK-10623
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Ram Sriharsha
>Assignee: Zhan Zhang
>
> Turning on predicate pushdown for ORC datasources results in a 
> NoSuchElementException:
> scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15")
> df: org.apache.spark.sql.DataFrame = [name: string]
> scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
> scala> df.explain
> == Physical Plan ==
> java.util.NoSuchElementException
> Disabling the pushdown makes things work again:
> scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false")
> scala> df.explain
> == Physical Plan ==
> Project [name#6]
>  Filter (age#7 < 15)
>   Scan 
> OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7]




[jira] [Updated] (SPARK-10647) Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir

2015-09-16 Thread Alan Braithwaite (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Alan Braithwaite updated SPARK-10647:
-
Issue Type: Improvement  (was: New Feature)

> Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir
> --
>
> Key: SPARK-10647
> URL: https://issues.apache.org/jira/browse/SPARK-10647
> Project: Spark
>  Issue Type: Improvement
>Reporter: Alan Braithwaite
>Priority: Minor
>
> This property doesn't match up with the other properties surrounding it, 
> namely:
> spark.mesos.deploy.zookeeper.url
> and
> spark.mesos.deploy.recoveryMode
> Since it's also a property specific to mesos, it makes sense to be under that 
> hierarchy as well.




[jira] [Updated] (SPARK-10050) Support collecting data of MapType in DataFrame

2015-09-16 Thread Shivaram Venkataraman (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman updated SPARK-10050:
--
Assignee: Sun Rui

> Support collecting data of MapType in DataFrame
> ---
>
> Key: SPARK-10050
> URL: https://issues.apache.org/jira/browse/SPARK-10050
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Sun Rui
>Assignee: Sun Rui
> Fix For: 1.6.0
>
>





[jira] [Created] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread

2015-09-16 Thread Tathagata Das (JIRA)

Tathagata Das created SPARK-10649:
-

 Summary: Streaming jobs unexpectedly inherits job group, job 
descriptions from context starting thread
 Key: SPARK-10649
 URL: https://issues.apache.org/jira/browse/SPARK-10649
 Project: Spark
  Issue Type: Bug
  Components: Streaming
Affects Versions: 1.5.0, 1.4.1, 1.3.1
Reporter: Tathagata Das
Assignee: Tathagata Das


The job group, job description, and scheduler pool information are passed 
through thread-local properties and are inherited by child threads. In the case 
of Spark Streaming, the streaming jobs inherit these properties from the thread 
that called streamingContext.start(). This may not make sense. 

1. Job group: This is mainly used for cancelling a group of jobs together. It 
does not make sense to cancel streaming jobs this way, as the effect will be 
unpredictable. It is not a valid use case anyway: to cancel a streaming 
context, call streamingContext.stop().

2. Job description: This is used to pass readable text descriptions for jobs to 
show up in the UI. The job description of the thread that calls 
streamingContext.start() is not useful for the streaming jobs, as it does 
not make sense for all of them to have the same description, and 
the description may or may not be related to streaming.
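
The inheritance described above can be reproduced outside Spark. The following
is a minimal sketch, assuming the properties behave like a value held in Java's
InheritableThreadLocal; the object and method names are hypothetical and are
not Spark APIs.

```scala
import java.util.concurrent.atomic.AtomicReference

object InheritedJobGroupDemo {
  // Hypothetical stand-in for the per-thread job properties (assumption:
  // they are held in an InheritableThreadLocal-like container).
  private val jobGroup = new InheritableThreadLocal[String]

  /** Sets a job group on the calling thread, then reports what a child thread sees. */
  def groupSeenByChild(parentGroup: String): String = {
    jobGroup.set(parentGroup)
    val seen = new AtomicReference[String]()
    // The child thread is created after the value is set, so it receives a copy.
    val child = new Thread(new Runnable {
      override def run(): Unit = seen.set(jobGroup.get())
    })
    child.start()
    child.join()
    seen.get()
  }
}
```

The child thread never sets a group itself, yet it observes the parent's value.
This is the same effect the issue describes for jobs launched from threads
created under the thread that called streamingContext.start().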





[jira] [Commented] (SPARK-3978) Schema change on Spark-Hive (Parquet file format) table not working

2015-09-16 Thread Nilesh Barge (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-3978?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791171#comment-14791171
 ] 

Nilesh Barge commented on SPARK-3978:
-

I tested with the latest Spark 1.5 release. 
I got the source 
(http://www.apache.org/dyn/closer.lua/spark/spark-1.5.0/spark-1.5.0.tgz), 
built it with "mvn -Phadoop-2.6 -Dhadoop.version=2.6.0 -Phive 
-Phive-thriftserver -DskipTests clean package", and then ran my 
original tests.


> Schema change on Spark-Hive (Parquet file format) table not working
> ---
>
> Key: SPARK-3978
> URL: https://issues.apache.org/jira/browse/SPARK-3978
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0
>Reporter: Nilesh Barge
>Assignee: Alex Rovner
> Fix For: 1.5.0
>
>




[jira] [Updated] (SPARK-9794) ISO DateTime parser is too strict

2015-09-16 Thread Reynold Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-9794:
---
Assignee: Kevin Cox

> ISO DateTime parser is too strict
> -
>
> Key: SPARK-9794
> URL: https://issues.apache.org/jira/browse/SPARK-9794
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.2, 1.3.1, 1.4.1, 1.5.0
>Reporter: Alex Angelini
>Assignee: Kevin Cox
> Fix For: 1.6.0
>
>
> The DateTime parser requires 3 millisecond digits, but that is not part of 
> the official ISO8601 spec.
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala#L132
> https://en.wikipedia.org/wiki/ISO_8601
> This results in the following exception when trying to parse datetime columns
> {code}
> java.text.ParseException: Unparseable date: "0001-01-01T00:00:00GMT-00:00"
> {code}
> [~joshrosen] [~rxin]
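
To illustrate the strictness being reported, here is a minimal sketch that
contrasts a pattern with mandatory milliseconds against a parser that treats
the fraction as optional. It is not the code at the linked DateTimeUtils line;
the object name, the pattern, and the sample strings are assumptions chosen
for illustration, and java.time requires Java 8.

```scala
import java.text.SimpleDateFormat
import java.time.OffsetDateTime
import scala.util.Try

object IsoParsingDemo {
  // Strict: the ".SSS" part of the pattern makes the fractional field mandatory.
  def strictParses(s: String): Boolean = {
    val fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSSXXX")
    fmt.setLenient(false)
    Try(fmt.parse(s)).isSuccess
  }

  // Relaxed: java.time's ISO offset format accepts timestamps with or
  // without fractional seconds.
  def relaxedParses(s: String): Boolean =
    Try(OffsetDateTime.parse(s)).isSuccess
}
```

With these definitions, a timestamp such as 2015-09-16T10:00:00Z is rejected
by the strict variant and accepted by the relaxed one, while
2015-09-16T10:00:00.123Z is accepted by both.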




[jira] [Resolved] (SPARK-9794) ISO DateTime parser is too strict

2015-09-16 Thread Reynold Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-9794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin resolved SPARK-9794.

   Resolution: Fixed
Fix Version/s: 1.6.0

> ISO DateTime parser is too strict
> -
>
> Key: SPARK-9794
> URL: https://issues.apache.org/jira/browse/SPARK-9794
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.2, 1.3.1, 1.4.1, 1.5.0
>Reporter: Alex Angelini
>Assignee: Kevin Cox
> Fix For: 1.6.0
>
>




[jira] [Resolved] (SPARK-10050) Support collecting data of MapType in DataFrame

2015-09-16 Thread Shivaram Venkataraman (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10050?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shivaram Venkataraman resolved SPARK-10050.
---
   Resolution: Fixed
Fix Version/s: 1.6.0

Issue resolved by pull request 8711
[https://github.com/apache/spark/pull/8711]

> Support collecting data of MapType in DataFrame
> ---
>
> Key: SPARK-10050
> URL: https://issues.apache.org/jira/browse/SPARK-10050
> Project: Spark
>  Issue Type: Sub-task
>  Components: SparkR
>Reporter: Sun Rui
> Fix For: 1.6.0
>
>





[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes

2015-09-16 Thread Patrick Wendell (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-10650:

Target Version/s: 1.5.1

> Spark docs include test and other extra classes
> ---
>
> Key: SPARK-10650
> URL: https://issues.apache.org/jira/browse/SPARK-10650
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Patrick Wendell
>Assignee: Andrew Or
>Priority: Critical
>
> In 1.5.0 there are some extra classes in the Spark docs - including a bunch 
> of test classes. We need to figure out what commit introduced those and fix 
> it. The obvious things like genJavadoc version have not changed.
> http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ 
> [before]
> http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ 
> [after]




[jira] [Resolved] (SPARK-6513) Add zipWithUniqueId (and other RDD APIs) to RDDApi

2015-09-16 Thread Josh Rosen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-6513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-6513.
---
Resolution: Won't Fix

> Add zipWithUniqueId (and other RDD APIs) to RDDApi
> --
>
> Key: SPARK-6513
> URL: https://issues.apache.org/jira/browse/SPARK-6513
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.3.0
> Environment: Windows 7 64bit, Scala 2.11.6, JDK 1.7.0_21 (though I 
> don't think it's relevant)
>Reporter: Eran Medan
>Priority: Minor
>
> It would be nice if we could treat a DataFrame just like an RDD (wherever it 
> makes sense). 
> *Worked in 1.2.1*
> {code}
>  val sqlContext = new HiveContext(sc)
>  import sqlContext._
>  val jsonRDD = sqlContext.jsonFile(jsonFilePath)
>  jsonRDD.registerTempTable("jsonTable")
>  val jsonResult = sql(s"select * from jsonTable")
>  val foo = jsonResult.zipWithUniqueId().map {
>case (Row(...), uniqueId) => // do something useful
>...
>  }
>  foo.registerTempTable("...")
> {code}
> *Stopped working in 1.3.0* 
> {code}   
> jsonResult.zipWithUniqueId() //since RDDApi doesn't implement that method
> {code}
> **Workaround that does not work:**
> Although this gives me an {{RDD\[Row\]}}:
> {code}
> jsonResult.rdd.zipWithUniqueId()  
> {code}
> the following does not work, since {{RDD\[Row\]}} does not have a 
> {{registerTempTable}} method:
> {code}
>  foo.registerTempTable("...")
> {code}
> (see related SO question: 
> http://stackoverflow.com/questions/29243186/is-this-a-regression-bug-in-spark-1-3)
> EDIT: changed from issue to enhancement request 




[jira] [Commented] (SPARK-7841) Spark build should not use lib_managed for dependencies

2015-09-16 Thread Josh Rosen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-7841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791063#comment-14791063
 ] 

Josh Rosen commented on SPARK-7841:
---

I agree that we can probably fix this, but note that we'll have to do something 
about how lib_managed is used in the dev/mima script.

> Spark build should not use lib_managed for dependencies
> ---
>
> Key: SPARK-7841
> URL: https://issues.apache.org/jira/browse/SPARK-7841
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.3.1
>Reporter: Iulian Dragos
>  Labels: easyfix, sbt
>
> - unnecessary duplication (I will have those libraries under ./m2, via maven 
> anyway)
> - every time I call make-distribution I lose lib_managed (via mvn clean 
> install) and have to wait to download again all jars next time I use sbt
> - Eclipse does not handle relative paths very well (source attachments from 
> lib_managed don’t always work)
> - it's not the default configuration. If we stray from defaults I think there 
> should be a clear advantage.
> Digging through history, the only reference to `retrieveManaged := true` I 
> found was in f686e3d, from July 2011 ("Initial work on converting build to 
> SBT 0.10.1"). My guess is that this is purely an accident of porting the build 
> from Sbt 0.7.x and trying to keep the old project layout.
> If there are reasons for keeping it, please comment (I didn't get any answers 
> on the [dev mailing 
> list|http://apache-spark-developers-list.1001551.n3.nabble.com/Why-use-quot-lib-managed-quot-for-the-Sbt-build-td12361.html])




[jira] [Resolved] (SPARK-6504) Cannot read Parquet files generated from different versions at once

2015-09-16 Thread Michael Armbrust (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-6504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Michael Armbrust resolved SPARK-6504.
-
   Resolution: Fixed
Fix Version/s: 1.3.1

This should be fixed.  Please reopen if you are still having problems.

> Cannot read Parquet files generated from different versions at once
> ---
>
> Key: SPARK-6504
> URL: https://issues.apache.org/jira/browse/SPARK-6504
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.1
>Reporter: Marius Soutier
> Fix For: 1.3.1
>
>
> When trying to read Parquet files generated by Spark 1.1.1 and 1.2.1 at the 
> same time via 
> `sqlContext.parquetFile("fileFrom1.1.parquet,fileFrom1.2.parquet")` an 
> exception occurs:
> could not merge metadata: key org.apache.spark.sql.parquet.row.metadata has 
> conflicting values: 
> [{"type":"struct","fields":[{"name":"date","type":"string","nullable":true,"metadata":{}},{"name":"account","type":"string","nullable":true,"metadata":{}},{"name":"impressions","type":"long","nullable":false,"metadata":{}},{"name":"cost","type":"double","nullable":false,"metadata":{}},{"name":"clicks","type":"long","nullable":false,"metadata":{}},{"name":"conversions","type":"long","nullable":false,"metadata":{}},{"name":"orderValue","type":"double","nullable":false,"metadata":{}}]},
>  StructType(List(StructField(date,StringType,true), 
> StructField(account,StringType,true), 
> StructField(impressions,LongType,false), StructField(cost,DoubleType,false), 
> StructField(clicks,LongType,false), StructField(conversions,LongType,false), 
> StructField(orderValue,DoubleType,false)))]
> The Schema is exactly equal.




[jira] [Assigned] (SPARK-10648) Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10648:


Assignee: Apache Spark

> Spark-SQL JDBC fails to set a default precision and scale when they are not 
> defined in an oracle schema.
> 
>
> Key: SPARK-10648
> URL: https://issues.apache.org/jira/browse/SPARK-10648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
> Environment: using oracle 11g, ojdbc7.jar
>Reporter: Travis Hegner
>Assignee: Apache Spark
>
> Using Oracle 11g as a datasource with ojdbc7.jar. When importing data into a 
> Scala app, I get the exception "Overflowed precision". Sometimes I 
> get the exception "Unscaled value too large for precision" instead.
> This issue likely affects older versions as well, but 1.5.0 is the version I 
> verified it on.
> I narrowed it down to the schema detection system trying to 
> set the precision to 0 and the scale to -127.
> I have a proposed pull request to follow.




[jira] [Commented] (SPARK-10648) Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.

2015-09-16 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791081#comment-14791081
 ] 

Apache Spark commented on SPARK-10648:
--

User 'travishegner' has created a pull request for this issue:
https://github.com/apache/spark/pull/8780

> Spark-SQL JDBC fails to set a default precision and scale when they are not 
> defined in an oracle schema.
> 
>
> Key: SPARK-10648
> URL: https://issues.apache.org/jira/browse/SPARK-10648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
> Environment: using oracle 11g, ojdbc7.jar
>Reporter: Travis Hegner
>
> Using Oracle 11g as a datasource with ojdbc7.jar. When importing data into a 
> Scala app, I get the exception "Overflowed precision". Sometimes I 
> get the exception "Unscaled value too large for precision" instead.
> This issue likely affects older versions as well, but 1.5.0 is the version I 
> verified it on.
> I narrowed it down to the schema detection system trying to 
> set the precision to 0 and the scale to -127.
> I have a proposed pull request to follow.




[jira] [Assigned] (SPARK-10648) Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10648:


Assignee: (was: Apache Spark)

> Spark-SQL JDBC fails to set a default precision and scale when they are not 
> defined in an oracle schema.
> 
>
> Key: SPARK-10648
> URL: https://issues.apache.org/jira/browse/SPARK-10648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
> Environment: using oracle 11g, ojdbc7.jar
>Reporter: Travis Hegner
>
> Using Oracle 11g as a datasource with ojdbc7.jar. When importing data into a 
> Scala app, I get the exception "Overflowed precision". Sometimes I 
> get the exception "Unscaled value too large for precision" instead.
> This issue likely affects older versions as well, but 1.5.0 is the version I 
> verified it on.
> I narrowed it down to the schema detection system trying to 
> set the precision to 0 and the scale to -127.
> I have a proposed pull request to follow.




[jira] [Assigned] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10649:


Assignee: Tathagata Das  (was: Apache Spark)

> Streaming jobs unexpectedly inherits job group, job descriptions from context 
> starting thread
> -
>
> Key: SPARK-10649
> URL: https://issues.apache.org/jira/browse/SPARK-10649
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.3.1, 1.4.1, 1.5.0
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>
> The job group, job description and scheduler pool information are passed 
> through thread-local properties and are inherited by child threads. In the case 
> of Spark Streaming, the streaming jobs inherit these properties from the 
> thread that called streamingContext.start(). This may not make sense. 
> 1. Job group: This is mainly used for cancelling a group of jobs together. It 
> does not make sense to cancel streaming jobs like this, as the effect will be 
> unpredictable. It is not a valid use case anyway: to cancel a streaming 
> context, call streamingContext.stop().
> 2. Job description: This is used to pass on readable text descriptions for jobs 
> to show up in the UI. The job description of the thread that calls 
> streamingContext.start() is not useful for the streaming jobs: it does 
> not make sense for all of them to have the same description, 
> and the description may or may not be related to streaming.
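
The inheritance described in this issue can be reproduced outside Spark. The following is a minimal sketch, assuming the properties behave like the JVM's InheritableThreadLocal. The object InheritedProps and the names jobDescription and seenByChild are illustrative and do not come from the Spark code base.

```scala
object InheritedProps {
  // Illustrative stand-in for a per-thread job description.
  val jobDescription: InheritableThreadLocal[String] =
    new InheritableThreadLocal[String] {
      override def initialValue(): String = "none"
    }

  /** Starts a child thread and returns the description that thread observes. */
  def seenByChild(): String = {
    var seen: String = null
    val child = new Thread(new Runnable {
      override def run(): Unit = { seen = jobDescription.get() }
    })
    child.start()
    child.join() // join() makes the child's write to `seen` visible here
    seen
  }
}
```

Calling InheritedProps.jobDescription.set("batch ETL") on the parent thread before seenByChild() makes the child report the same string, which mirrors how streaming jobs pick up the description of the thread that called streamingContext.start().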




[jira] [Commented] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread

2015-09-16 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791109#comment-14791109
 ] 

Apache Spark commented on SPARK-10649:
--

User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/8781

> Streaming jobs unexpectedly inherits job group, job descriptions from context 
> starting thread
> -
>
> Key: SPARK-10649
> URL: https://issues.apache.org/jira/browse/SPARK-10649
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.3.1, 1.4.1, 1.5.0
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>
> The job group, job description and scheduler pool information are passed 
> through thread-local properties and are inherited by child threads. In the case 
> of Spark Streaming, the streaming jobs inherit these properties from the 
> thread that called streamingContext.start(). This may not make sense. 
> 1. Job group: This is mainly used for cancelling a group of jobs together. It 
> does not make sense to cancel streaming jobs like this, as the effect will be 
> unpredictable. It is not a valid use case anyway: to cancel a streaming 
> context, call streamingContext.stop().
> 2. Job description: This is used to pass on readable text descriptions for jobs 
> to show up in the UI. The job description of the thread that calls 
> streamingContext.start() is not useful for the streaming jobs: it does 
> not make sense for all of them to have the same description, 
> and the description may or may not be related to streaming.




[jira] [Updated] (SPARK-10646) Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. categorical

2015-09-16 Thread Jihong MA (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jihong MA updated SPARK-10646:
--
Component/s: SQL
 ML

> Bivariate Statistics: Pearson's Chi-Squared Test for categorical vs. 
> categorical
> 
>
> Key: SPARK-10646
> URL: https://issues.apache.org/jira/browse/SPARK-10646
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, SQL
>Reporter: Jihong MA
>
> Pearson's chi-squared goodness of fit test for observed against the expected 
> distribution.
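
For reference, the goodness-of-fit statistic named in this description is the sum over categories of (observed - expected)^2 / expected. Below is a minimal sketch in plain Scala. The object ChiSquared and its method are illustrative names and not a Spark API.

```scala
object ChiSquared {
  /** Pearson's chi-squared statistic for observed vs. expected counts. */
  def statistic(observed: Seq[Double], expected: Seq[Double]): Double = {
    require(observed.length == expected.length, "inputs must have the same length")
    require(expected.forall(_ > 0.0), "expected counts must be positive")
    observed.zip(expected).map { case (o, e) => (o - e) * (o - e) / e }.sum
  }
}
```

For observed counts (10, 20, 30) against expected counts (20, 20, 20) the three terms are 5, 0 and 5, so the statistic is 10.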




[jira] [Assigned] (SPARK-10626) Create a Java friendly method for randomRDD & RandomDataGenerator on RandomRDDs.

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10626:


Assignee: Apache Spark

> Create a Java friendly method for randomRDD & RandomDataGenerator on 
> RandomRDDs.
> 
>
> Key: SPARK-10626
> URL: https://issues.apache.org/jira/browse/SPARK-10626
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: holdenk
>Assignee: Apache Spark
>Priority: Minor
>
> SPARK-3136 added a large number of functions for creating Java RandomRDDs, 
> but for people that want to use custom RandomDataGenerators we should make a 
> Java friendly method.




[jira] [Created] (SPARK-10648) Spark-SQL JDBC fails to set a default precision and scale when they are not defined in an oracle schema.

2015-09-16 Thread Travis Hegner (JIRA)

Travis Hegner created SPARK-10648:
-

 Summary: Spark-SQL JDBC fails to set a default precision and scale 
when they are not defined in an oracle schema.
 Key: SPARK-10648
 URL: https://issues.apache.org/jira/browse/SPARK-10648
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.0
 Environment: using oracle 11g, ojdbc7.jar
Reporter: Travis Hegner


Using Oracle 11g as a datasource with ojdbc7.jar. When importing data into a 
Scala app, I get the exception "Overflowed precision". Sometimes I 
get the exception "Unscaled value too large for precision" instead.

This issue likely affects older versions as well, but 1.5.0 is the version I 
verified it on.

I narrowed it down to the schema detection system trying to 
set the precision to 0 and the scale to -127.

I have a proposed pull request to follow.
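
To illustrate why a precision of 0 cannot work, here is a minimal sketch in plain Scala using java.math.BigDecimal. The object OracleNumberDefaults, its methods, and the fallback values 38 and 10 are assumptions made for illustration; they are not taken from this issue or from the proposed pull request.

```scala
import java.math.{BigDecimal, RoundingMode}

object OracleNumberDefaults {
  // Hypothetical fallback values, chosen only for this example.
  val DefaultPrecision = 38
  val DefaultScale = 10

  /** Replaces unusable driver metadata (precision <= 0 or negative scale) with defaults. */
  def resolve(precision: Int, scale: Int): (Int, Int) =
    if (precision <= 0 || scale < 0) (DefaultPrecision, DefaultScale)
    else (precision, scale)

  /** True when `value`, rounded to `scale`, needs at most `precision` digits. */
  def fits(value: BigDecimal, precision: Int, scale: Int): Boolean =
    value.setScale(scale, RoundingMode.HALF_UP).precision() <= precision
}
```

Any BigDecimal has a precision of at least 1, so fits(value, 0, -127) is false for every value, while the same value fits once resolve(0, -127) has substituted the fallback pair.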




[jira] [Assigned] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10649:


Assignee: Apache Spark  (was: Tathagata Das)

> Streaming jobs unexpectedly inherits job group, job descriptions from context 
> starting thread
> -
>
> Key: SPARK-10649
> URL: https://issues.apache.org/jira/browse/SPARK-10649
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.3.1, 1.4.1, 1.5.0
>Reporter: Tathagata Das
>Assignee: Apache Spark
>
> The job group, job description and scheduler pool information are passed 
> through thread-local properties and are inherited by child threads. In the case 
> of Spark Streaming, the streaming jobs inherit these properties from the 
> thread that called streamingContext.start(). This may not make sense. 
> 1. Job group: This is mainly used for cancelling a group of jobs together. It 
> does not make sense to cancel streaming jobs like this, as the effect will be 
> unpredictable. It is not a valid use case anyway: to cancel a streaming 
> context, call streamingContext.stop().
> 2. Job description: This is used to pass on readable text descriptions for jobs 
> to show up in the UI. The job description of the thread that calls 
> streamingContext.start() is not useful for the streaming jobs: it does 
> not make sense for all of them to have the same description, 
> and the description may or may not be related to streaming.




[jira] [Commented] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty

2015-09-16 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791179#comment-14791179
 ] 

Apache Spark commented on SPARK-10623:
--

User 'zhzhan' has created a pull request for this issue:
https://github.com/apache/spark/pull/8783

> turning on predicate pushdown throws nonsuch element exception when RDD is 
> empty 
> -
>
> Key: SPARK-10623
> URL: https://issues.apache.org/jira/browse/SPARK-10623
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Ram Sriharsha
>Assignee: Zhan Zhang
>
> Turning on predicate pushdown for ORC datasources results in a 
> NoSuchElementException:
> scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15")
> df: org.apache.spark.sql.DataFrame = [name: string]
> scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
> scala> df.explain
> == Physical Plan ==
> java.util.NoSuchElementException
> Disabling the pushdown makes things work again:
> scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false")
> scala> df.explain
> == Physical Plan ==
> Project [name#6]
>  Filter (age#7 < 15)
>   Scan OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7]




[jira] [Assigned] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10623:


Assignee: Apache Spark  (was: Zhan Zhang)

> turning on predicate pushdown throws nonsuch element exception when RDD is 
> empty 
> -
>
> Key: SPARK-10623
> URL: https://issues.apache.org/jira/browse/SPARK-10623
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Ram Sriharsha
>Assignee: Apache Spark
>
> Turning on predicate pushdown for ORC datasources results in a 
> NoSuchElementException:
> scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15")
> df: org.apache.spark.sql.DataFrame = [name: string]
> scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
> scala> df.explain
> == Physical Plan ==
> java.util.NoSuchElementException
> Disabling the pushdown makes things work again:
> scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false")
> scala> df.explain
> == Physical Plan ==
> Project [name#6]
>  Filter (age#7 < 15)
>   Scan OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7]




[jira] [Updated] (SPARK-10623) turning on predicate pushdown throws nonsuch element exception when RDD is empty

2015-09-16 Thread Reynold Xin (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Reynold Xin updated SPARK-10623:

Target Version/s: 1.6.0, 1.5.1

> turning on predicate pushdown throws nonsuch element exception when RDD is 
> empty 
> -
>
> Key: SPARK-10623
> URL: https://issues.apache.org/jira/browse/SPARK-10623
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Reporter: Ram Sriharsha
>Assignee: Zhan Zhang
>
> Turning on predicate pushdown for ORC datasources results in a 
> NoSuchElementException:
> scala> val df = sqlContext.sql("SELECT name FROM people WHERE age < 15")
> df: org.apache.spark.sql.DataFrame = [name: string]
> scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "true")
> scala> df.explain
> == Physical Plan ==
> java.util.NoSuchElementException
> Disabling the pushdown makes things work again:
> scala> sqlContext.setConf("spark.sql.orc.filterPushdown", "false")
> scala> df.explain
> == Physical Plan ==
> Project [name#6]
>  Filter (age#7 < 15)
>   Scan OrcRelation[file:/home/mydir/spark-1.5.0-SNAPSHOT/test/people][name#6,age#7]




[jira] [Updated] (SPARK-10651) Flaky test: BroadcastSuite

2015-09-16 Thread Xiangrui Meng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-10651:
--
Description: 
Saw many failures recently in master build. See attached CSV for a full list. 
Most of the error messages are:

{code}
Can't find 2 executors before 1 milliseconds elapsed
{code}
.



  was:
Saw many failures recently in master build. See attached CSV for a full list. 
Most of the error messages are: Can't find 2 executors before 1 
milliseconds elapsed
.




> Flaky test: BroadcastSuite
> --
>
> Key: SPARK-10651
> URL: https://issues.apache.org/jira/browse/SPARK-10651
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 1.6.0
>Reporter: Xiangrui Meng
>Assignee: Shixiong Zhu
>Priority: Blocker
> Attachments: BroadcastSuiteFailures.csv
>
>
> Saw many failures recently in master build. See attached CSV for a full list. 
> Most of the error messages are:
> {code}
> Can't find 2 executors before 1 milliseconds elapsed
> {code}
> .




[jira] [Commented] (SPARK-10381) Infinite loop when OutputCommitCoordination is enabled and OutputCommitter.commitTask throws exception

2015-09-16 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791354#comment-14791354
 ] 

Apache Spark commented on SPARK-10381:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/8790

> Infinite loop when OutputCommitCoordination is enabled and 
> OutputCommitter.commitTask throws exception
> --
>
> Key: SPARK-10381
> URL: https://issues.apache.org/jira/browse/SPARK-10381
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.3.1, 1.4.1, 1.5.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Critical
> Fix For: 1.6.0, 1.5.1
>
>
> When speculative execution is enabled, consider a scenario where the 
> authorized committer of a particular output partition fails during the 
> OutputCommitter.commitTask() call. In this case, the OutputCommitCoordinator 
> is supposed to release that committer's exclusive lock on committing once 
> that task fails. However, due to a unit mismatch the lock will not be 
> released, causing Spark to go into an infinite retry loop.
> This bug was masked by the fact that the OutputCommitCoordinator does not 
> have enough end-to-end tests (the current tests use many mocks). Other 
> factors contributing to this bug are the fact that we have many 
> similarly-named identifiers that have different semantics but the same data 
> types (e.g. attemptNumber and taskAttemptId, with inconsistent variable 
> naming which makes them difficult to distinguish).
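
One way to keep similarly named identifiers apart is to give each its own type, so that passing the wrong one fails at compile time. The sketch below is a toy model in plain Scala: AttemptNumber, TaskAttemptId and CommitLock are hypothetical names, and the locking logic is a simplification rather than the OutputCommitCoordinator's actual implementation.

```scala
// Hypothetical wrapper types; Spark itself does not define these.
final case class AttemptNumber(value: Int)
final case class TaskAttemptId(value: Long)

/** Toy stand-in for a per-partition commit lock keyed by attempt number. */
final class CommitLock {
  private var holder: Option[AttemptNumber] = None

  /** Grants the lock if it is free or already held by `attempt`. */
  def tryAcquire(attempt: AttemptNumber): Boolean = holder match {
    case None    => holder = Some(attempt); true
    case Some(h) => h == attempt
  }

  /** Releases the lock only if `attempt` is the current holder. */
  def release(attempt: AttemptNumber): Unit =
    if (holder.contains(attempt)) holder = None
}
```

With these types, a call such as lock.release(TaskAttemptId(7L)) does not compile, so a lock acquired under one kind of identifier cannot silently be released under another.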




[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available

2015-09-16 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791370#comment-14791370
 ] 

Saisai Shao commented on SPARK-10644:
-

Do your jobs have dependencies? That is, does the 4th job rely on the first 
3 jobs finishing and producing results?

> Applications wait even if free executors are available
> --
>
> Key: SPARK-10644
> URL: https://issues.apache.org/jira/browse/SPARK-10644
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.5.0
> Environment: RHEL 6.5 64 bit
>Reporter: Balagopal Nair
>
> Number of workers: 21
> Number of executors: 63
> Steps to reproduce:
> 1. Run 4 jobs, each with max cores set to 10.
> 2. The first 3 jobs run with 10 each (30 executors consumed so far).
> 3. The 4th job waits even though there are 33 idle executors.
> The reason is that a job will not get executors unless 
> the total number of EXECUTORS in use < the number of WORKERS.
> If there are executors available, resources should be allocated to the 
> pending job.
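
The numbers in this report can be checked with a small sketch. Both conditions below are written as the reporter describes them and have not been verified against the scheduler source; the object SchedulingCheck and its method names are illustrative.

```scala
object SchedulingCheck {
  /** Condition the reporter believes is applied: executors in use < workers. */
  def reportedCondition(executorsInUse: Int, workers: Int): Boolean =
    executorsInUse < workers

  /** Condition the reporter expects: some executor is still idle. */
  def expectedCondition(executorsInUse: Int, totalExecutors: Int): Boolean =
    executorsInUse < totalExecutors
}
```

With 21 workers, 63 executors and 30 executors in use, reportedCondition(30, 21) is false, so the 4th job waits, while expectedCondition(30, 63) is true because 33 executors are idle.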




[jira] [Commented] (SPARK-10653) Remove unnecessary things from SparkEnv

2015-09-16 Thread Josh Rosen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791388#comment-14791388
 ] 

Josh Rosen commented on SPARK-10653:


Note that SparkEnv is technically a developer API, but all of its fields point 
to things which are non-developer-API. Thus I feel that there's not a 
compatibility concern here, but others might disagree.

> Remove unnecessary things from SparkEnv
> ---
>
> Key: SPARK-10653
> URL: https://issues.apache.org/jira/browse/SPARK-10653
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.0.0
>Reporter: Andrew Or
>
> As of the writing of this message, there are at least two things that can be 
> removed from it:
> {code}
> @DeveloperApi
> class SparkEnv (
>     val executorId: String,
>     private[spark] val rpcEnv: RpcEnv,
>     val serializer: Serializer,
>     val closureSerializer: Serializer,
>     val cacheManager: CacheManager,
>     val mapOutputTracker: MapOutputTracker,
>     val shuffleManager: ShuffleManager,
>     val broadcastManager: BroadcastManager,
>     val blockTransferService: BlockTransferService, // this one can go
>     val blockManager: BlockManager,
>     val securityManager: SecurityManager,
>     val httpFileServer: HttpFileServer,
>     val sparkFilesDir: String, // this one maybe? It's only used in 1 place.
>     val metricsSystem: MetricsSystem,
>     val shuffleMemoryManager: ShuffleMemoryManager,
>     val executorMemoryManager: ExecutorMemoryManager, // this can go
>     val outputCommitCoordinator: OutputCommitCoordinator,
>     val conf: SparkConf) extends Logging {
>   ...
> }
> {code}
> We should avoid adding to this infinite list of things in SparkEnv's 
> constructors if they're not needed.




[jira] [Updated] (SPARK-10647) Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be documented

2015-09-16 Thread Josh Rosen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-10647:
---
Affects Version/s: 1.4.1
   1.5.0

> Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be 
> documented
> ---
>
> Key: SPARK-10647
> URL: https://issues.apache.org/jira/browse/SPARK-10647
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Alan Braithwaite
>Assignee: Timothy Chen
>Priority: Minor
>
> This property doesn't match up with the other properties surrounding it, 
> namely:
> spark.mesos.deploy.zookeeper.url
> and
> spark.mesos.deploy.recoveryMode
> Since it is also a property specific to Mesos, it makes sense for it to be under that 
> hierarchy as well.




[jira] [Commented] (SPARK-10640) Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied

2015-09-16 Thread Thomas Graves (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791438#comment-14791438
 ] 

Thomas Graves commented on SPARK-10640:
---

Yes, this is a 1.5 history server reading 1.5.0 logs. I'm not as worried about 
forward compatibility, but it would be nice if we handled values like this by 
showing blank or unknown, so the log is at least viewable.
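
The idea of falling back to an "unknown" value can be sketched with a default case in the pattern match. This is a simplified model in plain Scala: EndReason, Success, Resubmitted, Unknown and EndReasonParser are stand-in names for illustration and are not the classes used by JsonProtocol.

```scala
sealed trait EndReason
case object Success extends EndReason
case object Resubmitted extends EndReason
// Hypothetical catch-all that keeps the unrecognised name for display.
final case class Unknown(name: String) extends EndReason

object EndReasonParser {
  /** Maps a reason name to a value; unrecognised names do not throw. */
  def parse(name: String): EndReason = name match {
    case "Success"     => Success
    case "Resubmitted" => Resubmitted
    case other         => Unknown(other) // avoids scala.MatchError
  }
}
```

Without the final case, a name such as "TaskCommitDenied" would raise scala.MatchError; with it, the parser returns Unknown("TaskCommitDenied") and the rest of the log can still be read.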

> Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied
> --
>
> Key: SPARK-10640
> URL: https://issues.apache.org/jira/browse/SPARK-10640
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.0, 1.4.0, 1.5.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
>
> I'm seeing an exception from the spark history server trying to read a 
> history file:
> scala.MatchError: TaskCommitDenied (of class java.lang.String)
>     at org.apache.spark.util.JsonProtocol$.taskEndReasonFromJson(JsonProtocol.scala:775)
>     at org.apache.spark.util.JsonProtocol$.taskEndFromJson(JsonProtocol.scala:531)
>     at org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:488)
>     at org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58)
>     at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457)
>     at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292)
>     at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289)
>     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>     at scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
>     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>     at scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
>     at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
>     at org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289)
>     at org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210)
>     at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
>     at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>     at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>     at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>     at java.lang.Thread.run(Thread.java:745)




[jira] [Comment Edited] (SPARK-10644) Applications wait even if free executors are available

2015-09-16 Thread Balagopal Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791434#comment-14791434
 ] 

Balagopal Nair edited comment on SPARK-10644 at 9/17/15 1:51 AM:
-

No. These are independent jobs running under different SparkContexts.
Sorry about not being clear enough before. I'm trying to share the same cluster 
between various applications. This issue is related to scheduling across 
applications and not within the same application.


was (Author: nbalagopal):
No. These are independent jobs running under different SparkContexts

> Applications wait even if free executors are available
> --
>
> Key: SPARK-10644
> URL: https://issues.apache.org/jira/browse/SPARK-10644
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.5.0
> Environment: RHEL 6.5 64 bit
>Reporter: Balagopal Nair
>
> Number of workers: 21
> Number of executors: 63
> Steps to reproduce:
> 1. Run 4 jobs, each with max cores set to 10.
> 2. The first 3 jobs run with 10 each (30 executors consumed so far).
> 3. The 4th job waits even though there are 33 idle executors.
> The reason is that a job will not get executors unless 
> the total number of EXECUTORS in use < the number of WORKERS.
> If there are executors available, resources should be allocated to the 
> pending job.




[jira] [Commented] (SPARK-10625) Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties

2015-09-16 Thread Peng Cheng (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791252#comment-14791252
 ] 

Peng Cheng commented on SPARK-10625:


A pull request has been sent that contains 2 extra unit tests and a simple fix:
https://github.com/apache/spark/pull/8785

Can you help me validate it and merge it into 1.5.1?

> Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds 
> unserializable objects into connection properties
> --
>
> Key: SPARK-10625
> URL: https://issues.apache.org/jira/browse/SPARK-10625
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
> Environment: Ubuntu 14.04
>Reporter: Peng Cheng
>  Labels: jdbc, spark, sparksql
>
> Some JDBC drivers (e.g. SAP HANA) tries to optimize connection pooling by 
> adding new objects into the connection properties, which is then reused by 
> Spark to be deployed to workers. When some of these new objects are unable to 
> be serializable it will trigger an org.apache.spark.SparkException: Task not 
> serializable. The following test code snippet demonstrate this problem by 
> using a modified H2 driver:
>   test("INSERT to JDBC Datasource with UnserializableH2Driver") {
>     object UnserializableH2Driver extends org.h2.Driver {
>       override def connect(url: String, info: Properties): Connection = {
>         val result = super.connect(url, info)
>         info.put("unserializableDriver", this)
>         result
>       }
>       override def getParentLogger: Logger = ???
>     }
>     import scala.collection.JavaConversions._
>     val oldDrivers =
>       DriverManager.getDrivers.filter(_.acceptsURL("jdbc:h2:")).toSeq
>     oldDrivers.foreach {
>       DriverManager.deregisterDriver
>     }
>     DriverManager.registerDriver(UnserializableH2Driver)
>     sql("INSERT INTO TABLE PEOPLE1 SELECT * FROM PEOPLE")
>     assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", properties).count)
>     assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1",
>       properties).collect()(0).length)
>     DriverManager.deregisterDriver(UnserializableH2Driver)
>     oldDrivers.foreach {
>       DriverManager.registerDriver
>     }
>   }
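
As a rough illustration of the failure mode described above (not the fix in
the pull request), the same effect can be reproduced in plain Python, with
pickle standing in for Java serialization. The property names and the helper
serializable_subset are hypothetical; the sketch only shows that one
unserializable entry added by a "driver" makes the whole properties object
impossible to ship, and that filtering such entries out is one possible
workaround.

```python
import pickle
import threading


def serializable_subset(properties):
    """Return only the entries whose key and value can be pickled."""
    kept = {}
    for key, value in properties.items():
        try:
            pickle.dumps((key, value))
        except Exception:
            # Skip anything that cannot be serialized (hypothetical policy).
            continue
        kept[key] = value
    return kept


def demo():
    # Connection properties as the caller originally supplied them.
    properties = {"user": "testUser", "password": "testPass"}
    # A driver-like component mutates the shared properties and adds an
    # object that cannot be pickled (a lock is used here as a stand-in).
    properties["unserializableDriver"] = threading.Lock()
    try:
        pickle.dumps(properties)
        shipped_as_is = True
    except Exception:
        shipped_as_is = False
    return shipped_as_is, serializable_subset(properties)
```

Calling demo() shows that the mutated properties cannot be serialized as a
whole, while the filtered copy still carries the caller's original entries.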




[jira] [Updated] (SPARK-10651) Flaky test: BroadcastSuite

2015-09-16 Thread Xiangrui Meng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-10651:
--
Labels: flaky-test  (was: )

> Flaky test: BroadcastSuite
> --
>
> Key: SPARK-10651
> URL: https://issues.apache.org/jira/browse/SPARK-10651
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 1.6.0
>Reporter: Xiangrui Meng
>Assignee: Shixiong Zhu
>Priority: Blocker
>  Labels: flaky-test
> Attachments: BroadcastSuiteFailures.csv
>
>
> Saw many failures recently in master build. See attached CSV for a full list. 
> Most of the error messages are:
> {code}
> Can't find 2 executors before 1 milliseconds elapsed
> {code}
> .




[jira] [Updated] (SPARK-10649) Streaming jobs unexpectedly inherits job group, job descriptions from context starting thread

2015-09-16 Thread Tathagata Das (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tathagata Das updated SPARK-10649:
--
Description: 
The job group and job description information is passed through thread-local 
properties and gets inherited by child threads. In the case of Spark Streaming, 
the streaming jobs inherit these properties from the thread that called 
streamingContext.start(). This may not make sense. 

1. Job group: This is mainly used for cancelling a group of jobs together. It 
does not make sense to cancel streaming jobs like this, as the effect will be 
unpredictable. It is also not a valid use case: to cancel a streaming 
context, call streamingContext.stop().

2. Job description: This is used to pass on nice text descriptions for jobs to 
show up in the UI. The job description of the thread that calls 
streamingContext.start() is not useful for all the streaming jobs, as it does 
not make sense for all of the streaming jobs to have the same description, and 
the description may or may not be related to streaming.


  was:
The job group, job description, and scheduler pool information is passed 
through thread-local properties and gets inherited by child threads. In the 
case of Spark Streaming, the streaming jobs inherit these properties from the 
thread that called streamingContext.start(). This may not make sense. 

1. Job group: This is mainly used for cancelling a group of jobs together. It 
does not make sense to cancel streaming jobs like this, as the effect will be 
unpredictable. It is also not a valid use case: to cancel a streaming 
context, call streamingContext.stop().

2. Job description: This is used to pass on nice text descriptions for jobs to 
show up in the UI. The job description of the thread that calls 
streamingContext.start() is not useful for all the streaming jobs, as it does 
not make sense for all of the streaming jobs to have the same description, and 
the description may or may not be related to streaming.



> Streaming jobs unexpectedly inherits job group, job descriptions from context 
> starting thread
> -
>
> Key: SPARK-10649
> URL: https://issues.apache.org/jira/browse/SPARK-10649
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.3.1, 1.4.1, 1.5.0
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>
> The job group and job description information is passed through thread-local 
> properties and gets inherited by child threads. In the case of Spark 
> Streaming, the streaming jobs inherit these properties from the thread that 
> called streamingContext.start(). This may not make sense. 
> 1. Job group: This is mainly used for cancelling a group of jobs together. It 
> does not make sense to cancel streaming jobs like this, as the effect will be 
> unpredictable. It is also not a valid use case: to cancel a streaming 
> context, call streamingContext.stop().
> 2. Job description: This is used to pass on nice text descriptions for jobs 
> to show up in the UI. The job description of the thread that calls 
> streamingContext.start() is not useful for all the streaming jobs, as it does 
> not make sense for all of the streaming jobs to have the same description, 
> and the description may or may not be related to streaming.
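
The inheritance described above can be sketched in plain Python. This is only
an analogy: Python's threading.local is not inherited by child threads, so the
hypothetical InheritingThread class below copies the parent's properties at
construction time to mimic inheritable thread-local behaviour. The property
keys are made up for the example and are not Spark's actual keys.

```python
import threading

_local = threading.local()


def get_props():
    """Return the calling thread's properties (empty if none are set)."""
    return getattr(_local, "props", {})


def set_prop(key, value):
    """Set one property for the calling thread only."""
    props = dict(get_props())
    props[key] = value
    _local.props = props


class InheritingThread(threading.Thread):
    """Thread that starts with a snapshot of its creator's properties."""

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # __init__ runs in the parent thread, so this captures the
        # parent's properties at creation time.
        self._inherited = dict(get_props())

    def run(self):
        # run() executes in the child thread; install the snapshot there.
        _local.props = dict(self._inherited)
        super().run()


def demo():
    # The thread that "starts the context" happens to have these set.
    set_prop("job.group", "interactive-query")
    set_prop("job.description", "ad hoc count")
    seen = {}

    def streaming_job():
        # The child never set anything, yet it sees the parent's values.
        seen.update(get_props())

    worker = InheritingThread(target=streaming_job)
    worker.start()
    worker.join()
    return seen
```

In this sketch the child thread reports the parent's group and description
even though they have nothing to do with its own work, which is the situation
the issue describes for streaming jobs.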




[jira] [Commented] (SPARK-10381) Infinite loop when OutputCommitCoordination is enabled and OutputCommitter.commitTask throws exception

2015-09-16 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10381?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791347#comment-14791347
 ] 

Apache Spark commented on SPARK-10381:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/8789

> Infinite loop when OutputCommitCoordination is enabled and 
> OutputCommitter.commitTask throws exception
> --
>
> Key: SPARK-10381
> URL: https://issues.apache.org/jira/browse/SPARK-10381
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.3.1, 1.4.1, 1.5.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Critical
> Fix For: 1.6.0, 1.5.1
>
>
> When speculative execution is enabled, consider a scenario where the 
> authorized committer of a particular output partition fails during the 
> OutputCommitter.commitTask() call. In this case, the OutputCommitCoordinator 
> is supposed to release that committer's exclusive lock on committing once 
> that task fails. However, due to a unit mismatch the lock will not be 
> released, causing Spark to go into an infinite retry loop.
> This bug was masked by the fact that the OutputCommitCoordinator does not 
> have enough end-to-end tests (the current tests use many mocks). Other 
> factors contributing to this bug are the fact that we have many 
> similarly-named identifiers that have different semantics but the same data 
> types (e.g. attemptNumber and taskAttemptId, with inconsistent variable 
> naming which makes them difficult to distinguish).
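
To make the identifier mix-up concrete, here is a toy model in Python. It is
not Spark's OutputCommitCoordinator; the class ToyCommitCoordinator and its
methods are hypothetical. It assumes the lock is recorded under a per-task
attempt number while the failure is reported with a different, globally unique
attempt ID of the same integer type, so the comparison never matches and the
lock is never released.

```python
class ToyCommitCoordinator:
    """Grants one attempt per partition the exclusive right to commit."""

    def __init__(self):
        # partition -> attempt number currently holding the commit lock
        self._authorized = {}

    def can_commit(self, partition, attempt_number):
        # The first attempt to ask becomes the authorized committer.
        holder = self._authorized.setdefault(partition, attempt_number)
        return holder == attempt_number

    def task_failed(self, partition, attempt_identifier):
        # Releases the lock only if the identifier matches the holder.
        # Passing a different kind of identifier silently does nothing.
        if self._authorized.get(partition) == attempt_identifier:
            del self._authorized[partition]


def demo():
    coordinator = ToyCommitCoordinator()
    partition = 3
    attempt_number = 0      # per-task retry counter
    task_attempt_id = 17    # globally unique ID for the same attempt

    first = coordinator.can_commit(partition, attempt_number)
    # Bug: the failure is reported with the wrong identifier.
    coordinator.task_failed(partition, task_attempt_id)
    retry_with_bug = coordinator.can_commit(partition, attempt_number + 1)
    # Reporting with the matching identifier releases the lock.
    coordinator.task_failed(partition, attempt_number)
    retry_after_fix = coordinator.can_commit(partition, attempt_number + 1)
    return first, retry_with_bug, retry_after_fix
```

Because both identifiers are plain integers, nothing in the types flags the
mistake; the retry is simply denied every time until the matching identifier
is used.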




[jira] [Created] (SPARK-10653) Remove unnecessary things from SparkEnv

2015-09-16 Thread Andrew Or (JIRA)

Andrew Or created SPARK-10653:
-

 Summary: Remove unnecessary things from SparkEnv
 Key: SPARK-10653
 URL: https://issues.apache.org/jira/browse/SPARK-10653
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 1.0.0
Reporter: Andrew Or


As of the writing of this message, there are at least two things that can be 
removed from it:
{code}
@DeveloperApi
class SparkEnv (
    val executorId: String,
    private[spark] val rpcEnv: RpcEnv,
    val serializer: Serializer,
    val closureSerializer: Serializer,
    val cacheManager: CacheManager,
    val mapOutputTracker: MapOutputTracker,
    val shuffleManager: ShuffleManager,
    val broadcastManager: BroadcastManager,
    val blockTransferService: BlockTransferService, // this one can go
    val blockManager: BlockManager,
    val securityManager: SecurityManager,
    val httpFileServer: HttpFileServer,
    val sparkFilesDir: String, // this one maybe? It's only used in 1 place.
    val metricsSystem: MetricsSystem,
    val shuffleMemoryManager: ShuffleMemoryManager,
    val executorMemoryManager: ExecutorMemoryManager, // this can go
    val outputCommitCoordinator: OutputCommitCoordinator,
    val conf: SparkConf) extends Logging {
  ...
}
{code}
We should avoid adding to this infinite list of things in SparkEnv's 
constructors if they're not needed.




[jira] [Assigned] (SPARK-10652) Set good job descriptions for streaming related jobs

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10652:


Assignee: Tathagata Das  (was: Apache Spark)

> Set good job descriptions for streaming related jobs
> 
>
> Key: SPARK-10652
> URL: https://issues.apache.org/jira/browse/SPARK-10652
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>
> Job descriptions will help distinguish jobs of one batch from the other in 
> the Jobs and Stages pages in the Spark UI




[jira] [Assigned] (SPARK-10652) Set good job descriptions for streaming related jobs

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10652:


Assignee: Apache Spark  (was: Tathagata Das)

> Set good job descriptions for streaming related jobs
> 
>
> Key: SPARK-10652
> URL: https://issues.apache.org/jira/browse/SPARK-10652
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Tathagata Das
>Assignee: Apache Spark
>
> Job descriptions will help distinguish jobs of one batch from the other in 
> the Jobs and Stages pages in the Spark UI




[jira] [Commented] (SPARK-10652) Set good job descriptions for streaming related jobs

2015-09-16 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791357#comment-14791357
 ] 

Apache Spark commented on SPARK-10652:
--

User 'tdas' has created a pull request for this issue:
https://github.com/apache/spark/pull/8791

> Set good job descriptions for streaming related jobs
> 
>
> Key: SPARK-10652
> URL: https://issues.apache.org/jira/browse/SPARK-10652
> Project: Spark
>  Issue Type: Improvement
>  Components: Streaming
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Tathagata Das
>Assignee: Tathagata Das
>
> Job descriptions will help distinguish jobs of one batch from the other in 
> the Jobs and Stages pages in the Spark UI




[jira] [Commented] (SPARK-10635) pyspark - running on a different host

2015-09-16 Thread Josh Rosen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10635?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791389#comment-14791389
 ] 

Josh Rosen commented on SPARK-10635:


[~davies], do you think we should support this? This seems like a 
hard-to-support feature, so I'm inclined to say that this issue is "Won't Fix" 
as currently described.

> pyspark - running on a different host
> -
>
> Key: SPARK-10635
> URL: https://issues.apache.org/jira/browse/SPARK-10635
> Project: Spark
>  Issue Type: Improvement
>Reporter: Ben Duffield
>
> At various points we assume we only ever talk to a driver on the same host.
> e.g. 
> https://github.com/apache/spark/blob/v1.4.1/core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala#L615
> We use pyspark to connect to an existing driver (i.e. we do not let pyspark 
> launch the driver itself, but instead construct the SparkContext with the 
> gateway and jsc arguments).
> There are a few reasons for this, but essentially it's to allow more 
> flexibility when running in AWS.
> Before 1.3.1 we were able to monkeypatch around this:  
> {code}
> def _load_from_socket(port, serializer):
>     sock = socket.socket()
>     sock.settimeout(3)
>     try:
>         sock.connect((host, port))
>         rf = sock.makefile("rb", 65536)
>         for item in serializer.load_stream(rf):
>             yield item
>     finally:
>         sock.close()
> pyspark.rdd._load_from_socket = _load_from_socket
> {code}
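
For readers who want to try the idea outside of pyspark, the following is a
self-contained sketch of the same pattern with the host passed explicitly
instead of being assumed to be local. It does not use pyspark or its
serializers; the functions load_from_socket and serve_once are hypothetical
helpers, and the generator yields raw byte chunks rather than deserialized
items.

```python
import socket
import threading


def load_from_socket(host, port, chunk_size=65536, timeout=3):
    """Yield raw chunks read from host:port until the peer closes."""
    sock = socket.socket()
    sock.settimeout(timeout)
    try:
        sock.connect((host, port))
        with sock.makefile("rb", chunk_size) as rf:
            while True:
                chunk = rf.read(chunk_size)
                if not chunk:
                    break
                yield chunk
    finally:
        sock.close()


def serve_once(payload, host="127.0.0.1"):
    """Start a throwaway server that sends payload to one client."""
    server = socket.socket()
    server.bind((host, 0))  # port 0 lets the OS pick a free port
    server.listen(1)
    port = server.getsockname()[1]

    def run():
        conn, _ = server.accept()
        with conn:
            conn.sendall(payload)
        server.close()

    threading.Thread(target=run, daemon=True).start()
    return port
```

A quick local check is to call serve_once(b"abc") and join the chunks from
load_from_socket("127.0.0.1", port); pointing the host at another machine is
the part the issue asks Spark to support.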




[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available

2015-09-16 Thread Balagopal Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791434#comment-14791434
 ] 

Balagopal Nair commented on SPARK-10644:


No. These are independent jobs running under different SparkContexts

> Applications wait even if free executors are available
> --
>
> Key: SPARK-10644
> URL: https://issues.apache.org/jira/browse/SPARK-10644
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.5.0
> Environment: RHEL 6.5 64 bit
>Reporter: Balagopal Nair
>
> Number of workers: 21
> Number of executors: 63
> Steps to reproduce:
> 1. Run 4 jobs each with max cores set to 10
> 2. The first 3 jobs run with 10 each. (30 executors consumed so far)
> 3. The 4th job waits even though there are 33 idle executors.
> The reason is that a job will not get executors unless 
> the total number of EXECUTORS in use < the number of WORKERS
> If there are executors available, resources should be allocated to the 
> pending job.
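
The arithmetic in the report can be checked with a small model. This is not
Spark's scheduler; the function allocate below simply encodes the rule as the
reporter states it (a job is served only while the number of executors in use
is below the number of workers), which is an assumption taken from the report
rather than from the Spark source.

```python
def allocate(requests, total_executors, workers):
    """Grant executors to jobs in order under the reported rule.

    Returns the per-job grants and the number of executors left idle.
    """
    in_use = 0
    granted = []
    for want in requests:
        # Reported rule: serve a job only while executors in use < workers.
        if in_use < workers and in_use + want <= total_executors:
            granted.append(want)
            in_use += want
        else:
            granted.append(0)
    return granted, total_executors - in_use


def demo():
    # 4 jobs asking for 10 each, 63 executors, 21 workers (from the report).
    return allocate([10, 10, 10, 10], total_executors=63, workers=21)
```

Under this rule the first three jobs are served (0, 10, and 20 executors in
use are all below 21), while the fourth is refused because 30 is not below 21,
leaving 33 executors idle as described in the steps to reproduce.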




[jira] [Updated] (SPARK-10651) Flaky test: BroadcastSuite

2015-09-16 Thread Xiangrui Meng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-10651:
--
Attachment: BroadcastSuiteFailures.csv

> Flaky test: BroadcastSuite
> --
>
> Key: SPARK-10651
> URL: https://issues.apache.org/jira/browse/SPARK-10651
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 1.6.0
>Reporter: Xiangrui Meng
>Assignee: Shixiong Zhu
>Priority: Blocker
> Attachments: BroadcastSuiteFailures.csv
>
>
> Saw many failures recently in master build. See attached CSV for a full list.




[jira] [Updated] (SPARK-10651) Flaky test: BroadcastSuite

2015-09-16 Thread Xiangrui Meng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-10651:
--
Description: 
Saw many failures recently in master build. See attached CSV for a full list. 
Most of the error messages are: Can't find 2 executors before 1 
milliseconds elapsed
.



  was:
Saw many failures recently in master build. See attached CSV for a full list.




> Flaky test: BroadcastSuite
> --
>
> Key: SPARK-10651
> URL: https://issues.apache.org/jira/browse/SPARK-10651
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Affects Versions: 1.6.0
>Reporter: Xiangrui Meng
>Assignee: Shixiong Zhu
>Priority: Blocker
> Attachments: BroadcastSuiteFailures.csv
>
>
> Saw many failures recently in master build. See attached CSV for a full list. 
> Most of the error messages are: Can't find 2 executors before 1 
> milliseconds elapsed
> .




[jira] [Created] (SPARK-10651) Flaky test: BroadcastSuite

2015-09-16 Thread Xiangrui Meng (JIRA)

Xiangrui Meng created SPARK-10651:
-

 Summary: Flaky test: BroadcastSuite
 Key: SPARK-10651
 URL: https://issues.apache.org/jira/browse/SPARK-10651
 Project: Spark
  Issue Type: Bug
  Components: Spark Core, Tests
Affects Versions: 1.6.0
Reporter: Xiangrui Meng
Assignee: Shixiong Zhu
Priority: Blocker


Saw many failures recently in master build. See attached CSV for a full list.






[jira] [Updated] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat

2015-09-16 Thread Xiangrui Meng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-10058:
--
Component/s: Tests

> Flaky test: HeartbeatReceiverSuite: normal heartbeat
> 
>
> Key: SPARK-10058
> URL: https://issues.apache.org/jira/browse/SPARK-10058
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core, Tests
>Reporter: Davies Liu
>Assignee: Andrew Or
>Priority: Blocker
>  Labels: flaky-test
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/
> {code}
> Error Message
> 3 did not equal 2
> Stacktrace
> sbt.ForkMain$ForkError: 3 did not equal 2
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:294)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:284)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
>

[jira] [Assigned] (SPARK-10639) Need to convert UDAF's result from scala to sql type

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10639:


Assignee: Apache Spark

> Need to convert UDAF's result from scala to sql type
> 
>
> Key: SPARK-10639
> URL: https://issues.apache.org/jira/browse/SPARK-10639
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yin Huai
>Assignee: Apache Spark
>Priority: Blocker
>
> We are missing a conversion at 
> https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala#L427.




[jira] [Commented] (SPARK-10639) Need to convert UDAF's result from scala to sql type

2015-09-16 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791302#comment-14791302
 ] 

Apache Spark commented on SPARK-10639:
--

User 'yhuai' has created a pull request for this issue:
https://github.com/apache/spark/pull/8788

> Need to convert UDAF's result from scala to sql type
> 
>
> Key: SPARK-10639
> URL: https://issues.apache.org/jira/browse/SPARK-10639
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yin Huai
>Priority: Blocker
>
> We are missing a conversion at 
> https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala#L427.




[jira] [Assigned] (SPARK-10639) Need to convert UDAF's result from scala to sql type

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10639:


Assignee: (was: Apache Spark)

> Need to convert UDAF's result from scala to sql type
> 
>
> Key: SPARK-10639
> URL: https://issues.apache.org/jira/browse/SPARK-10639
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Yin Huai
>Priority: Blocker
>
> We are missing a conversion at 
> https://github.com/apache/spark/blob/branch-1.5/sql/core/src/main/scala/org/apache/spark/sql/execution/aggregate/udaf.scala#L427.




[jira] [Updated] (SPARK-10640) Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied

2015-09-16 Thread Josh Rosen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-10640:
---
Affects Version/s: 1.3.0
   1.4.0
 Target Version/s: 1.5.1
 Priority: Critical  (was: Major)

> Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied
> --
>
> Key: SPARK-10640
> URL: https://issues.apache.org/jira/browse/SPARK-10640
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.0, 1.4.0, 1.5.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
>
> I'm seeing an exception from the spark history server trying to read a 
> history file:
> scala.MatchError: TaskCommitDenied (of class java.lang.String)
> at 
> org.apache.spark.util.JsonProtocol$.taskEndReasonFromJson(JsonProtocol.scala:775)
> at 
> org.apache.spark.util.JsonProtocol$.taskEndFromJson(JsonProtocol.scala:531)
> at 
> org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:488)
> at 
> org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58)
> at 
> org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457)
> at 
> org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292)
> at 
> org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289)
> at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
> at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
> at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
> at 
> org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289)
> at 
> org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)




[jira] [Commented] (SPARK-10640) Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied

2015-09-16 Thread Josh Rosen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10640?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791416#comment-14791416
 ] 

Josh Rosen commented on SPARK-10640:


This is a 1.5.0 history server reading 1.5.0 logs? In principle we also have 
this bug when trying to read 1.5.x logs with a 1.4.x history server. I'm going 
to mark this as a 1.5.1 critical bug to make sure it gets fixed there. This 
probably affects 1.3.x and 1.4.x, too, so I'm going to update the affected 
versions.
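
The cross-version concern can be illustrated with a small sketch of a
tolerant parser. This is not Spark's JsonProtocol: the function name, the
"Reason" key, and the set of known reasons are assumptions made for the
example. The point is only that a reader which falls back to a generic value
for unrecognized reasons keeps working when it meets logs written by a newer
version, instead of failing on the first unknown string.

```python
import json

# Reasons this (hypothetical) reader version knows how to interpret.
KNOWN_REASONS = {"Success", "Resubmitted", "TaskKilled"}


def task_end_reason_from_json(text):
    """Parse a task end reason, tolerating reasons added by newer writers."""
    event = json.loads(text)
    reason = event.get("Reason")
    if reason in KNOWN_REASONS:
        return {"kind": reason}
    # Fallback instead of raising: keep the original string for display.
    return {"kind": "UnknownReason", "original": reason}
```

With this shape, an entry such as {"Reason": "TaskCommitDenied"} is reported
as an unknown reason rather than aborting the replay of the whole log file.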

> Spark history server fails to parse taskEndReasonFromJson TaskCommitDenied
> --
>
> Key: SPARK-10640
> URL: https://issues.apache.org/jira/browse/SPARK-10640
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.3.0, 1.4.0, 1.5.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>
> I'm seeing an exception from the spark history server trying to read a 
> history file:
> scala.MatchError: TaskCommitDenied (of class java.lang.String)
> at 
> org.apache.spark.util.JsonProtocol$.taskEndReasonFromJson(JsonProtocol.scala:775)
> at 
> org.apache.spark.util.JsonProtocol$.taskEndFromJson(JsonProtocol.scala:531)
> at 
> org.apache.spark.util.JsonProtocol$.sparkEventFromJson(JsonProtocol.scala:488)
> at 
> org.apache.spark.scheduler.ReplayListenerBus.replay(ReplayListenerBus.scala:58)
> at 
> org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$replay(FsHistoryProvider.scala:457)
> at 
> org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:292)
> at 
> org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$10.apply(FsHistoryProvider.scala:289)
> at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
> at 
> scala.collection.TraversableLike$$anonfun$flatMap$1.apply(TraversableLike.scala:251)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
> at 
> scala.collection.TraversableLike$class.flatMap(TraversableLike.scala:251)
> at scala.collection.AbstractTraversable.flatMap(Traversable.scala:105)
> at 
> org.apache.spark.deploy.history.FsHistoryProvider.org$apache$spark$deploy$history$FsHistoryProvider$$mergeApplicationListing(FsHistoryProvider.scala:289)
> at 
> org.apache.spark.deploy.history.FsHistoryProvider$$anonfun$checkForLogs$1$$anon$2.run(FsHistoryProvider.scala:210)
> at 
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)




[jira] [Created] (SPARK-10655) Enhance DB2 dialect to handle XML, and DECIMAL , and DECFLOAT

2015-09-16 Thread Suresh Thalamati (JIRA)

Suresh Thalamati created SPARK-10655:


 Summary: Enhance DB2 dialect to handle XML, and DECIMAL , and 
DECFLOAT
 Key: SPARK-10655
 URL: https://issues.apache.org/jira/browse/SPARK-10655
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 1.5.0
Reporter: Suresh Thalamati


The default type mapping does not work when reading from a DB2 table that 
contains XML or DECFLOAT columns, or when writing DECIMAL columns. 




[jira] [Commented] (SPARK-10655) Enhance DB2 dialect to handle XML, and DECIMAL , and DECFLOAT

2015-09-16 Thread Suresh Thalamati (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791440#comment-14791440
 ] 

Suresh Thalamati commented on SPARK-10655:
--

I am working on a pull request for this issue.

> Enhance DB2 dialect to handle XML, and DECIMAL , and DECFLOAT
> -
>
> Key: SPARK-10655
> URL: https://issues.apache.org/jira/browse/SPARK-10655
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 1.5.0
>Reporter: Suresh Thalamati
>
> The default type mapping does not work when reading from a DB2 table that 
> contains XML or DECFLOAT columns, or when writing DECIMAL columns.




[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available

2015-09-16 Thread Saisai Shao (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791439#comment-14791439
 ] 

Saisai Shao commented on SPARK-10644:
-

Which cluster manager are you using: standalone, Mesos, or YARN? As far as I 
know, there shouldn't be such a problem if enough resources are available.

> Applications wait even if free executors are available
> --
>
> Key: SPARK-10644
> URL: https://issues.apache.org/jira/browse/SPARK-10644
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.5.0
> Environment: RHEL 6.5 64 bit
>Reporter: Balagopal Nair
>
> Number of workers: 21
> Number of executors: 63
> Steps to reproduce:
> 1. Run 4 jobs each with max cores set to 10
> 2. The first 3 jobs run with 10 each. (30 executors consumed so far)
> 3. The 4th job waits even though there are 33 idle executors.
> The reason is that a job will not get executors unless 
> the total number of EXECUTORS in use < the number of WORKERS
> If there are executors available, resources should be allocated to the 
> pending job.
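
As a rough illustration of the numbers in this report, the sketch below plugs them into the condition the reporter describes. The object and method names are hypothetical, and the predicate only restates the report's claim; it is not taken from Spark's standalone scheduler code.

```scala
object ReportedSchedulingCondition {
  // Hypothetical restatement of the condition described in the report:
  // a job only receives executors while executors-in-use < number of workers.
  def wouldAllocate(executorsInUse: Int, workers: Int): Boolean =
    executorsInUse < workers

  def main(args: Array[String]): Unit = {
    val workers = 21
    val totalExecutors = 63
    val executorsPerJob = 10

    // The first three jobs each hold 10 executors.
    val inUse = 3 * executorsPerJob
    val idle = totalExecutors - inUse

    println(s"idle executors: $idle") // prints "idle executors: 33"
    // 30 < 21 is false, so under the reported condition the 4th job waits.
    println(s"4th job allocated: ${wouldAllocate(inUse, workers)}") // prints "4th job allocated: false"
  }
}
```

Under this reading, the 4th job is blocked even though 33 executors are idle, which matches the behaviour described in step 3.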




[jira] [Updated] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat

2015-09-16 Thread Xiangrui Meng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-10058:
--
Priority: Blocker  (was: Critical)

> Flaky test: HeartbeatReceiverSuite: normal heartbeat
> 
>
> Key: SPARK-10058
> URL: https://issues.apache.org/jira/browse/SPARK-10058
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Reporter: Davies Liu
>Assignee: Andrew Or
>Priority: Blocker
>  Labels: flaky-test
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/
> {code}
> Error Message
> 3 did not equal 2
> Stacktrace
> sbt.ForkMain$ForkError: 3 did not equal 2
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:294)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:284)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at

[jira] [Assigned] (SPARK-10650) Spark docs include test and other extra classes

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10650:


Assignee: Apache Spark  (was: Michael Armbrust)

> Spark docs include test and other extra classes
> ---
>
> Key: SPARK-10650
> URL: https://issues.apache.org/jira/browse/SPARK-10650
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Patrick Wendell
>Assignee: Apache Spark
>Priority: Critical
>
> In 1.5.0 there are some extra classes in the Spark docs - including a bunch 
> of test classes. We need to figure out what commit introduced those and fix 
> it. The obvious things like genJavadoc version have not changed.
> http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ 
> [before]
> http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ 
> [after]




[jira] [Commented] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat

2015-09-16 Thread Xiangrui Meng (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791286#comment-14791286
 ] 

Xiangrui Meng commented on SPARK-10058:
---

Changed the priority to Blocker since this test frequently fails master builds.

> Flaky test: HeartbeatReceiverSuite: normal heartbeat
> 
>
> Key: SPARK-10058
> URL: https://issues.apache.org/jira/browse/SPARK-10058
> Project: Spark
>  Issue Type: Test
>  Components: Spark Core
>Reporter: Davies Liu
>Assignee: Andrew Or
>Priority: Blocker
>  Labels: flaky-test
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/
> {code}
> Error Message
> 3 did not equal 2
> Stacktrace
> sbt.ForkMain$ForkError: 3 did not equal 2
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:294)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:284)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
>

[jira] [Assigned] (SPARK-10650) Spark docs include test and other extra classes

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10650:


Assignee: Michael Armbrust  (was: Apache Spark)

> Spark docs include test and other extra classes
> ---
>
> Key: SPARK-10650
> URL: https://issues.apache.org/jira/browse/SPARK-10650
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Patrick Wendell
>Assignee: Michael Armbrust
>Priority: Critical
>
> In 1.5.0 there are some extra classes in the Spark docs - including a bunch 
> of test classes. We need to figure out what commit introduced those and fix 
> it. The obvious things like genJavadoc version have not changed.
> http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ 
> [before]
> http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ 
> [after]




[jira] [Commented] (SPARK-10650) Spark docs include test and other extra classes

2015-09-16 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791285#comment-14791285
 ] 

Apache Spark commented on SPARK-10650:
--

User 'marmbrus' has created a pull request for this issue:
https://github.com/apache/spark/pull/8787

> Spark docs include test and other extra classes
> ---
>
> Key: SPARK-10650
> URL: https://issues.apache.org/jira/browse/SPARK-10650
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Patrick Wendell
>Assignee: Michael Armbrust
>Priority: Critical
>
> In 1.5.0 there are some extra classes in the Spark docs - including a bunch 
> of test classes. We need to figure out what commit introduced those and fix 
> it. The obvious things like genJavadoc version have not changed.
> http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ 
> [before]
> http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ 
> [after]




[jira] [Updated] (SPARK-10058) Flaky test: HeartbeatReceiverSuite: normal heartbeat

2015-09-16 Thread Xiangrui Meng (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10058?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiangrui Meng updated SPARK-10058:
--
Issue Type: Bug  (was: Test)

> Flaky test: HeartbeatReceiverSuite: normal heartbeat
> 
>
> Key: SPARK-10058
> URL: https://issues.apache.org/jira/browse/SPARK-10058
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Tests
>Reporter: Davies Liu
>Assignee: Andrew Or
>Priority: Blocker
>  Labels: flaky-test
>
> https://amplab.cs.berkeley.edu/jenkins/view/Spark-QA-Test/job/Spark-1.5-SBT/116/AMPLAB_JENKINS_BUILD_PROFILE=hadoop2.2,label=spark-test/testReport/junit/org.apache.spark/HeartbeatReceiverSuite/normal_heartbeat/
> {code}
> Error Message
> 3 did not equal 2
> Stacktrace
> sbt.ForkMain$ForkError: 3 did not equal 2
>   at 
> org.scalatest.Assertions$class.newAssertionFailedException(Assertions.scala:500)
>   at 
> org.scalatest.FunSuite.newAssertionFailedException(FunSuite.scala:1555)
>   at 
> org.scalatest.Assertions$AssertionsHelper.macroAssert(Assertions.scala:466)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply$mcV$sp(HeartbeatReceiverSuite.scala:104)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97)
>   at 
> org.apache.spark.HeartbeatReceiverSuite$$anonfun$2.apply(HeartbeatReceiverSuite.scala:97)
>   at 
> org.scalatest.Transformer$$anonfun$apply$1.apply$mcV$sp(Transformer.scala:22)
>   at org.scalatest.OutcomeOf$class.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
>   at org.scalatest.Transformer.apply(Transformer.scala:22)
>   at org.scalatest.Transformer.apply(Transformer.scala:20)
>   at org.scalatest.FunSuiteLike$$anon$1.apply(FunSuiteLike.scala:166)
>   at org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:42)
>   at 
> org.scalatest.FunSuiteLike$class.invokeWithFixture$1(FunSuiteLike.scala:163)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTest$1.apply(FunSuiteLike.scala:175)
>   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
>   at org.scalatest.FunSuiteLike$class.runTest(FunSuiteLike.scala:175)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterEach$$super$runTest(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.BeforeAndAfterEach$class.runTest(BeforeAndAfterEach.scala:255)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.runTest(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$runTests$1.apply(FunSuiteLike.scala:208)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:413)
>   at 
> org.scalatest.SuperEngine$$anonfun$traverseSubNodes$1$1.apply(Engine.scala:401)
>   at scala.collection.immutable.List.foreach(List.scala:318)
>   at org.scalatest.SuperEngine.traverseSubNodes$1(Engine.scala:401)
>   at 
> org.scalatest.SuperEngine.org$scalatest$SuperEngine$$runTestsInBranch(Engine.scala:396)
>   at org.scalatest.SuperEngine.runTestsImpl(Engine.scala:483)
>   at org.scalatest.FunSuiteLike$class.runTests(FunSuiteLike.scala:208)
>   at org.scalatest.FunSuite.runTests(FunSuite.scala:1555)
>   at org.scalatest.Suite$class.run(Suite.scala:1424)
>   at 
> org.scalatest.FunSuite.org$scalatest$FunSuiteLike$$super$run(FunSuite.scala:1555)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at 
> org.scalatest.FunSuiteLike$$anonfun$run$1.apply(FunSuiteLike.scala:212)
>   at org.scalatest.SuperEngine.runImpl(Engine.scala:545)
>   at org.scalatest.FunSuiteLike$class.run(FunSuiteLike.scala:212)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.org$scalatest$BeforeAndAfterAll$$super$run(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.BeforeAndAfterAll$class.liftedTree1$1(BeforeAndAfterAll.scala:257)
>   at 
> org.scalatest.BeforeAndAfterAll$class.run(BeforeAndAfterAll.scala:256)
>   at 
> org.apache.spark.HeartbeatReceiverSuite.run(HeartbeatReceiverSuite.scala:41)
>   at 
> org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:462)
>   at 
> org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:671)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:294)
>   at sbt.ForkMain$Run$2.call(ForkMain.java:284)
>   at java.util.concurrent.FutureTask.run(FutureTask.java:262)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at

[jira] [Commented] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix

2015-09-16 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791425#comment-14791425
 ] 

Apache Spark commented on SPARK-10654:
--

User 'rezazadeh' has created a pull request for this issue:
https://github.com/apache/spark/pull/8792

> Add columnSimilarities to IndexedRowMatrix
> --
>
> Key: SPARK-10654
> URL: https://issues.apache.org/jira/browse/SPARK-10654
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Reza Zadeh
>
> Add columnSimilarities to IndexedRowMatrix.
> Adding rowSimilarities to IndexedRowMatrix is tracked in a separate JIRA, 
> SPARK-4823.




[jira] [Assigned] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10654:


Assignee: Apache Spark

> Add columnSimilarities to IndexedRowMatrix
> --
>
> Key: SPARK-10654
> URL: https://issues.apache.org/jira/browse/SPARK-10654
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Reza Zadeh
>Assignee: Apache Spark
>
> Add columnSimilarities to IndexedRowMatrix.
> Adding rowSimilarities to IndexedRowMatrix is tracked in a separate JIRA, 
> SPARK-4823.




[jira] [Commented] (SPARK-10644) Applications wait even if free executors are available

2015-09-16 Thread Balagopal Nair (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10644?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791443#comment-14791443
 ] 

Balagopal Nair commented on SPARK-10644:


Standalone cluster manager. I've verified this behaviour again now. 

> Applications wait even if free executors are available
> --
>
> Key: SPARK-10644
> URL: https://issues.apache.org/jira/browse/SPARK-10644
> Project: Spark
>  Issue Type: Bug
>  Components: Scheduler
>Affects Versions: 1.5.0
> Environment: RHEL 6.5 64 bit
>Reporter: Balagopal Nair
>
> Number of workers: 21
> Number of executors: 63
> Steps to reproduce:
> 1. Run 4 jobs each with max cores set to 10
> 2. The first 3 jobs run with 10 each. (30 executors consumed so far)
> 3. The 4th job waits even though there are 33 idle executors.
> The reason is that a job will not get executors unless 
> the total number of EXECUTORS in use < the number of WORKERS
> If there are executors available, resources should be allocated to the 
> pending job.




[jira] [Updated] (SPARK-10647) Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be documented

2015-09-16 Thread Josh Rosen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-10647:
---
Issue Type: Bug  (was: Improvement)

> Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be 
> documented
> ---
>
> Key: SPARK-10647
> URL: https://issues.apache.org/jira/browse/SPARK-10647
> Project: Spark
>  Issue Type: Bug
>  Components: Mesos
>Affects Versions: 1.4.1, 1.5.0
>Reporter: Alan Braithwaite
>Assignee: Timothy Chen
>Priority: Minor
>
> This property doesn't match up with the other properties surrounding it, 
> namely:
> spark.mesos.deploy.zookeeper.url
> and
> spark.mesos.deploy.recoveryMode
> Since it's also a property specific to mesos, it makes sense to be under that 
> hierarchy as well.




[jira] [Updated] (SPARK-10647) Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be documented

2015-09-16 Thread Josh Rosen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-10647:
---
Summary: Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs 
should be documented  (was: Rename property spark.deploy.zookeeper.dir to 
spark.mesos.deploy.zookeeper.dir)

> Mesos HA mode misuses spark.deploy.zookeeper.dir property; configs should be 
> documented
> ---
>
> Key: SPARK-10647
> URL: https://issues.apache.org/jira/browse/SPARK-10647
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Alan Braithwaite
>Assignee: Timothy Chen
>Priority: Minor
>
> This property doesn't match up with the other properties surrounding it, 
> namely:
> spark.mesos.deploy.zookeeper.url
> and
> spark.mesos.deploy.recoveryMode
> Since it's also a property specific to mesos, it makes sense to be under that 
> hierarchy as well.




[jira] [Updated] (SPARK-10647) Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir

2015-09-16 Thread Josh Rosen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-10647:
---
Component/s: Mesos

> Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir
> --
>
> Key: SPARK-10647
> URL: https://issues.apache.org/jira/browse/SPARK-10647
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Alan Braithwaite
>Priority: Minor
>
> This property doesn't match up with the other properties surrounding it, 
> namely:
> spark.mesos.deploy.zookeeper.url
> and
> spark.mesos.deploy.recoveryMode
> Since it's also a property specific to mesos, it makes sense to be under that 
> hierarchy as well.




[jira] [Commented] (SPARK-10647) Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir

2015-09-16 Thread Josh Rosen (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791387#comment-14791387
 ] 

Josh Rosen commented on SPARK-10647:


The spark.deploy.zookeeper.* properties are used by the standalone mode's HA 
recovery features 
(https://spark.apache.org/docs/latest/spark-standalone.html#high-availability). 
I think the correct fix here is to update the Mesos code to use 
spark.deploy.mesos.zookeeper.dir 
(https://github.com/apache/spark/pull/5144/files#diff-3c5e5516915ada1d89f1259de069R97).
 We should also update the Mesos documentation to mention these configurations, 
since they don't appear to be documented anywhere.


[~tnachen], I'm going to assign the doc updates and bugfixes to you.

> Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir
> --
>
> Key: SPARK-10647
> URL: https://issues.apache.org/jira/browse/SPARK-10647
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Alan Braithwaite
>Priority: Minor
>
> This property doesn't match up with the other properties surrounding it, 
> namely:
> spark.mesos.deploy.zookeeper.url
> and
> spark.mesos.deploy.recoveryMode
> Since it's also a property specific to mesos, it makes sense to be under that 
> hierarchy as well.




[jira] [Updated] (SPARK-10647) Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir

2015-09-16 Thread Josh Rosen (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen updated SPARK-10647:
---
Assignee: Timothy Chen

> Rename property spark.deploy.zookeeper.dir to spark.mesos.deploy.zookeeper.dir
> --
>
> Key: SPARK-10647
> URL: https://issues.apache.org/jira/browse/SPARK-10647
> Project: Spark
>  Issue Type: Improvement
>  Components: Mesos
>Reporter: Alan Braithwaite
>Assignee: Timothy Chen
>Priority: Minor
>
> This property doesn't match up with the other properties surrounding it, 
> namely:
> spark.mesos.deploy.zookeeper.url
> and
> spark.mesos.deploy.recoveryMode
> Since it's also a property specific to mesos, it makes sense to be under that 
> hierarchy as well.




[jira] [Commented] (SPARK-10625) Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties

2015-09-16 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791420#comment-14791420
 ] 

Apache Spark commented on SPARK-10625:
--

User 'tribbloid' has created a pull request for this issue:
https://github.com/apache/spark/pull/8785

> Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds 
> unserializable objects into connection properties
> --
>
> Key: SPARK-10625
> URL: https://issues.apache.org/jira/browse/SPARK-10625
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
> Environment: Ubuntu 14.04
>Reporter: Peng Cheng
>  Labels: jdbc, spark, sparksql
>
> Some JDBC drivers (e.g. SAP HANA) try to optimize connection pooling by 
> adding new objects to the connection properties, which Spark then reuses 
> and ships to the workers. When some of these new objects are not 
> serializable, this triggers an org.apache.spark.SparkException: Task not 
> serializable. The following test code snippet demonstrates the problem by 
> using a modified H2 driver:
>   test("INSERT to JDBC Datasource with UnserializableH2Driver") {
>     object UnserializableH2Driver extends org.h2.Driver {
>       override def connect(url: String, info: Properties): Connection = {
>         val result = super.connect(url, info)
>         info.put("unserializableDriver", this)
>         result
>       }
>       override def getParentLogger: Logger = ???
>     }
>     import scala.collection.JavaConversions._
>     val oldDrivers =
>       DriverManager.getDrivers.filter(_.acceptsURL("jdbc:h2:")).toSeq
>     oldDrivers.foreach {
>       DriverManager.deregisterDriver
>     }
>     DriverManager.registerDriver(UnserializableH2Driver)
>     sql("INSERT INTO TABLE PEOPLE1 SELECT * FROM PEOPLE")
>     assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", properties).count)
>     assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1",
>       properties).collect()(0).length)
>     DriverManager.deregisterDriver(UnserializableH2Driver)
>     oldDrivers.foreach {
>       DriverManager.registerDriver
>     }
>   }
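
The failure mode described above can be reproduced without Spark or a real JDBC driver. The following is a minimal sketch, assuming plain Java serialization: it shows that a java.util.Properties object stops being serializable once a non-serializable value is put into it, and that copying only the String-valued entries restores serializability. The names ConnectionPropertiesSketch, DriverInternalState, isJavaSerializable and stringEntriesOnly are hypothetical, and this is only one possible mitigation; it is not necessarily what the linked pull request does.

```scala
import java.io.{ByteArrayOutputStream, ObjectOutputStream}
import java.util.Properties
import scala.util.Try

object ConnectionPropertiesSketch {
  // Hypothetical stand-in for a driver-internal object that does not
  // implement java.io.Serializable.
  final class DriverInternalState

  // Returns true if plain Java serialization of `value` succeeds.
  def isJavaSerializable(value: AnyRef): Boolean =
    Try(new ObjectOutputStream(new ByteArrayOutputStream()).writeObject(value)).isSuccess

  // Copies only the entries whose key and value are both Strings into a
  // fresh Properties object, leaving driver-added objects behind.
  def stringEntriesOnly(props: Properties): Properties = {
    val copy = new Properties()
    val names = props.stringPropertyNames().iterator()
    while (names.hasNext) {
      val key = names.next()
      copy.setProperty(key, props.getProperty(key))
    }
    copy
  }

  def main(args: Array[String]): Unit = {
    val props = new Properties()
    props.setProperty("user", "sa")
    // Mimics a driver that stores an internal object in the properties.
    props.put("unserializableDriver", new DriverInternalState)

    println(isJavaSerializable(props))                    // prints "false"
    println(isJavaSerializable(stringEntriesOnly(props))) // prints "true"
  }
}
```

Taking a String-only copy before the properties are captured in a task closure keeps the driver's extra objects on the driver side, at the cost of dropping any non-String settings the caller may have supplied deliberately.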




[jira] [Assigned] (SPARK-10625) Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10625:


Assignee: (was: Apache Spark)

> Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds 
> unserializable objects into connection properties
> --
>
> Key: SPARK-10625
> URL: https://issues.apache.org/jira/browse/SPARK-10625
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
> Environment: Ubuntu 14.04
>Reporter: Peng Cheng
>  Labels: jdbc, spark, sparksql
>
> Some JDBC drivers (e.g. SAP HANA) try to optimize connection pooling by 
> adding new objects to the connection properties, which Spark then reuses 
> and ships to workers. When any of these new objects is not serializable, 
> it triggers an org.apache.spark.SparkException: Task not serializable. 
> The following test snippet demonstrates the problem using a modified H2 
> driver:
>   test("INSERT to JDBC Datasource with UnserializableH2Driver") {
>     object UnserializableH2Driver extends org.h2.Driver {
>       override def connect(url: String, info: Properties): Connection = {
>         val result = super.connect(url, info)
>         // Mimic a driver that stores a non-serializable object in the properties.
>         info.put("unserializableDriver", this)
>         result
>       }
>       override def getParentLogger: Logger = ???
>     }
>     import scala.collection.JavaConversions._
>     val oldDrivers =
>       DriverManager.getDrivers.filter(_.acceptsURL("jdbc:h2:")).toSeq
>     oldDrivers.foreach {
>       DriverManager.deregisterDriver
>     }
>     DriverManager.registerDriver(UnserializableH2Driver)
>     sql("INSERT INTO TABLE PEOPLE1 SELECT * FROM PEOPLE")
>     assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", properties).count)
>     assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1",
>       properties).collect()(0).length)
>     DriverManager.deregisterDriver(UnserializableH2Driver)
>     oldDrivers.foreach {
>       DriverManager.registerDriver
>     }
>   }




[jira] [Created] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix

2015-09-16 Thread Reza Zadeh (JIRA)

Reza Zadeh created SPARK-10654:
--

 Summary: Add columnSimilarities to IndexedRowMatrix
 Key: SPARK-10654
 URL: https://issues.apache.org/jira/browse/SPARK-10654
 Project: Spark
  Issue Type: Improvement
  Components: MLlib
Reporter: Reza Zadeh


Add columnSimilarities to IndexedRowMatrix.

A separate JIRA, SPARK-4823, tracks adding rowSimilarities to 
IndexedRowMatrix.




[jira] [Assigned] (SPARK-10625) Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds unserializable objects into connection properties

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10625:


Assignee: Apache Spark

> Spark SQL JDBC read/write is unable to handle JDBC Drivers that adds 
> unserializable objects into connection properties
> --
>
> Key: SPARK-10625
> URL: https://issues.apache.org/jira/browse/SPARK-10625
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.4.1, 1.5.0
> Environment: Ubuntu 14.04
>Reporter: Peng Cheng
>Assignee: Apache Spark
>  Labels: jdbc, spark, sparksql
>
> Some JDBC drivers (e.g. SAP HANA) try to optimize connection pooling by 
> adding new objects to the connection properties, which Spark then reuses 
> and ships to workers. When any of these new objects is not serializable, 
> it triggers an org.apache.spark.SparkException: Task not serializable. 
> The following test snippet demonstrates the problem using a modified H2 
> driver:
>   test("INSERT to JDBC Datasource with UnserializableH2Driver") {
>     object UnserializableH2Driver extends org.h2.Driver {
>       override def connect(url: String, info: Properties): Connection = {
>         val result = super.connect(url, info)
>         // Mimic a driver that stores a non-serializable object in the properties.
>         info.put("unserializableDriver", this)
>         result
>       }
>       override def getParentLogger: Logger = ???
>     }
>     import scala.collection.JavaConversions._
>     val oldDrivers =
>       DriverManager.getDrivers.filter(_.acceptsURL("jdbc:h2:")).toSeq
>     oldDrivers.foreach {
>       DriverManager.deregisterDriver
>     }
>     DriverManager.registerDriver(UnserializableH2Driver)
>     sql("INSERT INTO TABLE PEOPLE1 SELECT * FROM PEOPLE")
>     assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1", properties).count)
>     assert(2 === sqlContext.read.jdbc(url1, "TEST.PEOPLE1",
>       properties).collect()(0).length)
>     DriverManager.deregisterDriver(UnserializableH2Driver)
>     oldDrivers.foreach {
>       DriverManager.registerDriver
>     }
>   }




[jira] [Updated] (SPARK-10650) Spark docs include test and other extra classes

2015-09-16 Thread Andrew Or (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Or updated SPARK-10650:
--
Target Version/s: 1.6.0, 1.5.1  (was: 1.5.1)

> Spark docs include test and other extra classes
> ---
>
> Key: SPARK-10650
> URL: https://issues.apache.org/jira/browse/SPARK-10650
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 1.5.0
>Reporter: Patrick Wendell
>Assignee: Michael Armbrust
>Priority: Critical
>
> In 1.5.0 there are some extra classes in the Spark docs - including a bunch 
> of test classes. We need to figure out what commit introduced those and fix 
> it. The obvious things like genJavadoc version have not changed.
> http://spark.apache.org/docs/1.4.1/api/java/org/apache/spark/streaming/ 
> [before]
> http://spark.apache.org/docs/1.5.0/api/java/org/apache/spark/streaming/ 
> [after]




[jira] [Created] (SPARK-10652) Set good job descriptions for streaming related jobs

2015-09-16 Thread Tathagata Das (JIRA)

Tathagata Das created SPARK-10652:
-

 Summary: Set good job descriptions for streaming related jobs
 Key: SPARK-10652
 URL: https://issues.apache.org/jira/browse/SPARK-10652
 Project: Spark
  Issue Type: Improvement
  Components: Streaming
Affects Versions: 1.5.0, 1.4.1
Reporter: Tathagata Das
Assignee: Tathagata Das


Job descriptions will help distinguish the jobs of one batch from those of 
another in the Jobs and Stages pages of the Spark UI.





[jira] [Assigned] (SPARK-10654) Add columnSimilarities to IndexedRowMatrix

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10654:


Assignee: (was: Apache Spark)

> Add columnSimilarities to IndexedRowMatrix
> --
>
> Key: SPARK-10654
> URL: https://issues.apache.org/jira/browse/SPARK-10654
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib
>Reporter: Reza Zadeh
>
> Add columnSimilarities to IndexedRowMatrix.
> A separate JIRA, SPARK-4823, tracks adding rowSimilarities to 
> IndexedRowMatrix.




[jira] [Created] (SPARK-10656) select(df(*)) fails when a column has special characters

2015-09-16 Thread Nick Pritchard (JIRA)

Nick Pritchard created SPARK-10656:
--

 Summary: select(df(*)) fails when a column has special characters
 Key: SPARK-10656
 URL: https://issues.apache.org/jira/browse/SPARK-10656
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 1.5.0
Reporter: Nick Pritchard


Best explained with this example:
{code}
val df = sqlContext.read.json(sqlContext.sparkContext.makeRDD(
  """{"a.b": "c", "d": "e" }""" :: Nil))
df.select("*").show() //successful
df.select(df("*")).show() //throws exception
df.withColumnRenamed("d", "f").show() //also fails, possibly related
{code}




[jira] [Commented] (SPARK-10657) Remove legacy SCP-based Jenkins log archiving code

2015-09-16 Thread Apache Spark (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14791549#comment-14791549
 ] 

Apache Spark commented on SPARK-10657:
--

User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/8793

> Remove legacy SCP-based Jenkins log archiving code
> --
>
> Key: SPARK-10657
> URL: https://issues.apache.org/jira/browse/SPARK-10657
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>
> As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to 
> use our custom SCP-based mechanism for archiving Jenkins logs on the master 
> machine; this has been superseded by the use of a Jenkins plugin which 
> archives the logs and provides public viewing of them.
> We should remove the legacy log syncing code, since this is a blocker to 
> disabling Worker -> Master SSH on Jenkins.




[jira] [Assigned] (SPARK-10657) Remove legacy SCP-based Jenkins log archiving code

2015-09-16 Thread Apache Spark (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-10657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-10657:


Assignee: Apache Spark  (was: Josh Rosen)

> Remove legacy SCP-based Jenkins log archiving code
> --
>
> Key: SPARK-10657
> URL: https://issues.apache.org/jira/browse/SPARK-10657
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Reporter: Josh Rosen
>Assignee: Apache Spark
>
> As of https://issues.apache.org/jira/browse/SPARK-7561, we no longer need to 
> use our custom SCP-based mechanism for archiving Jenkins logs on the master 
> machine; this has been superseded by the use of a Jenkins plugin which 
> archives the logs and provides public viewing of them.
> We should remove the legacy log syncing code, since this is a blocker to 
> disabling Worker -> Master SSH on Jenkins.



