[jira] [Resolved] (SPARK-3996) Shade Jetty in Spark deliverables

2015-02-01 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-3996.

Resolution: Fixed

I've merged a new patch so closing this for now.

> Shade Jetty in Spark deliverables
> -
>
> Key: SPARK-3996
> URL: https://issues.apache.org/jira/browse/SPARK-3996
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 1.0.2, 1.1.0
>Reporter: Mingyu Kim
>Assignee: Patrick Wendell
> Fix For: 1.3.0
>
>
> We'd like to use Spark in a Jetty 9 server, and it's causing a version 
> conflict. Given that Spark's dependency on Jetty is light, it'd be a good 
> idea to shade this dependency.
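
Spark's actual shading lives in the build itself, but as a rough illustration of 
the relocation idea in sbt-assembly terms (assuming a plugin version with 
shade-rule support; the target package name is illustrative, not necessarily the 
one Spark uses):

{code}
// Sketch only: relocate the bundled Jetty classes into a Spark-private package
// so a user's Jetty 9 on the classpath cannot collide with the Jetty that
// Spark itself depends on.
assemblyShadeRules in assembly := Seq(
  ShadeRule.rename("org.eclipse.jetty.**" -> "org.spark_project.jetty.@1").inAll
)
{code}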



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5197) Support external shuffle service in fine-grained mode on mesos cluster

2015-01-30 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5197?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5197:
---
Fix Version/s: (was: 1.3.0)

> Support external shuffle service in fine-grained mode on mesos cluster
> --
>
> Key: SPARK-5197
> URL: https://issues.apache.org/jira/browse/SPARK-5197
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, Mesos, Shuffle
>Reporter: Jongyoul Lee
>
> I think dynamic allocation is almost satisfied by Mesos' fine-grained mode, 
> which already offers resources dynamically and returns them automatically when 
> a task is finished. It doesn't, however, have a mechanism to support an 
> external shuffle service the way YARN does with its AuxiliaryService. Because 
> Mesos doesn't have an AuxiliaryService, we need to think of a different way to 
> do this.
> - Launching the shuffle service like a Spark job on the same cluster
> -- Pros
> --- Supports multi-tenant environments
> --- Almost the same approach as YARN
> -- Cons
> --- Must manage a long-running 'background' job (the service) while Mesos runs
> --- Must ensure every slave (or host) has one shuffle service running at all times
> - Launching jobs from within the shuffle service
> -- Pros
> --- Easy to implement
> --- Jobs don't need to consider whether a shuffle service exists or not
> -- Cons
> --- Multiple shuffle services would exist in a multi-tenant environment
> --- Shuffle service ports must be managed dynamically in a multi-user environment
> In my opinion, the first option is the better way to support an external 
> shuffle service. Please leave comments.
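
For context, a minimal sketch of the Spark settings that such an external shuffle 
service would unlock for dynamic allocation (the property names are standard 
Spark configuration keys; the application name and everything else here is 
illustrative):

{code}
// Sketch: once every Mesos slave hosts an external shuffle service, executors
// can be released and re-acquired without losing shuffle files.
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("shuffle-service-example")           // illustrative name
  .set("spark.shuffle.service.enabled", "true")    // fetch shuffle data from the external service
  .set("spark.dynamicAllocation.enabled", "true")  // let Spark grow and shrink the executor set
val sc = new SparkContext(conf)
{code}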



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds

2015-01-30 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1517:
---
Target Version/s: 1.4.0

> Publish nightly snapshots of documentation, maven artifacts, and binary builds
> --
>
> Key: SPARK-1517
> URL: https://issues.apache.org/jira/browse/SPARK-1517
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Project Infra
>        Reporter: Patrick Wendell
>Priority: Blocker
>
> Should be pretty easy to do with Jenkins. The only thing I can think of that 
> would be tricky is setting up credentials so that Jenkins can publish this 
> stuff somewhere on Apache infra.
> Ideally we don't want to have to put a private key on every Jenkins box 
> (since they are otherwise pretty stateless). One idea is to encrypt these 
> credentials with a passphrase and post them somewhere publicly visible. Then 
> the Jenkins build can download the credentials, provided we set the passphrase 
> in an environment variable in Jenkins. There may be simpler solutions as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-1517) Publish nightly snapshots of documentation, maven artifacts, and binary builds

2015-01-30 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-1517:
---
Target Version/s:   (was: 1.3.0)

> Publish nightly snapshots of documentation, maven artifacts, and binary builds
> --
>
> Key: SPARK-1517
> URL: https://issues.apache.org/jira/browse/SPARK-1517
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Project Infra
>        Reporter: Patrick Wendell
>Priority: Blocker
>
> Should be pretty easy to do with Jenkins. The only thing I can think of that 
> would be tricky is setting up credentials so that Jenkins can publish this 
> stuff somewhere on Apache infra.
> Ideally we don't want to have to put a private key on every Jenkins box 
> (since they are otherwise pretty stateless). One idea is to encrypt these 
> credentials with a passphrase and post them somewhere publicly visible. Then 
> the Jenkins build can download the credentials, provided we set the passphrase 
> in an environment variable in Jenkins. There may be simpler solutions as well.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5492) Thread statistics can break with older Hadoop versions

2015-01-29 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14298075#comment-14298075
 ] 

Patrick Wendell commented on SPARK-5492:


/cc [~sandyr]

> Thread statistics can break with older Hadoop versions
> --
>
> Key: SPARK-5492
> URL: https://issues.apache.org/jira/browse/SPARK-5492
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>    Reporter: Patrick Wendell
>Priority: Blocker
>
> {code}
>  java.lang.ClassNotFoundException: 
> org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:191)
> at 
> org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatisticsMethod(SparkHadoopUtil.scala:180)
> at 
> org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback(SparkHadoopUtil.scala:139)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:120)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:118)
> at scala.Option.orElse(Option.scala:257)
> {code}
> I think the issue is we need to catch ClassNotFoundException here:
> https://github.com/apache/spark/blob/b1b35ca2e440df40b253bf967bb93705d355c1c0/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L144
> However, I'm really confused how this didn't fail our unit tests, since we 
> explicitly tried to test this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5492) Thread statistics can break with older Hadoop versions

2015-01-29 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5492:
---
Priority: Blocker  (was: Major)

> Thread statistics can break with older Hadoop versions
> --
>
> Key: SPARK-5492
> URL: https://issues.apache.org/jira/browse/SPARK-5492
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>        Reporter: Patrick Wendell
>Priority: Blocker
>
> {code}
>  java.lang.ClassNotFoundException: 
> org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:191)
> at 
> org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatisticsMethod(SparkHadoopUtil.scala:180)
> at 
> org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback(SparkHadoopUtil.scala:139)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:120)
> at 
> org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:118)
> at scala.Option.orElse(Option.scala:257)
> {code}
> I think the issue is we need to catch ClassNotFoundException here:
> https://github.com/apache/spark/blob/b1b35ca2e440df40b253bf967bb93705d355c1c0/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L144
> However, I'm really confused how this didn't fail our unit tests, since we 
> explicitly tried to test this.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-5492) Thread statistics can break with older Hadoop versions

2015-01-29 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-5492:
--

 Summary: Thread statistics can break with older Hadoop versions
 Key: SPARK-5492
 URL: https://issues.apache.org/jira/browse/SPARK-5492
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Reporter: Patrick Wendell


{code}
 java.lang.ClassNotFoundException: 
org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData
at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:191)
at 
org.apache.spark.deploy.SparkHadoopUtil.getFileSystemThreadStatisticsMethod(SparkHadoopUtil.scala:180)
at 
org.apache.spark.deploy.SparkHadoopUtil.getFSBytesReadOnThreadCallback(SparkHadoopUtil.scala:139)
at 
org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:120)
at 
org.apache.spark.rdd.NewHadoopRDD$$anon$1$$anonfun$2.apply(NewHadoopRDD.scala:118)
at scala.Option.orElse(Option.scala:257)
{code}

I think the issue is we need to catch ClassNotFoundException here:
https://github.com/apache/spark/blob/b1b35ca2e440df40b253bf967bb93705d355c1c0/core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala#L144

However, I'm really confused how this didn't fail our unit tests, since we 
explicitly tried to test this.
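
A minimal sketch of the kind of guard being suggested (the object and method 
names are illustrative, not the actual Spark patch):

{code}
import java.lang.reflect.Method

// Sketch only: if the Hadoop version on the classpath predates
// FileSystem$Statistics$StatisticsData, return None instead of letting the
// ClassNotFoundException from the reflective lookup propagate.
object ThreadStatsCompat {
  def getFileSystemThreadStatisticsMethod(): Option[Method] =
    try {
      val statisticsDataClass =
        Class.forName("org.apache.hadoop.fs.FileSystem$Statistics$StatisticsData")
      Some(statisticsDataClass.getDeclaredMethod("getBytesRead"))
    } catch {
      case _: ClassNotFoundException => None // older Hadoop: no thread-level byte counters
    }
}
{code}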



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-3996) Shade Jetty in Spark deliverables

2015-01-29 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reopened SPARK-3996:


This was causing compiler failures in the master build, so I reverted it. I 
think it's the same issue we had with the Guava patch, so I just need to go and 
add explicit dependencies.

> Shade Jetty in Spark deliverables
> -
>
> Key: SPARK-3996
> URL: https://issues.apache.org/jira/browse/SPARK-3996
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 1.0.2, 1.1.0
>Reporter: Mingyu Kim
>Assignee: Patrick Wendell
> Fix For: 1.3.0
>
>
> We'd like to use Spark in a Jetty 9 server, and it's causing a version 
> conflict. Given that Spark's dependency on Jetty is light, it'd be a good 
> idea to shade this dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3778) newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn

2015-01-29 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3778:
---
Priority: Critical  (was: Major)

> newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn
> -
>
> Key: SPARK-3778
> URL: https://issues.apache.org/jira/browse/SPARK-3778
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>Priority: Critical
>
> The newAPIHadoopRDD routine doesn't properly add the credentials to the conf 
> needed to access secure HDFS.
> Note that newAPIHadoopFile does handle this, because 
> org.apache.hadoop.mapreduce.Job automatically adds the credentials for you.
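
As a user-side stopgap, a hedged sketch of mirroring the newAPIHadoopFile path by 
routing the Configuration through a Job first (this assumes an existing 
SparkContext `sc` on a Kerberized cluster, and the HDFS path is hypothetical; it 
is not the eventual fix inside Spark):

{code}
// Sketch: per the description above, mapreduce.Job is what attaches the
// caller's credentials for newAPIHadoopFile, so construct one explicitly and
// pass its Configuration to newAPIHadoopRDD.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.hadoop.io.{LongWritable, Text}
import org.apache.hadoop.mapreduce.Job
import org.apache.hadoop.mapreduce.lib.input.{FileInputFormat, TextInputFormat}

val job = new Job(new Configuration())  // attaches the current user's credentials
FileInputFormat.addInputPath(job, new Path("hdfs:///secure/path"))  // hypothetical path
val rdd = sc.newAPIHadoopRDD(
  job.getConfiguration,
  classOf[TextInputFormat],
  classOf[LongWritable],
  classOf[Text])
{code}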



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-3778) newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn

2015-01-29 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3778?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-3778:
---
Target Version/s: 1.3.0  (was: 1.1.1, 1.2.0)

> newAPIHadoopRDD doesn't properly pass credentials for secure hdfs on yarn
> -
>
> Key: SPARK-3778
> URL: https://issues.apache.org/jira/browse/SPARK-3778
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Thomas Graves
>Assignee: Thomas Graves
>
> The newAPIHadoopRDD routine doesn't properly add the credentials to the conf 
> to be able to access secure hdfs.
> Note that newAPIHadoopFile does handle these because the 
> org.apache.hadoop.mapreduce.Job automatically adds it for you.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-3996) Shade Jetty in Spark deliverables

2015-01-29 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-3996.

   Resolution: Fixed
Fix Version/s: 1.3.0
 Assignee: Patrick Wendell  (was: Matthew Cheah)

Okay we merged this into master, let's see how it goes.

> Shade Jetty in Spark deliverables
> -
>
> Key: SPARK-3996
> URL: https://issues.apache.org/jira/browse/SPARK-3996
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 1.0.2, 1.1.0
>Reporter: Mingyu Kim
>Assignee: Patrick Wendell
> Fix For: 1.3.0
>
>
> We'd like to use Spark in a Jetty 9 server, and it's causing a version 
> conflict. Given that Spark's dependency on Jetty is light, it'd be a good 
> idea to shade this dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5466) Build Error caused by Guava shading in Spark

2015-01-29 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5466.

   Resolution: Fixed
Fix Version/s: 1.3.0
 Assignee: Marcelo Vanzin

Thanks [~vanzin] for quickly fixing this!

> Build Error caused by Guava shading in Spark
> 
>
> Key: SPARK-5466
> URL: https://issues.apache.org/jira/browse/SPARK-5466
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.3.0
>Reporter: Jian Zhou
>Assignee: Marcelo Vanzin
>Priority: Blocker
> Fix For: 1.3.0
>
>
> Guava is shaded inside spark-core itself.
> https://github.com/apache/spark/commit/37a5e272f898e946c09c2e7de5d1bda6f27a8f39
> This causes build errors in multiple components, including GraphX/MLlib/SQL, 
> when the com.google.common package on the classpath is incompatible with the 
> version used when compiling Utils.class:
> [error] bad symbolic reference. A signature in Utils.class refers to term util
> [error] in package com.google.common which is not available.
> [error] It may be completely missing from the current classpath, or the 
> version on
> [error] the classpath might be incompatible with the version used when 
> compiling Utils.class.
> [error] 
> [error]  while compiling: 
> /spark/graphx/src/main/scala/org/apache/spark/graphx/util/BytecodeUtils.scala
> [error] during phase: erasure
> [error]  library version: version 2.10.4
> [error] compiler version: version 2.10.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5466) Build Error caused by Guava shading in Spark

2015-01-29 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296573#comment-14296573
 ] 

Patrick Wendell commented on SPARK-5466:


Okay Maven is reproducing this now even without zinc: 
https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-Maven-pre-YARN/1393/hadoop.version=1.0.4,label=centos/console

It appears that it might only be happening with certain parameterizations of 
the build:

{code}
[ERROR] bad symbolic reference. A signature in Utils.class refers to term util
in package com.google.common which is not available.
It may be completely missing from the current classpath, or the version on
the classpath might be incompatible with the version used when compiling 
Utils.class.
[ERROR] 
 while compiling: 
/home/jenkins/workspace/Spark-Master-Maven-pre-YARN/hadoop.version/1.0.4/label/centos/graphx/src/main/scala/org/apache/spark/graphx/util/BytecodeUtils.scala
{code}

> Build Error caused by Guava shading in Spark
> 
>
> Key: SPARK-5466
> URL: https://issues.apache.org/jira/browse/SPARK-5466
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.3.0
>Reporter: Jian Zhou
>Priority: Blocker
>
> Guava is shaded inside spark-core itself.
> https://github.com/apache/spark/commit/37a5e272f898e946c09c2e7de5d1bda6f27a8f39
> This causes build errors in multiple components, including GraphX/MLlib/SQL, 
> when the com.google.common package on the classpath is incompatible with the 
> version used when compiling Utils.class:
> [error] bad symbolic reference. A signature in Utils.class refers to term util
> [error] in package com.google.common which is not available.
> [error] It may be completely missing from the current classpath, or the 
> version on
> [error] the classpath might be incompatible with the version used when 
> compiling Utils.class.
> [error] 
> [error]  while compiling: 
> /spark/graphx/src/main/scala/org/apache/spark/graphx/util/BytecodeUtils.scala
> [error] during phase: erasure
> [error]  library version: version 2.10.4
> [error] compiler version: version 2.10.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4923) Add Developer API to REPL to allow re-publishing the REPL jar

2015-01-29 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4923:
---
Fix Version/s: 1.3.0

> Add Developer API to REPL to allow re-publishing the REPL jar
> -
>
> Key: SPARK-4923
> URL: https://issues.apache.org/jira/browse/SPARK-4923
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Spark Shell
>Affects Versions: 1.2.0
>Reporter: Peng Cheng
>Assignee: Chip Senkbeil
>Priority: Critical
>  Labels: shell
> Fix For: 1.3.0
>
> Attachments: 
> SPARK-4923__Maven_build_should_keep_publishing_spark-repl.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Spark-repl installation and deployment have been discontinued (see 
> SPARK-3452), but it is in the dependency list of a few projects that extend 
> its initialization process.
> Please remove the 'skip' setting in spark-repl and make it an 'official' API 
> to encourage more platforms to integrate with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4114) Use stable Hive API (if one exists) for communication with Metastore

2015-01-29 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296562#comment-14296562
 ] 

Patrick Wendell commented on SPARK-4114:


Dropping target version 1.3 because we decided this was too high a cost. 
Basically, there is no good way to integrate with the Hive metastore service 
compatibly across versions.

> Use stable Hive API (if one exists) for communication with Metastore
> 
>
> Key: SPARK-4114
> URL: https://issues.apache.org/jira/browse/SPARK-4114
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>    Reporter: Patrick Wendell
>Priority: Blocker
>
> If one exists, we should use a stable API for our communication with the Hive 
> metastore. Specifically, we don't want to have to support compiling against 
> multiple versions of the Hive library to support users with different 
> versions of the Hive metastore.
> I think this is what the HCatalog APIs are intended for, but I don't know enough 
> about Hive and HCatalog to be sure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4628) Put external projects and examples behind a build flag

2015-01-29 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4628:
---
Priority: Major  (was: Blocker)

> Put external projects and examples behind a build flag
> --
>
> Key: SPARK-4628
> URL: https://issues.apache.org/jira/browse/SPARK-4628
> Project: Spark
>  Issue Type: Improvement
>    Reporter: Patrick Wendell
>
> This is something we talked about doing for convenience, but I'm escalating 
> it after realizing today that some of our external projects depend on 
> code that is not in Maven Central. That is, if one of these dependencies is 
> taken down (as happened recently with MQTT), all Spark builds will fail.
> The proposal here is simple: have a profile, -Pexternal-projects, that enables 
> these. This can follow the exact pattern of -Pkinesis-asl, which was disabled 
> by default due to a license issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4114) Use stable Hive API (if one exists) for communication with Metastore

2015-01-29 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4114:
---
Target Version/s:   (was: 1.3.0)

> Use stable Hive API (if one exists) for communication with Metastore
> 
>
> Key: SPARK-4114
> URL: https://issues.apache.org/jira/browse/SPARK-4114
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>        Reporter: Patrick Wendell
>Priority: Blocker
>
> If one exists, we should use a stable API for our communication with the Hive 
> metastore. Specifically, we don't want to have to support compiling against 
> multiple versions of the Hive library to support users with different 
> versions of the Hive metastore.
> I think this is what the HCatalog APIs are intended for, but I don't know enough 
> about Hive and HCatalog to be sure.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4628) Put external projects and examples behind a build flag

2015-01-29 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4628:
---
Target Version/s:   (was: 1.3.0)

> Put external projects and examples behind a build flag
> --
>
> Key: SPARK-4628
> URL: https://issues.apache.org/jira/browse/SPARK-4628
> Project: Spark
>  Issue Type: Improvement
>    Reporter: Patrick Wendell
>Priority: Blocker
>
> This is something we talked about doing for convenience, but I'm escalating 
> it after realizing today that some of our external projects depend on 
> code that is not in Maven Central. That is, if one of these dependencies is 
> taken down (as happened recently with MQTT), all Spark builds will fail.
> The proposal here is simple: have a profile, -Pexternal-projects, that enables 
> these. This can follow the exact pattern of -Pkinesis-asl, which was disabled 
> by default due to a license issue.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2476) Have sbt-assembly include runtime dependencies in jar

2015-01-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2476.

Resolution: Not a Problem

[~srowen] Nope, I think we found a workaround.

> Have sbt-assembly include runtime dependencies in jar
> -
>
> Key: SPARK-2476
> URL: https://issues.apache.org/jira/browse/SPARK-2476
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>        Reporter: Patrick Wendell
>Assignee: Prashant Sharma
>Priority: Minor
>
> If possible, we should try to contribute the ability to include 
> runtime-scoped dependencies in the assembly jar created with sbt-assembly.
> Currently it only reads compile-scoped dependencies:
> https://github.com/sbt/sbt-assembly/blob/master/src/main/scala/sbtassembly/Plugin.scala#L495
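
For reference, a hedged sketch of one way to do this from a build definition, 
assuming the plugin lets you override `fullClasspath in assembly` (an assumption 
about sbt-assembly, not necessarily the workaround mentioned in the resolution 
above):

{code}
// build.sbt sketch: point the assembly task at the Runtime classpath so that
// runtime-scoped dependencies also end up in the fat jar.
fullClasspath in assembly := (fullClasspath in Runtime).value
{code}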



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-2487) Follow up from SBT build refactor (i.e. SPARK-1776)

2015-01-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-2487.

Resolution: Fixed

> Follow up from SBT build refactor (i.e. SPARK-1776)
> ---
>
> Key: SPARK-2487
> URL: https://issues.apache.org/jira/browse/SPARK-2487
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>        Reporter: Patrick Wendell
>
> This is to track follow-up issues relating to SPARK-1776, which was a major 
> refactoring of the SBT build in Spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5466) Build Error caused by Guava shading in Spark

2015-01-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5466:
---
Component/s: Build

> Build Error caused by Guava shading in Spark
> 
>
> Key: SPARK-5466
> URL: https://issues.apache.org/jira/browse/SPARK-5466
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.3.0
>Reporter: Jian Zhou
>Priority: Blocker
>
> Guava is shaded inside spark-core itself.
> https://github.com/apache/spark/commit/37a5e272f898e946c09c2e7de5d1bda6f27a8f39
> This causes build errors in multiple components, including GraphX/MLlib/SQL, 
> when the com.google.common package on the classpath is incompatible with the 
> version used when compiling Utils.class:
> [error] bad symbolic reference. A signature in Utils.class refers to term util
> [error] in package com.google.common which is not available.
> [error] It may be completely missing from the current classpath, or the 
> version on
> [error] the classpath might be incompatible with the version used when 
> compiling Utils.class.
> [error] 
> [error]  while compiling: 
> /spark/graphx/src/main/scala/org/apache/spark/graphx/util/BytecodeUtils.scala
> [error] during phase: erasure
> [error]  library version: version 2.10.4
> [error] compiler version: version 2.10.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5466) Build Error caused by Guava shading in Spark

2015-01-28 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296483#comment-14296483
 ] 

Patrick Wendell commented on SPARK-5466:


Also - [~srowen] can you reproduce this if you do not use Zinc?

> Build Error caused by Guava shading in Spark
> 
>
> Key: SPARK-5466
> URL: https://issues.apache.org/jira/browse/SPARK-5466
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.3.0
>Reporter: Jian Zhou
>Priority: Blocker
>
> Guava is shaded inside spark-core itself.
> https://github.com/apache/spark/commit/37a5e272f898e946c09c2e7de5d1bda6f27a8f39
> This causes build errors in multiple components, including GraphX/MLlib/SQL, 
> when the com.google.common package on the classpath is incompatible with the 
> version used when compiling Utils.class:
> [error] bad symbolic reference. A signature in Utils.class refers to term util
> [error] in package com.google.common which is not available.
> [error] It may be completely missing from the current classpath, or the 
> version on
> [error] the classpath might be incompatible with the version used when 
> compiling Utils.class.
> [error] 
> [error]  while compiling: 
> /spark/graphx/src/main/scala/org/apache/spark/graphx/util/BytecodeUtils.scala
> [error] during phase: erasure
> [error]  library version: version 2.10.4
> [error] compiler version: version 2.10.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5466) Build Error caused by Guava shading in Spark

2015-01-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5466:
---
Priority: Blocker  (was: Major)

> Build Error caused by Guava shading in Spark
> 
>
> Key: SPARK-5466
> URL: https://issues.apache.org/jira/browse/SPARK-5466
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Jian Zhou
>Priority: Blocker
>
> Guava is shaded inside spark-core itself.
> https://github.com/apache/spark/commit/37a5e272f898e946c09c2e7de5d1bda6f27a8f39
> This causes build errors in multiple components, including GraphX/MLlib/SQL, 
> when the com.google.common package on the classpath is incompatible with the 
> version used when compiling Utils.class:
> [error] bad symbolic reference. A signature in Utils.class refers to term util
> [error] in package com.google.common which is not available.
> [error] It may be completely missing from the current classpath, or the 
> version on
> [error] the classpath might be incompatible with the version used when 
> compiling Utils.class.
> [error] 
> [error]  while compiling: 
> /spark/graphx/src/main/scala/org/apache/spark/graphx/util/BytecodeUtils.scala
> [error] during phase: erasure
> [error]  library version: version 2.10.4
> [error] compiler version: version 2.10.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5466) Build Error caused by Guava shading in Spark

2015-01-28 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296482#comment-14296482
 ] 

Patrick Wendell commented on SPARK-5466:


I sent [~vanzin] an e-mail today about this. Guess I'm not the only one seeing 
it. I was using Zinc on OS X... are you guys using that too? I set up a Zinc 
Maven build on Jenkins and it worked just fine.

> Build Error caused by Guava shading in Spark
> 
>
> Key: SPARK-5466
> URL: https://issues.apache.org/jira/browse/SPARK-5466
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.3.0
>Reporter: Jian Zhou
>
> Guava is shaded inside spark-core itself.
> https://github.com/apache/spark/commit/37a5e272f898e946c09c2e7de5d1bda6f27a8f39
> This causes build errors in multiple components, including GraphX/MLlib/SQL, 
> when the com.google.common package on the classpath is incompatible with the 
> version used when compiling Utils.class:
> [error] bad symbolic reference. A signature in Utils.class refers to term util
> [error] in package com.google.common which is not available.
> [error] It may be completely missing from the current classpath, or the 
> version on
> [error] the classpath might be incompatible with the version used when 
> compiling Utils.class.
> [error] 
> [error]  while compiling: 
> /spark/graphx/src/main/scala/org/apache/spark/graphx/util/BytecodeUtils.scala
> [error] during phase: erasure
> [error]  library version: version 2.10.4
> [error] compiler version: version 2.10.4



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-4049) Storage web UI "fraction cached" shows as > 100%

2015-01-28 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296479#comment-14296479
 ] 

Patrick Wendell edited comment on SPARK-4049 at 1/29/15 6:58 AM:
-

[~skrasser] Yes - I agree that behavior is just confusing. One idea would be to 
have a "bit map" so to speak where you can't be 100% unless you have every 
partition cached. And you can never go over 100%.


was (Author: pwendell):
[~skrasser] Yes - I agree that behavior is just confusing. One idea would be to 
have a "bit map" so to speak where you can be 100% unless you have every 
partition cached. And you can never go over 100%.

> Storage web UI "fraction cached" shows as > 100%
> 
>
> Key: SPARK-4049
> URL: https://issues.apache.org/jira/browse/SPARK-4049
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.2.0
>Reporter: Josh Rosen
>Priority: Minor
>
> In the Storage tab of the Spark Web UI, I saw a case where the "Fraction 
> Cached" was greater than 100%:
> !http://i.imgur.com/Gm2hEeL.png!
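
A small sketch of the "bit map" idea from the comment above (the helper and its 
names are hypothetical): track cached partition indices as a set so that 
replicated or double-counted blocks can never push the fraction past 100%, and 
100% is only reached when every partition is cached.

{code}
// Hypothetical helper: a set of cached partition indices bounds the fraction
// at 1.0 and ignores indices outside the RDD's partition range.
def fractionCached(cachedPartitionIndices: Set[Int], numPartitions: Int): Double = {
  if (numPartitions == 0) 0.0
  else cachedPartitionIndices.count(i => i >= 0 && i < numPartitions).toDouble / numPartitions
}
{code}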



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4049) Storage web UI "fraction cached" shows as > 100%

2015-01-28 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4049?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14296479#comment-14296479
 ] 

Patrick Wendell commented on SPARK-4049:


[~skrasser] Yes - I agree that behavior is just confusing. One idea would be to 
have a "bit map" so to speak where you can be 100% unless you have every 
partition cached. And you can never go over 100%.

> Storage web UI "fraction cached" shows as > 100%
> 
>
> Key: SPARK-4049
> URL: https://issues.apache.org/jira/browse/SPARK-4049
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.2.0
>Reporter: Josh Rosen
>Priority: Minor
>
> In the Storage tab of the Spark Web UI, I saw a case where the "Fraction 
> Cached" was greater than 100%:
> !http://i.imgur.com/Gm2hEeL.png!



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5471) java.lang.NumberFormatException: For input string:

2015-01-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5471.

Resolution: Not a Problem

Resolving per your own comment.

>  java.lang.NumberFormatException: For input string: 
> 
>
> Key: SPARK-5471
> URL: https://issues.apache.org/jira/browse/SPARK-5471
> Project: Spark
>  Issue Type: New Feature
>Affects Versions: 1.2.0
> Environment: Spark 1.2.0 Maven 
>Reporter: DeepakVohra
>
> The Naive Bayes classifier generates an exception with sample_naive_bayes_data.txt:
> java.lang.NumberFormatException: For input string: "0,1"
>   at 
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250)
>   at java.lang.Double.parseDouble(Double.java:540)
>   at 
> scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:232)
>   at scala.collection.immutable.StringOps.toDouble(StringOps.scala:31)
>   at 
> org.apache.spark.mllib.util.MLUtils$$anonfun$4.apply(MLUtils.scala:79)
>   at 
> org.apache.spark.mllib.util.MLUtils$$anonfun$4.apply(MLUtils.scala:77)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
>   at 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
>   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 15/01/28 21:13:57 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 
> localhost): java.lang.NumberFormatException: For input string: "0,1"
>   at 
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.java:1250)
>   at java.lang.Double.parseDouble(Double.java:540)
>   at 
> scala.collection.immutable.StringLike$class.toDouble(StringLike.scala:232)
>   at scala.collection.immutable.StringOps.toDouble(StringOps.scala:31)
>   at 
> org.apache.spark.mllib.util.MLUtils$$anonfun$4.apply(MLUtils.scala:79)
>   at 
> org.apache.spark.mllib.util.MLUtils$$anonfun$4.apply(MLUtils.scala:77)
>   at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
>   at 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:249)
>   at 
> org.apache.spark.CacheManager.putInBlockManager(CacheManager.scala:163)
>   at org.apache.spark.CacheManager.getOrCompute(CacheManager.scala:70)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:228)
>   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:263)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:230)
>   at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
>   at org.apache.spark.scheduler.Task.run(Task.scala:56)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:196)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> 15/01/28 21:13:57 ERROR TaskSetManager: Task 0 in stage 0.0 failed 1 times; 
> aborting job
> 15/01/28 21:13:57 INFO TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks 
> have all completed, from pool 
> 15/01/28 21:13:57 INFO TaskSchedulerImpl: Cancelling stage 0
> 15/01/28 21:13:57 INFO DAGScheduler: Job 0 failed: reduce at 
> MLUtils.scala:96, took 1.180869 s
> Exception in thread "main" org.apache.spark.SparkException: Job aborted due 
> to stage failure: Task 0 in stage 0.0 failed 1 times, most recent failure: 
> Lost task 0.0 in stage 0.0 (TID 0, localhost): 
> java.lang.NumberFormatException: For input string: "0,1"
>   at 
> sun.misc.FloatingDecimal.readJavaFormatString(FloatingDecimal.ja
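
The NumberFormatException on the string "0,1" is what you get when a whole 
"label,features" line reaches a Double parser. A hedged sketch of parsing the 
stock sample file by hand (assuming its "label,f1 f2 f3" layout and an existing 
SparkContext `sc`):

{code}
// Sketch: sample_naive_bayes_data.txt lines look like "0,1 0 0" — a label,
// a comma, then space-separated features. Split on the comma first so the
// label and the feature vector are parsed separately.
import org.apache.spark.mllib.classification.NaiveBayes
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.regression.LabeledPoint

val parsed = sc.textFile("data/mllib/sample_naive_bayes_data.txt").map { line =>
  val Array(label, features) = line.split(',')
  LabeledPoint(label.toDouble, Vectors.dense(features.trim.split(' ').map(_.toDouble)))
}
val model = NaiveBayes.train(parsed, lambda = 1.0)
{code}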

Re: spark akka fork : is the source anywhere?

2015-01-28 Thread Patrick Wendell
It's maintained here:

https://github.com/pwendell/akka/tree/2.2.3-shaded-proto

Over time, this is something that would be great to get rid of, per rxin

On Wed, Jan 28, 2015 at 3:33 PM, Reynold Xin  wrote:
> Hopefully problems like this will go away entirely in the next couple of
> releases. https://issues.apache.org/jira/browse/SPARK-5293
>
>
>
> On Wed, Jan 28, 2015 at 3:12 PM, jay vyas 
> wrote:
>
>> Hi Spark devs. Where is Akka coming from in Spark?
>>
>> I see the distribution referenced is a Spark artifact... but not in the
>> Apache namespace.
>>
>>  org.spark-project.akka
>>  2.3.4-spark
>>
>> Clearly this is a deliberate, thought-out change (see SPARK-1812), but it's
>> not clear where 2.3.4-spark is coming from and who is maintaining its
>> release?
>>
>> --
>> jay vyas
>>
>> PS
>>
>> I've had some conversations with Will Benton as well about this, and it's
>> clear that some modifications to Akka are needed, or else a protobuf error
>> occurs, which amounts to serialization incompatibilities; hence, if one wants
>> to build Spark from source, the patched Akka is required (or else manual
>> patching needs to be done)...
>>
>> 15/01/28 22:58:10 ERROR ActorSystemImpl: Uncaught fatal error from thread
>> [sparkWorker-akka.remote.default-remote-dispatcher-6] shutting down
>> ActorSystem [sparkWorker] java.lang.VerifyError: class
>> akka.remote.WireFormats$AkkaControlMessage overrides final method
>> getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;
>>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Resolved] (SPARK-1934) "this" reference escape to "selectorThread" during construction in ConnectionManager

2015-01-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-1934?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-1934.

   Resolution: Fixed
Fix Version/s: 1.3.0
 Assignee: Sean Owen

> "this" reference escape to "selectorThread" during construction in 
> ConnectionManager
> 
>
> Key: SPARK-1934
> URL: https://issues.apache.org/jira/browse/SPARK-1934
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Reporter: Shixiong Zhu
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 1.3.0
>
>
> `selectorThread` starts in the constructor of 
> `org.apache.spark.network.ConnectionManager`, which may cause 
> `writeRunnableStarted` and `readRunnableStarted` to be uninitialized before 
> they are used.
> Indirectly, `BlockManager.this` also escapes, since it calls `new 
> ConnectionManager(...)` and will be used in some threads of 
> `ConnectionManager`. Those threads may observe an uninitialized `BlockManager`.
> In summary, this is dangerous and makes the concurrency hard to analyze for 
> correctness. Such escapes should be avoided.
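
A minimal sketch of the general pattern for avoiding this kind of escape (an 
illustrative class, not Spark's actual fix): finish construction first, then 
start the thread from a separate method.

{code}
// Illustrative only: keep the constructor free of Thread.start() so no thread
// can observe a partially constructed instance.
class SafeManager {
  @volatile private var running = false

  private val selectorThread = new Thread(new Runnable {
    override def run(): Unit = {
      while (running) {
        // ... selection loop ...
      }
    }
  })

  // Called by the owner after construction completes, so every field is
  // initialized before the thread can read it.
  def start(): Unit = {
    running = true
    selectorThread.start()
  }
}
{code}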



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5188) make-distribution.sh should support curl, not only wget to get Tachyon

2015-01-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5188:
---
Fix Version/s: 1.3.0

> make-distribution.sh should support curl, not only wget to get Tachyon
> --
>
> Key: SPARK-5188
> URL: https://issues.apache.org/jira/browse/SPARK-5188
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
> Fix For: 1.3.0
>
>
> When we use `make-distribution.sh` with the `--with-tachyon` option, Tachyon 
> will be downloaded with the `wget` command, but some systems don't have `wget` 
> by default (Mac OS X doesn't).
> Other scripts like build/mvn and build/sbt support not only `wget` but also 
> `curl`, so `make-distribution.sh` should support `curl` too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5188) make-distribution.sh should support curl, not only wget to get Tachyon

2015-01-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5188.

Resolution: Fixed
  Assignee: Kousuke Saruta

> make-distribution.sh should support curl, not only wget to get Tachyon
> --
>
> Key: SPARK-5188
> URL: https://issues.apache.org/jira/browse/SPARK-5188
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>
> When we use `make-distribution.sh` with the `--with-tachyon` option, Tachyon 
> will be downloaded with the `wget` command, but some systems don't have `wget` 
> by default (Mac OS X doesn't).
> Other scripts like build/mvn and build/sbt support not only `wget` but also 
> `curl`, so `make-distribution.sh` should support `curl` too.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5458) Refer to aggregateByKey instead of combineByKey in docs

2015-01-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5458.

   Resolution: Fixed
Fix Version/s: 1.3.0
 Assignee: Sandy Ryza

> Refer to aggregateByKey instead of combineByKey in docs
> ---
>
> Key: SPARK-5458
> URL: https://issues.apache.org/jira/browse/SPARK-5458
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
>Priority: Trivial
> Fix For: 1.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.2.1 (RC2)

2015-01-28 Thread Patrick Wendell
Yes - it fixes that issue.

On Wed, Jan 28, 2015 at 2:17 AM, Aniket  wrote:
> Hi Patrick,
>
> I am wondering if this version will address issues around certain artifacts
> not getting published in 1.2 which are gating people to migrate to 1.2. One
> such issue is https://issues.apache.org/jira/browse/SPARK-5144
>
> Thanks,
> Aniket
>
> On Wed Jan 28 2015 at 15:39:43 Patrick Wendell [via Apache Spark Developers
> List]  wrote:
>
>> Minor typo in the above e-mail - the tag is named v1.2.1-rc2 (not
>> v1.2.1-rc1).
>>
>> On Wed, Jan 28, 2015 at 2:06 AM, Patrick Wendell <[hidden email]
>> <http:///user/SendEmail.jtp?type=node&node=10318&i=0>> wrote:
>>
>> > Please vote on releasing the following candidate as Apache Spark version
>> 1.2.1!
>> >
>> > The tag to be voted on is v1.2.1-rc1 (commit b77f876):
>> >
>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=b77f87673d1f9f03d4c83cf583158227c551359b
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > http://people.apache.org/~pwendell/spark-1.2.1-rc2/
>> >
>> > Release artifacts are signed with the following key:
>> > https://people.apache.org/keys/committer/pwendell.asc
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1062/
>> >
>> > The documentation corresponding to this release can be found at:
>> > http://people.apache.org/~pwendell/spark-1.2.1-rc2-docs/
>> >
>> > Changes from rc1:
>> > This has no code changes from RC1. Only minor changes to the release
>> script.
>> >
>> > Please vote on releasing this package as Apache Spark 1.2.1!
>> >
>> > The vote is open until  Saturday, January 31, at 10:04 UTC and passes
>> > if a majority of at least 3 +1 PMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache Spark 1.2.1
>> > [ ] -1 Do not release this package because ...
>> >
>> > For a list of fixes in this release, see http://s.apache.org/Mpn.
>> >
>> > To learn more about Apache Spark, please see
>> > http://spark.apache.org/
>>
>> -
>> To unsubscribe, e-mail: [hidden email]
>> <http:///user/SendEmail.jtp?type=node&node=10318&i=1>
>> For additional commands, e-mail: [hidden email]
>> <http:///user/SendEmail.jtp?type=node&node=10318&i=2>
>>
>>
>>
>> --
>>  If you reply to this email, your message will be added to the discussion
>> below:
>>
>> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-2-1-RC2-tp10317p10318.html
>>  To start a new topic under Apache Spark Developers List, email
>> ml-node+s1001551n1...@n3.nabble.com
>> To unsubscribe from Apache Spark Developers List, click here
>> <http://apache-spark-developers-list.1001551.n3.nabble.com/template/NamlServlet.jtp?macro=unsubscribe_by_code&node=1&code=YW5pa2V0LmJoYXRuYWdhckBnbWFpbC5jb218MXwxMzE3NTAzMzQz>
>> .
>> NAML
>> <http://apache-spark-developers-list.1001551.n3.nabble.com/template/NamlServlet.jtp?macro=macro_viewer&id=instant_html%21nabble%3Aemail.naml&base=nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.naml.namespaces.BasicNamespace-nabble.view.web.template.NabbleNamespace-nabble.view.web.template.NodeNamespace&breadcrumbs=notify_subscribers%21nabble%3Aemail.naml-instant_emails%21nabble%3Aemail.naml-send_instant_email%21nabble%3Aemail.naml>
>>
>
>
>
>
> --
> View this message in context: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/VOTE-Release-Apache-Spark-1-2-1-RC2-tp10317p10320.html
> Sent from the Apache Spark Developers List mailing list archive at Nabble.com.

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Resolved] (SPARK-5144) spark-yarn module should be published

2015-01-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5144.

Resolution: Duplicate

> spark-yarn module should be published
> -
>
> Key: SPARK-5144
> URL: https://issues.apache.org/jira/browse/SPARK-5144
> Project: Spark
>  Issue Type: Bug
>  Components: YARN
>Affects Versions: 1.2.0
>Reporter: Aniket Bhatnagar
>
> We disabled publishing of certain modules in SPARK-3452. One such module 
> is spark-yarn. This breaks applications that submit Spark jobs 
> programmatically with the master set to yarn-client, because SparkContext 
> depends on classes from the yarn-client module to submit the YARN 
> application.
> Here is the stack trace you get if you submit a Spark job without the 
> yarn-client dependency:
> 2015-01-07 14:39:22,799 [pool-10-thread-13] [info] o.a.s.s.MemoryStore - 
> MemoryStore started with capacity 731.7 MB
> Exception in thread "pool-10-thread-13" java.lang.ExceptionInInitializerError
> at org.apache.spark.util.Utils$.getSparkOrYarnConfig(Utils.scala:1784)
> at org.apache.spark.storage.BlockManager.(BlockManager.scala:105)
> at org.apache.spark.storage.BlockManager.(BlockManager.scala:180)
> at org.apache.spark.SparkEnv$.create(SparkEnv.scala:292)
> at org.apache.spark.SparkEnv$.createDriverEnv(SparkEnv.scala:159)
> at org.apache.spark.SparkContext.(SparkContext.scala:232)
> at com.myimpl.Server:23)
> at scala.util.Success$$anonfun$map$1.apply(Try.scala:236)
> at scala.util.Try$.apply(Try.scala:191)
> at scala.util.Success.map(Try.scala:236)
> at com.myimpl.FutureTry$$anonfun$1.apply(FutureTry.scala:23)
> at com.myimpl.FutureTry$$anonfun$1.apply(FutureTry.scala:23)
> at scala.util.Success$$anonfun$map$1.apply(Try.scala:236)
> at scala.util.Try$.apply(Try.scala:191)
> at scala.util.Success.map(Try.scala:236)
> at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
> at scala.concurrent.Future$$anonfun$map$1.apply(Future.scala:235)
> at scala.concurrent.impl.CallbackRunnable.run(Promise.scala:32)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
> Caused by: org.apache.spark.SparkException: Unable to load YARN support
> at 
> org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:199)
> at org.apache.spark.deploy.SparkHadoopUtil$.(SparkHadoopUtil.scala:194)
> at org.apache.spark.deploy.SparkHadoopUtil$.(SparkHadoopUtil.scala)
> ... 27 more
> Caused by: java.lang.ClassNotFoundException: 
> org.apache.spark.deploy.yarn.YarnSparkHadoopUtil
> at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
> at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
> at java.security.AccessController.doPrivileged(Native Method)
> at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
> at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
> at java.lang.Class.forName0(Native Method)
> at java.lang.Class.forName(Class.java:190)
> at 
> org.apache.spark.deploy.SparkHadoopUtil$.liftedTree1$1(SparkHadoopUtil.scala:195)
> ... 29 more



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5415) Upgrade sbt to 0.13.7

2015-01-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5415?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5415.

   Resolution: Fixed
Fix Version/s: 1.3.0
 Assignee: Ryan Williams

> Upgrade sbt to 0.13.7
> -
>
> Key: SPARK-5415
> URL: https://issues.apache.org/jira/browse/SPARK-5415
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Ryan Williams
>Assignee: Ryan Williams
>Priority: Minor
> Fix For: 1.3.0
>
>
> Spark currently uses sbt {{0.13.6}}, which has a regression related to 
> processing parent POMs in Maven projects.
> {{0.13.7}} does not have this issue (though it's unclear whether it was fixed 
> intentionally), so I'd like to bump up one version.
> I ran into this while locally building a Spark assembly against a 
> locally-built "metrics" JAR dependency; {{0.13.6}} could not build Spark but 
> {{0.13.7}} worked fine. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5341) Support maven coordinates in spark-shell and spark-submit

2015-01-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5341:
---
Priority: Critical  (was: Major)

> Support maven coordinates in spark-shell and spark-submit
> -
>
> Key: SPARK-5341
> URL: https://issues.apache.org/jira/browse/SPARK-5341
> Project: Spark
>  Issue Type: New Feature
>  Components: Deploy, Spark Shell
>Reporter: Burak Yavuz
>Priority: Critical
>
> This feature will allow users to provide the maven coordinates of jars they 
> wish to use in their spark application. Coordinates can be a comma-delimited 
> list and be supplied like:
> ```spark-submit --maven org.apache.example.a,org.apache.example.b```
> This feature will also be added to spark-shell (where it is more critical to 
> have this feature)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.2.1 (RC2)

2015-01-28 Thread Patrick Wendell
Minor typo in the above e-mail - the tag is named v1.2.1-rc2 (not v1.2.1-rc1).

On Wed, Jan 28, 2015 at 2:06 AM, Patrick Wendell  wrote:
> Please vote on releasing the following candidate as Apache Spark version 
> 1.2.1!
>
> The tag to be voted on is v1.2.1-rc1 (commit b77f876):
> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=b77f87673d1f9f03d4c83cf583158227c551359b
>
> The release files, including signatures, digests, etc. can be found at:
> http://people.apache.org/~pwendell/spark-1.2.1-rc2/
>
> Release artifacts are signed with the following key:
> https://people.apache.org/keys/committer/pwendell.asc
>
> The staging repository for this release can be found at:
> https://repository.apache.org/content/repositories/orgapachespark-1062/
>
> The documentation corresponding to this release can be found at:
> http://people.apache.org/~pwendell/spark-1.2.1-rc2-docs/
>
> Changes from rc1:
> This has no code changes from RC1. Only minor changes to the release script.
>
> Please vote on releasing this package as Apache Spark 1.2.1!
>
> The vote is open until  Saturday, January 31, at 10:04 UTC and passes
> if a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Spark 1.2.1
> [ ] -1 Do not release this package because ...
>
> For a list of fixes in this release, see http://s.apache.org/Mpn.
>
> To learn more about Apache Spark, please see
> http://spark.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[VOTE] Release Apache Spark 1.2.1 (RC2)

2015-01-28 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.1!

The tag to be voted on is v1.2.1-rc1 (commit b77f876):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=b77f87673d1f9f03d4c83cf583158227c551359b

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.2.1-rc2/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1062/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.2.1-rc2-docs/

Changes from rc1:
This has no code changes from RC1. Only minor changes to the release script.

Please vote on releasing this package as Apache Spark 1.2.1!

The vote is open until  Saturday, January 31, at 10:04 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.2.1
[ ] -1 Do not release this package because ...

For a list of fixes in this release, see http://s.apache.org/Mpn.

To learn more about Apache Spark, please see
http://spark.apache.org/

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[RESULT] [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-28 Thread Patrick Wendell
This vote is cancelled in favor of RC2.

On Tue, Jan 27, 2015 at 4:20 PM, Reynold Xin  wrote:
> +1
>
> Tested on Mac OS X
>
> On Tue, Jan 27, 2015 at 12:35 PM, Krishna Sankar 
> wrote:
>>
>> +1
>> 1. Compiled OSX 10.10 (Yosemite) OK Total time: 12:55 min
>>  mvn clean package -Pyarn -Dyarn.version=2.6.0 -Phadoop-2.4
>> -Dhadoop.version=2.6.0 -Phive -DskipTests
>> 2. Tested pyspark, MLlib - running as well as comparing results with 1.1.x &
>> 1.2.0
>> 2.1. statistics OK
>> 2.2. Linear/Ridge/Lasso Regression OK
>> 2.3. Decision Tree, Naive Bayes OK
>> 2.4. KMeans OK
>>Center And Scale OK
>>Fixed : org.apache.spark.SparkException in zip !
>> 2.5. rdd operations OK
>>State of the Union Texts - MapReduce, Filter,sortByKey (word count)
>> 2.6. recommendation OK
>>
>> Cheers
>> 
>>
>> On Mon, Jan 26, 2015 at 11:02 PM, Patrick Wendell 
>> wrote:
>>
>> > Please vote on releasing the following candidate as Apache Spark version
>> > 1.2.1!
>> >
>> > The tag to be voted on is v1.2.1-rc1 (commit 3e2d7d3):
>> >
>> >
>> > https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3e2d7d310b76c293b9ac787f204e6880f508f6ec
>> >
>> > The release files, including signatures, digests, etc. can be found at:
>> > http://people.apache.org/~pwendell/spark-1.2.1-rc1/
>> >
>> > Release artifacts are signed with the following key:
>> > https://people.apache.org/keys/committer/pwendell.asc
>> >
>> > The staging repository for this release can be found at:
>> > https://repository.apache.org/content/repositories/orgapachespark-1061/
>> >
>> > The documentation corresponding to this release can be found at:
>> > http://people.apache.org/~pwendell/spark-1.2.1-rc1-docs/
>> >
>> > Please vote on releasing this package as Apache Spark 1.2.1!
>> >
>> > The vote is open until Friday, January 30, at 07:00 UTC and passes
>> > if a majority of at least 3 +1 PMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache Spark 1.2.1
>> > [ ] -1 Do not release this package because ...
>> >
>> > For a list of fixes in this release, see http://s.apache.org/Mpn.
>> >
>> > To learn more about Apache Spark, please see
>> > http://spark.apache.org/
>> >
>> > - Patrick
>> >
>> > -
>> > To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> > For additional commands, e-mail: dev-h...@spark.apache.org
>> >
>> >
>
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Commented] (SPARK-5428) Declare the 'assembly' module at the bottom of the <modules> element in the parent POM

2015-01-28 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294935#comment-14294935
 ] 

Patrick Wendell commented on SPARK-5428:


[~tzolov] Do you mind explaining a bit more why you want to coerce this to 
occur last? What does this allow you to do?

> Declare the 'assembly' module at the bottom of the <modules> element in the 
> parent POM
> --
>
> Key: SPARK-5428
> URL: https://issues.apache.org/jira/browse/SPARK-5428
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, Deploy
>Reporter: Christian Tzolov
>Priority: Trivial
>  Labels: assembly, maven, pom
>
> For multi-module projects, Maven follows these execution-order rules:
> http://maven.apache.org/guides/mini/guide-multiple-modules.html
> If no explicit dependencies are declared, Maven will follow the order declared 
> in the <modules> element.
> Because the 'assembly' module is responsible for aggregating build artifacts 
> from other modules/projects, it makes sense to run it last in the execution 
> chain. 
> At the moment the 'assembly' module comes before modules like 'examples', which 
> makes it impossible to generate a DEP package that contains the examples jar. 
> IMHO the 'assembly' module needs to be kept at the bottom of the <modules> list.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-5420) Cross-language load/store functions for creating and saving DataFrames

2015-01-28 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14294928#comment-14294928
 ] 

Patrick Wendell commented on SPARK-5420:


How about just load and store then?

> Cross-language load/store functions for creating and saving DataFrames
> --
>
> Key: SPARK-5420
> URL: https://issues.apache.org/jira/browse/SPARK-5420
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>    Reporter: Patrick Wendell
>
> We should have standard APIs for loading or saving a table from a data 
> store. One idea:
> {code}
> df = sc.loadTable("path.to.DataSource", {"a": "b", "c": "d"})
> sc.storeTable("path.to.DataSource", {"a":"b", "c":"d"})
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-4809) Improve Guava shading in Spark

2015-01-28 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-4809.

   Resolution: Fixed
Fix Version/s: 1.3.0
 Assignee: Marcelo Vanzin

> Improve Guava shading in Spark
> --
>
> Key: SPARK-4809
> URL: https://issues.apache.org/jira/browse/SPARK-4809
> Project: Spark
>  Issue Type: Improvement
>Affects Versions: 1.2.0
>Reporter: Marcelo Vanzin
>Assignee: Marcelo Vanzin
> Fix For: 1.3.0
>
>
> As part of SPARK-2848, we started shading Guava to help with projects that 
> want to use Spark but use an incompatible version of Guava.
> The approach used there is a little sub-optimal, though. In particular, it 
> makes it tricky to run unit tests in your project when those tests need to use 
> spark-core APIs.
> We should make the shading more transparent so that it's easier to use 
> spark-core, with or without an explicit Guava dependency.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5441) SerDeUtil Pair RDD to python conversion doesn't accept empty RDDs

2015-01-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5441:
---
Target Version/s: 1.3.0

> SerDeUtil Pair RDD to python conversion doesn't accept empty RDDs
> -
>
> Key: SPARK-5441
> URL: https://issues.apache.org/jira/browse/SPARK-5441
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.1.1, 1.2.0
>Reporter: Michael Nazario
>Assignee: Michael Nazario
>
> SerDeUtil.pairRDDToPython and SerDeUtil.pythonToPairRDD rely on rdd.first() 
> which throws an exception if the RDD is empty. We should be able to handle 
> the empty RDD case because this doesn't prevent a valid RDD from being 
> created.
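
The shape of the fix is to peek with take(1) instead of calling first(), which throws on an empty RDD. A minimal Scala sketch of that guard (names are illustrative; this is not the actual SerDeUtil code):

{code}
import org.apache.spark.rdd.RDD

// Sketch: inspect the first pair if there is one, instead of calling rdd.first(),
// so an empty RDD falls through to a sensible default rather than an exception.
def describeFirstPair[K, V](rdd: RDD[(K, V)]): String =
  rdd.take(1).headOption match {
    case Some((k, v)) =>
      s"first pair has types (${k.getClass.getName}, ${v.getClass.getName})"
    case None =>
      "RDD is empty; skip the first-element inspection"
  }
{code}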



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5441) SerDeUtil Pair RDD to python conversion doesn't accept empty RDDs

2015-01-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5441:
---
Assignee: Michael Nazario

> SerDeUtil Pair RDD to python conversion doesn't accept empty RDDs
> -
>
> Key: SPARK-5441
> URL: https://issues.apache.org/jira/browse/SPARK-5441
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.1.1, 1.2.0
>Reporter: Michael Nazario
>Assignee: Michael Nazario
>
> SerDeUtil.pairRDDToPython and SerDeUtil.pythonToPairRDD rely on rdd.first() 
> which throws an exception if the RDD is empty. We should be able to handle 
> the empty RDD case because this doesn't prevent a valid RDD from being 
> created.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Friendly reminder/request to help with reviews!

2015-01-27 Thread Patrick Wendell
Hey All,

Just a reminder, as always around release time we have a very large
volume of patches show up near the deadline.

One thing that can help us maximize the number of patches we get in is
to have community involvement in performing code reviews. And in
particular, doing a thorough review and signing off on a patch with
LGTM can substantially increase the odds we can merge a patch
confidently.

If you are newer to Spark, finding a single area of the codebase to
focus on can still provide a lot of value to the project in the
reviewing process.

Cheers and good luck with everyone on work for this release.

- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Resolved] (SPARK-5199) Input metrics should show up for InputFormats that return CombineFileSplits

2015-01-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5199?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5199.

   Resolution: Fixed
Fix Version/s: 1.3.0

> Input metrics should show up for InputFormats that return CombineFileSplits
> ---
>
> Key: SPARK-5199
> URL: https://issues.apache.org/jira/browse/SPARK-5199
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Reporter: Sandy Ryza
>Assignee: Sandy Ryza
> Fix For: 1.3.0
>
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Patrick Wendell
Okay - we've resolved all issues with the signatures and keys.
However, I'll leave the current vote open for a bit to solicit
additional feedback.

On Tue, Jan 27, 2015 at 10:43 AM, Sean McNamara
 wrote:
> Sounds good, that makes sense.
>
> Cheers,
>
> Sean
>
>> On Jan 27, 2015, at 11:35 AM, Patrick Wendell  wrote:
>>
>> Hey Sean,
>>
>> Right now we don't publish every 2.11 binary to avoid combinatorial
>> explosion of the number of build artifacts we publish (there are other
>> parameters such as whether hive is included, etc). We can revisit this
>> in future feature releases, but .1 releases like this are reserved for
>> bug fixes.
>>
>> - Patrick
>>
>> On Tue, Jan 27, 2015 at 10:31 AM, Sean McNamara
>>  wrote:
>>> We're using spark on scala 2.11 /w hadoop2.4.  Would it be practical / make 
>>> sense to build a bin version of spark against scala 2.11 for versions other 
>>> than just hadoop1 at this time?
>>>
>>> Cheers,
>>>
>>> Sean
>>>
>>>
>>>> On Jan 27, 2015, at 12:04 AM, Patrick Wendell  wrote:
>>>>
>>>> Please vote on releasing the following candidate as Apache Spark version 
>>>> 1.2.1!
>>>>
>>>> The tag to be voted on is v1.2.1-rc1 (commit 3e2d7d3):
>>>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3e2d7d310b76c293b9ac787f204e6880f508f6ec
>>>>
>>>> The release files, including signatures, digests, etc. can be found at:
>>>> http://people.apache.org/~pwendell/spark-1.2.1-rc1/
>>>>
>>>> Release artifacts are signed with the following key:
>>>> https://people.apache.org/keys/committer/pwendell.asc
>>>>
>>>> The staging repository for this release can be found at:
>>>> https://repository.apache.org/content/repositories/orgapachespark-1061/
>>>>
>>>> The documentation corresponding to this release can be found at:
>>>> http://people.apache.org/~pwendell/spark-1.2.1-rc1-docs/
>>>>
>>>> Please vote on releasing this package as Apache Spark 1.2.1!
>>>>
>>>> The vote is open until Friday, January 30, at 07:00 UTC and passes
>>>> if a majority of at least 3 +1 PMC votes are cast.
>>>>
>>>> [ ] +1 Release this package as Apache Spark 1.2.1
>>>> [ ] -1 Do not release this package because ...
>>>>
>>>> For a list of fixes in this release, see http://s.apache.org/Mpn.
>>>>
>>>> To learn more about Apache Spark, please see
>>>> http://spark.apache.org/
>>>>
>>>> - Patrick
>>>>
>>>> -
>>>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>>>> For additional commands, e-mail: dev-h...@spark.apache.org
>>>>
>>>
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Resolved] (SPARK-5299) Is http://www.apache.org/dist/spark/KEYS out of date?

2015-01-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5299.

Resolution: Fixed

Okay, I've now added every key ever used to publish a Spark release (I think); 
these come from myself, Xiangrui Meng, Andrew Or, and Tathagata Das.

> Is http://www.apache.org/dist/spark/KEYS out of date?
> -
>
> Key: SPARK-5299
> URL: https://issues.apache.org/jira/browse/SPARK-5299
> Project: Spark
>  Issue Type: Question
>  Components: Deploy
>Reporter: David Shaw
>    Assignee: Patrick Wendell
>
> The keys contained in http://www.apache.org/dist/spark/KEYS do not appear to 
> match the keys used to sign the releases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-5299) Is http://www.apache.org/dist/spark/KEYS out of date?

2015-01-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell reopened SPARK-5299:


Actually I need to deal with past releases as well, so re-opening.

> Is http://www.apache.org/dist/spark/KEYS out of date?
> -
>
> Key: SPARK-5299
> URL: https://issues.apache.org/jira/browse/SPARK-5299
> Project: Spark
>  Issue Type: Question
>  Components: Deploy
>Reporter: David Shaw
>    Assignee: Patrick Wendell
>
> The keys contained in http://www.apache.org/dist/spark/KEYS do not appear to 
> match the keys used to sign the releases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5299) Is http://www.apache.org/dist/spark/KEYS out of date?

2015-01-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5299.

Resolution: Fixed

Thanks I've fixed this.

> Is http://www.apache.org/dist/spark/KEYS out of date?
> -
>
> Key: SPARK-5299
> URL: https://issues.apache.org/jira/browse/SPARK-5299
> Project: Spark
>  Issue Type: Question
>  Components: Deploy
>Reporter: David Shaw
>    Assignee: Patrick Wendell
>
> The keys contained in http://www.apache.org/dist/spark/KEYS do not appear to 
> match the keys used to sign the releases.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Patrick Wendell
Hey Sean,

Right now we don't publish every 2.11 binary to avoid combinatorial
explosion of the number of build artifacts we publish (there are other
parameters such as whether hive is included, etc). We can revisit this
in future feature releases, but .1 releases like this are reserved for
bug fixes.

- Patrick

On Tue, Jan 27, 2015 at 10:31 AM, Sean McNamara
 wrote:
> We're using spark on scala 2.11 /w hadoop2.4.  Would it be practical / make 
> sense to build a bin version of spark against scala 2.11 for versions other 
> than just hadoop1 at this time?
>
> Cheers,
>
> Sean
>
>
>> On Jan 27, 2015, at 12:04 AM, Patrick Wendell  wrote:
>>
>> Please vote on releasing the following candidate as Apache Spark version 
>> 1.2.1!
>>
>> The tag to be voted on is v1.2.1-rc1 (commit 3e2d7d3):
>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3e2d7d310b76c293b9ac787f204e6880f508f6ec
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-1.2.1-rc1/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1061/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-1.2.1-rc1-docs/
>>
>> Please vote on releasing this package as Apache Spark 1.2.1!
>>
>> The vote is open until Friday, January 30, at 07:00 UTC and passes
>> if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.2.1
>> [ ] -1 Do not release this package because ...
>>
>> For a list of fixes in this release, see http://s.apache.org/Mpn.
>>
>> To learn more about Apache Spark, please see
>> http://spark.apache.org/
>>
>> - Patrick
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>
>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Resolved] (SPARK-5308) MD5 / SHA1 hash format doesn't match standard Maven output

2015-01-27 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5308.

   Resolution: Fixed
Fix Version/s: 1.2.1
   1.3.0
 Assignee: Sean Owen

> MD5 / SHA1 hash format doesn't match standard Maven output
> --
>
> Key: SPARK-5308
> URL: https://issues.apache.org/jira/browse/SPARK-5308
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.2.0
>Reporter: Kuldeep
>Assignee: Sean Owen
>Priority: Minor
> Fix For: 1.3.0, 1.2.1
>
>
> https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.2.0/spark-core_2.10-1.2.0.pom.md5
> The above does not look like a proper md5 which is causing failure in some 
> build tools like leiningen.
> https://github.com/technomancy/leiningen/issues/1802
> Compare this with 1.1.0 release
> https://repo1.maven.org/maven2/org/apache/spark/spark-core_2.10/1.1.0/spark-core_2.10-1.1.0.pom.md5



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Patrick Wendell
Yes - the key issue is just due to me creating new keys this time
around. Anyways let's take another stab at this. In the mean time,
please don't hesitate to test the release itself.

- Patrick

On Tue, Jan 27, 2015 at 10:00 AM, Sean Owen  wrote:
> Got it. Ignore the SHA512 issue since these aren't somehow expected by
> a policy or Maven to be in a certain format. Just wondered if the
> difference was intended.
>
> The Maven way of generating the SHA1 hashes is to set this on the
> install plugin, AFAIK, although I'm not sure if the intent was to hash
> files that Maven didn't create:
>
> <configuration>
> <createChecksum>true</createChecksum>
> </configuration>
>
> As for the key issue, I think it's just a matter of uploading the new
> key in both places.
>
> We should all of course test the release anyway.
>
> On Tue, Jan 27, 2015 at 5:55 PM, Patrick Wendell  wrote:
>> Hey Sean,
>>
>> The release script generates hashes in two places (take a look a bit
>> further down in the script), one for the published artifacts and the
>> other for the binaries. In the case of the binaries we use SHA512
>> because, AFAIK, the ASF does not require you to use SHA1 and SHA512 is
>> better. In the case of the published Maven artifacts we use SHA1
>> because my understanding is this is what Maven requires. However, it
>> does appear that the format is now one that maven cannot parse.
>>
>> Anyways, it seems fine to just change the format of the hash per your PR.
>>
>> - Patrick
>>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: [VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-27 Thread Patrick Wendell
Hey Sean,

The release script generates hashes in two places (take a look a bit
further down in the script), one for the published artifacts and the
other for the binaries. In the case of the binaries we use SHA512
because, AFAIK, the ASF does not require you to use SHA1 and SHA512 is
better. In the case of the published Maven artifacts we use SHA1
because my understanding is this is what Maven requires. However, it
does appear that the format is now one that maven cannot parse.

Anyways, it seems fine to just change the format of the hash per your PR.

- Patrick

On Tue, Jan 27, 2015 at 5:00 AM, Sean Owen  wrote:
> I think there are several signing / hash issues that should be fixed
> before this release.
>
> Hashes:
>
> http://issues.apache.org/jira/browse/SPARK-5308
> https://github.com/apache/spark/pull/4161
>
> The hashes here are correct, but have two issues:
>
> As noted in the JIRA, the format of the hash file is "nonstandard" --
> at least, doesn't match what Maven outputs, and apparently what tools
> like Leiningen expect, which is just the hash with no file name or
> spaces. There are two ways to fix that: different command-line tools
> (see PR), or, just ask Maven to generate these hashes (a different,
> easy PR).
>
> However, is the script I modified above used to generate these hashes?
> It's generating SHA1 sums, but the output in this release candidate
> has (correct) SHA512 sums.
>
> This may be more than a nuisance, since last time for some reason
> Maven Central did not register the project hashes.
>
> http://search.maven.org/#artifactdetails%7Corg.apache.spark%7Cspark-core_2.10%7C1.2.0%7Cjar
> does not show them but they exist:
> http://www.us.apache.org/dist/spark/spark-1.2.0/
>
> It may add up to a problem worth rooting out before this release.
>
>
> Signing:
>
> As noted in https://issues.apache.org/jira/browse/SPARK-5299 there are
> two signing keys in
> https://people.apache.org/keys/committer/pwendell.asc (9E4FE3AF,
> 00799F7E) but only one is in http://www.apache.org/dist/spark/KEYS
>
> However, these artifacts seem to be signed by FC8ED089 which isn't in either.
>
> Details details, but I'd say non-binding -1 at the moment.
>
>
> On Tue, Jan 27, 2015 at 7:02 AM, Patrick Wendell  wrote:
>> Please vote on releasing the following candidate as Apache Spark version 
>> 1.2.1!
>>
>> The tag to be voted on is v1.2.1-rc1 (commit 3e2d7d3):
>> https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3e2d7d310b76c293b9ac787f204e6880f508f6ec
>>
>> The release files, including signatures, digests, etc. can be found at:
>> http://people.apache.org/~pwendell/spark-1.2.1-rc1/
>>
>> Release artifacts are signed with the following key:
>> https://people.apache.org/keys/committer/pwendell.asc
>>
>> The staging repository for this release can be found at:
>> https://repository.apache.org/content/repositories/orgapachespark-1061/
>>
>> The documentation corresponding to this release can be found at:
>> http://people.apache.org/~pwendell/spark-1.2.1-rc1-docs/
>>
>> Please vote on releasing this package as Apache Spark 1.2.1!
>>
>> The vote is open until Friday, January 30, at 07:00 UTC and passes
>> if a majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Spark 1.2.1
>> [ ] -1 Do not release this package because ...
>>
>> For a list of fixes in this release, see http://s.apache.org/Mpn.
>>
>> To learn more about Apache Spark, please see
>> http://spark.apache.org/
>>
>> - Patrick
>>
>> -
>> To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
>> For additional commands, e-mail: dev-h...@spark.apache.org
>>

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[VOTE] Release Apache Spark 1.2.1 (RC1)

2015-01-26 Thread Patrick Wendell
Please vote on releasing the following candidate as Apache Spark version 1.2.1!

The tag to be voted on is v1.2.1-rc1 (commit 3e2d7d3):
https://git-wip-us.apache.org/repos/asf?p=spark.git;a=commit;h=3e2d7d310b76c293b9ac787f204e6880f508f6ec

The release files, including signatures, digests, etc. can be found at:
http://people.apache.org/~pwendell/spark-1.2.1-rc1/

Release artifacts are signed with the following key:
https://people.apache.org/keys/committer/pwendell.asc

The staging repository for this release can be found at:
https://repository.apache.org/content/repositories/orgapachespark-1061/

The documentation corresponding to this release can be found at:
http://people.apache.org/~pwendell/spark-1.2.1-rc1-docs/

Please vote on releasing this package as Apache Spark 1.2.1!

The vote is open until Friday, January 30, at 07:00 UTC and passes
if a majority of at least 3 +1 PMC votes are cast.

[ ] +1 Release this package as Apache Spark 1.2.1
[ ] -1 Do not release this package because ...

For a list of fixes in this release, see http://s.apache.org/Mpn.

To learn more about Apache Spark, please see
http://spark.apache.org/

- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Created] (SPARK-5420) Create cross-language load/store functions for creating and saving DataFrames

2015-01-26 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-5420:
--

 Summary: Create cross-language load/store functions for creating 
and saving DataFrames
 Key: SPARK-5420
 URL: https://issues.apache.org/jira/browse/SPARK-5420
 Project: Spark
  Issue Type: Sub-task
Reporter: Patrick Wendell


We should have standard APIs for loading or saving a table from a data store. 
One idea:

{code}
df = sc.loadTable("path.to.DataSource", {"a": "b", "c": "d"})
sc.storeTable("path.to.DataSource", {"a":"b", "c":"d"})
{code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5420) Cross-language load/store functions for creating and saving DataFrames

2015-01-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5420:
---
Summary: Cross-language load/store functions for creating and saving 
DataFrames  (was: Create cross-language load/store functions for creating and 
saving DataFrames)

> Cross-language load/store functions for creating and saving DataFrames
> --
>
> Key: SPARK-5420
> URL: https://issues.apache.org/jira/browse/SPARK-5420
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>        Reporter: Patrick Wendell
>
> We should have standard APIs for loading or saving a table from a data 
> store. One idea:
> {code}
> df = sc.loadTable("path.to.DataSource", {"a": "b", "c": "d"})
> sc.storeTable("path.to.DataSource", {"a":"b", "c":"d"})
> {code}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5052) com.google.common.base.Optional binary has a wrong method signatures

2015-01-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5052:
---
Assignee: Elmer Garduno

> com.google.common.base.Optional binary has a wrong method signatures
> 
>
> Key: SPARK-5052
> URL: https://issues.apache.org/jira/browse/SPARK-5052
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Elmer Garduno
>Assignee: Elmer Garduno
> Fix For: 1.3.0
>
>
> PR https://github.com/apache/spark/pull/1813 shaded Guava jar file and moved 
> Guava classes to package org.spark-project.guava when Spark is built by Maven.
> When a user jar uses the actual com.google.common.base.Optional 
> transform(com.google.common.base.Function); method from Guava,  a 
> java.lang.NoSuchMethodError: 
> com.google.common.base.Optional.transform(Lcom/google/common/base/Function;)Lcom/google/common/base/Optional;
>  is thrown.
> The reason seems to be that the Optional class included on 
> spark-assembly-1.2.0-hadoop1.0.4.jar has an incorrect method signature that 
> includes the shaded class as an argument:
> Expected:
> javap -classpath 
> target/scala-2.10/googlegenomics-spark-examples-assembly-1.0.jar 
> com.google.common.base.Optional
>   public abstract <V> com.google.common.base.Optional<V> 
> transform(com.google.common.base.Function<? super T, V>);
> Found:
> javap -classpath lib/spark-assembly-1.2.0-hadoop1.0.4.jar 
> com.google.common.base.Optional
>   public abstract <V> com.google.common.base.Optional<V> 
> transform(org.spark-project.guava.common.base.Function<? super T, V>);
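
For context, a small Scala sketch of the kind of user code that trips over this. The Guava calls are standard; the surrounding object is just for illustration. When the Optional class is loaded from the mis-shaded Spark assembly, the transform call site resolves to the shaded Function signature shown above and fails with the quoted NoSuchMethodError:

{code}
import com.google.common.base.{Function => GFunction, Optional}

object GuavaOptionalExample {
  def main(args: Array[String]): Unit = {
    val maybeName: Optional[String] = Optional.of("spark")
    // Compiled against real Guava, this expects
    // transform(com.google.common.base.Function); at runtime the assembly's
    // Optional only declares transform(org.spark-project.guava...Function).
    val upper: Optional[String] = maybeName.transform(new GFunction[String, String] {
      def apply(s: String): String = s.toUpperCase
    })
    println(upper.get())
  }
}
{code}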



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5052) com.google.common.base.Optional binary has a wrong method signatures

2015-01-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5052.

   Resolution: Fixed
Fix Version/s: 1.3.0

> com.google.common.base.Optional binary has a wrong method signatures
> 
>
> Key: SPARK-5052
> URL: https://issues.apache.org/jira/browse/SPARK-5052
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.2.0
>Reporter: Elmer Garduno
> Fix For: 1.3.0
>
>
> PR https://github.com/apache/spark/pull/1813 shaded Guava jar file and moved 
> Guava classes to package org.spark-project.guava when Spark is built by Maven.
> When a user jar uses the actual com.google.common.base.Optional 
> transform(com.google.common.base.Function); method from Guava,  a 
> java.lang.NoSuchMethodError: 
> com.google.common.base.Optional.transform(Lcom/google/common/base/Function;)Lcom/google/common/base/Optional;
>  is thrown.
> The reason seems to be that the Optional class included on 
> spark-assembly-1.2.0-hadoop1.0.4.jar has an incorrect method signature that 
> includes the shaded class as an argument:
> Expected:
> javap -classpath 
> target/scala-2.10/googlegenomics-spark-examples-assembly-1.0.jar 
> com.google.common.base.Optional
>   public abstract <V> com.google.common.base.Optional<V> 
> transform(com.google.common.base.Function<? super T, V>);
> Found:
> javap -classpath lib/spark-assembly-1.2.0-hadoop1.0.4.jar 
> com.google.common.base.Optional
>   public abstract <V> com.google.common.base.Optional<V> 
> transform(org.spark-project.guava.common.base.Function<? super T, V>);



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: renaming SchemaRDD -> DataFrame

2015-01-26 Thread Patrick Wendell
One thing potentially not clear from this e-mail: there will be a 1:1
correspondence, so you can get an RDD to/from a DataFrame.

On Mon, Jan 26, 2015 at 2:18 PM, Reynold Xin  wrote:
> Hi,
>
> We are considering renaming SchemaRDD -> DataFrame in 1.3, and wanted to
> get the community's opinion.
>
> The context is that SchemaRDD is becoming a common data format used for
> bringing data into Spark from external systems, and used for various
> components of Spark, e.g. MLlib's new pipeline API. We also expect more and
> more users to be programming directly against the SchemaRDD API rather than the
> core RDD API. SchemaRDD, through its less commonly used DSL originally
> designed for writing test cases, has always had a data-frame-like API. In
> 1.3, we are redesigning the API to make it usable for end users.
>
>
> There are two motivations for the renaming:
>
> 1. DataFrame seems to be a more self-evident name than SchemaRDD.
>
> 2. SchemaRDD/DataFrame is actually not going to be an RDD anymore (even
> though it would contain some RDD functions like map, flatMap, etc), and
> calling it Schema*RDD* while it is not an RDD is highly confusing. Instead.
> DataFrame.rdd will return the underlying RDD for all RDD methods.
>
>
> My understanding is that very few users program directly against the
> SchemaRDD API at the moment, because it is not well documented. However,
> to maintain backward compatibility, we can create a type alias named SchemaRDD
> that points to DataFrame. This will maintain source compatibility for
> Scala. That said, we will have to update all existing materials to use
> DataFrame rather than SchemaRDD.
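
For readers wondering what this could look like in code, here is a minimal Scala sketch of the two points above: the old name kept compiling via a type alias, and .rdd exposing the underlying RDD. The package object, deprecation message, and version string are assumptions for illustration, not the final API:

{code}
// Sketch only: keep the old name as an alias of the new type.
package org.apache.spark

package object sql {
  @deprecated("use DataFrame", "1.3.0")
  type SchemaRDD = DataFrame
}

// The 1:1 correspondence: a DataFrame can always hand back its underlying RDD,
// e.g. val rows: RDD[Row] = df.rdd
{code}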

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Updated] (SPARK-4147) Reduce log4j dependency

2015-01-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4147:
---
Fix Version/s: 1.2.1
   1.3.0

> Reduce log4j dependency
> ---
>
> Key: SPARK-4147
> URL: https://issues.apache.org/jira/browse/SPARK-4147
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Tobias Pfeiffer
>Assignee: Sean Owen
> Fix For: 1.3.0, 1.2.1
>
>
> spark-core has a hard dependency on log4j, which shouldn't be necessary since 
> slf4j is used. I tried to exclude slf4j-log4j12 and log4j dependencies in my 
> sbt file.
> Excluding org.slf4j.slf4j-log4j12 works fine if logback is on the classpath. 
> However, removing the log4j dependency fails because in 
> https://github.com/apache/spark/blob/v1.1.0/core/src/main/scala/org/apache/spark/Logging.scala#L121
>  a static method of org.apache.log4j.LogManager is accessed *even if* log4j 
> is not in use.
> I guess removing all dependencies on log4j may be a bigger task, but it would 
> be a great help if the access to LogManager would be done only if log4j use 
> was detected before. (This is a 2-line change.)
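
A minimal Scala sketch of the kind of guard being asked for, checking which backend slf4j is actually bound to before touching log4j. This illustrates the idea; it is not the exact change made in Spark's Logging trait:

{code}
import org.slf4j.impl.StaticLoggerBinder

object Log4jGuard {
  // True only when slf4j is bound to the log4j 1.2 backend, so callers can skip
  // org.apache.log4j.LogManager entirely when another backend (e.g. logback) is in use.
  def log4jInUse: Boolean =
    "org.slf4j.impl.Log4jLoggerFactory" ==
      StaticLoggerBinder.getSingleton.getLoggerFactoryClassStr

  def maybeInitializeLog4j(): Unit = {
    if (log4jInUse) {
      val rootConfigured =
        org.apache.log4j.LogManager.getRootLogger.getAllAppenders.hasMoreElements
      if (!rootConfigured) {
        // Here Spark could load its default log4j.properties; elided in this sketch.
      }
    }
  }
}
{code}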



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-4147) Reduce log4j dependency

2015-01-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-4147.

Resolution: Fixed

> Reduce log4j dependency
> ---
>
> Key: SPARK-4147
> URL: https://issues.apache.org/jira/browse/SPARK-4147
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Tobias Pfeiffer
>Assignee: Sean Owen
> Fix For: 1.3.0, 1.2.1
>
>
> spark-core has a hard dependency on log4j, which shouldn't be necessary since 
> slf4j is used. I tried to exclude slf4j-log4j12 and log4j dependencies in my 
> sbt file.
> Excluding org.slf4j.slf4j-log4j12 works fine if logback is on the classpath. 
> However, removing the log4j dependency fails because in 
> https://github.com/apache/spark/blob/v1.1.0/core/src/main/scala/org/apache/spark/Logging.scala#L121
>  a static method of org.apache.log4j.LogManager is accessed *even if* log4j 
> is not in use.
> I guess removing all dependencies on log4j may be a bigger task, but it would 
> be a great help if the access to LogManager would be done only if log4j use 
> was detected before. (This is a 2-line change.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4147) Reduce log4j dependency

2015-01-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4147:
---
Affects Version/s: 1.2.0

> Reduce log4j dependency
> ---
>
> Key: SPARK-4147
> URL: https://issues.apache.org/jira/browse/SPARK-4147
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Tobias Pfeiffer
>Assignee: Sean Owen
>
> spark-core has a hard dependency on log4j, which shouldn't be necessary since 
> slf4j is used. I tried to exclude slf4j-log4j12 and log4j dependencies in my 
> sbt file.
> Excluding org.slf4j.slf4j-log4j12 works fine if logback is on the classpath. 
> However, removing the log4j dependency fails because in 
> https://github.com/apache/spark/blob/v1.1.0/core/src/main/scala/org/apache/spark/Logging.scala#L121
>  a static method of org.apache.log4j.LogManager is accessed *even if* log4j 
> is not in use.
> I guess removing all dependencies on log4j may be a bigger task, but it would 
> be a great help if the access to LogManager would be done only if log4j use 
> was detected before. (This is a 2-line change.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4147) Reduce log4j dependency

2015-01-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4147:
---
Assignee: Sean Owen

> Reduce log4j dependency
> ---
>
> Key: SPARK-4147
> URL: https://issues.apache.org/jira/browse/SPARK-4147
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Core
>Affects Versions: 1.1.0
>Reporter: Tobias Pfeiffer
>Assignee: Sean Owen
>
> spark-core has a hard dependency on log4j, which shouldn't be necessary since 
> slf4j is used. I tried to exclude slf4j-log4j12 and log4j dependencies in my 
> sbt file.
> Excluding org.slf4j.slf4j-log4j12 works fine if logback is on the classpath. 
> However, removing the log4j dependency fails because in 
> https://github.com/apache/spark/blob/v1.1.0/core/src/main/scala/org/apache/spark/Logging.scala#L121
>  a static method of org.apache.log4j.LogManager is accessed *even if* log4j 
> is not in use.
> I guess removing all dependencies on log4j may be a bigger task, but it would 
> be a great help if the access to LogManager would be done only if log4j use 
> was detected before. (This is a 2-line change.)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5339) build/mvn doesn't work because of invalid URL for maven's tgz.

2015-01-26 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5339.

   Resolution: Fixed
Fix Version/s: 1.3.0
 Assignee: Kousuke Saruta

> build/mvn doesn't work because of invalid URL for maven's tgz.
> --
>
> Key: SPARK-5339
> URL: https://issues.apache.org/jira/browse/SPARK-5339
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Blocker
> Fix For: 1.3.0
>
>
> build/mvn will automatically download tarball of maven. But currently, the 
> URL is invalid. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Upcoming Spark 1.2.1 RC

2015-01-21 Thread Patrick Wendell
Hey All,

I am planning to cut a 1.2.1 RC soon and wanted to notify people.

There are a handful of important fixes in the 1.2.1 branch
(http://s.apache.org/Mpn) particularly for Spark SQL. There was also
an issue publishing some of our artifacts with 1.2.0 and this release
would fix it for downstream projects.

You can track outstanding 1.2.1 blocker issues here at
http://s.apache.org/2v2 - I'm guessing all remaining blocker issues
will be fixed today.

I think we have a good handle on the remaining outstanding fixes, but
please let me know if you think there are severe outstanding fixes
that need to be backported into this branch or are not tracked above.

Thanks!
- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Resolved] (SPARK-3958) Possible stream-corruption issues in TorrentBroadcast

2015-01-21 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-3958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-3958.

  Resolution: Fixed
Target Version/s:   (was: 1.2.1)

At this point I'm not aware of people still hitting this set of issues in newer 
releases, so per discussion with [~joshrosen], I'd like to close this. Please 
comment on this JIRA if you are having some variant of this issue in a newer 
version of Spark, and we'll continue to investigate.

> Possible stream-corruption issues in TorrentBroadcast
> -
>
> Key: SPARK-3958
> URL: https://issues.apache.org/jira/browse/SPARK-3958
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0, 1.2.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Blocker
> Attachments: spark_ex.logs
>
>
> TorrentBroadcast deserialization sometimes fails with decompression errors, 
> which are most likely caused by stream-corruption exceptions.  For example, 
> this can manifest itself as a Snappy PARSING_ERROR when deserializing a 
> broadcasted task:
> {code}
> 14/10/14 17:20:55.016 DEBUG BlockManager: Getting local block broadcast_8
> 14/10/14 17:20:55.016 DEBUG BlockManager: Block broadcast_8 not registered 
> locally
> 14/10/14 17:20:55.016 INFO TorrentBroadcast: Started reading broadcast 
> variable 8
> 14/10/14 17:20:55.017 INFO TorrentBroadcast: Reading broadcast variable 8 
> took 5.3433E-5 s
> 14/10/14 17:20:55.017 ERROR Executor: Exception in task 2.0 in stage 8.0 (TID 
> 18)
> java.io.IOException: PARSING_ERROR(2)
>   at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:84)
>   at org.xerial.snappy.SnappyNative.uncompressedLength(Native Method)
>   at org.xerial.snappy.Snappy.uncompressedLength(Snappy.java:594)
>   at 
> org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:125)
>   at 
> org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
>   at org.xerial.snappy.SnappyInputStream.<init>(SnappyInputStream.java:58)
>   at 
> org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast$.unBlockifyObject(TorrentBroadcast.scala:216)
>   at 
> org.apache.spark.broadcast.TorrentBroadcast.readObject(TorrentBroadcast.scala:170)
>   at sun.reflect.GeneratedMethodAccessor92.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> java.io.ObjectStreamClass.invokeReadObject(ObjectStreamClass.java:1017)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1893)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at 
> java.io.ObjectInputStream.defaultReadFields(ObjectInputStream.java:1990)
>   at java.io.ObjectInputStream.readSerialData(ObjectInputStream.java:1915)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1798)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
>   at 
> org.apache.spark.serializer.JavaSerializerInstance.deserialize(JavaSerializer.scala:87)
>   at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:164)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
>   at java.lang.Thread.run(Thread.java:745)
> {code}
> SPARK-3630 is an umbrella ticket for investigating all causes of these Kryo 
> and Snappy deserialization errors.  This ticket is for a more 
> narrowly-focused exploration of the TorrentBroadcast version of these errors, 
> since the similar errors that we've seen in sort-based shuffle seem to be 
> explained by a different cause (see SPARK-3948).



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-4105) FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based shuffle

2015-01-21 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-4105.

  Resolution: Fixed
Target Version/s:   (was: 1.2.1)

At this point I'm not aware of people still hitting this set of issues in newer 
releases, so per discussion with [~joshrosen], I'd like to close this. Please 
comment on this JIRA if you are having some variant of this issue in a newer 
version of Spark, and we'll continue to investigate.

> FAILED_TO_UNCOMPRESS(5) errors when fetching shuffle data with sort-based 
> shuffle
> -
>
> Key: SPARK-4105
> URL: https://issues.apache.org/jira/browse/SPARK-4105
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 1.2.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Blocker
>
> We have seen non-deterministic {{FAILED_TO_UNCOMPRESS(5)}} errors during 
> shuffle read.  Here's a sample stacktrace from an executor:
> {code}
> 14/10/23 18:34:11 ERROR Executor: Exception in task 1747.3 in stage 11.0 (TID 
> 33053)
> java.io.IOException: FAILED_TO_UNCOMPRESS(5)
>   at org.xerial.snappy.SnappyNative.throw_error(SnappyNative.java:78)
>   at org.xerial.snappy.SnappyNative.rawUncompress(Native Method)
>   at org.xerial.snappy.Snappy.rawUncompress(Snappy.java:391)
>   at org.xerial.snappy.Snappy.uncompress(Snappy.java:427)
>   at 
> org.xerial.snappy.SnappyInputStream.readFully(SnappyInputStream.java:127)
>   at 
> org.xerial.snappy.SnappyInputStream.readHeader(SnappyInputStream.java:88)
>   at org.xerial.snappy.SnappyInputStream.<init>(SnappyInputStream.java:58)
>   at 
> org.apache.spark.io.SnappyCompressionCodec.compressedInputStream(CompressionCodec.scala:128)
>   at 
> org.apache.spark.storage.BlockManager.wrapForCompression(BlockManager.scala:1090)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:116)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator$$anon$1$$anonfun$onBlockFetchSuccess$1.apply(ShuffleBlockFetcherIterator.scala:115)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:243)
>   at 
> org.apache.spark.storage.ShuffleBlockFetcherIterator.next(ShuffleBlockFetcherIterator.scala:52)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.util.CompletionIterator.hasNext(CompletionIterator.scala:30)
>   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:39)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:327)
>   at 
> org.apache.spark.util.collection.ExternalAppendOnlyMap.insertAll(ExternalAppendOnlyMap.scala:129)
>   at 
> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:159)
>   at 
> org.apache.spark.rdd.CoGroupedRDD$$anonfun$compute$5.apply(CoGroupedRDD.scala:158)
>   at 
> scala.collection.TraversableLike$WithFilter$$anonfun$foreach$1.apply(TraversableLike.scala:772)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at 
> scala.collection.TraversableLike$WithFilter.foreach(TraversableLike.scala:771)
>   at org.apache.spark.rdd.CoGroupedRDD.compute(CoGroupedRDD.scala:158)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at 
> org.apache.spark.rdd.MappedValuesRDD.compute(MappedValuesRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at 
> org.apache.spark.rdd.FlatMappedValuesRDD.compute(FlatMappedValuesRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at org.apache.spark.rdd.MappedRDD.compute(MappedRDD.scala:31)
>   at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:262)
>   at org.apache.spark.rdd.RDD.iterator(RDD.scala:229)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.scala:68)
>   at 
> org.apache.spark.scheduler.ShuffleMapTask.runTask(ShuffleMapTask.

[jira] [Commented] (SPARK-4939) Python updateStateByKey example hang in local mode

2015-01-21 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286234#comment-14286234
 ] 

Patrick Wendell commented on SPARK-4939:


[~tdas] [~davies] [~kayousterhout] Because there is still discussion about 
this, and this is modifying a very complex component in Spark, I'm not going to 
block on this for 1.2.1. Once we merge a patch we can decide whether to put it 
into 1.2 based on what the final patch looks like. It is definitely 
inconvenient that this doesn't work in local mode, but much less of a problem 
than introducing a bug in the scheduler for production cluster workloads.

As a workaround we could suggest running this example with local-cluster.
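
For reference, "local-cluster" here means the master URL form local-cluster[numWorkers,coresPerWorker,memoryPerWorkerMB], which launches a miniature standalone-style cluster instead of plain local mode. A small Scala sketch of pointing a context at it; the example in this JIRA is a Python streaming job, so this only illustrates the master URL, and the worker count and memory values are arbitrary:

{code}
import org.apache.spark.{SparkConf, SparkContext}

object LocalClusterExample {
  def main(args: Array[String]): Unit = {
    // Unlike "local[*]", local-cluster launches separate executor processes,
    // which is closer to how the example behaves on a real cluster.
    val conf = new SparkConf()
      .setAppName("updateStateByKey-workaround")
      .setMaster("local-cluster[2,1,512]")
    val sc = new SparkContext(conf)
    try {
      // ... run the example's logic here ...
    } finally {
      sc.stop()
    }
  }
}
{code}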

> Python updateStateByKey example hang in local mode
> --
>
> Key: SPARK-4939
> URL: https://issues.apache.org/jira/browse/SPARK-4939
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core, Streaming
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Davies Liu
>Assignee: Davies Liu
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4939) Python updateStateByKey example hang in local mode

2015-01-21 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4939:
---
Target Version/s: 1.3.0  (was: 1.3.0, 1.2.1)

> Python updateStateByKey example hang in local mode
> --
>
> Key: SPARK-4939
> URL: https://issues.apache.org/jira/browse/SPARK-4939
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core, Streaming
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Davies Liu
>Assignee: Davies Liu
>Priority: Blocker
>




--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5275) pyspark.streaming is not included in assembly jar

2015-01-21 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5275:
---
Fix Version/s: 1.2.1
   1.3.0

> pyspark.streaming is not included in assembly jar
> -
>
> Key: SPARK-5275
> URL: https://issues.apache.org/jira/browse/SPARK-5275
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Davies Liu
>Assignee: Davies Liu
>Priority: Blocker
> Fix For: 1.3.0, 1.2.1
>
>
> The pyspark.streaming is not included in assembly jar of spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Standardized Spark dev environment

2015-01-21 Thread Patrick Wendell
Yep,

I think it's only useful (and likely to be maintained) if we actually
use this on Jenkins, so that was my proposal: basically, give people a
Dockerfile so they can understand exactly what versions of everything
we use for our reference build. And if they don't want to use Docker
directly, it will at least serve as an up-to-date list of
packages/versions they should try to install locally in whatever
environment they have.

- Patrick

On Wed, Jan 21, 2015 at 5:42 AM, Will Benton  wrote:
> - Original Message -----
>> From: "Patrick Wendell" 
>> To: "Sean Owen" 
>> Cc: "dev" , "jay vyas" , 
>> "Paolo Platter"
>> , "Nicholas Chammas" 
>> , "Will Benton" 
>> Sent: Wednesday, January 21, 2015 2:09:35 AM
>> Subject: Re: Standardized Spark dev environment
>
>> But the issue is when users can't reproduce Jenkins failures.
>
> Yeah, to answer Sean's question, this was part of the problem I was trying to 
> solve.  The other part was teasing out differences between the Fedora Java 
> environment and a more conventional Java environment.  I agree with Sean (and 
> I think this is your suggestion as well, Patrick) that turning the environment 
> Jenkins runs in into a standard image that is available for public consumption 
> would be useful in general.
>
>
>
> best,
> wb

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



Re: Standardized Spark dev environment

2015-01-21 Thread Patrick Wendell
> If the goal is a reproducible test environment then I think that is what
> Jenkins is. Granted you can only ask it for a test. But presumably you get
> the same result if you start from the same VM image as Jenkins and run the
> same steps.

But the issue is when users can't reproduce Jenkins failures. We don't
publish anywhere the exact set of packages and versions that is
installed on Jenkins, and it can change since it's shared
infrastructure with other projects. So why not publish this manifest
as a Dockerfile and then have it run on Jenkins using that image? My
point is that this "VM image + steps" is not public anywhere.

> I bet it is not hard to set up and maintain. I bet it is easier than a VM.
> But unless Jenkins is using it aren't we just making another different
> standard build env in an effort to standardize? If it is not the same then
> it loses value as being exactly the same as the reference build env. Has a
> problem come up that this solves?

Right now the reference build env is an AMI I created and keep adding
stuff to when Spark gets new dependencies (e.g. the version of Ruby we
need to create the docs, new Python stats libraries, etc.). So if we
had a Docker image, I would use that for making the RCs as well, and
it could serve as a definitive reference for people who want to
understand exactly what set of things they need to build Spark.

>
> If the goal is just easing developer set up then what does a Docker image do
> - what does it set up for me? I don't know of stuff I need set up on OS X
> for me beyond the IDE.

There are actually a good number of packages you need to do a full
build of Spark, including a compliant Python version, a Java version,
certain Python packages, and the Ruby and Jekyll tooling for the docs
(mentioned a bit earlier).

- Patrick

-
To unsubscribe, e-mail: dev-unsubscr...@spark.apache.org
For additional commands, e-mail: dev-h...@spark.apache.org



[jira] [Updated] (SPARK-5297) File Streams do not work with custom key/values

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5297:
---
Assignee: Saisai Shao

> File Streams do not work with custom key/values
> ---
>
> Key: SPARK-5297
> URL: https://issues.apache.org/jira/browse/SPARK-5297
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.2.0
>Reporter: Leonidas Fegaras
>Assignee: Saisai Shao
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> The following code:
> {code}
> stream_context.<K,V,F>fileStream(directory)
> .foreachRDD(new Function<JavaPairRDD<K,V>,Void>() {
>  public Void call ( JavaPairRDD<K,V> rdd ) throws Exception {
>  for ( Tuple2<K,V> x: rdd.collect() )
>  System.out.println("# "+x._1+" "+x._2);
>  return null;
>  }
>   });
> stream_context.start();
> stream_context.awaitTermination();
> {code}
> for custom (serializable) classes K and V compiles fine but gives an error
> when I drop a new hadoop sequence file in the directory:
> {quote}
> 15/01/17 09:13:59 ERROR scheduler.JobScheduler: Error generating jobs for 
> time 1421507639000 ms
> java.lang.ClassCastException: java.lang.Object cannot be cast to 
> org.apache.hadoop.mapreduce.InputFormat
>   at 
> org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:91)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
>   at 
> org.apache.spark.streaming.dstream.FileInputDStream$$anonfun$3.apply(FileInputDStream.scala:236)
>   at 
> org.apache.spark.streaming.dstream.FileInputDStream$$anonfun$3.apply(FileInputDStream.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.streaming.dstream.FileInputDStream.org$apache$spark$streaming$dstream$FileInputDStream$$filesToRDD(FileInputDStream.scala:234)
>   at 
> org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:128)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:296)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:288)
>   at scala.Option.orElse(Option.scala:257)
> {quote}
> The same classes K and V work fine for non-streaming Spark:
> {code}
> spark_context.newAPIHadoopFile(path,F.class,K.class,SequenceFileInputFormat.class,conf)
> {code}
> also streaming works fine for TextFileInputFormat.
> The issue is that class manifests are erased to object in the Java file 
> stream constructor, but those are relied on downstream when creating the 
> Hadoop RDD that backs each batch of the file stream.
> https://github.com/apache/spark/blob/v1.2.0/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala#L263
> https://github.com/apache/spark/blob/v1.2.0/core/src/main/scala/org/apache/spark/SparkContext.scala#L753
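
To make the erasure point concrete, here is a self-contained Scala sketch (toy code, not the real JavaStreamingContext internals): when a wrapper only carries a ClassTag for Any/Object instead of the caller's concrete types, everything downstream that asks for the runtime class sees Object, which matches the cast failure in the trace above.

{code}
import scala.reflect.ClassTag

// Toy helper that reports the runtime class carried by its ClassTag.
def runtimeClassOf[T](implicit ct: ClassTag[T]): Class[_] = ct.runtimeClass

// When the caller's type is known, the tag carries the real class:
println(runtimeClassOf[String])   // prints: class java.lang.String

// When a Java-facing wrapper supplies no real tag and falls back to Any,
// the "class" seen downstream is just Object:
println(runtimeClassOf[Any])      // prints: class java.lang.Object
{code}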



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5297) File Streams do not work with custom key/values

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5297:
---
Fix Version/s: 1.3.0

> File Streams do not work with custom key/values
> ---
>
> Key: SPARK-5297
> URL: https://issues.apache.org/jira/browse/SPARK-5297
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.2.0
>Reporter: Leonidas Fegaras
>Assignee: Saisai Shao
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> The following code:
> {code}
> stream_context.<K,V,F>fileStream(directory)
> .foreachRDD(new Function<JavaPairRDD<K,V>,Void>() {
>  public Void call ( JavaPairRDD<K,V> rdd ) throws Exception {
>  for ( Tuple2<K,V> x: rdd.collect() )
>  System.out.println("# "+x._1+" "+x._2);
>  return null;
>  }
>   });
> stream_context.start();
> stream_context.awaitTermination();
> {code}
> for custom (serializable) classes K and V compiles fine but gives an error
> when I drop a new hadoop sequence file in the directory:
> {quote}
> 15/01/17 09:13:59 ERROR scheduler.JobScheduler: Error generating jobs for 
> time 1421507639000 ms
> java.lang.ClassCastException: java.lang.Object cannot be cast to 
> org.apache.hadoop.mapreduce.InputFormat
>   at 
> org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:91)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
>   at 
> org.apache.spark.streaming.dstream.FileInputDStream$$anonfun$3.apply(FileInputDStream.scala:236)
>   at 
> org.apache.spark.streaming.dstream.FileInputDStream$$anonfun$3.apply(FileInputDStream.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.streaming.dstream.FileInputDStream.org$apache$spark$streaming$dstream$FileInputDStream$$filesToRDD(FileInputDStream.scala:234)
>   at 
> org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:128)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:296)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:288)
>   at scala.Option.orElse(Option.scala:257)
> {quote}
> The same classes K and V work fine for non-streaming Spark:
> {code}
> spark_context.newAPIHadoopFile(path,F.class,K.class,SequenceFileInputFormat.class,conf)
> {code}
> also streaming works fine for TextFileInputFormat.
> The issue is that class manifests are erased to object in the Java file 
> stream constructor, but those are relied on downstream when creating the 
> Hadoop RDD that backs each batch of the file stream.
> https://github.com/apache/spark/blob/v1.2.0/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala#L263
> https://github.com/apache/spark/blob/v1.2.0/core/src/main/scala/org/apache/spark/SparkContext.scala#L753



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5297) File Streams do not work with custom key/values

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5297:
---
Labels: backport-needed  (was: )

> File Streams do not work with custom key/values
> ---
>
> Key: SPARK-5297
> URL: https://issues.apache.org/jira/browse/SPARK-5297
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.2.0
>Reporter: Leonidas Fegaras
>Assignee: Saisai Shao
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> The following code:
> {code}
> stream_context.<K,V,F>fileStream(directory)
> .foreachRDD(new Function<JavaPairRDD<K,V>,Void>() {
>  public Void call ( JavaPairRDD<K,V> rdd ) throws Exception {
>  for ( Tuple2<K,V> x: rdd.collect() )
>  System.out.println("# "+x._1+" "+x._2);
>  return null;
>  }
>   });
> stream_context.start();
> stream_context.awaitTermination();
> {code}
> for custom (serializable) classes K and V compiles fine but gives an error
> when I drop a new hadoop sequence file in the directory:
> {quote}
> 15/01/17 09:13:59 ERROR scheduler.JobScheduler: Error generating jobs for 
> time 1421507639000 ms
> java.lang.ClassCastException: java.lang.Object cannot be cast to 
> org.apache.hadoop.mapreduce.InputFormat
>   at 
> org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:91)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
>   at 
> org.apache.spark.streaming.dstream.FileInputDStream$$anonfun$3.apply(FileInputDStream.scala:236)
>   at 
> org.apache.spark.streaming.dstream.FileInputDStream$$anonfun$3.apply(FileInputDStream.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.streaming.dstream.FileInputDStream.org$apache$spark$streaming$dstream$FileInputDStream$$filesToRDD(FileInputDStream.scala:234)
>   at 
> org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:128)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:296)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:288)
>   at scala.Option.orElse(Option.scala:257)
> {quote}
> The same classes K and V work fine for non-streaming Spark:
> {code}
> spark_context.newAPIHadoopFile(path,F.class,K.class,SequenceFileInputFormat.class,conf)
> {code}
> also streaming works fine for TextFileInputFormat.
> The issue is that class manifests are erased to object in the Java file 
> stream constructor, but those are relied on downstream when creating the 
> Hadoop RDD that backs each batch of the file stream.
> https://github.com/apache/spark/blob/v1.2.0/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala#L263
> https://github.com/apache/spark/blob/v1.2.0/core/src/main/scala/org/apache/spark/SparkContext.scala#L753



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5297) File Streams do not work with custom key/values

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5297:
---
Target Version/s: 1.3.0, 1.2.1

> File Streams do not work with custom key/values
> ---
>
> Key: SPARK-5297
> URL: https://issues.apache.org/jira/browse/SPARK-5297
> Project: Spark
>  Issue Type: Bug
>  Components: Streaming
>Affects Versions: 1.2.0
>Reporter: Leonidas Fegaras
>Assignee: Saisai Shao
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> The following code:
> {code}
> stream_context.<K,V,F>fileStream(directory)
> .foreachRDD(new Function<JavaPairRDD<K,V>,Void>() {
>  public Void call ( JavaPairRDD<K,V> rdd ) throws Exception {
>  for ( Tuple2<K,V> x: rdd.collect() )
>  System.out.println("# "+x._1+" "+x._2);
>  return null;
>  }
>   });
> stream_context.start();
> stream_context.awaitTermination();
> {code}
> for custom (serializable) classes K and V compiles fine but gives an error
> when I drop a new hadoop sequence file in the directory:
> {quote}
> 15/01/17 09:13:59 ERROR scheduler.JobScheduler: Error generating jobs for 
> time 1421507639000 ms
> java.lang.ClassCastException: java.lang.Object cannot be cast to 
> org.apache.hadoop.mapreduce.InputFormat
>   at 
> org.apache.spark.rdd.NewHadoopRDD.getPartitions(NewHadoopRDD.scala:91)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:205)
>   at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:203)
>   at scala.Option.getOrElse(Option.scala:120)
>   at org.apache.spark.rdd.RDD.partitions(RDD.scala:203)
>   at 
> org.apache.spark.streaming.dstream.FileInputDStream$$anonfun$3.apply(FileInputDStream.scala:236)
>   at 
> org.apache.spark.streaming.dstream.FileInputDStream$$anonfun$3.apply(FileInputDStream.scala:234)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:34)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.streaming.dstream.FileInputDStream.org$apache$spark$streaming$dstream$FileInputDStream$$filesToRDD(FileInputDStream.scala:234)
>   at 
> org.apache.spark.streaming.dstream.FileInputDStream.compute(FileInputDStream.scala:128)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:296)
>   at 
> org.apache.spark.streaming.dstream.DStream$$anonfun$getOrCompute$1.apply(DStream.scala:288)
>   at scala.Option.orElse(Option.scala:257)
> {quote}
> The same classes K and V work fine for non-streaming Spark:
> {code}
> spark_context.newAPIHadoopFile(path,F.class,K.class,SequenceFileInputFormat.class,conf)
> {code}
> also streaming works fine for TextFileInputFormat.
> The issue is that class manifests are erased to object in the Java file 
> stream constructor, but those are relied on downstream when creating the 
> Hadoop RDD that backs each batch of the file stream.
> https://github.com/apache/spark/blob/v1.2.0/streaming/src/main/scala/org/apache/spark/streaming/api/java/JavaStreamingContext.scala#L263
> https://github.com/apache/spark/blob/v1.2.0/core/src/main/scala/org/apache/spark/SparkContext.scala#L753



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



Re: Standardized Spark dev environment

2015-01-20 Thread Patrick Wendell
To respond to the original suggestion by Nick: I always thought it
would be useful to have a Docker image on which we run the tests and
build releases, so that we could have a consistent environment that
other packagers or people trying to exhaustively run Spark tests could
replicate (or at least look at) to understand exactly how we recommend
building Spark. Sean - do you think that is too high an overhead?

In terms of providing images that we encourage as standard deployment
images of Spark and want to make portable across environments, that's
a much larger project and one with higher associated maintenance
overhead. So I'd be interested in seeing that evolve as its own
project (spark-deploy) or something associated with Bigtop, etc.

- Patrick

On Tue, Jan 20, 2015 at 10:30 PM, Paolo Platter
 wrote:
> Hi all,
> I also tried the docker way and it works well.
> I suggest to look at sequenceiq/spark dockers, they are very active on that 
> field.
>
> Paolo
>
> Sent from my Windows Phone
> 
> From: jay vyas
> Sent: 21/01/2015 04:45
> To: Nicholas Chammas
> Cc: Will Benton; Spark dev 
> list
> Subject: Re: Standardized Spark dev environment
>
> I can comment on both... hi Will and Nate :)
>
> 1) Will's Dockerfile solution is the simplest, most direct solution to the
> dev environment question: it's an efficient way to build and develop Spark
> environments for dev/test. It would be cool to put that Dockerfile
> (and/or maybe a shell script which uses it) in the top level of Spark as
> the build entry point. For total platform portability, you could wrap it in a
> Vagrantfile to launch a lightweight VM, so that Windows worked equally
> well.
>
> 2) However, since Nate mentioned Vagrant and Bigtop, I have to chime in :)
> The Vagrant recipes in Bigtop are a nice reference deployment of how to
> deploy Spark in a heterogeneous Hadoop-style environment, and tighter
> integration testing with Bigtop for Spark releases would be lovely! The
> Vagrant stuff uses Puppet to deploy an n-node VM or Docker-based cluster, in
> which users can easily select components (including
> Spark, YARN, HBase, Hadoop, etc...) by simply editing a YAML file:
> https://github.com/apache/bigtop/blob/master/bigtop-deploy/vm/vagrant-puppet/vagrantconfig.yaml
> As Nate said, it would be a lot of fun to get more cross-collaboration
> between the Spark and Bigtop communities. Input on how we can better
> integrate Spark (whether it's Spork, HBase integration, smoke tests around
> the MLlib stuff, or whatever) is always welcome.
>
>
>
>
>
>
> On Tue, Jan 20, 2015 at 10:21 PM, Nicholas Chammas <
> nicholas.cham...@gmail.com> wrote:
>
>> How many profiles (hadoop / hive /scala) would this development environment
>> support ?
>>
>> As many as we want. We probably want to cover a good chunk of the build
>> matrix  that Spark
>> officially supports.
>>
>> What does this provide, concretely?
>>
>> It provides a reliable way to create a "good" Spark development
>> environment. Roughly speaking, this probably should mean an environment
>> that matches Jenkins, since that's where we run "official" testing and
>> builds.
>>
>> For example, Spark has to run on Java 6 and Python 2.6. When devs build and
>> run Spark locally, we can make sure they're doing it on these versions of
>> the languages with a simple vagrant up.
>>
>> Nate, could you comment on how something like this would relate to the
>> Bigtop effort?
>>
>> http://chapeau.freevariable.com/2014/08/jvm-test-docker.html
>>
>> Will, that's pretty sweet. I tried something similar a few months ago as an
>> experiment to try building/testing Spark within a container. Here's the
>> shell script I used
>> against the base CentOS Docker image to set up an environment ready to build
>> and test Spark.
>>
>> We want to run Spark unit tests within containers on Jenkins, so it might
>> make sense to develop a single Docker image that can be used as both a "dev
>> environment" as well as execution container on Jenkins.
>>
>> Perhaps that's the approach to take instead of looking into Vagrant.
>>
>> Nick
>>
>> On Tue Jan 20 2015 at 8:22:41 PM Will Benton  wrote:
>>
>> Hey Nick,
>> >
>> > I did something similar with a Docker image last summer; I haven't
>> updated
>> > the images to cache the dependencies for the current Spark master, but it
>> > would be trivial to do so:
>> >
>> > http://chapeau.freevariable.com/2014/08/jvm-test-docker.html
>> >
>> >
>> > best,
>> > wb
>> >
>> >
>> > - Original Message -
>> > > From: "Nicholas Chammas" 
>> > > To: "Spark dev list" 
>> > > Sent: Tuesday, January 20, 2015 6:13:31 PM
>> > > Subject: Standardized Spark dev environment
>> > >
>> > > What do y'all think of creating a standardized Spark dev

[jira] [Updated] (SPARK-4296) Throw "Expression not in GROUP BY" when using same expression in group by clause and select clause

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4296:
---
Fix Version/s: 1.2.1
   1.3.0

> Throw "Expression not in GROUP BY" when using same expression in group by 
> clause and  select clause
> ---
>
> Key: SPARK-4296
> URL: https://issues.apache.org/jira/browse/SPARK-4296
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
>Reporter: Shixiong Zhu
>Assignee: Cheng Lian
>Priority: Blocker
> Fix For: 1.3.0, 1.2.1
>
>
> When the input data has a complex structure, using same expression in group 
> by clause and  select clause will throw "Expression not in GROUP BY".
> {code:java}
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.createSchemaRDD
> case class Birthday(date: String)
> case class Person(name: String, birthday: Birthday)
> val people = sc.parallelize(List(Person("John", Birthday("1990-01-22")), 
> Person("Jim", Birthday("1980-02-28"
> people.registerTempTable("people")
> val year = sqlContext.sql("select count(*), upper(birthday.date) from people 
> group by upper(birthday.date)")
> year.collect
> {code}
> Here is the plan of year:
> {code:java}
> SchemaRDD[3] at RDD at SchemaRDD.scala:105
> == Query Plan ==
> == Physical Plan ==
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression 
> not in GROUP BY: Upper(birthday#1.date AS date#9) AS c1#3, tree:
> Aggregate [Upper(birthday#1.date)], [COUNT(1) AS c0#2L,Upper(birthday#1.date 
> AS date#9) AS c1#3]
>  Subquery people
>   LogicalRDD [name#0,birthday#1], MapPartitionsRDD[1] at mapPartitions at 
> ExistingRDD.scala:36
> {code}
> The bug is the equality test for `Upper(birthday#1.date)` and 
> `Upper(birthday#1.date AS date#9)`.
> Maybe Spark SQL needs a mechanism to compare Alias expression and non-Alias 
> expression.
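
A self-contained sketch of the alias-insensitive comparison being suggested (toy case classes standing in for Catalyst expressions, not the real API): stripping aliases before comparing makes the group-by expression and the aliased select expression equal.

{code}
// Toy expression tree, not Catalyst's classes.
sealed trait Expr
case class Attr(name: String) extends Expr
case class Upper(child: Expr) extends Expr
case class Alias(child: Expr, name: String) extends Expr

// Recursively drop Alias nodes so comparison ignores naming.
def stripAliases(e: Expr): Expr = e match {
  case Alias(child, _) => stripAliases(child)
  case Upper(child)    => Upper(stripAliases(child))
  case other           => other
}

val groupByExpr = Upper(Attr("birthday.date"))
val selectExpr  = Alias(Upper(Alias(Attr("birthday.date"), "date#9")), "c1#3")

println(groupByExpr == selectExpr)                             // false: naive equality
println(stripAliases(groupByExpr) == stripAliases(selectExpr)) // true: alias-insensitive
{code}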



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-4296) Throw "Expression not in GROUP BY" when using same expression in group by clause and select clause

2015-01-20 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285263#comment-14285263
 ] 

Patrick Wendell commented on SPARK-4296:


Note this was fixed in https://github.com/apache/spark/pull/3987 in the 1.2 
branch (per discussion with [~lian cheng]).

> Throw "Expression not in GROUP BY" when using same expression in group by 
> clause and  select clause
> ---
>
> Key: SPARK-4296
> URL: https://issues.apache.org/jira/browse/SPARK-4296
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.1.0, 1.1.1, 1.2.0
>Reporter: Shixiong Zhu
>Assignee: Cheng Lian
>Priority: Blocker
>
> When the input data has a complex structure, using same expression in group 
> by clause and  select clause will throw "Expression not in GROUP BY".
> {code:java}
> val sqlContext = new org.apache.spark.sql.SQLContext(sc)
> import sqlContext.createSchemaRDD
> case class Birthday(date: String)
> case class Person(name: String, birthday: Birthday)
> val people = sc.parallelize(List(Person("John", Birthday("1990-01-22")), 
> Person("Jim", Birthday("1980-02-28"
> people.registerTempTable("people")
> val year = sqlContext.sql("select count(*), upper(birthday.date) from people 
> group by upper(birthday.date)")
> year.collect
> {code}
> Here is the plan of year:
> {code:java}
> SchemaRDD[3] at RDD at SchemaRDD.scala:105
> == Query Plan ==
> == Physical Plan ==
> org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Expression 
> not in GROUP BY: Upper(birthday#1.date AS date#9) AS c1#3, tree:
> Aggregate [Upper(birthday#1.date)], [COUNT(1) AS c0#2L,Upper(birthday#1.date 
> AS date#9) AS c1#3]
>  Subquery people
>   LogicalRDD [name#0,birthday#1], MapPartitionsRDD[1] at mapPartitions at 
> ExistingRDD.scala:36
> {code}
> The bug is the equality test for `Upper(birthday#1.date)` and 
> `Upper(birthday#1.date AS date#9)`.
> Maybe Spark SQL needs a mechanism to compare Alias expression and non-Alias 
> expression.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Issue Comment Deleted] (SPARK-4959) Attributes are case sensitive when using a select query from a projection

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4959:
---
Comment: was deleted

(was: Note that in the 1.2 branch this was fixed by 
https://github.com/apache/spark/pull/3987 (per discussion with [~lian cheng]).)

> Attributes are case sensitive when using a select query from a projection
> -
>
> Key: SPARK-4959
> URL: https://issues.apache.org/jira/browse/SPARK-4959
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Andy Konwinski
>Assignee: Cheng Hao
>Priority: Blocker
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> Per [~marmbrus], see this line of code, where we should be using an attribute 
> map
>  
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L147
> To reproduce, i ran the following in the Spark shell:
> {code}
> import sqlContext._
> sql("drop table if exists test")
> sql("create table test (col1 string)")
> sql("""insert into table test select "hi" from prejoined limit 1""")
> val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: 
> "col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
> sqlContext.table("test").select(projection:_*).registerTempTable("test2")
> # This succeeds.
> sql("select CaseSensitiveColName from test2").first()
> # This fails with java.util.NoSuchElementException: key not found: 
> casesensitivecolname#23046
> sql("select casesensitivecolname from test2").first()
> {code}
> The full stack trace printed for the final command that is failing: 
> {code}
> java.util.NoSuchElementException: key not found: casesensitivecolname#23046
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29)
>   at scala.collection.MapLike$class.apply(MapLike.scala:141)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:57)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:378)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:217)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>   at 
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:285)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(S

[jira] [Commented] (SPARK-4959) Attributes are case sensitive when using a select query from a projection

2015-01-20 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285262#comment-14285262
 ] 

Patrick Wendell commented on SPARK-4959:


Excuse my last comment, it was on the wrong JIRA.

> Attributes are case sensitive when using a select query from a projection
> -
>
> Key: SPARK-4959
> URL: https://issues.apache.org/jira/browse/SPARK-4959
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Andy Konwinski
>Assignee: Cheng Hao
>Priority: Blocker
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> Per [~marmbrus], see this line of code, where we should be using an attribute 
> map
>  
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L147
> To reproduce, i ran the following in the Spark shell:
> {code}
> import sqlContext._
> sql("drop table if exists test")
> sql("create table test (col1 string)")
> sql("""insert into table test select "hi" from prejoined limit 1""")
> val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: 
> "col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
> sqlContext.table("test").select(projection:_*).registerTempTable("test2")
> # This succeeds.
> sql("select CaseSensitiveColName from test2").first()
> # This fails with java.util.NoSuchElementException: key not found: 
> casesensitivecolname#23046
> sql("select casesensitivecolname from test2").first()
> {code}
> The full stack trace printed for the final command that is failing: 
> {code}
> java.util.NoSuchElementException: key not found: casesensitivecolname#23046
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29)
>   at scala.collection.MapLike$class.apply(MapLike.scala:141)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:57)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:378)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:217)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>   at 
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:285)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
>   at 
> org

[jira] [Updated] (SPARK-4959) Attributes are case sensitive when using a select query from a projection

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4959:
---
Fix Version/s: (was: 1.2.1)

> Attributes are case sensitive when using a select query from a projection
> -
>
> Key: SPARK-4959
> URL: https://issues.apache.org/jira/browse/SPARK-4959
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Andy Konwinski
>Assignee: Cheng Hao
>Priority: Blocker
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> Per [~marmbrus], see this line of code, where we should be using an attribute 
> map
>  
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L147
> To reproduce, i ran the following in the Spark shell:
> {code}
> import sqlContext._
> sql("drop table if exists test")
> sql("create table test (col1 string)")
> sql("""insert into table test select "hi" from prejoined limit 1""")
> val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: 
> "col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
> sqlContext.table("test").select(projection:_*).registerTempTable("test2")
> # This succeeds.
> sql("select CaseSensitiveColName from test2").first()
> # This fails with java.util.NoSuchElementException: key not found: 
> casesensitivecolname#23046
> sql("select casesensitivecolname from test2").first()
> {code}
> The full stack trace printed for the final command that is failing: 
> {code}
> java.util.NoSuchElementException: key not found: casesensitivecolname#23046
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29)
>   at scala.collection.MapLike$class.apply(MapLike.scala:141)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:57)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:378)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:217)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>   at 
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:285)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)

[jira] [Comment Edited] (SPARK-4959) Attributes are case sensitive when using a select query from a projection

2015-01-20 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285258#comment-14285258
 ] 

Patrick Wendell edited comment on SPARK-4959 at 1/21/15 6:47 AM:
-

Note that in the 1.2 branch this was fixed by 
https://github.com/apache/spark/pull/3987 (per discussion with [~lian cheng]).


was (Author: pwendell):
Note that in the 1.2 branch this was fixed by 
https://github.com/apache/spark/pull/3987 (per discussion with @cheng lian).

> Attributes are case sensitive when using a select query from a projection
> -
>
> Key: SPARK-4959
> URL: https://issues.apache.org/jira/browse/SPARK-4959
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Andy Konwinski
>Assignee: Cheng Hao
>Priority: Blocker
>  Labels: backport-needed
> Fix For: 1.3.0, 1.2.1
>
>
> Per [~marmbrus], see this line of code, where we should be using an attribute 
> map
>  
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L147
> To reproduce, i ran the following in the Spark shell:
> {code}
> import sqlContext._
> sql("drop table if exists test")
> sql("create table test (col1 string)")
> sql("""insert into table test select "hi" from prejoined limit 1""")
> val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: 
> "col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
> sqlContext.table("test").select(projection:_*).registerTempTable("test2")
> # This succeeds.
> sql("select CaseSensitiveColName from test2").first()
> # This fails with java.util.NoSuchElementException: key not found: 
> casesensitivecolname#23046
> sql("select casesensitivecolname from test2").first()
> {code}
> The full stack trace printed for the final command that is failing: 
> {code}
> java.util.NoSuchElementException: key not found: casesensitivecolname#23046
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29)
>   at scala.collection.MapLike$class.apply(MapLike.scala:141)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:57)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:378)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:217)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>   at 
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:285)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecu

[jira] [Updated] (SPARK-4959) Attributes are case sensitive when using a select query from a projection

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4959:
---
Fix Version/s: 1.2.1

> Attributes are case sensitive when using a select query from a projection
> -
>
> Key: SPARK-4959
> URL: https://issues.apache.org/jira/browse/SPARK-4959
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Andy Konwinski
>Assignee: Cheng Hao
>Priority: Blocker
>  Labels: backport-needed
> Fix For: 1.3.0, 1.2.1
>
>
> Per [~marmbrus], see this line of code, where we should be using an attribute 
> map
>  
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L147
> To reproduce, i ran the following in the Spark shell:
> {code}
> import sqlContext._
> sql("drop table if exists test")
> sql("create table test (col1 string)")
> sql("""insert into table test select "hi" from prejoined limit 1""")
> val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: 
> "col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
> sqlContext.table("test").select(projection:_*).registerTempTable("test2")
> # This succeeds.
> sql("select CaseSensitiveColName from test2").first()
> # This fails with java.util.NoSuchElementException: key not found: 
> casesensitivecolname#23046
> sql("select casesensitivecolname from test2").first()
> {code}
> The full stack trace printed for the final command that is failing: 
> {code}
> java.util.NoSuchElementException: key not found: casesensitivecolname#23046
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29)
>   at scala.collection.MapLike$class.apply(MapLike.scala:141)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:57)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:378)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:217)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>   at 
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:285)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)

[jira] [Commented] (SPARK-4959) Attributes are case sensitive when using a select query from a projection

2015-01-20 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285258#comment-14285258
 ] 

Patrick Wendell commented on SPARK-4959:


Note that in the 1.2 branch this was fixed by 
https://github.com/apache/spark/pull/3987 (per discussion with @cheng lian).

> Attributes are case sensitive when using a select query from a projection
> -
>
> Key: SPARK-4959
> URL: https://issues.apache.org/jira/browse/SPARK-4959
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Andy Konwinski
>Assignee: Cheng Hao
>Priority: Blocker
>  Labels: backport-needed
> Fix For: 1.3.0, 1.2.1
>
>
> Per [~marmbrus], see this line of code, where we should be using an attribute 
> map
>  
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L147
> To reproduce, i ran the following in the Spark shell:
> {code}
> import sqlContext._
> sql("drop table if exists test")
> sql("create table test (col1 string)")
> sql("""insert into table test select "hi" from prejoined limit 1""")
> val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: 
> "col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
> sqlContext.table("test").select(projection:_*).registerTempTable("test2")
> # This succeeds.
> sql("select CaseSensitiveColName from test2").first()
> # This fails with java.util.NoSuchElementException: key not found: 
> casesensitivecolname#23046
> sql("select casesensitivecolname from test2").first()
> {code}
> The full stack trace printed for the final command that is failing: 
> {code}
> java.util.NoSuchElementException: key not found: casesensitivecolname#23046
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29)
>   at scala.collection.MapLike$class.apply(MapLike.scala:141)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.hive.execution.HiveTableScan.<init>(HiveTableScan.scala:57)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:378)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:217)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>   at 
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:285)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecu

[jira] [Resolved] (SPARK-5275) pyspark.streaming is not included in assembly jar

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5275?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5275.

Resolution: Fixed
  Assignee: Davies Liu

> pyspark.streaming is not included in assembly jar
> -
>
> Key: SPARK-5275
> URL: https://issues.apache.org/jira/browse/SPARK-5275
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Davies Liu
>Assignee: Davies Liu
>Priority: Blocker
>
> The pyspark.streaming is not included in assembly jar of spark.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4996) Memory leak?

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4996:
---
Priority: Major  (was: Blocker)

> Memory leak?
> 
>
> Key: SPARK-4996
> URL: https://issues.apache.org/jira/browse/SPARK-4996
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: uncleGen
>
> When I migrated my job from Spark 1.1.1 to Spark 1.2, it failed. However, 
> everything is OK in Spark 1.1.1 with the same resource settings. And when I 
> increase the memory settings appropriately (1.2x ~ 1.5x, in my situation), the job 
> can complete successfully. Both jobs above are running with default Spark 
> configurations. Following is the detailed log.
> {code}
> 14-12-29 19:16:11 INFO [Reporter] YarnAllocationHandler: Container marked as 
> failed. Exit status: 143. Diagnostics: Container is running beyond physical 
> memory limits. Current usage: 11.3 GB of 11 GB physical memory used; 11.8 GB 
> of 23.1 GB virtual memory used. Killing container.
> {code}
> {code}
>  Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
> /disk2/mapred/tmp/usercache/testUser/appcache/application_1400565786114_343609/spark-local-20141229190526-d76b/35/shuffle_3_12_0.index
>  (No such file or directory)
>   at java.io.FileInputStream.open(Native Method)
>   at java.io.FileInputStream.(FileInputStream.java:120)
>   at 
> org.apache.spark.shuffle.IndexShuffleBlockManager.getBlockData(IndexShuffleBlockManager.scala:109)
>   at 
> org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:305)
>   at 
> org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
>   at 
> org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>   at 
> org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:57)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:124)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:97)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:91)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:44)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:354)
>   at 
> io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor
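
The report above (the job passes once memory is raised roughly 1.2x ~ 1.5x)
usually points at the per-container overhead for off-heap allocations rather
than a heap leak. A minimal sketch of the relevant settings; the values are
illustrative only and are not taken from this ticket:

{code}
import org.apache.spark.SparkConf

// The executor heap stays the same; the extra cushion YARN reserves for
// off-heap memory (netty buffers, shuffle index files, JVM overhead) is
// raised so the container is not killed with exit status 143.
val conf = new SparkConf()
  .setAppName("memory-overhead-sketch")
  .set("spark.executor.memory", "10g")
  // Value is in megabytes. The default overhead is only a few hundred MB
  // (or a small fraction of executor memory, depending on the version), so
  // bumping it is the usual first step for "running beyond physical memory
  // limits" failures.
  .set("spark.yarn.executor.memoryOverhead", "1536")

println(conf.toDebugString)
{code}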

[jira] [Commented] (SPARK-4996) Memory leak?

2015-01-20 Thread Patrick Wendell (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-4996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14285255#comment-14285255
 ] 

Patrick Wendell commented on SPARK-4996:


I'm de-escalating this right now because it's not clear what the actual issue 
is.

> Memory leak?
> 
>
> Key: SPARK-4996
> URL: https://issues.apache.org/jira/browse/SPARK-4996
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.2.0
>Reporter: uncleGen
>
> When I migrated my job from Spark 1.1.1 to Spark 1.2, it failed; everything
> was OK in Spark 1.1.1 with the same resource settings. When I increase the
> memory settings appropriately (1.2x ~ 1.5x, in my situation), the job
> completes successfully. Both jobs run with the default Spark configuration.
> The detailed log follows.
> {code}
> 14-12-29 19:16:11 INFO [Reporter] YarnAllocationHandler: Container marked as 
> failed. Exit status: 143. Diagnostics: Container is running beyond physical 
> memory limits. Current usage: 11.3 GB of 11 GB physical memory used; 11.8 GB 
> of 23.1 GB virtual memory used. Killing container.
> {code}
> {code}
>  Caused by: java.lang.RuntimeException: java.io.FileNotFoundException: 
> /disk2/mapred/tmp/usercache/testUser/appcache/application_1400565786114_343609/spark-local-20141229190526-d76b/35/shuffle_3_12_0.index
>  (No such file or directory)
>   at java.io.FileInputStream.open(Native Method)
>   at java.io.FileInputStream.(FileInputStream.java:120)
>   at 
> org.apache.spark.shuffle.IndexShuffleBlockManager.getBlockData(IndexShuffleBlockManager.scala:109)
>   at 
> org.apache.spark.storage.BlockManager.getBlockData(BlockManager.scala:305)
>   at 
> org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
>   at 
> org.apache.spark.network.netty.NettyBlockRpcServer$$anonfun$2.apply(NettyBlockRpcServer.scala:57)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
>   at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
>   at 
> org.apache.spark.network.netty.NettyBlockRpcServer.receive(NettyBlockRpcServer.scala:57)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.processRpcRequest(TransportRequestHandler.java:124)
>   at 
> org.apache.spark.network.server.TransportRequestHandler.handle(TransportRequestHandler.java:97)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:91)
>   at 
> org.apache.spark.network.server.TransportChannelHandler.channelRead0(TransportChannelHandler.java:44)
>   at 
> io.netty.channel.SimpleChannelInboundHandler.channelRead(SimpleChannelInboundHandler.java:105)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>   at 
> io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>   at 
> io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:163)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:333)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:319)
>   at 
> io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:787)
>   at 
> io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:130)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:511)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:468)
>   at 
> io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:382)
>   at io.netty.channel.nio.NioEventLo

[jira] [Updated] (SPARK-4959) Attributes are case sensitive when using a select query from a projection

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4959:
---
Target Version/s: 1.3.0, 1.2.1  (was: 1.3.0)

> Attributes are case sensitive when using a select query from a projection
> -
>
> Key: SPARK-4959
> URL: https://issues.apache.org/jira/browse/SPARK-4959
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Andy Konwinski
>Priority: Blocker
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> Per [~marmbrus], see this line of code, where we should be using an attribute 
> map
>  
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L147
> To reproduce, I ran the following in the Spark shell:
> {code}
> import sqlContext._
> sql("drop table if exists test")
> sql("create table test (col1 string)")
> sql("""insert into table test select "hi" from prejoined limit 1""")
> val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: 
> "col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
> sqlContext.table("test").select(projection:_*).registerTempTable("test2")
> // This succeeds.
> sql("select CaseSensitiveColName from test2").first()
> // This fails with java.util.NoSuchElementException: key not found:
> // casesensitivecolname#23046
> sql("select casesensitivecolname from test2").first()
> {code}
> The full stack trace printed for the final command that is failing: 
> {code}
> java.util.NoSuchElementException: key not found: casesensitivecolname#23046
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29)
>   at scala.collection.MapLike$class.apply(MapLike.scala:141)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.hive.execution.HiveTableScan.(HiveTableScan.scala:57)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:378)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:217)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>   at 
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:285)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
>   at org.apach
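
A toy illustration of why resolving by a stable expression id, rather than by
the case-sensitive name string, avoids this class of lookup failure. The types
below are stand-ins, not Catalyst's own classes:

{code}
// Stand-in for an attribute: Catalyst's AttributeReference is richer, but the
// relevant pieces here are a display name and a stable id.
case class Attr(name: String, exprId: Long)

val attrs = Seq(Attr("CaseSensitiveColName", 23046L),
                Attr("CaseSensitiveColName2", 23047L))

// Keyed by name: a lowercased reference ("casesensitivecolname") misses.
val byName: Map[String, Attr] = attrs.map(a => a.name -> a).toMap
assert(byName.get("casesensitivecolname").isEmpty)   // the reported failure mode

// Resolve the name case-insensitively once, then key every later lookup by
// id, so the original casing no longer matters.
def resolve(ref: String): Option[Attr] = attrs.find(_.name.equalsIgnoreCase(ref))
val byId: Map[Long, Attr] = attrs.map(a => a.exprId -> a).toMap

assert(resolve("casesensitivecolname").flatMap(a => byId.get(a.exprId)).isDefined)
{code}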

[jira] [Updated] (SPARK-4959) Attributes are case sensitive when using a select query from a projection

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4959:
---
Assignee: Cheng Hao

> Attributes are case sensitive when using a select query from a projection
> -
>
> Key: SPARK-4959
> URL: https://issues.apache.org/jira/browse/SPARK-4959
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Andy Konwinski
>Assignee: Cheng Hao
>Priority: Blocker
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> Per [~marmbrus], see this line of code, where we should be using an attribute 
> map
>  
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L147
> To reproduce, I ran the following in the Spark shell:
> {code}
> import sqlContext._
> sql("drop table if exists test")
> sql("create table test (col1 string)")
> sql("""insert into table test select "hi" from prejoined limit 1""")
> val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: 
> "col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
> sqlContext.table("test").select(projection:_*).registerTempTable("test2")
> // This succeeds.
> sql("select CaseSensitiveColName from test2").first()
> // This fails with java.util.NoSuchElementException: key not found:
> // casesensitivecolname#23046
> sql("select casesensitivecolname from test2").first()
> {code}
> The full stack trace printed for the final command that is failing: 
> {code}
> java.util.NoSuchElementException: key not found: casesensitivecolname#23046
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29)
>   at scala.collection.MapLike$class.apply(MapLike.scala:141)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.hive.execution.HiveTableScan.(HiveTableScan.scala:57)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:378)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:217)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>   at 
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:285)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
>   at o

[jira] [Updated] (SPARK-4959) Attributes are case sensitive when using a select query from a projection

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4959:
---
Fix Version/s: 1.3.0

> Attributes are case sensitive when using a select query from a projection
> -
>
> Key: SPARK-4959
> URL: https://issues.apache.org/jira/browse/SPARK-4959
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Andy Konwinski
>Priority: Blocker
>  Labels: backport-needed
> Fix For: 1.3.0
>
>
> Per [~marmbrus], see this line of code, where we should be using an attribute 
> map
>  
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L147
> To reproduce, I ran the following in the Spark shell:
> {code}
> import sqlContext._
> sql("drop table if exists test")
> sql("create table test (col1 string)")
> sql("""insert into table test select "hi" from prejoined limit 1""")
> val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: 
> "col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
> sqlContext.table("test").select(projection:_*).registerTempTable("test2")
> // This succeeds.
> sql("select CaseSensitiveColName from test2").first()
> // This fails with java.util.NoSuchElementException: key not found:
> // casesensitivecolname#23046
> sql("select casesensitivecolname from test2").first()
> {code}
> The full stack trace printed for the final command that is failing: 
> {code}
> java.util.NoSuchElementException: key not found: casesensitivecolname#23046
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29)
>   at scala.collection.MapLike$class.apply(MapLike.scala:141)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.hive.execution.HiveTableScan.(HiveTableScan.scala:57)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:378)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:217)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>   at 
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:285)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
>   at org.apache.spark.sql.Schem

[jira] [Updated] (SPARK-4959) Attributes are case sensitive when using a select query from a projection

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4959:
---
Priority: Blocker  (was: Critical)

> Attributes are case sensitive when using a select query from a projection
> -
>
> Key: SPARK-4959
> URL: https://issues.apache.org/jira/browse/SPARK-4959
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: Andy Konwinski
>Priority: Blocker
>  Labels: backport-needed
>
> Per [~marmbrus], see this line of code, where we should be using an attribute 
> map
>  
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala#L147
> To reproduce, I ran the following in the Spark shell:
> {code}
> import sqlContext._
> sql("drop table if exists test")
> sql("create table test (col1 string)")
> sql("""insert into table test select "hi" from prejoined limit 1""")
> val projection = "col1".attr.as(Symbol("CaseSensitiveColName")) :: 
> "col1".attr.as(Symbol("CaseSensitiveColName2")) :: Nil
> sqlContext.table("test").select(projection:_*).registerTempTable("test2")
> // This succeeds.
> sql("select CaseSensitiveColName from test2").first()
> // This fails with java.util.NoSuchElementException: key not found:
> // casesensitivecolname#23046
> sql("select casesensitivecolname from test2").first()
> {code}
> The full stack trace printed for the final command that is failing: 
> {code}
> java.util.NoSuchElementException: key not found: casesensitivecolname#23046
>   at scala.collection.MapLike$class.default(MapLike.scala:228)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.default(AttributeMap.scala:29)
>   at scala.collection.MapLike$class.apply(MapLike.scala:141)
>   at 
> org.apache.spark.sql.catalyst.expressions.AttributeMap.apply(AttributeMap.scala:29)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
>   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at 
> org.apache.spark.sql.hive.execution.HiveTableScan.(HiveTableScan.scala:57)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$$anonfun$14.apply(HiveStrategies.scala:221)
>   at 
> org.apache.spark.sql.SQLContext$SparkPlanner.pruneFilterProject(SQLContext.scala:378)
>   at 
> org.apache.spark.sql.hive.HiveStrategies$HiveTableScans$.apply(HiveStrategies.scala:217)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.planLater(QueryPlanner.scala:54)
>   at 
> org.apache.spark.sql.execution.SparkStrategies$BasicOperators$.apply(SparkStrategies.scala:285)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner$$anonfun$1.apply(QueryPlanner.scala:58)
>   at scala.collection.Iterator$$anon$13.hasNext(Iterator.scala:371)
>   at 
> org.apache.spark.sql.catalyst.planning.QueryPlanner.apply(QueryPlanner.scala:59)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan$lzycompute(SQLContext.scala:418)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.sparkPlan(SQLContext.scala:416)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan$lzycompute(SQLContext.scala:422)
>   at 
> org.apache.spark.sql.SQLContext$QueryExecution.executedPlan(SQLContext.scala:422)
>   at org.apache.spark.sql.SchemaRDD.collect(SchemaRDD.scala:4

[jira] [Resolved] (SPARK-4923) Add Developer API to REPL to allow re-publishing the REPL jar

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-4923.

  Resolution: Fixed
Target Version/s: 1.3.0  (was: 1.3.0, 1.2.1)

I updated the title of this to reflect the work that actually happened in 
Chip's patch. And SPARK-5289 is tracking publishing of the artifacts.

> Add Developer API to REPL to allow re-publishing the REPL jar
> -
>
> Key: SPARK-4923
> URL: https://issues.apache.org/jira/browse/SPARK-4923
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Spark Shell
>Affects Versions: 1.2.0
>Reporter: Peng Cheng
>Assignee: Chip Senkbeil
>Priority: Critical
>  Labels: shell
> Attachments: 
> SPARK-4923__Maven_build_should_keep_publishing_spark-repl.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Spark-repl installation and deployment have been discontinued (see
> SPARK-3452), but it is in the dependency list of a few projects that extend
> its initialization process.
> Please remove the 'skip' setting in spark-repl and make it an 'official' API
> to encourage more platforms to integrate with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4923) Add Developer API to REPL to allow re-publishing the REPL jar

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4923:
---
Assignee: Chip Senkbeil

> Add Developer API to REPL to allow re-publishing the REPL jar
> -
>
> Key: SPARK-4923
> URL: https://issues.apache.org/jira/browse/SPARK-4923
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Spark Shell
>Affects Versions: 1.2.0
>Reporter: Peng Cheng
>Assignee: Chip Senkbeil
>Priority: Critical
>  Labels: shell
> Attachments: 
> SPARK-4923__Maven_build_should_keep_publishing_spark-repl.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Spark-repl installation and deployment have been discontinued (see
> SPARK-3452), but it is in the dependency list of a few projects that extend
> its initialization process.
> Please remove the 'skip' setting in spark-repl and make it an 'official' API
> to encourage more platforms to integrate with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4923) Add Developer API to REPL to allow re-publishing the REPL jar

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4923?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4923:
---
Summary: Add Developer API to REPL to allow re-publishing the REPL jar  
(was: Maven build should keep publishing spark-repl)

> Add Developer API to REPL to allow re-publishing the REPL jar
> -
>
> Key: SPARK-4923
> URL: https://issues.apache.org/jira/browse/SPARK-4923
> Project: Spark
>  Issue Type: Bug
>  Components: Build, Spark Shell
>Affects Versions: 1.2.0
>Reporter: Peng Cheng
>Priority: Critical
>  Labels: shell
> Attachments: 
> SPARK-4923__Maven_build_should_keep_publishing_spark-repl.patch
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> Spark-repl installation and deployment have been discontinued (see
> SPARK-3452), but it is in the dependency list of a few projects that extend
> its initialization process.
> Please remove the 'skip' setting in spark-repl and make it an 'official' API
> to encourage more platforms to integrate with it.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-5289) Backport publishing of repl, yarn into branch-1.2

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-5289:
---
Fix Version/s: 1.2.1

> Backport publishing of repl, yarn into branch-1.2
> -
>
> Key: SPARK-5289
> URL: https://issues.apache.org/jira/browse/SPARK-5289
> Project: Spark
>  Issue Type: Improvement
>    Reporter: Patrick Wendell
>        Assignee: Patrick Wendell
>Priority: Blocker
> Fix For: 1.2.1
>
>
> In SPARK-3452 we did some clean-up of published artifacts that turned out to 
> adversely affect some users. This has been mostly patched up in master via 
> SPARK-4925 (hive-thriftserver), which was backported. For the repl and yarn 
> modules, they were fixed in SPARK-4048 as part of a larger change that only 
> went into master.
> Those pieces should be backported to Spark 1.2 to allow publishing in a 1.2.1 
> release.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5276) pyspark.streaming is not included in assembly jar

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5276.

Resolution: Duplicate

> pyspark.streaming is not included in assembly jar
> -
>
> Key: SPARK-5276
> URL: https://issues.apache.org/jira/browse/SPARK-5276
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 1.2.0, 1.3.0
>Reporter: Davies Liu
>Priority: Blocker
>
> The pyspark.streaming module is not included in the Spark assembly jar.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-5329) UIWorkloadGenerator should stop SparkContext.

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-5329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell resolved SPARK-5329.

   Resolution: Fixed
Fix Version/s: 1.3.0
 Assignee: Kousuke Saruta

> UIWorkloadGenerator should stop SparkContext.
> -
>
> Key: SPARK-5329
> URL: https://issues.apache.org/jira/browse/SPARK-5329
> Project: Spark
>  Issue Type: Bug
>  Components: Web UI
>Affects Versions: 1.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
> Fix For: 1.3.0
>
>
> UIWorkloadGenerator doesn't stop SparkContext. I ran UIWorkloadGenerator and
> tried to watch the result in the Web UI, but jobs are marked as finished.
> This is because SparkContext is not stopped.
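
A minimal sketch of the shape of the fix being requested here (not the actual
patch): make sure the context is stopped once the generated workload completes
or fails.

{code}
import org.apache.spark.{SparkConf, SparkContext}

object WorkloadRunnerSketch {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(
      new SparkConf().setMaster("local[2]").setAppName("ui-workload-sketch"))
    try {
      // ... submit the generated jobs here; a trivial stand-in follows ...
      sc.parallelize(1 to 1000).map(_ * 2).count()
    } finally {
      // Without this, the application keeps running after the workload is
      // done and never shuts down cleanly.
      sc.stop()
    }
  }
}
{code}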



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-4660) JavaSerializer uses wrong classloader

2015-01-20 Thread Patrick Wendell (JIRA)

 [ 
https://issues.apache.org/jira/browse/SPARK-4660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Patrick Wendell updated SPARK-4660:
---
Assignee: Piotr Kołaczkowski

> JavaSerializer uses wrong classloader
> -
>
> Key: SPARK-4660
> URL: https://issues.apache.org/jira/browse/SPARK-4660
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 1.1.0, 1.1.1
>Reporter: Piotr Kołaczkowski
>Assignee: Piotr Kołaczkowski
>Priority: Critical
> Fix For: 1.3.0, 1.1.2, 1.2.1
>
> Attachments: spark-serializer-classloader.patch
>
>
> During testing we found failures when trying to load some classes of the user 
> application:
> {noformat}
> ERROR 2014-11-29 20:01:56 org.apache.spark.storage.BlockManagerWorker: 
> Exception handling buffer message
> java.lang.ClassNotFoundException: 
> org.apache.spark.demo.HttpReceiverCases$HttpRequest
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
>   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
>   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
>   at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
>   at java.lang.Class.forName0(Native Method)
>   at java.lang.Class.forName(Class.java:270)
>   at org.apache.spark.serializer.JavaDeseriali
> zationStream$$anon$1.resolveClass(JavaSerializer.scala:59)
>   at 
> java.io.ObjectInputStream.readNonProxyDesc(ObjectInputStream.java:1612)
>   at java.io.ObjectInputStream.readClassDesc(ObjectInputStream.java:1517)
>   at 
> java.io.ObjectInputStream.readOrdinaryObject(ObjectInputStream.java:1771)
>   at java.io.ObjectInputStream.readObject0(ObjectInputStream.java:1350)
>   at java.io.ObjectInputStream.readObject(ObjectInputStream.java:370)
>   at 
> org.apache.spark.serializer.JavaDeserializationStream.readObject(JavaSerializer.scala:62)
>   at 
> org.apache.spark.serializer.DeserializationStream$$anon$1.getNext(Serializer.scala:133)
>   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:71)
>   at 
> org.apache.spark.storage.MemoryStore.unrollSafely(MemoryStore.scala:235)
>   at 
> org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:126)
>   at 
> org.apache.spark.storage.MemoryStore.putIterator(MemoryStore.scala:104)
>   at org.apache.spark.storage.MemoryStore.putBytes(MemoryStore.scala:76)
>   at org.apache.spark.storage.BlockManager.doPut(BlockManager.scala:748)
>   at 
> org.apache.spark.storage.BlockManager.putBytes(BlockManager.scala:639)
>   at 
> org.apache.spark.storage.BlockManagerWorker.putBlock(BlockManagerWorker.scala:92)
>   at 
> org.apache.spark.storage.BlockManagerWorker.processBlockMessage(BlockManagerWorker.scala:73)
>   at 
> org.apache.spark.storage.BlockManagerWorker$$anonfun$2.apply(BlockManagerWorker.scala:48)
>   at 
> org.apache.spark.storage.BlockManagerWorker$$anonfun$2.apply(BlockManagerWorker.scala:48)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at 
> org.apache.spark.storage.BlockMessageArray.foreach(BlockMessageArray.scala:28)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at 
> org.apache.spark.storage.BlockMessageArray.map(BlockMessageArray.scala:28)
>   at 
> org.apache.spark.storage.BlockManagerWorker.onBlockMessageReceive(BlockManagerWorker.scala:48)
>   at 
> org.apache.spark.storage.BlockManagerWorker$$anonfun$1.apply(BlockManagerWorker.scala:38)
>   at 
> org.apache.spark.storage.BlockManagerWorker$$anonfun$1.apply(BlockManagerWorker.scala:38)
>   at 
> org.apache.spark.network.ConnectionManager.org$apache$spark$network$ConnectionManager$$handleMessage(ConnectionManager.scala:682)
>   at 
> org.apache.spark.network.ConnectionManager$$anon$10.run(ConnectionManager.scala:520)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> ja
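
The trace above shows the class being resolved through the JVM's default
application classloader rather than a loader that knows about the user's jars.
A generic sketch of the pattern involved (an ObjectInputStream that resolves
classes against an explicitly supplied loader); it illustrates the failure mode
and its usual remedy, and is not the Spark patch itself:

{code}
import java.io.{InputStream, ObjectInputStream, ObjectStreamClass}

// Resolve classes against a caller-supplied loader (for example the loader
// that holds the dynamically added application jars) instead of whichever
// loader happened to define the deserializer class.
class LoaderAwareObjectInputStream(in: InputStream, loader: ClassLoader)
    extends ObjectInputStream(in) {

  override def resolveClass(desc: ObjectStreamClass): Class[_] =
    try {
      // initialize = false: defer static initializers until first real use.
      Class.forName(desc.getName, false, loader)
    } catch {
      case _: ClassNotFoundException => super.resolveClass(desc)
    }
}

// Typical call site: prefer the thread context classloader and fall back to
// the defining loader of the calling class.
def deserializerFor(in: InputStream): ObjectInputStream = {
  val loader = Option(Thread.currentThread().getContextClassLoader)
    .getOrElse(getClass.getClassLoader)
  new LoaderAwareObjectInputStream(in, loader)
}
{code}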
