[jira] [Resolved] (SPARK-4486) Improve GradientBoosting APIs and doc

2014-11-20 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-4486. -- Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 3374

[jira] [Updated] (SPARK-4486) Improve GradientBoosting APIs and doc

2014-11-20 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-4486: - Assignee: Xiangrui Meng Improve GradientBoosting APIs and doc

[jira] [Created] (SPARK-4510) Add k-medoids Partitioning Around Medoids (PAM) algorithm

2014-11-20 Thread Fan Jiang (JIRA)
Fan Jiang created SPARK-4510: Summary: Add k-medoids Partitioning Around Medoids (PAM) algorithm Key: SPARK-4510 URL: https://issues.apache.org/jira/browse/SPARK-4510 Project: Spark Issue Type:

[jira] [Updated] (SPARK-4510) Add k-medoids Partitioning Around Medoids (PAM) algorithm

2014-11-20 Thread Fan Jiang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Jiang updated SPARK-4510: - Description: PAM (k-medoids) is more robust to noise and outliers as compared to k-means because it

[jira] [Updated] (SPARK-4494) IDFModel.transform() add support for single vector

2014-11-20 Thread Jean-Philippe Quemener (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Philippe Quemener updated SPARK-4494: -- Affects Version/s: 1.1.0 IDFModel.transform() add support for single vector

[jira] [Updated] (SPARK-4510) Add k-medoids Partitioning Around Medoids (PAM) algorithm

2014-11-20 Thread Fan Jiang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fan Jiang updated SPARK-4510: - Description: PAM (k-medoids) is more robust to noise and outliers as compared to k-means because it

[jira] [Updated] (SPARK-4481) Some comments for `updateStateByKey` are wrong

2014-11-20 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-4481: - Assignee: Shixiong Zhu Some comments for `updateStateByKey` are wrong

[jira] [Resolved] (SPARK-4481) Some comments for `updateStateByKey` are wrong

2014-11-20 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-4481. -- Resolution: Fixed Fix Version/s: 1.2.0 Some comments for `updateStateByKey` are wrong

[jira] [Commented] (SPARK-4510) Add k-medoids Partitioning Around Medoids (PAM) algorithm

2014-11-20 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219163#comment-14219163 ] Apache Spark commented on SPARK-4510: - User 'fjiang6' has created a pull request for

[jira] [Commented] (SPARK-4492) Exception when following SimpleApp tutorial java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil

2014-11-20 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219171#comment-14219171 ] Sean Owen commented on SPARK-4492: -- You don't run apps with java -cp ...; use

[jira] [Commented] (SPARK-4501) Create build/mvn to automatically download maven/zinc/scalac

2014-11-20 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4501?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219173#comment-14219173 ] Sean Owen commented on SPARK-4501: -- Automatically setting up zinc sounds like a good idea

[jira] [Commented] (SPARK-4490) Not found RandomGenerator through spark-shell

2014-11-20 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219176#comment-14219176 ] Sean Owen commented on SPARK-4490: -- So, in general, you have to supply your own

[jira] [Updated] (SPARK-4511) snappy-1.0.5.3 needs GLIBCXX_3.4.9

2014-11-20 Thread hoelog (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hoelog updated SPARK-4511: -- Description: executors failed with *Could not initialize class org.xerial.snappy.Snappy* or

[jira] [Updated] (SPARK-4511) snappy-1.0.5.3 needs GLIBCXX_3.4.9

2014-11-20 Thread hoelog (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hoelog updated SPARK-4511: -- Description: executors failed with *Could not initialize class org.xerial.snappy.Snappy* or

[jira] [Commented] (SPARK-4499) Problems building and running 1.2 with Hive support

2014-11-20 Thread George Kyriacou (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4499?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219203#comment-14219203 ] George Kyriacou commented on SPARK-4499: Spark 1.2.0 built with the command: mvn

[jira] [Commented] (SPARK-2579) Reading from S3 returns an inconsistent number of items with Spark 0.9.1

2014-11-20 Thread Lars Albertsson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2579?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219212#comment-14219212 ] Lars Albertsson commented on SPARK-2579: Is this problem solved by enabling the

[jira] [Commented] (SPARK-4490) Not found RandomGenerator through spark-shell

2014-11-20 Thread Kai Sasaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219277#comment-14219277 ] Kai Sasaki commented on SPARK-4490: --- I found the reason why this error occurred. The

[jira] [Commented] (SPARK-4489) JavaPairRDD.collectAsMap from checkpoint RDD may fail with ClassCastException

2014-11-20 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219280#comment-14219280 ] Sean Owen commented on SPARK-4489: -- This looks like

[jira] [Commented] (SPARK-4490) Not found RandomGenerator through spark-shell

2014-11-20 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219282#comment-14219282 ] Sean Owen commented on SPARK-4490: -- Not sure... if breeze depends on it then it would be

[jira] [Commented] (SPARK-4492) Exception when following SimpleApp tutorial java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil

2014-11-20 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219292#comment-14219292 ] sam commented on SPARK-4492: Well I do try to package Spark in my jar (had to fight an epic

[jira] [Commented] (SPARK-4491) Using sbt assembly with spark as dep requires Phd in sbt

2014-11-20 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219296#comment-14219296 ] sam commented on SPARK-4491: Could you please provide a magic template sbt build file that

[jira] [Commented] (SPARK-4492) Exception when following SimpleApp tutorial java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil

2014-11-20 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219297#comment-14219297 ] Sean Owen commented on SPARK-4492: -- OK, it may then be that you package some of spark,

[jira] [Commented] (SPARK-4492) Exception when following SimpleApp tutorial java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil

2014-11-20 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219347#comment-14219347 ] sam commented on SPARK-4492: // OK, it may then be that you package some of spark, but not the

[jira] [Commented] (SPARK-4492) Exception when following SimpleApp tutorial java.lang.ClassNotFoundException: org.apache.spark.deploy.yarn.YarnSparkHadoopUtil

2014-11-20 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219355#comment-14219355 ] sam commented on SPARK-4492: // 1) Don't embed Hadoop, Spark libraries in your app ... 3) Run

[jira] [Commented] (SPARK-4354) 14/11/12 09:39:00 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, HYD-RNDNW-VFRCO-RCORE2): java.lang.NoClassDefFoundError: Could not initialize class org.xerial

2014-11-20 Thread hoelog (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219356#comment-14219356 ] hoelog commented on SPARK-4354: --- I have a similar problem. The snappy 1.0.5.with-in spark

[jira] [Commented] (SPARK-4354) 14/11/12 09:39:00 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, HYD-RNDNW-VFRCO-RCORE2): java.lang.NoClassDefFoundError: Could not initialize class org.xerial

2014-11-20 Thread hoelog (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219370#comment-14219370 ] hoelog commented on SPARK-4354: --- In my case, I compiled spark 1.1 in the system which has

[jira] [Commented] (SPARK-4511) snappy-1.0.5.3 needs GLIBCXX_3.4.9

2014-11-20 Thread hoelog (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219378#comment-14219378 ] hoelog commented on SPARK-4511: --- Similar problem. The prebuilt Spark packages have the snappy 1.0.5

[jira] [Closed] (SPARK-4511) snappy-1.0.5.3 needs GLIBCXX_3.4.9

2014-11-20 Thread hoelog (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] hoelog closed SPARK-4511. - Resolution: Not a Problem. Upgrading libstdc++ or compiling with a lower version of libstdc++ may help.

[jira] [Comment Edited] (SPARK-4354) 14/11/12 09:39:00 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, HYD-RNDNW-VFRCO-RCORE2): java.lang.NoClassDefFoundError: Could not initialize class org.x

2014-11-20 Thread hoelog (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219370#comment-14219370 ] hoelog edited comment on SPARK-4354 at 11/20/14 2:02 PM: - In my

[jira] [Commented] (SPARK-4354) 14/11/12 09:39:00 WARN TaskSetManager: Lost task 5.0 in stage 0.0 (TID 5, HYD-RNDNW-VFRCO-RCORE2): java.lang.NoClassDefFoundError: Could not initialize class org.xerial

2014-11-20 Thread hoelog (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4354?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219418#comment-14219418 ] hoelog commented on SPARK-4354: --- Changing codec to LZ4 will also help. I also tested it.

[jira] [Created] (SPARK-4512) Unresolved Attribute Exception for sort by

2014-11-20 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-4512: Summary: Unresolved Attribute Exception for sort by Key: SPARK-4512 URL: https://issues.apache.org/jira/browse/SPARK-4512 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-4512) Unresolved Attribute Exception for sort by

2014-11-20 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4512?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219446#comment-14219446 ] Apache Spark commented on SPARK-4512: - User 'chenghao-intel' has created a pull

[jira] [Commented] (SPARK-1825) Windows Spark fails to work with Linux YARN

2014-11-20 Thread Ángel Álvarez (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219490#comment-14219490 ] Ángel Álvarez commented on SPARK-1825: -- I've had the following problems to make

[jira] [Comment Edited] (SPARK-1825) Windows Spark fails to work with Linux YARN

2014-11-20 Thread Ángel Álvarez (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219490#comment-14219490 ] Ángel Álvarez edited comment on SPARK-1825 at 11/20/14 3:38 PM:

[jira] [Commented] (SPARK-3837) Warn when YARN is killing containers for exceeding memory limits

2014-11-20 Thread Arun Ahuja (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219544#comment-14219544 ] Arun Ahuja commented on SPARK-3837: --- There might still be a related issue here, in the

[jira] [Comment Edited] (SPARK-3837) Warn when YARN is killing containers for exceeding memory limits

2014-11-20 Thread Arun Ahuja (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3837?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219544#comment-14219544 ] Arun Ahuja edited comment on SPARK-3837 at 11/20/14 4:19 PM: -

[jira] [Commented] (SPARK-4506) Update documentation to clarify whether standalone-cluster mode is now officially supported

2014-11-20 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219570#comment-14219570 ] Andrew Or commented on SPARK-4506: -- Thanks for filing this Josh. It is actually supported

[jira] [Updated] (SPARK-1825) Windows Spark fails to work with Linux YARN

2014-11-20 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-1825: - Target Version/s: 1.3.0 Fix Version/s: (was: 1.2.0) Windows Spark fails to work with Linux

[jira] [Commented] (SPARK-4510) Add k-medoids Partitioning Around Medoids (PAM) algorithm

2014-11-20 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219736#comment-14219736 ] Xiangrui Meng commented on SPARK-4510: -- [~fjiang6] Could you explain the complexity

[jira] [Commented] (SPARK-3633) Fetches failure observed after SPARK-2711

2014-11-20 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219765#comment-14219765 ] Hector Yee commented on SPARK-3633: --- I'm still seeing a similar error in spark 1.2 rc2

[jira] [Comment Edited] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-20 Thread Tianshuo Deng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219795#comment-14219795 ] Tianshuo Deng edited comment on SPARK-4452 at 11/20/14 6:57 PM:

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-20 Thread Tianshuo Deng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219795#comment-14219795 ] Tianshuo Deng commented on SPARK-4452: -- Hi, [~matei]: Hi, Matei My way of

[jira] [Commented] (SPARK-4231) Add RankingMetrics to examples.MovieLensALS

2014-11-20 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219799#comment-14219799 ] Debasish Das commented on SPARK-4231: - [~srowen] I added batch predict APIs for user

[jira] [Commented] (SPARK-3633) Fetches failure observed after SPARK-2711

2014-11-20 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219807#comment-14219807 ] Hector Yee commented on SPARK-3633: --- I think it may be a different bug.. looked at the

[jira] [Created] (SPARK-4513) Support relational operator '=' in Spark SQL

2014-11-20 Thread Ravindra Pesala (JIRA)
Ravindra Pesala created SPARK-4513: -- Summary: Support relational operator '=' in Spark SQL Key: SPARK-4513 URL: https://issues.apache.org/jira/browse/SPARK-4513 Project: Spark Issue Type:

[jira] [Updated] (SPARK-4513) Support relational operator '=' in Spark SQL

2014-11-20 Thread Ravindra Pesala (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ravindra Pesala updated SPARK-4513: --- Component/s: SQL Support relational operator '=' in Spark SQL

[jira] [Commented] (SPARK-4513) Support relational operator '=' in Spark SQL

2014-11-20 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219818#comment-14219818 ] Apache Spark commented on SPARK-4513: - User 'ravipesala' has created a pull request

[jira] [Created] (SPARK-4514) ComplexFutureAction does not preserve job group IDs

2014-11-20 Thread Erik Erlandson (JIRA)
Erik Erlandson created SPARK-4514: - Summary: ComplexFutureAction does not preserve job group IDs Key: SPARK-4514 URL: https://issues.apache.org/jira/browse/SPARK-4514 Project: Spark Issue

[jira] [Created] (SPARK-4515) OOM/GC errors with sort-based shuffle

2014-11-20 Thread Nishkam Ravi (JIRA)
Nishkam Ravi created SPARK-4515: --- Summary: OOM/GC errors with sort-based shuffle Key: SPARK-4515 URL: https://issues.apache.org/jira/browse/SPARK-4515 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-4515) OOM/GC errors with sort-based shuffle

2014-11-20 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-4515: --- Description: OOM/GC errors with sort-based shuffle that go away when shuffle.manager is set to hash.

[jira] [Updated] (SPARK-4515) OOM/GC errors with sort-based shuffle

2014-11-20 Thread Nishkam Ravi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nishkam Ravi updated SPARK-4515: Description: OOM/GC errors with sort-based shuffle that go away when shuffle.manager is set to

[jira] [Created] (SPARK-4516) Race condition in netty

2014-11-20 Thread Hector Yee (JIRA)
Hector Yee created SPARK-4516: - Summary: Race condition in netty Key: SPARK-4516 URL: https://issues.apache.org/jira/browse/SPARK-4516 Project: Spark Issue Type: Bug Components:

[jira] [Created] (SPARK-4517) Improve memory efficiency for python broadcast

2014-11-20 Thread Davies Liu (JIRA)
Davies Liu created SPARK-4517: - Summary: Improve memory efficiency for python broadcast Key: SPARK-4517 URL: https://issues.apache.org/jira/browse/SPARK-4517 Project: Spark Issue Type:

[jira] [Updated] (SPARK-4516) Race condition in netty

2014-11-20 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Yee updated SPARK-4516: -- Affects Version/s: (was: 1.1.0) 1.1.1 Race condition in netty

[jira] [Commented] (SPARK-4517) Improve memory efficiency for python broadcast

2014-11-20 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219876#comment-14219876 ] Davies Liu commented on SPARK-4517: --- cc [~joshrosen] Improve memory efficiency for

[jira] [Updated] (SPARK-4515) OOM/GC errors with sort-based shuffle

2014-11-20 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4515: -- Priority: Blocker (was: Major) Target Version/s: 1.2.0 I'm promoting this to a 1.2 blocker

[jira] [Commented] (SPARK-4515) OOM/GC errors with sort-based shuffle

2014-11-20 Thread Nishkam Ravi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4515?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219908#comment-14219908 ] Nishkam Ravi commented on SPARK-4515: - I run a slightly modified version of PageRank,

[jira] [Created] (SPARK-4518) Filestream sometimes processes files twice

2014-11-20 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-4518: Summary: Filestream sometimes processes files twice Key: SPARK-4518 URL: https://issues.apache.org/jira/browse/SPARK-4518 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-4519) Filestream does not use hadoop configuration set within sparkContext.hadoopConfiguration

2014-11-20 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-4519: Summary: Filestream does not use hadoop configuration set within sparkContext.hadoopConfiguration Key: SPARK-4519 URL: https://issues.apache.org/jira/browse/SPARK-4519

[jira] [Updated] (SPARK-4518) Filestream sometimes processes files twice

2014-11-20 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-4518: - Component/s: Streaming Filestream sometimes processes files twice

[jira] [Created] (SPARK-4520) SparkSQL exception when reading certain columns from a parquet file

2014-11-20 Thread sadhan sood (JIRA)
sadhan sood created SPARK-4520: -- Summary: SparkSQL exception when reading certain columns from a parquet file Key: SPARK-4520 URL: https://issues.apache.org/jira/browse/SPARK-4520 Project: Spark

[jira] [Updated] (SPARK-4516) Race condition in netty

2014-11-20 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Yee updated SPARK-4516: -- Affects Version/s: (was: 1.1.1) 1.2.0 Race condition in netty

[jira] [Updated] (SPARK-4520) SparkSQL exception when reading certain columns from a parquet file

2014-11-20 Thread sadhan sood (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sadhan sood updated SPARK-4520: --- Attachment: part-r-0.parquet Attaching a sample parquet file which can be used to reproduce the

[jira] [Commented] (SPARK-4518) Filestream sometimes processes files twice

2014-11-20 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219931#comment-14219931 ] Apache Spark commented on SPARK-4518: - User 'tdas' has created a pull request for this

[jira] [Commented] (SPARK-4519) Filestream does not use hadoop configuration set within sparkContext.hadoopConfiguration

2014-11-20 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14219932#comment-14219932 ] Apache Spark commented on SPARK-4519: - User 'tdas' has created a pull request for this

[jira] [Updated] (SPARK-4520) SparkSQL exception when reading certain columns from a parquet file

2014-11-20 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-4520: Affects Version/s: 1.2.0 SparkSQL exception when reading certain columns from a parquet

[jira] [Updated] (SPARK-4520) SparkSQL exception when reading certain columns from a parquet file

2014-11-20 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-4520: Priority: Critical (was: Major) SparkSQL exception when reading certain columns from a

[jira] [Updated] (SPARK-4520) SparkSQL exception when reading certain columns from a parquet file

2014-11-20 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-4520: Description: I am seeing this issue with spark sql throwing an exception when trying to

[jira] [Updated] (SPARK-3839) Reimplement HashOuterJoin to construct hash table of only one relation

2014-11-20 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3839: Target Version/s: 1.3.0 (was: 1.2.0) Reimplement HashOuterJoin to construct hash table of

[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-11-20 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1421#comment-1421 ] Nicholas Chammas commented on SPARK-3431: - [~joshrosen] - Per [this comment |

[jira] [Resolved] (SPARK-3938) Set RDD name to table name during cache operations

2014-11-20 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-3938. - Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 3383

[jira] [Commented] (SPARK-4516) Race condition in netty

2014-11-20 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220014#comment-14220014 ] Andrew Or commented on SPARK-4516: -- Hey [~hector.yee] so what is the race condition?

[jira] [Commented] (SPARK-4516) Race condition in netty

2014-11-20 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220025#comment-14220025 ] Hector Yee commented on SPARK-4516: --- The channel is marked as inactive while it is being

[jira] [Updated] (SPARK-4516) Race condition in netty

2014-11-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4516: --- Priority: Critical (was: Major) Race condition in netty ---

[jira] [Updated] (SPARK-4516) Race condition in netty

2014-11-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4516: --- Target Version/s: 1.2.0 Race condition in netty ---

[jira] [Updated] (SPARK-4516) Race condition in netty

2014-11-20 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-4516: --- Description: The netty block transfer manager has a race condition where it closes an active

[jira] [Resolved] (SPARK-4228) Save a SchemaRDD in JSON format

2014-11-20 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4228?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-4228. - Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 3213

[jira] [Commented] (SPARK-3431) Parallelize execution of tests

2014-11-20 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3431?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220065#comment-14220065 ] Josh Rosen commented on SPARK-3431: --- [~nchammas]: It's been a while since I tried that

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-20 Thread Tianshuo Deng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220070#comment-14220070 ] Tianshuo Deng commented on SPARK-4452: -- Here is a link of the diff:

[jira] [Updated] (SPARK-2630) Input data size of CoalescedRDD is incorrect

2014-11-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2630: --- Target Version/s: 1.3.0 (was: 1.2.0) Input data size of CoalescedRDD is incorrect

[jira] [Comment Edited] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-20 Thread Tianshuo Deng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220070#comment-14220070 ] Tianshuo Deng edited comment on SPARK-4452 at 11/20/14 9:51 PM:

[jira] [Updated] (SPARK-2630) Input data size of CoalescedRDD is incorrect

2014-11-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2630: --- Target Version/s: 1.3.0, 1.2.1 (was: 1.3.0) Input data size of CoalescedRDD is incorrect

[jira] [Commented] (SPARK-4516) Race condition in netty

2014-11-20 Thread Aaron Davidson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220106#comment-14220106 ] Aaron Davidson commented on SPARK-4516: --- The code in question, despite the log

[jira] [Commented] (SPARK-4516) Race condition in netty

2014-11-20 Thread Aaron Davidson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220114#comment-14220114 ] Aaron Davidson commented on SPARK-4516: --- Also ideally there would be some other

[jira] [Commented] (SPARK-4516) Race condition in netty

2014-11-20 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220157#comment-14220157 ] Hector Yee commented on SPARK-4516: --- Digging deeper it looks like you are right, the

[jira] [Created] (SPARK-4521) Parquet fails to read columns with spaces in the name

2014-11-20 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-4521: --- Summary: Parquet fails to read columns with spaces in the name Key: SPARK-4521 URL: https://issues.apache.org/jira/browse/SPARK-4521 Project: Spark

[jira] [Commented] (SPARK-3697) HistoryServer can't list event Log when there was a no permissions directory in the $spark.eventLog.dir

2014-11-20 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3697?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220184#comment-14220184 ] Apache Spark commented on SPARK-3697: - User 'vanzin' has created a pull request for

[jira] [Commented] (SPARK-4038) Outlier Detection Algorithm for MLlib

2014-11-20 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14220187#comment-14220187 ] Joseph K. Bradley commented on SPARK-4038: -- +1 for getting some outlier detection

[jira] [Commented] (SPARK-4174) Streaming: Optionally provide notifications to Receivers when DStream has been generated

2014-11-20 Thread Hari Shreedharan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4174?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220196#comment-14220196 ] Hari Shreedharan commented on SPARK-4174: - [~tdas] Comments? If we can agree on

[jira] [Created] (SPARK-4522) Failure to read parquet schema with missing metadata.

2014-11-20 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-4522: --- Summary: Failure to read parquet schema with missing metadata. Key: SPARK-4522 URL: https://issues.apache.org/jira/browse/SPARK-4522 Project: Spark

[jira] [Updated] (SPARK-4514) ComplexFutureAction does not preserve job group IDs

2014-11-20 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-4514: -- Priority: Critical (was: Major) Target Version/s: 1.1.1, 1.2.0 Affects Version/s:

[jira] [Created] (SPARK-4523) Improve handling of serialized schema information

2014-11-20 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-4523: --- Summary: Improve handling of serialized schema information Key: SPARK-4523 URL: https://issues.apache.org/jira/browse/SPARK-4523 Project: Spark Issue

[jira] [Resolved] (SPARK-4128) Create instructions on fully building Spark in Intellij

2014-11-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-4128. Resolution: Fixed Fix Version/s: 1.2.0 Assignee: Patrick Wendell I updated

[jira] [Created] (SPARK-4524) Add documentation on packaging Python dependencies / installing them on clusters

2014-11-20 Thread Josh Rosen (JIRA)
Josh Rosen created SPARK-4524: - Summary: Add documentation on packaging Python dependencies / installing them on clusters Key: SPARK-4524 URL: https://issues.apache.org/jira/browse/SPARK-4524 Project:

[jira] [Commented] (SPARK-4522) Failure to read parquet schema with missing metadata.

2014-11-20 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220220#comment-14220220 ] Apache Spark commented on SPARK-4522: - User 'marmbrus' has created a pull request for

[jira] [Resolved] (SPARK-4439) Expose RandomForest in Python

2014-11-20 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng resolved SPARK-4439. -- Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 3320

[jira] [Resolved] (SPARK-4513) Support relational operator '=' in Spark SQL

2014-11-20 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-4513. - Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 3387

[jira] [Commented] (SPARK-4479) Avoid unnecessary defensive copies when Sort based shuffle is on

2014-11-20 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220229#comment-14220229 ] Patrick Wendell commented on SPARK-4479: I did some research into this. In the

[jira] [Commented] (SPARK-4038) Outlier Detection Algorithm for MLlib

2014-11-20 Thread Anant Daksh Asthana (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14220232#comment-14220232 ] Anant Daksh Asthana commented on SPARK-4038: So AVF is based on k-modes for

[jira] [Updated] (SPARK-4516) Lost task with netty

2014-11-20 Thread Hector Yee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hector Yee updated SPARK-4516: -- Summary: Lost task with netty (was: Race condition in netty) Lost task with netty
