[jira] [Created] (SPARK-4446) MetadataCleaner schedule task with a wrong param for delay time .

2014-11-17 Thread Leo (JIRA)
Leo created SPARK-4446: -- Summary: MetadataCleaner schedule task with a wrong param for delay time . Key: SPARK-4446 URL: https://issues.apache.org/jira/browse/SPARK-4446 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-4446) MetadataCleaner schedule task with a wrong param for delay time .

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214400#comment-14214400 ] Apache Spark commented on SPARK-4446: - User 'Leolh' has created a pull request for

[jira] [Commented] (SPARK-4306) LogisticRegressionWithLBFGS support for PySpark MLlib

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214406#comment-14214406 ] Apache Spark commented on SPARK-4306: - User 'davies' has created a pull request for

[jira] [Commented] (SPARK-2208) local metrics tests can fail on fast machines

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214407#comment-14214407 ] Apache Spark commented on SPARK-2208: - User 'XuefengWu' has created a pull request for

[jira] [Created] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha

2014-11-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4447: - Summary: Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha Key: SPARK-4447 URL: https://issues.apache.org/jira/browse/SPARK-4447

[jira] [Updated] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4447: -- Description: For example, YarnRMClient and YarnRMClientImpl can be merged YarnAllocator and

[jira] [Commented] (SPARK-4447) Remove layers of abstraction in YARN code no longer needed after dropping yarn-alpha

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4447?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214411#comment-14214411 ] Sandy Ryza commented on SPARK-4447: --- Planning to work on this. Remove layers of

[jira] [Commented] (SPARK-4306) LogisticRegressionWithLBFGS support for PySpark MLlib

2014-11-17 Thread Varadharajan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214415#comment-14214415 ] Varadharajan commented on SPARK-4306: - [~matei] I'm really sorry. I'm quite occupied

[jira] [Created] (SPARK-4448) Support ConstantObjectInspector for unwrapping data

2014-11-17 Thread Cheng Hao (JIRA)
Cheng Hao created SPARK-4448: Summary: Support ConstantObjectInspector for unwrapping data Key: SPARK-4448 URL: https://issues.apache.org/jira/browse/SPARK-4448 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-4448) Support ConstantObjectInspector for unwrapping data

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4448?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214429#comment-14214429 ] Apache Spark commented on SPARK-4448: - User 'chenghao-intel' has created a pull

[jira] [Commented] (SPARK-4445) Don't display storage level in toDebugString unless RDD is persisted

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4445?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1421#comment-1421 ] Apache Spark commented on SPARK-4445: - User 'ScrapCodes' has created a pull request

[jira] [Commented] (SPARK-4442) Move common unit test utilities into their own package / module

2014-11-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214461#comment-14214461 ] Sean Owen commented on SPARK-4442: -- You can already depend on just core's test code from

[jira] [Updated] (SPARK-3962) Mark spark dependency as provided in external libraries

2014-11-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3962: --- Issue Type: Bug (was: Improvement) Mark spark dependency as provided in external libraries

[jira] [Commented] (SPARK-3962) Mark spark dependency as provided in external libraries

2014-11-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214480#comment-14214480 ] Patrick Wendell commented on SPARK-3962: I think this is causing the build to fail

[jira] [Commented] (SPARK-4402) Output path validation of an action statement resulting in runtime exception

2014-11-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214486#comment-14214486 ] Sean Owen commented on SPARK-4402: -- Can the Spark code go back and check this before any

[jira] [Created] (SPARK-4449) specify port range in spark

2014-11-17 Thread wangfei (JIRA)
wangfei created SPARK-4449: -- Summary: specify port range in spark Key: SPARK-4449 URL: https://issues.apache.org/jira/browse/SPARK-4449 Project: Spark Issue Type: Bug Components: Spark

[jira] [Commented] (SPARK-4449) specify port range in spark

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4449?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214587#comment-14214587 ] Apache Spark commented on SPARK-4449: - User 'scwf' has created a pull request for this

[jira] [Commented] (SPARK-4402) Output path validation of an action statement resulting in runtime exception

2014-11-17 Thread Vijay (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214635#comment-14214635 ] Vijay commented on SPARK-4402: -- Thanks for the explanation. It is clear now. Output path

[jira] [Resolved] (SPARK-4402) Output path validation of an action statement resulting in runtime exception

2014-11-17 Thread Vijay (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4402?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vijay resolved SPARK-4402. -- Resolution: Not a Problem Output path validation of an action statement resulting in runtime exception

[jira] [Commented] (SPARK-4411) Add kill link for jobs in the UI

2014-11-17 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214741#comment-14214741 ] Thomas Graves commented on SPARK-4411: -- Please make sure that the modify acls work

[jira] [Comment Edited] (SPARK-4411) Add kill link for jobs in the UI

2014-11-17 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4411?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214741#comment-14214741 ] Thomas Graves edited comment on SPARK-4411 at 11/17/14 3:34 PM:

[jira] [Created] (SPARK-4450) SparkSQL producing incorrect answer when using --master yarn

2014-11-17 Thread Rick Bischoff (JIRA)
Rick Bischoff created SPARK-4450: Summary: SparkSQL producing incorrect answer when using --master yarn Key: SPARK-4450 URL: https://issues.apache.org/jira/browse/SPARK-4450 Project: Spark

[jira] [Created] (SPARK-4451) force to kill process after 5 seconds

2014-11-17 Thread WangTaoTheTonic (JIRA)
WangTaoTheTonic created SPARK-4451: -- Summary: force to kill process after 5 seconds Key: SPARK-4451 URL: https://issues.apache.org/jira/browse/SPARK-4451 Project: Spark Issue Type:

[jira] [Commented] (SPARK-4451) force to kill process after 5 seconds

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214880#comment-14214880 ] Apache Spark commented on SPARK-4451: - User 'WangTaoTheTonic' has created a pull

[jira] [Created] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread tianshuo (JIRA)
tianshuo created SPARK-4452: --- Summary: Enhance Sort-based Shuffle to avoid spilling small files Key: SPARK-4452 URL: https://issues.apache.org/jira/browse/SPARK-4452 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214903#comment-14214903 ] Apache Spark commented on SPARK-4452: - User 'tsdeng' has created a pull request for

[jira] [Updated] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread tianshuo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tianshuo updated SPARK-4452: Description: When an Aggregator is used with ExternalSorter in a task, spark will create many small files

[jira] [Updated] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread tianshuo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tianshuo updated SPARK-4452: Description: When an Aggregator is used with ExternalSorter in a task, spark will create many small files

[jira] [Created] (SPARK-4453) Simplify Parquet record filter generation

2014-11-17 Thread Cheng Lian (JIRA)
Cheng Lian created SPARK-4453: - Summary: Simplify Parquet record filter generation Key: SPARK-4453 URL: https://issues.apache.org/jira/browse/SPARK-4453 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-4213) SparkSQL - ParquetFilters - No support for LT, LTE, GT, GTE operators

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214950#comment-14214950 ] Apache Spark commented on SPARK-4213: - User 'liancheng' has created a pull request for

[jira] [Commented] (SPARK-4453) Simplify Parquet record filter generation

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214949#comment-14214949 ] Apache Spark commented on SPARK-4453: - User 'liancheng' has created a pull request for

[jira] [Commented] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread tianshuo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214962#comment-14214962 ] tianshuo commented on SPARK-4452: - Originally, we found this problem by seeing Too Many

[jira] [Created] (SPARK-4454) Race condition in DAGScheduler

2014-11-17 Thread Rafal Kwasny (JIRA)
Rafal Kwasny created SPARK-4454: --- Summary: Race condition in DAGScheduler Key: SPARK-4454 URL: https://issues.apache.org/jira/browse/SPARK-4454 Project: Spark Issue Type: Bug Affects

[jira] [Updated] (SPARK-4454) Race condition in DAGScheduler

2014-11-17 Thread Rafal Kwasny (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rafal Kwasny updated SPARK-4454: Description: It seems to be a race condition in DAGScheduler that manifests on jobs with high

[jira] [Updated] (SPARK-2811) update algebird to 0.8.1

2014-11-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-2811: --- Assignee: Adam Pingel update algebird to 0.8.1

[jira] [Resolved] (SPARK-2811) update algebird to 0.8.1

2014-11-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-2811. Resolution: Fixed Fix Version/s: 1.2.0 update algebird to 0.8.1

[jira] [Updated] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread tianshuo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] tianshuo updated SPARK-4452: Description: When an Aggregator is used with ExternalSorter in a task, spark will create many small files

[jira] [Created] (SPARK-4455) Exclude dependency on hbase-annotations module

2014-11-17 Thread Ted Yu (JIRA)
Ted Yu created SPARK-4455: - Summary: Exclude dependency on hbase-annotations module Key: SPARK-4455 URL: https://issues.apache.org/jira/browse/SPARK-4455 Project: Spark Issue Type: Bug

[jira] [Resolved] (SPARK-4444) Drop VD type parameter from EdgeRDD

2014-11-17 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-4444. Resolution: Fixed Fix Version/s: 1.2.0 Drop VD type parameter from EdgeRDD

[jira] [Commented] (SPARK-4409) Additional (but limited) Linear Algebra Utils

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4409?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14214999#comment-14214999 ] Apache Spark commented on SPARK-4409: - User 'brkyvz' has created a pull request for

[jira] [Commented] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215018#comment-14215018 ] Sandy Ryza commented on SPARK-4452: --- I haven't thought the implications out fully, but

[jira] [Commented] (SPARK-3630) Identify cause of Kryo+Snappy PARSING_ERROR

2014-11-17 Thread Ryan Williams (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215019#comment-14215019 ] Ryan Williams commented on SPARK-3630: -- [~aash] I've not seen this since my previous

[jira] [Commented] (SPARK-4434) spark-submit cluster deploy mode JAR URLs are broken in 1.1.1

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215022#comment-14215022 ] Apache Spark commented on SPARK-4434: - User 'davies' has created a pull request for

[jira] [Updated] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4452: -- Affects Version/s: 1.1.0 Enhance Sort-based Shuffle to avoid spilling small files

[jira] [Commented] (SPARK-4455) Exclude dependency on hbase-annotations module

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215028#comment-14215028 ] Apache Spark commented on SPARK-4455: - User 'tedyu' has created a pull request for

[jira] [Commented] (SPARK-1358) Continuous integrated test should be involved in Spark ecosystem

2014-11-17 Thread shane knapp (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215042#comment-14215042 ] shane knapp commented on SPARK-1358: [~aash] -- depending on the hardware reqs, we

[jira] [Commented] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215047#comment-14215047 ] Sandy Ryza commented on SPARK-4452: --- A third possible fix would be to have the shuffle

[jira] [Created] (SPARK-4456) Document why spilling depends on both elements read and memory used

2014-11-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4456: - Summary: Document why spilling depends on both elements read and memory used Key: SPARK-4456 URL: https://issues.apache.org/jira/browse/SPARK-4456 Project: Spark

[jira] [Commented] (SPARK-4393) Memory leak in connection manager timeout thread

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215054#comment-14215054 ] Apache Spark commented on SPARK-4393: - User 'sarutak' has created a pull request for

[jira] [Created] (SPARK-4457) Document how to build for Hadoop versions greater than 2.4

2014-11-17 Thread Sandy Ryza (JIRA)
Sandy Ryza created SPARK-4457: - Summary: Document how to build for Hadoop versions greater than 2.4 Key: SPARK-4457 URL: https://issues.apache.org/jira/browse/SPARK-4457 Project: Spark Issue

[jira] [Commented] (SPARK-3717) DecisionTree, RandomForest: Partition by feature

2014-11-17 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215058#comment-14215058 ] Joseph K. Bradley commented on SPARK-3717: -- I took a look at the rowToColumnStore

[jira] [Commented] (SPARK-4457) Document how to build for Hadoop versions greater than 2.4

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215060#comment-14215060 ] Apache Spark commented on SPARK-4457: - User 'sryza' has created a pull request for

[jira] [Commented] (SPARK-4434) spark-submit cluster deploy mode JAR URLs are broken in 1.1.1

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215061#comment-14215061 ] Apache Spark commented on SPARK-4434: - User 'davies' has created a pull request for

[jira] [Commented] (SPARK-4439) Expose RandomForest in Python

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4439?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215067#comment-14215067 ] Apache Spark commented on SPARK-4439: - User 'davies' has created a pull request for

[jira] [Created] (SPARK-4458) Skip compilation of tests classes when using make-distribution

2014-11-17 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-4458: Summary: Skip compilation of tests classes when using make-distribution Key: SPARK-4458 URL: https://issues.apache.org/jira/browse/SPARK-4458 Project: Spark

[jira] [Updated] (SPARK-4458) Skip compilation of tests classes when using make-distribution

2014-11-17 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-4458: - The make-distribution generates Spark distributions, and therefore does not require building of test

[jira] [Commented] (SPARK-4458) Skip compilation of tests classes when using make-distribution

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4458?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215087#comment-14215087 ] Apache Spark commented on SPARK-4458: - User 'tdas' has created a pull request for this

[jira] [Updated] (SPARK-4266) Avoid expensive JavaScript for StagePages with huge numbers of tasks

2014-11-17 Thread Kay Ousterhout (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout updated SPARK-4266: -- Assignee: Kay Ousterhout Avoid expensive JavaScript for StagePages with huge numbers of tasks

[jira] [Updated] (SPARK-4180) SparkContext constructor should throw exception if another SparkContext is already running

2014-11-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4180?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-4180: --- Fix Version/s: 1.2.0 SparkContext constructor should throw exception if another SparkContext

[jira] [Created] (SPARK-4459) JavaRDDLike.groupBy[K](f: JFunction[T, K]) may fail with typechecking errors

2014-11-17 Thread Alok Saldanha (JIRA)
Alok Saldanha created SPARK-4459: Summary: JavaRDDLike.groupBy[K](f: JFunction[T, K]) may fail with typechecking errors Key: SPARK-4459 URL: https://issues.apache.org/jira/browse/SPARK-4459 Project:

[jira] [Commented] (SPARK-4459) JavaRDDLike.groupBy[K](f: JFunction[T, K]) may fail with typechecking errors

2014-11-17 Thread Alok Saldanha (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215202#comment-14215202 ] Alok Saldanha commented on SPARK-4459: -- I created a standalone gist to demonstrate

[jira] [Commented] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread tianshuo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215222#comment-14215222 ] tianshuo commented on SPARK-4452: - Hi, [~sandyr]: Your concern about data structures

[jira] [Commented] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread tianshuo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215233#comment-14215233 ] tianshuo commented on SPARK-4452: - Currently, the two instances of Spillable,

[jira] [Created] (SPARK-4460) RandomForest classification uses wrong threshold

2014-11-17 Thread Joseph K. Bradley (JIRA)
Joseph K. Bradley created SPARK-4460: Summary: RandomForest classification uses wrong threshold Key: SPARK-4460 URL: https://issues.apache.org/jira/browse/SPARK-4460 Project: Spark Issue

[jira] [Comment Edited] (SPARK-4452) Enhance Sort-based Shuffle to avoid spilling small files

2014-11-17 Thread tianshuo (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215222#comment-14215222 ] tianshuo edited comment on SPARK-4452 at 11/17/14 10:00 PM:

[jira] [Updated] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sandy Ryza updated SPARK-4452: -- Summary: Shuffle data structures can starve others on the same thread for memory (was: Enhance

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215269#comment-14215269 ] Sandy Ryza commented on SPARK-4452: --- Updated the title to reflect the specific problem.

[jira] [Commented] (SPARK-4434) spark-submit cluster deploy mode JAR URLs are broken in 1.1.1

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215279#comment-14215279 ] Apache Spark commented on SPARK-4434: - User 'sarutak' has created a pull request for

[jira] [Commented] (SPARK-4459) JavaRDDLike.groupBy[K](f: JFunction[T, K]) may fail with typechecking errors

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215301#comment-14215301 ] Apache Spark commented on SPARK-4459: - User 'alokito' has created a pull request for

[jira] [Created] (SPARK-4461) Spark should not relies on mapred-site.xml for classpath

2014-11-17 Thread Zhan Zhang (JIRA)
Zhan Zhang created SPARK-4461: - Summary: Spark should not relies on mapred-site.xml for classpath Key: SPARK-4461 URL: https://issues.apache.org/jira/browse/SPARK-4461 Project: Spark Issue Type:

[jira] [Commented] (SPARK-4395) Running a Spark SQL SELECT command from PySpark causes a hang for ~ 1 hour

2014-11-17 Thread Davies Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4395?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14215327#comment-14215327 ] Davies Liu commented on SPARK-4395: --- Workaround: remove cache() or cache() after

[jira] [Comment Edited] (SPARK-1867) Spark Documentation Error causes java.lang.IllegalStateException: unread block data

2014-11-17 Thread Anson Abraham (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1867?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14212852#comment-14212852 ] Anson Abraham edited comment on SPARK-1867 at 11/17/14 10:40 PM:

[jira] [Updated] (SPARK-4461) Spark should not relies on mapred-site.xml for classpath

2014-11-17 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhan Zhang updated SPARK-4461: -- Description: Currently spark read mapred-site.xml to get the class path. From hadoop 2.6, the library

[jira] [Updated] (SPARK-2087) Clean Multi-user semantics for thrift JDBC/ODBC server.

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2087?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2087: Target Version/s: 1.3.0 (was: 1.2.0) Clean Multi-user semantics for thrift JDBC/ODBC

[jira] [Updated] (SPARK-4338) Remove yarn-alpha support

2014-11-17 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-4338: - Assignee: Sandy Ryza Remove yarn-alpha support - Key:

[jira] [Created] (SPARK-4462) flume-sink build broken in SBT

2014-11-17 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-4462: --- Summary: flume-sink build broken in SBT Key: SPARK-4462 URL: https://issues.apache.org/jira/browse/SPARK-4462 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-3184) Allow user to specify num tasks to use for a table

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3184?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3184: Target Version/s: 1.3.0 (was: 1.2.0) Allow user to specify num tasks to use for a table

[jira] [Updated] (SPARK-4443) Statistics bug for external table in spark sql hive

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4443?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-4443: Priority: Critical (was: Major) Statistics bug for external table in spark sql hive

[jira] [Updated] (SPARK-2873) OOM happens when group by and join operation with big data

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2873?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2873: Target Version/s: 1.3.0 (was: 1.2.0) OOM happens when group by and join operation with

[jira] [Updated] (SPARK-4074) No exception for drop nonexistent table

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-4074: Target Version/s: 1.3.0 (was: 1.2.0) No exception for drop nonexistent table

[jira] [Resolved] (SPARK-3720) support ORC in spark sql

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3720?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-3720. - Resolution: Duplicate support ORC in spark sql

[jira] [Updated] (SPARK-2472) Spark SQL Thrift server sometimes assigns wrong job group name

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2472?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2472: Target Version/s: 1.3.0 (was: 1.2.0) Spark SQL Thrift server sometimes assigns wrong job

[jira] [Updated] (SPARK-3298) [SQL] registerAsTable / registerTempTable overwrites old tables

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3298: Assignee: (was: Michael Armbrust) [SQL] registerAsTable / registerTempTable overwrites

[jira] [Updated] (SPARK-3298) [SQL] registerAsTable / registerTempTable overwrites old tables

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3298: Target Version/s: 1.3.0 (was: 1.2.0) [SQL] registerAsTable / registerTempTable overwrites

[jira] [Updated] (SPARK-2554) CountDistinct and SumDistinct should do partial aggregation

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2554?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2554: Target Version/s: 1.3.0 (was: 1.2.0) CountDistinct and SumDistinct should do partial

[jira] [Updated] (SPARK-3379) Implement 'POWER' for sql

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3379?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3379: Target Version/s: 1.3.0 (was: 1.2.0) Implement 'POWER' for sql -

[jira] [Updated] (SPARK-4269) Make wait time in BroadcastHashJoin configurable

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-4269: Target Version/s: 1.3.0 (was: 1.2.0) Make wait time in BroadcastHashJoin configurable

[jira] [Updated] (SPARK-3955) Different versions between jackson-mapper-asl and jackson-core-asl

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3955: Component/s: Build Different versions between jackson-mapper-asl and jackson-core-asl

[jira] [Updated] (SPARK-3955) Different versions between jackson-mapper-asl and jackson-core-asl

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3955?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3955: Target Version/s: 1.3.0 (was: 1.1.1, 1.2.0) Different versions between jackson-mapper-asl

[jira] [Updated] (SPARK-2551) Cleanup FilteringParquetRowInputFormat

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2551: Target Version/s: 1.3.0 (was: 1.2.0) Cleanup FilteringParquetRowInputFormat

[jira] [Updated] (SPARK-2449) Spark sql reflection code requires a constructor taking all the fields for the table

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2449: Target Version/s: 1.3.0 (was: 1.2.0) Spark sql reflection code requires a constructor

[jira] [Updated] (SPARK-4453) Simplify Parquet record filter generation

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-4453: Assignee: Cheng Lian Simplify Parquet record filter generation

[jira] [Updated] (SPARK-2178) createSchemaRDD is not thread safe

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2178: Target Version/s: 1.3.0 (was: 1.2.0) createSchemaRDD is not thread safe

[jira] [Updated] (SPARK-4453) Simplify Parquet record filter generation

2014-11-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-4453: Priority: Critical (was: Major) Simplify Parquet record filter generation

[jira] [Commented] (SPARK-4266) Avoid expensive JavaScript for StagePages with huge numbers of tasks

2014-11-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215391#comment-14215391 ] Apache Spark commented on SPARK-4266: - User 'kayousterhout' has created a pull request

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-17 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215395#comment-14215395 ] Andrew Or commented on SPARK-4452: -- Hey [~tianshuo] do you see this issue only for

[jira] [Created] (SPARK-4463) Add (de)select all button for additional metrics in webUI

2014-11-17 Thread Kay Ousterhout (JIRA)
Kay Ousterhout created SPARK-4463: - Summary: Add (de)select all button for additional metrics in webUI Key: SPARK-4463 URL: https://issues.apache.org/jira/browse/SPARK-4463 Project: Spark

[jira] [Closed] (SPARK-4460) RandomForest classification uses wrong threshold

2014-11-17 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley closed SPARK-4460. Resolution: Invalid Realized this was invalid. Current implementation is fine, except for

[jira] [Commented] (SPARK-4452) Shuffle data structures can starve others on the same thread for memory

2014-11-17 Thread Tianshuo Deng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4452?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215411#comment-14215411 ] Tianshuo Deng commented on SPARK-4452: -- Hi, [~andrewor14]: Actually hash-based

[jira] [Commented] (SPARK-3633) Fetches failure observed after SPARK-2711

2014-11-17 Thread Arun Ahuja (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3633?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14215417#comment-14215417 ] Arun Ahuja commented on SPARK-3633: --- [~andrewor14] We were using Hash-Based shuffle when
