[jira] [Updated] (SPARK-3562) Periodic cleanup event logs

2014-09-17 Thread xukun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xukun updated SPARK-3562: - Summary: Periodic cleanup event logs (was: Periodic cleanup) Periodic cleanup event logs

[jira] [Updated] (SPARK-3563) Shuffle data not always be cleaned

2014-09-17 Thread shenhong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shenhong updated SPARK-3563: Affects Version/s: 1.0.2 Shuffle data not always be cleaned --

[jira] [Updated] (SPARK-3563) Shuffle data not always be cleaned

2014-09-17 Thread shenhong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shenhong updated SPARK-3563: Description: In our cluster, when we run a spark streaming job, after running for many hours, the shuffle

[jira] [Updated] (SPARK-3563) Shuffle data not always be cleaned

2014-09-17 Thread shenhong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shenhong updated SPARK-3563: Description: In our cluster, when we run a spark streaming job, after running for many hours, the shuffle

[jira] [Commented] (SPARK-3560) In yarn-cluster mode, jars are distributed through multiple mechanisms.

2014-09-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3560?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136882#comment-14136882 ] Sandy Ryza commented on SPARK-3560: --- Right. I believe Min from LinkedIn who discovered

[jira] [Updated] (SPARK-3563) Shuffle data not always be cleaned

2014-09-17 Thread shenhong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shenhong updated SPARK-3563: Description: In our cluster, when we run a spark streaming job, after running for many hours, the shuffle

[jira] [Updated] (SPARK-3563) Shuffle data not always be cleaned

2014-09-17 Thread shenhong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] shenhong updated SPARK-3563: Description: In our cluster, when we run a spark streaming job, after running for many hours, the shuffle

[jira] [Commented] (SPARK-3563) Shuffle data not always be cleaned

2014-09-17 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136899#comment-14136899 ] Sean Owen commented on SPARK-3563: -- I am no expert, but I believe this is on purpose, in

[jira] [Commented] (SPARK-3550) Disable automatic rdd caching in python api for relevant learners

2014-09-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3550?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136907#comment-14136907 ] Apache Spark commented on SPARK-3550: - User 'OdinLin' has created a pull request for

[jira] [Commented] (SPARK-2593) Add ability to pass an existing Akka ActorSystem into Spark

2014-09-17 Thread Evan Chan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136920#comment-14136920 ] Evan Chan commented on SPARK-2593: -- [~pwendell] I'd have to agree with Helena and

[jira] [Commented] (SPARK-3564) Display App ID on HistoryPage

2014-09-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136929#comment-14136929 ] Apache Spark commented on SPARK-3564: - User 'sarutak' has created a pull request for

[jira] [Commented] (SPARK-3566) .gitignore and .rat-excludes should consider cmd file and Emacs' backup files

2014-09-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3566?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136947#comment-14136947 ] Apache Spark commented on SPARK-3566: - User 'sarutak' has created a pull request for

[jira] [Commented] (SPARK-3565) make code consistent with document

2014-09-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136948#comment-14136948 ] Apache Spark commented on SPARK-3565: - User 'WangTaoTheTonic' has created a pull

[jira] [Commented] (SPARK-3292) Shuffle Tasks run incessantly even though there's no inputs

2014-09-17 Thread guowei (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14136999#comment-14136999 ] guowei commented on SPARK-3292: --- [~saisai_shao] I tested the scenario with windowing

[jira] [Commented] (SPARK-3563) Shuffle data not always be cleaned

2014-09-17 Thread shenhong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137014#comment-14137014 ] shenhong commented on SPARK-3563: - Thanks Sean Owen! I haven't set spark.cleaner.ttl,

[jira] [Comment Edited] (SPARK-3563) Shuffle data not always be cleaned

2014-09-17 Thread shenhong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137014#comment-14137014 ] shenhong edited comment on SPARK-3563 at 9/17/14 9:55 AM: -- Thanks

[jira] [Comment Edited] (SPARK-3563) Shuffle data not always be cleaned

2014-09-17 Thread shenhong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137014#comment-14137014 ] shenhong edited comment on SPARK-3563 at 9/17/14 9:59 AM: -- Thanks

[jira] [Commented] (SPARK-3567) appId field in SparkDeploySchedulerBackend should be volatile

2014-09-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3567?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137033#comment-14137033 ] Apache Spark commented on SPARK-3567: - User 'sarutak' has created a pull request for

[jira] [Created] (SPARK-3567) appId field in SparkDeploySchedulerBackend should be volatile

2014-09-17 Thread Kousuke Saruta (JIRA)
Kousuke Saruta created SPARK-3567: - Summary: appId field in SparkDeploySchedulerBackend should be volatile Key: SPARK-3567 URL: https://issues.apache.org/jira/browse/SPARK-3567 Project: Spark

[jira] [Commented] (SPARK-1719) spark.executor.extraLibraryPath isn't applied on yarn

2014-09-17 Thread Wilfred Spiegelenburg (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1719?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137176#comment-14137176 ] Wilfred Spiegelenburg commented on SPARK-1719: -- oops, not sure how I missed

[jira] [Commented] (SPARK-3563) Shuffle data not always be cleaned

2014-09-17 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137246#comment-14137246 ] Saisai Shao commented on SPARK-3563: I think it relies on the JVM's GC

[jira] [Commented] (SPARK-3561) Native Hadoop/YARN integration for batch/ETL workloads

2014-09-17 Thread Oleg Zhurakousky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137326#comment-14137326 ] Oleg Zhurakousky commented on SPARK-3561: - Patrick, thanks for following up.

[jira] [Commented] (SPARK-2593) Add ability to pass an existing Akka ActorSystem into Spark

2014-09-17 Thread Helena Edelson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137338#comment-14137338 ] Helena Edelson commented on SPARK-2593: --- [~pwendell] I forgot to note this on my

[jira] [Comment Edited] (SPARK-2593) Add ability to pass an existing Akka ActorSystem into Spark

2014-09-17 Thread Helena Edelson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137338#comment-14137338 ] Helena Edelson edited comment on SPARK-2593 at 9/17/14 2:42 PM:

[jira] [Comment Edited] (SPARK-2593) Add ability to pass an existing Akka ActorSystem into Spark

2014-09-17 Thread Helena Edelson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137338#comment-14137338 ] Helena Edelson edited comment on SPARK-2593 at 9/17/14 2:44 PM:

[jira] [Commented] (SPARK-3563) Shuffle data not always be cleaned

2014-09-17 Thread shenhong (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137348#comment-14137348 ] shenhong commented on SPARK-3563: - Thanks, Saisai. I think you are right, it depends on

[jira] [Comment Edited] (SPARK-3561) Native Hadoop/YARN integration for batch/ETL workloads

2014-09-17 Thread Oleg Zhurakousky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137326#comment-14137326 ] Oleg Zhurakousky edited comment on SPARK-3561 at 9/17/14 2:55 PM:

[jira] [Commented] (SPARK-3561) Native Hadoop/YARN integration for batch/ETL workloads

2014-09-17 Thread Oleg Zhurakousky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137378#comment-14137378 ] Oleg Zhurakousky commented on SPARK-3561: - Patrick, sorry as I feel like I missed

[jira] [Comment Edited] (SPARK-3561) Native Hadoop/YARN integration for batch/ETL workloads

2014-09-17 Thread Oleg Zhurakousky (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137378#comment-14137378 ] Oleg Zhurakousky edited comment on SPARK-3561 at 9/17/14 3:24 PM:

[jira] [Resolved] (SPARK-3177) Yarn-alpha ClientBaseSuite Unit test failed

2014-09-17 Thread Thomas Graves (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Thomas Graves resolved SPARK-3177. -- Resolution: Fixed Fix Version/s: (was: 1.1.1) 1.2.0 Yarn-alpha

[jira] [Commented] (SPARK-2593) Add ability to pass an existing Akka ActorSystem into Spark

2014-09-17 Thread kannapiran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2593?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137529#comment-14137529 ] kannapiran commented on SPARK-2593: --- Is there a way to add an Akka system into Spark

[jira] [Updated] (SPARK-3074) support groupByKey() with hot keys in PySpark

2014-09-17 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen updated SPARK-3074: -- Component/s: PySpark support groupByKey() with hot keys in PySpark

[jira] [Updated] (SPARK-3371) Spark SQL: Renaming a function expression with group by gives error

2014-09-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3371: Target Version/s: 1.2.0 Spark SQL: Renaming a function expression with group by gives

[jira] [Updated] (SPARK-3537) Statistics for cached RDDs

2014-09-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3537: Assignee: Cheng Lian Statistics for cached RDDs --

[jira] [Commented] (SPARK-3377) Metrics can be accidentally aggregated against our intention

2014-09-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137669#comment-14137669 ] Apache Spark commented on SPARK-3377: - User 'sarutak' has created a pull request for

[jira] [Created] (SPARK-3568) Add metrics for ranking algorithms

2014-09-17 Thread Shuo Xiang (JIRA)
Shuo Xiang created SPARK-3568: - Summary: Add metrics for ranking algorithms Key: SPARK-3568 URL: https://issues.apache.org/jira/browse/SPARK-3568 Project: Spark Issue Type: New Feature

[jira] [Updated] (SPARK-2902) Change default options to be more agressive

2014-09-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2902: Summary: Change default options to be more agressive (was: Enable compression for

[jira] [Commented] (SPARK-2902) Change default options to be more agressive

2014-09-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2902?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137698#comment-14137698 ] Michael Armbrust commented on SPARK-2902: - Things we might consider changing:

[jira] [Updated] (SPARK-2271) Use Hive's high performance Decimal128 to replace BigDecimal

2014-09-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-2271: Target Version/s: 1.2.0 Use Hive's high performance Decimal128 to replace BigDecimal

[jira] [Resolved] (SPARK-2063) Creating a SchemaRDD via sql() does not correctly resolve nested types

2014-09-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2063?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-2063. - Resolution: Duplicate Fix Version/s: 1.2.0 Creating a SchemaRDD via sql() does

[jira] [Resolved] (SPARK-1694) Simplify ColumnBuilder/Accessor class hierarchy

2014-09-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-1694. - Resolution: Fixed Simplify ColumnBuilder/Accessor class hierarchy

[jira] [Updated] (SPARK-3568) Add metrics for ranking algorithms

2014-09-17 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-3568: - Priority: Minor (was: Major) Add metrics for ranking algorithms

[jira] [Updated] (SPARK-3568) Add metrics for ranking algorithms

2014-09-17 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-3568: - Assignee: Shuo Xiang Add metrics for ranking algorithms --

[jira] [Commented] (SPARK-3530) Pipeline and Parameters

2014-09-17 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137719#comment-14137719 ] Matei Zaharia commented on SPARK-3530: -- To comment on the versioning stuff here,

[jira] [Updated] (SPARK-3553) Spark Streaming app streams files that have already been streamed in an endless loop

2014-09-17 Thread Ezequiel Bella (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ezequiel Bella updated SPARK-3553: -- Description: We have a spark streaming app deployed in a YARN ec2 cluster with 1 name node and

[jira] [Commented] (SPARK-3530) Pipeline and Parameters

2014-09-17 Thread Eustache (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137758#comment-14137758 ] Eustache commented on SPARK-3530: - Great to see the design docs! A few

[jira] [Commented] (SPARK-2883) Spark Support for ORCFile format

2014-09-17 Thread Zhan Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2883?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137759#comment-14137759 ] Zhan Zhang commented on SPARK-2883: --- I am starting to prototype the last feature with

[jira] [Commented] (SPARK-2707) Upgrade to Akka 2.3

2014-09-17 Thread lee mighdoll (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137780#comment-14137780 ] lee mighdoll commented on SPARK-2707: - Now that 1.1 is out, hopefully a solution for

[jira] [Created] (SPARK-3569) Add metadata field to StructField

2014-09-17 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-3569: Summary: Add metadata field to StructField Key: SPARK-3569 URL: https://issues.apache.org/jira/browse/SPARK-3569 Project: Spark Issue Type: New Feature

[jira] [Resolved] (SPARK-3534) Avoid running MLlib and Streaming tests when testing SQL PRs

2014-09-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust resolved SPARK-3534. - Resolution: Fixed Fix Version/s: 1.2.0 Assignee: Nicholas Chammas Avoid

[jira] [Updated] (SPARK-3569) Add metadata field to StructField

2014-09-17 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-3569: - Component/s: MLlib ML Add metadata field to StructField

[jira] [Created] (SPARK-3570) Shuffle write time does not include time to open shuffle files

2014-09-17 Thread Kay Ousterhout (JIRA)
Kay Ousterhout created SPARK-3570: - Summary: Shuffle write time does not include time to open shuffle files Key: SPARK-3570 URL: https://issues.apache.org/jira/browse/SPARK-3570 Project: Spark

[jira] [Updated] (SPARK-3570) Shuffle write time does not include time to open shuffle files

2014-09-17 Thread Kay Ousterhout (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout updated SPARK-3570: -- Attachment: 3a_1410957857_0_job_log_waterfall.pdf Shuffle write time does not include time to

[jira] [Updated] (SPARK-3570) Shuffle write time does not include time to open shuffle files

2014-09-17 Thread Kay Ousterhout (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout updated SPARK-3570: -- Attachment: (was: 3a_1410943402_0_job_log_waterfall.pdf) Shuffle write time does not

[jira] [Created] (SPARK-3571) Spark standalone cluster mode doesn't work.

2014-09-17 Thread Kousuke Saruta (JIRA)
Kousuke Saruta created SPARK-3571: - Summary: Spark standalone cluster mode doesn't work. Key: SPARK-3571 URL: https://issues.apache.org/jira/browse/SPARK-3571 Project: Spark Issue Type: Bug

[jira] [Commented] (SPARK-3571) Spark standalone cluster mode doesn't work.

2014-09-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14137884#comment-14137884 ] Apache Spark commented on SPARK-3571: - User 'sarutak' has created a pull request for

[jira] [Created] (SPARK-3572) Support register UserType in SQL

2014-09-17 Thread Xiangrui Meng (JIRA)
Xiangrui Meng created SPARK-3572: Summary: Support register UserType in SQL Key: SPARK-3572 URL: https://issues.apache.org/jira/browse/SPARK-3572 Project: Spark Issue Type: New Feature

[jira] [Updated] (SPARK-3573) Dataset

2014-09-17 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3573?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-3573: - Shepherd: Michael Armbrust Dataset --- Key: SPARK-3573

[jira] [Created] (SPARK-3574) Shuffle finish time always reported as -1

2014-09-17 Thread Kay Ousterhout (JIRA)
Kay Ousterhout created SPARK-3574: - Summary: Shuffle finish time always reported as -1 Key: SPARK-3574 URL: https://issues.apache.org/jira/browse/SPARK-3574 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-3568) Add metrics for ranking algorithms

2014-09-17 Thread Shuo Xiang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuo Xiang updated SPARK-3568: -- Description: Include widely-used metrics for ranking algorithms, including: - Mean Average Precision

[jira] [Updated] (SPARK-3568) Add metrics for ranking algorithms

2014-09-17 Thread Shuo Xiang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuo Xiang updated SPARK-3568: -- Description: Include widely-used metrics for ranking algorithms, including: - Mean Average Precision

[jira] [Updated] (SPARK-3568) Add metrics for ranking algorithms

2014-09-17 Thread Shuo Xiang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuo Xiang updated SPARK-3568: -- Description: Include widely-used metrics for ranking algorithms, including: - Mean Average Precision

[jira] [Updated] (SPARK-3568) Add metrics for ranking algorithms

2014-09-17 Thread Shuo Xiang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shuo Xiang updated SPARK-3568: -- Description: Include widely-used metrics for ranking algorithms, including: - Mean Average Precision

[jira] [Updated] (SPARK-3569) Add metadata field to StructField

2014-09-17 Thread Xiangrui Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiangrui Meng updated SPARK-3569: - Description: Want to add a metadata field to StructField that can be used by other applications

[jira] [Commented] (SPARK-3051) Support looking-up named accumulators in a registry

2014-09-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138095#comment-14138095 ] Apache Spark commented on SPARK-3051: - User 'nfergu' has created a pull request for

[jira] [Updated] (SPARK-3161) Cache example-node map for DecisionTree training

2014-09-17 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-3161: - Description: Improvement: worker computation When training each level of a DecisionTree,

[jira] [Commented] (SPARK-3563) Shuffle data not always be cleaned

2014-09-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138100#comment-14138100 ] Patrick Wendell commented on SPARK-3563: Eventually the references should be

[jira] [Updated] (SPARK-3565) make code consistent with document

2014-09-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell updated SPARK-3565: --- Component/s: (was: core) Spark Core make code consistent with document

[jira] [Commented] (SPARK-3565) make code consistent with document

2014-09-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3565?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138101#comment-14138101 ] Patrick Wendell commented on SPARK-3565: Please use Spark Core component and not

[jira] [Updated] (SPARK-3161) Cache example-node map for DecisionTree training

2014-09-17 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3161?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-3161: - Description: Improvement: worker computation When training each level of a DecisionTree,

[jira] [Resolved] (SPARK-901) UISuite jetty port increases under contention fails if startPort is in use

2014-09-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-901?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-901. --- Resolution: Fixed This is fixed by SPARK-3555 since we no longer choose a specific starting

[jira] [Resolved] (SPARK-1739) Close PR's after 30 days of inactivity

2014-09-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-1739?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-1739. Resolution: Won't Fix We've introduced a different mechanism for manual closing, so I don't

[jira] [Updated] (SPARK-3575) Hive Schema is ignored when using convertMetastoreParquet

2014-09-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3575: Component/s: SQL Hive Schema is ignored when using convertMetastoreParquet

[jira] [Created] (SPARK-3575) Hive Schema is ignored when using convertMetastoreParquet

2014-09-17 Thread Michael Armbrust (JIRA)
Michael Armbrust created SPARK-3575: --- Summary: Hive Schema is ignored when using convertMetastoreParquet Key: SPARK-3575 URL: https://issues.apache.org/jira/browse/SPARK-3575 Project: Spark

[jira] [Updated] (SPARK-3575) Hive Schema is ignored when using convertMetastoreParquet

2014-09-17 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-3575: Target Version/s: 1.2.0 Hive Schema is ignored when using convertMetastoreParquet

[jira] [Comment Edited] (SPARK-3561) Native Hadoop/YARN integration for batch/ETL workloads

2014-09-17 Thread Sean McNamara (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138130#comment-14138130 ] Sean McNamara edited comment on SPARK-3561 at 9/17/14 10:45 PM:

[jira] [Comment Edited] (SPARK-3561) Native Hadoop/YARN integration for batch/ETL workloads

2014-09-17 Thread Sean McNamara (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138130#comment-14138130 ] Sean McNamara edited comment on SPARK-3561 at 9/17/14 10:47 PM:

[jira] [Created] (SPARK-3576) Provide script for creating the Spark AMI from scratch

2014-09-17 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-3576: -- Summary: Provide script for creating the Spark AMI from scratch Key: SPARK-3576 URL: https://issues.apache.org/jira/browse/SPARK-3576 Project: Spark

[jira] [Created] (SPARK-3577) Shuffle write time incorrect for sort-based shuffle

2014-09-17 Thread Kay Ousterhout (JIRA)
Kay Ousterhout created SPARK-3577: - Summary: Shuffle write time incorrect for sort-based shuffle Key: SPARK-3577 URL: https://issues.apache.org/jira/browse/SPARK-3577 Project: Spark Issue

[jira] [Updated] (SPARK-3270) Spark API for Application Extensions

2014-09-17 Thread Michal Malohlava (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michal Malohlava updated SPARK-3270: Description: Any application should be able to enrich spark infrastructure by services

[jira] [Updated] (SPARK-3577) Shuffle write time incorrect for sort-based shuffle

2014-09-17 Thread Kay Ousterhout (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kay Ousterhout updated SPARK-3577: -- Description: After this change

[jira] [Created] (SPARK-3578) GraphGenerators.sampleLogNormal sometimes returns too-large result

2014-09-17 Thread Ankur Dave (JIRA)
Ankur Dave created SPARK-3578: - Summary: GraphGenerators.sampleLogNormal sometimes returns too-large result Key: SPARK-3578 URL: https://issues.apache.org/jira/browse/SPARK-3578 Project: Spark

[jira] [Updated] (SPARK-3578) GraphGenerators.sampleLogNormal sometimes returns too-large result

2014-09-17 Thread Ankur Dave (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ankur Dave updated SPARK-3578: -- Description: GraphGenerators.sampleLogNormal is supposed to return an integer strictly less than

[jira] [Commented] (SPARK-3578) GraphGenerators.sampleLogNormal sometimes returns too-large result

2014-09-17 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3578?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138188#comment-14138188 ] Apache Spark commented on SPARK-3578: - User 'ankurdave' has created a pull request for

[jira] [Resolved] (SPARK-3571) Spark standalone cluster mode doesn't work.

2014-09-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-3571. Resolution: Fixed Fix Version/s: 1.2.0 Issue resolved by pull request 2436

[jira] [Updated] (SPARK-3571) Spark standalone cluster mode doesn't work.

2014-09-17 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-3571: - Assignee: Kousuke Saruta Spark standalone cluster mode doesn't work.

[jira] [Updated] (SPARK-3564) Display App ID on HistoryPage

2014-09-17 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-3564: - Assignee: Kousuke Saruta Display App ID on HistoryPage -

[jira] [Resolved] (SPARK-3564) Display App ID on HistoryPage

2014-09-17 Thread Patrick Wendell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Patrick Wendell resolved SPARK-3564. Resolution: Fixed Fix Version/s: 1.1.1 1.2.0 Issue resolved by

[jira] [Updated] (SPARK-3567) appId field in SparkDeploySchedulerBackend should be volatile

2014-09-17 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-3567: - Target Version/s: 1.2.0 (was: 1.1.1, 1.2.0) appId field in SparkDeploySchedulerBackend should be

[jira] [Updated] (SPARK-3567) appId field in SparkDeploySchedulerBackend should be volatile

2014-09-17 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-3567: - Fix Version/s: 1.2.0 appId field in SparkDeploySchedulerBackend should be volatile

[jira] [Updated] (SPARK-3567) appId field in SparkDeploySchedulerBackend should be volatile

2014-09-17 Thread Andrew Or (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Or updated SPARK-3567: - Assignee: Kousuke Saruta appId field in SparkDeploySchedulerBackend should be volatile

[jira] [Created] (SPARK-3579) Jekyll doc generation is different across environments

2014-09-17 Thread Patrick Wendell (JIRA)
Patrick Wendell created SPARK-3579: -- Summary: Jekyll doc generation is different across environments Key: SPARK-3579 URL: https://issues.apache.org/jira/browse/SPARK-3579 Project: Spark

[jira] [Commented] (SPARK-2321) Design a proper progress reporting event listener API

2014-09-17 Thread Josh Rosen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138242#comment-14138242 ] Josh Rosen commented on SPARK-2321: --- I agree that this should be a pull API. A

[jira] [Commented] (SPARK-3574) Shuffle finish time always reported as -1

2014-09-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138246#comment-14138246 ] Sandy Ryza commented on SPARK-3574: --- On it Shuffle finish time always reported as -1

[jira] [Commented] (SPARK-3577) Shuffle write time incorrect for sort-based shuffle

2014-09-17 Thread Sandy Ryza (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138245#comment-14138245 ] Sandy Ryza commented on SPARK-3577: --- On it Shuffle write time incorrect for sort-based

[jira] [Commented] (SPARK-3574) Shuffle finish time always reported as -1

2014-09-17 Thread Kay Ousterhout (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3574?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138247#comment-14138247 ] Kay Ousterhout commented on SPARK-3574: --- Thanks Sandy!! Shuffle finish time always

[jira] [Commented] (SPARK-3129) Prevent data loss in Spark Streaming

2014-09-17 Thread Matei Zaharia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3129?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14138281#comment-14138281 ] Matei Zaharia commented on SPARK-3129: -- Great, it will be nice to see how fast this

[jira] [Created] (SPARK-3580) Add Consistent Method For Number of RDD Partitions Across Differnet Languages

2014-09-17 Thread Pat McDonough (JIRA)
Pat McDonough created SPARK-3580: Summary: Add Consistent Method For Number of RDD Partitions Across Differnet Languages Key: SPARK-3580 URL: https://issues.apache.org/jira/browse/SPARK-3580 Project:

[jira] [Updated] (SPARK-3580) Add Consistent Method To Get Number of RDD Partitions Across Differnet Languages

2014-09-17 Thread Pat McDonough (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pat McDonough updated SPARK-3580: - Summary: Add Consistent Method To Get Number of RDD Partitions Across Differnet Languages (was:

[jira] [Updated] (SPARK-3580) Add Consistent Method To Get Number of RDD Partitions Across Different Languages

2014-09-17 Thread Pat McDonough (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3580?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Pat McDonough updated SPARK-3580: - Summary: Add Consistent Method To Get Number of RDD Partitions Across Different Languages (was:
