[jira] [Commented] (SPARK-12717) pyspark broadcast fails when using multiple threads

2017-04-05 Thread Kang Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12717?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958336#comment-15958336 ] Kang Liu commented on SPARK-12717: -- I met the same problem with pyspark 2.1.0. Any progress? > pyspark

[jira] [Closed] (SPARK-19722) Clean up the usage of EliminateSubqueryAliases

2017-04-05 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li closed SPARK-19722. --- Resolution: Won't Fix > Clean up the usage of EliminateSubqueryAliases >

[jira] [Commented] (SPARK-20233) Apply star-join filter heuristics to dynamic programming join enumeration

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958265#comment-15958265 ] Apache Spark commented on SPARK-20233: -- User 'ioana-delaney' has created a pull request for this

[jira] [Assigned] (SPARK-20233) Apply star-join filter heuristics to dynamic programming join enumeration

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20233: Assignee: (was: Apache Spark) > Apply star-join filter heuristics to dynamic

[jira] [Assigned] (SPARK-20233) Apply star-join filter heuristics to dynamic programming join enumeration

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20233: Assignee: Apache Spark > Apply star-join filter heuristics to dynamic programming join

[jira] [Commented] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958259#comment-15958259 ] Liang-Chi Hsieh commented on SPARK-20226: - [~barrybecker4] Can you try to disable this config

[jira] [Assigned] (SPARK-20217) Executor should not fail stage if killed task throws non-interrupted exception

2017-04-05 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai reassigned SPARK-20217: Assignee: Eric Liang > Executor should not fail stage if killed task throws non-interrupted

[jira] [Resolved] (SPARK-20217) Executor should not fail stage if killed task throws non-interrupted exception

2017-04-05 Thread Yin Huai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yin Huai resolved SPARK-20217. -- Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17531

[jira] [Resolved] (SPARK-20231) Refactor star schema code for the subsequent star join detection in CBO

2017-04-05 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-20231. - Resolution: Fixed Assignee: Ioana Delaney Fix Version/s: 2.2.0 > Refactor star schema

[jira] [Updated] (SPARK-20214) pyspark linalg _convert_to_vector should check for sorted indices

2017-04-05 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-20214: -- Affects Version/s: 1.5.2 1.6.3 > pyspark linalg

[jira] [Resolved] (SPARK-20214) pyspark linalg _convert_to_vector should check for sorted indices

2017-04-05 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley resolved SPARK-20214. --- Resolution: Fixed Fix Version/s: 2.2.0 2.0.3

[jira] [Created] (SPARK-20234) Improve Framework for Basic ML Tests

2017-04-05 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-20234: Summary: Improve Framework for Basic ML Tests Key: SPARK-20234 URL: https://issues.apache.org/jira/browse/SPARK-20234 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-20234) Improve Framework for Basic ML Tests

2017-04-05 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15958077#comment-15958077 ] Bryan Cutler commented on SPARK-20234: -- I can work on this > Improve Framework for Basic ML Tests >

[jira] [Resolved] (SPARK-20224) Update apache docs

2017-04-05 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-20224. --- Resolution: Fixed Issue resolved by pull request 17539

[jira] [Created] (SPARK-20233) Apply star-join filter heuristics to dynamic programming join enumeration

2017-04-05 Thread Ioana Delaney (JIRA)
Ioana Delaney created SPARK-20233: - Summary: Apply star-join filter heuristics to dynamic programming join enumeration Key: SPARK-20233 URL: https://issues.apache.org/jira/browse/SPARK-20233 Project:

[jira] [Commented] (SPARK-20202) Remove references to org.spark-project.hive

2017-04-05 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957843#comment-15957843 ] Reynold Xin commented on SPARK-20202: - I've created a ticket on the Hive side to publish 1.2.x:

[jira] [Commented] (SPARK-20202) Remove references to org.spark-project.hive

2017-04-05 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957811#comment-15957811 ] Reynold Xin commented on SPARK-20202: - Yes this is really important. The proper way to do this is to

[jira] [Updated] (SPARK-20202) Remove references to org.spark-project.hive

2017-04-05 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin updated SPARK-20202: Priority: Major (was: Blocker) > Remove references to org.spark-project.hive >

[jira] [Updated] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-20226: - Attachment: profile_indexer2.PNG A snapshot of the hotspot sampler from JVisualVM while

[jira] [Commented] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957732#comment-15957732 ] Barry Becker commented on SPARK-20226: -- I did some profiling using the sampler in JVisualVM and took

[jira] [Commented] (SPARK-20232) Better combineByKey documentation: clarify memory allocation, better example

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20232?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957537#comment-15957537 ] Apache Spark commented on SPARK-20232: -- User 'dgingrich' has created a pull request for this issue:

[jira] [Assigned] (SPARK-20232) Better combineByKey documentation: clarify memory allocation, better example

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20232: Assignee: Apache Spark > Better combineByKey documentation: clarify memory allocation,

[jira] [Assigned] (SPARK-20232) Better combineByKey documentation: clarify memory allocation, better example

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20232: Assignee: (was: Apache Spark) > Better combineByKey documentation: clarify memory

[jira] [Commented] (SPARK-20228) Random Forest instable results depending on spark.executor.memory

2017-04-05 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957530#comment-15957530 ] Sean Owen commented on SPARK-20228: --- This alone wouldn't matter directly, but could certainly matter

[jira] [Assigned] (SPARK-20231) Refactor star schema code for the subsequent star join detection in CBO

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20231: Assignee: Apache Spark > Refactor star schema code for the subsequent star join detection

[jira] [Assigned] (SPARK-20231) Refactor star schema code for the subsequent star join detection in CBO

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20231?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20231: Assignee: (was: Apache Spark) > Refactor star schema code for the subsequent star

[jira] [Commented] (SPARK-20231) Refactor star schema code for the subsequent star join detection in CBO

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20231?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957523#comment-15957523 ] Apache Spark commented on SPARK-20231: -- User 'ioana-delaney' has created a pull request for this

[jira] [Created] (SPARK-20232) Better combineByKey documentation: clarify memory allocation, better example

2017-04-05 Thread David Gingrich (JIRA)
David Gingrich created SPARK-20232: -- Summary: Better combineByKey documentation: clarify memory allocation, better example Key: SPARK-20232 URL: https://issues.apache.org/jira/browse/SPARK-20232

[jira] [Commented] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957496#comment-15957496 ] Sean Owen commented on SPARK-20226: --- It could just look that way because caching means evaluating, and

[jira] [Commented] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957489#comment-15957489 ] Barry Becker commented on SPARK-20226: -- I thought the problem was in the cacheTable call because

[jira] [Created] (SPARK-20231) Refactor star schema code for the subsequent star join detection in CBO

2017-04-05 Thread Ioana Delaney (JIRA)
Ioana Delaney created SPARK-20231: - Summary: Refactor star schema code for the subsequent star join detection in CBO Key: SPARK-20231 URL: https://issues.apache.org/jira/browse/SPARK-20231 Project:

[jira] [Assigned] (SPARK-20230) FetchFailedExceptions should invalidate file caches in MapOutputTracker even if newer stages are launched

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20230: Assignee: (was: Apache Spark) > FetchFailedExceptions should invalidate file caches

[jira] [Assigned] (SPARK-20230) FetchFailedExceptions should invalidate file caches in MapOutputTracker even if newer stages are launched

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20230: Assignee: Apache Spark > FetchFailedExceptions should invalidate file caches in

[jira] [Commented] (SPARK-20230) FetchFailedExceptions should invalidate file caches in MapOutputTracker even if newer stages are launched

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957477#comment-15957477 ] Apache Spark commented on SPARK-20230: -- User 'brkyvz' has created a pull request for this issue:

[jira] [Commented] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957468#comment-15957468 ] Sean Owen commented on SPARK-20226: --- OK, this doesn't sound like it's anything to do with SQLContext or

[jira] [Created] (SPARK-20230) FetchFailedExceptions should invalidate file caches in MapOutputTracker even if newer stages are launched

2017-04-05 Thread Burak Yavuz (JIRA)
Burak Yavuz created SPARK-20230: --- Summary: FetchFailedExceptions should invalidate file caches in MapOutputTracker even if newer stages are launched Key: SPARK-20230 URL:

[jira] [Commented] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957457#comment-15957457 ] Barry Becker commented on SPARK-20226: -- It seems like it has to do with the interaction between the

[jira] [Resolved] (SPARK-19454) Improve DataFrame.replace API

2017-04-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-19454. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 16793

[jira] [Assigned] (SPARK-19454) Improve DataFrame.replace API

2017-04-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk reassigned SPARK-19454: --- Assignee: Maciej Szymkiewicz > Improve DataFrame.replace API > - > >

[jira] [Commented] (SPARK-20228) Random Forest instable results depending on spark.executor.memory

2017-04-05 Thread Ansgar Schulze (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957411#comment-15957411 ] Ansgar Schulze commented on SPARK-20228: Hi Sean, thanks for your comment! It's not a problem that

[jira] [Commented] (SPARK-20202) Remove references to org.spark-project.hive

2017-04-05 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957377#comment-15957377 ] Ryan Blue commented on SPARK-20202: --- +1 for a release of the Spark fork from the Hive community. While

[jira] [Commented] (SPARK-20216) Install pandoc on machine(s) used for packaging

2017-04-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20216?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957346#comment-15957346 ] holdenk commented on SPARK-20216: - Thanks [~marmbrus] :) So looking at that host it seems like pandoc is

[jira] [Resolved] (SPARK-20216) Install pandoc on machine(s) used for packaging

2017-04-05 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20216?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-20216. - Resolution: Fixed Assignee: holdenk This has been fixed by installing pypandoc into the conda env

[jira] [Commented] (SPARK-20178) Improve Scheduler fetch failures

2017-04-05 Thread Sital Kedia (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20178?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957341#comment-15957341 ] Sital Kedia commented on SPARK-20178: - [~tgraves] Thanks for creating the JIRA and driving the

[jira] [Assigned] (SPARK-20229) add semanticHash to QueryPlan

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20229: Assignee: Wenchen Fan (was: Apache Spark) > add semanticHash to QueryPlan >

[jira] [Commented] (SPARK-20229) add semanticHash to QueryPlan

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957324#comment-15957324 ] Apache Spark commented on SPARK-20229: -- User 'cloud-fan' has created a pull request for this issue:

[jira] [Assigned] (SPARK-20229) add semanticHash to QueryPlan

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20229: Assignee: Apache Spark (was: Wenchen Fan) > add semanticHash to QueryPlan >

[jira] [Commented] (SPARK-20219) Schedule tasks based on size of input from ScheduledRDD

2017-04-05 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957309#comment-15957309 ] Imran Rashid commented on SPARK-20219: -- I agree with Kay -- the idea here is a good one, I just

[jira] [Updated] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-20226: - Labels: cache (was: ) > Call to sqlContext.cacheTable takes an incredibly long time in some

[jira] [Comment Edited] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957296#comment-15957296 ] Barry Becker edited comment on SPARK-20226 at 4/5/17 5:36 PM: -- We noticed

[jira] [Commented] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957296#comment-15957296 ] Barry Becker commented on SPARK-20226: -- We noticed that this is reproducible just by adding a new

[jira] [Assigned] (SPARK-20213) DataFrameWriter operations do not show up in SQL tab

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20213: Assignee: Apache Spark > DataFrameWriter operations do not show up in SQL tab >

[jira] [Assigned] (SPARK-20213) DataFrameWriter operations do not show up in SQL tab

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20213: Assignee: (was: Apache Spark) > DataFrameWriter operations do not show up in SQL tab

[jira] [Commented] (SPARK-20213) DataFrameWriter operations do not show up in SQL tab

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957292#comment-15957292 ] Apache Spark commented on SPARK-20213: -- User 'rdblue' has created a pull request for this issue:

[jira] [Assigned] (SPARK-20223) Typo in tpcds q77.sql

2017-04-05 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li reassigned SPARK-20223: --- Assignee: Zhenhua Wang > Typo in tpcds q77.sql > - > > Key:

[jira] [Resolved] (SPARK-20223) Typo in tpcds q77.sql

2017-04-05 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-20223. - Resolution: Fixed Fix Version/s: 2.2.0 2.1.2 2.0.3 > Typo

[jira] [Assigned] (SPARK-20214) pyspark linalg _convert_to_vector should check for sorted indices

2017-04-05 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley reassigned SPARK-20214: - Assignee: Liang-Chi Hsieh > pyspark linalg _convert_to_vector should check for

[jira] [Updated] (SPARK-20214) pyspark linalg _convert_to_vector should check for sorted indices

2017-04-05 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-20214: -- Summary: pyspark linalg _convert_to_vector should check for sorted indices (was:

[jira] [Updated] (SPARK-20214) pyspark.mllib SciPyTests test_serialize

2017-04-05 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-20214: -- Target Version/s: 2.0.3, 2.1.2, 2.2.0 > pyspark.mllib SciPyTests test_serialize >

[jira] [Updated] (SPARK-20214) pyspark.mllib SciPyTests test_serialize

2017-04-05 Thread Joseph K. Bradley (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Joseph K. Bradley updated SPARK-20214: -- Shepherd: Joseph K. Bradley > pyspark.mllib SciPyTests test_serialize >

[jira] [Updated] (SPARK-19068) Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(41,Wr

2017-04-05 Thread Artur Sukhenko (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Artur Sukhenko updated SPARK-19068: --- Affects Version/s: 2.0.1 > Large number of executors causing a ton of ERROR

[jira] [Commented] (SPARK-19068) Large number of executors causing a ton of ERROR scheduler.LiveListenerBus: SparkListenerBus has already stopped! Dropping event SparkListenerExecutorMetricsUpdate(41,

2017-04-05 Thread Artur Sukhenko (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957258#comment-15957258 ] Artur Sukhenko commented on SPARK-19068: Having similar problem. Reproduce:

[jira] [Commented] (SPARK-20228) Random Forest instable results depending on spark.executor.memory

2017-04-05 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20228?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957257#comment-15957257 ] Sean Owen commented on SPARK-20228: --- I don't think I would expect the exact same results even with

[jira] [Commented] (SPARK-20219) Schedule tasks based on size of input from ScheduledRDD

2017-04-05 Thread Kay Ousterhout (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15957253#comment-15957253 ] Kay Ousterhout commented on SPARK-20219: Like [~mridulm80] (as mentioned on the PR) I'm hesitant

[jira] [Created] (SPARK-20229) add semanticHash to QueryPlan

2017-04-05 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-20229: --- Summary: add semanticHash to QueryPlan Key: SPARK-20229 URL: https://issues.apache.org/jira/browse/SPARK-20229 Project: Spark Issue Type: Improvement

[jira] [Created] (SPARK-20228) Random Forest instable results depending on spark.executor.memory

2017-04-05 Thread Ansgar Schulze (JIRA)
Ansgar Schulze created SPARK-20228: -- Summary: Random Forest instable results depending on spark.executor.memory Key: SPARK-20228 URL: https://issues.apache.org/jira/browse/SPARK-20228 Project: Spark

[jira] [Commented] (SPARK-20227) Job hangs when joining a lot of aggregated columns

2017-04-05 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956985#comment-15956985 ] Sean Owen commented on SPARK-20227: --- Are you sure it's not just "a longer time" and not "forever"? 22

[jira] [Updated] (SPARK-20227) Job hangs when joining a lot of aggregated columns

2017-04-05 Thread Quentin Auge (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quentin Auge updated SPARK-20227: - Description: I'm trying to replace a lot of different columns in a dataframe with aggregates of

[jira] [Updated] (SPARK-20227) Job hangs when joining a lot of aggregated columns

2017-04-05 Thread Quentin Auge (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quentin Auge updated SPARK-20227: - Description: I'm trying to replace a lot of different columns in a dataframe with aggregates of

[jira] [Comment Edited] (SPARK-20227) Job hangs when joining a lot of aggregated columns

2017-04-05 Thread Quentin Auge (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956973#comment-15956973 ] Quentin Auge edited comment on SPARK-20227 at 4/5/17 2:55 PM: -- I managed to

[jira] [Comment Edited] (SPARK-20227) Job hangs when joining a lot of aggregated columns

2017-04-05 Thread Quentin Auge (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956973#comment-15956973 ] Quentin Auge edited comment on SPARK-20227 at 4/5/17 2:51 PM: -- I managed to

[jira] [Comment Edited] (SPARK-20227) Job hangs when joining a lot of aggregated columns

2017-04-05 Thread Quentin Auge (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956973#comment-15956973 ] Quentin Auge edited comment on SPARK-20227 at 4/5/17 2:51 PM: -- I manages to

[jira] [Comment Edited] (SPARK-20227) Job hangs when joining a lot of aggregated columns

2017-04-05 Thread Quentin Auge (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956973#comment-15956973 ] Quentin Auge edited comment on SPARK-20227 at 4/5/17 2:51 PM: -- I managed to

[jira] [Commented] (SPARK-20227) Job hangs when joining a lot of aggregated columns

2017-04-05 Thread Quentin Auge (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20227?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956973#comment-15956973 ] Quentin Auge commented on SPARK-20227: -- I found two distinct workarounds that prevent the job for

[jira] [Updated] (SPARK-20227) Job hangs when joining a lot of aggregated columns

2017-04-05 Thread Quentin Auge (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20227?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Quentin Auge updated SPARK-20227: - Description: I'm trying to replace a lot of different columns in a dataframe with aggregates of

[jira] [Created] (SPARK-20227) Job hangs when joining a lot of aggregated columns

2017-04-05 Thread Quentin Auge (JIRA)
Quentin Auge created SPARK-20227: Summary: Job hangs when joining a lot of aggregated columns Key: SPARK-20227 URL: https://issues.apache.org/jira/browse/SPARK-20227 Project: Spark Issue

[jira] [Updated] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-20226: - Attachment: xyzzy.csv Attaching the datafile, but I don't think it is significant. This problem

[jira] [Updated] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20226?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Barry Becker updated SPARK-20226: - Description: I have a case where the call to sqlContext.cacheTable can take an arbitrarily long

[jira] [Commented] (SPARK-20225) Spark Job hangs while writing parquet files to HDFS

2017-04-05 Thread Darshan Mehta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20225?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956877#comment-15956877 ] Darshan Mehta commented on SPARK-20225: --- Attached are the Spark UI screenshots with Failed Tasks

[jira] [Commented] (SPARK-20202) Remove references to org.spark-project.hive

2017-04-05 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20202?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956879#comment-15956879 ] Steve Loughran commented on SPARK-20202: # the ugliness need to inset the spark thrift stuff

[jira] [Created] (SPARK-20226) Call to sqlContext.cacheTable takes an incredibly long time in some cases

2017-04-05 Thread Barry Becker (JIRA)
Barry Becker created SPARK-20226: Summary: Call to sqlContext.cacheTable takes an incredibly long time in some cases Key: SPARK-20226 URL: https://issues.apache.org/jira/browse/SPARK-20226 Project:

[jira] [Updated] (SPARK-20225) Spark Job hangs while writing parquet files to HDFS

2017-04-05 Thread Darshan Mehta (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20225?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Darshan Mehta updated SPARK-20225: -- Attachment: Failed_Tasks_2.JPG Failed_Tasks.JPG > Spark Job hangs while

[jira] [Created] (SPARK-20225) Spark Job hangs while writing parquet files to HDFS

2017-04-05 Thread Darshan Mehta (JIRA)
Darshan Mehta created SPARK-20225: - Summary: Spark Job hangs while writing parquet files to HDFS Key: SPARK-20225 URL: https://issues.apache.org/jira/browse/SPARK-20225 Project: Spark Issue

[jira] [Updated] (SPARK-20155) CSV-files with quoted quotes can't be parsed, if delimiter follows quoted quote

2017-04-05 Thread Rick Moritz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20155?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rick Moritz updated SPARK-20155: Component/s: SQL > CSV-files with quoted quotes can't be parsed, if delimiter follows quoted >

[jira] [Assigned] (SPARK-20224) Update apache docs

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20224: Assignee: Tathagata Das (was: Apache Spark) > Update apache docs > -- >

[jira] [Assigned] (SPARK-20224) Update apache docs

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20224: Assignee: Apache Spark (was: Tathagata Das) > Update apache docs > -- >

[jira] [Commented] (SPARK-20224) Update apache docs

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956818#comment-15956818 ] Apache Spark commented on SPARK-20224: -- User 'tdas' has created a pull request for this issue:

[jira] [Created] (SPARK-20224) Update apache docs

2017-04-05 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-20224: - Summary: Update apache docs Key: SPARK-20224 URL: https://issues.apache.org/jira/browse/SPARK-20224 Project: Spark Issue Type: Sub-task

[jira] [Updated] (SPARK-19067) mapGroupsWithState - arbitrary stateful operations with Structured Streaming (similar to DStream.mapWithState)

2017-04-05 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das updated SPARK-19067: -- Description: Right now the only way to do stateful operations is with Aggregator or UDAF.

[jira] [Assigned] (SPARK-19807) Add reason for cancellation when a stage is killed using web UI

2017-04-05 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-19807: - Assignee: shaolinliu > Add reason for cancellation when a stage is killed using web UI >

[jira] [Resolved] (SPARK-19807) Add reason for cancellation when a stage is killed using web UI

2017-04-05 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-19807. --- Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 17258

[jira] [Assigned] (SPARK-14681) Provide label/impurity stats for spark.ml decision tree nodes

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14681: Assignee: Apache Spark > Provide label/impurity stats for spark.ml decision tree nodes >

[jira] [Commented] (SPARK-14681) Provide label/impurity stats for spark.ml decision tree nodes

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956646#comment-15956646 ] Apache Spark commented on SPARK-14681: -- User 'shaynativ' has created a pull request for this issue:

[jira] [Assigned] (SPARK-14681) Provide label/impurity stats for spark.ml decision tree nodes

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-14681: Assignee: (was: Apache Spark) > Provide label/impurity stats for spark.ml decision

[jira] [Comment Edited] (SPARK-20199) GradientBoostedTreesModel doesn't have Column Sampling Rate Paramenter

2017-04-05 Thread pralabhkumar (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20199?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15954796#comment-15954796 ] pralabhkumar edited comment on SPARK-20199 at 4/5/17 10:05 AM: --- Hi

[jira] [Assigned] (SPARK-20223) Typo in tpcds q77.sql

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20223: Assignee: (was: Apache Spark) > Typo in tpcds q77.sql > - > >

[jira] [Assigned] (SPARK-20223) Typo in tpcds q77.sql

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-20223: Assignee: Apache Spark > Typo in tpcds q77.sql > - > >

[jira] [Commented] (SPARK-20223) Typo in tpcds q77.sql

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956574#comment-15956574 ] Apache Spark commented on SPARK-20223: -- User 'wzhfy' has created a pull request for this issue:

[jira] [Created] (SPARK-20223) Typo in tpcds q77.sql

2017-04-05 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-20223: Summary: Typo in tpcds q77.sql Key: SPARK-20223 URL: https://issues.apache.org/jira/browse/SPARK-20223 Project: Spark Issue Type: Bug Components:

[jira] [Commented] (SPARK-20204) remove SimpleCatalystConf and CatalystConf type alias

2017-04-05 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20204?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15956520#comment-15956520 ] Apache Spark commented on SPARK-20204: -- User 'dilipbiswal' has created a pull request for this
