[jira] [Resolved] (SPARK-18368) Regular expression replace throws NullPointerException when serialized

2016-11-08 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-18368. - Resolution: Fixed Assignee: Ryan Blue Fix Version/s: 2.1.0

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-08 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15650025#comment-15650025 ] Reynold Xin commented on SPARK-18352: - Again, this has nothing to do with streaming. It should just

[jira] [Resolved] (SPARK-18333) Revert hacks in parquet and orc reader to support case insensitive resolution

2016-11-08 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18333?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-18333. - Resolution: Fixed Assignee: Eric Liang Fix Version/s: 2.1.0 > Revert hacks in

[jira] [Comment Edited] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-08 Thread Thomas Sebastian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649982#comment-15649982 ] Thomas Sebastian edited comment on SPARK-18352 at 11/9/16 6:49 AM: --- Hi

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-08 Thread Thomas Sebastian (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649982#comment-15649982 ] Thomas Sebastian commented on SPARK-18352: -- Hi Reynold, So, do you mean that stream API need not

[jira] [Commented] (SPARK-18350) Support session local timezone

2016-11-08 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649969#comment-15649969 ] Xiao Li commented on SPARK-18350: - Agree. Session-specific SQL conf can be used here. > Support session

[jira] [Commented] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-11-08 Thread yuhao yang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649953#comment-15649953 ] yuhao yang commented on SPARK-18374: Just to provide some history info for the issue:

[jira] [Updated] (SPARK-18372) .Hive-staging folders created from Spark hiveContext are not getting cleaned up

2016-11-08 Thread mingjie tang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mingjie tang updated SPARK-18372: - Attachment: _thumb_37664.png the staging directory fails to be removed when hive table in the

[jira] [Commented] (SPARK-18372) .Hive-staging folders created from Spark hiveContext are not getting cleaned up

2016-11-08 Thread mingjie tang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649935#comment-15649935 ] mingjie tang commented on SPARK-18372: -- This bug can be reproduced with the following code: val

[jira] [Commented] (SPARK-18282) Add model summaries for Python GMM and BisectingKMeans

2016-11-08 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649900#comment-15649900 ] zhengruifeng commented on SPARK-18282: -- This is a duplicate of SPARK-18240. But I prefer you to take

[jira] [Closed] (SPARK-18240) Add Summary of BiKMeans and GMM in pyspark

2016-11-08 Thread zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng closed SPARK-18240. Resolution: Duplicate > Add Summary of BiKMeans and GMM in pyspark >

[jira] [Commented] (SPARK-18371) Spark Streaming backpressure bug - generates a batch with large number of records

2016-11-08 Thread mapreduced (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649891#comment-15649891 ] mapreduced commented on SPARK-18371: I'll try to test it out hopefully soon. > Spark Streaming

[jira] [Commented] (SPARK-18350) Support session local timezone

2016-11-08 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649865#comment-15649865 ] Reynold Xin commented on SPARK-18350: - If it is session specific, I don't think we need an API. Just

[jira] [Comment Edited] (SPARK-18350) Support session local timezone

2016-11-08 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649858#comment-15649858 ] Xiao Li edited comment on SPARK-18350 at 11/9/16 5:26 AM: -- Below might be needed

[jira] [Commented] (SPARK-18350) Support session local timezone

2016-11-08 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18350?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649858#comment-15649858 ] Xiao Li commented on SPARK-18350: - Below might be needed if we want to support session timezone? - Add a

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-08 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649835#comment-15649835 ] Reynold Xin commented on SPARK-18352: - There is already a readStream.json. "Stream" here means not

[jira] [Commented] (SPARK-18377) warehouse path should be a static conf

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649809#comment-15649809 ] Apache Spark commented on SPARK-18377: -- User 'cloud-fan' has created a pull request for this issue:

[jira] [Assigned] (SPARK-18377) warehouse path should be a static conf

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18377: Assignee: Wenchen Fan (was: Apache Spark) > warehouse path should be a static conf >

[jira] [Assigned] (SPARK-18377) warehouse path should be a static conf

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18377: Assignee: Apache Spark (was: Wenchen Fan) > warehouse path should be a static conf >

[jira] [Commented] (SPARK-18352) Parse normal, multi-line JSON files (not just JSON Lines)

2016-11-08 Thread Jayadevan M (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18352?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649803#comment-15649803 ] Jayadevan M commented on SPARK-18352: - [~rxin] Are you looking for a new API like

[jira] [Created] (SPARK-18377) warehouse path should be a static conf

2016-11-08 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-18377: --- Summary: warehouse path should be a static conf Key: SPARK-18377 URL: https://issues.apache.org/jira/browse/SPARK-18377 Project: Spark Issue Type: Improvement

[jira] [Commented] (SPARK-17691) Add aggregate function to collect list with maximum number of elements

2016-11-08 Thread Assaf Mendelson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649792#comment-15649792 ] Assaf Mendelson commented on SPARK-17691: - I don't believe UDF (or rather UDAF in this case) is

[jira] [Assigned] (SPARK-18376) Skip subexpression elimination for conditional expressions

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18376: Assignee: Apache Spark > Skip subexpression elimination for conditional expressions >

[jira] [Assigned] (SPARK-18376) Skip subexpression elimination for conditional expressions

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18376?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18376: Assignee: (was: Apache Spark) > Skip subexpression elimination for conditional

[jira] [Commented] (SPARK-18376) Skip subexpression elimination for conditional expressions

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18376?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649779#comment-15649779 ] Apache Spark commented on SPARK-18376: -- User 'viirya' has created a pull request for this issue:

[jira] [Created] (SPARK-18376) Skip subexpression elimination for conditional expressions

2016-11-08 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-18376: --- Summary: Skip subexpression elimination for conditional expressions Key: SPARK-18376 URL: https://issues.apache.org/jira/browse/SPARK-18376 Project: Spark

[jira] [Commented] (SPARK-18191) Port RDD API to use commit protocol

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649674#comment-15649674 ] Apache Spark commented on SPARK-18191: -- User 'jiangxb1987' has created a pull request for this

[jira] [Commented] (SPARK-18371) Spark Streaming backpressure bug - generates a batch with large number of records

2016-11-08 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649638#comment-15649638 ] Cody Koeninger commented on SPARK-18371: Thanks for digging into this. The other thing I noticed

[jira] [Updated] (SPARK-18375) Upgrade netty to 4.0.42.Final

2016-11-08 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-18375: Summary: Upgrade netty to 4.0.42.Final (was: Upgrade netty to 4.0.42) > Upgrade netty to

[jira] [Updated] (SPARK-18375) Upgrade netty to 4.0.42

2016-11-08 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-18375: Summary: Upgrade netty to 4.0.42 (was: Upgrade netty to 4.042) > Upgrade netty to 4.0.42 >

[jira] [Updated] (SPARK-18375) Upgrade netty to 4.042

2016-11-08 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-18375: Description: One of the important changes for 4.0.42.Final is "Support any FileRegion

[jira] [Updated] (SPARK-18375) Upgrade netty to 4.042

2016-11-08 Thread Guoqiang Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18375?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Guoqiang Li updated SPARK-18375: Affects Version/s: 1.6.2 > Upgrade netty to 4.042 > -- > >

[jira] [Created] (SPARK-18374) Incorrect words in StopWords/english.txt

2016-11-08 Thread nirav patel (JIRA)
nirav patel created SPARK-18374: --- Summary: Incorrect words in StopWords/english.txt Key: SPARK-18374 URL: https://issues.apache.org/jira/browse/SPARK-18374 Project: Spark Issue Type: Bug

[jira] [Created] (SPARK-18375) Upgrade netty to 4.042

2016-11-08 Thread Guoqiang Li (JIRA)
Guoqiang Li created SPARK-18375: --- Summary: Upgrade netty to 4.042 Key: SPARK-18375 URL: https://issues.apache.org/jira/browse/SPARK-18375 Project: Spark Issue Type: Bug Components:

[jira] [Commented] (SPARK-18364) expose metrics for YarnShuffleService

2016-11-08 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649506#comment-15649506 ] Steven Rand commented on SPARK-18364: - The solution I'd initially had in mind, simply creating a

[jira] [Commented] (SPARK-13534) Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649436#comment-15649436 ] Apache Spark commented on SPARK-13534: -- User 'BryanCutler' has created a pull request for this

[jira] [Assigned] (SPARK-13534) Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13534: Assignee: (was: Apache Spark) > Implement Apache Arrow serializer for Spark DataFrame

[jira] [Assigned] (SPARK-13534) Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-13534: Assignee: Apache Spark > Implement Apache Arrow serializer for Spark DataFrame for use in

[jira] [Assigned] (SPARK-18373) Make KafkaSource's failOnDataLoss=false work with Spark jobs

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18373: Assignee: Apache Spark (was: Shixiong Zhu) > Make KafkaSource's failOnDataLoss=false

[jira] [Assigned] (SPARK-18373) Make KafkaSource's failOnDataLoss=false work with Spark jobs

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18373?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18373: Assignee: Shixiong Zhu (was: Apache Spark) > Make KafkaSource's failOnDataLoss=false

[jira] [Commented] (SPARK-18373) Make KafkaSource's failOnDataLoss=false work with Spark jobs

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18373?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649430#comment-15649430 ] Apache Spark commented on SPARK-18373: -- User 'zsxwing' has created a pull request for this issue:

[jira] [Created] (SPARK-18373) Make KafkaSource's failOnDataLoss=false work with Spark jobs

2016-11-08 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-18373: Summary: Make KafkaSource's failOnDataLoss=false work with Spark jobs Key: SPARK-18373 URL: https://issues.apache.org/jira/browse/SPARK-18373 Project: Spark

[jira] [Commented] (SPARK-18371) Spark Streaming backpressure bug - generates a batch with large number of records

2016-11-08 Thread mapreduced (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649409#comment-15649409 ] mapreduced commented on SPARK-18371: I worked the math for

[jira] [Commented] (SPARK-13534) Implement Apache Arrow serializer for Spark DataFrame for use in DataFrame.toPandas

2016-11-08 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649405#comment-15649405 ] Bryan Cutler commented on SPARK-13534: -- I've been working on this with [~xusen]. We have a very

[jira] [Commented] (SPARK-18372) .Hive-staging folders created from Spark hiveContext are not getting cleaned up

2016-11-08 Thread mingjie tang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649397#comment-15649397 ] mingjie tang commented on SPARK-18372: -- Solution: This bug is reported by customers. The reason is

[jira] [Updated] (SPARK-18372) .Hive-staging folders created from Spark hiveContext are not getting cleaned up

2016-11-08 Thread mingjie tang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mingjie tang updated SPARK-18372: - Description: Steps to reproduce: 1. Launch spark-shell 2. Run the following

[jira] [Commented] (SPARK-18372) .Hive-staging folders created from Spark hiveContext are not getting cleaned up

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649387#comment-15649387 ] Apache Spark commented on SPARK-18372: -- User 'merlintang' has created a pull request for this issue:

[jira] [Assigned] (SPARK-18372) .Hive-staging folders created from Spark hiveContext are not getting cleaned up

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18372: Assignee: (was: Apache Spark) > .Hive-staging folders created from Spark hiveContext

[jira] [Assigned] (SPARK-18372) .Hive-staging folders created from Spark hiveContext are not getting cleaned up

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18372?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18372: Assignee: Apache Spark > .Hive-staging folders created from Spark hiveContext are not

[jira] [Commented] (SPARK-18372) .Hive-staging folders created from Spark hiveContext are not getting cleaned up

2016-11-08 Thread mingjie tang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18372?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649386#comment-15649386 ] mingjie tang commented on SPARK-18372: -- the PR is https://github.com/apache/spark/pull/15819 >

[jira] [Created] (SPARK-18372) .Hive-staging folders created from Spark hiveContext are not getting cleaned up

2016-11-08 Thread mingjie tang (JIRA)
mingjie tang created SPARK-18372: Summary: .Hive-staging folders created from Spark hiveContext are not getting cleaned up Key: SPARK-18372 URL: https://issues.apache.org/jira/browse/SPARK-18372

[jira] [Commented] (SPARK-18359) No option in read csv for other decimal delimiter than dot

2016-11-08 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18359?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649352#comment-15649352 ] Hyukjin Kwon commented on SPARK-18359: -- I guess that should be locale specific. I guess we recently

[jira] [Commented] (SPARK-14914) Test Cases fail on Windows

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14914?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649344#comment-15649344 ] Apache Spark commented on SPARK-14914: -- User 'wangmiao1981' has created a pull request for this

[jira] [Commented] (SPARK-18369) Deprecate runs in Pyspark mllib KMeans

2016-11-08 Thread Sandeep Singh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18369?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649345#comment-15649345 ] Sandeep Singh commented on SPARK-18369: --- I think it's already deprecated

[jira] [Updated] (SPARK-18371) Spark Streaming backpressure bug - generates a batch with large number of records

2016-11-08 Thread mapreduced (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mapreduced updated SPARK-18371: --- Attachment: GiantBatch3.png > Spark Streaming backpressure bug - generates a batch with large number

[jira] [Updated] (SPARK-18371) Spark Streaming backpressure bug - generates a batch with large number of records

2016-11-08 Thread mapreduced (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mapreduced updated SPARK-18371: --- Attachment: Giant_batch_at_23_00.png > Spark Streaming backpressure bug - generates a batch with

[jira] [Updated] (SPARK-18371) Spark Streaming backpressure bug - generates a batch with large number of records

2016-11-08 Thread mapreduced (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mapreduced updated SPARK-18371: --- Attachment: Look_at_batch_at_22_14.png > Spark Streaming backpressure bug - generates a batch with

[jira] [Updated] (SPARK-18371) Spark Streaming backpressure bug - generates a batch with large number of records

2016-11-08 Thread mapreduced (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mapreduced updated SPARK-18371: --- Attachment: GiantBatch2.png > Spark Streaming backpressure bug - generates a batch with large number

[jira] [Created] (SPARK-18371) Spark Streaming backpressure bug - generates a batch with large number of records

2016-11-08 Thread mapreduced (JIRA)
mapreduced created SPARK-18371: -- Summary: Spark Streaming backpressure bug - generates a batch with large number of records Key: SPARK-18371 URL: https://issues.apache.org/jira/browse/SPARK-18371

[jira] [Assigned] (SPARK-18366) Add handleInvalid to Pyspark for QuantileDiscretizer and Bucketizer

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18366: Assignee: (was: Apache Spark) > Add handleInvalid to Pyspark for QuantileDiscretizer

[jira] [Commented] (SPARK-18366) Add handleInvalid to Pyspark for QuantileDiscretizer and Bucketizer

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649319#comment-15649319 ] Apache Spark commented on SPARK-18366: -- User 'techaddict' has created a pull request for this issue:

[jira] [Assigned] (SPARK-18366) Add handleInvalid to Pyspark for QuantileDiscretizer and Bucketizer

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18366: Assignee: Apache Spark > Add handleInvalid to Pyspark for QuantileDiscretizer and

[jira] [Created] (SPARK-18370) InsertIntoHadoopFsRelationCommand should keep track of its table

2016-11-08 Thread Herman van Hovell (JIRA)
Herman van Hovell created SPARK-18370: - Summary: InsertIntoHadoopFsRelationCommand should keep track of its table Key: SPARK-18370 URL: https://issues.apache.org/jira/browse/SPARK-18370 Project:

[jira] [Resolved] (SPARK-18239) Gradient Boosted Tree wrapper in SparkR

2016-11-08 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-18239. -- Resolution: Fixed Fix Version/s: 2.2.0 2.1.0 Target

[jira] [Commented] (SPARK-18131) Support returning Vector/Dense Vector from backend

2016-11-08 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649200#comment-15649200 ] Felix Cheung commented on SPARK-18131: -- We discussed this as a part of the GBT PR, from here

[jira] [Comment Edited] (SPARK-18320) ML 2.1 QA: API: Python API coverage

2016-11-08 Thread Seth Hendrickson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649171#comment-15649171 ] Seth Hendrickson edited comment on SPARK-18320 at 11/8/16 11:23 PM: I

[jira] [Commented] (SPARK-18320) ML 2.1 QA: API: Python API coverage

2016-11-08 Thread Seth Hendrickson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649171#comment-15649171 ] Seth Hendrickson commented on SPARK-18320: -- I scanned through the @Since("2.1.0") tags in

[jira] [Created] (SPARK-18369) Deprecate runs in Pyspark mllib KMeans

2016-11-08 Thread Seth Hendrickson (JIRA)
Seth Hendrickson created SPARK-18369: Summary: Deprecate runs in Pyspark mllib KMeans Key: SPARK-18369 URL: https://issues.apache.org/jira/browse/SPARK-18369 Project: Spark Issue Type:

[jira] [Commented] (SPARK-18367) limit() makes the lame walk again

2016-11-08 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649132#comment-15649132 ] Nicholas Chammas commented on SPARK-18367: -- On 2.0.x the caching is required due to SPARK-18254,

[jira] [Resolved] (SPARK-18342) HDFSBackedStateStore can fail to rename files causing snapshotting and recovery to fail

2016-11-08 Thread Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18342?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-18342. --- Resolution: Fixed Fix Version/s: 2.1.0 2.0.2 Issue resolved by

[jira] [Commented] (SPARK-18367) limit() makes the lame walk again

2016-11-08 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649110#comment-15649110 ] Herman van Hovell commented on SPARK-18367: --- Could you try this without caching? > limit()

[jira] [Assigned] (SPARK-18368) Regular expression replace throws NullPointerException when serialized

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18368: Assignee: (was: Apache Spark) > Regular expression replace throws

[jira] [Assigned] (SPARK-18368) Regular expression replace throws NullPointerException when serialized

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18368: Assignee: Apache Spark > Regular expression replace throws NullPointerException when

[jira] [Commented] (SPARK-18368) Regular expression replace throws NullPointerException when serialized

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649104#comment-15649104 ] Apache Spark commented on SPARK-18368: -- User 'rdblue' has created a pull request for this issue:

[jira] [Updated] (SPARK-18368) Regular expression replace throws NullPointerException when serialized

2016-11-08 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue updated SPARK-18368: -- Description: This query fails with a [NullPointerException on line

[jira] [Commented] (SPARK-14540) Support Scala 2.12 closures and Java 8 lambdas in ClosureCleaner

2016-11-08 Thread Taro L. Saito (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649090#comment-15649090 ] Taro L. Saito commented on SPARK-14540: --- I'm also hitting a similar problem in my dependency

[jira] [Created] (SPARK-18368) Regular expression replace throws NullPointerException when serialized

2016-11-08 Thread Ryan Blue (JIRA)
Ryan Blue created SPARK-18368: - Summary: Regular expression replace throws NullPointerException when serialized Key: SPARK-18368 URL: https://issues.apache.org/jira/browse/SPARK-18368 Project: Spark

[jira] [Commented] (SPARK-18367) limit() makes the lame walk again

2016-11-08 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649063#comment-15649063 ] Nicholas Chammas commented on SPARK-18367: -- I'm not trying to write any files actually. In this

[jira] [Commented] (SPARK-17916) CSV data source treats empty string as null no matter what nullValue option is

2016-11-08 Thread Eric Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649039#comment-15649039 ] Eric Liang commented on SPARK-17916: We're hitting this as a regression from 2.0 as well. Ideally,

[jira] [Commented] (SPARK-18367) limit() makes the lame walk again

2016-11-08 Thread Herman van Hovell (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649022#comment-15649022 ] Herman van Hovell commented on SPARK-18367: --- You might be trying to write a lot of partitions

[jira] [Commented] (SPARK-18226) SparkR displaying vector columns in incorrect way

2016-11-08 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15649026#comment-15649026 ] Felix Cheung commented on SPARK-18226: -- We discussed this as a part of the GBT PR, from here

[jira] [Updated] (SPARK-18367) limit() makes the lame walk again

2016-11-08 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18367: - Description: I have a complex DataFrame query that fails to run normally but succeeds if

[jira] [Updated] (SPARK-18367) limit() makes the lame walk again

2016-11-08 Thread Nicholas Chammas (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas Chammas updated SPARK-18367: - Attachment: plan-without-limit.txt plan-with-limit.txt > limit() makes

[jira] [Created] (SPARK-18367) limit() makes the lame walk again

2016-11-08 Thread Nicholas Chammas (JIRA)
Nicholas Chammas created SPARK-18367: Summary: limit() makes the lame walk again Key: SPARK-18367 URL: https://issues.apache.org/jira/browse/SPARK-18367 Project: Spark Issue Type: Bug

[jira] [Updated] (SPARK-18366) Add handleInvalid to Pyspark for QuantileDiscretizer and Bucketizer

2016-11-08 Thread Seth Hendrickson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Seth Hendrickson updated SPARK-18366: - Component/s: PySpark ML > Add handleInvalid to Pyspark for

[jira] [Created] (SPARK-18366) Add handleInvalid to Pyspark for QuantileDiscretizer and Bucketizer

2016-11-08 Thread Seth Hendrickson (JIRA)
Seth Hendrickson created SPARK-18366: Summary: Add handleInvalid to Pyspark for QuantileDiscretizer and Bucketizer Key: SPARK-18366 URL: https://issues.apache.org/jira/browse/SPARK-18366 Project:

[jira] [Updated] (SPARK-18339) Don't push down current_timestamp for filters in StructuredStreaming

2016-11-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-18339: - Labels: (was: correctness) > Don't push down current_timestamp for filters in

[jira] [Commented] (SPARK-18336) SQL started to fail with OOM and etc. after move from 1.6.2 to 2.0.2

2016-11-08 Thread Egor Pahomov (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15648912#comment-15648912 ] Egor Pahomov commented on SPARK-18336: -- [~srowen], I've read everything in documentation about new

[jira] [Commented] (SPARK-18343) FileSystem$Statistics$StatisticsDataReferenceCleaner hangs on s3 write

2016-11-08 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15648900#comment-15648900 ] Luke Miner commented on SPARK-18343: I ran jstack on an executor and on the driver and have attached

[jira] [Commented] (SPARK-17691) Add aggregate function to collect list with maximum number of elements

2016-11-08 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1564#comment-1564 ] Michael Armbrust commented on SPARK-17691: -- +1 > Add aggregate function to collect list with

[jira] [Updated] (SPARK-18343) FileSystem$Statistics$StatisticsDataReferenceCleaner hangs on s3 write

2016-11-08 Thread Luke Miner (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke Miner updated SPARK-18343: --- Description: I have a driver program where I read data in from Cassandra using spark, perform

[jira] [Updated] (SPARK-18365) Improve Documentation for Sample Method

2016-11-08 Thread Bill Chambers (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bill Chambers updated SPARK-18365: -- Summary: Improve Documentation for Sample Method (was: Documentation for Sampling is

[jira] [Assigned] (SPARK-18365) Documentation for Sampling is Incorrect

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18365: Assignee: Apache Spark > Documentation for Sampling is Incorrect >

[jira] [Assigned] (SPARK-18365) Documentation for Sampling is Incorrect

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-18365: Assignee: (was: Apache Spark) > Documentation for Sampling is Incorrect >

[jira] [Commented] (SPARK-18365) Documentation for Sampling is Incorrect

2016-11-08 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15648845#comment-15648845 ] Apache Spark commented on SPARK-18365: -- User 'anabranch' has created a pull request for this issue:

[jira] [Created] (SPARK-18365) Documentation for Sampling is Incorrect

2016-11-08 Thread Bill Chambers (JIRA)
Bill Chambers created SPARK-18365: - Summary: Documentation for Sampling is Incorrect Key: SPARK-18365 URL: https://issues.apache.org/jira/browse/SPARK-18365 Project: Spark Issue Type: Bug

[jira] [Resolved] (SPARK-18280) Potential deadlock in `StandaloneSchedulerBackend.dead`

2016-11-08 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-18280. -- Resolution: Fixed Assignee: Shixiong Zhu Fix Version/s: 2.1.0

[jira] [Commented] (SPARK-18364) expose metrics for YarnShuffleService

2016-11-08 Thread Steven Rand (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15648785#comment-15648785 ] Steven Rand commented on SPARK-18364: - I can implement this if people think it makes sense. > expose

[jira] [Created] (SPARK-18364) expose metrics for YarnShuffleService

2016-11-08 Thread Steven Rand (JIRA)
Steven Rand created SPARK-18364: --- Summary: expose metrics for YarnShuffleService Key: SPARK-18364 URL: https://issues.apache.org/jira/browse/SPARK-18364 Project: Spark Issue Type: Improvement

[jira] [Resolved] (SPARK-16215) Reduce runtime overhead of a program that writes an primitive array in Dataframe/Dataset

2016-11-08 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-16215. --- Resolution: Duplicate > Reduce runtime overhead of a program that writes an primitive array in >
