[jira] [Created] (SPARK-21149) Add job description API for R

2017-06-19 Thread Felix Cheung (JIRA)
Felix Cheung created SPARK-21149: Summary: Add job description API for R Key: SPARK-21149 URL: https://issues.apache.org/jira/browse/SPARK-21149 Project: Spark Issue Type: Improvement

[jira] [Assigned] (SPARK-21148) Set SparkUncaughtExceptionHandler to the Master

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21148: Assignee: Apache Spark > Set SparkUncaughtExceptionHandler to the Master >

[jira] [Assigned] (SPARK-21148) Set SparkUncaughtExceptionHandler to the Master

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21148?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21148: Assignee: (was: Apache Spark) > Set SparkUncaughtExceptionHandler to the Master >

[jira] [Commented] (SPARK-21148) Set SparkUncaughtExceptionHandler to the Master

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21148?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055070#comment-16055070 ] Apache Spark commented on SPARK-21148: -- User 'devaraj-kavali' has created a pull request for this

[jira] [Resolved] (SPARK-20889) SparkR grouped documentation for Column methods

2017-06-19 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung resolved SPARK-20889. -- Resolution: Fixed Assignee: Wayne Zhang Fix Version/s: 2.3.0

[jira] [Created] (SPARK-21148) Set SparkUncaughtExceptionHandler to the Master

2017-06-19 Thread Devaraj K (JIRA)
Devaraj K created SPARK-21148: - Summary: Set SparkUncaughtExceptionHandler to the Master Key: SPARK-21148 URL: https://issues.apache.org/jira/browse/SPARK-21148 Project: Spark Issue Type:

[jira] [Commented] (SPARK-21144) Unexpected results when the data schema and partition schema have the duplicate columns

2017-06-19 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055049#comment-16055049 ] Takeshi Yamamuro commented on SPARK-21144: -- okay, I'm currently looking into this. > Unexpected

[jira] [Created] (SPARK-21147) the schema of socket source can not be set.

2017-06-19 Thread Fei Shao (JIRA)
Fei Shao created SPARK-21147: Summary: the schema of socket source can not be set. Key: SPARK-21147 URL: https://issues.apache.org/jira/browse/SPARK-21147 Project: Spark Issue Type: Bug

[jira] [Assigned] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE

2017-06-19 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-21133: --- Assignee: Yuming Wang > HighlyCompressedMapStatus#writeExternal throws NPE >

[jira] [Resolved] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE

2017-06-19 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-21133. - Resolution: Fixed Fix Version/s: 2.2.0 Issue resolved by pull request 18343

[jira] [Commented] (SPARK-21146) Worker should handle and shutdown when any thread gets UncaughtException

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055024#comment-16055024 ] Apache Spark commented on SPARK-21146: -- User 'devaraj-kavali' has created a pull request for this

[jira] [Assigned] (SPARK-21146) Worker should handle and shutdown when any thread gets UncaughtException

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21146: Assignee: Apache Spark > Worker should handle and shutdown when any thread gets

[jira] [Assigned] (SPARK-21146) Worker should handle and shutdown when any thread gets UncaughtException

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21146: Assignee: (was: Apache Spark) > Worker should handle and shutdown when any thread

[jira] [Assigned] (SPARK-21144) Unexpected results when the data schema and partition schema have the duplicate columns

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21144: Assignee: Apache Spark > Unexpected results when the data schema and partition schema

[jira] [Commented] (SPARK-21144) Unexpected results when the data schema and partition schema have the duplicate columns

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16055015#comment-16055015 ] Apache Spark commented on SPARK-21144: -- User 'maropu' has created a pull request for this issue:

[jira] [Assigned] (SPARK-21144) Unexpected results when the data schema and partition schema have the duplicate columns

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21144: Assignee: (was: Apache Spark) > Unexpected results when the data schema and partition

[jira] [Created] (SPARK-21146) Worker should handle and shutdown when any thread gets UncaughtException

2017-06-19 Thread Devaraj K (JIRA)
Devaraj K created SPARK-21146: - Summary: Worker should handle and shutdown when any thread gets UncaughtException Key: SPARK-21146 URL: https://issues.apache.org/jira/browse/SPARK-21146 Project: Spark

[jira] [Commented] (SPARK-18191) Port RDD API to use commit protocol

2017-06-19 Thread Shridhar Ramachandran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054978#comment-16054978 ] Shridhar Ramachandran commented on SPARK-18191: --- I see this got committed only in 2.2.0 and

[jira] [Comment Edited] (SPARK-18191) Port RDD API to use commit protocol

2017-06-19 Thread Shridhar Ramachandran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054978#comment-16054978 ] Shridhar Ramachandran edited comment on SPARK-18191 at 6/20/17 12:11 AM:

[jira] [Resolved] (SPARK-8642) Ungraceful failure when yarn client is not configured.

2017-06-19 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-8642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-8642. --- Resolution: Won't Fix Even though a better error here would be nice, I'll close this because

[jira] [Assigned] (SPARK-21145) Restarted queries reuse same StateStoreProvider, causing multiple concurrent tasks to update same StateStore

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21145: Assignee: Apache Spark (was: Tathagata Das) > Restarted queries reuse same

[jira] [Commented] (SPARK-21145) Restarted queries reuse same StateStoreProvider, causing multiple concurrent tasks to update same StateStore

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054945#comment-16054945 ] Apache Spark commented on SPARK-21145: -- User 'tdas' has created a pull request for this issue:

[jira] [Assigned] (SPARK-21145) Restarted queries reuse same StateStoreProvider, causing multiple concurrent tasks to update same StateStore

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21145: Assignee: Tathagata Das (was: Apache Spark) > Restarted queries reuse same

[jira] [Created] (SPARK-21145) Restarted queries reuse same StateStoreProvider, causing multiple concurrent tasks to update same StateStore

2017-06-19 Thread Tathagata Das (JIRA)
Tathagata Das created SPARK-21145: - Summary: Restarted queries reuse same StateStoreProvider, causing multiple concurrent tasks to update same StateStore Key: SPARK-21145 URL:

[jira] [Resolved] (SPARK-21138) Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different

2017-06-19 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-21138. Resolution: Fixed Assignee: sharkd tu Fix Version/s: 2.3.0

[jira] [Resolved] (SPARK-21124) Wrong user shown in UI when using kerberos

2017-06-19 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-21124. Resolution: Fixed Assignee: Marcelo Vanzin Fix Version/s: 2.3.0 > Wrong

[jira] [Updated] (SPARK-21144) Unexpected results when the data schema and partition schema have the duplicate columns

2017-06-19 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li updated SPARK-21144: Target Version/s: 2.2.0 > Unexpected results when the data schema and partition schema have the >

[jira] [Commented] (SPARK-21144) Unexpected results when the data schema and partition schema have the duplicate columns

2017-06-19 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21144?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054788#comment-16054788 ] Xiao Li commented on SPARK-21144: - cc [~maropu] > Unexpected results when the data schema and partition

[jira] [Commented] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054785#comment-16054785 ] Apache Spark commented on SPARK-18016: -- User 'bdrillard' has created a pull request for this issue:

[jira] [Created] (SPARK-21144) Unexpected results when the data schema and partition schema have the duplicate columns

2017-06-19 Thread Xiao Li (JIRA)
Xiao Li created SPARK-21144: --- Summary: Unexpected results when the data schema and partition schema have the duplicate columns Key: SPARK-21144 URL: https://issues.apache.org/jira/browse/SPARK-21144

[jira] [Commented] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset

2017-06-19 Thread Aleksander Eskilson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054784#comment-16054784 ] Aleksander Eskilson commented on SPARK-18016: - [~cloud_fan], [~divshukla], I've created a PR

[jira] [Commented] (SPARK-21143) Fail to fetch blocks >1MB in size in presence of conflicting Netty version

2017-06-19 Thread Ryan Williams (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054707#comment-16054707 ] Ryan Williams commented on SPARK-21143: --- [~zsxwing] bq. it's too risky to upgrade from 4.0.X to

[jira] [Commented] (SPARK-11170) EOFException on History server reading in progress lz4

2017-06-19 Thread remoteServer (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-11170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054701#comment-16054701 ] remoteServer commented on SPARK-11170: -- Do we have steps to reproduce the issue? I have enabled

[jira] [Commented] (SPARK-20928) Continuous Processing Mode for Structured Streaming

2017-06-19 Thread Cody Koeninger (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054690#comment-16054690 ] Cody Koeninger commented on SPARK-20928: Cool, can you label it SPIP so it shows up linked from

[jira] [Commented] (SPARK-21143) Fail to fetch blocks >1MB in size in presence of conflicting Netty version

2017-06-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054631#comment-16054631 ] Sean Owen commented on SPARK-21143: --- If this reduces to a 4.0 vs 4.1 conflict, then this is SPARK-19552

[jira] [Assigned] (SPARK-21142) spark-streaming-kafka-0-10 has too fat dependency on kafka

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21142: Assignee: Apache Spark > spark-streaming-kafka-0-10 has too fat dependency on kafka >

[jira] [Commented] (SPARK-21142) spark-streaming-kafka-0-10 has too fat dependency on kafka

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054612#comment-16054612 ] Apache Spark commented on SPARK-21142: -- User 'timvw' has created a pull request for this issue:

[jira] [Assigned] (SPARK-21142) spark-streaming-kafka-0-10 has too fat dependency on kafka

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21142: Assignee: (was: Apache Spark) > spark-streaming-kafka-0-10 has too fat dependency on

[jira] [Updated] (SPARK-21133) HighlyCompressedMapStatus#writeExternal throws NPE

2017-06-19 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Michael Armbrust updated SPARK-21133: - Target Version/s: 2.2.0 Priority: Blocker (was: Major) Description:

[jira] [Commented] (SPARK-21102) Refresh command is too aggressive in parsing

2017-06-19 Thread Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054600#comment-16054600 ] Reynold Xin commented on SPARK-21102: - Can you submit a pull request so we can discuss the details of

[jira] [Commented] (SPARK-20928) Continuous Processing Mode for Structured Streaming

2017-06-19 Thread Michael Armbrust (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20928?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054596#comment-16054596 ] Michael Armbrust commented on SPARK-20928: -- Hi Cody, I do plan to flesh this out with the other

[jira] [Commented] (SPARK-21143) Fail to fetch blocks >1MB in size in presence of conflicting Netty version

2017-06-19 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054593#comment-16054593 ] Shixiong Zhu commented on SPARK-21143: -- The reason you cannot use 4.0.42.Final is because you are

[jira] [Resolved] (SPARK-19975) Add map_keys and map_values functions to Python

2017-06-19 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-19975. - Resolution: Fixed Assignee: Yong Tang Fix Version/s: 2.3.0 > Add map_keys and map_values

[jira] [Commented] (SPARK-21143) Fail to fetch blocks >1MB in size in presence of conflicting Netty version

2017-06-19 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054592#comment-16054592 ] Shixiong Zhu commented on SPARK-21143: -- As Netty is so core to Spark, it's too risky to upgrade from

[jira] [Commented] (SPARK-21102) Refresh command is too aggressive in parsing

2017-06-19 Thread Anton Okolnychyi (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054591#comment-16054591 ] Anton Okolnychyi commented on SPARK-21102: -- Hi [~rxin], I took a look at this issue and have a

[jira] [Commented] (SPARK-12414) Remove closure serializer

2017-06-19 Thread Ritesh Tijoriwala (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-12414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054529#comment-16054529 ] Ritesh Tijoriwala commented on SPARK-12414: --- I have a similar situation. I have several classes

[jira] [Updated] (SPARK-21142) spark-streaming-kafka-0-10 has too fat dependency on kafka

2017-06-19 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-21142: - Component/s: (was: Structured Streaming) DStreams >

[jira] [Resolved] (SPARK-21123) Options for file stream source are in a wrong table

2017-06-19 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21123. -- Resolution: Fixed Fix Version/s: 2.3.0 2.2.0 > Options for file

[jira] [Updated] (SPARK-16430) Add an option in file stream source to read 1 file at a time

2017-06-19 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-16430: - Fix Version/s: (was: 2.1.0) 2.0.0 > Add an option in file stream source

[jira] [Updated] (SPARK-16430) Add an option in file stream source to read 1 file at a time

2017-06-19 Thread Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu updated SPARK-16430: - Fix Version/s: 2.1.0 > Add an option in file stream source to read 1 file at a time >

[jira] [Resolved] (SPARK-19688) Spark on Yarn Credentials File set to different application directory

2017-06-19 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-19688. Resolution: Fixed Fix Version/s: 2.3.0 2.2.1

[jira] [Created] (SPARK-21143) Fail to fetch blocks >1MB in size in presence of conflicting Netty version

2017-06-19 Thread Ryan Williams (JIRA)
Ryan Williams created SPARK-21143: - Summary: Fail to fetch blocks >1MB in size in presence of conflicting Netty version Key: SPARK-21143 URL: https://issues.apache.org/jira/browse/SPARK-21143

[jira] [Comment Edited] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054176#comment-16054176 ] sam edited comment on SPARK-21137 at 6/19/17 3:20 PM: -- [~srowen] Ah OK, sorry, not

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054176#comment-16054176 ] sam commented on SPARK-21137: - [~srowen] Ah OK, sorry, not used to that process. On other projects I've seen

[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2017-06-19 Thread Michael Schmeißer (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054162#comment-16054162 ] Michael Schmeißer commented on SPARK-650: - [~riteshtijoriwala] - Sorry, but I am not familiar with

[jira] [Commented] (SPARK-21080) Workaround for HDFS delegation token expiry broken with some Hadoop versions

2017-06-19 Thread Lukasz Raszka (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21080?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054155#comment-16054155 ] Lukasz Raszka commented on SPARK-21080: --- [~jerryshao] Yes, it's in HA mode. Updating to newer HDFS

[jira] [Resolved] (SPARK-17176) Task are sorted by "Index" in Stage Page.

2017-06-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17176?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-17176. --- Resolution: Won't Fix > Task are sorted by "Index" in Stage Page. >

[jira] [Created] (SPARK-21142) spark-streaming-kafka-0-10 has too fat dependency on kafka

2017-06-19 Thread Tim Van Wassenhove (JIRA)
Tim Van Wassenhove created SPARK-21142: -- Summary: spark-streaming-kafka-0-10 has too fat dependency on kafka Key: SPARK-21142 URL: https://issues.apache.org/jira/browse/SPARK-21142 Project: Spark

[jira] [Commented] (SPARK-21142) spark-streaming-kafka-0-10 has too fat dependency on kafka

2017-06-19 Thread Tim Van Wassenhove (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054130#comment-16054130 ] Tim Van Wassenhove commented on SPARK-21142: Opened a PR on github:

[jira] [Commented] (SPARK-18016) Code Generation: Constant Pool Past Limit for Wide/Nested Dataset

2017-06-19 Thread Aleksander Eskilson (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054118#comment-16054118 ] Aleksander Eskilson commented on SPARK-18016: - [~cloud_fan], [~divshukla], yeah, I'd be happy

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054115#comment-16054115 ] Sean Owen commented on SPARK-21137: --- Try a thread dump on the driver. Until there's some more detail

[jira] [Comment Edited] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054111#comment-16054111 ] sam edited comment on SPARK-21137 at 6/19/17 2:36 PM: -- [~srowen] > what stages are

[jira] [Comment Edited] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054111#comment-16054111 ] sam edited comment on SPARK-21137 at 6/19/17 2:35 PM: -- [~srowen] > what stages are

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054111#comment-16054111 ] sam commented on SPARK-21137: - [~srowen] > what stages are executing if any? *None, no tasks are started*.

[jira] [Commented] (SPARK-19809) NullPointerException on empty ORC file

2017-06-19 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054108#comment-16054108 ] Dongjoon Hyun commented on SPARK-19809: --- Yep. I'm trying to fix this with new ORC data source. It

[jira] [Commented] (SPARK-19809) NullPointerException on empty ORC file

2017-06-19 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054076#comment-16054076 ] Hyukjin Kwon commented on SPARK-19809: -- What you see is what you get. This is "Reopened" per the

[jira] [Resolved] (SPARK-21141) spark-update --version is hard to parse

2017-06-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21141. --- Resolution: Not A Problem [~mprocop] please don't reopen JIRAs. We can reopen if needed. As I say, I

[jira] [Commented] (SPARK-21140) Reduce collect high memory requrements

2017-06-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054055#comment-16054055 ] Sean Owen commented on SPARK-21140: --- Yes, it's possible the executor makes a copy of some data during

[jira] [Reopened] (SPARK-21141) spark-update --version is hard to parse

2017-06-19 Thread michael procopio (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael procopio reopened SPARK-21141: -- My apologies, I mean spark-submit --version. > spark-update --version is hard to parse >

[jira] [Reopened] (SPARK-21140) Reduce collect high memory requrements

2017-06-19 Thread michael procopio (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] michael procopio reopened SPARK-21140: -- I am not sure what detail you are looking for. I provided the test code I was using.

[jira] [Comment Edited] (SPARK-21140) Reduce collect high memory requrements

2017-06-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054043#comment-16054043 ] Sean Owen edited comment on SPARK-21140 at 6/19/17 2:02 PM: I disagree

[jira] [Commented] (SPARK-21140) Reduce collect high memory requrements

2017-06-19 Thread michael procopio (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21140?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054043#comment-16054043 ] michael procopio commented on SPARK-21140: -- I disagree executor memory does depend on the size

[jira] [Updated] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-21137: Description: A very common use case in big data is to read a large number of small files. For example the Enron

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054042#comment-16054042 ] Sean Owen commented on SPARK-21137: --- Are you sure it's not just appearing to be stuck reading the file

[jira] [Comment Edited] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054026#comment-16054026 ] sam edited comment on SPARK-21137 at 6/19/17 1:53 PM: -- [~srowen] As I said in the

[jira] [Commented] (SPARK-19809) NullPointerException on empty ORC file

2017-06-19 Thread Renu Yadav (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054031#comment-16054031 ] Renu Yadav commented on SPARK-19809: What is the resolution of this issue.

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054026#comment-16054026 ] sam commented on SPARK-21137: - [~srowen] As I said in the description, which you may have missed, the logs

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16054004#comment-16054004 ] Sean Owen commented on SPARK-21137: --- As i say, you're not setting anything about the partitioning here.

[jira] [Resolved] (SPARK-21141) spark-update --version is hard to parse

2017-06-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21141. --- Resolution: Not A Problem There is no spark-update. It is not intended as an API to determine the

[jira] [Resolved] (SPARK-21140) Reduce collect high memory requrements

2017-06-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21140. --- Resolution: Invalid There's no real detail here. Executor memory doesn't directly matter to how

[jira] [Created] (SPARK-21141) spark-update --version is hard to parse

2017-06-19 Thread michael procopio (JIRA)
michael procopio created SPARK-21141: Summary: spark-update --version is hard to parse Key: SPARK-21141 URL: https://issues.apache.org/jira/browse/SPARK-21141 Project: Spark Issue Type:

[jira] [Created] (SPARK-21140) Reduce collect high memory requrements

2017-06-19 Thread michael procopio (JIRA)
michael procopio created SPARK-21140: Summary: Reduce collect high memory requrements Key: SPARK-21140 URL: https://issues.apache.org/jira/browse/SPARK-21140 Project: Spark Issue Type:

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053977#comment-16053977 ] sam commented on SPARK-21137: - [~srowen] So I've provided full reproduce steps here (including code and

[jira] [Updated] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-21137: Description: A very common use case in big data is to read a large number of small files. For example the Enron

[jira] [Resolved] (SPARK-20931) Built-in SQL Function ABS support string type

2017-06-19 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20931?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang resolved SPARK-20931. - Resolution: Fixed Fix Version/s: 2.3.0 > Built-in SQL Function ABS support string type >

[jira] [Commented] (SPARK-21139) java.util.concurrent.RejectedExecutionException: rejected from java.util.concurrent.ThreadPoolExecutor@46477dd0[Terminated, pool size = 0, active threads = 0, queued t

2017-06-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21139?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053911#comment-16053911 ] Sean Owen commented on SPARK-21139: --- That looks like an issue from the HBase client, not Spark. >

[jira] [Created] (SPARK-21139) java.util.concurrent.RejectedExecutionException: rejected from java.util.concurrent.ThreadPoolExecutor@46477dd0[Terminated, pool size = 0, active threads = 0, queued tas

2017-06-19 Thread shining (JIRA)
shining created SPARK-21139: --- Summary: java.util.concurrent.RejectedExecutionException: rejected from java.util.concurrent.ThreadPoolExecutor@46477dd0[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 14109]

[jira] [Commented] (SPARK-20568) Delete files after processing in structured streaming

2017-06-19 Thread Fei Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053871#comment-16053871 ] Fei Shao commented on SPARK-20568: -- I do not support this feature either. If we delete files

[jira] [Commented] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053820#comment-16053820 ] Sean Owen commented on SPARK-21137: --- Here's a hint, or example of what could be going wrong: you may

[jira] [Assigned] (SPARK-21138) Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21138: Assignee: (was: Apache Spark) > Cannot delete staging dir when the clusters of

[jira] [Commented] (SPARK-21138) Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21138?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053817#comment-16053817 ] Apache Spark commented on SPARK-21138: -- User 'sharkdtu' has created a pull request for this issue:

[jira] [Assigned] (SPARK-21138) Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different

2017-06-19 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-21138: Assignee: Apache Spark > Cannot delete staging dir when the clusters of

[jira] [Closed] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen closed SPARK-21137. - > Spark cannot read many small files (wholeTextFiles) > ---

[jira] [Comment Edited] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16053808#comment-16053808 ] sam edited comment on SPARK-21137 at 6/19/17 11:14 AM: --- [~srowen] Sorry about the

[jira] [Updated] (SPARK-21138) Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different

2017-06-19 Thread sharkd tu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21138?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sharkd tu updated SPARK-21138: -- Description: When I set different clusters for "spark.hadoop.fs.defaultFS" and

[jira] [Created] (SPARK-21138) Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different

2017-06-19 Thread sharkd tu (JIRA)
sharkd tu created SPARK-21138: - Summary: Cannot delete staging dir when the clusters of "spark.yarn.stagingDir" and "spark.hadoop.fs.defaultFS" are different Key: SPARK-21138 URL:

[jira] [Reopened] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam reopened SPARK-21137: - Reopened after adding detail. > Spark cannot read many small files (wholeTextFiles) >

[jira] [Updated] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-21137: Description: A very common use case in big data is to read a large number of small files. For example the Enron

[jira] [Resolved] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21137. --- Resolution: Invalid Don't reopen this please. Someone will do that if it's appropriate. This still

[jira] [Updated] (SPARK-21137) Spark cannot read many small files (wholeTextFiles)

2017-06-19 Thread sam (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sam updated SPARK-21137: Description: A very common use case in big data is to read a large number of small files. For example the Enron
