[jira] [Created] (SPARK-25111) increment kinesis client/producer lib versions & aws-sdk to match

2018-08-13 Thread Steve Loughran (JIRA)
Steve Loughran created SPARK-25111: -- Summary: increment kinesis client/producer lib versions & aws-sdk to match Key: SPARK-25111 URL: https://issues.apache.org/jira/browse/SPARK-25111 Project: Spark

[jira] [Assigned] (SPARK-22974) CountVectorModel does not attach attributes to output column

2018-08-13 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai reassigned SPARK-22974: --- Assignee: Liang-Chi Hsieh > CountVectorModel does not attach attributes to output column >

[jira] [Resolved] (SPARK-22974) CountVectorModel does not attach attributes to output column

2018-08-13 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai resolved SPARK-22974. - Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 20313

[jira] [Resolved] (SPARK-25104) Validate user specified output schema

2018-08-13 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai resolved SPARK-25104. - Resolution: Fixed Fix Version/s: 2.4.0 Issue resolved by pull request 22094

[jira] [Assigned] (SPARK-25104) Validate user specified output schema

2018-08-13 Thread DB Tsai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] DB Tsai reassigned SPARK-25104: --- Assignee: Gengliang Wang > Validate user specified output schema >

[jira] [Commented] (SPARK-24771) Upgrade AVRO version from 1.7.7 to 1.8

2018-08-13 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579243#comment-16579243 ] Wenchen Fan commented on SPARK-24771: - It's good to pay more attention to compatibility issues, I've

[jira] [Commented] (SPARK-23050) Structured Streaming with S3 file source duplicates data because of eventual consistency.

2018-08-13 Thread bharath kumar avusherla (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23050?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579242#comment-16579242 ] bharath kumar avusherla commented on SPARK-23050: - [~ste...@apache.org], I can start 

[jira] [Created] (SPARK-25110) make sure Flume streaming connector works with Spark 2.4

2018-08-13 Thread Wenchen Fan (JIRA)
Wenchen Fan created SPARK-25110: --- Summary: make sure Flume streaming connector works with Spark 2.4 Key: SPARK-25110 URL: https://issues.apache.org/jira/browse/SPARK-25110 Project: Spark Issue

[jira] [Updated] (SPARK-24771) Upgrade AVRO version from 1.7.7 to 1.8

2018-08-13 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24771?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-24771: Labels: release-notes (was: ) > Upgrade AVRO version from 1.7.7 to 1.8 >

[jira] [Comment Edited] (SPARK-24771) Upgrade AVRO version from 1.7.7 to 1.8

2018-08-13 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579213#comment-16579213 ] Marcelo Vanzin edited comment on SPARK-24771 at 8/14/18 3:46 AM: - The

[jira] [Commented] (SPARK-24771) Upgrade AVRO version from 1.7.7 to 1.8

2018-08-13 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579213#comment-16579213 ] Marcelo Vanzin commented on SPARK-24771: The main problem pointed out in the original attempt is

[jira] [Commented] (SPARK-25068) High-order function: exists(array, function) → boolean

2018-08-13 Thread Takuya Ueshin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579210#comment-16579210 ] Takuya Ueshin commented on SPARK-25068: --- I added this because I thought this was a missing
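
SPARK-25068 proposes a higher-order `exists(array, function) → boolean`. Below is a minimal Python sketch of the intended semantics, assuming the usual "at least one element matches" meaning. In Spark SQL the predicate would be passed as a lambda, roughly `exists(array(1, 2, 3), x -> x % 2 = 0)`. The sketch ignores SQL NULL handling, which Python's `None` does not model.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")


def exists(array: Iterable[T], predicate: Callable[[T], bool]) -> bool:
    """Return True if `predicate` holds for at least one element.

    Stops at the first match. SQL NULL semantics are deliberately not
    modelled: this only describes arrays of non-null values.
    """
    for element in array:
        if predicate(element):
            return True
    return False


print(exists([1, 2, 3], lambda x: x % 2 == 0))  # True
print(exists([1, 3, 5], lambda x: x % 2 == 0))  # False
```

An empty array has no matching element, so the sketch returns `False` for it.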

[jira] [Updated] (SPARK-25109) spark python should retry reading another datanode if the first one fails to connect

2018-08-13 Thread Yuanbo Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanbo Liu updated SPARK-25109: --- Description: We use this code to read parquet files from HDFS: spark.read.parquet('xxx') and get

[jira] [Resolved] (SPARK-23308) ignoreCorruptFiles should not ignore retryable IOException

2018-08-13 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23308?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-23308. -- Resolution: Won't Fix > ignoreCorruptFiles should not ignore retryable IOException >

[jira] [Resolved] (SPARK-24006) ExecutorAllocationManager.onExecutorAdded is an O(n) operation

2018-08-13 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24006?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-24006. -- Resolution: Won't Fix I am resolving this assuming there's no more update on actual numbers.

[jira] [Resolved] (SPARK-25086) Incorrect Default Value For "escape" For CSV Files

2018-08-13 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-25086. -- Resolution: Duplicate > Incorrect Default Value For "escape" For CSV Files >

[jira] [Updated] (SPARK-25109) spark python should retry reading another datanode if the first one fails to connect

2018-08-13 Thread Yuanbo Liu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuanbo Liu updated SPARK-25109: --- Attachment: WeChatWorkScreenshot_86b5-1d19-430a-a138-335e4bd3211c.png > spark python should

[jira] [Commented] (SPARK-24356) Duplicate strings in File.path managed by FileSegmentManagedBuffer

2018-08-13 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24356?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579184#comment-16579184 ] Imran Rashid commented on SPARK-24356: -- Somewhat related to SPARK-24938 -- that explains why these
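
SPARK-24356 concerns many equal path strings held as separate objects. One common remedy for this kind of duplication is interning, so that equal values share a single object (the JVM analogue is `String.intern()`). The Python sketch below only illustrates the technique. It is not the change made for this ticket, and the shuffle file path is a hypothetical example.

```python
import sys
from typing import Iterable, List


def dedupe_paths(paths: Iterable[str]) -> List[str]:
    """Map every path onto one canonical string object per distinct value.

    sys.intern returns the same object for equal strings, so N references to
    the same path cost one string's worth of memory instead of N.
    """
    return [sys.intern(p) for p in paths]


# Build equal strings at runtime so they start out as separate objects.
raw = ["/".join(["", "tmp", "blockmgr", "shuffle_0_0_0.data"]) for _ in range(3)]
shared = dedupe_paths(raw)
print(all(p is shared[0] for p in shared))  # True
```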

[jira] [Created] (SPARK-25109) spark python should retry reading another datanode if the first one fails to connect

2018-08-13 Thread Yuanbo Liu (JIRA)
Yuanbo Liu created SPARK-25109: -- Summary: spark python should retry reading another datanode if the first one fails to connect Key: SPARK-25109 URL: https://issues.apache.org/jira/browse/SPARK-25109
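
SPARK-25109 asks for a read to fall back to another datanode when the first one refuses the connection. The report is truncated here, so the following is only a generic Python sketch of replica failover. `read_from` and the datanode addresses are hypothetical stand-ins; a real HDFS client exposes its own interfaces.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def read_with_failover(replicas: Sequence[str], read_from: Callable[[str], T]) -> T:
    """Try each replica in order and return the first successful read.

    `read_from` is a hypothetical callable that reads the block from one
    datanode and raises OSError (ConnectionError is a subclass) on failure.
    """
    failures: List[Tuple[str, OSError]] = []
    for host in replicas:
        try:
            return read_from(host)
        except OSError as exc:
            failures.append((host, exc))  # remember why, then try the next replica
    raise ConnectionError(f"all {len(replicas)} replicas failed: {failures}")


def fake_read(host: str) -> bytes:
    # Simulates the reported situation: the first datanode refuses connections.
    if host == "dn1:50010":
        raise ConnectionRefusedError(host)
    return b"PAR1"


print(read_with_failover(["dn1:50010", "dn2:50010"], fake_read))  # b'PAR1'
```

Collecting every failure before giving up keeps the final error informative when all replicas are unreachable.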

[jira] [Comment Edited] (SPARK-25051) where clause on dataset gives AnalysisException

2018-08-13 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579181#comment-16579181 ] Yuming Wang edited comment on SPARK-25051 at 8/14/18 3:13 AM: -- Yes. The bug

[jira] [Commented] (SPARK-25051) where clause on dataset gives AnalysisException

2018-08-13 Thread Yuming Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579181#comment-16579181 ] Yuming Wang commented on SPARK-25051: - Yes. The bug still exists. I can reproduce it with: {code:scala}

[jira] [Commented] (SPARK-24938) Understand usage of netty's onheap memory use, even with offheap pools

2018-08-13 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579177#comment-16579177 ] Imran Rashid commented on SPARK-24938: -- yeah, that's about what I expected. It's worse than 16MB per

[jira] [Updated] (SPARK-25108) Dataset.show() generates incorrect padding for Unicode Character

2018-08-13 Thread xuejianbest (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuejianbest updated SPARK-25108: Description: The Dataset.show() method generates incorrect space padding since column name or

[jira] [Assigned] (SPARK-25108) Dataset.show() generates incorrect padding for Unicode Character

2018-08-13 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25108: Assignee: Apache Spark > Dataset.show() generates incorrect padding for Unicode

[jira] [Assigned] (SPARK-25108) Dataset.show() generates incorrect padding for Unicode Character

2018-08-13 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25108: Assignee: (was: Apache Spark) > Dataset.show() generates incorrect padding for

[jira] [Updated] (SPARK-25108) Dataset.show() generates incorrect padding for Unicode Character

2018-08-13 Thread xuejianbest (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuejianbest updated SPARK-25108: Environment: spark-shell on Xshell5 (was: spark-shell on Xshell) > Dataset.show() generates

[jira] [Updated] (SPARK-25108) Dataset.show() generates incorrect padding for Unicode Character

2018-08-13 Thread xuejianbest (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuejianbest updated SPARK-25108: Attachment: show.bmp > Dataset.show() generates incorrect padding for Unicode Character >

[jira] [Updated] (SPARK-25108) Dataset.show() generates incorrect padding for Unicode Character

2018-08-13 Thread xuejianbest (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] xuejianbest updated SPARK-25108: External issue URL: (was: https://github.com/apache/spark/pull/22048) Description:

[jira] [Created] (SPARK-25108) Dataset.show() generates incorrect padding for Unicode Character

2018-08-13 Thread xuejianbest (JIRA)
xuejianbest created SPARK-25108: --- Summary: Dataset.show() generates incorrect padding for Unicode Character Key: SPARK-25108 URL: https://issues.apache.org/jira/browse/SPARK-25108 Project: Spark
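
The SPARK-25108 description is truncated here. This class of misalignment usually comes from padding by character count instead of by display width. The Python sketch below shows width-aware padding, assuming East Asian Wide (W) and Fullwidth (F) characters occupy two terminal columns, as they do in most CJK-capable terminals. It is not the patch proposed for this ticket, and it does not special-case zero-width or combining characters.

```python
import unicodedata


def display_width(text: str) -> int:
    """Approximate terminal width: East Asian Wide/Fullwidth chars take 2 columns.

    Combining marks and other zero-width characters are not special-cased.
    """
    return sum(2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1 for ch in text)


def pad_cell(text: str, width: int, right_align: bool = True) -> str:
    """Pad `text` with spaces to `width` display columns, not `len(text)` characters."""
    padding = " " * max(0, width - display_width(text))
    return padding + text if right_align else text + padding


cells = ["name", "中文名"]
width = max(display_width(c) for c in cells)  # 6 columns; len("中文名") is only 3
for c in cells:
    print("|" + pad_cell(c, width) + "|")
# |  name|
# |中文名|
```

Column widths must be measured with the same function used for padding. Otherwise the header separator and the cells disagree.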

[jira] [Commented] (SPARK-24771) Upgrade AVRO version from 1.7.7 to 1.8

2018-08-13 Thread Michael Heuer (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579133#comment-16579133 ] Michael Heuer commented on SPARK-24771: --- I'm looking forward to testing this with

[jira] [Commented] (SPARK-24886) Increase Jenkins build time

2018-08-13 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24886?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579131#comment-16579131 ] Apache Spark commented on SPARK-24886: -- User 'HyukjinKwon' has created a pull request for this

[jira] [Commented] (SPARK-24771) Upgrade AVRO version from 1.7.7 to 1.8

2018-08-13 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579128#comment-16579128 ] Sean Owen commented on SPARK-24771: --- I confess I just don't know enough to have a strong opinion. A

[jira] [Commented] (SPARK-24771) Upgrade AVRO version from 1.7.7 to 1.8

2018-08-13 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579126#comment-16579126 ] Wenchen Fan commented on SPARK-24771: - cc [~r...@databricks.com] [~srowen] > Upgrade AVRO version

[jira] [Commented] (SPARK-16617) Upgrade to Avro 1.8.x

2018-08-13 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16617?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579125#comment-16579125 ] Wenchen Fan commented on SPARK-16617: - Sorry I missed this ticket, the upgrade is now done by

[jira] [Commented] (SPARK-24771) Upgrade AVRO version from 1.7.7 to 1.8

2018-08-13 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579124#comment-16579124 ] Wenchen Fan commented on SPARK-24771: - Sorry I was not aware of

[jira] [Assigned] (SPARK-25028) AnalyzePartitionCommand failed with NPE if value is null

2018-08-13 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-25028: --- Assignee: Marco Gaido > AnalyzePartitionCommand failed with NPE if value is null >

[jira] [Resolved] (SPARK-25028) AnalyzePartitionCommand failed with NPE if value is null

2018-08-13 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-25028. - Resolution: Fixed Fix Version/s: 2.3.2 2.4.0 Issue resolved by pull

[jira] [Commented] (SPARK-24918) Executor Plugin API

2018-08-13 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579042#comment-16579042 ] Marcelo Vanzin commented on SPARK-24918: I like the idea in general. On the implementation side,
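
SPARK-24918 discusses an API for running code on every executor, similar to the setup hook requested in SPARK-650. The previews here do not show the proposed interface. The following hypothetical Python sketch models only the lifecycle being discussed: plugins are created and initialised when an executor starts and shut down when it exits. Every class and method name in it is an assumption for illustration.

```python
from typing import Callable, List, Protocol, Sequence


class ExecutorPlugin(Protocol):
    """Hypothetical hook interface: one instance per executor process."""

    def init(self) -> None: ...

    def shutdown(self) -> None: ...


class PluginHost:
    """Instantiates plugins when an executor starts and tears them down when it stops."""

    def __init__(self, factories: Sequence[Callable[[], ExecutorPlugin]]):
        self._plugins: List[ExecutorPlugin] = [make() for make in factories]

    def start(self) -> None:
        for plugin in self._plugins:
            plugin.init()

    def stop(self) -> None:
        # Shut down in reverse start order so later plugins can rely on earlier ones.
        for plugin in reversed(self._plugins):
            plugin.shutdown()


events: List[str] = []


class MetricsPlugin:
    def init(self) -> None:
        events.append("metrics:init")

    def shutdown(self) -> None:
        events.append("metrics:shutdown")


class ProfilerPlugin:
    def init(self) -> None:
        events.append("profiler:init")

    def shutdown(self) -> None:
        events.append("profiler:shutdown")


host = PluginHost([MetricsPlugin, ProfilerPlugin])
host.start()
host.stop()
print(events)
# ['metrics:init', 'profiler:init', 'profiler:shutdown', 'metrics:shutdown']
```

Tying the hooks to executor start and stop, rather than to tasks, is what makes them usable under dynamic allocation, where executors come and go during a job.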

[jira] [Commented] (SPARK-24938) Understand usage of netty's onheap memory use, even with offheap pools

2018-08-13 Thread Nihar Sheth (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579039#comment-16579039 ] Nihar Sheth commented on SPARK-24938: - After making the change and running the tool on a very simple

[jira] [Comment Edited] (SPARK-25051) where clause on dataset gives AnalysisException

2018-08-13 Thread MIK (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578752#comment-16578752 ] MIK edited comment on SPARK-25051 at 8/13/18 11:04 PM: --- Thanks [~yumwang], with

[jira] [Commented] (SPARK-24787) Events being dropped at an alarming rate due to hsync being slow for eventLogging

2018-08-13 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24787?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579012#comment-16579012 ] Marcelo Vanzin commented on SPARK-24787: Is the slowness really caused by the use of hsync vs.

[jira] [Commented] (SPARK-24771) Upgrade AVRO version from 1.7.7 to 1.8

2018-08-13 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16579005#comment-16579005 ] Marcelo Vanzin commented on SPARK-24771: Hi guys, why was this accepted? It has been tried in

[jira] [Updated] (SPARK-25107) Spark 2.2.0 Upgrade Issue : Throwing TreeNodeException: makeCopy, tree: CatalogRelation Errors

2018-08-13 Thread Karan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Karan updated SPARK-25107: -- Description: I am in the process of upgrading from Spark 1.6 to Spark 2.2. I have a two-stage query and I am

[jira] [Created] (SPARK-25107) Spark 2.2.0 Upgrade Issue : Throwing TreeNodeException: makeCopy, tree: CatalogRelation Errors

2018-08-13 Thread Karan (JIRA)
Karan created SPARK-25107: - Summary: Spark 2.2.0 Upgrade Issue : Throwing TreeNodeException: makeCopy, tree: CatalogRelation Errors Key: SPARK-25107 URL: https://issues.apache.org/jira/browse/SPARK-25107

[jira] [Commented] (SPARK-18057) Update structured streaming kafka from 0.10.0.1 to 2.0.0

2018-08-13 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18057?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578962#comment-16578962 ] Apache Spark commented on SPARK-18057: -- User 'zsxwing' has created a pull request for this issue:

[jira] [Commented] (SPARK-24156) Enable no-data micro batches for more eager streaming state clean up

2018-08-13 Thread Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578937#comment-16578937 ] Xiao Li commented on SPARK-24156: - [~tdas] Can we mark it done? > Enable no-data micro batches for more

[jira] [Commented] (SPARK-24918) Executor Plugin API

2018-08-13 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578894#comment-16578894 ] Imran Rashid commented on SPARK-24918: -- With dynamic allocation you don't have a good place to run

[jira] [Updated] (SPARK-25091) Spark Thrift Server: UNCACHE TABLE and CLEAR CACHE does not clean up executor memory

2018-08-13 Thread Yunling Cai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yunling Cai updated SPARK-25091: Priority: Critical (was: Major) > Spark Thrift Server: UNCACHE TABLE and CLEAR CACHE does not

[jira] [Updated] (SPARK-25091) Spark Thrift Server: UNCACHE TABLE and CLEAR CACHE does not clean up executor memory

2018-08-13 Thread Yunling Cai (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25091?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yunling Cai updated SPARK-25091: Component/s: (was: Spark Core) SQL > Spark Thrift Server: UNCACHE TABLE and

[jira] [Commented] (SPARK-24736) --py-files not functional for non local URLs. It appears to pass non-local URL's into PYTHONPATH directly.

2018-08-13 Thread Ilan Filonenko (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578882#comment-16578882 ] Ilan Filonenko commented on SPARK-24736: The URL, until a resource-staging-server is set up, will
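
The SPARK-24736 summary says non-local URLs given to `--py-files` end up on PYTHONPATH verbatim, where Python cannot import from them. The Python sketch below separates entries that can go on PYTHONPATH directly from entries that would first have to be downloaded. The set of "local" schemes is an assumption (`local://` follows Spark's convention for files pre-installed on every node), the example URLs are hypothetical, and the download step is not shown. Windows drive-letter paths such as `C:\deps.py` may be misread as a URL scheme and would need separate handling.

```python
from typing import Iterable, List, Tuple
from urllib.parse import urlsplit

# Schemes whose paths already exist on the local filesystem (assumption).
LOCAL_SCHEMES = {"", "file", "local"}


def split_py_files(entries: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (paths safe to put on PYTHONPATH, URLs that must be fetched first)."""
    local: List[str] = []
    remote: List[str] = []
    for entry in entries:
        parts = urlsplit(entry)
        if parts.scheme in LOCAL_SCHEMES:
            local.append(parts.path)  # strip "file://" / "local://" prefixes
        else:
            remote.append(entry)  # e.g. hdfs://, s3a://, https:// need fetching
    return local, remote


local, remote = split_py_files([
    "local:///opt/spark/work-dir/deps.zip",
    "/tmp/helpers.py",
    "hdfs://namenode:8020/libs/util.zip",
    "https://example.com/extra.py",
])
print(local)   # ['/opt/spark/work-dir/deps.zip', '/tmp/helpers.py']
print(remote)  # ['hdfs://namenode:8020/libs/util.zip', 'https://example.com/extra.py']
```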

[jira] [Commented] (SPARK-24918) Executor Plugin API

2018-08-13 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578872#comment-16578872 ] Sean Owen commented on SPARK-24918: --- This is just for per-executor initialization right? What's the

[jira] [Commented] (SPARK-23984) PySpark Bindings for K8S

2018-08-13 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578871#comment-16578871 ] Apache Spark commented on SPARK-23984: -- User 'ifilonenko' has created a pull request for this

[jira] [Comment Edited] (SPARK-24918) Executor Plugin API

2018-08-13 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578858#comment-16578858 ] Imran Rashid edited comment on SPARK-24918 at 8/13/18 8:15 PM: ---

[jira] [Commented] (SPARK-24918) Executor Plugin API

2018-08-13 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578858#comment-16578858 ] Imran Rashid commented on SPARK-24918: -- [~lucacanali] OK I see the case for what you're proposing

[jira] [Commented] (SPARK-650) Add a "setup hook" API for running initialization code on each executor

2018-08-13 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578856#comment-16578856 ] Imran Rashid commented on SPARK-650: Folks may be interested in SPARK-24918. Perhaps one should be

[jira] [Commented] (SPARK-24918) Executor Plugin API

2018-08-13 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578853#comment-16578853 ] Imran Rashid commented on SPARK-24918: -- Ah, right, thanks [~vanzin], I knew I had seen this before.

[jira] [Commented] (SPARK-25079) [PYTHON] upgrade python 3.4 -> 3.5

2018-08-13 Thread shane knapp (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25079?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578851#comment-16578851 ] shane knapp commented on SPARK-25079: - question: do we want to upgrade to 3.6 instead? > [PYTHON]

[jira] [Commented] (SPARK-22905) Fix ChiSqSelectorModel, GaussianMixtureModel save implementation for Row order issues

2018-08-13 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22905?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578850#comment-16578850 ] Apache Spark commented on SPARK-22905: -- User 'bersprockets' has created a pull request for this

[jira] [Updated] (SPARK-25106) A new Kafka consumer gets created for every batch

2018-08-13 Thread Alexis Seigneurin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Seigneurin updated SPARK-25106: -- Description: I have a fairly simple piece of code that reads from Kafka, applies some

[jira] [Updated] (SPARK-25106) A new Kafka consumer gets created for every batch

2018-08-13 Thread Alexis Seigneurin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Alexis Seigneurin updated SPARK-25106: -- Attachment: console.txt > A new Kafka consumer gets created for every batch >

[jira] [Created] (SPARK-25106) A new Kafka consumer gets created for every batch

2018-08-13 Thread Alexis Seigneurin (JIRA)
Alexis Seigneurin created SPARK-25106: - Summary: A new Kafka consumer gets created for every batch Key: SPARK-25106 URL: https://issues.apache.org/jira/browse/SPARK-25106 Project: Spark
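
SPARK-25106 reports that a new Kafka consumer is created for every batch (the description is truncated here). The usual remedy for this pattern is to cache long-lived consumers under a key that identifies what each one reads. The Python sketch below assumes the key is (group id, topic, partition). It is not Spark's Kafka source code and omits eviction, thread safety, and closing consumers.

```python
from typing import Callable, Dict, Generic, Hashable, TypeVar

C = TypeVar("C")


class ConsumerCache(Generic[C]):
    """Hand out one long-lived consumer per key instead of building one per batch."""

    def __init__(self, factory: Callable[[Hashable], C]) -> None:
        self._factory = factory
        self._consumers: Dict[Hashable, C] = {}
        self.created = 0  # how many consumers were actually constructed

    def get(self, key: Hashable) -> C:
        consumer = self._consumers.get(key)
        if consumer is None:
            consumer = self._factory(key)
            self._consumers[key] = consumer
            self.created += 1
        return consumer


cache: ConsumerCache[object] = ConsumerCache(lambda key: object())
key = ("my-group", "events", 0)  # hypothetical (group id, topic, partition)
for _batch in range(5):
    consumer = cache.get(key)
print(cache.created)  # 1
```

Counting constructions, as `created` does here, is also a cheap way to confirm the symptom from the report in a test: it should stay flat as batches advance.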

[jira] [Commented] (SPARK-24410) Missing optimization for Union on bucketed tables

2018-08-13 Thread Eyal Farago (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578789#comment-16578789 ] Eyal Farago commented on SPARK-24410: - [~viirya], my bad :) seems there are two distinct issues

[jira] [Commented] (SPARK-24918) Executor Plugin API

2018-08-13 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24918?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578767#comment-16578767 ] Marcelo Vanzin commented on SPARK-24918: For reference: this looks kinda similar to SPARK-650.

[jira] [Commented] (SPARK-25105) Importing all of pyspark.sql.functions should bring PandasUDFType in as well

2018-08-13 Thread kevin yu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578766#comment-16578766 ] kevin yu commented on SPARK-25105: -- I will try to fix it. Thanks. Kevin > Importing all of

[jira] [Commented] (SPARK-24410) Missing optimization for Union on bucketed tables

2018-08-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578763#comment-16578763 ] Liang-Chi Hsieh commented on SPARK-24410: - The above code shows that the two tables in union

[jira] [Comment Edited] (SPARK-25051) where clause on dataset gives AnalysisException

2018-08-13 Thread MIK (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578752#comment-16578752 ] MIK edited comment on SPARK-25051 at 8/13/18 6:21 PM: -- Thanks [~yumwang], with

[jira] [Commented] (SPARK-25051) where clause on dataset gives AnalysisException

2018-08-13 Thread MIK (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25051?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578752#comment-16578752 ] MIK commented on SPARK-25051: - Thanks [~yumwang], with 2.3.2-rc4 the error is gone now but the result is

[jira] [Updated] (SPARK-23654) Cut jets3t as a dependency of spark-core

2018-08-13 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-23654: --- Summary: Cut jets3t as a dependency of spark-core (was: Cut jets3t and bouncy castle as

[jira] [Updated] (SPARK-23654) Cut jets3t and bouncy castle as dependencies of spark-core

2018-08-13 Thread Steve Loughran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran updated SPARK-23654: --- Summary: Cut jets3t and bouncy castle as dependencies of spark-core (was: Cut jets3t as a

[jira] [Commented] (SPARK-24735) Improve exception when mixing up pandas_udf types

2018-08-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578719#comment-16578719 ] holdenk commented on SPARK-24735: - So [~bryanc], what do you think if we add an AggregatePythonUDF and

[jira] [Commented] (SPARK-24735) Improve exception when mixing up pandas_udf types

2018-08-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578710#comment-16578710 ] holdenk commented on SPARK-24735: - I think we could do better than just improving the exception, if we

[jira] [Commented] (SPARK-22347) UDF is evaluated when 'F.when' condition is false

2018-08-13 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578709#comment-16578709 ] Liang-Chi Hsieh commented on SPARK-22347: - Agreed. Thanks [~rdblue] > UDF is evaluated when

[jira] [Created] (SPARK-25105) Importing all of pyspark.sql.functions should bring PandasUDFType in as well

2018-08-13 Thread holdenk (JIRA)
holdenk created SPARK-25105: --- Summary: Importing all of pyspark.sql.functions should bring PandasUDFType in as well Key: SPARK-25105 URL: https://issues.apache.org/jira/browse/SPARK-25105 Project: Spark
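
SPARK-25105 asks that `from pyspark.sql.functions import *` also bring in `PandasUDFType`. In Python, when a module defines `__all__`, a star import copies only the names listed there, so anything left out is silently skipped. The self-contained sketch below demonstrates that rule with a throwaway in-memory module. The module name `demo_functions` and its contents are hypothetical stand-ins, not pyspark's actual layout.

```python
import sys
import types

# Stand-in for a functions module; the name and contents are hypothetical.
demo = types.ModuleType("demo_functions")
exec(
    "class PandasUDFType:\n"
    "    SCALAR = 'scalar'\n"
    "\n"
    "def pandas_udf(f=None):\n"
    "    return f\n"
    "\n"
    "__all__ = ['pandas_udf']\n",
    demo.__dict__,
)
sys.modules["demo_functions"] = demo

ns: dict = {}
exec("from demo_functions import *", ns)
print("pandas_udf" in ns, "PandasUDFType" in ns)  # True False

# Adding the class to __all__ makes the star import bring it in too.
demo.__all__.append("PandasUDFType")
ns = {}
exec("from demo_functions import *", ns)
print("PandasUDFType" in ns)  # True
```

Because `__all__` is read at import time, fixing the omission only requires listing the name there; no other code needs to change.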

[jira] [Updated] (SPARK-24735) Improve exception when mixing up pandas_udf types

2018-08-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-24735: Summary: Improve exception when mixing up pandas_udf types (was: Improve exception when mixing

[jira] [Assigned] (SPARK-25104) Validate user specified output schema

2018-08-13 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25104: Assignee: Apache Spark > Validate user specified output schema >

[jira] [Assigned] (SPARK-25104) Validate user specified output schema

2018-08-13 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25104?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-25104: Assignee: (was: Apache Spark) > Validate user specified output schema >

[jira] [Commented] (SPARK-25104) Validate user specified output schema

2018-08-13 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578637#comment-16578637 ] Apache Spark commented on SPARK-25104: -- User 'gengliangwang' has created a pull request for this

[jira] [Commented] (SPARK-24736) --py-files not functional for non local URLs. It appears to pass non-local URL's into PYTHONPATH directly.

2018-08-13 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24736?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578624#comment-16578624 ] holdenk commented on SPARK-24736: - cc [~ifilonenko] > --py-files not functional for non local URLs. It

[jira] [Created] (SPARK-25104) Validate user specified output schema

2018-08-13 Thread Gengliang Wang (JIRA)
Gengliang Wang created SPARK-25104: -- Summary: Validate user specified output schema Key: SPARK-25104 URL: https://issues.apache.org/jira/browse/SPARK-25104 Project: Spark Issue Type:

[jira] [Updated] (SPARK-23874) Upgrade apache/arrow to 0.10.0

2018-08-13 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23874: - Description: Version 0.10.0 will allow for the following improvements and bug fixes: * Allow

[jira] [Updated] (SPARK-22347) UDF is evaluated when 'F.when' condition is false

2018-08-13 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan updated SPARK-22347: Fix Version/s: (was: 2.3.0) > UDF is evaluated when 'F.when' condition is false >

[jira] [Resolved] (SPARK-22347) UDF is evaluated when 'F.when' condition is false

2018-08-13 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-22347. - Resolution: Won't Fix > UDF is evaluated when 'F.when' condition is false >

[jira] [Reopened] (SPARK-22347) UDF is evaluated when 'F.when' condition is false

2018-08-13 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reopened SPARK-22347: - > UDF is evaluated when 'F.when' condition is false >

[jira] [Commented] (SPARK-22347) UDF is evaluated when 'F.when' condition is false

2018-08-13 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578566#comment-16578566 ] Wenchen Fan commented on SPARK-22347: - we changed our mind during code review and this JIRA is no

[jira] [Resolved] (SPARK-25060) PySpark UDF in case statement is always run

2018-08-13 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Blue resolved SPARK-25060. --- Resolution: Won't Fix I'm closing this issue as "Won't Fix", the same as the issue this duplicates,

[jira] [Commented] (SPARK-25060) PySpark UDF in case statement is always run

2018-08-13 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578557#comment-16578557 ] Ryan Blue commented on SPARK-25060: --- Thanks, [~hyukjin.kwon], you're right that this is a duplicate.

[jira] [Commented] (SPARK-22347) UDF is evaluated when 'F.when' condition is false

2018-08-13 Thread Ryan Blue (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578549#comment-16578549 ] Ryan Blue commented on SPARK-22347: --- [~viirya], [~cloud_fan]: Is there any objection to changing the

[jira] [Updated] (SPARK-23874) Upgrade apache/arrow to 0.10.0

2018-08-13 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-23874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-23874: - Description: Version 0.10.0 will allow for the following improvements and bug fixes: * Allow

[jira] [Commented] (SPARK-25103) CompletionIterator may delay GC of completed resources

2018-08-13 Thread Eyal Farago (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-25103?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578519#comment-16578519 ] Eyal Farago commented on SPARK-25103: - CC: [~cloud_fan], [~hvanhovell] > CompletionIterator may

[jira] [Created] (SPARK-25103) CompletionIterator may delay GC of completed resources

2018-08-13 Thread Eyal Farago (JIRA)
Eyal Farago created SPARK-25103: --- Summary: CompletionIterator may delay GC of completed resources Key: SPARK-25103 URL: https://issues.apache.org/jira/browse/SPARK-25103 Project: Spark Issue

[jira] [Commented] (SPARK-24410) Missing optimization for Union on bucketed tables

2018-08-13 Thread Eyal Farago (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24410?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16578489#comment-16578489 ] Eyal Farago commented on SPARK-24410: - [~viirya], I think your conclusion about co-partitioning is
