[jira] [Updated] (SPARK-22469) Accuracy problem in comparison with string and numeric

2017-11-15 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-22469: - Target Version/s: 2.2.2 Description: {code:sql} select '1.5' > 0.5; // Result is NULL

[jira] [Updated] (SPARK-21187) Complete support for remaining Spark data types in Arrow Converters

2017-11-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-21187: - Description: This is to track adding the remaining type support in Arrow Converters.

[jira] [Commented] (SPARK-22522) Convert to apache-release to publish Maven artifacts to Nexus/repository.apache.org

2017-11-15 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253918#comment-16253918 ] Felix Cheung commented on SPARK-22522: -- sorry about the confusion - to clarify, the repo is the same

[jira] [Commented] (SPARK-3162) Train DecisionTree locally when possible

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-3162?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254081#comment-16254081 ] Apache Spark commented on SPARK-3162: - User 'smurching' has created a pull request for this issue:

[jira] [Comment Edited] (SPARK-22522) Convert to apache-release to publish Maven artifacts

2017-11-15 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253918#comment-16253918 ] Felix Cheung edited comment on SPARK-22522 at 11/15/17 6:33 PM: sorry

[jira] [Comment Edited] (SPARK-22522) Convert to apache-release to publish Maven artifacts

2017-11-15 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253918#comment-16253918 ] Felix Cheung edited comment on SPARK-22522 at 11/15/17 6:33 PM: sorry

[jira] [Commented] (SPARK-22530) Add ArrayType Support for working with Pandas and Arrow

2017-11-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22530?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254074#comment-16254074 ] Bryan Cutler commented on SPARK-22530: -- working on it > Add ArrayType Support for working with

[jira] [Created] (SPARK-22530) Add ArrayType Support for working with Pandas and Arrow

2017-11-15 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-22530: Summary: Add ArrayType Support for working with Pandas and Arrow Key: SPARK-22530 URL: https://issues.apache.org/jira/browse/SPARK-22530 Project: Spark

[jira] [Commented] (SPARK-22267) Spark SQL incorrectly reads ORC file when column order is different

2017-11-15 Thread Dongjoon Hyun (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254038#comment-16254038 ] Dongjoon Hyun commented on SPARK-22267: --- [~mpetruska]. I made this issue during that PR. Please see

[jira] [Commented] (SPARK-22324) Upgrade Arrow to version 0.8.0

2017-11-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254054#comment-16254054 ] Bryan Cutler commented on SPARK-22324: -- I started working on this to test out latest changes in

[jira] [Updated] (SPARK-22522) Convert to apache-release to publish Maven artifacts

2017-11-15 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-22522: - Description: see http://www.apache.org/dev/publishing-maven-artifacts.html to publish to

[jira] [Updated] (SPARK-22522) Convert to apache-release to publish Maven artifacts

2017-11-15 Thread Felix Cheung (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Felix Cheung updated SPARK-22522: - Summary: Convert to apache-release to publish Maven artifacts (was: Convert to apache-release

[jira] [Comment Edited] (SPARK-16996) Hive ACID delta files not seen

2017-11-15 Thread Maciej Bryński (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253531#comment-16253531 ] Maciej Bryński edited comment on SPARK-16996 at 11/15/17 9:11 PM: -- In

[jira] [Created] (SPARK-22531) Migrate the implementation of IDF from MLLib to ML

2017-11-15 Thread Jorge Sanchez (JIRA)
Jorge Sanchez created SPARK-22531: - Summary: Migrate the implementation of IDF from MLLib to ML Key: SPARK-22531 URL: https://issues.apache.org/jira/browse/SPARK-22531 Project: Spark Issue

[jira] [Created] (SPARK-22532) Spark SQL function 'drop_duplicates' throws error when passing in a column that is an element of a struct

2017-11-15 Thread Nicholas Hakobian (JIRA)
Nicholas Hakobian created SPARK-22532: - Summary: Spark SQL function 'drop_duplicates' throws error when passing in a column that is an element of a struct Key: SPARK-22532 URL:

[jira] [Commented] (SPARK-22531) Migrate the implementation of IDF from MLLib to ML

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22531?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254271#comment-16254271 ] Apache Spark commented on SPARK-22531: -- User 'flowcont' has created a pull request for this issue:

[jira] [Assigned] (SPARK-22531) Migrate the implementation of IDF from MLLib to ML

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22531: Assignee: (was: Apache Spark) > Migrate the implementation of IDF from MLLib to ML >

[jira] [Assigned] (SPARK-22531) Migrate the implementation of IDF from MLLib to ML

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22531?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22531: Assignee: Apache Spark > Migrate the implementation of IDF from MLLib to ML >

[jira] [Commented] (SPARK-22479) SaveIntoDataSourceCommand logs jdbc credentials

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254424#comment-16254424 ] Apache Spark commented on SPARK-22479: -- User 'onursatici' has created a pull request for this issue:

[jira] [Comment Edited] (SPARK-16996) Hive ACID delta files not seen

2017-11-15 Thread Maciej Bryński (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253531#comment-16253531 ] Maciej Bryński edited comment on SPARK-16996 at 11/15/17 9:06 PM: -- In

[jira] [Created] (SPARK-22533) SparkConfigProvider does not handle deprecated config keys

2017-11-15 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-22533: -- Summary: SparkConfigProvider does not handle deprecated config keys Key: SPARK-22533 URL: https://issues.apache.org/jira/browse/SPARK-22533 Project: Spark

[jira] [Assigned] (SPARK-20649) Simplify REST API class hierarchy

2017-11-15 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid reassigned SPARK-20649: Assignee: Marcelo Vanzin > Simplify REST API class hierarchy >

[jira] [Resolved] (SPARK-20649) Simplify REST API class hierarchy

2017-11-15 Thread Imran Rashid (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Imran Rashid resolved SPARK-20649. -- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 19748

[jira] [Assigned] (SPARK-22533) SparkConfigProvider does not handle deprecated config keys

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22533: Assignee: Apache Spark > SparkConfigProvider does not handle deprecated config keys >

[jira] [Assigned] (SPARK-22533) SparkConfigProvider does not handle deprecated config keys

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22533: Assignee: (was: Apache Spark) > SparkConfigProvider does not handle deprecated config

[jira] [Commented] (SPARK-22533) SparkConfigProvider does not handle deprecated config keys

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22533?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254346#comment-16254346 ] Apache Spark commented on SPARK-22533: -- User 'vanzin' has created a pull request for this issue:

[jira] [Comment Edited] (SPARK-16996) Hive ACID delta files not seen

2017-11-15 Thread Maciej Bryński (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253531#comment-16253531 ] Maciej Bryński edited comment on SPARK-16996 at 11/15/17 9:09 PM: -- In

[jira] [Commented] (SPARK-22491) union all can't execute parallel with group by

2017-11-15 Thread Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254431#comment-16254431 ] Liang-Chi Hsieh commented on SPARK-22491: - For the query without aggregation, the exchanges are

[jira] [Created] (SPARK-22525) Spark download page doesn't update package name based package type

2017-11-15 Thread holdenk (JIRA)
holdenk created SPARK-22525: --- Summary: Spark download page doesn't update package name based package type Key: SPARK-22525 URL: https://issues.apache.org/jira/browse/SPARK-22525 Project: Spark

[jira] [Created] (SPARK-22524) Subquery on UI appear as the same node even if it's not reused

2017-11-15 Thread Chenzhao Guo (JIRA)
Chenzhao Guo created SPARK-22524: Summary: Subquery on UI appear as the same node even if it's not reused Key: SPARK-22524 URL: https://issues.apache.org/jira/browse/SPARK-22524 Project: Spark

[jira] [Commented] (SPARK-22524) Subquery on UI appear as the same node even if it's not reused

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253104#comment-16253104 ] Apache Spark commented on SPARK-22524: -- User 'gczsjdy' has created a pull request for this issue:

[jira] [Assigned] (SPARK-22524) Subquery on UI appear as the same node even if it's not reused

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22524: Assignee: Apache Spark > Subquery on UI appear as the same node even if it's not reused >

[jira] [Assigned] (SPARK-22524) Subquery on UI appear as the same node even if it's not reused

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22524: Assignee: (was: Apache Spark) > Subquery on UI appear as the same node even if it's

[jira] [Comment Edited] (SPARK-22534) Add integration test case to explicitly verify optional validity buffer

2017-11-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254450#comment-16254450 ] Bryan Cutler edited comment on SPARK-22534 at 11/15/17 11:37 PM: - Opened

[jira] [Resolved] (SPARK-21842) Support Kerberos ticket renewal and creation in Mesos

2017-11-15 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin resolved SPARK-21842. Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 19272

[jira] [Created] (SPARK-22535) PythonRunner.MonitorThread should give the task a little time to finish before killing the python worker

2017-11-15 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-22535: Summary: PythonRunner.MonitorThread should give the task a little time to finish before killing the python worker Key: SPARK-22535 URL:

[jira] [Created] (SPARK-22536) VectorizedParquetRecordReader doesn't use Parquet's dictionary filtering feature

2017-11-15 Thread Ivan Gozali (JIRA)
Ivan Gozali created SPARK-22536: --- Summary: VectorizedParquetRecordReader doesn't use Parquet's dictionary filtering feature Key: SPARK-22536 URL: https://issues.apache.org/jira/browse/SPARK-22536

[jira] [Commented] (SPARK-22517) NullPointerException in ShuffleExternalSorter.spill()

2017-11-15 Thread Kazuaki Ishizaki (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254706#comment-16254706 ] Kazuaki Ishizaki commented on SPARK-22517: -- Could you please post a program that can reproduce

[jira] [Assigned] (SPARK-21842) Support Kerberos ticket renewal and creation in Mesos

2017-11-15 Thread Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Vanzin reassigned SPARK-21842: -- Assignee: Arthur Rand > Support Kerberos ticket renewal and creation in Mesos >

[jira] [Commented] (SPARK-22535) PythonRunner.MonitorThread should give the task a little time to finish before killing the python worker

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254549#comment-16254549 ] Apache Spark commented on SPARK-22535: -- User 'zsxwing' has created a pull request for this issue:

[jira] [Assigned] (SPARK-22535) PythonRunner.MonitorThread should give the task a little time to finish before killing the python worker

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22535: Assignee: Apache Spark > PythonRunner.MonitorThread should give the task a little time to

[jira] [Assigned] (SPARK-22535) PythonRunner.MonitorThread should give the task a little time to finish before killing the python worker

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22535: Assignee: (was: Apache Spark) > PythonRunner.MonitorThread should give the task a

[jira] [Commented] (SPARK-22528) History service and non-HDFS filesystems

2017-11-15 Thread Saisai Shao (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254669#comment-16254669 ] Saisai Shao commented on SPARK-22528: - Hi [~adobe_pmackles] I think this issue might be due to the

[jira] [Resolved] (SPARK-22523) Janino throws StackOverflowError on nested structs with many fields

2017-11-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-22523. --- Resolution: Duplicate > Janino throws StackOverflowError on nested structs with many fields >

[jira] [Created] (SPARK-22539) Add second order for rangepartitioner since partition number may be small if the specified key is skewed

2017-11-15 Thread zhoukang (JIRA)
zhoukang created SPARK-22539: Summary: Add second order for rangepartitioner since partition number may be small if the specified key is skewed Key: SPARK-22539 URL: https://issues.apache.org/jira/browse/SPARK-22539

[jira] [Assigned] (SPARK-22539) Add second order for rangepartitioner since partition number may be small if the specified key is skewed

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22539: Assignee: Apache Spark > Add second order for rangepartitioner since partition number may

[jira] [Updated] (SPARK-22539) Add second order for rangepartitioner since partition number may be small if the specified key is skewed

2017-11-15 Thread zhoukang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhoukang updated SPARK-22539: - Description: The rangepartitioner generated from shuffle exchange may cause partition skew if sort key

[jira] [Assigned] (SPARK-22539) Add second order for rangepartitioner since partition number may be small if the specified key is skewed

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22539?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22539: Assignee: (was: Apache Spark) > Add second order for rangepartitioner since partition

[jira] [Commented] (SPARK-22539) Add second order for rangepartitioner since partition number may be small if the specified key is skewed

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254867#comment-16254867 ] Apache Spark commented on SPARK-22539: -- User 'caneGuy' has created a pull request for this issue:

[jira] [Updated] (SPARK-22538) SQLTransformer.transform(inputDataFrame) uncaches inputDataFrame

2017-11-15 Thread MBA Learns to Code (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MBA Learns to Code updated SPARK-22538: --- Description: When running the below code on PySpark v2.2.0, the cached input

[jira] [Updated] (SPARK-22538) SQLTransformer.transform(inputDataFrame) uncaches inputDataFrame

2017-11-15 Thread MBA Learns to Code (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MBA Learns to Code updated SPARK-22538: --- Description: When running the below code on PySpark v2.2.0, the cached input

[jira] [Assigned] (SPARK-22535) PythonRunner.MonitorThread should give the task a little time to finish before killing the python worker

2017-11-15 Thread Takuya Ueshin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin reassigned SPARK-22535: - Assignee: Shixiong Zhu > PythonRunner.MonitorThread should give the task a little time

[jira] [Resolved] (SPARK-22535) PythonRunner.MonitorThread should give the task a little time to finish before killing the python worker

2017-11-15 Thread Takuya Ueshin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-22535. --- Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 19762

[jira] [Commented] (SPARK-16996) Hive ACID delta files not seen

2017-11-15 Thread Maciej Bryński (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254821#comment-16254821 ] Maciej Bryński commented on SPARK-16996: After research I think Spark 2.2 in HDP is using older

[jira] [Commented] (SPARK-18406) Race between end-of-task and completion iterator read lock release

2017-11-15 Thread david zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-18406?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254839#comment-16254839 ] david zhang commented on SPARK-18406: - we still find this issue in Spark 2.2, I check this case

[jira] [Commented] (SPARK-22537) Aggregation of map output statistics on driver faces single point bottleneck

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22537?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16254730#comment-16254730 ] Apache Spark commented on SPARK-22537: -- User 'gczsjdy' has created a pull request for this issue:

[jira] [Assigned] (SPARK-22537) Aggregation of map output statistics on driver faces single point bottleneck

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22537: Assignee: (was: Apache Spark) > Aggregation of map output statistics on driver faces

[jira] [Created] (SPARK-22538) SQLTransformer.transform(inputDataFrame) uncaches inputDataFrame

2017-11-15 Thread MBA Learns to Code (JIRA)
MBA Learns to Code created SPARK-22538: -- Summary: SQLTransformer.transform(inputDataFrame) uncaches inputDataFrame Key: SPARK-22538 URL: https://issues.apache.org/jira/browse/SPARK-22538

[jira] [Created] (SPARK-22537) Aggregation of map output statistics on driver faces single point bottleneck

2017-11-15 Thread Chenzhao Guo (JIRA)
Chenzhao Guo created SPARK-22537: Summary: Aggregation of map output statistics on driver faces single point bottleneck Key: SPARK-22537 URL: https://issues.apache.org/jira/browse/SPARK-22537

[jira] [Assigned] (SPARK-22537) Aggregation of map output statistics on driver faces single point bottleneck

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22537?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22537: Assignee: Apache Spark > Aggregation of map output statistics on driver faces single

[jira] [Updated] (SPARK-22538) SQLTransformer.transform(inputDataFrame) uncaches inputDataFrame

2017-11-15 Thread MBA Learns to Code (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MBA Learns to Code updated SPARK-22538: --- Description: When running the below code on PySpark v2.2.0, the cached input

[jira] [Updated] (SPARK-22538) SQLTransformer.transform(inputDataFrame) uncaches inputDataFrame

2017-11-15 Thread MBA Learns to Code (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22538?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] MBA Learns to Code updated SPARK-22538: --- Description: When running the below code on PySpark v2.2.0, the cached input

[jira] [Updated] (SPARK-22529) Relation stats should be consistent with other plans based on cbo config

2017-11-15 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-22529: - Description: Currently, relation stats is the same whether cbo is enabled or not. While relation

[jira] [Updated] (SPARK-22529) Relation stats should be consistent with other plans based on cbo config

2017-11-15 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-22529: - Summary: Relation stats should be consistent with other plans based on cbo config (was: Only

[jira] [Created] (SPARK-22534) Add integration test case to explicitly verify optional validity buffer

2017-11-15 Thread Bryan Cutler (JIRA)
Bryan Cutler created SPARK-22534: Summary: Add integration test case to explicitly verify optional validity buffer Key: SPARK-22534 URL: https://issues.apache.org/jira/browse/SPARK-22534 Project:

[jira] [Closed] (SPARK-22534) Add integration test case to explicitly verify optional validity buffer

2017-11-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler closed SPARK-22534. > Add integration test case to explicitly verify optional validity buffer >

[jira] [Resolved] (SPARK-22534) Add integration test case to explicitly verify optional validity buffer

2017-11-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22534?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler resolved SPARK-22534. -- Resolution: Not A Problem Opened by mistake > Add integration test case to explicitly verify

[jira] [Updated] (SPARK-22484) PySpark DataFrame.write.csv(quote="") uses nullchar as quote

2017-11-15 Thread Bryan Cutler (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bryan Cutler updated SPARK-22484: - Component/s: PySpark > PySpark DataFrame.write.csv(quote="") uses nullchar as quote >

[jira] [Commented] (SPARK-22267) Spark SQL incorrectly reads ORC file when column order is different

2017-11-15 Thread Mark Petruska (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253281#comment-16253281 ] Mark Petruska commented on SPARK-22267: --- Investigated the issue; unfortunately did not find any

[jira] [Updated] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-15 Thread mohamed imran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mohamed imran updated SPARK-22526: -- Description: Hi, I am using Spark 2.2.0(recent version) to read binary files from S3. I use

[jira] [Updated] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-15 Thread mohamed imran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mohamed imran updated SPARK-22526: -- Description: Hi, I am using Spark 2.2.0(recent version) to read binary files from S3. I use

[jira] [Created] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-15 Thread mohamed imran (JIRA)
mohamed imran created SPARK-22526: - Summary: Spark hangs while reading binary files from S3 Key: SPARK-22526 URL: https://issues.apache.org/jira/browse/SPARK-22526 Project: Spark Issue Type:

[jira] [Updated] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-15 Thread mohamed imran (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] mohamed imran updated SPARK-22526: -- Description: Hi, I am using Spark 2.2.0(recent version) to read binary files from S3. I use

[jira] [Commented] (SPARK-22526) Spark hangs while reading binary files from S3

2017-11-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253694#comment-16253694 ] Sean Owen commented on SPARK-22526: --- It's not clear what you're reporting. Are you saying it's an Avro

[jira] [Updated] (SPARK-22529) Only sizeInBytes in catalog stats needs to be propagated when cbo is disabled

2017-11-15 Thread Zhenhua Wang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhenhua Wang updated SPARK-22529: - Summary: Only sizeInBytes in catalog stats needs to be propagated when cbo is disabled (was: No

[jira] [Assigned] (SPARK-22529) Only sizeInBytes in catalog stats needs to be propagated when cbo is disabled

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22529: Assignee: Apache Spark > Only sizeInBytes in catalog stats needs to be propagated when

[jira] [Assigned] (SPARK-22529) Only sizeInBytes in catalog stats needs to be propagated when cbo is disabled

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22529: Assignee: (was: Apache Spark) > Only sizeInBytes in catalog stats needs to be

[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors

2017-11-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253554#comment-16253554 ] Sean Owen commented on SPARK-19371: --- Yes, I get why you want to spread the partitions. It doesn't

[jira] [Commented] (SPARK-16996) Hive ACID delta files not seen

2017-11-15 Thread Maciej Bryński (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253531#comment-16253531 ] Maciej Bryński commented on SPARK-16996: In Spark 2.2 even major compaction doesn't help. Any

[jira] [Comment Edited] (SPARK-16996) Hive ACID delta files not seen

2017-11-15 Thread Maciej Bryński (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-16996?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253531#comment-16253531 ] Maciej Bryński edited comment on SPARK-16996 at 11/15/17 2:38 PM: -- In

[jira] [Created] (SPARK-22529) No need to propagate catalog stats when cbo is disabled

2017-11-15 Thread Zhenhua Wang (JIRA)
Zhenhua Wang created SPARK-22529: Summary: No need to propagate catalog stats when cbo is disabled Key: SPARK-22529 URL: https://issues.apache.org/jira/browse/SPARK-22529 Project: Spark

[jira] [Commented] (SPARK-22522) Convert to apache-release to publish Maven artifacts to Nexus/repository.apache.org

2017-11-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253540#comment-16253540 ] Sean Owen commented on SPARK-22522: --- I confess I'm not sure about this, but there may be two distinct

[jira] [Commented] (SPARK-22529) Only sizeInBytes in catalog stats needs to be propagated when cbo is disabled

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253545#comment-16253545 ] Apache Spark commented on SPARK-22529: -- User 'wzhfy' has created a pull request for this issue:

[jira] [Created] (SPARK-22527) Reuse coordinated ShuffleExchange if possible

2017-11-15 Thread Liang-Chi Hsieh (JIRA)
Liang-Chi Hsieh created SPARK-22527: --- Summary: Reuse coordinated ShuffleExchange if possible Key: SPARK-22527 URL: https://issues.apache.org/jira/browse/SPARK-22527 Project: Spark Issue

[jira] [Updated] (SPARK-22528) History service and non-HDFS filesystems

2017-11-15 Thread paul mackles (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] paul mackles updated SPARK-22528: - Description: We are using Azure Data Lake (ADL) to store our event logs. This worked fine in

[jira] [Resolved] (SPARK-22514) move ColumnVector.Array and ColumnarBatch.Row to individual files

2017-11-15 Thread Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-22514. - Resolution: Fixed Fix Version/s: 2.3.0 Issue resolved by pull request 19740

[jira] [Created] (SPARK-22528) History service and non-HDFS filesystems

2017-11-15 Thread paul mackles (JIRA)
paul mackles created SPARK-22528: Summary: History service and non-HDFS filesystems Key: SPARK-22528 URL: https://issues.apache.org/jira/browse/SPARK-22528 Project: Spark Issue Type: Bug

[jira] [Assigned] (SPARK-22527) Reuse coordinated ShuffleExchange if possible

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22527: Assignee: Apache Spark > Reuse coordinated ShuffleExchange if possible >

[jira] [Assigned] (SPARK-22527) Reuse coordinated ShuffleExchange if possible

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-22527: Assignee: (was: Apache Spark) > Reuse coordinated ShuffleExchange if possible >

[jira] [Commented] (SPARK-22527) Reuse coordinated ShuffleExchange if possible

2017-11-15 Thread Apache Spark (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253446#comment-16253446 ] Apache Spark commented on SPARK-22527: -- User 'viirya' has created a pull request for this issue:

[jira] [Assigned] (SPARK-22525) Spark download page doesn't update package name based package type

2017-11-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-22525: - Assignee: Sean Owen https://github.com/apache/spark-website/pull/76 > Spark download page

[jira] [Issue Comment Deleted] (SPARK-22525) Spark download page doesn't update package name based package type

2017-11-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-22525: -- Comment: was deleted (was: https://github.com/apache/spark-website/pull/76) > Spark download page

[jira] [Updated] (SPARK-10915) Add support for UDAFs in Python

2017-11-15 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk updated SPARK-10915: Affects Version/s: 2.3.0 > Add support for UDAFs in Python > --- > >

[jira] [Reopened] (SPARK-6802) User Defined Aggregate Function Refactoring

2017-11-15 Thread Holden Karau (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Holden Karau reopened SPARK-6802: - Now that the Arrow accelerated UDFs are in https://issues.apache.org/jira/browse/SPARK-21404 maybe

[jira] [Commented] (SPARK-22528) History service and non-HDFS filesystems

2017-11-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253577#comment-16253577 ] Sean Owen commented on SPARK-22528: --- This sounds like a question for the mailing list. > History

[jira] [Resolved] (SPARK-22525) Spark download page doesn't update package name based package type

2017-11-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-22525. --- Resolution: Fixed Assignee: holdenk (was: Sean Owen) Fix Version/s: 2.3.0 Resolved

[jira] [Commented] (SPARK-6802) User Defined Aggregate Function Refactoring

2017-11-15 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253634#comment-16253634 ] holdenk commented on SPARK-6802: Oh wait sorry I re-opened the wrong issue. > User Defined Aggregate

[jira] [Commented] (SPARK-22274) User-defined aggregation functions with pandas udf

2017-11-15 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16253635#comment-16253635 ] holdenk commented on SPARK-22274: - See also the discussion in

[jira] [Resolved] (SPARK-6802) User Defined Aggregate Function Refactoring

2017-11-15 Thread holdenk (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] holdenk resolved SPARK-6802. Resolution: Fixed > User Defined Aggregate Function Refactoring >

[jira] [Assigned] (SPARK-22422) Add Adjusted R2 to RegressionMetrics

2017-11-15 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-22422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-22422: - Assignee: Teng Peng > Add Adjusted R2 to RegressionMetrics >
