[jira] [Created] (SPARK-21682) Caching 100k-task RDD GC-kills driver due to updatedBlockStatuses

2017-08-09 Ryan Williams (JIRA)
Ryan Williams created SPARK-21682: Summary: Caching 100k-task RDD GC-kills driver due to updatedBlockStatuses; Key: SPARK-21682; URL: https://issues.apache.org/jira/browse/SPARK-21682; Project: Spark …

[jira] [Commented] (SPARK-21453) Cached Kafka consumer may be closed too early

2017-08-09 Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120393#comment-16120393 ] Shixiong Zhu commented on SPARK-21453: The error message looks like the Kafka broker storing the …

[jira] [Comment Edited] (SPARK-21453) Cached Kafka consumer may be closed too early

2017-08-09 Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120393#comment-16120393 ] Shixiong Zhu edited comment on SPARK-21453 at 8/9/17 6:13 PM: The error …

[jira] [Updated] (SPARK-21656) spark dynamic allocation should not idle timeout executors when tasks still to run

2017-08-09 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-21656: Target Version/s: (was: 2.1.1); Fix Version/s: (was: 2.1.1)

[jira] [Commented] (SPARK-21680) ML/MLLIB Vector compressed optimization

2017-08-09 Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120115#comment-16120115 ] Peng Meng commented on SPARK-21680: Then we will have two toSparse methods: toSparse and toSparse(size). Do …

[jira] [Commented] (SPARK-21680) ML/MLLIB Vector compressed optimization

2017-08-09 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120128#comment-16120128 ] Sean Owen commented on SPARK-21680: Yes, the latter should be private and the former calls it too, I …
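The design floated in this SPARK-21680 exchange (one public toSparse, plus a size-taking variant that stays private and that the public method delegates to) can be sketched in Scala as below. This is a minimal illustration under stated assumptions, not Spark's org.apache.spark.ml.linalg code: SketchVector, SketchDenseVector, SketchSparseVector and the method name toSparseWithSize are hypothetical, and the sketch assumes the "size" argument means the number of non-zero entries, which compressed has already counted and can pass along instead of scanning the values a second time.

```scala
// Hypothetical, simplified vector types used only to illustrate the design
// discussed in SPARK-21680; these are NOT Spark's ml.linalg classes.
sealed trait SketchVector {
  def size: Int
}

final case class SketchSparseVector(size: Int, indices: Array[Int], values: Array[Double])
    extends SketchVector

final class SketchDenseVector(val values: Array[Double]) extends SketchVector {

  def size: Int = values.length

  // One pass over the values to count non-zero entries.
  def numNonzeros: Int = values.count(_ != 0.0)

  // Public entry point: counts non-zeros itself, so callers cannot pass a wrong size.
  def toSparse: SketchSparseVector = toSparseWithSize(numNonzeros)

  // Private helper (hypothetical name): trusts nnz to size the output arrays exactly.
  private def toSparseWithSize(nnz: Int): SketchSparseVector = {
    val ii = new Array[Int](nnz)
    val vv = new Array[Double](nnz)
    var k = 0
    var i = 0
    while (i < values.length) {
      val v = values(i)
      if (v != 0.0) {
        ii(k) = i
        vv(k) = v
        k += 1
      }
      i += 1
    }
    SketchSparseVector(size, ii, vv)
  }

  // Picks the smaller representation. Sparse costs roughly 12 bytes per non-zero
  // (4-byte index + 8-byte value) versus 8 bytes per slot for dense, so sparse wins
  // roughly when 1.5 * (nnz + 1) < size. The nnz computed here is reused, avoiding
  // the second scan that calling the public toSparse would perform.
  def compressed: SketchVector = {
    val nnz = numNonzeros
    if (1.5 * (nnz + 1.0) < size) toSparseWithSize(nnz) else this
  }
}
```

Because toSparseWithSize trusts its argument to size the index and value arrays, a wrong count from an outside caller would either throw or leave unused trailing slots; exposing only toSparse keeps that invariant inside the class.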

[jira] [Created] (SPARK-21683) "TaskKilled (another attempt succeeded)" log message should be INFO level, not WARN

2017-08-09 Ryan Williams (JIRA)
Ryan Williams created SPARK-21683: Summary: "TaskKilled (another attempt succeeded)" log message should be INFO level, not WARN; Key: SPARK-21683; URL: https://issues.apache.org/jira/browse/SPARK-21683 …

[jira] [Commented] (SPARK-20642) Use key-value store to keep History Server application listing

2017-08-09 Marcelo Vanzin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20642?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120398#comment-16120398 ] Marcelo Vanzin commented on SPARK-20642: PR: https://github.com/apache/spark/pull/18887

[jira] [Updated] (SPARK-21682) Caching 100k-task RDD GC-kills driver due to updatedBlockStatuses

2017-08-09 Ryan Williams (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Williams updated SPARK-21682: Affects Version/s: 2.0.2

[jira] [Commented] (SPARK-21682) Caching 100k-task RDD GC-kills driver due to updatedBlockStatuses

2017-08-09 Ryan Williams (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120409#comment-16120409 ] Ryan Williams commented on SPARK-21682: Interestingly, I thought the {{updatedBlockStatuses}} …

[jira] [Updated] (SPARK-21682) Caching 100k-task RDD GC-kills driver (due to updatedBlockStatuses?)

2017-08-09 Ryan Williams (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Williams updated SPARK-21682: Description (Summary section): {{internal.metrics.updatedBlockStatuses}} breaks a contract …

[jira] [Updated] (SPARK-21682) Caching 100k-task RDD GC-kills driver (due to updatedBlockStatuses?)

2017-08-09 Ryan Williams (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Williams updated SPARK-21682: Summary: Caching 100k-task RDD GC-kills driver (due to updatedBlockStatuses?) (was: Caching …

[jira] [Updated] (SPARK-21682) Caching 100k-task RDD GC-kills driver (due to updatedBlockStatuses?)

2017-08-09 Ryan Williams (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ryan Williams updated SPARK-21682: Description (Summary section): {{sc.parallelize(1 to 10, 10).cache.count}} causes a …

[jira] [Comment Edited] (SPARK-21682) Caching 100k-task RDD GC-kills driver (due to updatedBlockStatuses?)

2017-08-09 Ryan Williams (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120409#comment-16120409 ] Ryan Williams edited comment on SPARK-21682 at 8/9/17 6:28 PM: …

[jira] [Commented] (SPARK-21682) Caching 100k-task RDD GC-kills driver (due to updatedBlockStatuses?)

2017-08-09 Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120415#comment-16120415 ] Shixiong Zhu commented on SPARK-21682: I agree that the driver is a bottleneck. I already saw several …

[jira] [Created] (SPARK-21684) df.write double escaping all the already escaped characters except the first one

2017-08-09 Taran Saini (JIRA)
Taran Saini created SPARK-21684: Summary: df.write double escaping all the already escaped characters except the first one; Key: SPARK-21684; URL: https://issues.apache.org/jira/browse/SPARK-21684 …

[jira] [Commented] (SPARK-21667) ConsoleSink should not fail streaming query with checkpointLocation option

2017-08-09 Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120423#comment-16120423 ] Shixiong Zhu commented on SPARK-21667: Do you mind submitting a PR to fix it?

[jira] [Updated] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2017-08-09 Vincent (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21688?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Vincent updated SPARK-21688: Attachment: mllib svm training.png

[jira] [Created] (SPARK-21687) Spark SQL should set createTime for Hive partition

2017-08-09 Chaozhong Yang (JIRA)
Chaozhong Yang created SPARK-21687: Summary: Spark SQL should set createTime for Hive partition; Key: SPARK-21687; URL: https://issues.apache.org/jira/browse/SPARK-21687; Project: Spark; Issue …

[jira] [Closed] (SPARK-20762) Make String Params Case-Insensitive

2017-08-09 zhengruifeng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng closed SPARK-20762. Resolution: Not A Problem

[jira] [Commented] (SPARK-21676) cannot compile on hadoop 2.2.0 and hive

2017-08-09 Qinghe Jin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16121100#comment-16121100 ] Qinghe Jin commented on SPARK-21676: Thanks!

[jira] [Created] (SPARK-21688) performance improvement in mllib SVM with native BLAS

2017-08-09 Vincent (JIRA)
Vincent created SPARK-21688: Summary: performance improvement in mllib SVM with native BLAS; Key: SPARK-21688; URL: https://issues.apache.org/jira/browse/SPARK-21688; Project: Spark; Issue Type: …

[jira] [Comment Edited] (SPARK-14540) Support Scala 2.12 closures and Java 8 lambdas in ClosureCleaner

2017-08-09 Lukas Rytz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119524#comment-16119524 ] Lukas Rytz edited comment on SPARK-14540 at 8/9/17 7:34 AM: [~joshrosen] the …

[jira] [Resolved] (SPARK-21662) modify the appname to [SparkSQL::localHostName] instead of [SparkSQL::lP]

2017-08-09 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21662. Resolution: Not A Problem

[jira] [Updated] (SPARK-21662) modify the appname to [SparkSQL::localHostName] instead of [SparkSQL::lP]

2017-08-09 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen updated SPARK-21662: Priority: Trivial (was: Critical)

[jira] [Updated] (SPARK-21034) Allow filter pushdown filters through non deterministic functions for columns involved in groupby / join

2017-08-09 Abhijit Bhole (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhijit Bhole updated SPARK-21034: Summary: Allow filter pushdown filters through non deterministic functions for columns …

[jira] [Updated] (SPARK-21034) Allow filter pushdown filters through non deterministic functions for columns involved in groupby / join

2017-08-09 Abhijit Bhole (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Abhijit Bhole updated SPARK-21034: Description: If the column is involved in an aggregation or join, then pushing down the filter should …

[jira] [Commented] (SPARK-21651) Detect MapType in Json InferSchema

2017-08-09 Jochen Niebuhr (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119501#comment-16119501 ] Jochen Niebuhr commented on SPARK-21651: Specifying the schema myself would mean I'll have to …

[jira] [Commented] (SPARK-14540) Support Scala 2.12 closures and Java 8 lambdas in ClosureCleaner

2017-08-09 Lukas Rytz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119524#comment-16119524 ] Lukas Rytz commented on SPARK-14540: [~joshrosen] the closure in your last example is serializable …

[jira] [Commented] (SPARK-21651) Detect MapType in Json InferSchema

2017-08-09 Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119484#comment-16119484 ] Takeshi Yamamuro commented on SPARK-21651: Specifying a schema yourself is not enough for …

[jira] [Resolved] (SPARK-21523) Fix bug of strong wolfe linesearch `init` parameter lose effectiveness

2017-08-09 Yanbo Liang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yanbo Liang resolved SPARK-21523. Resolution: Fixed; Assignee: Weichen Xu; Fix Version/s: 2.3.0 …

[jira] [Commented] (SPARK-19109) ORC metadata section can sometimes exceed protobuf message size limit

2017-08-09 sydt (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119546#comment-16119546 ] sydt commented on SPARK-19109: I met this problem and resolved it. I recompiled the source code of …

[jira] [Commented] (SPARK-19019) PySpark does not work with Python 3.6.0

2017-08-09 sydt (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119553#comment-16119553 ] sydt commented on SPARK-19019: I met this problem and resolved it. I recompiled the source code of …

[jira] [Commented] (SPARK-20901) Feature parity for ORC with Parquet

2017-08-09 sydt (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-20901?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119552#comment-16119552 ] sydt commented on SPARK-20901: About SPARK-19019, I resolved it. I met this problem and resolved it. I …

[jira] [Issue Comment Deleted] (SPARK-19019) PySpark does not work with Python 3.6.0

2017-08-09 sydt (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] sydt updated SPARK-19019: Comment: was deleted (was: I met this problem and resolved it. I recompiled the source code of …

[jira] [Comment Edited] (SPARK-21651) Detect MapType in Json InferSchema

2017-08-09 Jochen Niebuhr (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119450#comment-16119450 ] Jochen Niebuhr edited comment on SPARK-21651 at 8/9/17 6:03 AM: Ok, …

[jira] [Resolved] (SPARK-21596) Audit the places calling HDFSMetadataLog.get

2017-08-09 Shixiong Zhu (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21596?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Shixiong Zhu resolved SPARK-21596. Resolution: Fixed; Assignee: Shixiong Zhu; Fix Version/s: 2.3.0 …

[jira] [Comment Edited] (SPARK-14540) Support Scala 2.12 closures and Java 8 lambdas in ClosureCleaner

2017-08-09 Lukas Rytz (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119524#comment-16119524 ] Lukas Rytz edited comment on SPARK-14540 at 8/9/17 7:37 AM: [~joshrosen] the …

[jira] [Comment Edited] (SPARK-19109) ORC metadata section can sometimes exceed protobuf message size limit

2017-08-09 sydt (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19109?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119546#comment-16119546 ] sydt edited comment on SPARK-19109 at 8/9/17 8:05 AM: I met this problem and …

[jira] [Created] (SPARK-21675) Add a navigation bar at the bottom of the Details for Stage Page

2017-08-09 Kent Yao (JIRA)
Kent Yao created SPARK-21675: Summary: Add a navigation bar at the bottom of the Details for Stage Page; Key: SPARK-21675; URL: https://issues.apache.org/jira/browse/SPARK-21675; Project: Spark …

[jira] [Updated] (SPARK-21520) Improvement a special case for non-deterministic projects and filters in optimizer

2017-08-09 caoxuewen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] caoxuewen updated SPARK-21520: Description: Currently, a lot of special handling is done for non-deterministic projects and filters in …

[jira] [Commented] (SPARK-21520) Improvement a special case for non-deterministic projects and filters in optimizer

2017-08-09 caoxuewen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119617#comment-16119617 ] caoxuewen commented on SPARK-21520: User 'heary-cao' has created a pull request for this issue: …

[jira] [Updated] (SPARK-21520) Improvement a special case for non-deterministic projects and filters in optimizer

2017-08-09 caoxuewen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] caoxuewen updated SPARK-21520: Summary: Improvement a special case for non-deterministic projects and filters in optimizer (was: …

[jira] [Assigned] (SPARK-21663) MapOutputTrackerSuite case test("remote fetch below max RPC message size") should call stop

2017-08-09 Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-21663: Assignee: wangjiaochun

[jira] [Created] (SPARK-21676) cannot compile on hadoop 2.2.0 and hive

2017-08-09 Qinghe Jin (JIRA)
Qinghe Jin created SPARK-21676: Summary: cannot compile on hadoop 2.2.0 and hive; Key: SPARK-21676; URL: https://issues.apache.org/jira/browse/SPARK-21676; Project: Spark; Issue Type: Bug …

[jira] [Resolved] (SPARK-21663) MapOutputTrackerSuite case test("remote fetch below max RPC message size") should call stop

2017-08-09 Wenchen Fan (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-21663. Resolution: Fixed; Fix Version/s: 2.3.0, 2.2.1. Issue resolved by pull …

[jira] [Resolved] (SPARK-21676) cannot compile on hadoop 2.2.0 and hive

2017-08-09 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21676. Resolution: Invalid. Hadoop 2.2 is not supported.

[jira] [Updated] (SPARK-21685) Params isSet in scala Transformer triggered by _setDefault in pyspark

2017-08-09 Ratan Rai Sur (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ratan Rai Sur updated SPARK-21685: Description: I'm trying to write a PySpark wrapper for a Transformer whose transform method …

[jira] [Created] (SPARK-21685) Params isSet in scala Transformer triggered by _setDefault in pyspark

2017-08-09 Ratan Rai Sur (JIRA)
Ratan Rai Sur created SPARK-21685: Summary: Params isSet in scala Transformer triggered by _setDefault in pyspark; Key: SPARK-21685; URL: https://issues.apache.org/jira/browse/SPARK-21685; Project: …

[jira] [Updated] (SPARK-21685) Params isSet in scala Transformer triggered by _setDefault in pyspark

2017-08-09 Ratan Rai Sur (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ratan Rai Sur updated SPARK-21685: Description: I'm trying to write a PySpark wrapper for a Transformer whose transform method …

[jira] [Commented] (SPARK-21667) ConsoleSink should not fail streaming query with checkpointLocation option

2017-08-09 Jacek Laskowski (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120576#comment-16120576 ] Jacek Laskowski commented on SPARK-21667: Oh, what an offer! Couldn't have thought of a better …

[jira] [Created] (SPARK-21686) spark.sql.hive.convertMetastoreOrc is causing NullPointerException while reading ORC tables

2017-08-09 Ernani Pereira de Mattos Junior (JIRA)
Ernani Pereira de Mattos Junior created SPARK-21686: Summary: spark.sql.hive.convertMetastoreOrc is causing NullPointerException while reading ORC tables; Key: SPARK-21686; URL: …

[jira] [Updated] (SPARK-21685) Params isSet in scala Transformer triggered by _setDefault in pyspark

2017-08-09 Ratan Rai Sur (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ratan Rai Sur updated SPARK-21685: Description: I'm trying to write a PySpark wrapper for a Transformer whose transform method …

[jira] [Resolved] (SPARK-21587) Filter pushdown for EventTime Watermark Operator

2017-08-09 Tathagata Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21587?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tathagata Das resolved SPARK-21587. Resolution: Fixed; Fix Version/s: 3.0.0. Issue resolved by pull request 18790 …

[jira] [Commented] (SPARK-21682) Caching 100k-task RDD GC-kills driver (due to updatedBlockStatuses?)

2017-08-09 Ryan Williams (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120537#comment-16120537 ] Ryan Williams commented on SPARK-21682: (quoting) "But do you really need to create so many partitions?" …

[jira] [Commented] (SPARK-19116) LogicalPlan.statistics.sizeInBytes wrong for trivial parquet file

2017-08-09 Andrew Ash (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120643#comment-16120643 ] Andrew Ash commented on SPARK-19116: Ah yes, for files it seems like Spark currently uses the size of the …

[jira] [Closed] (SPARK-19116) LogicalPlan.statistics.sizeInBytes wrong for trivial parquet file

2017-08-09 Andrew Ash (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-19116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Andrew Ash closed SPARK-19116. Resolution: Not A Problem

[jira] [Commented] (SPARK-21680) ML/MLLIB Vector compressed optimization

2017-08-09 Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120136#comment-16120136 ] Peng Meng commented on SPARK-21680: Ok, thanks, I will submit a PR.

[jira] [Assigned] (SPARK-21276) Update lz4-java to remove custom LZ4BlockInputStream

2017-08-09 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-21276: Assignee: Takeshi Yamamuro

[jira] [Resolved] (SPARK-21276) Update lz4-java to remove custom LZ4BlockInputStream

2017-08-09 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21276. Resolution: Fixed; Fix Version/s: 2.3.0. Issue resolved by pull request 18883 …

[jira] [Commented] (SPARK-21624) Optimize communication cost of RF/GBT/DT

2017-08-09 Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120105#comment-16120105 ] Peng Meng commented on SPARK-21624: Hi [~mlnick], what do you think about this: …

[jira] [Created] (SPARK-21680) ML/MLLIB Vector compressed optimization

2017-08-09 Peng Meng (JIRA)
Peng Meng created SPARK-21680: Summary: ML/MLLIB Vector compressed optimization; Key: SPARK-21680; URL: https://issues.apache.org/jira/browse/SPARK-21680; Project: Spark; Issue Type: Improvement …

[jira] [Commented] (SPARK-21680) ML/MLLIB Vector compressed optimization

2017-08-09 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120109#comment-16120109 ] Sean Owen commented on SPARK-21680: You definitely want to avoid duplicating the code, but could …

[jira] [Resolved] (SPARK-21504) Add spark version info in table metadata

2017-08-09 Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-21504. Resolution: Fixed; Fix Version/s: 2.3.0

[jira] [Commented] (SPARK-17557) SQL query on parquet table java.lang.UnsupportedOperationException: org.apache.parquet.column.values.dictionary.PlainValuesDictionary

2017-08-09 Steve Drew (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-17557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120181#comment-16120181 ] Steve Drew commented on SPARK-17557: Hi, this issue is still happening (Spark 2.1.1). Hopefully …

[jira] [Created] (SPARK-21681) MLOR do not work correctly when featureStd contains zero

2017-08-09 Weichen Xu (JIRA)
Weichen Xu created SPARK-21681: Summary: MLOR do not work correctly when featureStd contains zero; Key: SPARK-21681; URL: https://issues.apache.org/jira/browse/SPARK-21681; Project: Spark; Issue …

[jira] [Resolved] (SPARK-21551) pyspark's collect fails when getaddrinfo is too slow

2017-08-09 Reynold Xin (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21551?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Reynold Xin resolved SPARK-21551. Resolution: Fixed; Assignee: peay; Fix Version/s: 2.3.0

[jira] [Resolved] (SPARK-14932) Allow DataFrame.replace() to replace values with None

2017-08-09 Xiao Li (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-14932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xiao Li resolved SPARK-14932. Resolution: Fixed; Assignee: Bravo Zhang; Fix Version/s: 2.3.0

[jira] [Commented] (SPARK-21590) Structured Streaming window start time should support negative values to adjust time zone

2017-08-09 Kevin Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21590?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120924#comment-16120924 ] Kevin Zhang commented on SPARK-21590: Thanks, I'd like to work on this. I agree the requirement …

[jira] [Commented] (SPARK-21245) Resolve code duplication for classification/regression summarizers

2017-08-09 Bravo Zhang (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120930#comment-16120930 ] Bravo Zhang commented on SPARK-21245: User 'bravo-zhang' has created a pull request for this issue: …

[jira] [Commented] (SPARK-21680) ML/MLLIB Vector compressed optimization

2017-08-09 Peng Meng (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120947#comment-16120947 ] Peng Meng commented on SPARK-21680: Hi [~srowen], if we add toSparse(size), for security reasons, it is …

[jira] [Commented] (SPARK-21682) Caching 100k-task RDD GC-kills driver (due to updatedBlockStatuses?)

2017-08-09 DjvuLee (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120959#comment-16120959 ] DjvuLee commented on SPARK-21682: Yes, our company also faced this scalability problem; the driver …

[jira] [Assigned] (SPARK-21665) Need to close resources after use

2017-08-09 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reassigned SPARK-21665: Assignee: Vinod KC; Priority: Trivial (was: Minor)

[jira] [Reopened] (SPARK-21678) Disabling quotes while writing a dataframe

2017-08-09 Taran Saini (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Taran Saini reopened SPARK-21678.

[jira] [Updated] (SPARK-21679) KMeans Clustering is Not Deterministic

2017-08-09 Christoph Brücke (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Brücke updated SPARK-21679: Description: I’m trying to figure out how to use KMeans in order to achieve reproducible …

[jira] [Updated] (SPARK-21678) Disabling quotes while writing a dataframe

2017-08-09 Taran Saini (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Taran Saini updated SPARK-21678: Description: Hi, I have my dataframe column values, which can contain commas, double quotes …

[jira] [Resolved] (SPARK-21678) Disabling quotes while writing a dataframe

2017-08-09 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21678. Resolution: Invalid. Questions belong on the mailing list or Stack Overflow.

[jira] [Resolved] (SPARK-21665) Need to close resources after use

2017-08-09 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21665. Resolution: Fixed; Fix Version/s: 2.3.0. Issue resolved by pull request 18880 …

[jira] [Commented] (SPARK-21677) json_tuple throws NullPointException when column is null as string type.

2017-08-09 Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119891#comment-16119891 ] Hyukjin Kwon commented on SPARK-21677: cc [~viirya], I remember your mentee was checking through …

[jira] [Created] (SPARK-21678) Disabling quotes while writing a dataframe

2017-08-09 Taran Saini (JIRA)
Taran Saini created SPARK-21678: Summary: Disabling quotes while writing a dataframe; Key: SPARK-21678; URL: https://issues.apache.org/jira/browse/SPARK-21678; Project: Spark; Issue Type: Bug …

[jira] [Created] (SPARK-21677) json_tuple throws NullPointException when column is null as string type.

2017-08-09 Hyukjin Kwon (JIRA)
Hyukjin Kwon created SPARK-21677: Summary: json_tuple throws NullPointException when column is null as string type. Key: SPARK-21677; URL: https://issues.apache.org/jira/browse/SPARK-21677; Project: …

[jira] [Commented] (SPARK-21677) json_tuple throws NullPointException when column is null as string type.

2017-08-09 Liang-Chi Hsieh (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119928#comment-16119928 ] Liang-Chi Hsieh commented on SPARK-21677: [~hyukjin.kwon] Thanks! We are definitely interested …

[jira] [Created] (SPARK-21679) KMeans Clustering is Not Deterministic

2017-08-09 Christoph Brücke (JIRA)
Christoph Brücke created SPARK-21679: Summary: KMeans Clustering is Not Deterministic; Key: SPARK-21679; URL: https://issues.apache.org/jira/browse/SPARK-21679; Project: Spark; Issue Type: …

[jira] [Resolved] (SPARK-21678) Disabling quotes while writing a dataframe

2017-08-09 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21678. Resolution: Fixed. [~taransaini43] I read this and can tell you this is not what JIRA is for. I'm a …

[jira] [Resolved] (SPARK-21678) Disabling quotes while writing a dataframe

2017-08-09 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen resolved SPARK-21678. Resolution: Invalid

[jira] [Reopened] (SPARK-21678) Disabling quotes while writing a dataframe

2017-08-09 Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen reopened SPARK-21678.

[jira] [Closed] (SPARK-21678) Disabling quotes while writing a dataframe

2017-08-09 Thread Sean Owen (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean Owen closed SPARK-21678.

[jira] [Commented] (SPARK-21673) Spark local directory is not set correctly

2017-08-09 Thread Jake Charland (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119874#comment-16119874 ] Jake Charland commented on SPARK-21673: https://github.com/apache/spark/pull/18894

[jira] [Updated] (SPARK-21678) Disabling quotes while writing a dataframe

2017-08-09 Thread Taran Saini (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Taran Saini updated SPARK-21678: Description: Hi, I have my dataframe column values which can contain commas, double quotes

[jira] [Commented] (SPARK-21678) Disabling quotes while writing a dataframe

2017-08-09 Thread Takeshi Yamamuro (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16120037#comment-16120037 ] Takeshi Yamamuro commented on SPARK-21678: I think, if spark sets

[jira] [Comment Edited] (SPARK-21678) Disabling quotes while writing a dataframe

2017-08-09 Thread Taran Saini (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119898#comment-16119898 ] Taran Saini edited comment on SPARK-21678 at 8/9/17 1:30 PM: this is not a

[jira] [Commented] (SPARK-21678) Disabling quotes while writing a dataframe

2017-08-09 Thread Taran Saini (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119898#comment-16119898 ] Taran Saini commented on SPARK-21678: this is not a question. This is a bug! Only if somebody reads

[jira] [Commented] (SPARK-21453) Cached Kafka consumer may be closed too early

2017-08-09 Thread Pablo Panero (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21453?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119988#comment-16119988 ] Pablo Panero commented on SPARK-21453: [~zsxwing] Concerning the cached consumer failure all I

[jira] [Updated] (SPARK-21678) Disabling quotes while writing a dataframe

2017-08-09 Thread Taran Saini (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Taran Saini updated SPARK-21678: Description: Hi, I have my dataframe column values which can contain commas, double quotes

[jira] [Comment Edited] (SPARK-21677) json_tuple throws NullPointException when column is null as string type.

2017-08-09 Thread Hyukjin Kwon (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16119891#comment-16119891 ] Hyukjin Kwon edited comment on SPARK-21677 at 8/9/17 1:26 PM: cc [~viirya],

[jira] [Updated] (SPARK-21679) KMeans Clustering is Not Deterministic

2017-08-09 Thread Christoph Brücke (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Brücke updated SPARK-21679: Description: I’m trying to figure out how to use KMeans in order to achieve reproducible

[jira] [Updated] (SPARK-21679) KMeans Clustering is Not Deterministic

2017-08-09 Thread Christoph Brücke (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-21679?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Christoph Brücke updated SPARK-21679: Description: I’m trying to figure out how to use KMeans in order to achieve reproducible