[GitHub] spark issue #20915: [SPARK-23803][SQL] Support bucket pruning

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20915 **[Test build #91501 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91501/testReport)** for PR 20915 at commit

[GitHub] spark issue #20915: [SPARK-23803][SQL] Support bucket pruning

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20915 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91501/ Test FAILed. ---

[GitHub] spark issue #20915: [SPARK-23803][SQL] Support bucket pruning

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20915 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #20915: [SPARK-23803][SQL] Support bucket pruning

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20915 **[Test build #91500 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91500/testReport)** for PR 20915 at commit

[GitHub] spark issue #20915: [SPARK-23803][SQL] Support bucket pruning

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20915 **[Test build #91501 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91501/testReport)** for PR 20915 at commit

[GitHub] spark issue #21498: [SPARK-24410][SQL][Core] Optimization for Union outputPa...

2018-06-06 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21498 When the condition is satisfied and we know the children of Union have the same partitioning, this lets the first partition of the union result include the first partitions of the children RDDs, and 2nd, 3rd
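
The partition-wise union described in the comment above can be illustrated with plain Scala collections. This is only a sketch under stated assumptions: partitions are modeled as `Seq[Int]`, and the names `UnionPartitionSketch`, `concatUnion`, and `partitionWiseUnion` are hypothetical and are not taken from the PR or from Spark itself.

```scala
// Hypothetical model, not Spark code: a "dataset" is a sequence of partitions.
object UnionPartitionSketch {
  type Partition = Seq[Int]

  // Plain union: the children's partitions are simply listed one after another,
  // so the result has as many partitions as all children combined.
  def concatUnion(children: Seq[Seq[Partition]]): Seq[Partition] =
    children.flatten

  // Partition-wise union: the i-th output partition merges the i-th partition
  // of every child, which preserves the partition count of the children.
  def partitionWiseUnion(children: Seq[Seq[Partition]]): Seq[Partition] = {
    require(children.nonEmpty, "at least one child is needed")
    require(children.map(_.size).distinct.size == 1,
      "all children must have the same number of partitions")
    children.transpose.map(_.flatten)
  }

  def main(args: Array[String]): Unit = {
    val a = Seq(Seq(1, 2), Seq(3))
    val b = Seq(Seq(10), Seq(20, 30))
    println(concatUnion(Seq(a, b)))
    println(partitionWiseUnion(Seq(a, b)))
  }
}
```

In this toy model the partition-wise result keeps rows that shared a partition index together, which is the property the comment relies on when the children have the same partitioning.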

[GitHub] spark issue #20915: [SPARK-23803][SQL] Support bucket pruning

2018-06-06 Thread sabanas
Github user sabanas commented on the issue: https://github.com/apache/spark/pull/20915 @cloud-fan thanks for the feedback, I fixed accordingly.

[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...

2018-06-06 Thread icexelloss
Github user icexelloss commented on the issue: https://github.com/apache/spark/pull/21082 @HyukjinKwon Thanks for the review! I will address the comments shortly. And yes, I will work on bounded windows on top of this PR. ---

[GitHub] spark issue #21449: [SPARK-24385][SQL] Resolve self-join condition ambiguity...

2018-06-06 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21449 @daniel-shields do you want to open a PR for that? I'll leave this PR open as it is a more general fix so we can go on with the long-term discussion here in this PR. Do you agree with this

[GitHub] spark pull request #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader ...

2018-06-06 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21497#discussion_r193374662 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala --- @@ -256,6 +246,66 @@ class

[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21082 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91499/ Test PASSed. ---

[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21082 Merged build finished. Test PASSed.

[GitHub] spark issue #21498: [SPARK-24410][SQL][Core] Optimization for Union outputPa...

2018-06-06 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21498 > In aggregation we are replacing a needed shuffle with gathering only the needed rows from the other partitions. I don't know what this means actually. If we decided we don't need a

[GitHub] spark pull request #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP P...

2018-06-06 Thread DazhuangSu
Github user DazhuangSu commented on a diff in the pull request: https://github.com/apache/spark/pull/19691#discussion_r193356194 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -510,40 +511,86 @@ case class

[GitHub] spark issue #21500: Scalable Memory option for HDFSBackedStateStore

2018-06-06 Thread TomaszGaweda
Github user TomaszGaweda commented on the issue: https://github.com/apache/spark/pull/21500 @HeartSaVioR IMHO we should consider a new state provider such as RocksDB, as Flink and Databricks Delta did. It is not a direct fix, but it will improve latency and memory consumption, maybe

[GitHub] spark pull request #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader ...

2018-06-06 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21497#discussion_r193372564 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala --- @@ -256,6 +246,66 @@ class

[GitHub] spark issue #21498: [SPARK-24410][SQL][Core] Optimization for Union outputPa...

2018-06-06 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21498 @viirya I may be wrong, but I am not sure about the performance improvement brought by this. The goal here is to avoid a shuffle after the `union` operator (when it is followed by operators

[GitHub] spark issue #21498: [SPARK-24410][SQL][Core] Optimization for Union outputPa...

2018-06-06 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/21498 > Because they have same partitioning, for example, I suppose that first partitions of all RDDs are located at the same place? I really don't think so. In aggregation we are

[GitHub] spark pull request #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP P...

2018-06-06 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19691#discussion_r193347767 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -510,40 +511,86 @@ case class AlterTableRenamePartitionCommand(

[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21082 **[Test build #91499 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91499/testReport)** for PR 21082 at commit

[GitHub] spark pull request #19691: [SPARK-14922][SPARK-17732][SQL]ALTER TABLE DROP P...

2018-06-06 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19691#discussion_r193358172 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala --- @@ -510,40 +511,86 @@ case class AlterTableRenamePartitionCommand(

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2018-06-06 Thread cxzl25
Github user cxzl25 commented on the issue: https://github.com/apache/spark/pull/18900 **Modifying the partition will lose createTime.** Reading the Hive partitions ignores createTime when converting the CatalogTablePartition; it will also be lost when modifying partitions.

[GitHub] spark issue #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader to be c...

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21497 **[Test build #91502 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91502/testReport)** for PR 21497 at commit

[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21082 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3822/

[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21082 Merged build finished. Test PASSed.

[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21501 Can one of the admins verify this patch?

[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

2018-06-06 Thread dongjinleekr
GitHub user dongjinleekr opened a pull request: https://github.com/apache/spark/pull/21501 [SPARK-15064][ML] Locale support in StopWordsRemover ## What changes were proposed in this pull request? Add locale support for `StopWordsRemover`. ## How was this patch

[GitHub] spark issue #21501: [SPARK-15064][ML] Locale support in StopWordsRemover

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21501 Can one of the admins verify this patch?

[GitHub] spark pull request #21082: [SPARK-22239][SQL][Python] Enable grouped aggrega...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21082#discussion_r193320743 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/PythonUDF.scala --- @@ -34,7 +34,12 @@ object PythonUDF {

[GitHub] spark pull request #21082: [SPARK-22239][SQL][Python] Enable grouped aggrega...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21082#discussion_r193323414 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -424,6 +424,21 @@ abstract class SparkStrategies extends

[GitHub] spark pull request #21082: [SPARK-22239][SQL][Python] Enable grouped aggrega...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21082#discussion_r193323738 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/windowExpressions.scala --- @@ -297,6 +297,37 @@ trait WindowFunction

[GitHub] spark issue #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader to be c...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21497 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91498/ Test PASSed. ---

[GitHub] spark issue #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader to be c...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21497 Merged build finished. Test PASSed.

[GitHub] spark issue #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader to be c...

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21497 **[Test build #91498 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91498/testReport)** for PR 21497 at commit

[GitHub] spark issue #21477: [WIP] [SPARK-24396] [SS] [PYSPARK] Add Structured Stream...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21477 Seems fine so far to me otherwise

[GitHub] spark issue #21498: [SPARK-24410][SQL][Core] Optimization for Union outputPa...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21498 Merged build finished. Test PASSed.

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-06 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21467 @HyukjinKwon Yeah, that is also what I thought at the first glance. :)

[GitHub] spark pull request #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop ite...

2018-06-06 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21467#discussion_r193306421 --- Diff: python/pyspark/tests.py --- @@ -1291,27 +1291,31 @@ def test_pipe_unicode(self): result = rdd.pipe('cat').collect()

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21467 So .. @e-dorigatti, mind if I ask you to elaborate on and describe the current approach within `fail_on_stopiteration`? It seems we will handle UDFs on the worker side and RDD APIs (which take a function

[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21082 retest this please

[GitHub] spark issue #21082: [SPARK-22239][SQL][Python] Enable grouped aggregate pand...

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21082 **[Test build #91499 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91499/testReport)** for PR 21082 at commit

[GitHub] spark pull request #21477: [WIP] [SPARK-24396] [SS] [PYSPARK] Add Structured...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21477#discussion_r193299622 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/sources/ForeachWriterProvider.scala --- @@ -44,40 +51,61 @@ case class

[GitHub] spark pull request #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop ite...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21467#discussion_r193302276 --- Diff: python/pyspark/sql/tests.py --- @@ -4096,6 +4080,43 @@ def foo(df): def foo(k, v, w): return k

[GitHub] spark issue #21498: [SPARK-24410][SQL][Core] Optimization for Union outputPa...

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21498 **[Test build #91497 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91497/testReport)** for PR 21498 at commit

[GitHub] spark issue #21498: [SPARK-24410][SQL][Core] Optimization for Union outputPa...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21498 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91497/ Test PASSed. ---

[GitHub] spark issue #21498: [SPARK-24410][SQL][Core] Optimization for Union outputPa...

2018-06-06 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21498 @cloud-fan The WIP tag has been removed and this can be reviewed now. Please take a look when you are available, as I suppose you'll be busy this week. Thanks. ---

[GitHub] spark pull request #21477: [WIP] [SPARK-24396] [SS] [PYSPARK] Add Structured...

2018-06-06 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21477#discussion_r193304316 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ForeachWriter.scala --- @@ -20,10 +20,48 @@ package org.apache.spark.sql import

[GitHub] spark issue #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop iteration ...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21467 Hm, I thought we could do almost all of it on the worker side, but I just took a closer look and now understand https://github.com/apache/spark/pull/21467#issuecomment-393907314. ---

[GitHub] spark pull request #21467: [SPARK-23754][PYTHON][FOLLOWUP] Move UDF stop ite...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21467#discussion_r193307267 --- Diff: python/pyspark/tests.py --- @@ -1291,27 +1291,31 @@ def test_pipe_unicode(self): result = rdd.pipe('cat').collect()

[GitHub] spark pull request #21082: [SPARK-22239][SQL][Python] Enable grouped aggrega...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21082#discussion_r193326454 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/planning/patterns.scala --- @@ -268,3 +269,40 @@ object PhysicalAggregation {

[GitHub] spark pull request #21082: [SPARK-22239][SQL][Python] Enable grouped aggrega...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21082#discussion_r193327794 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/WindowInPandasExec.scala --- @@ -0,0 +1,173 @@ +/* + * Licensed to the

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21469 @jose-torres is it good to go?

[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21501#discussion_r193332595 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/StopWordsRemover.scala --- @@ -84,7 +86,36 @@ class StopWordsRemover @Since("1.5.0")

[GitHub] spark pull request #21499: [SPARK-24468][SQL] Handle negative scale when adj...

2018-06-06 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/21499#discussion_r193341970 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala --- @@ -161,13 +161,17 @@ object DecimalType extends AbstractDataType {

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21469 Merged build finished. Test PASSed.

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21469 **[Test build #91509 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91509/testReport)** for PR 21469 at commit

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21469 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91509/ Test PASSed. ---

[GitHub] spark pull request #21499: [SPARK-24468][SQL] Handle negative scale when adj...

2018-06-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/21499#discussion_r193618762 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/types/DecimalType.scala --- @@ -161,13 +161,17 @@ object DecimalType extends AbstractDataType

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18900 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91512/ Test FAILed. ---

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18900 **[Test build #91512 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91512/testReport)** for PR 18900 at commit

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18900 **[Test build #91512 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91512/testReport)** for PR 18900 at commit

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18900 Merged build finished. Test FAILed.

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18900 Merged build finished. Test FAILed.

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18900 **[Test build #91513 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91513/testReport)** for PR 18900 at commit

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18900 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91513/ Test FAILed. ---

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18900 **[Test build #91513 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91513/testReport)** for PR 18900 at commit

[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

2018-06-06 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21469#discussion_r193622940 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala --- @@ -231,7 +231,7 @@ class

[GitHub] spark issue #21500: Scalable Memory option for HDFSBackedStateStore

2018-06-06 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21500 @TomaszGaweda @aalobaidi Please correct me if I'm missing something here. At the start of every batch, the state store loads the previous version of the state so that it can be read and written. If we

[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21502 **[Test build #91504 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91504/testReport)** for PR 21502 at commit

[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21502 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91504/ Test FAILed. ---

[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21502 Merged build finished. Test FAILed.

[GitHub] spark issue #21494: [WIP][SPARK-24375][Prototype] Support barrier scheduling

2018-06-06 Thread Ngone51
Github user Ngone51 commented on the issue: https://github.com/apache/spark/pull/21494 Hi @jiangxb1987, can you explain more about what `barrier scheduling` is in Spark and give an example that would only work with `barrier scheduling` (but could not work under current Spark

[GitHub] spark issue #21504: SPARK-24480: Added config for registering streamingQuery...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21504 Can one of the admins verify this patch?

[GitHub] spark pull request #21504: SPARK-24479: Added config for registering streami...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21504#discussion_r193603810 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala --- @@ -55,6 +56,12 @@ class StreamingQueryManager

[GitHub] spark pull request #21501: [SPARK-15064][ML] Locale support in StopWordsRemo...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21501#discussion_r193604620 --- Diff: python/pyspark/ml/feature.py --- @@ -2582,25 +2582,27 @@ class StopWordsRemover(JavaTransformer, HasInputCol, HasOutputCol, JavaMLReadabl

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18900 Can one of the admins verify this patch?

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18900 Can one of the admins verify this patch?

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2018-06-06 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18900 ok to test

[GitHub] spark issue #21498: [SPARK-24410][SQL][Core] Optimization for Union outputPa...

2018-06-06 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/21498 I set up a Spark cluster with 5 nodes on EC2. ```scala def benchmark(func: () => Unit): Unit = { val t0 = System.nanoTime() func() val t1 = System.nanoTime()
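
The timing helper in the comment above is cut off by the archive, so its body and what it reports are not known. As a rough illustration only, a wall-clock helper with the same `benchmark(func: () => Unit): Unit` shape might look like the sketch below; the `BenchmarkSketch` object, the `elapsedMillis` helper, the printed message, and the sample workload are all assumptions, not the code from the PR discussion.

```scala
// Hypothetical sketch of a timing helper; not the original benchmark code.
object BenchmarkSketch {
  // Runs `func` once and returns the elapsed wall-clock time in milliseconds.
  def elapsedMillis(func: () => Unit): Double = {
    val t0 = System.nanoTime()
    func()
    val t1 = System.nanoTime()
    (t1 - t0) / 1e6
  }

  // Same signature as the truncated snippet: runs `func` and prints the timing.
  def benchmark(func: () => Unit): Unit =
    println(f"Elapsed time: ${elapsedMillis(func)}%.3f ms")

  def main(args: Array[String]): Unit = {
    // Placeholder workload; a real measurement would run the Spark job here.
    benchmark(() => { (1 to 1000000).map(_.toLong).sum; () })
  }
}
```

A single run like this includes JVM warm-up, so repeated runs would normally be needed before comparing numbers.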

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18900 **[Test build #91511 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91511/testReport)** for PR 18900 at commit

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18900 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91511/ Test FAILed. ---

[GitHub] spark issue #21504: SPARK-24479: Added config for registering streamingQuery...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/21504 Mind fixing the PR title to `[SPARK-24479][SS] Added config for registering streamingQueryListeners`?

[GitHub] spark pull request #21504: SPARK-24479: Added config for registering streami...

2018-06-06 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/21504#discussion_r193604356 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenersConfSuite.scala --- @@ -0,0 +1,66 @@ +/* + * Licensed

[GitHub] spark pull request #18900: [SPARK-21687][SQL] Spark SQL should set createTim...

2018-06-06 Thread debugger87
GitHub user debugger87 reopened a pull request: https://github.com/apache/spark/pull/18900 [SPARK-21687][SQL] Spark SQL should set createTime for Hive partition ## What changes were proposed in this pull request? Set createTime for every hive partition created in Spark SQL,

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2018-06-06 Thread debugger87
Github user debugger87 commented on the issue: https://github.com/apache/spark/pull/18900 @cxzl25 OK, reopen it

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18900 Merged build finished. Test FAILed.

[GitHub] spark issue #18900: [SPARK-21687][SQL] Spark SQL should set createTime for H...

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18900 **[Test build #91511 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91511/testReport)** for PR 18900 at commit

[GitHub] spark pull request #21498: [SPARK-24410][SQL][Core] Optimization for Union o...

2018-06-06 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/21498#discussion_r193618338 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -1099,6 +1099,17 @@ object SQLConf { .intConf

[GitHub] spark pull request #20915: [SPARK-23803][SQL] Support bucket pruning

2018-06-06 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20915#discussion_r193420453 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/sources/BucketedReadSuite.scala --- @@ -90,32 +96,37 @@ abstract class BucketedReadSuite extends

[GitHub] spark issue #20915: [SPARK-23803][SQL] Support bucket pruning

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20915 Merged build finished. Test PASSed.

[GitHub] spark issue #20915: [SPARK-23803][SQL] Support bucket pruning

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20915 **[Test build #91500 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91500/testReport)** for PR 20915 at commit

[GitHub] spark issue #20915: [SPARK-23803][SQL] Support bucket pruning

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20915 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/91500/ Test PASSed.

[GitHub] spark pull request #20915: [SPARK-23803][SQL] Support bucket pruning

2018-06-06 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20915

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread arunmahadevan
Github user arunmahadevan commented on the issue: https://github.com/apache/spark/pull/21469 Nice, LGTM.

[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21502 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/3824/

[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

2018-06-06 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/21502 Merged build finished. Test PASSed.

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21469 **[Test build #91503 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91503/testReport)** for PR 21469 at commit

[GitHub] spark issue #20915: [SPARK-23803][SQL] Support bucket pruning

2018-06-06 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20915 thanks, merging to master!

[GitHub] spark issue #21502: [SPARK-22575][SQL] Add destroy to Dataset

2018-06-06 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/21502 **[Test build #91504 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/91504/testReport)** for PR 21502 at commit

[GitHub] spark pull request #21502: [SPARK-22575][SQL] Add destroy to Dataset

2018-06-06 Thread mgaido91
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/21502 [SPARK-22575][SQL] Add destroy to Dataset ## What changes were proposed in this pull request? In the Dataset API we may acquire resources which we cannot deallocate. This happens for
