[GitHub] spark issue #20372: Improved block merging logic for partitions

2018-01-26 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20372 Please see https://spark.apache.org/contributing.html: could you open a JIRA and update this PR? --- - To unsubscribe, e-mail:

[GitHub] spark pull request #20404: [SPARK-23228][PYSPARK] Add Python Created jsparkS...

2018-01-26 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/20404#discussion_r164266681 --- Diff: python/pyspark/sql/session.py --- @@ -225,6 +225,7 @@ def __init__(self, sparkContext, jsparkSession=None): if

[GitHub] spark issue #20383: [SPARK-23200] Reset Kubernetes-specific config on Checkp...

2018-01-26 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/20383 Without this fix, a streaming app can't properly recover from a checkpoint, correct? That seems fairly important to me. Was apache-spark-on-k8s/spark-integration done before this PR was

[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-26 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20403 LGTM

[GitHub] spark issue #20409: [SPARK-23233][PYTHON] Reset the cache in asNondeterminis...

2018-01-26 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20409 About the test case, can't we just use the reproducer in the PR description to check if we change the deterministic status of the UDF?

[GitHub] spark issue #20409: [SPARK-23233][PYTHON] Reset the cache in asNondeterminis...

2018-01-26 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20409 retest this please.

[GitHub] spark pull request #20413: [SPARK-23245][SS][TESTS] Don't access `lastExecut...

2018-01-26 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20413

[GitHub] spark issue #20413: [SPARK-23245][SS][TESTS] Don't access `lastExecution.exe...

2018-01-26 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/20413 Thanks! Merging to master and 2.3.

[GitHub] spark pull request #20415: [SPARK-23247][SQL]combines Unsafe operations and ...

2018-01-26 Thread heary-cao
GitHub user heary-cao opened a pull request: https://github.com/apache/spark/pull/20415 [SPARK-23247][SQL]combines Unsafe operations and statistics operations in Scan Data Source ## What changes were proposed in this pull request? Currently, we scan the execution plan of

[GitHub] spark issue #20413: [SPARK-23245][SS][TESTS] Don't access `lastExecution.exe...

2018-01-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20413 **[Test build #4079 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4079/testReport)** for PR 20413 at commit

[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2018-01-26 Thread ConeyLiu
Github user ConeyLiu commented on the issue: https://github.com/apache/spark/pull/19285 thanks all.

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-01-26 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20400#discussion_r164261850 --- Diff: python/pyspark/sql/window.py --- @@ -124,16 +124,19 @@ def rangeBetween(start, end): values directly. :param

[GitHub] spark issue #20401: [MINOR][SS][DOC] Fix `Trigger` Scala/Java doc examples

2018-01-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20401 Thank you for merging this.

[GitHub] spark pull request #19575: [SPARK-22221][DOCS] Adding User Documentation for...

2018-01-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19575#discussion_r164261725 --- Diff: docs/sql-programming-guide.md --- @@ -1640,6 +1640,133 @@ Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` a

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-01-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20400#discussion_r164261678 --- Diff: python/pyspark/sql/window.py --- @@ -124,16 +124,19 @@ def rangeBetween(start, end): values directly. :param

[GitHub] spark pull request #19575: [SPARK-22221][DOCS] Adding User Documentation for...

2018-01-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19575#discussion_r164261639 --- Diff: docs/sql-programming-guide.md --- @@ -1640,6 +1640,133 @@ Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` a

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-01-26 Thread huaxingao
Github user huaxingao commented on a diff in the pull request: https://github.com/apache/spark/pull/20400#discussion_r164261413 --- Diff: python/pyspark/sql/window.py --- @@ -124,16 +124,19 @@ def rangeBetween(start, end): values directly. :param

[GitHub] spark issue #20409: [SPARK-23233][PYTHON] Reset the cache in asNondeterminis...

2018-01-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20409 test this please

[GitHub] spark issue #20409: [SPARK-23233][PYTHON] Reset the cache in asNondeterminis...

2018-01-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20409 retest this please

[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20403 retest this please

[GitHub] spark pull request #19575: [SPARK-22221][DOCS] Adding User Documentation for...

2018-01-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19575#discussion_r164260764 --- Diff: docs/sql-programming-guide.md --- @@ -1640,6 +1640,133 @@ Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` a

[GitHub] spark issue #20369: [SPARK-23196] Unify continuous and microbatch V2 sinks

2018-01-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20369 LGTM, pending jenkins

[GitHub] spark pull request #19575: [SPARK-22221][DOCS] Adding User Documentation for...

2018-01-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19575#discussion_r164260619 --- Diff: docs/sql-programming-guide.md --- @@ -1640,6 +1640,133 @@ Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` a

[GitHub] spark pull request #20367: [SPARK-23166][ML] Add maxDF Parameter to CountVec...

2018-01-26 Thread ymazari
Github user ymazari commented on a diff in the pull request: https://github.com/apache/spark/pull/20367#discussion_r164260027 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/CountVectorizer.scala --- @@ -160,6 +187,11 @@ class CountVectorizer @Since("1.5.0")

[GitHub] spark issue #20369: [SPARK-23196] Unify continuous and microbatch V2 sinks

2018-01-26 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20369 retest this please

[GitHub] spark issue #20407: [SPARK-23124][SQL] Allow to disable BroadcastNestedLoopJ...

2018-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20407 Merged build finished. Test PASSed.

[GitHub] spark issue #20407: [SPARK-23124][SQL] Allow to disable BroadcastNestedLoopJ...

2018-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20407 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86722/

[GitHub] spark issue #20407: [SPARK-23124][SQL] Allow to disable BroadcastNestedLoopJ...

2018-01-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20407 **[Test build #86722 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86722/testReport)** for PR 20407 at commit

[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86725/

[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20208 Merged build finished. Test PASSed.

[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20208 **[Test build #86725 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86725/testReport)** for PR 20208 at commit

[GitHub] spark issue #20372: Improved block merging logic for partitions

2018-01-26 Thread vgankidi
Github user vgankidi commented on the issue: https://github.com/apache/spark/pull/20372 I agree with @ash211. Applications shouldn't rely on the order of the files within a partition. This optimization looks good to me.

[GitHub] spark issue #20413: [SPARK-23245][SS][TESTS] Don't access `lastExecution.exe...

2018-01-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20413 **[Test build #4079 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4079/testReport)** for PR 20413 at commit

[GitHub] spark pull request #20401: [MINOR][SS][DOC] Fix `Trigger` Scala/Java doc exa...

2018-01-26 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20401

[GitHub] spark issue #20401: [MINOR][SS][DOC] Fix `Trigger` Scala/Java doc examples

2018-01-26 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/20401 Merged to master/2.3

[GitHub] spark issue #20413: [SPARK-23245][SS][TESTS] Don't access `lastExecution.exe...

2018-01-26 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/20413 +1. I originally wrote this line, and I'm reasonably confident that (as indicated by the comment) I didn't intend to check anything other than the nullity of lastExecution. I continue to be

[GitHub] spark pull request #20394: [SPARK-23214][SQL] cached data should not carry e...

2018-01-26 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20394

[GitHub] spark issue #20413: [SPARK-23245][SS][TESTS] Don't access `lastExecution.exe...

2018-01-26 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20413 LGTM.

[GitHub] spark issue #20394: [SPARK-23214][SQL] cached data should not carry extra hi...

2018-01-26 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20394 LGTM Thanks! Merged to master/2.3

[GitHub] spark pull request #20394: [SPARK-23214][SQL] cached data should not carry e...

2018-01-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/20394#discussion_r164254321 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/columnar/InMemoryRelation.scala --- @@ -73,11 +73,16 @@ case class InMemoryRelation(

[GitHub] spark pull request #20414: [SPARK-23243][SQL] Shuffle+Repartition on an RDD ...

2018-01-26 Thread jiangxb1987
GitHub user jiangxb1987 opened a pull request: https://github.com/apache/spark/pull/20414 [SPARK-23243][SQL] Shuffle+Repartition on an RDD could lead to incorrect answers ## What changes were proposed in this pull request? The RDD repartition also uses the round-robin way
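
The round-robin concern behind this PR (and SPARK-23207 below) can be illustrated with a small sketch. This is a simplified, hypothetical model in plain Python, not Spark's implementation: `round_robin_partition` is an illustrative helper, and the scenario assumes that a retried task sees the same input rows in a different order while some output partitions from the first attempt have already been consumed. Under those assumptions, rows can be duplicated or lost; sorting the input first (one way to make the assignment deterministic) avoids it.

```python
def round_robin_partition(rows, num_partitions):
    """Deal rows out to partitions in round-robin order.

    Hypothetical helper for illustration only; not Spark's code.
    """
    partitions = [[] for _ in range(num_partitions)]
    for i, row in enumerate(rows):
        partitions[i % num_partitions].append(row)
    return partitions


# First attempt: upstream shuffle output happens to arrive as a, b, c, d.
first_attempt = round_robin_partition(["a", "b", "c", "d"], 2)

# Retry after a failure: the same rows arrive in a different order.
retry = round_robin_partition(["b", "a", "c", "d"], 2)

# Assume partition 0 was already consumed from the first attempt and only
# partition 1 is recomputed from the retry: "a" is duplicated, "b" is lost.
combined = first_attempt[0] + retry[1]
print(sorted(combined))  # ['a', 'a', 'c', 'd']

# Sorting the input first makes the assignment independent of arrival order.
stable_first = round_robin_partition(sorted(["a", "b", "c", "d"]), 2)
stable_retry = round_robin_partition(sorted(["b", "a", "c", "d"]), 2)
print(stable_first == stable_retry)  # True
```

Sorting is only one way to restore determinism; as noted later in this thread, another option discussed was making the shuffle block fetch order deterministic.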

[GitHub] spark pull request #20404: [SPARK-23228][PYSPARK] Add Python Created jsparkS...

2018-01-26 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/20404#discussion_r164253931 --- Diff: python/pyspark/sql/session.py --- @@ -225,6 +225,7 @@ def __init__(self, sparkContext, jsparkSession=None): if

[GitHub] spark issue #20393: [SPARK-23207][SQL] Shuffle+Repartition on a DataFrame co...

2018-01-26 Thread shivaram
Github user shivaram commented on the issue: https://github.com/apache/spark/pull/20393 I'm fine with merging this -- I just don't want this issue to be forgotten for RDDs, as I think it's a major correctness issue. @mridulm @sameeragarwal Let's continue the discussion on

[GitHub] spark pull request #20413: [SC-9624][SS][TESTS] Don't access `lastExecution....

2018-01-26 Thread zsxwing
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/20413 [SC-9624][SS][TESTS] Don't access `lastExecution.executedPlan` in StreamTest ## What changes were proposed in this pull request? `lastExecution.executedPlan` is a lazy val, so accessing it in

[GitHub] spark pull request #19575: [SPARK-22221][DOCS] Adding User Documentation for...

2018-01-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19575#discussion_r164252270 --- Diff: docs/sql-programming-guide.md --- @@ -1640,6 +1640,133 @@ Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` a

[GitHub] spark issue #20413: [SC-9624][SS][TESTS] Don't access `lastExecution.execute...

2018-01-26 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/20413 cc @jose-torres @tdas

[GitHub] spark pull request #19575: [SPARK-22221][DOCS] Adding User Documentation for...

2018-01-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19575#discussion_r164252424 --- Diff: docs/sql-programming-guide.md --- @@ -1640,6 +1640,133 @@ Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` a

[GitHub] spark pull request #19575: [SPARK-22221][DOCS] Adding User Documentation for...

2018-01-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19575#discussion_r164251821 --- Diff: docs/sql-programming-guide.md --- @@ -1640,6 +1640,133 @@ Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` a

[GitHub] spark pull request #19575: [SPARK-22221][DOCS] Adding User Documentation for...

2018-01-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19575#discussion_r164251759 --- Diff: docs/sql-programming-guide.md --- @@ -1640,6 +1640,133 @@ Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` a

[GitHub] spark pull request #20412: [SPARK-23242][SS][Tests]Don't run tests in KafkaS...

2018-01-26 Thread zsxwing
Github user zsxwing closed the pull request at: https://github.com/apache/spark/pull/20412

[GitHub] spark issue #20412: [SPARK-23242][SS][Tests]Don't run tests in KafkaSourceSu...

2018-01-26 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/20412 Thanks! Merging to master and 2.3.

[GitHub] spark issue #20412: [SPARK-23242][SS][Tests]Don't run tests in KafkaSourceSu...

2018-01-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20412 **[Test build #4078 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4078/testReport)** for PR 20412 at commit

[GitHub] spark pull request #20405: [SPARK-23229][SQL] Dataset.hint should use planWi...

2018-01-26 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/20405#discussion_r164250206 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala --- @@ -1216,7 +1216,7 @@ class Dataset[T] private[sql]( */

[GitHub] spark pull request #19575: [SPARK-22221][DOCS] Adding User Documentation for...

2018-01-26 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19575#discussion_r164250138 --- Diff: docs/sql-programming-guide.md --- @@ -1640,6 +1640,133 @@ Configuration of Hive is done by placing your `hive-site.xml`, `core-site.xml` a

[GitHub] spark pull request #20395: [SPARK-23218][SQL] simplify ColumnVector.getArray

2018-01-26 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20395#discussion_r164249345 --- Diff: sql/core/src/main/java/org/apache/spark/sql/vectorized/ArrowColumnVector.java --- @@ -450,13 +439,11 @@ final boolean isNullAt(int rowId) {

[GitHub] spark pull request #20397: [SPARK-23219][SQL]Rename ReadTask to DataReaderFa...

2018-01-26 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20397#discussion_r164248501 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsScanColumnarBatch.java --- @@ -30,21 +30,21 @@

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-01-26 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/20400#discussion_r164248098 --- Diff: python/pyspark/sql/window.py --- @@ -124,16 +124,19 @@ def rangeBetween(start, end): values directly. :param

[GitHub] spark issue #20412: [SPARK-23242][SS][Tests]Don't run tests in KafkaSourceSu...

2018-01-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20412 **[Test build #4078 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/4078/testReport)** for PR 20412 at commit

[GitHub] spark pull request #20400: [SPARK-23084][PYTHON]Add unboundedPreceding(), un...

2018-01-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20400#discussion_r164245801 --- Diff: python/pyspark/sql/window.py --- @@ -124,16 +124,19 @@ def rangeBetween(start, end): values directly. :param

[GitHub] spark pull request #20410: [SPARK-23234][ML][PYSPARK] Remove setting default...

2018-01-26 Thread BryanCutler
Github user BryanCutler commented on a diff in the pull request: https://github.com/apache/spark/pull/20410#discussion_r164243811 --- Diff: python/pyspark/ml/wrapper.py --- @@ -118,10 +118,9 @@ def _transfer_params_to_java(self): """ Transforms the

[GitHub] spark issue #20409: [SPARK-23233][PYTHON] Reset the cache in asNondeterminis...

2018-01-26 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20409 retest this please

[GitHub] spark issue #20398: [SPARK-23221][SS][TEST] Fix KafkaContinuousSourceStressF...

2018-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20398 Merged build finished. Test PASSed.

[GitHub] spark issue #20398: [SPARK-23221][SS][TEST] Fix KafkaContinuousSourceStressF...

2018-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20398 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86723/

[GitHub] spark pull request #20393: [SPARK-23207][SQL] Shuffle+Repartition on a DataF...

2018-01-26 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/20393

[GitHub] spark issue #20398: [SPARK-23221][SS][TEST] Fix KafkaContinuousSourceStressF...

2018-01-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20398 **[Test build #86723 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86723/testReport)** for PR 20398 at commit

[GitHub] spark issue #20393: [SPARK-23207][SQL] Shuffle+Repartition on a DataFrame co...

2018-01-26 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20393 I opened https://issues.apache.org/jira/browse/SPARK-23243 to track the RDD.repartition() patch, thanks for all the discussions! @shivaram @mridulm @sameeragarwal @gatorsmile

[GitHub] spark issue #20393: [SPARK-23207][SQL] Shuffle+Repartition on a DataFrame co...

2018-01-26 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/20393 LGTM but we should get a broader consensus on this. In the meantime, I'm merging this patch to master/2.3.

[GitHub] spark pull request #20409: [SPARK-23233][PYTHON] Reset the cache in asNondet...

2018-01-26 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/20409#discussion_r164242518 --- Diff: python/pyspark/sql/udf.py --- @@ -188,6 +188,9 @@ def asNondeterministic(self): .. versionadded:: 2.3 """

[GitHub] spark issue #20408: [SPARK-23189][Core][Web UI] Reflect stage level blacklis...

2018-01-26 Thread ajbozarth
Github user ajbozarth commented on the issue: https://github.com/apache/spark/pull/20408 I'll take a look at the code when I have a moment, but from a UI perspective I only have one issue. Having the status of `Blacklisted in Stages: [...]` when an exec is Active could be easily

[GitHub] spark issue #20393: [SPARK-23207][SQL] Shuffle+Repartition on a DataFrame co...

2018-01-26 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/20393 Updated the title; does it sound good to have this PR? I'll open another one to address the RDD.repartition() issue (which will target 2.4).

[GitHub] spark issue #20393: [SPARK-23207][SQL] Shuffle+Repartition on a DataFrame co...

2018-01-26 Thread sameeragarwal
Github user sameeragarwal commented on the issue: https://github.com/apache/spark/pull/20393 Another (possibly cleaner) approach here would be to make the shuffle block fetch order deterministic but I agree that it might not be safe to include it in 2.3 this late.

[GitHub] spark issue #20412: [SPARK-23242][SS][Tests]Don't run tests in KafkaSourceSu...

2018-01-26 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/20412 retest this please

[GitHub] spark issue #20412: [SPARK-23242][SS][Tests]Don't run tests in KafkaSourceSu...

2018-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20412 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/86724/

[GitHub] spark issue #20412: [SPARK-23242][SS][Tests]Don't run tests in KafkaSourceSu...

2018-01-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20412 **[Test build #86724 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86724/testReport)** for PR 20412 at commit

[GitHub] spark issue #20412: [SPARK-23242][SS][Tests]Don't run tests in KafkaSourceSu...

2018-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20412 Merged build finished. Test FAILed.

[GitHub] spark issue #20396: [SPARK-23217][ML] Add cosine distance measure to Cluster...

2018-01-26 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20396 retest this please

[GitHub] spark issue #20410: [SPARK-23234][ML][PYSPARK] Remove setting defaults on Ja...

2018-01-26 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/20410 I think that the problem is not SPARK-22797. The problem is that before this PR, the Python API considers as Defined but not Set all the parameters with a default value, while the Scala/Java class

[GitHub] spark pull request #20332: [SPARK-23138][ML][DOC] Multiclass logistic regres...

2018-01-26 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/20332#discussion_r164237753 --- Diff: docs/ml-classification-regression.md --- @@ -111,10 +110,9 @@ Continuing the earlier example:

[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20403 **[Test build #86726 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86726/testReport)** for PR 20403 at commit

[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20403 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/302/

[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20403 Merged build finished. Test PASSed.

[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20208 Hi, @rxin, @cloud-fan, @sameeragarwal, @HyukjinKwon. Could you give me some opinions about this PR? I know that Xiao Li is busy during this period, so I didn't ping him. For me,

[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-26 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20403 this has been failing due to #19892 which was recently reverted

[GitHub] spark issue #20403: [SPARK-23238][PYTHON] Externalize SQLConf spark.sql.exec...

2018-01-26 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/20403 jenkins retest this please

[GitHub] spark issue #20412: [SPARK-23242][SS][Tests]Don't run tests in KafkaSourceSu...

2018-01-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20412 **[Test build #86724 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86724/testReport)** for PR 20412 at commit

[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20208 **[Test build #86725 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86725/testReport)** for PR 20208 at commit

[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/301/

[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20208 Merged build finished. Test PASSed.

[GitHub] spark issue #20407: [SPARK-23124][SQL] Allow to disable BroadcastNestedLoopJ...

2018-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20407 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/300/

[GitHub] spark issue #20412: [SPARK-23242][SS][Tests]Don't run tests in KafkaSourceSu...

2018-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20412 Merged build finished. Test PASSed.

[GitHub] spark issue #20407: [SPARK-23124][SQL] Allow to disable BroadcastNestedLoopJ...

2018-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20407 Merged build finished. Test PASSed.

[GitHub] spark issue #20412: [SPARK-23242][SS][Tests]Don't run tests in KafkaSourceSu...

2018-01-26 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20412 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/299/

[GitHub] spark issue #20412: [SPARK-23242][SS][Tests]Don't run tests in KafkaSourceSu...

2018-01-26 Thread tdas
Github user tdas commented on the issue: https://github.com/apache/spark/pull/20412 LGTM. My bad, I should have caught this in the original PR that @jose-torres made.

[GitHub] spark issue #20208: [SPARK-23007][SQL][TEST] Add schema evolution test suite...

2018-01-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20208 Retest this please.

[GitHub] spark issue #20407: [SPARK-23124][SQL] Allow to disable BroadcastNestedLoopJ...

2018-01-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20407 **[Test build #86722 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86722/testReport)** for PR 20407 at commit

[GitHub] spark issue #20398: [SPARK-23221][SS][TEST] Fix KafkaContinuousSourceStressF...

2018-01-26 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20398 **[Test build #86723 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/86723/testReport)** for PR 20398 at commit

[GitHub] spark pull request #20412: [SPARK-23242][SS][Tests]Don't run tests in KafkaS...

2018-01-26 Thread zsxwing
GitHub user zsxwing opened a pull request: https://github.com/apache/spark/pull/20412 [SPARK-23242][SS][Tests]Don't run tests in KafkaSourceSuiteBase twice ## What changes were proposed in this pull request? KafkaSourceSuiteBase should be an abstract class, otherwise

[GitHub] spark issue #20407: [SPARK-23124][SQL] Allow to disable BroadcastNestedLoopJ...

2018-01-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20407 Retest this please.

[GitHub] spark issue #20407: [SPARK-23124][SQL] Allow to disable BroadcastNestedLoopJ...

2018-01-26 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/20407 SPARK-23234 is reverted now.

[GitHub] spark issue #20393: [SPARK-23207][SQL] Shuffle+Repartition on an RDD/DataFra...

2018-01-26 Thread mridulm
Github user mridulm commented on the issue: https://github.com/apache/spark/pull/20393 @jiangxb1987 Btw, we could argue this has been a correctness issue ever since we added repartition, so it's not necessarily a blocker :-)
