[GitHub] [spark] huaxingao commented on pull request #29119: Update RandomForestClassifierExample.scala

2020-07-14 Thread GitBox


huaxingao commented on pull request #29119:
URL: https://github.com/apache/spark/pull/29119#issuecomment-658582560


   @kevinyu1949 Thanks for submitting a PR. Actually, we intentionally changed 
```labelIndexer.labels``` to ```labelIndexer.labelsArray(0)``` because 
```StringIndexerModel.labels``` is deprecated and will be removed in a future 
release. 
   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode

2020-07-14 Thread GitBox


HyukjinKwon commented on pull request #29077:
URL: https://github.com/apache/spark/pull/29077#issuecomment-658581488


   @HeartSaVioR, no big deal but let's make sure to mention which branch this 
PR went through as a comment.



[GitHub] [spark] HyukjinKwon edited a comment on pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode

2020-07-14 Thread GitBox


HyukjinKwon edited a comment on pull request #29077:
URL: https://github.com/apache/spark/pull/29077#issuecomment-658581488


   @HeartSaVioR, no big deal but let's make sure to leave a comment to mention 
which branch this PR went through.



[GitHub] [spark] adjordan edited a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox


adjordan edited a comment on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658579068


   Yes, I know the difference between the two. I just assumed that 
`MLUtils.kFold` was doing the splits according to the k-fold method, given the 
name, and not the random sub-sampling method. But I suppose changing the name 
of that method is outside the scope of what I'm trying to add.
   
   In that case, it seems that I should add an additional `method` parameter 
where you can select k-fold or random sub-sampling. If I end up doing that, 
should I continue with this PR or open a new one?
   
   Thoughts @viirya?
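   For readers less familiar with the distinction discussed above, the sketch below contrasts the two splitting schemes in plain Scala, with no Spark dependency. It is a hypothetical illustration: `SplitSketch`, `kFold`, and `randomSubsampling` are names invented here, not `MLUtils.kFold` or any Spark API, and the sketch makes no claim about how `MLUtils.kFold` itself is implemented.

```scala
import scala.util.Random

object SplitSketch {
  type Split = (Seq[Int], Seq[Int]) // (training indices, validation indices)

  // k-fold: shuffle the row indices once, then assign each position to exactly one
  // of k folds, so the k validation sets are disjoint and together cover every row.
  def kFold(n: Int, k: Int, seed: Long): Seq[Split] = {
    val shuffled = new Random(seed).shuffle((0 until n).toVector)
    (0 until k).map { fold =>
      val validation = shuffled.zipWithIndex.collect { case (idx, pos) if pos % k == fold => idx }
      val validSet = validation.toSet
      (shuffled.filterNot(validSet.contains), validation)
    }
  }

  // Random sub-sampling: every round draws an independent random holdout, so
  // validation sets can overlap and some rows may never be held out at all.
  def randomSubsampling(n: Int, rounds: Int, holdoutFraction: Double, seed: Long): Seq[Split] = {
    val rng = new Random(seed)
    val holdoutSize = math.round(n * holdoutFraction).toInt
    (0 until rounds).map { _ =>
      val (validation, training) = rng.shuffle((0 until n).toVector).splitAt(holdoutSize)
      (training, validation)
    }
  }
}
```

   The key design difference is the single shuffle in `kFold`: because each row lands in exactly one fold, every row is validated exactly once across the k rounds, which random sub-sampling does not guarantee.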



[GitHub] [spark] adjordan commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox


adjordan commented on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658579068


   Yes, I know the difference between the two. I just assumed that 
`MLUtils.kFold` was doing the splits according to the k-fold method, not the 
random sub-sampling method. But I suppose changing the name of that method is 
outside the scope of what I'm trying to add.
   
   In that case, it seems that I should add an additional `method` parameter 
where you can select k-fold or random sub-sampling. If I end up doing that, 
should I continue with this PR or open a new one?
   
   Thoughts @viirya?



[GitHub] [spark] maropu commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-14 Thread GitBox


maropu commented on a change in pull request #29079:
URL: https://github.com/apache/spark/pull/29079#discussion_r454827741



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2645,21 +2645,22 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
-  val COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_ENABLED =
-    buildConf("spark.sql.bucketing.coalesceBucketsInSortMergeJoin.enabled")
+  val COALESCE_BUCKETS_IN_JOIN_ENABLED =
+    buildConf("spark.sql.bucketing.coalesceBucketsInJoin.enabled")
       .doc("When true, if two bucketed tables with the different number of buckets are joined, " +
         "the side with a bigger number of buckets will be coalesced to have the same number " +
-        "of buckets as the other side. Bucket coalescing is applied only to sort-merge joins " +
-        "and only when the bigger number of buckets is divisible by the smaller number of buckets.")
+        "of buckets as the other side. Bigger number of buckets is divisible by the smaller " +
+        "number of buckets. Bucket coalescing is applied to sort-merge joins and " +
+        "shuffled hash join.")
       .version("3.1.0")
       .booleanConf
       .createWithDefault(false)
 
-  val COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_MAX_BUCKET_RATIO =
-    buildConf("spark.sql.bucketing.coalesceBucketsInSortMergeJoin.maxBucketRatio")
+  val COALESCE_BUCKETS_IN_JOIN_MAX_BUCKET_RATIO =
+    buildConf("spark.sql.bucketing.coalesceBucketsInJoin.maxBucketRatio")

Review comment:
   Also, I think we need to describe the risk in `.doc`.





[GitHub] [spark] maropu commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-14 Thread GitBox


maropu commented on a change in pull request #29079:
URL: https://github.com/apache/spark/pull/29079#discussion_r454827741



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2645,21 +2645,22 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
-  val COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_ENABLED =
-    buildConf("spark.sql.bucketing.coalesceBucketsInSortMergeJoin.enabled")
+  val COALESCE_BUCKETS_IN_JOIN_ENABLED =
+    buildConf("spark.sql.bucketing.coalesceBucketsInJoin.enabled")
       .doc("When true, if two bucketed tables with the different number of buckets are joined, " +
         "the side with a bigger number of buckets will be coalesced to have the same number " +
-        "of buckets as the other side. Bucket coalescing is applied only to sort-merge joins " +
-        "and only when the bigger number of buckets is divisible by the smaller number of buckets.")
+        "of buckets as the other side. Bigger number of buckets is divisible by the smaller " +
+        "number of buckets. Bucket coalescing is applied to sort-merge joins and " +
+        "shuffled hash join.")
       .version("3.1.0")
       .booleanConf
       .createWithDefault(false)
 
-  val COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_MAX_BUCKET_RATIO =
-    buildConf("spark.sql.bucketing.coalesceBucketsInSortMergeJoin.maxBucketRatio")
+  val COALESCE_BUCKETS_IN_JOIN_MAX_BUCKET_RATIO =
+    buildConf("spark.sql.bucketing.coalesceBucketsInJoin.maxBucketRatio")

Review comment:
   Also, I think we need to describe the risk clearly in `.doc`.





[GitHub] [spark] Fokko commented on pull request #29109: [SPARK-32311][PYSPARK][TESTS] Remove duplicate import

2020-07-14 Thread GitBox


Fokko commented on pull request #29109:
URL: https://github.com/apache/spark/pull/29109#issuecomment-658578504


   These PRs are a bit small indeed, but a few much bigger ones are coming up. 
I would like to split them up a bit to make them easier for the 
reviewers/committers to digest.



[GitHub] [spark] maropu commented on a change in pull request #29079: [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable

2020-07-14 Thread GitBox


maropu commented on a change in pull request #29079:
URL: https://github.com/apache/spark/pull/29079#discussion_r454827484



##
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##
@@ -2645,21 +2645,22 @@ object SQLConf {
       .booleanConf
       .createWithDefault(true)
 
-  val COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_ENABLED =
-    buildConf("spark.sql.bucketing.coalesceBucketsInSortMergeJoin.enabled")
+  val COALESCE_BUCKETS_IN_JOIN_ENABLED =
+    buildConf("spark.sql.bucketing.coalesceBucketsInJoin.enabled")
       .doc("When true, if two bucketed tables with the different number of buckets are joined, " +
         "the side with a bigger number of buckets will be coalesced to have the same number " +
-        "of buckets as the other side. Bucket coalescing is applied only to sort-merge joins " +
-        "and only when the bigger number of buckets is divisible by the smaller number of buckets.")
+        "of buckets as the other side. Bigger number of buckets is divisible by the smaller " +
+        "number of buckets. Bucket coalescing is applied to sort-merge joins and " +
+        "shuffled hash join.")
       .version("3.1.0")
       .booleanConf
       .createWithDefault(false)
 
-  val COALESCE_BUCKETS_IN_SORT_MERGE_JOIN_MAX_BUCKET_RATIO =
-    buildConf("spark.sql.bucketing.coalesceBucketsInSortMergeJoin.maxBucketRatio")
+  val COALESCE_BUCKETS_IN_JOIN_MAX_BUCKET_RATIO =
+    buildConf("spark.sql.bucketing.coalesceBucketsInJoin.maxBucketRatio")

Review comment:
   Is it okay to share this parameter between sort-merge/hash joins? As 
@viirya suggested, we have some risk of OOM. So, I think we need a different 
threshold policy for the hash-join case.
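   As a rough illustration of the applicability rule described in the `.doc` text above, here is a plain-Scala sketch. `BucketCoalesceSketch` and its methods are hypothetical names, not Spark's actual optimizer rule. The bucket mapping assumes bucket ids are assigned as a non-negative hash modulo the bucket count, so that when the larger count is a multiple of the smaller one, coalesced bucket j can be formed from the original buckets whose id modulo the smaller count equals j.

```scala
object BucketCoalesceSketch {
  // Hypothetical helper: returns the coalesced bucket count if coalescing applies.
  def coalescedBucketCount(numBuckets1: Int, numBuckets2: Int, maxBucketRatio: Int): Option[Int] = {
    val (small, large) =
      if (numBuckets1 <= numBuckets2) (numBuckets1, numBuckets2) else (numBuckets2, numBuckets1)
    if (small == large) None                       // same bucket count: nothing to coalesce
    else if (large % small != 0) None              // bigger count must be divisible by the smaller
    else if (large / small > maxBucketRatio) None  // ratio above the configured threshold
    else Some(small)
  }

  // Original bucket ids of the larger side that feed coalesced bucket `target`,
  // assuming bucketId = hash mod numBuckets: target, target + small, target + 2 * small, ...
  def sourceBuckets(target: Int, large: Int, small: Int): Seq[Int] =
    target until large by small
}
```

   The ratio check is where the OOM concern above comes in: with a shuffled hash join, a task on the coalesced side reads `large / small` original buckets, so if that side is the build side its in-memory hash table grows roughly by that factor, whereas a sort-merge join does not need to hold either side in a hash table.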





[GitHub] [spark] HyukjinKwon commented on pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-14 Thread GitBox


HyukjinKwon commented on pull request #29117:
URL: https://github.com/apache/spark/pull/29117#issuecomment-658574515


   retest this please



[GitHub] [spark] maropu commented on a change in pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox


maropu commented on a change in pull request #29118:
URL: https://github.com/apache/spark/pull/29118#discussion_r454819744



##
File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
##
@@ -284,6 +284,15 @@ class EliminateSortsSuite extends PlanTest {
 comparePlans(optimized, correctAnswer)
   }
 
+  test("SPARK-32318 should not remove orderBy in distribute statement") {

Review comment:
   Yeah, I know you just forgot it.





[GitHub] [spark] yaooqinn commented on pull request #29064: [SPARK-32272][SQL] Add SQL standard command SET TIME ZONE

2020-07-14 Thread GitBox


yaooqinn commented on pull request #29064:
URL: https://github.com/apache/spark/pull/29064#issuecomment-658569092


   cc @maropu @cloud-fan @huaxingao. Please check the reference doc for the 
`SET TIME ZONE` command.



[GitHub] [spark] kevinyu1949 opened a new pull request #29119: Update RandomForestClassifierExample.scala

2020-07-14 Thread GitBox


kevinyu1949 opened a new pull request #29119:
URL: https://github.com/apache/spark/pull/29119


   Fix incorrect code.
   
   
   
   ### What changes were proposed in this pull request?
   
   
   
   ### Why are the changes needed?
   
   
   
   ### Does this PR introduce _any_ user-facing change?
   
   
   
   ### How was this patch tested?
   
   



[GitHub] [spark] viirya commented on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox


viirya commented on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658567012


   okay, sounds good.



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658565525


   Actually, the file size check test cases are very ~flaky~ fragile. We hit 
many issues before when we added `Spark Version` metadata to Parquet/ORC/Avro.
   > Do you think it is easy to add a test that checks file size like in the 
description? Or is the current one enough?
   
   I believe this one is enough because generating files costs us 
write/read/full execution time in Jenkins and GitHub~



[GitHub] [spark] dongjoon-hyun commented on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox


dongjoon-hyun commented on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658565525


   Actually, the file size check test cases are very flaky. We hit many issues 
before when we added `Spark Version` metadata to Parquet/ORC/Avro.
   > Do you think it is easy to add a test that checks file size like in the 
description? Or is the current one enough?



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658565525







[GitHub] [spark] viirya commented on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox


viirya commented on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658565157


   Do you think it is easy to add a test that checks file size like in the 
description? Or is the current one enough?



[GitHub] [spark] HyukjinKwon commented on a change in pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel

2020-07-14 Thread GitBox


HyukjinKwon commented on a change in pull request #29088:
URL: https://github.com/apache/spark/pull/29088#discussion_r454812360



##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
##
@@ -2353,6 +2354,43 @@ abstract class CSVSuite extends QueryTest with SharedSparkSession with TestCsvDa
   assert(df.schema.last == StructField("col_mixed_types", StringType, true))
 }
   }
+
+  test("Support write BOM to file before writing data if encoded by UTF-8 charset") {
+// scalastyle:off nonascii
+val chinese = "我爱中文"
+val korean = "나는 한국인을 좋아한다"
+val japanese = "私は日本人が好き"

Review comment:
   I guess Japanese is the same case @ueshin or @maropu?
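   For context on what the test above exercises: a UTF-8 byte order mark is the three bytes `EF BB BF` (U+FEFF encoded in UTF-8) written before any data, and Excel typically relies on it to recognize a CSV file as UTF-8. The plain-Scala sketch below shows the idea with `java.io`; `BomSketch` and `writeCsvWithBom` are hypothetical names and do not reflect how the PR wires this into Spark's CSV writer.

```scala
import java.io.{ByteArrayOutputStream, OutputStream}
import java.nio.charset.StandardCharsets

object BomSketch {
  // UTF-8 byte order mark: U+FEFF encoded as EF BB BF.
  val Utf8Bom: Array[Byte] = Array(0xEF.toByte, 0xBB.toByte, 0xBF.toByte)

  // Hypothetical writer: emits the BOM first, then each line as UTF-8 with a trailing newline.
  def writeCsvWithBom(out: OutputStream, lines: Seq[String]): Unit = {
    out.write(Utf8Bom) // must precede any data bytes
    lines.foreach(line => out.write((line + "\n").getBytes(StandardCharsets.UTF_8)))
    out.flush()
  }
}
```

   Writing into a `ByteArrayOutputStream` makes the layout easy to inspect: the first three bytes are the BOM and the remainder decodes as ordinary UTF-8 text.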





[GitHub] [spark] dongjoon-hyun commented on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox


dongjoon-hyun commented on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658565064


   Thank you, @maropu and @viirya.
   Yes. The commit log and JIRA will explain the situation. I kept the test 
case minimal.



[GitHub] [spark] HyukjinKwon commented on a change in pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel

2020-07-14 Thread GitBox


HyukjinKwon commented on a change in pull request #29088:
URL: https://github.com/apache/spark/pull/29088#discussion_r454812272



##
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
##
@@ -2353,6 +2354,43 @@ abstract class CSVSuite extends QueryTest with SharedSparkSession with TestCsvDa
   assert(df.schema.last == StructField("col_mixed_types", StringType, true))
 }
   }
+
+  test("Support write BOM to file before writing data if encoded by UTF-8 charset") {
+// scalastyle:off nonascii
+val chinese = "我爱中文"
+val korean = "나는 한국인을 좋아한다"

Review comment:
   Oh, @wangyum BTW, do you mean "I like Korean" with Korean as the language? 
If that's the case, I think it should be written as "나는 한국어를 좋아한다". The current 
one reads more like "I like Korean people".





[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #29118:
URL: https://github.com/apache/spark/pull/29118#discussion_r454811733



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
##
@@ -284,6 +284,15 @@ class EliminateSortsSuite extends PlanTest {
 comparePlans(optimized, correctAnswer)
   }
 
+  test("SPARK-32318 should not remove orderBy in distribute statement") {

Review comment:
   Oh... Right. I missed it.





[GitHub] [spark] viirya edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox


viirya edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658563157


   Yeah, because of the different data distribution, the physical encoding of 
the data could result in a different size; that is what I meant.



[GitHub] [spark] viirya commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox


viirya commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658563157


   Yeah, because of the different data distribution, the physical encoding of 
the data could result in a different size.



[GitHub] [spark] maropu commented on a change in pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox


maropu commented on a change in pull request #29118:
URL: https://github.com/apache/spark/pull/29118#discussion_r454810395



##
File path: 
sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/optimizer/EliminateSortsSuite.scala
##
@@ -284,6 +284,15 @@ class EliminateSortsSuite extends PlanTest {
 comparePlans(optimized, correctAnswer)
   }
 
+  test("SPARK-32318 should not remove orderBy in distribute statement") {

Review comment:
   super nit: in most cases, we add `:` after the prefix, i.e. `SPARK-32318:`?





[GitHub] [spark] SaurabhChawla100 commented on pull request #29045: [SPARK-32234][SQL] Spark sql commands are failing on selecting the orc tables

2020-07-14 Thread GitBox


SaurabhChawla100 commented on pull request #29045:
URL: https://github.com/apache/spark/pull/29045#issuecomment-658561486


   Retest this please



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658560629


   Also, cc @cloud-fan, @HyukjinKwon, @maropu



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658560339


   Could you review this, @viirya? This will protect us from future 
regressions. This part is tricky.



[GitHub] [spark] dongjoon-hyun commented on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox


dongjoon-hyun commented on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658560629


   Also, cc @cloud-fan and @HyukjinKwon .



[GitHub] [spark] dongjoon-hyun commented on pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox


dongjoon-hyun commented on pull request #29118:
URL: https://github.com/apache/spark/pull/29118#issuecomment-658560339


   Could you review this, @viirya ?



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658559706


   The biggest factor is the file format rather than the Spark side.
   For instance, in the example above, ORC files are small because ORC supports a 
special encoding when the input data is sorted with a fixed increment. For 
Parquet files, the result will be different.
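   To illustrate why sorted input with a fixed increment can shrink so much, the sketch below is a toy run-of-deltas encoder in plain Scala. It is not ORC's actual integer run-length encoding, and `DeltaRunSketch` is a hypothetical name. Sorted values with a constant step collapse into a single (base, delta, length) run, while the same kind of values in a different order need several runs.

```scala
object DeltaRunSketch {
  // One run: a start value, a constant step, and how many values it covers.
  final case class Run(base: Long, delta: Long, length: Int)

  // Greedily merges consecutive values that share the same step into a single run.
  def encode(values: IndexedSeq[Long]): List[Run] = {
    val runs = List.newBuilder[Run]
    var i = 0
    while (i < values.length) {
      if (i + 1 == values.length) {
        runs += Run(values(i), 0L, 1) // trailing single value
        i += 1
      } else {
        val delta = values(i + 1) - values(i)
        var j = i + 1
        while (j + 1 < values.length && values(j + 1) - values(j) == delta) j += 1
        runs += Run(values(i), delta, j - i + 1)
        i = j + 1
      }
    }
    runs.result()
  }

  // Expands runs back into the original values.
  def decode(runs: Seq[Run]): Seq[Long] =
    runs.flatMap(r => (0 until r.length).map(k => r.base + k * r.delta))
}
```

   For example, `DeltaRunSketch.encode((1L to 1000L).toVector)` returns the single run `List(Run(1, 1, 1000))`, so the storage cost no longer depends on the number of rows.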



[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658559706


   No~ It depends on the file format rather than the Spark side.
   For instance, in the example above, ORC files are small because ORC supports a 
special encoding when the input data is sorted with a fixed increment. For 
Parquet files, the result will be different.






[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox


dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658559706


   No. It depends on the file format rather than on the Spark side.
   For instance, in the example above, the ORC files are small because ORC supports a 
special encoding for data that is sorted with a fixed increment. For Parquet 
files, the result will be different.






[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox


dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658558813


   I made a PR to add test coverage for the above case.
   - https://github.com/apache/spark/pull/29118






[GitHub] [spark] viirya commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox


viirya commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658558946


   Oh, this is interesting. I knew that removing `Sort` before `Repartition` would 
result in a different data distribution because `Repartition` uses 
`RoundRobinPartitioning`. Since I think repartition doesn't guarantee the 
distribution of shuffled data, I thought it was okay.
   
   Now it seems the different data distribution causes a different storage output 
size. I think it is because repartitioning sorted data using 
`RoundRobinPartitioning` can generate more compact output.
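A minimal sketch of the effect, assuming a simplified round-robin model in which the i-th row of a single input task goes to partition `(start + i) % numPartitions`. Spark's `RoundRobinPartitioning` chooses its own starting partition per task; here `start` is just a parameter. `RoundRobin.assign` is a hypothetical name. Sorted input yields partitions that are themselves sorted with a fixed stride, the kind of data the ORC encoding mentioned in this thread compresses well; shuffled input does not.

```scala
// Hypothetical helper: a simplified single-task model of round-robin
// partitioning, not Spark's RoundRobinPartitioning implementation.
object RoundRobin {
  // Assigns rows to `numPartitions` buckets in round-robin order, starting at `start`.
  def assign[T](rows: Seq[T], numPartitions: Int, start: Int = 0): Vector[Vector[T]] = {
    require(numPartitions > 0, "numPartitions must be positive")
    require(start >= 0, "start must be non-negative")
    val buckets = Vector.fill(numPartitions)(Vector.newBuilder[T])
    rows.zipWithIndex.foreach { case (row, i) =>
      buckets((start + i) % numPartitions) += row
    }
    buckets.map(_.result())
  }
}
```

For example, `RoundRobin.assign(1 to 10, 2)` puts 1, 3, 5, 7, 9 in partition 0 and 2, 4, 6, 8, 10 in partition 1, so each partition is sorted with an increment of 2.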






[GitHub] [spark] dongjoon-hyun opened a new pull request #29118: [SPARK-32318][SQL][TESTS] Add a test case to EliminateSortsSuite for ORDER BY in DISTRIBUTE BY

2020-07-14 Thread GitBox


dongjoon-hyun opened a new pull request #29118:
URL: https://github.com/apache/spark/pull/29118


   ### What changes were proposed in this pull request?
   
   This PR aims to add a test case to EliminateSortsSuite to protect a valid 
use case: using ORDER BY inside a DISTRIBUTE BY statement.
   
   ### Why are the changes needed?
   
   ```
   scala> scala.util.Random.shuffle((1 to 100000).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
   
   scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/master")
   
   $ ls -al /tmp/master/
   total 56
   drwxr-xr-x  10 dongjoon  wheel  320 Jul 14 22:12 ./
   drwxrwxrwt  15 root      wheel  480 Jul 14 22:12 ../
   -rw-r--r--   1 dongjoon  wheel    8 Jul 14 22:12 ._SUCCESS.crc
   -rw-r--r--   1 dongjoon  wheel   12 Jul 14 22:12 .part-00000-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 .part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 .part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel    0 Jul 14 22:12 _SUCCESS
   -rw-r--r--   1 dongjoon  wheel  119 Jul 14 22:12 part-00000-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  932 Jul 14 22:12 part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  939 Jul 14 22:12 part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
   ```
   
   If the inner `ORDER BY` is removed (which is what the SPARK-32276 optimizer rule does to this same query), the file size increases.
   ```
   scala> scala.util.Random.shuffle((1 to 100000).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
   
   scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/SPARK-32276")
   
   $ ls -al /tmp/SPARK-32276/
   total 632
   drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:08 ./
   drwxrwxrwt  14 root      wheel     448 Jul 14 22:08 ../
   -rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:08 ._SUCCESS.crc
   -rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:08 .part-00000-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:08 _SUCCESS
   -rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:08 part-00000-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  150735 Jul 14 22:08 part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  150741 Jul 14 22:08 part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
   ```
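Summing the data-file sizes from the two `ls -al` listings makes the size regression concrete. The byte counts are copied from the listings (`.crc` and `_SUCCESS` files excluded); the `OutputSize` object is only an illustrative helper.

```scala
// Illustrative helper; sizes (in bytes) are copied from the two `ls -al` listings.
object OutputSize {
  // ORC data files written with the inner ORDER BY kept (/tmp/master).
  val withSort: Seq[Long] = Seq(119L, 932L, 939L)
  // ORC data files written after SPARK-32276 removed the sort (/tmp/SPARK-32276).
  val withoutSort: Seq[Long] = Seq(119L, 150735L, 150741L)

  def total(sizes: Seq[Long]): Long = sizes.sum

  // How many times larger the output is without the sort.
  def ratio: Double = total(withoutSort).toDouble / total(withSort)
}
```

That is 1,990 bytes versus 301,595 bytes for the same rows, roughly a 150x increase.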
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. This only improves the test coverage.
   
   ### How was this patch tested?
   
   Pass the GitHub Action or Jenkins.






[GitHub] [spark] viirya commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox


viirya commented on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658556814


   Did you read the two links above? The current approach is repeated random 
sub-sampling validation; this PR changes it to k-fold cross-validation.
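To make the distinction concrete, here is a minimal sketch of repeated random sub-sampling, assuming each round independently shuffles the indices and holds out a fixed fraction for validation. Unlike k-fold, nothing stops an index from landing in several validation sets, or in none. `RandomSubsampling.splits` is a hypothetical name, not a Spark API.

```scala
import scala.util.Random

// Hypothetical helper illustrating repeated random sub-sampling validation.
object RandomSubsampling {
  // Each round draws an independent (train, validation) split of the indices 0 until n.
  def splits(n: Int, rounds: Int, validationFraction: Double, seed: Long): Seq[(Set[Int], Set[Int])] = {
    val rng = new Random(seed)
    val validationSize = math.round(n * validationFraction).toInt
    (0 until rounds).map { _ =>
      val shuffled = rng.shuffle((0 until n).toList)
      val (validation, train) = shuffled.splitAt(validationSize)
      (train.toSet, validation.toSet)
    }
  }
}
```

Within a round, train and validation are disjoint, but across rounds the validation sets are drawn independently, which is what separates this scheme from k-fold cross-validation.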






[GitHub] [spark] viirya edited a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox


viirya edited a comment on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658556814


   Did you read the two links above? The current approach is repeated random 
sub-sampling validation; this PR changes it to k-fold cross-validation.






[GitHub] [spark] SparkQA commented on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-07-14 Thread GitBox


SparkQA commented on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-658555806


   **[Test build #125876 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125876/testReport)**
 for PR 27694 at commit 
[`86131af`](https://github.com/apache/spark/commit/86131afcf995fee64a629a7a440f03df8cabdd48).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA removed a comment on pull request #27694: [SPARK-30946][SS] Serde entry via DataInputStream/DataOutputStream with LZ4 compression on FileStream(Source/Sink)Log

2020-07-14 Thread GitBox


SparkQA removed a comment on pull request #27694:
URL: https://github.com/apache/spark/pull/27694#issuecomment-658519508


   **[Test build #125876 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125876/testReport)**
 for PR 27694 at commit 
[`86131af`](https://github.com/apache/spark/commit/86131afcf995fee64a629a7a440f03df8cabdd48).






[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28931: [SPARK-32103][CORE] Support IPv6 host/port in core module

2020-07-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #28931:
URL: https://github.com/apache/spark/pull/28931#issuecomment-658553220


   Hi, @gatorsmile. Technically, this only handles `host/port` parsing inside 
the `core` module. I'm sure that this is a meaningful step inside Spark. However, 
we haven't tested anything on IPv6. As with what we did for JDK11, I expect lots of 
hurdles both inside and outside Spark.
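As an illustration of what `host/port` parsing has to cope with, here is a minimal sketch assuming the usual URI convention (as in RFC 3986) that an IPv6 literal is wrapped in square brackets when a port follows, e.g. `[::1]:7077`. `HostPort.parse` and `HostPort.format` are hypothetical helpers and are not the implementation in this PR.

```scala
import scala.util.Try

// Hypothetical helpers, not Spark's Utils methods.
object HostPort {
  // Parses "host:port". An IPv6 literal contains colons itself, so it must be
  // wrapped in brackets when a port follows, e.g. "[2001:db8::1]:7077".
  def parse(hostPort: String): Option[(String, Int)] = {
    val parts: Option[(String, String)] =
      if (hostPort.startsWith("[")) {
        val close = hostPort.indexOf("]:")
        if (close < 0) None
        else Some((hostPort.substring(1, close), hostPort.substring(close + 2)))
      } else {
        val idx = hostPort.indexOf(':')
        // Exactly one colon is allowed; an unbracketed IPv6 address is ambiguous.
        if (idx < 0 || hostPort.lastIndexOf(':') != idx) None
        else Some((hostPort.substring(0, idx), hostPort.substring(idx + 1)))
      }
    for {
      (host, portStr) <- parts
      if host.nonEmpty
      port <- Try(portStr.toInt).toOption
      if port >= 0 && port <= 65535
    } yield (host, port)
  }

  // Formats a host and port, bracketing the host if it looks like an IPv6 literal.
  def format(host: String, port: Int): String =
    if (host.contains(':')) s"[$host]:$port" else s"$host:$port"
}
```

The key design point is that a plain `split(":")` is no longer enough once IPv6 addresses are allowed.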






[GitHub] [spark] dongjoon-hyun edited a comment on pull request #28931: [SPARK-32103][CORE] Support IPv6 host/port in core module

2020-07-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #28931:
URL: https://github.com/apache/spark/pull/28931#issuecomment-658553220


   Hi, @gatorsmile. Technically, this only handles `host/port` parsing inside 
the `core` module. I'm sure that this is a meaningful step inside Spark. However, 
we haven't tested anything on IPv6. As with JDK11, I expect lots of hurdles both 
inside and outside Spark.






[GitHub] [spark] dongjoon-hyun commented on pull request #28931: [SPARK-32103][CORE] Support IPv6 host/port in core module

2020-07-14 Thread GitBox


dongjoon-hyun commented on pull request #28931:
URL: https://github.com/apache/spark/pull/28931#issuecomment-658553220


   Hi, @gatorsmile. Technically, this only handles `host/port` parsing inside 
the `core` module. I'm sure that this is a meaningful step inside Spark. 
However, we haven't tested anything on IPv6. As with JDK11, I expect lots of hurdles 
both inside and outside Spark.






[GitHub] [spark] adjordan edited a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox


adjordan edited a comment on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658547236


   @viirya Sorry, can you explain? I don't see how it changes the technique; it 
just allows models from multiple folds to be run in parallel. `MLUtils.kFold` 
does k-fold cross-validation, not repeated random sub-sampling validation, 
right?
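For reference, `MLUtils.kFold` builds each fold from a Bernoulli cell sampler: every element gets a uniform random draw in [0, 1), fold i validates on the elements whose draw falls in [i/k, (i+1)/k), and training uses the complement under the same seed, so the validation sets are disjoint and together cover the data. The pure-Scala sketch below mimics that idea on a local `Seq`; `CellKFold.folds` is a hypothetical name, and it does not reproduce Spark's per-partition seeding.

```scala
import scala.util.Random

// Hypothetical helper mimicking cell-based k-fold splitting on a local Seq.
object CellKFold {
  // Returns k (train, validation) pairs. One random draw per element is reused
  // for every fold, so the validation cells [i/k, (i+1)/k) never overlap.
  def folds[T](data: Seq[T], k: Int, seed: Long): Seq[(Seq[T], Seq[T])] = {
    require(k > 1, "k must be at least 2")
    val rng = new Random(seed)
    val draws = data.map(x => (x, rng.nextDouble()))
    (0 until k).map { i =>
      val lb = i.toDouble / k
      val ub = (i + 1).toDouble / k
      val inCell = (d: Double) => d >= lb && d < ub
      val validation = draws.collect { case (x, d) if inCell(d) => x }
      val train = draws.collect { case (x, d) if !inCell(d) => x }
      (train, validation)
    }
  }
}
```

Because the draws are fixed before any fold is built, every element is validated exactly once, which is the defining property of k-fold cross-validation; fold sizes are only approximately equal.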






[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox


dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658550248


   Very sorry, guys. Due to the above regression, I'll revert this commit 
urgently. We can rethink this PR.






[GitHub] [spark] maropu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-14 Thread GitBox


maropu commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r454795948



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkScriptTransformationExec.scala
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import java.io._
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.conf.Configuration
+
+import org.apache.spark.TaskContext
+import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical.ScriptInputOutputSchema
+import org.apache.spark.sql.types._
+import org.apache.spark.util.{CircularBuffer, RedirectThread}
+
+/**
+ * Transforms the input by forking and running the specified script.
+ *
+ * @param input the set of expression that should be passed to the script.
+ * @param script the command that should be executed.
+ * @param output the attributes that are produced by the script.
+ */
+case class SparkScriptTransformationExec(
+input: Seq[Expression],
+script: String,
+output: Seq[Attribute],
+child: SparkPlan,
+ioschema: SparkScriptIOSchema)
+  extends BaseScriptTransformationExec {
+
+  override def processIterator(inputIterator: Iterator[InternalRow], hadoopConf: Configuration)
+  : Iterator[InternalRow] = {
+val cmd = List("/bin/bash", "-c", script)

Review comment:
   It seems the implementation of `processIterator` is very similar to 
the Hive one. Could we share more of the code between them?
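The operator's doc comment describes transforming the input by forking `/bin/bash -c script` and streaming rows through it, with a separate writer thread feeding the script's stdin (the `BaseScriptTransformationWriterThread` in the diff). A stripped-down sketch of that shape, using only `java.lang.ProcessBuilder`, may help when deciding which parts are generic enough to share. `ScriptPipe.transform` is hypothetical: it works on plain strings instead of `InternalRow`, and it sends stderr to the parent process rather than to a `CircularBuffer`.

```scala
import java.io.{BufferedReader, InputStreamReader, OutputStreamWriter}
import java.nio.charset.StandardCharsets
import scala.collection.mutable.ArrayBuffer

// Hypothetical sketch of script transformation on plain strings, not Spark's operator.
object ScriptPipe {
  def transform(lines: Seq[String], script: String): Seq[String] = {
    val proc = new ProcessBuilder("/bin/bash", "-c", script)
      .redirectError(ProcessBuilder.Redirect.INHERIT) // stderr goes to the parent here
      .start()
    // Feed stdin from a separate thread so a script that emits output before it
    // has consumed all of its input cannot deadlock against the reader below.
    val writer = new Thread(new Runnable {
      override def run(): Unit = {
        val out = new OutputStreamWriter(proc.getOutputStream, StandardCharsets.UTF_8)
        try lines.foreach(line => out.write(line + "\n"))
        finally out.close()
      }
    })
    writer.start()
    val reader = new BufferedReader(new InputStreamReader(proc.getInputStream, StandardCharsets.UTF_8))
    val result = ArrayBuffer.empty[String]
    try {
      var line = reader.readLine()
      while (line != null) {
        result += line
        line = reader.readLine()
      }
    } finally reader.close()
    writer.join()
    val exitCode = proc.waitFor()
    require(exitCode == 0, s"script exited with code $exitCode")
    result.toList
  }
}
```

Only the row serialization and deserialization differ between a Hive-serde and a plain-text implementation of this loop; the process and thread plumbing is the same.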








[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox


dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658549984


   **AFTER SPARK-32276**
   ```
   scala> scala.util.Random.shuffle((1 to 100000).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
   
   scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/SPARK-32276")
   ```
   
   ```
   $ ls -al /tmp/SPARK-32276/
   total 632
   drwxr-xr-x  10 dongjoon  wheel     320 Jul 14 22:08 ./
   drwxrwxrwt  14 root      wheel     448 Jul 14 22:08 ../
   -rw-r--r--   1 dongjoon  wheel       8 Jul 14 22:08 ._SUCCESS.crc
   -rw-r--r--   1 dongjoon  wheel      12 Jul 14 22:08 .part-00000-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel    1188 Jul 14 22:08 .part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel       0 Jul 14 22:08 _SUCCESS
   -rw-r--r--   1 dongjoon  wheel     119 Jul 14 22:08 part-00000-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  150735 Jul 14 22:08 part-00043-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  150741 Jul 14 22:08 part-00191-ba5049f9-b835-49b7-9fdb-bdd11b9891cb-c000.snappy.orc
   ```
   
   **BEFORE**
   ```
   scala> scala.util.Random.shuffle((1 to 100000).map(x => (x % 2, x))).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
   
   scala> sql("select * from (select * from t order by b) distribute by a").write.orc("/tmp/master")
   ```
   
   ```
   $ ls -al /tmp/master/
   total 56
   drwxr-xr-x  10 dongjoon  wheel  320 Jul 14 22:12 ./
   drwxrwxrwt  15 root      wheel  480 Jul 14 22:12 ../
   -rw-r--r--   1 dongjoon  wheel    8 Jul 14 22:12 ._SUCCESS.crc
   -rw-r--r--   1 dongjoon  wheel   12 Jul 14 22:12 .part-00000-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 .part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel   16 Jul 14 22:12 .part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc.crc
   -rw-r--r--   1 dongjoon  wheel    0 Jul 14 22:12 _SUCCESS
   -rw-r--r--   1 dongjoon  wheel  119 Jul 14 22:12 part-00000-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  932 Jul 14 22:12 part-00043-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
   -rw-r--r--   1 dongjoon  wheel  939 Jul 14 22:12 part-00191-2cd3a50e-eded-49a4-b7cf-94e3f090b8c1-c000.snappy.orc
   ```






[GitHub] [spark] maropu commented on a change in pull request #29085: [SPARK-32106][SQL]Implement SparkScriptTransformationExec in sql/core

2020-07-14 Thread GitBox


maropu commented on a change in pull request #29085:
URL: https://github.com/apache/spark/pull/29085#discussion_r454780673



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala
##
@@ -87,17 +90,60 @@ trait BaseScriptTransformationExec extends UnaryExecNode {
   }
 }
   }
+
+  def wrapper(data: String, dt: DataType): Any = {
+dt match {
+  case StringType => data
+  case ByteType => JavaUtils.stringToBytes(data)
+  case IntegerType => data.toInt
+  case ShortType => data.toShort
+  case LongType => data.toLong
+  case FloatType => data.toFloat
+  case DoubleType => data.toDouble
+  case dt: DecimalType => BigDecimal(data)
+  case DateType if conf.datetimeJava8ApiEnabled =>
+DateTimeUtils.stringToDate(
+  UTF8String.fromString(data),
+  DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+  .map(DateTimeUtils.daysToLocalDate).orNull
+  case DateType =>
+DateTimeUtils.stringToDate(
+  UTF8String.fromString(data),
+  DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+  .map(DateTimeUtils.toJavaDate).orNull
+  case TimestampType if conf.datetimeJava8ApiEnabled =>
+DateTimeUtils.stringToTimestamp(
+  UTF8String.fromString(data),
+  DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+  .map(DateTimeUtils.microsToInstant).orNull
+  case TimestampType =>
+DateTimeUtils.stringToTimestamp(
+  UTF8String.fromString(data),
+  DateTimeUtils.getZoneId(conf.sessionLocalTimeZone))
+  .map(DateTimeUtils.toJavaTimestamp).orNull
+  case CalendarIntervalType => IntervalUtils.stringToInterval(UTF8String.fromString(data))
+  case dataType: DataType => data
+}
+  }
 }
 
-abstract class BaseScriptTransformationWriterThread(
-iter: Iterator[InternalRow],
-inputSchema: Seq[DataType],
-ioSchema: BaseScriptTransformIOSchema,
-outputStream: OutputStream,
-proc: Process,
-stderrBuffer: CircularBuffer,
-taskContext: TaskContext,
-conf: Configuration) extends Thread with Logging {
+abstract class BaseScriptTransformationWriterThread extends Thread with Logging {
+
+  def iter: Iterator[InternalRow]
+
+  def inputSchema: Seq[DataType]
+
+  def ioSchema: BaseScriptTransformIOSchema
+
+  def outputStream: OutputStream
+
+  def proc: Process
+
+  def stderrBuffer: CircularBuffer
+
+  def taskContext: TaskContext
+
+  def conf: Configuration

Review comment:
   nit: I don't think we need the blank lines between these members, e.g.:
   ```
 def inputRowFormat: Seq[(String, String)]
 def outputRowFormat: Seq[(String, String)]
 def inputSerdeClass: Option[String]
 def outputSerdeClass: Option[String]
 def inputSerdeProps: Seq[(String, String)]
 def outputSerdeProps: Seq[(String, String)]
 def recordReaderClass: Option[String]
 def recordWriterClass: Option[String]
 def schemaLess: Boolean
   ```

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/BaseScriptTransformationExec.scala
##
@@ -87,17 +90,60 @@ trait BaseScriptTransformationExec extends UnaryExecNode {
   }
 }
   }
+
+  def wrapper(data: String, dt: DataType): Any = {

Review comment:
   This can be `protected`.

##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/SparkScriptTransformationExec.scala
##
@@ -0,0 +1,187 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.sql.execution
+
+import java.io._
+import java.nio.charset.StandardCharsets
+
+import scala.collection.JavaConverters._
+import scala.util.control.NonFatal
+
+import org.apache.hadoop.conf.Configuration
+
+import org.apache.spark.TaskContext
+import org.apache.spark.sql.catalyst.{CatalystTypeConverters, InternalRow}
+import org.apache.spark.sql.catalyst.expressions._
+import org.apache.spark.sql.catalyst.plans.logical.ScriptInputOutputSchema
+import org.apache.spark.sql.types._
+import org.apache.spark.util.{CircularBuffer, RedirectThread}
+
+/**
+ * Transforms the input by forking and running the specified script.
+ *
+ * @param input the set of expression that should be p

[GitHub] [spark] adjordan edited a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox


adjordan edited a comment on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658547236


   @viirya Sorry, can you explain? I don't see how it changes the technique; it 
just allows models from multiple folds to be run in parallel.






[GitHub] [spark] adjordan commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox


adjordan commented on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658547236


   @viirya Sorry, can you explain? I don't see how it changes anything; it just 
allows models from multiple folds to be run in parallel.






[GitHub] [spark] srowen commented on a change in pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-14 Thread GitBox


srowen commented on a change in pull request #29111:
URL: https://github.com/apache/spark/pull/29111#discussion_r454792607



##
File path: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala
##
@@ -76,7 +76,7 @@ abstract class Estimator[M <: Model[M]] extends PipelineStage {
* @return fitted models, matching the input parameter maps
*/
   @Since("2.0.0")
-  def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[M] = {
+  def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[M] = {

Review comment:
   Yeah, this fixes the weird compile error (Arrays + generic types are 
stricter in Scala 2.13), though I don't directly see what it has to do with type 
M. Still, this is an API change, so I think MiMa will fail and I'll need 
another workaround for _that_. This is an obscure method that isn't even called 
by tests, AFAICT, so I'm not sure it has any coverage at all.
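One plausible reading of the "what it has to do with type M" puzzle, assuming the method body maps over the array as in `paramMaps.map(fit(dataset, _))` (the body is not shown in this hunk): in Scala 2.13, `Array#map` builds a new `Array[M]` and therefore needs an implicit `ClassTag[M]`, which a generic `M` does not provide. Mapping over a `Seq` copy of the array has no such requirement, which might allow keeping the `Array[ParamMap]` signature and avoiding the MiMa break. `ParamSet` and `FitAll` are hypothetical stand-ins, not Spark types.

```scala
// Hypothetical stand-in for ParamMap.
final case class ParamSet(id: Int)

object FitAll {
  // Keeps an Array parameter but maps over a Seq copy of it. In Scala 2.13,
  // `paramMaps.map(fitOne)` would produce an Array[M] and require an implicit
  // ClassTag[M]; mapping over `toSeq` builds a Seq[M], so no ClassTag is needed.
  def fitAll[M](paramMaps: Array[ParamSet])(fitOne: ParamSet => M): Seq[M] =
    paramMaps.toSeq.map(fitOne)
}
```

This compiles the same way on Scala 2.12 and 2.13, at the cost of one extra copy of a typically small array.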








[GitHub] [spark] srowen commented on pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-14 Thread GitBox


srowen commented on pull request #29111:
URL: https://github.com/apache/spark/pull/29111#issuecomment-658546568


   I think I understand the last test failures, will fix too.






[GitHub] [spark] MaxGekk commented on pull request #27366: [SPARK-30648][SQL] Support filters pushdown in JSON datasource

2020-07-14 Thread GitBox


MaxGekk commented on pull request #27366:
URL: https://github.com/apache/spark/pull/27366#issuecomment-658546141


   @cloud-fan Is there anything else I should do to get this PR merged?






[GitHub] [spark] stczwd commented on a change in pull request #29088: [SPARK-32289][SQL] Some characters are garbled when opening csv files with Excel

2020-07-14 Thread GitBox


stczwd commented on a change in pull request #29088:
URL: https://github.com/apache/spark/pull/29088#discussion_r454791986



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CsvOutputWriter.scala
##
@@ -39,6 +39,10 @@ class CsvOutputWriter(
 
   private val gen = new UnivocityGenerator(dataSchema, writer, params)
 
+  if (params.bom) {
+writer.write(0xFEFF)

Review comment:
   Excel. It will change the actual value if we add `0xFEFF` at the front.
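For context, this is what `writer.write(0xFEFF)` produces when the writer encodes UTF-8: `Writer.write(int)` writes the single character U+FEFF, which UTF-8 encodes as the three bytes EF BB BF, the byte order mark Excel looks for. A reader that does not strip the BOM sees it as part of the first value, which is presumably the concern here. The sketch assumes a UTF-8 `OutputStreamWriter`; `BomDemo` is a hypothetical name.

```scala
import java.io.{ByteArrayOutputStream, OutputStreamWriter}
import java.nio.charset.StandardCharsets

// Hypothetical demo of writing a UTF-8 BOM before CSV content.
object BomDemo {
  // Writes an optional BOM followed by CSV text through a UTF-8 writer and returns the bytes.
  def encode(csv: String, bom: Boolean): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    val writer = new OutputStreamWriter(bytes, StandardCharsets.UTF_8)
    try {
      if (bom) writer.write(0xFEFF) // a single char, U+FEFF
      writer.write(csv)
    } finally writer.close()
    bytes.toByteArray
  }

  // Java's UTF-8 decoder does not strip a BOM, so U+FEFF stays glued to the first field.
  def firstField(bytes: Array[Byte]): String =
    new String(bytes, StandardCharsets.UTF_8).split(",", -1).head
}
```

This is why a BOM is usually opt-in: it helps Excel detect the encoding, but any consumer that does not expect it sees a changed first value.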








[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475


   To generate small final Parquet/ORC files, we do the above tricks, don't we? 
This may cause a regression on the size of output storage.
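Assuming the "tricks" mean ordering rows so that equal values sit next to each other in each output file, the size argument can be illustrated with a toy measure. The sketch counts runs of equal adjacent values in a column, a rough proxy for how well run-length encoding can do. Parquet and ORC also use dictionary encoding, bit-packing and general-purpose codecs, so this is not how their file sizes are actually computed; `countRuns` is a hypothetical helper.

```scala
// Hypothetical helper: number of runs of equal adjacent values in a column.
// Fewer runs means run-length encoding has more to work with.
def countRuns[T](column: Seq[T]): Int =
  if (column.isEmpty) 0
  else 1 + column.zip(column.drop(1)).count { case (x, y) => x != y }

val column = Seq("CA", "NY", "CA", "TX", "NY", "CA", "TX", "NY")

val unsortedRuns = countRuns(column)      // every neighbour differs
val sortedRuns = countRuns(column.sorted) // one run per distinct value
```

Removing a sort that clustered equal values can therefore grow the written files even though the query result is unchanged.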






[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475


   To generate small final Parquet/ORC files, we do the above tricks, don't we? 
This PR may cause a regression on the size of output storage.






[GitHub] [spark] dongjoon-hyun edited a comment on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox


dongjoon-hyun edited a comment on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475


   To generate small final Parquet/ORC files, we do the above tricks, don't we?






[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox


dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658544475


   To generate small Parquet/ORC files, we do the above tricks, don't we?






[GitHub] [spark] warrenzhu25 edited a comment on pull request #29044: [WIP][SPARK-32227] Fix regression bug in load-spark-env.cmd with Spark 3.0.0

2020-07-14 Thread GitBox


warrenzhu25 edited a comment on pull request #29044:
URL: https://github.com/apache/spark/pull/29044#issuecomment-656771107


   > It's directly relevant to this PR because your patch is changing 
`environment` variable.
   > 
   > * Please see this for the detail (https://github.com/cdarlint/winutils)
   > * You can run AppVeyor in your Spark fork, too.
   
   winutils is only affected by PATH and HADOOP_HOME, and I don't touch either of them. 
Also, my change just reverts to the 2.4.4 version. Could you help 
rerun the tests?






[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox


dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658543717


   Oops. Sorry, guys. It seems that I missed something during testing. For the 
following case, we should not remove `Sort`.
   
   **BEFORE THIS PR**
   ```scala
   scala> Seq((1,10),(1,20),(2,30),(2,40)).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
   
   scala> sql("select * from (select * from t order by b desc) distribute by a").show()
   +---+---+
   |  a|  b|
   +---+---+
   |  1| 20|
   |  1| 10|
   |  2| 40|
   |  2| 30|
   +---+---+
   ```
   
   **AFTER THIS PR**
   ```scala
   scala> Seq((1,10),(1,20),(2,30),(2,40)).toDF("a", "b").repartition(2).createOrReplaceTempView("t")
   
   scala> sql("select * from (select * from t order by b desc) distribute by a").show()
   +---+---+
   |  a|  b|
   +---+---+
   |  1| 10|
   |  1| 20|
   |  2| 30|
   |  2| 40|
   +---+---+
   ```
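Why dropping the `Sort` changes the result can be modelled without Spark. This is a simplified, single-process sketch under two assumptions: rows are routed to partitions in input order, and each partition keeps the order in which it received rows. Partitioning uses `floorMod(a, numPartitions)` rather than Spark's own hash function, and `Record` and `hashPartition` are hypothetical names. With the sort kept, rows inside each partition come out in `b desc` order, as in the BEFORE output; without it they keep the input order, as in the AFTER output.

```scala
// Hypothetical row type standing in for the (a, b) columns of table t.
case class Record(a: Int, b: Int)

// Simplified model of "distribute by a": each partition receives its rows
// in input order and keeps that order. Real Spark shuffles do not
// guarantee this in general.
def hashPartition(rows: Seq[Record], numPartitions: Int): Seq[Seq[Record]] =
  (0 until numPartitions).map { p =>
    rows.filter(r => Math.floorMod(r.a, numPartitions) == p)
  }

val input = Seq(Record(1, 10), Record(1, 20), Record(2, 30), Record(2, 40))

// Sort kept: order by b desc, then distribute by a.
val withSort = hashPartition(input.sortBy(-_.b), 2)
// Sort removed: distribute by a directly.
val withoutSort = hashPartition(input, 2)
```

Even in this idealized model the within-partition order is decided by the upstream sort, so a rule that removes it is only safe when nothing downstream can observe that order.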
   






[GitHub] [spark] warrenzhu25 commented on pull request #28942: [SPARK-32125][UI] Support get taskList by status in Web UI and SHS Rest API

2020-07-14 Thread GitBox


warrenzhu25 commented on pull request #28942:
URL: https://github.com/apache/spark/pull/28942#issuecomment-658543670


   @gengliangwang Tests passed, could you help merge this?






[GitHub] [spark] HyukjinKwon opened a new pull request #29117: [WIP] Debug flaky pip installation test failure

2020-07-14 Thread GitBox


HyukjinKwon opened a new pull request #29117:
URL: https://github.com/apache/spark/pull/29117


   ### What changes were proposed in this pull request?
   
   TBD
   
   ### Why are the changes needed?
   
   TBD
   
   ### Does this PR introduce _any_ user-facing change?
   
   TBD
   
   ### How was this patch tested?
   
   TBD
   






[GitHub] [spark] HeartSaVioR closed pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode

2020-07-14 Thread GitBox


HeartSaVioR closed pull request #29077:
URL: https://github.com/apache/spark/pull/29077


   






[GitHub] [spark] HeartSaVioR commented on pull request #29077: [SPARK-31985][SS] Remove incomplete/undocumented stateful aggregation in continuous mode

2020-07-14 Thread GitBox


HeartSaVioR commented on pull request #29077:
URL: https://github.com/apache/spark/pull/29077#issuecomment-658539797


   Thanks for the review and the kind words :) I'll take care of merging.






[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #29111:
URL: https://github.com/apache/spark/pull/29111#discussion_r454784921



##
File path: mllib/src/main/scala/org/apache/spark/ml/Estimator.scala
##
@@ -76,7 +76,7 @@ abstract class Estimator[M <: Model[M]] extends PipelineStage {
    * @return fitted models, matching the input parameter maps
    */
   @Since("2.0.0")
-  def fit(dataset: Dataset[_], paramMaps: Array[ParamMap]): Seq[M] = {
+  def fit(dataset: Dataset[_], paramMaps: Seq[ParamMap]): Seq[M] = {

Review comment:
   cc @mengxr and @gatorsmile








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #29111: [SPARK-29292][SQL][ML] Update rest of default modules (Hive, ML, etc) for Scala 2.13 compilation

2020-07-14 Thread GitBox


dongjoon-hyun commented on a change in pull request #29111:
URL: https://github.com/apache/spark/pull/29111#discussion_r454784282



##
File path: examples/src/main/scala/org/apache/spark/examples/SparkKMeans.scala
##
@@ -102,5 +102,10 @@ object SparkKMeans {
 kPoints.foreach(println)
 spark.stop()
   }
+
+  private def mergeResults(a: (Vector[Double], Int),
+   b: (Vector[Double], Int)): (Vector[Double], Int) = {

Review comment:
   nit. Indentation?
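The diff only shows the signature of `mergeResults`. Below is a sketch of one plausible body, laid out the way the Databricks Scala style guide (which Spark follows) recommends for wrapped method parameters, with a four-space indent. It assumes `mergeResults` combines partial (sum, count) pairs; `scala.Vector` stands in for the `Vector` type `SparkKMeans` actually imports, the body is hypothetical, and `private` is dropped so the function can be called standalone.

```scala
// Hypothetical body: the diff only shows the signature. scala.Vector stands
// in for the Vector type that SparkKMeans imports.
def mergeResults(
    a: (Vector[Double], Int),
    b: (Vector[Double], Int)): (Vector[Double], Int) = {
  // Element-wise sum of the two partial point sums...
  val summed = a._1.zip(b._1).map { case (x, y) => x + y }
  // ...paired with the combined number of points.
  (summed, a._2 + b._2)
}
```

Because both the element-wise sum and the count sum are associative and commutative, a function of this shape is the kind of combiner `reduceByKey` expects.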








[GitHub] [spark] aokolnychyi commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox


aokolnychyi commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658538432


   Thanks, everyone!






[GitHub] [spark] dongjoon-hyun commented on pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox


dongjoon-hyun commented on pull request #29089:
URL: https://github.com/apache/spark/pull/29089#issuecomment-658538140


   Also, cc @gatorsmile and @cloud-fan 






[GitHub] [spark] SparkQA removed a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox


SparkQA removed a comment on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658519469


   **[Test build #125874 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125874/testReport)**
 for PR 29080 at commit 
[`6dd0a4d`](https://github.com/apache/spark/commit/6dd0a4d9a2157086ef33bd810f9e250114b33c7d).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658536762


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125866/
   Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658537135










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658537135


   Merged build finished. Test PASSed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658537137


   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125874/
   Test PASSed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658536619










[GitHub] [spark] SparkQA commented on pull request #29080: [SPARK-32271][ML] Update CrossValidator to train folds in parallel

2020-07-14 Thread GitBox


SparkQA commented on pull request #29080:
URL: https://github.com/apache/spark/pull/29080#issuecomment-658536994


   **[Test build #125874 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125874/testReport)**
 for PR 29080 at commit 
[`6dd0a4d`](https://github.com/apache/spark/commit/6dd0a4d9a2157086ef33bd810f9e250114b33c7d).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658536613










[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658536758










[GitHub] [spark] SparkQA removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox


SparkQA removed a comment on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658491516


   **[Test build #125865 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125865/testReport)**
 for PR 29114 at commit 
[`5630999`](https://github.com/apache/spark/commit/5630999689a555f5e026cabe5f7c200ff8b24256).






[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658536691










[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658536613


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox


SparkQA commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658536417


   **[Test build #125865 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125865/testReport)**
 for PR 29114 at commit 
[`5630999`](https://github.com/apache/spark/commit/5630999689a555f5e026cabe5f7c200ff8b24256).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] dongjoon-hyun closed pull request #29089: [SPARK-32276][SQL] Remove redundant sorts before repartition nodes

2020-07-14 Thread GitBox


dongjoon-hyun closed pull request #29089:
URL: https://github.com/apache/spark/pull/29089


   






[GitHub] [spark] SparkQA commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox


SparkQA commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658535423


   **[Test build #125878 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125878/testReport)**
 for PR 29114 at commit 
[`465fd8a`](https://github.com/apache/spark/commit/465fd8a5f4773c3fee69df9c5cf8d3ad57160d03).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658534819


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125867/
   Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658534813










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658534813


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox


SparkQA removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658493500


   **[Test build #125867 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125867/testReport)**
 for PR 28708 at commit 
[`fe5ba7b`](https://github.com/apache/spark/commit/fe5ba7befc243a30377b0d3057ec3862726db2d3).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658503907


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/30475/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-07-14 Thread GitBox


SparkQA commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-658534225


   **[Test build #125867 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125867/testReport)**
 for PR 28708 at commit 
[`fe5ba7b`](https://github.com/apache/spark/commit/fe5ba7befc243a30377b0d3057ec3862726db2d3).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658533895










[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658533895










[GitHub] [spark] AmplabJenkins removed a comment on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-658533186


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/125863/
   Test FAILed.






[GitHub] [spark] HyukjinKwon commented on pull request #29116: [SPARK-32316][TESTS][INFRA] Test PySpark with Python 3.8 in Github Actions

2020-07-14 Thread GitBox


HyukjinKwon commented on pull request #29116:
URL: https://github.com/apache/spark/pull/29116#issuecomment-658533425


   Thanks, @dongjoon-hyun 






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-658533182


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-14 Thread GitBox


SparkQA removed a comment on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-658485359


   **[Test build #125863 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125863/testReport)** for PR 28848 at commit [`0e00862`](https://github.com/apache/spark/commit/0e0086288f6279569e8a11cef9d928b87c40469b).




[GitHub] [spark] AmplabJenkins commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-658533182








[GitHub] [spark] SparkQA commented on pull request #28848: [SPARK-32003][CORE] When external shuffle service is used, unregister outputs for executor on fetch failure after executor is lost

2020-07-14 Thread GitBox


SparkQA commented on pull request #28848:
URL: https://github.com/apache/spark/pull/28848#issuecomment-658532861


   **[Test build #125863 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125863/testReport)** for PR 28848 at commit [`0e00862`](https://github.com/apache/spark/commit/0e0086288f6279569e8a11cef9d928b87c40469b).
* This patch **fails PySpark pip packaging tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox


AmplabJenkins removed a comment on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658529664








[GitHub] [spark] AmplabJenkins commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox


AmplabJenkins commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658529664








[GitHub] [spark] SparkQA commented on pull request #29114: [SPARK-32094][PYTHON] Update cloudpickle to v1.5.0

2020-07-14 Thread GitBox


SparkQA commented on pull request #29114:
URL: https://github.com/apache/spark/pull/29114#issuecomment-658529122


   **[Test build #125877 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/125877/testReport)** for PR 29114 at commit [`bdf31a8`](https://github.com/apache/spark/commit/bdf31a8035ae15c4fb496df173e408453c0ec2a4).


