[GitHub] [spark] SparkQA commented on pull request #30260: [SPARK-33354][SQL] New explicit cast syntax rules in ANSI mode

2020-11-04 Thread GitBox


SparkQA commented on pull request #30260:
URL: https://github.com/apache/spark/pull/30260#issuecomment-722210109


   **[Test build #130642 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130642/testReport)**
 for PR 30260 at commit 
[`fe0fe48`](https://github.com/apache/spark/commit/fe0fe4872c54550b44ab9458edb63d91d78b2ae0).



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhouyejoe commented on a change in pull request #30163: [SPARK-32918][SHUFFLE] RPC implementation to support control plane coordination for push-based shuffle

2020-11-04 Thread GitBox


zhouyejoe commented on a change in pull request #30163:
URL: https://github.com/apache/spark/pull/30163#discussion_r517852660



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/ExternalBlockStoreClient.java
##
@@ -158,6 +158,42 @@ public void pushBlocks(
 }
   }
 
+  /**
+   * Invoked by Spark driver to notify external shuffle services to finalize 
the shuffle merge
+   * for a given shuffle. This allows the driver to start the shuffle reducer 
stage after properly
+   * finishing the shuffle merge process associated with the shuffle mapper 
stage.
+   *
+   * @param host host of shuffle server
+   * @param port port of shuffle server.
+   * @param shuffleId shuffle ID of the shuffle to be finalized
+   * @param listener the listener to receive MergeStatuses
+   */
+  public void finalizeShuffleMerge(

Review comment:
   Good suggestion. I will try it out and update the PR accordingly.








[GitHub] [spark] wzhfy edited a comment on pull request #30248: [SPARK-33339][PYTHON] Pyspark application maybe hangup because of wor…

2020-11-04 Thread GitBox


wzhfy edited a comment on pull request #30248:
URL: https://github.com/apache/spark/pull/30248#issuecomment-722185651


   @li36909 Could you update the description? Currently it's misleading. 
   
   @HyukjinKwon Actually `system.exit` can happen when a user writes some 
incorrect code (e.g. trying to create a SparkContext in an executor task while no 
SPARK_HOME is available on that node, or the udf case @li36909 mentioned).   
   We have found several such cases from unskilled PySpark customers.  
   
   It would be better for Spark to throw an exception to the user, rather than 
hanging and leaving the user unsure what to do.
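
   The failure mode described above can be sketched in plain Python. The wrapper 
name and error message below are hypothetical illustrations, not the actual 
PySpark worker code:

   ```python
   def run_user_code(func, value):
       # Hypothetical sketch: wrap user code so a bare sys.exit() inside a
       # UDF surfaces as an ordinary exception instead of silently killing
       # the worker process and hanging the application.
       try:
           return func(value)
       except SystemExit as e:
           raise RuntimeError(
               "user code called sys.exit(%s) inside a task" % e.code)
   ```

   With such a wrapper, the driver would see a normal task failure rather than 
a hung worker.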






[GitHub] [spark] gengliangwang opened a new pull request #30260: [SPARK-33354][SQL] New explicit cast syntax rules in ANSI mode

2020-11-04 Thread GitBox


gengliangwang opened a new pull request #30260:
URL: https://github.com/apache/spark/pull/30260


   
   
   ### What changes were proposed in this pull request?
   
   In section 6.13 of the ANSI SQL standard, there are syntax rules for valid 
combinations of the source and target data types.
   
![image](https://user-images.githubusercontent.com/1097932/98212874-17356f80-1ef9-11eb-8f2b-385f32db404a.png)
   
   To make Spark's ANSI mode more ANSI SQL compatible, I propose to disallow the 
following castings in ANSI mode:
   ```
   TimeStamp <=> Boolean
   Date <=> Boolean
   Numeric <=> Timestamp
   Numeric <=> Date
   Numeric <=> Binary
   String <=> Array
   String <=> Map
   String <=> Struct
   ```
   The following castings are considered invalid in the ANSI SQL standard, but 
they are quite straightforward. Let's allow them for now:
   ```
   Numeric <=> Boolean
   String <=> Boolean
   String <=> Binary
   ```
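   
   The two lists above can be read as a small compatibility matrix. As a rough 
sketch (not Spark's actual implementation), the proposal amounts to:
   
   ```python
   # Hypothetical sketch of the proposed ANSI-mode explicit-cast rules,
   # encoded as a table of disallowed source/target type pairs.
   DISALLOWED = {
       frozenset(p) for p in [
           ("timestamp", "boolean"), ("date", "boolean"),
           ("numeric", "timestamp"), ("numeric", "date"),
           ("numeric", "binary"),
           ("string", "array"), ("string", "map"), ("string", "struct"),
       ]
   }

   def cast_allowed_in_ansi_mode(src, dst):
       # The "<=>" in the lists above means both directions are banned,
       # so a frozenset lookup covers src->dst and dst->src alike.
       return frozenset((src, dst)) not in DISALLOWED
   ```
   
   So a cast between Timestamp and Boolean would be rejected, while 
Numeric to Boolean stays allowed for now.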
   ### Why are the changes needed?
   
   Better ANSI SQL compliance
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the following castings will not be allowed in ANSI mode:
   ```
   TimeStamp <=> Boolean
   Date <=> Boolean
   Numeric <=> Timestamp
   Numeric <=> Date
   Numeric <=> Binary
   String <=> Array
   String <=> Map
   String <=> Struct
   ```
   
   ### How was this patch tested?
   
   Unit test






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30252: [MINOR][SS][DOCS] Update join type in stream static joins code examples

2020-11-04 Thread GitBox


AmplabJenkins removed a comment on pull request #30252:
URL: https://github.com/apache/spark/pull/30252#issuecomment-722203652


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35251/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #30252: [MINOR][SS][DOCS] Update join type in stream static joins code examples

2020-11-04 Thread GitBox


SparkQA commented on pull request #30252:
URL: https://github.com/apache/spark/pull/30252#issuecomment-722203628


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35251/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30252: [MINOR][SS][DOCS] Update join type in stream static joins code examples

2020-11-04 Thread GitBox


AmplabJenkins removed a comment on pull request #30252:
URL: https://github.com/apache/spark/pull/30252#issuecomment-722203647


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins commented on pull request #30252: [MINOR][SS][DOCS] Update join type in stream static joins code examples

2020-11-04 Thread GitBox


AmplabJenkins commented on pull request #30252:
URL: https://github.com/apache/spark/pull/30252#issuecomment-722203647










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30253: [SPARK-33162][INFRA][3.0] Use pre-built image at GitHub Action PySpark jobs

2020-11-04 Thread GitBox


AmplabJenkins removed a comment on pull request #30253:
URL: https://github.com/apache/spark/pull/30253#issuecomment-722202289










[GitHub] [spark] AmplabJenkins commented on pull request #30253: [SPARK-33162][INFRA][3.0] Use pre-built image at GitHub Action PySpark jobs

2020-11-04 Thread GitBox


AmplabJenkins commented on pull request #30253:
URL: https://github.com/apache/spark/pull/30253#issuecomment-722202289










[GitHub] [spark] SparkQA removed a comment on pull request #30253: [SPARK-33162][INFRA][3.0] Use pre-built image at GitHub Action PySpark jobs

2020-11-04 Thread GitBox


SparkQA removed a comment on pull request #30253:
URL: https://github.com/apache/spark/pull/30253#issuecomment-722132309


   **[Test build #130635 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130635/testReport)**
 for PR 30253 at commit 
[`80f5b4b`](https://github.com/apache/spark/commit/80f5b4b645ff31e70ebceb7532b526baf2564c5f).






[GitHub] [spark] SparkQA commented on pull request #30253: [SPARK-33162][INFRA][3.0] Use pre-built image at GitHub Action PySpark jobs

2020-11-04 Thread GitBox


SparkQA commented on pull request #30253:
URL: https://github.com/apache/spark/pull/30253#issuecomment-722200789


   **[Test build #130635 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130635/testReport)**
 for PR 30253 at commit 
[`80f5b4b`](https://github.com/apache/spark/commit/80f5b4b645ff31e70ebceb7532b526baf2564c5f).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] li36909 commented on pull request #30248: [SPARK-33339][PYTHON] Pyspark application maybe hangup because of wor…

2020-11-04 Thread GitBox


li36909 commented on pull request #30248:
URL: https://github.com/apache/spark/pull/30248#issuecomment-722200121


   > @li36909 Could you update the description? Currently it's misleading.
   > 
   > @HyukjinKwon Actually `system.exit` can happen when a user writes some 
incorrect code (e.g. trying to create a SparkContext in an executor task while no 
SPARK_HOME is available on that node, or the udf case @li36909 mentioned).
   > We have found several such cases from unskilled PySpark customers.
   > 
   > It would be better for Spark to throw an exception to the user, rather 
than hanging and leaving users unsure what to do.
   
   updated






[GitHub] [spark] Ngone51 edited a comment on pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-11-04 Thread GitBox


Ngone51 edited a comment on pull request #30062:
URL: https://github.com/apache/spark/pull/30062#issuecomment-722196568


   > I ran an end-to-end test. One of the bugs was:
   > 
   > ```
   > Caused by: java.lang.IndexOutOfBoundsException
   > at java.nio.ByteBuffer.wrap(ByteBuffer.java:375)
   > at 
org.sparkproject.io.netty.buffer.UnpooledHeapByteBuf.nioBuffer(UnpooledHeapByteBuf.java:306)
   > at 
org.apache.spark.network.protocol.Encoders$Bitmaps.encode(Encoders.java:65)
   > at 
org.apache.spark.network.shuffle.RemoteBlockPushResolver$AppShufflePartitionInfo.writeChunkTracker(RemoteBlockPushResolver.java:871)
   > at 
org.apache.spark.network.shuffle.RemoteBlockPushResolver$AppShufflePartitionInfo.updateChunkInfo(RemoteBlockPushResolver.java:845)
   > at 
org.apache.spark.network.shuffle.RemoteBlockPushResolver$PushBlockStreamCallback.onComplete(RemoteBlockPushResolver.java:653)
   > at 
org.apache.spark.network.server.TransportRequestHandler$3.onComplete(TransportRequestHandler.java:230)
   > ```
   > 
   > Fix this by adding `buf.ensureWritable(encodedLength)` in 
`Encoders.Bitmaps.encode`.
   
   Why would the error happen? Does it mean we calculate the wrong message 
length somewhere when allocating the buffer?
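
   The reported failure is the classic fixed-capacity-buffer problem: writing 
more bytes than the buffer was sized for. A minimal Python analogue (not the 
actual Netty `ByteBuf` API) of what `ensureWritable` guarantees:

   ```python
   class GrowableBuffer:
       # Minimal analogue of a byte buffer: writing past capacity raises,
       # unless ensure_writable() grows the backing storage first, which is
       # roughly what buf.ensureWritable(encodedLength) does for a ByteBuf.
       def __init__(self, capacity):
           self._data = bytearray(capacity)
           self._pos = 0

       def ensure_writable(self, n):
           # Grow the backing storage so n more bytes fit.
           needed = self._pos + n - len(self._data)
           if needed > 0:
               self._data.extend(bytes(needed))

       def write(self, payload):
           if self._pos + len(payload) > len(self._data):
               raise IndexError("write exceeds buffer capacity")
           self._data[self._pos:self._pos + len(payload)] = payload
           self._pos += len(payload)
   ```

   If the initial length estimate is wrong (the question raised above), the 
write fails exactly like the `ByteBuffer.wrap` trace; growing the buffer first 
avoids the failure regardless of the estimate.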






[GitHub] [spark] Ngone51 commented on pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-11-04 Thread GitBox


Ngone51 commented on pull request #30062:
URL: https://github.com/apache/spark/pull/30062#issuecomment-722196568


   > I ran an end-to-end test. One of the bugs was:
   > 
   > ```
   > Caused by: java.lang.IndexOutOfBoundsException
   > at java.nio.ByteBuffer.wrap(ByteBuffer.java:375)
   > at 
org.sparkproject.io.netty.buffer.UnpooledHeapByteBuf.nioBuffer(UnpooledHeapByteBuf.java:306)
   > at 
org.apache.spark.network.protocol.Encoders$Bitmaps.encode(Encoders.java:65)
   > at 
org.apache.spark.network.shuffle.RemoteBlockPushResolver$AppShufflePartitionInfo.writeChunkTracker(RemoteBlockPushResolver.java:871)
   > at 
org.apache.spark.network.shuffle.RemoteBlockPushResolver$AppShufflePartitionInfo.updateChunkInfo(RemoteBlockPushResolver.java:845)
   > at 
org.apache.spark.network.shuffle.RemoteBlockPushResolver$PushBlockStreamCallback.onComplete(RemoteBlockPushResolver.java:653)
   > at 
org.apache.spark.network.server.TransportRequestHandler$3.onComplete(TransportRequestHandler.java:230)
   > ```
   > 
   > Fix this by adding `buf.ensureWritable(encodedLength)` in 
`Encoders.Bitmaps.encode`.
   
   Why would the error happen? Does it mean we calculate the wrong message 
length when allocating the buffer somewhere?






[GitHub] [spark] wzhfy commented on a change in pull request #30248: [SPARK-33339][PYTHON] Pyspark application maybe hangup because of wor…

2020-11-04 Thread GitBox


wzhfy commented on a change in pull request #30248:
URL: https://github.com/apache/spark/pull/30248#discussion_r517839698



##
File path: python/pyspark/tests/test_worker.py
##
@@ -95,6 +95,14 @@ def raise_exception(_):
 self.assertRaises(Exception, lambda: rdd.foreach(raise_exception))
 self.assertEqual(100, rdd.map(str).count())
 
+def test_after_workerExit(self):

Review comment:
   `test_after_workerExit` -> `test_after_non_exception_error` ?








[GitHub] [spark] HeartSaVioR commented on pull request #30252: [MINOR][SS][DOCS] Update join type in stream static joins code examples

2020-11-04 Thread GitBox


HeartSaVioR commented on pull request #30252:
URL: https://github.com/apache/spark/pull/30252#issuecomment-722195809


   Thanks for the contribution! Merged into master, branch-3.0, branch-2.4 
respectively.
   
   * master: 
https://github.com/apache/spark/commit/e66201b30bc1f3da7284af14b32e5e6200768dbd
   * branch-3.0: 
https://github.com/apache/spark/commit/c43231c6cd2d38fb9e6c6435ca08b735c9aacb61
   * branch-2.4: 
https://github.com/apache/spark/commit/868472040a9eb67169086ed0fab0afcbbbd321f9
   
   Please note that the website will be updated separately - the change will be 
reflected in future releases.






[GitHub] [spark] cloud-fan commented on a change in pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog

2020-11-04 Thread GitBox


cloud-fan commented on a change in pull request #30154:
URL: https://github.com/apache/spark/pull/30154#discussion_r517837665



##
File path: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala
##
@@ -146,5 +158,25 @@ private[v2] trait V2JDBCTest extends SharedSparkSession {
 }.getMessage
 assert(msg.contains("Table not found"))
   }
+
+  test("CREATE TABLE with table comment") {
+withTable(s"$catalogName.new_table") {
+  val logAppender = new LogAppender("table comment")
+  withLogAppender(logAppender) {
+sql(s"CREATE TABLE $catalogName.new_table(i INT) USING _ COMMENT 'this 
is a comment'")
+  }
+  val createCommentWarning = logAppender.loggingEvents
+.filter(_.getLevel == Level.WARN)
+.map(_.getRenderedMessage)
+.exists(_.contains("Cannot create JDBC table comment"))

Review comment:
   No DB supports it?








[GitHub] [spark] AmplabJenkins removed a comment on pull request #30148: [SPARK-33244][SQL] Unify the code paths for spark.table and spark.read.table

2020-11-04 Thread GitBox


AmplabJenkins removed a comment on pull request #30148:
URL: https://github.com/apache/spark/pull/30148#issuecomment-722194009










[GitHub] [spark] HeartSaVioR closed pull request #30252: [MINOR][SS][DOCS] Update join type in stream static joins code examples

2020-11-04 Thread GitBox


HeartSaVioR closed pull request #30252:
URL: https://github.com/apache/spark/pull/30252


   






[GitHub] [spark] cloud-fan commented on a change in pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog

2020-11-04 Thread GitBox


cloud-fan commented on a change in pull request #30154:
URL: https://github.com/apache/spark/pull/30154#discussion_r517837382



##
File path: 
external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/v2/V2JDBCTest.scala
##
@@ -146,5 +158,25 @@ private[v2] trait V2JDBCTest extends SharedSparkSession {
 }.getMessage
 assert(msg.contains("Table not found"))
   }
+
+  test("CREATE TABLE with table comment") {
+withTable(s"$catalogName.new_table") {
+  val logAppender = new LogAppender("table comment")
+  withLogAppender(logAppender) {
+sql(s"CREATE TABLE $catalogName.new_table(i INT) USING _ COMMENT 'this 
is a comment'")
+  }
+  val createCommentWarning = logAppender.loggingEvents
+.filter(_.getLevel == Level.WARN)
+.map(_.getRenderedMessage)
+.exists(_.contains("Cannot create JDBC table comment"))
+  assert(createCommentWarning === notSupportsTableComment)
+}
+  }
+
+  test("CREATE TABLE with table property") {
+withTable(s"$catalogName.new_table") {
+  testCreateTableWithProperty(s"$catalogName.new_table")

Review comment:
   We can test some table properties that always fail, e.g. 
`TBLPROPERTIES(a=1)`.
   
   Then we can have `def testCreateTableWithProperty` for a positive test, which 
by default is empty: `def testCreateTableWithProperty = {}`
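
   The suggested pattern is a no-op hook that only dialect-specific suites 
override. A sketch of the idea in Python (the actual code is Scala in 
`V2JDBCTest`; the class names below are illustrative):

   ```python
   class V2JDBCTestBase:
       # Sketch of the suggested pattern: the base suite provides an empty
       # positive-test hook, mirroring `def testCreateTableWithProperty = {}`.
       def test_create_table_with_property(self, table):
           return []  # empty by default

   class PropertySupportingDialectTest(V2JDBCTestBase):
       # Hypothetical subclass for a dialect that does support properties;
       # only such suites override the hook with real statements.
       def test_create_table_with_property(self, table):
           return [f"CREATE TABLE {table} (i INT) TBLPROPERTIES(a=1)"]
   ```

   The base suite can still run a negative test unconditionally, since 
`TBLPROPERTIES(a=1)` fails on every dialect.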








[GitHub] [spark] AmplabJenkins commented on pull request #30148: [SPARK-33244][SQL] Unify the code paths for spark.table and spark.read.table

2020-11-04 Thread GitBox


AmplabJenkins commented on pull request #30148:
URL: https://github.com/apache/spark/pull/30148#issuecomment-722194009










[GitHub] [spark] SparkQA commented on pull request #30252: [MINOR][SS][DOCS] Update join type in stream static joins code examples

2020-11-04 Thread GitBox


SparkQA commented on pull request #30252:
URL: https://github.com/apache/spark/pull/30252#issuecomment-722193471


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35251/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog

2020-11-04 Thread GitBox


AmplabJenkins removed a comment on pull request #30154:
URL: https://github.com/apache/spark/pull/30154#issuecomment-722192680










[GitHub] [spark] SparkQA removed a comment on pull request #30148: [SPARK-33244][SQL] Unify the code paths for spark.table and spark.read.table

2020-11-04 Thread GitBox


SparkQA removed a comment on pull request #30148:
URL: https://github.com/apache/spark/pull/30148#issuecomment-722086287


   **[Test build #130631 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130631/testReport)**
 for PR 30148 at commit 
[`bcd7dff`](https://github.com/apache/spark/commit/bcd7dff6126bad7da2333e8a0b9768c142acc653).






[GitHub] [spark] cloud-fan commented on a change in pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog

2020-11-04 Thread GitBox


cloud-fan commented on a change in pull request #30154:
URL: https://github.com/apache/spark/pull/30154#discussion_r517836327



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/jdbc/JdbcUtils.scala
##
@@ -863,15 +863,24 @@ object JdbcUtils extends Logging {
   schema: StructType,
   caseSensitive: Boolean,
   options: JdbcOptionsInWrite): Unit = {
+val dialect = JdbcDialects.get(options.url)
 val strSchema = schemaString(
   schema, caseSensitive, options.url, options.createTableColumnTypes)
 val createTableOptions = options.createTableOptions
 // Create the table if the table does not exist.
 // To allow certain options to append when create a new table, which can be
 // table_options or partition_options.
 // E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT CHARSET=utf8"
-val sql = s"CREATE TABLE $tableName ($strSchema) $createTableOptions"
-executeStatement(conn, options, sql)
+val sql = dialect.createTable(tableName, strSchema, createTableOptions, 
options.tableComment)

Review comment:
   Now I agree that the comment is just a special case. Can we revert it to the 
previous version, where we explicitly run an extra statement to set the table 
comment?
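
   The extra-statement approach can be sketched as follows; the function name 
and the `COMMENT ON` syntax are illustrative only, since the exact statement is 
dialect-specific:

   ```python
   def create_table_statements(table, schema_sql, create_options="", comment=None):
       # Illustrative sketch: emit CREATE TABLE first, then set the table
       # comment with a separate statement instead of folding it into the
       # dialect's createTable.
       stmts = [f"CREATE TABLE {table} ({schema_sql}) {create_options}".rstrip()]
       if comment is not None:
           escaped = comment.replace("'", "''")  # escape single quotes
           stmts.append(f"COMMENT ON TABLE {table} IS '{escaped}'")
       return stmts
   ```

   Dialects that lack a comment statement can then simply skip the second 
statement and log a warning.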








[GitHub] [spark] AmplabJenkins commented on pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog

2020-11-04 Thread GitBox


AmplabJenkins commented on pull request #30154:
URL: https://github.com/apache/spark/pull/30154#issuecomment-722192680










[GitHub] [spark] SparkQA commented on pull request #30148: [SPARK-33244][SQL] Unify the code paths for spark.table and spark.read.table

2020-11-04 Thread GitBox


SparkQA commented on pull request #30148:
URL: https://github.com/apache/spark/pull/30148#issuecomment-722192944


   **[Test build #130631 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130631/testReport)**
 for PR 30148 at commit 
[`bcd7dff`](https://github.com/apache/spark/commit/bcd7dff6126bad7da2333e8a0b9768c142acc653).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog

2020-11-04 Thread GitBox


SparkQA commented on pull request #30154:
URL: https://github.com/apache/spark/pull/30154#issuecomment-722192669


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35250/
   






[GitHub] [spark] cloud-fan commented on a change in pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog

2020-11-04 Thread GitBox


cloud-fan commented on a change in pull request #30154:
URL: https://github.com/apache/spark/pull/30154#discussion_r517835283



##
File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala
##
@@ -117,14 +117,33 @@ class JDBCTableCatalog extends TableCatalog with Logging {
 if (partitions.nonEmpty) {
   throw new UnsupportedOperationException("Cannot create JDBC table with 
partition")
 }
-// TODO (SPARK-32405): Apply table options while creating tables in JDBC 
Table Catalog
+
+var tableOptions = options.parameters + (JDBCOptions.JDBC_TABLE_NAME -> 
getTableName(ident))
+var tableComment: String = ""
+var tableProperties: String = ""
 if (!properties.isEmpty) {
-  logWarning("Cannot create JDBC table with properties, these properties 
will be " +
-"ignored: " + properties.asScala.map { case (k, v) => s"$k=$v" 
}.mkString("[", ", ", "]"))
+  properties.asScala.map {
+case (k, v) => k match {
+  case "comment" => tableComment = v
+  case "provider" | "owner" | "location" => // provider, owner and 
location can't be set.

Review comment:
   let's ignore `provider` only, with a TODO to fail it once the CREATE 
TABLE syntax doesn't force people to specify a provider. We should fail if 
`owner` or `location` is present.
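The suggested handling (ignore `provider` for now with a TODO, fail fast on `owner` or `location`, and route `comment` separately) can be sketched as follows. This is a hypothetical, language-neutral Python sketch of the review suggestion, not Spark's actual Scala implementation; the helper name `split_table_properties` is invented for illustration:

```python
# Hypothetical sketch of the review suggestion above; not Spark code.
def split_table_properties(properties):
    """Split CREATE TABLE properties into a table comment and pass-through options."""
    table_comment = None
    passthrough = {}
    for key, value in properties.items():
        if key == "comment":
            table_comment = value
        elif key == "provider":
            # TODO: fail here too, once the CREATE TABLE syntax no longer
            # forces people to specify a provider.
            continue
        elif key in ("owner", "location"):
            # Reserved properties a JDBC table cannot honor: fail fast.
            raise ValueError(f"CREATE TABLE property '{key}' is not supported")
        else:
            passthrough[key] = value
    return table_comment, passthrough
```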








[GitHub] [spark] HyukjinKwon commented on pull request #30177: [SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends.

2020-11-04 Thread GitBox


HyukjinKwon commented on pull request #30177:
URL: https://github.com/apache/spark/pull/30177#issuecomment-722190743


   @ueshin, for some reasons, the tests became pretty flaky .. I will just 
revert as you said.






[GitHub] [spark] HyukjinKwon closed pull request #30244: [SPARK-33282] Migrate from deprecated probot autolabeler to GitHub labeler action

2020-11-04 Thread GitBox


HyukjinKwon closed pull request #30244:
URL: https://github.com/apache/spark/pull/30244


   






[GitHub] [spark] HyukjinKwon commented on pull request #30244: [SPARK-33282] Migrate from deprecated probot autolabeler to GitHub labeler action

2020-11-04 Thread GitBox


HyukjinKwon commented on pull request #30244:
URL: https://github.com/apache/spark/pull/30244#issuecomment-722188937


   Merged to master.






[GitHub] [spark] wzhfy edited a comment on pull request #30248: [SPARK-33339][PYTHON] Pyspark application maybe hangup because of wor…

2020-11-04 Thread GitBox


wzhfy edited a comment on pull request #30248:
URL: https://github.com/apache/spark/pull/30248#issuecomment-722185651


   @li36909 Could you update the description? Currently it's misleading. 
   
   @HyukjinKwon Actually `system.exit` can happen when a user writes some 
incorrect code (e.g. trying to create a SparkContext in an executor task while no 
SPARK_HOME is available on that node, or the udf case @li36909 mentioned).   
   We have found several such cases from unskilled PySpark customers.  
   
   It would be better for Spark to throw an exception to the user, rather than 
hanging and leaving users unsure of what to do.






[GitHub] [spark] HyukjinKwon commented on pull request #30244: [SPARK-33282] Migrate from deprecated probot autolabeler to GitHub labeler action

2020-11-04 Thread GitBox


HyukjinKwon commented on pull request #30244:
URL: https://github.com/apache/spark/pull/30244#issuecomment-722188129


   Let me merge and see how it goes.
   
   Thanks for this work and thanks in advance for the follow-up. BTW, would you 
mind filing a separate JIRA to follow up?






[GitHub] [spark] Ngone51 commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-11-04 Thread GitBox


Ngone51 commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r517831144



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,966 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.ExecutorShuffleInfo;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the 
most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  @VisibleForTesting
+  static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap> partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf) {
+this.conf = conf;
+this.partitions = Maps.newConcurrentMap();
+this.appsPathInfo = Maps.newConcurrentMap();
+this.directoryCleaner = Executors.newSingleThreadExecutor(
+  // Add `spark` prefix because it will run in NM in Yarn mode.
+  
NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+CacheLoader indexCacheLoader =
+  new CacheLoader() {
+public ShuffleIndexInformation load(File file) throws IOException {
+  return new ShuffleIndexInformation(file);
+}
+  };
+indexCache = CacheBuilder.newBuilder()
+  .maximumWeight(conf.mergedIndexCacheSize())
+  .weigher((Weigher) (file, indexInfo) -> 
indexInfo.getSize())
+  .build(indexCacheLoader);
+this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given the appShuffleId and reduceId that uniquely identifies a given 
shuffle partition of an
+   * application, retrieves the associated metadata. If not present and the 
corresponding merged
+   * shuffle does not 

[GitHub] [spark] kbendick commented on pull request #30244: [SPARK-33282] Migrate from deprecated probot autolabeler to GitHub labeler action

2020-11-04 Thread GitBox


kbendick commented on pull request #30244:
URL: https://github.com/apache/spark/pull/30244#issuecomment-722187128


   > Thanks for working on this @kbendick. Let me know when you think it's 
ready to merge. We can merge and see how it goes.
   
   Thanks @HyukjinKwon! This is ready to be merged.
   
   It might need a small tweak or two, given it was a pretty complicated config 
relative to the others that I've done so far, but I would say it's tested and 
ready to merge, so people can finally have their labels back.  






[GitHub] [spark] wzhfy edited a comment on pull request #30248: [SPARK-33339][PYTHON] Pyspark application maybe hangup because of wor…

2020-11-04 Thread GitBox


wzhfy edited a comment on pull request #30248:
URL: https://github.com/apache/spark/pull/30248#issuecomment-722185651


   @li36909 Could you update the description? Currently it's misleading. 
   
   @HyukjinKwon Actually `system.exit` is thrown because a user writes some 
incorrect code (e.g. trying to create a SparkContext in an executor task while no 
SPARK_HOME is available on that node, or the udf case @li36909 mentioned).   
   We have found several such cases from unskilled PySpark customers.  
   
   It would be better for Spark to throw an exception to the user, rather than 
hanging and leaving users unsure of what to do.






[GitHub] [spark] kbendick commented on a change in pull request #30244: [SPARK-33282] Migrate from deprecated probot autolabeler to GitHub labeler action

2020-11-04 Thread GitBox


kbendick commented on a change in pull request #30244:
URL: https://github.com/apache/spark/pull/30244#discussion_r517829806



##
File path: .github/labeler.yml
##
@@ -0,0 +1,144 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+# Pull Request Labeler Github Action Configuration: 
https://github.com/marketplace/actions/labeler
+#
+# Note that we currently cannot use the negation operator (i.e. `!`) for 
miniglob matches as they
+# would match any file that doesn't touch them. What's needed is the concept 
of `any `, which takes a
+# list of constraints / globs and then matches all of the constraints for 
either `any` of the files or
+# `all` of the files.
+#
+# However, `any`/`all` are not supported in a released version and testing off 
of the `main` branch
+# had them not working proprly. While we wait for this issue to be handled 
upstream, we can remove
+# the negative / not matches for now and at least have labels again.
+INFRA:
+  - ".github/**/*"
+  - "appveyor.yml"
+  - "tools/**/*"
+  - "dev/create-release/**/*"
+  - ".asf.yaml"
+  - ".gitattributes"
+  - ".gitignore"
+  - "dev/github_jira_sync.py"
+  - "dev/merge_spark_pr.py"
+  - "dev/run-tests-jenkins*"
+BUILD:
+ # Can be supported when a stable release with correct all/any is released
+ #- any: ['dev/**/*', '!dev/github_jira_sync.py', '!dev/merge_spark_pr.py', 
'!dev/.rat-excludes']
+ - "dev/**/*"
+ - "build/**/*"
+ - "project/**/*"
+ - "assembly/**/*"
+ - "**/*pom.xml"
+ - "bin/docker-image-tool.sh"
+ - "bin/find-spark-home*"
+ - "scalastyle-config.xml"
+ # These can be added in the above `any` clause (and the /dev/**/* glob 
removed) when
+ # `any`/`all` support is released
+ # - "!dev/github_jira_sync.py"
+ # - "!dev/merge_spark_pr.py"
+ # - "!dev/run-tests-jenkins*"
+ # - "!dev/.rat-excludes"
+DOCS:
+  - "docs/**/*"
+  - "**/README.md"
+  - "**/CONTRIBUTING.md"
+EXAMPLES:
+  - "examples/**/*"
+  - "bin/run-example*"
+# CORE needs to be updated when all/any are released upstream.
+CORE:
+  # - any: ["core/**/*", "!**/*UI.scala", "!**/ui/**/*"] # If any file matches 
all of the globs defined in the list started by `any`, label is applied.
+  - "core/**/*"

Review comment:
   Ah ok. I wasn't sure of the original intended behavior for a few of them, 
but I agree, let's keep it focused on converting it to the labeler action for 
now (other than the few bits that don't seem to be stable yet - though I'm 
going to give those another go now that I've gotten a lot more experience with 
this labeler).
   
   TLDR: I will leave it as originally intended  








[GitHub] [spark] wzhfy edited a comment on pull request #30248: [SPARK-33339][PYTHON] Pyspark application maybe hangup because of wor…

2020-11-04 Thread GitBox


wzhfy edited a comment on pull request #30248:
URL: https://github.com/apache/spark/pull/30248#issuecomment-722185651


   @li36909 Could you update the description? Currently it's misleading. 
   
   @HyukjinKwon Actually `system.exit` is thrown because a user writes some 
incorrect code (e.g. trying to create a SparkContext in an executor task while no 
SPARK_HOME is available on that node, or the udf case @li36909 mentioned).   
   We have found several such cases from unskilled Spark customers.  
   
   It would be better for Spark to throw an exception to the user, rather than 
hanging and leaving users unsure of what to do.






[GitHub] [spark] wzhfy edited a comment on pull request #30248: [SPARK-33339][PYTHON] Pyspark application maybe hangup because of wor…

2020-11-04 Thread GitBox


wzhfy edited a comment on pull request #30248:
URL: https://github.com/apache/spark/pull/30248#issuecomment-722185651


   @li36909 Could you update the description? Currently it's misleading. 
   
   @HyukjinKwon Actually `system.exit` is thrown because a user writes some 
incorrect code (e.g. trying to create a SparkContext in an executor task while no 
SPARK_HOME is available on that node, or the udf case @li36909 mentioned). We 
have found several such cases from unskilled Spark customers.  
   It would be better for Spark to throw an exception to the user, rather than 
hanging and leaving users unsure of what to do.






[GitHub] [spark] wzhfy commented on pull request #30248: [SPARK-33339][PYTHON] Pyspark application maybe hangup because of wor…

2020-11-04 Thread GitBox


wzhfy commented on pull request #30248:
URL: https://github.com/apache/spark/pull/30248#issuecomment-722185651


   @li36909 Could you update the description? Currently it's misleading. 
   @HyukjinKwon Actually `system.exit` is thrown because a user writes some 
incorrect code (e.g. trying to create a SparkContext in an executor task while no 
SPARK_HOME is available on that node, or the udf case @li36909 mentioned). We 
have found several such cases from unskilled Spark customers.  
   It would be better for Spark to throw an exception to the user, rather than 
hanging and leaving users unsure of what to do.
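The hang described above comes from user code exiting the worker process outright. A minimal sketch of the alternative behavior, converting an unexpected `sys.exit()` inside a udf into a regular exception the driver can surface, might look like this; `run_udf_safely` is a hypothetical helper, not PySpark's actual worker code:

```python
def run_udf_safely(udf, value):
    # Turn an unexpected sys.exit() inside user code into an ordinary
    # exception, so the worker can report an error instead of silently
    # dying and leaving the application to hang.
    try:
        return udf(value)
    except SystemExit as exc:
        raise RuntimeError(f"user code called sys.exit({exc.code})") from exc
```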






[GitHub] [spark] kbendick commented on a change in pull request #30244: [SPARK-33282] Migrate from deprecated probot autolabeler to GitHub labeler action

2020-11-04 Thread GitBox


kbendick commented on a change in pull request #30244:
URL: https://github.com/apache/spark/pull/30244#discussion_r517829493



##
File path: .github/labeler.yml
##
@@ -0,0 +1,144 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+# Pull Request Labeler Github Action Configuration: 
https://github.com/marketplace/actions/labeler
+#
+# Note that we currently cannot use the negation operator (i.e. `!`) for 
miniglob matches as they
+# would match any file that doesn't touch them. What's needed is the concept 
of `any `, which takes a
+# list of constraints / globs and then matches all of the constraints for 
either `any` of the files or
+# `all` of the files.
+#
+# However, `any`/`all` are not supported in a released version and testing off 
of the `main` branch
had them not working properly. While we wait for this issue to be handled 
upstream, we can remove
+# the negative / not matches for now and at least have labels again.
+INFRA:
+  - ".github/**/*"
+  - "appveyor.yml"
+  - "tools/**/*"
+  - "dev/create-release/**/*"
+  - ".asf.yaml"
+  - ".gitattributes"
+  - ".gitignore"
+  - "dev/github_jira_sync.py"
+  - "dev/merge_spark_pr.py"
+  - "dev/run-tests-jenkins*"
+BUILD:
+ # Can be supported when a stable release with correct all/any is released
+ #- any: ['dev/**/*', '!dev/github_jira_sync.py', '!dev/merge_spark_pr.py', 
'!dev/.rat-excludes']
+ - "dev/**/*"
+ - "build/**/*"
+ - "project/**/*"
+ - "assembly/**/*"
+ - "**/*pom.xml"
+ - "bin/docker-image-tool.sh"
+ - "bin/find-spark-home*"
+ - "scalastyle-config.xml"
+ # These can be added in the above `any` clause (and the /dev/**/* glob 
removed) when
+ # `any`/`all` support is released
+ # - "!dev/github_jira_sync.py"
+ # - "!dev/merge_spark_pr.py"
+ # - "!dev/run-tests-jenkins*"
+ # - "!dev/.rat-excludes"
+DOCS:
+  - "docs/**/*"
+  - "**/README.md"

Review comment:
   Ok. I'll leave them as is. I'm not sure what exactly their purpose is so 
I'll follow up in another PR on those two files.








[GitHub] [spark] kbendick commented on a change in pull request #30244: [SPARK-33282] Migrate from deprecated probot autolabeler to GitHub labeler action

2020-11-04 Thread GitBox


kbendick commented on a change in pull request #30244:
URL: https://github.com/apache/spark/pull/30244#discussion_r517828993



##
File path: .github/labeler.yml
##
@@ -0,0 +1,152 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements.  See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership.  The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License.  You may obtain a copy of the License at
+#
+#   http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied.  See the License for the
+# specific language governing permissions and limitations
+# under the License.
+#
+
+#
+# Pull Request Labeler Github Action Configuration: 
https://github.com/marketplace/actions/labeler
+#
+# Note that we currently cannot use the negation operator (i.e. `!`) for 
miniglob matches as they
+# would match any file that doesn't touch them. What's needed is the concept 
of `any `, which takes a
+# list of constraints / globs and then matches all of the constraints for 
either `any` of the files or
+# `all` of the files in the change set.
+#
+# However, `any`/`all` are not supported in a released version and testing off 
of the `main` branch
+# resulted in some other errors when testing.
+#
+# An issue has been opened upstream requesting that a release be cut that has 
support for all/any:
+#   - https://github.com/actions/labeler/issues/111
+#
+# While we wait for this issue to be handled upstream, we can remove
+# the negated / `!` matches for now and at least have labels again.
+#
+INFRA:
+  - ".github/**/*"
+  - "appveyor.yml"
+  - "tools/**/*"
+  - "dev/create-release/**/*"
+  - ".asf.yaml"
+  - ".gitattributes"
+  - ".gitignore"
+  - "dev/github_jira_sync.py"
+  - "dev/merge_spark_pr.py"
+  - "dev/run-tests-jenkins*"
+BUILD:
+ # Can be supported when a stable release with correct all/any is released
+ #- any: ['dev/**/*', '!dev/github_jira_sync.py', '!dev/merge_spark_pr.py', 
'!dev/.rat-excludes']
+ - "dev/**/*"
+ - "build/**/*"
+ - "project/**/*"
+ - "assembly/**/*"
+ - "**/*pom.xml"
+ - "bin/docker-image-tool.sh"
+ - "bin/find-spark-home*"
+ - "scalastyle-config.xml"
+ # These can be added in the above `any` clause (and the /dev/**/* glob 
removed) when
+ # `any`/`all` support is released

Review comment:
   Cool. I opened an issue upstream asking when they thought they might be 
releasing a stable build with `any` / `all`. We can follow up on them later so 
that at least the large majority of PRs will be correctly labeled.
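The `any`/`all` semantics discussed in this thread, where a label applies when any changed file matches all of a list of globs (including negated ones), can be modeled with Python's `fnmatch`. This is an illustrative sketch of the behavior described in the config comments, not the labeler action's actual implementation:

```python
from fnmatch import fnmatch

def matches(pattern, path):
    # A pattern starting with "!" negates the match, as in the commented-out
    # labeler config entries above.
    if pattern.startswith("!"):
        return not fnmatch(path, pattern[1:])
    return fnmatch(path, pattern)

def label_applies(rule, changed_files):
    # rule is a list of entries: either a bare glob (label applies if any
    # changed file matches it) or {"any": [globs]} meaning some changed
    # file must satisfy all globs in the list, including negations.
    for entry in rule:
        if isinstance(entry, dict) and "any" in entry:
            if any(all(matches(g, f) for g in entry["any"]) for f in changed_files):
                return True
        elif any(matches(entry, f) for f in changed_files):
            return True
    return False
```

With this model, a BUILD rule like `any: ['dev/*', '!dev/merge_spark_pr.py']` would label a PR touching `dev/run-tests.py` but not one touching only `dev/merge_spark_pr.py`.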








[GitHub] [spark] Ngone51 commented on a change in pull request #30062: [SPARK-32916][SHUFFLE] Implementation of shuffle service that leverages push-based shuffle in YARN deployment mode

2020-11-04 Thread GitBox


Ngone51 commented on a change in pull request #30062:
URL: https://github.com/apache/spark/pull/30062#discussion_r517828519



##
File path: 
common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RemoteBlockPushResolver.java
##
@@ -0,0 +1,959 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.network.shuffle;
+
+import java.io.BufferedOutputStream;
+import java.io.DataOutputStream;
+import java.io.File;
+import java.io.FileOutputStream;
+import java.io.IOException;
+import java.nio.ByteBuffer;
+import java.nio.channels.FileChannel;
+import java.nio.file.Files;
+import java.nio.file.Path;
+import java.nio.file.Paths;
+import java.util.Arrays;
+import java.util.Collection;
+import java.util.Iterator;
+import java.util.LinkedList;
+import java.util.List;
+import java.util.Map;
+import java.util.concurrent.ConcurrentMap;
+import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Executor;
+import java.util.concurrent.Executors;
+
+import com.google.common.annotations.VisibleForTesting;
+import com.google.common.base.Objects;
+import com.google.common.base.Preconditions;
+import com.google.common.cache.CacheBuilder;
+import com.google.common.cache.CacheLoader;
+import com.google.common.cache.LoadingCache;
+import com.google.common.cache.Weigher;
+import com.google.common.collect.Maps;
+import io.netty.buffer.ByteBuf;
+import io.netty.buffer.Unpooled;
+import org.roaringbitmap.RoaringBitmap;
+import org.slf4j.Logger;
+import org.slf4j.LoggerFactory;
+
+import org.apache.spark.network.buffer.FileSegmentManagedBuffer;
+import org.apache.spark.network.buffer.ManagedBuffer;
+import org.apache.spark.network.client.StreamCallbackWithID;
+import org.apache.spark.network.protocol.Encoders;
+import org.apache.spark.network.shuffle.protocol.ExecutorShuffleInfo;
+import org.apache.spark.network.shuffle.protocol.FinalizeShuffleMerge;
+import org.apache.spark.network.shuffle.protocol.MergeStatuses;
+import org.apache.spark.network.shuffle.protocol.PushBlockStream;
+import org.apache.spark.network.util.JavaUtils;
+import org.apache.spark.network.util.NettyUtils;
+import org.apache.spark.network.util.TransportConf;
+
+/**
+ * An implementation of {@link MergedShuffleFileManager} that provides the 
most essential shuffle
+ * service processing logic to support push based shuffle.
+ */
+public class RemoteBlockPushResolver implements MergedShuffleFileManager {
+
+  private static final Logger logger = 
LoggerFactory.getLogger(RemoteBlockPushResolver.class);
+  private static final String SHUFFLE_PUSH_BLOCK_PREFIX = "shufflePush";
+  @VisibleForTesting
+  static final String MERGE_MANAGER_DIR = "merge_manager";
+
+  private final ConcurrentMap appsPathInfo;
+  private final ConcurrentMap> partitions;
+
+  private final Executor directoryCleaner;
+  private final TransportConf conf;
+  private final int minChunkSize;
+  private final ErrorHandler.BlockPushErrorHandler errorHandler;
+
+  @SuppressWarnings("UnstableApiUsage")
+  private final LoadingCache indexCache;
+
+  @SuppressWarnings("UnstableApiUsage")
+  public RemoteBlockPushResolver(TransportConf conf) {
+this.conf = conf;
+this.partitions = Maps.newConcurrentMap();
+this.appsPathInfo = Maps.newConcurrentMap();
+this.directoryCleaner = Executors.newSingleThreadExecutor(
+  // Add `spark` prefix because it will run in NM in Yarn mode.
+  
NettyUtils.createThreadFactory("spark-shuffle-merged-shuffle-directory-cleaner"));
+this.minChunkSize = conf.minChunkSizeInMergedShuffleFile();
+CacheLoader indexCacheLoader =
+  new CacheLoader() {
+public ShuffleIndexInformation load(File file) throws IOException {
+  return new ShuffleIndexInformation(file);
+}
+  };
+indexCache = CacheBuilder.newBuilder()
+  .maximumWeight(conf.mergedIndexCacheSize())
+  .weigher((Weigher) (file, indexInfo) -> 
indexInfo.getSize())
+  .build(indexCacheLoader);
+this.errorHandler = new ErrorHandler.BlockPushErrorHandler();
+  }
+
+  /**
+   * Given the appShuffleId and reduceId that uniquely identifies a given 
shuffle partition of an
+   * application, retrieves the associated 

[GitHub] [spark] mridulm commented on a change in pull request #30204: [SPARK-33288][SPARK-32661][K8S] Stage level scheduling support for Kubernetes

2020-11-04 Thread GitBox


mridulm commented on a change in pull request #30204:
URL: https://github.com/apache/spark/pull/30204#discussion_r517828319



##
File path: 
resource-managers/kubernetes/docker/src/main/dockerfiles/spark/entrypoint.sh
##
@@ -85,6 +85,7 @@ case "$1" in
   --cores $SPARK_EXECUTOR_CORES
   --app-id $SPARK_APPLICATION_ID
   --hostname $SPARK_EXECUTOR_POD_IP
+  --resourceProfileId $SPARK_RESOURCE_PROFILE_ID

Review comment:
   Yes, thanks for clarifying !





This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA removed a comment on pull request #30252: [MINOR][SS][DOCS] Update join type in stream static joins code examples

2020-11-04 Thread GitBox


SparkQA removed a comment on pull request #30252:
URL: https://github.com/apache/spark/pull/30252#issuecomment-722175227


   **[Test build #130641 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130641/testReport)**
 for PR 30252 at commit 
[`b03f87f`](https://github.com/apache/spark/commit/b03f87f4863df6c93b48ba0ad24eb88d3719cc7c).






[GitHub] [spark] AmplabJenkins commented on pull request #30252: [MINOR][SS][DOCS] Update join type in stream static joins code examples

2020-11-04 Thread GitBox


AmplabJenkins commented on pull request #30252:
URL: https://github.com/apache/spark/pull/30252#issuecomment-722181015










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30252: [MINOR][SS][DOCS] Update join type in stream static joins code examples

2020-11-04 Thread GitBox


AmplabJenkins removed a comment on pull request #30252:
URL: https://github.com/apache/spark/pull/30252#issuecomment-722181015










[GitHub] [spark] SparkQA commented on pull request #30252: [MINOR][SS][DOCS] Update join type in stream static joins code examples

2020-11-04 Thread GitBox


SparkQA commented on pull request #30252:
URL: https://github.com/apache/spark/pull/30252#issuecomment-722180817


   **[Test build #130641 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130641/testReport)**
 for PR 30252 at commit 
[`b03f87f`](https://github.com/apache/spark/commit/b03f87f4863df6c93b48ba0ad24eb88d3719cc7c).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30251: [SPARK-33347][CORE]Cleanup useless variables of MutableApplicationInfo

2020-11-04 Thread GitBox


AmplabJenkins removed a comment on pull request #30251:
URL: https://github.com/apache/spark/pull/30251#issuecomment-722178855










[GitHub] [spark] SparkQA commented on pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog

2020-11-04 Thread GitBox


SparkQA commented on pull request #30154:
URL: https://github.com/apache/spark/pull/30154#issuecomment-722179520


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35250/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30251: [SPARK-33347][CORE]Cleanup useless variables of MutableApplicationInfo

2020-11-04 Thread GitBox


AmplabJenkins commented on pull request #30251:
URL: https://github.com/apache/spark/pull/30251#issuecomment-722178855










[GitHub] [spark] SparkQA commented on pull request #30251: [SPARK-33347][CORE]Cleanup useless variables of MutableApplicationInfo

2020-11-04 Thread GitBox


SparkQA commented on pull request #30251:
URL: https://github.com/apache/spark/pull/30251#issuecomment-722178835


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35249/
   






[GitHub] [spark] sarutak commented on a change in pull request #30259: [SPARK-33353][BUILD] Cache dependencies for Coursier with new sbt in GitHub Actions

2020-11-04 Thread GitBox


sarutak commented on a change in pull request #30259:
URL: https://github.com/apache/spark/pull/30259#discussion_r517821837



##
File path: .github/workflows/build_and_test.yml
##
@@ -111,13 +111,13 @@ jobs:
 key: ${{ matrix.java }}-${{ matrix.hadoop }}-maven-${{ hashFiles('**/pom.xml') }}
 restore-keys: |
   ${{ matrix.java }}-${{ matrix.hadoop }}-maven-
-- name: Cache Ivy local repository
+- name: Cache Coursier local repository
   uses: actions/cache@v2
   with:
-path: ~/.ivy2/cache
-key: ${{ matrix.java }}-${{ matrix.hadoop }}-ivy-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+path: ~/.cache/coursier

Review comment:
   > BTW, I cannot find this directory in my Mac. Is it dependent on OS?
   
   Yeah, seems so.
   On macOS, it is `~/Library/Caches/Coursier`.








[GitHub] [spark] otterc commented on a change in pull request #30164: [SPARK-32919][SHUFFLE] Driver side changes for coordinating push based shuffle by selecting external shuffle services for merging

2020-11-04 Thread GitBox


otterc commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r517814517



##
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##
@@ -249,6 +249,8 @@ private[spark] class DAGScheduler(
   private[spark] val eventProcessLoop = new DAGSchedulerEventProcessLoop(this)
   taskScheduler.setDAGScheduler(this)
 
+  private val pushBasedShuffleEnabled = 
Utils.isPushBasedShuffleEnabled(sc.getConf)

Review comment:
   Where is this being used?

##
File path: core/src/main/scala/org/apache/spark/Dependency.scala
##
@@ -95,6 +97,30 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C: 
ClassTag](
   val shuffleHandle: ShuffleHandle = 
_rdd.context.env.shuffleManager.registerShuffle(
 shuffleId, this)
 
+  // By default, shuffle merge is enabled for ShuffleDependency if push based 
shuffle is enabled
+  private[spark] var _shuffleMergeEnabled =

Review comment:
   Why does this variable name start with `_`? 
   Other variables in this class don't follow this convention.

##
File path: 
core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##
@@ -657,6 +681,28 @@ class BlockManagerMasterEndpoint(
 }
   }
 
+  private def getShufflePushMergerLocations(
+  numMergersNeeded: Int,
+  hostsToFilter: Set[String]): Seq[BlockManagerId] = {
+val activeBlockManagers = blockManagerIdByExecutor.groupBy(_._2.host)
+.mapValues(_.head).values.map(_._2).toSet

Review comment:
   Nit: indentation

##
File path: core/src/main/scala/org/apache/spark/Dependency.scala
##
@@ -95,6 +97,30 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C: 
ClassTag](
   val shuffleHandle: ShuffleHandle = 
_rdd.context.env.shuffleManager.registerShuffle(
 shuffleId, this)
 
+  // By default, shuffle merge is enabled for ShuffleDependency if push based 
shuffle is enabled
+  private[spark] var _shuffleMergeEnabled =
+Utils.isPushBasedShuffleEnabled(rdd.sparkContext.getConf)
+
+  def setShuffleMergeEnabled(shuffleMergeEnabled: Boolean): Unit = {
+_shuffleMergeEnabled = shuffleMergeEnabled
+  }
+
+  def shuffleMergeEnabled : Boolean = _shuffleMergeEnabled

Review comment:
   Nit: `isShuffleMergeEnabled` seems better.

##
File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMaster.scala
##
@@ -125,6 +125,17 @@ class BlockManagerMaster(
 driverEndpoint.askSync[Seq[BlockManagerId]](GetPeers(blockManagerId))
   }
 
+  /**
+   * Get list of unique shuffle service locations where an executor is 
successfully

Review comment:
   Nit: `Get a list...`

##
File path: core/src/main/scala/org/apache/spark/Dependency.scala
##
@@ -95,6 +97,30 @@ class ShuffleDependency[K: ClassTag, V: ClassTag, C: 
ClassTag](
   val shuffleHandle: ShuffleHandle = 
_rdd.context.env.shuffleManager.registerShuffle(
 shuffleId, this)
 
+  // By default, shuffle merge is enabled for ShuffleDependency if push based 
shuffle is enabled
+  private[spark] var _shuffleMergeEnabled =
+Utils.isPushBasedShuffleEnabled(rdd.sparkContext.getConf)
+
+  def setShuffleMergeEnabled(shuffleMergeEnabled: Boolean): Unit = {
+_shuffleMergeEnabled = shuffleMergeEnabled
+  }
+
+  def shuffleMergeEnabled : Boolean = _shuffleMergeEnabled
+
+  /**
+   * Stores the location of the list of chosen external shuffle services for 
handling the
+   * shuffle merge requests from mappers in this shuffle map stage.
+   */
+  private[spark] var _mergerLocs: Seq[BlockManagerId] = Nil

Review comment:
   This starts with `_` as well.

##
File path: core/src/test/scala/org/apache/spark/storage/BlockManagerSuite.scala
##
@@ -1974,7 +1974,51 @@ class BlockManagerSuite extends SparkFunSuite with 
Matchers with BeforeAndAfterE
 }
   }
 
-  class MockBlockTransferService(val maxFailures: Int) extends 
BlockTransferService {
+  test("mergerLocations should be bounded with in" +

Review comment:
   Can you re-word the test name? It's not easy to understand.








[GitHub] [spark] SparkQA commented on pull request #30252: [MINOR][SS][DOCS] Update join type in stream static joins code examples

2020-11-04 Thread GitBox


SparkQA commented on pull request #30252:
URL: https://github.com/apache/spark/pull/30252#issuecomment-722175227


   **[Test build #130641 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130641/testReport)**
 for PR 30252 at commit 
[`b03f87f`](https://github.com/apache/spark/commit/b03f87f4863df6c93b48ba0ad24eb88d3719cc7c).






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30245: [SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions

2020-11-04 Thread GitBox


AmplabJenkins removed a comment on pull request #30245:
URL: https://github.com/apache/spark/pull/30245#issuecomment-722173682










[GitHub] [spark] HeartSaVioR commented on pull request #30252: [MINOR][SS][DOCS] Update join type in stream static joins code examples

2020-11-04 Thread GitBox


HeartSaVioR commented on pull request #30252:
URL: https://github.com/apache/spark/pull/30252#issuecomment-722173750


   ok to test






[GitHub] [spark] AmplabJenkins commented on pull request #30245: [SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions

2020-11-04 Thread GitBox


AmplabJenkins commented on pull request #30245:
URL: https://github.com/apache/spark/pull/30245#issuecomment-722173682










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30252: [MINOR][SS][DOCS] Update join type in stream static joins code examples

2020-11-04 Thread GitBox


AmplabJenkins removed a comment on pull request #30252:
URL: https://github.com/apache/spark/pull/30252#issuecomment-721917138


   Can one of the admins verify this patch?






[GitHub] [spark] SparkQA removed a comment on pull request #30245: [SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions

2020-11-04 Thread GitBox


SparkQA removed a comment on pull request #30245:
URL: https://github.com/apache/spark/pull/30245#issuecomment-722074922


   **[Test build #130627 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130627/testReport)**
 for PR 30245 at commit 
[`cd3776c`](https://github.com/apache/spark/commit/cd3776c1dc997e30f1c44d873b8e1ab35e567726).






[GitHub] [spark] SparkQA commented on pull request #30245: [SPARK-33337][SQL] Support subexpression elimination in branches of conditional expressions

2020-11-04 Thread GitBox


SparkQA commented on pull request #30245:
URL: https://github.com/apache/spark/pull/30245#issuecomment-722172383


   **[Test build #130627 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130627/testReport)**
 for PR 30245 at commit 
[`cd3776c`](https://github.com/apache/spark/commit/cd3776c1dc997e30f1c44d873b8e1ab35e567726).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] SparkQA commented on pull request #30251: [SPARK-33347][CORE]Cleanup useless variables of MutableApplicationInfo

2020-11-04 Thread GitBox


SparkQA commented on pull request #30251:
URL: https://github.com/apache/spark/pull/30251#issuecomment-722169586


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35249/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30259: [SPARK-33353][BUILD] Cache dependencies for Coursier with new sbt in GitHub Actions

2020-11-04 Thread GitBox


AmplabJenkins removed a comment on pull request #30259:
URL: https://github.com/apache/spark/pull/30259#issuecomment-722166837










[GitHub] [spark] li36909 commented on pull request #30248: [SPARK-33339][PYTHON] Pyspark application maybe hangup because of wor…

2020-11-04 Thread GitBox


li36909 commented on pull request #30248:
URL: https://github.com/apache/spark/pull/30248#issuecomment-722168240


   > Just curious. Why would you run `system.exit`? You should better just 
throw a proper exception. Calling `system.exit` inside the UDFs wouldn't 
terminate user applications as intended.
   
   Below is the case I encountered. It is caused by the code having a method named 'testUdf' and a UDF object also named 'testUdf', which makes the worker exit. In any case, I think the application should not hang without any information.
   ```
   import pyspark.sql.functions as F
   from pyspark.sql.types import *
   from pyspark.sql import SparkSession
   
   def testUdf(x):
   return x+1
   
   spark = SparkSession.builder.appName("test").getOrCreate()
   testUdf=F.udf(lambda x: testUdf(x),DoubleType())
   
   df = spark.sql("select float(2.145025959889993) as a")
   df.withColumn("test", testUdf(df.a)).show()
   ```






[GitHub] [spark] AmplabJenkins commented on pull request #30259: [SPARK-33353][BUILD] Cache dependencies for Coursier with new sbt in GitHub Actions

2020-11-04 Thread GitBox


AmplabJenkins commented on pull request #30259:
URL: https://github.com/apache/spark/pull/30259#issuecomment-722166837










[GitHub] [spark] SparkQA commented on pull request #30259: [SPARK-33353][BUILD] Cache dependencies for Coursier with new sbt in GitHub Actions

2020-11-04 Thread GitBox


SparkQA commented on pull request #30259:
URL: https://github.com/apache/spark/pull/30259#issuecomment-722166817


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35248/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30258: [SPARK-33239][INFRA][3.0] Use pre-built image at GitHub Action SparkR job

2020-11-04 Thread GitBox


AmplabJenkins removed a comment on pull request #30258:
URL: https://github.com/apache/spark/pull/30258#issuecomment-722164813


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35245/
   Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30258: [SPARK-33239][INFRA][3.0] Use pre-built image at GitHub Action SparkR job

2020-11-04 Thread GitBox


AmplabJenkins removed a comment on pull request #30258:
URL: https://github.com/apache/spark/pull/30258#issuecomment-722164805


   Merged build finished. Test FAILed.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30251: [SPARK-33347][CORE]Cleanup useless variables of MutableApplicationInfo

2020-11-04 Thread GitBox


AmplabJenkins removed a comment on pull request #30251:
URL: https://github.com/apache/spark/pull/30251#issuecomment-722164324










[GitHub] [spark] AmplabJenkins commented on pull request #30258: [SPARK-33239][INFRA][3.0] Use pre-built image at GitHub Action SparkR job

2020-11-04 Thread GitBox


AmplabJenkins commented on pull request #30258:
URL: https://github.com/apache/spark/pull/30258#issuecomment-722164805










[GitHub] [spark] SparkQA commented on pull request #30258: [SPARK-33239][INFRA][3.0] Use pre-built image at GitHub Action SparkR job

2020-11-04 Thread GitBox


SparkQA commented on pull request #30258:
URL: https://github.com/apache/spark/pull/30258#issuecomment-722164789


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35245/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30251: [SPARK-33347][CORE]Cleanup useless variables of MutableApplicationInfo

2020-11-04 Thread GitBox


AmplabJenkins commented on pull request #30251:
URL: https://github.com/apache/spark/pull/30251#issuecomment-722164324










[GitHub] [spark] SparkQA commented on pull request #30251: [SPARK-33347][CORE]Cleanup useless variables of MutableApplicationInfo

2020-11-04 Thread GitBox


SparkQA commented on pull request #30251:
URL: https://github.com/apache/spark/pull/30251#issuecomment-722164312


   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35246/
   






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30258: [SPARK-33239][INFRA][3.0] Use pre-built image at GitHub Action SparkR job

2020-11-04 Thread GitBox


AmplabJenkins removed a comment on pull request #30258:
URL: https://github.com/apache/spark/pull/30258#issuecomment-722163392


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/35247/
   Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #30258: [SPARK-33239][INFRA][3.0] Use pre-built image at GitHub Action SparkR job

2020-11-04 Thread GitBox


SparkQA commented on pull request #30258:
URL: https://github.com/apache/spark/pull/30258#issuecomment-722163368


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35247/
   






[GitHub] [spark] AmplabJenkins commented on pull request #30258: [SPARK-33239][INFRA][3.0] Use pre-built image at GitHub Action SparkR job

2020-11-04 Thread GitBox


AmplabJenkins commented on pull request #30258:
URL: https://github.com/apache/spark/pull/30258#issuecomment-722163387










[GitHub] [spark] AmplabJenkins removed a comment on pull request #30258: [SPARK-33239][INFRA][3.0] Use pre-built image at GitHub Action SparkR job

2020-11-04 Thread GitBox


AmplabJenkins removed a comment on pull request #30258:
URL: https://github.com/apache/spark/pull/30258#issuecomment-722163387


   Merged build finished. Test FAILed.






[GitHub] [spark] SparkQA commented on pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog

2020-11-04 Thread GitBox


SparkQA commented on pull request #30154:
URL: https://github.com/apache/spark/pull/30154#issuecomment-722161086


   **[Test build #130640 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130640/testReport)**
 for PR 30154 at commit 
[`bb9c4fc`](https://github.com/apache/spark/commit/bb9c4fcb554cc3747f812fb443afe6713898716a).






[GitHub] [spark] Ngone51 commented on a change in pull request #30164: [SPARK-32919][SHUFFLE] Driver side changes for coordinating push based shuffle by selecting external shuffle services for merging

2020-11-04 Thread GitBox


Ngone51 commented on a change in pull request #30164:
URL: https://github.com/apache/spark/pull/30164#discussion_r516635097



##
File path: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala
##
@@ -161,6 +169,32 @@ private[spark] abstract class YarnSchedulerBackend(
 totalRegisteredExecutors.get() >= totalExpectedExecutors * minRegisteredRatio
   }
 
+  override def getShufflePushMergerLocations(
+  numPartitions: Int,
+  resourceProfileId: Int): Seq[BlockManagerId] = {
+// Currently this is naive way of calculating numMergersNeeded for a stage. In future,
+// we can use better heuristics to calculate numMergersNeeded for a stage.
+val maxExecutors = if (Utils.isDynamicAllocationEnabled(sc.getConf)) {
+  maxNumExecutors
+} else {
+  Int.MaxValue
+}
+val tasksPerExecutor = sc.resourceProfileManager
+.resourceProfileFromId(resourceProfileId).maxTasksPerExecutor(sc.conf)

Review comment:
   nit: 2 indentations

##
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##
@@ -1281,6 +1309,12 @@ private[spark] class DAGScheduler(
 stage match {
   case s: ShuffleMapStage =>
 outputCommitCoordinator.stageStart(stage = s.id, maxPartitionId = s.numPartitions - 1)
+// Only generate merger location for a given shuffle dependency once. This way, even if
+// this stage gets retried, it would still be merging blocks using the same set of
+// shuffle services.
+if (s.shuffleDep.shuffleMergeEnabled) {

Review comment:
   I don't think you need to add the variable `_shuffleMergeEnabled` to each shuffle dependency. You can add it as a global variable of `DAGScheduler`.
   
   And then, you don't need `setShuffleMergeEnabled()` either. You don't need to call `setShuffleMergeEnabled(false)` when `_mergerLocs` is empty. The caller can decide not to push shuffle after realizing `_mergerLocs` is empty.

##
File path: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala
##
@@ -1252,6 +1254,32 @@ private[spark] class DAGScheduler(
 execCores.map(cores => properties.setProperty(EXECUTOR_CORES_LOCAL_PROPERTY, cores))
   }
 
+  /**
+   * If push based shuffle is enabled, set the shuffle services to be used for the given
+   * shuffle map stage for block push/merge.
+   *
+   * Even with DRA kicking in and significantly reducing the number of available active

Review comment:
   Could you replace `DRA` with the full representation? It seems other people usually call it "dynamic executor allocation", so I wonder whether some people might not get it.

##
File path: resource-managers/yarn/src/main/scala/org/apache/spark/scheduler/cluster/YarnSchedulerBackend.scala
##
@@ -161,6 +170,27 @@ private[spark] abstract class YarnSchedulerBackend(
 totalRegisteredExecutors.get() >= totalExpectedExecutors * minRegisteredRatio
   }
 
+  override def getMergerLocations(
+  numPartitions: Int,
+  resourceProfileId: Int): Seq[BlockManagerId] = {
+// Currently this is naive way of calculating numMergersNeeded for a stage. In future,
+// we can use better heuristics to calculate numMergersNeeded for a stage.
+val tasksPerExecutor = sc.resourceProfileManager
+.resourceProfileFromId(resourceProfileId).maxTasksPerExecutor(sc.conf)
+val numMergersNeeded = math.min(
+  math.max(1, math.ceil(numPartitions / tasksPerExecutor).toInt), maxNumExecutors)
+val minMergersThreshold = math.max(minMergersStaticThreshold,
+  math.floor(numMergersNeeded * minMergersThresholdRatio).toInt)

Review comment:
   > If numMergersNeeded < minMergersThreshold we would want to return the mergers obtained from BlockManagerMaster.
   
   So if `numMergersNeeded=3` and `minMergersThreshold=5` and BlockManagerMaster returns 100 mergers, do you want to return the 100 mergers here? If so, it doesn't look reasonable.
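For reference, the heuristic in this hunk reduces to a few `math` calls. Below is a minimal Python sketch of the same arithmetic (function and parameter names are illustrative, not Spark APIs; also note that the Scala diff's `numPartitions / tasksPerExecutor` is integer division on `Int`s, while this sketch uses true division):

```python
import math

def num_mergers_needed(num_partitions, tasks_per_executor, max_num_executors,
                       min_static_threshold, min_threshold_ratio):
    """Illustrative sketch of the merger-count heuristic discussed above."""
    # Roughly one merger per executor's worth of partitions, capped by the
    # maximum number of executors and floored at 1.
    needed = min(max(1, math.ceil(num_partitions / tasks_per_executor)),
                 max_num_executors)
    # The minimum acceptable number of mergers: a static floor, or a ratio
    # of the computed need, whichever is larger.
    min_threshold = max(min_static_threshold,
                        math.floor(needed * min_threshold_ratio))
    return needed, min_threshold
```

With `num_partitions=10`, `tasks_per_executor=4`, ratio `0.8` and static threshold `5`, this yields `needed=3` against a threshold of `5` — the exact mismatch case the reviewer raises above.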

##
File path: core/src/main/scala/org/apache/spark/storage/BlockManagerMasterEndpoint.scala
##
@@ -360,6 +371,17 @@ class BlockManagerMasterEndpoint(
 
   }
 
+  private def addMergerLocation(blockManagerId: BlockManagerId): Unit = {
+if (!shuffleMergerLocations.contains(blockManagerId.host) && !blockManagerId.isDriver) {
+  val shuffleServerId = BlockManagerId(blockManagerId.executorId, blockManagerId.host,
+StorageUtils.externalShuffleServicePort(conf))
+  if (shuffleMergerLocations.size >= maxRetainedMergerLocations) {
+shuffleMergerLocations -= shuffleMergerLocations.head._1

Review comment:
   Hmm.. why do we need to set a threshold on `shuffleMergerLocations`? I think the number of external shuffle services should be static within a cluster (no matter what the resource allocation mode of the application is, dynamic or static) and wouldn't increase unlimitedly.
   
   Besides, the 
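The eviction under discussion is a size-capped, insertion-ordered map that drops its oldest entry once the cap is reached. A hypothetical Python sketch of that behavior (`MergerLocationTracker` is an illustrative name, not a Spark class):

```python
from collections import OrderedDict

class MergerLocationTracker:
    """Sketch of the bounded host -> shuffle-server map in the diff above."""

    def __init__(self, max_retained):
        self.max_retained = max_retained
        self.locations = OrderedDict()  # host -> shuffle server id

    def add(self, host, server_id):
        # Hosts already tracked are left untouched, mirroring the
        # `contains(blockManagerId.host)` guard in the diff.
        if host in self.locations:
            return
        # Drop the oldest host once the cap is reached, mirroring
        # `shuffleMergerLocations -= shuffleMergerLocations.head._1`.
        if len(self.locations) >= self.max_retained:
            self.locations.popitem(last=False)
        self.locations[host] = server_id
```

With a cap of 2, adding hosts `h1`, `h2`, `h3` in order leaves only `h2` and `h3` tracked.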

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30259: [SPARK-33353][BUILD] Cache dependencies for Coursier with new sbt in GitHub Actions

2020-11-04 Thread GitBox


dongjoon-hyun commented on a change in pull request #30259:
URL: https://github.com/apache/spark/pull/30259#discussion_r517808737



##
File path: .github/workflows/build_and_test.yml
##
@@ -111,13 +111,13 @@ jobs:
 key: ${{ matrix.java }}-${{ matrix.hadoop }}-maven-${{ hashFiles('**/pom.xml') }}
 restore-keys: |
   ${{ matrix.java }}-${{ matrix.hadoop }}-maven-
-- name: Cache Ivy local repository
+- name: Cache Coursier local repository
   uses: actions/cache@v2
   with:
-path: ~/.ivy2/cache
-key: ${{ matrix.java }}-${{ matrix.hadoop }}-ivy-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+path: ~/.cache/coursier

Review comment:
   Thanks! @HyukjinKwon .








[GitHub] [spark] huaxingao commented on a change in pull request #30154: [SPARK-32405][SQL] Apply table options while creating tables in JDBC Table Catalog

2020-11-04 Thread GitBox


huaxingao commented on a change in pull request #30154:
URL: https://github.com/apache/spark/pull/30154#discussion_r517808712



##
File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/jdbc/JDBCTableCatalog.scala
##
@@ -117,14 +117,33 @@ class JDBCTableCatalog extends TableCatalog with Logging {
 if (partitions.nonEmpty) {
   throw new UnsupportedOperationException("Cannot create JDBC table with partition")
 }
-// TODO (SPARK-32405): Apply table options while creating tables in JDBC Table Catalog
+
+var tableOptions = options.parameters + (JDBCOptions.JDBC_TABLE_NAME -> getTableName(ident))
+var tableComment: String = ""
+var tableProperties: String = ""
 if (!properties.isEmpty) {
-  logWarning("Cannot create JDBC table with properties, these properties will be " +
-"ignored: " + properties.asScala.map { case (k, v) => s"$k=$v" }.mkString("[", ", ", "]"))
+  properties.asScala.map {
+case (k, v) => k match {
+  case "comment" => tableComment = v
+  case "provider" | "owner" | "location" => // provider, owner and location can't be set.
+  case _ => tableProperties = tableProperties + " " + s"$k=$v"
+}
+  }
 }
 
-val writeOptions = new JdbcOptionsInWrite(
-  options.parameters + (JDBCOptions.JDBC_TABLE_NAME -> getTableName(ident)))
+if (tableComment != "") {
+  tableOptions = tableOptions + (JDBCOptions.JDBC_TABLE_COMMENT -> tableComment)
+}
+if (tableProperties != "") {
+  // table property is set in JDBC_CREATE_TABLE_OPTIONS, which will be appended
+  // to CREATE TABLE statement.
+  // E.g., "CREATE TABLE t (name string) ENGINE=InnoDB DEFAULT CHARSET=utf8"

Review comment:
   seems no need for now. 
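The diff above routes each incoming property into one of three buckets: a dedicated table comment, silently dropped reserved keys, and a free-form options string appended to the `CREATE TABLE` statement. A minimal Python sketch of that routing (the helper name is hypothetical, not a Spark API):

```python
def split_table_properties(properties):
    """Split a table-property map the way the JDBCTableCatalog diff does:
    'comment' becomes a dedicated value, reserved keys are dropped,
    and everything else is concatenated into a CREATE TABLE options string."""
    reserved = {"provider", "owner", "location"}  # can't be set via options
    comment = None
    extra = []
    for k, v in properties.items():
        if k == "comment":
            comment = v
        elif k not in reserved:
            extra.append(f"{k}={v}")
    return comment, " ".join(extra)
```

For example, `{"comment": "demo", "owner": "alice", "ENGINE": "InnoDB"}` yields the comment `"demo"` and the options string `"ENGINE=InnoDB"`, with `owner` dropped.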








[GitHub] [spark] dongjoon-hyun closed pull request #30258: [SPARK-33239][INFRA][3.0] Use pre-built image at GitHub Action SparkR job

2020-11-04 Thread GitBox


dongjoon-hyun closed pull request #30258:
URL: https://github.com/apache/spark/pull/30258


   






[GitHub] [spark] dongjoon-hyun commented on pull request #30258: [SPARK-33239][INFRA][3.0] Use pre-built image at GitHub Action SparkR job

2020-11-04 Thread GitBox


dongjoon-hyun commented on pull request #30258:
URL: https://github.com/apache/spark/pull/30258#issuecomment-722159723


   Thank you always, @HyukjinKwon ! Merged to branch-3.0.






[GitHub] [spark] HyukjinKwon commented on a change in pull request #30259: [SPARK-33353][BUILD] Cache dependencies for Coursier with new sbt in GitHub Actions

2020-11-04 Thread GitBox


HyukjinKwon commented on a change in pull request #30259:
URL: https://github.com/apache/spark/pull/30259#discussion_r517807972



##
File path: .github/workflows/build_and_test.yml
##
@@ -111,13 +111,13 @@ jobs:
 key: ${{ matrix.java }}-${{ matrix.hadoop }}-maven-${{ hashFiles('**/pom.xml') }}
 restore-keys: |
   ${{ matrix.java }}-${{ matrix.hadoop }}-maven-
-- name: Cache Ivy local repository
+- name: Cache Coursier local repository
   uses: actions/cache@v2
   with:
-path: ~/.ivy2/cache
-key: ${{ matrix.java }}-${{ matrix.hadoop }}-ivy-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+path: ~/.cache/coursier

Review comment:
   The directories are different across OSes (https://get-coursier.io/docs/cache)








[GitHub] [spark] dongjoon-hyun commented on a change in pull request #30259: [SPARK-33353][BUILD] Cache dependencies for Coursier with new sbt in GitHub Actions

2020-11-04 Thread GitBox


dongjoon-hyun commented on a change in pull request #30259:
URL: https://github.com/apache/spark/pull/30259#discussion_r517807768



##
File path: .github/workflows/build_and_test.yml
##
@@ -111,13 +111,13 @@ jobs:
 key: ${{ matrix.java }}-${{ matrix.hadoop }}-maven-${{ hashFiles('**/pom.xml') }}
 restore-keys: |
   ${{ matrix.java }}-${{ matrix.hadoop }}-maven-
-- name: Cache Ivy local repository
+- name: Cache Coursier local repository
   uses: actions/cache@v2
   with:
-path: ~/.ivy2/cache
-key: ${{ matrix.java }}-${{ matrix.hadoop }}-ivy-${{ hashFiles('**/pom.xml', '**/plugins.sbt') }}
+path: ~/.cache/coursier

Review comment:
   Thank you, @sarutak. BTW, I cannot find this directory on my Mac. Is it dependent on the OS?
   ```
   $ ls ~/.cache/coursier
   ls: /Users/dongjoon/.cache/coursier: No such file or directory
   ```








[GitHub] [spark] SparkQA commented on pull request #30259: [SPARK-33353][BUILD] Cache dependencies for Coursier with new sbt in GitHub Actions

2020-11-04 Thread GitBox


SparkQA commented on pull request #30259:
URL: https://github.com/apache/spark/pull/30259#issuecomment-722159188


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35248/
   






[GitHub] [spark] HyukjinKwon commented on pull request #30259: [SPARK-33353][BUILD] Cache dependencies for Coursier with new sbt in GitHub Actions

2020-11-04 Thread GitBox


HyukjinKwon commented on pull request #30259:
URL: https://github.com/apache/spark/pull/30259#issuecomment-722158840


   Nice! I have been thinking about fixing it but couldn't find the time.






[GitHub] [spark] HyukjinKwon commented on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.

2020-11-04 Thread GitBox


HyukjinKwon commented on pull request #30242:
URL: https://github.com/apache/spark/pull/30242#issuecomment-722158599


   Yup, it is to completely fix the issue. If we fail to make a fix, we plan to just remove the flaky tests for now.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #30255: [SPARK-33352][CORE] Fix procedure-like declaration compilation warnings of core module in Scala 2.13

2020-11-04 Thread GitBox


AmplabJenkins removed a comment on pull request #30255:
URL: https://github.com/apache/spark/pull/30255#issuecomment-722158119










[GitHub] [spark] AmplabJenkins commented on pull request #30255: [SPARK-33352][CORE] Fix procedure-like declaration compilation warnings of core module in Scala 2.13

2020-11-04 Thread GitBox


AmplabJenkins commented on pull request #30255:
URL: https://github.com/apache/spark/pull/30255#issuecomment-722158119










[GitHub] [spark] SparkQA commented on pull request #30258: [SPARK-33239][INFRA][3.0] Use pre-built image at GitHub Action SparkR job

2020-11-04 Thread GitBox


SparkQA commented on pull request #30258:
URL: https://github.com/apache/spark/pull/30258#issuecomment-722157722


   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/35245/
   






[GitHub] [spark] SparkQA removed a comment on pull request #30255: [SPARK-33352][CORE] Fix procedure-like declaration compilation warnings of core module in Scala 2.13

2020-11-04 Thread GitBox


SparkQA removed a comment on pull request #30255:
URL: https://github.com/apache/spark/pull/30255#issuecomment-722110388


   **[Test build #130632 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130632/testReport)** for PR 30255 at commit [`24febfe`](https://github.com/apache/spark/commit/24febfe54465a321661de62dfe46eb0bd5eb3287).






[GitHub] [spark] SparkQA commented on pull request #30255: [SPARK-33352][CORE] Fix procedure-like declaration compilation warnings of core module in Scala 2.13

2020-11-04 Thread GitBox


SparkQA commented on pull request #30255:
URL: https://github.com/apache/spark/pull/30255#issuecomment-722157285


   **[Test build #130632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130632/testReport)** for PR 30255 at commit [`24febfe`](https://github.com/apache/spark/commit/24febfe54465a321661de62dfe46eb0bd5eb3287).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.






[GitHub] [spark] LuciferYang edited a comment on pull request #30255: [SPARK-33352][CORE] Fix procedure-like declaration compilation warnings of core module in Scala 2.13

2020-11-04 Thread GitBox


LuciferYang edited a comment on pull request #30255:
URL: https://github.com/apache/spark/pull/30255#issuecomment-722155074


   cc @srowen & @dongjoon-hyun, this PR is the first part of cleaning up these warnings. I'm not sure if I should do it now.






[GitHub] [spark] dongjoon-hyun commented on pull request #30258: [SPARK-33239][INFRA][3.0] Use pre-built image at GitHub Action SparkR job

2020-11-04 Thread GitBox


dongjoon-hyun commented on pull request #30258:
URL: https://github.com/apache/spark/pull/30258#issuecomment-722156593


   Could you review this please, @HyukjinKwon ?






[GitHub] [spark] dongjoon-hyun commented on pull request #30258: [SPARK-33239][INFRA][3.0] Use pre-built image at GitHub Action SparkR job

2020-11-04 Thread GitBox


dongjoon-hyun commented on pull request #30258:
URL: https://github.com/apache/spark/pull/30258#issuecomment-722156442


   SparkR passed already and `pyspark` is flaky due to another known reason.
   ![Screen Shot 2020-11-04 at 9 45 37 PM](https://user-images.githubusercontent.com/9700541/98202539-14318380-1ee7-11eb-984a-0ef91f56fa04.png)
   






[GitHub] [spark] SparkQA commented on pull request #30251: [SPARK-33347][CORE]Cleanup useless variables of MutableApplicationInfo

2020-11-04 Thread GitBox


SparkQA commented on pull request #30251:
URL: https://github.com/apache/spark/pull/30251#issuecomment-722156019


   **[Test build #130639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/130639/testReport)** for PR 30251 at commit [`cff57d8`](https://github.com/apache/spark/commit/cff57d8a2c0e384e5987f58e3b1415b5ccdd0c1a).






[GitHub] [spark] dongjoon-hyun commented on pull request #30242: [SPARK-33277][PYSPARK][SQL][FOLLOW-UP] Block TaskCompletion event until the thread ends.

2020-11-04 Thread GitBox


dongjoon-hyun commented on pull request #30242:
URL: https://github.com/apache/spark/pull/30242#issuecomment-722155672


   Hi, @ueshin and @HyukjinKwon 
   Is the flakiness fixed?





