date:20200604

[GitHub] [spark] karuppayya closed pull request #28686: [SPARK-31877][SQL]Avoid stats computation for Hive table

2020-06-04 Thread GitBox



karuppayya closed pull request #28686:
URL: https://github.com/apache/spark/pull/28686


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] cloud-fan commented on a change in pull request #28723: [SPARK-28624][SQL][TESTS][3.0] Run date.sql via Thrift Server

2020-06-04 Thread GitBox



cloud-fan commented on a change in pull request #28723:
URL: https://github.com/apache/spark/pull/28723#discussion_r435695747



##
File path: 
sql/hive-thriftserver/src/test/scala/org/apache/spark/sql/hive/thriftserver/ThriftServerQueryTestSuite.scala
##
@@ -68,8 +68,6 @@ class ThriftServerQueryTestSuite extends SQLQueryTestSuite with SharedThriftServ
 // Missing UDF
 "postgreSQL/boolean.sql",
 "postgreSQL/case.sql",
-// SPARK-28624
-"date.sql",

Review comment:
   The Thrift server doesn't support negative years, so I think we still need to ignore this test.

[GitHub] [spark] karuppayya commented on pull request #28662: [SPARK-31850][SQL]Prevent DetermineTableStats from computing stats multiple times for same table

2020-06-04 Thread GitBox



karuppayya commented on pull request #28662:
URL: https://github.com/apache/spark/pull/28662#issuecomment-639264681


   The above condition is already present.
   But we return a **copy** of the relation (code: https://github.com/apache/spark/blob/master/sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveStrategies.scala#L137) with the updated table stats at the end of the method.
   - When the ResolvedAggregateFunction rule runs again (to reach a fixed point), it is not aware of the updated relation, so `executeWithSameContext` reruns the stats collection as part of the DetermineTableStats rule.
   - When the DetermineTableStats rule actually runs as part of the Analysis phase, it is likewise not aware of the updated relation.
   @viirya  
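   A toy model of the problem described above may help. This is a minimal sketch, not Spark's `DetermineTableStats`: `ToyRelation`, `StatsDemo`, `computeSize`, and `statsCache` are hypothetical stand-ins. A rule that attaches stats to a returned *copy* leaves the original plan node untouched, so rerunning the rule on that original node recomputes the stats; memoizing by table name avoids the second computation.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a Hive table relation: `stats` stays None until a
// rule computes it and attaches it to a *copy* of the node.
final case class ToyRelation(table: String, stats: Option[Long] = None)

object StatsDemo {
  // Counts how often the (pretend) expensive size computation really runs.
  var computations = 0

  // Hypothetical placeholder for scanning the table's files to size it.
  private def computeSize(table: String): Long = {
    computations += 1
    table.length.toLong * 1024L
  }

  // Guarded like the existing rule: the `stats.isEmpty` check only helps if the
  // caller keeps the returned copy. Rerunning on the original node recomputes.
  def determineStats(r: ToyRelation): ToyRelation =
    if (r.stats.isEmpty) r.copy(stats = Some(computeSize(r.table))) else r

  // Hypothetical fix: memoize sizes by table name across rule invocations.
  private val statsCache = mutable.Map.empty[String, Long]

  def determineStatsCached(r: ToyRelation): ToyRelation =
    if (r.stats.isEmpty) {
      // getOrElseUpdate evaluates computeSize only on a cache miss.
      r.copy(stats = Some(statsCache.getOrElseUpdate(r.table, computeSize(r.table))))
    } else r
}
```

   Calling `determineStats` twice on the same unmodified relation runs `computeSize` twice, whereas `determineStatsCached` runs it once. A real cache would also need invalidation when the underlying table changes, which this sketch ignores.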

[GitHub] [spark] HyukjinKwon commented on pull request #28732: [MINOR][PYTHON] Add one more newline between JVM and Python tracebacks

2020-06-04 Thread GitBox



HyukjinKwon commented on pull request #28732:
URL: https://github.com/apache/spark/pull/28732#issuecomment-639252018


   Merged to master and branch-3.0.

[GitHub] [spark] HyukjinKwon closed pull request #28732: [MINOR][PYTHON] Add one more newline between JVM and Python tracebacks

2020-06-04 Thread GitBox



HyukjinKwon closed pull request #28732:
URL: https://github.com/apache/spark/pull/28732

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28732: [MINOR][PYTHON] Add one more newline between JVM and Python tracebacks

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28732:
URL: https://github.com/apache/spark/pull/28732#issuecomment-639250427

[GitHub] [spark] AmplabJenkins commented on pull request #28732: [MINOR][PYTHON] Add one more newline between JVM and Python tracebacks

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28732:
URL: https://github.com/apache/spark/pull/28732#issuecomment-639250427

[GitHub] [spark] SparkQA removed a comment on pull request #28732: [MINOR][PYTHON] Add one more newline between JVM and Python tracebacks

2020-06-04 Thread GitBox



SparkQA removed a comment on pull request #28732:
URL: https://github.com/apache/spark/pull/28732#issuecomment-639243418


   **[Test build #123549 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123549/testReport)**
 for PR 28732 at commit 
[`f27f080`](https://github.com/apache/spark/commit/f27f0802fb12575f8eb7cef4dcaca9e0e01c88d0).

[GitHub] [spark] SparkQA commented on pull request #28732: [MINOR][PYTHON] Add one more newline between JVM and Python tracebacks

2020-06-04 Thread GitBox



SparkQA commented on pull request #28732:
URL: https://github.com/apache/spark/pull/28732#issuecomment-639250134


   **[Test build #123549 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123549/testReport)**
 for PR 28732 at commit 
[`f27f080`](https://github.com/apache/spark/commit/f27f0802fb12575f8eb7cef4dcaca9e0e01c88d0).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.

[GitHub] [spark] SparkQA commented on pull request #28645: [SPARK-31826][SQL] Support composed type of case class for typed Scala UDF

2020-06-04 Thread GitBox



SparkQA commented on pull request #28645:
URL: https://github.com/apache/spark/pull/28645#issuecomment-639244877


   **[Test build #123550 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123550/testReport)**
 for PR 28645 at commit 
[`86035fa`](https://github.com/apache/spark/commit/86035fa42edbb847419c22bd7b37cf8bd8234b60).

[GitHub] [spark] HyukjinKwon edited a comment on pull request #28730: [SPARK-31903][SQL][PYSPARK][R] Fix toPandas with Arrow enabled to show metrics in Query UI.

2020-06-04 Thread GitBox



HyukjinKwon edited a comment on pull request #28730:
URL: https://github.com/apache/spark/pull/28730#issuecomment-639243858


   Merged to master and branch-3.0. I don't mind porting it back if anyone needs it. I didn't do so here just because there's a conflict, and it's just a matter of monitoring.
   
   I will leave it to you @ueshin :D.

[GitHub] [spark] HyukjinKwon commented on pull request #28730: [SPARK-31903][SQL][PYSPARK][R] Fix toPandas with Arrow enabled to show metrics in Query UI.

2020-06-04 Thread GitBox



HyukjinKwon commented on pull request #28730:
URL: https://github.com/apache/spark/pull/28730#issuecomment-639243858


   Merged to master and branch-3.0. I don't mind porting it back if anyone needs it. I didn't do so here just because there's a conflict, and it's just a matter of monitoring.

[GitHub] [spark] AmplabJenkins commented on pull request #28732: [MINOR][PYTHON] Add one more newline between JVM and Python tracebacks

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28732:
URL: https://github.com/apache/spark/pull/28732#issuecomment-639243664

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28732: [MINOR][PYTHON] Add one more newline between JVM and Python tracebacks

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28732:
URL: https://github.com/apache/spark/pull/28732#issuecomment-639243664

[GitHub] [spark] SparkQA commented on pull request #28732: [MINOR][PYTHON] Add one more newline between JVM and Python tracebacks

2020-06-04 Thread GitBox



SparkQA commented on pull request #28732:
URL: https://github.com/apache/spark/pull/28732#issuecomment-639243418


   **[Test build #123549 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123549/testReport)**
 for PR 28732 at commit 
[`f27f080`](https://github.com/apache/spark/commit/f27f0802fb12575f8eb7cef4dcaca9e0e01c88d0).

[GitHub] [spark] HyukjinKwon closed pull request #28730: [SPARK-31903][SQL][PYSPARK][R] Fix toPandas with Arrow enabled to show metrics in Query UI.

2020-06-04 Thread GitBox



HyukjinKwon closed pull request #28730:
URL: https://github.com/apache/spark/pull/28730

[GitHub] [spark] HyukjinKwon opened a new pull request #28732: [MINOR][PYTHON] Add one more newline between JVM and Python tracebacks

2020-06-04 Thread GitBox



HyukjinKwon opened a new pull request #28732:
URL: https://github.com/apache/spark/pull/28732


   ### What changes were proposed in this pull request?
   
   This PR proposes to add one more newline to clearly separate JVM and Python 
tracebacks:
   
   Before:
   
   ```
   Traceback (most recent call last):
     ...
   pyspark.sql.utils.AnalysisException: Reference 'column' is ambiguous, could be: column, column.;
   JVM stacktrace:
   org.apache.spark.sql.AnalysisException: Reference 'column' is ambiguous, could be: column, column.;
     ...
   ```
   
   After:
   
   ```
   Traceback (most recent call last):
     ...
   pyspark.sql.utils.AnalysisException: Reference 'column' is ambiguous, could be: column, column.;

   JVM stacktrace:
   org.apache.spark.sql.AnalysisException: Reference 'column' is ambiguous, could be: column, column.;
     ...
   ```
   
   This is kind of a followup of 
https://github.com/apache/spark/commit/e69466056fb2c121b7bbb6ad082f09deb1c41063 
(SPARK-31849).
   
   ### Why are the changes needed?
   
   To make it easier to read.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No. The affected code is only in unreleased branches.
   
   ### How was this patch tested?
   
   Manually tested.

[GitHub] [spark] HeartSaVioR edited a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-04 Thread GitBox



HeartSaVioR edited a comment on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639239790


   My alternative with wrapping state store is something like below:
   
   ```
   class RowValidatingStateStore(
       underlying: StateStore,
       keyType: Seq[DataType],
       valueType: Seq[DataType]) extends StateStore {
     private var isValidated = false

     override def get(key: UnsafeRow): UnsafeRow = {
       val value = underlying.get(key)
       // get() returns null for a missing key, so only validate an actual row
       if (!isValidated && value != null) {
         validateRow(value, valueType)
         isValidated = true
       }
       value
     }

     override def id: StateStoreId = underlying.id
     override def version: Long = underlying.version
     override def put(key: UnsafeRow, value: UnsafeRow): Unit = underlying.put(key, value)
     override def remove(key: UnsafeRow): Unit = underlying.remove(key)
     override def commit(): Long = underlying.commit()
     override def abort(): Unit = underlying.abort()
     override def iterator(): Iterator[UnsafeRowPair] = underlying.iterator()
     override def metrics: StateStoreMetrics = underlying.metrics
     override def hasCommitted: Boolean = underlying.hasCommitted

     private def validateRow(row: UnsafeRow, rowDataType: Seq[DataType]): Unit = {
       // TODO: call util method with row and data type to validate - note that it can only check with value schema
     }
   }

   def get(...): StateStore = {
     require(version >= 0)
     val storeProvider = loadedProviders.synchronized {
       ...
     }
     // TODO: add if statement to see whether it should wrap state store or not
     new RowValidatingStateStore(
       storeProvider.getStore(version),
       keySchema.map(_.dataType),
       valueSchema.map(_.dataType))
   }
   ```
   
   The example code only validates in the `get` operation, which is insufficient for checking the "key" rows in the state. That said, the iterator approach still offers more opportunities for validation. However, validating the UnsafeRow alone doesn't cover the various incompatibility issues well (we definitely need other guards as well), so it's more or less OK to cover only the value side.
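   The wrapping idea can be tried out without Spark. This is a minimal sketch under stated assumptions: `KVStore`, `InMemoryStore`, and `ValidatingStore` are hypothetical simplifications, not Spark's `StateStore` API, and rows are plain `Seq[Any]` checked against an expected field count rather than real UnsafeRow format validation. Unlike the `get`-only example, it also validates the first pair returned by `iterator`, which is where stored key rows become visible.

```scala
import scala.collection.mutable

// Hypothetical, simplified key-value store interface (not Spark's StateStore).
trait KVStore {
  def get(key: Seq[Any]): Option[Seq[Any]]
  def put(key: Seq[Any], value: Seq[Any]): Unit
  def iterator: Iterator[(Seq[Any], Seq[Any])]
}

final class InMemoryStore extends KVStore {
  private val data = mutable.LinkedHashMap.empty[Seq[Any], Seq[Any]]
  def get(key: Seq[Any]): Option[Seq[Any]] = data.get(key)
  def put(key: Seq[Any], value: Seq[Any]): Unit = data.update(key, value)
  def iterator: Iterator[(Seq[Any], Seq[Any])] = data.iterator
}

// Decorator: validates the first rows it reads, then simply delegates.
final class ValidatingStore(underlying: KVStore, keyArity: Int, valueArity: Int)
    extends KVStore {
  private var keyValidated = false
  private var valueValidated = false

  // Stand-in for real row-format validation: only checks the field count.
  private def check(row: Seq[Any], arity: Int, what: String): Unit =
    require(row.length == arity, s"invalid $what row: expected $arity fields, got ${row.length}")

  def get(key: Seq[Any]): Option[Seq[Any]] = {
    val value = underlying.get(key)
    // A missing key yields None, so there is nothing to validate yet.
    if (!valueValidated) value.foreach { v =>
      check(v, valueArity, "value")
      valueValidated = true
    }
    value
  }

  def put(key: Seq[Any], value: Seq[Any]): Unit = underlying.put(key, value)

  // Iteration exposes stored keys, so key rows can be checked here.
  def iterator: Iterator[(Seq[Any], Seq[Any])] =
    underlying.iterator.map { case (k, v) =>
      if (!keyValidated) { check(k, keyArity, "key"); keyValidated = true }
      if (!valueValidated) { check(v, valueArity, "value"); valueValidated = true }
      (k, v)
    }
}
```

   Validating only the first row keeps the per-operation overhead down to a boolean check, at the cost of missing a malformed row that shows up later, which matches the trade-off discussed in the comment.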

[GitHub] [spark] HeartSaVioR commented on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-04 Thread GitBox



HeartSaVioR commented on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639239790


   My alternative with wrapping state store is something like below:
   
   ```
   class RowValidatingStateStore(
       underlying: StateStore,
       keyType: Seq[DataType],
       valueType: Seq[DataType]) extends StateStore {
     private var isValidated = false

     override def get(key: UnsafeRow): UnsafeRow = {
       val value = underlying.get(key)
       // get() returns null for a missing key, so only validate an actual row
       if (!isValidated && value != null) {
         validateRow(value)
         isValidated = true
       }
       value
     }

     override def id: StateStoreId = underlying.id
     override def version: Long = underlying.version
     override def put(key: UnsafeRow, value: UnsafeRow): Unit = underlying.put(key, value)
     override def remove(key: UnsafeRow): Unit = underlying.remove(key)
     override def commit(): Long = underlying.commit()
     override def abort(): Unit = underlying.abort()
     override def iterator(): Iterator[UnsafeRowPair] = underlying.iterator()
     override def metrics: StateStoreMetrics = underlying.metrics
     override def hasCommitted: Boolean = underlying.hasCommitted

     private def validateRow(row: UnsafeRow): Unit = {
       // TODO: call util method with row and schema to validate
     }
   }

   def get(...): StateStore = {
     require(version >= 0)
     val storeProvider = loadedProviders.synchronized {
       ...
     }
     // TODO: add if statement to see whether it should wrap state store or not
     new RowValidatingStateStore(
       storeProvider.getStore(version),
       keySchema.map(_.dataType),
       valueSchema.map(_.dataType))
   }
   ```
   
   The example code only validates in the `get` operation, which is insufficient for checking the "key" rows in the state. That said, the iterator approach still offers more opportunities for validation. However, validating the UnsafeRow alone doesn't cover the various incompatibility issues well (we definitely need other guards as well), so it's more or less OK to cover only the value side.

[GitHub] [spark] HeartSaVioR edited a comment on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-04 Thread GitBox



HeartSaVioR edited a comment on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639200645


   > @HeartSaVioR After taking a further look. Instead of dealing with the 
iterator, how about adding the invalidation for all state store operations in 
StateStoreProvider? Since we can get the key/value row during load map. WDYT?
   
   It would be nice to see the proposed change as code to avoid misunderstanding, as I did in my previous comment (anything, including a commit in your fork or a text comment, is OK). I'll try out my alternative (wrapping the state store) and show the code change. Thanks!
   
   EDIT: Please work against the interface whenever possible: there are different implementations of state store providers, and we should avoid depending on a specific implementation.

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28730: [SPARK-31903][SQL][PYSPARK][R] Fix toPandas with Arrow enabled to show metrics in Query UI.

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28730:
URL: https://github.com/apache/spark/pull/28730#issuecomment-639226292

[GitHub] [spark] AmplabJenkins commented on pull request #28730: [SPARK-31903][SQL][PYSPARK][R] Fix toPandas with Arrow enabled to show metrics in Query UI.

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28730:
URL: https://github.com/apache/spark/pull/28730#issuecomment-639226292

[GitHub] [spark] SparkQA removed a comment on pull request #28730: [SPARK-31903][SQL][PYSPARK][R] Fix toPandas with Arrow enabled to show metrics in Query UI.

2020-06-04 Thread GitBox



SparkQA removed a comment on pull request #28730:
URL: https://github.com/apache/spark/pull/28730#issuecomment-639151923


   **[Test build #123547 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123547/testReport)**
 for PR 28730 at commit 
[`5705e15`](https://github.com/apache/spark/commit/5705e1523f108e66afcf266c066615503a98a7cb).

[GitHub] [spark] SparkQA commented on pull request #28730: [SPARK-31903][SQL][PYSPARK][R] Fix toPandas with Arrow enabled to show metrics in Query UI.

2020-06-04 Thread GitBox



SparkQA commented on pull request #28730:
URL: https://github.com/apache/spark/pull/28730#issuecomment-639225813


   **[Test build #123547 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123547/testReport)**
 for PR 28730 at commit 
[`5705e15`](https://github.com/apache/spark/commit/5705e1523f108e66afcf266c066615503a98a7cb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.

[GitHub] [spark] holdenk commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



holdenk commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-639222935


   The K8s test failure (`- Run in client mode. *** FAILED ***`) appears unrelated; we don't do anything with the tokens. I'll investigate more tomorrow.

[GitHub] [spark] AmplabJenkins commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-639222847

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-639222847

[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



SparkQA commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-63971


   **[Test build #123548 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123548/testReport)**
 for PR 28708 at commit 
[`60bec89`](https://github.com/apache/spark/commit/60bec89a67253ec823d4497bc3eef8bbc30b7949).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.

[GitHub] [spark] SparkQA removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



SparkQA removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-639183660


   **[Test build #123548 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123548/testReport)**
 for PR 28708 at commit 
[`60bec89`](https://github.com/apache/spark/commit/60bec89a67253ec823d4497bc3eef8bbc30b7949).

[GitHub] [spark] bmarcott commented on pull request #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-06-04 Thread GitBox



bmarcott commented on pull request #27096:
URL: https://github.com/apache/spark/pull/27096#issuecomment-639221995


   @cloud-fan @viirya could you help review or add a suggested reviewer here?

[GitHub] [spark] HyukjinKwon edited a comment on pull request #22480: [SPARK-25473][PYTHON][SS][TEST] ForeachWriter tests failed on Python 3.6 and macOS High Sierra

2020-06-04 Thread GitBox



HyukjinKwon edited a comment on pull request #22480:
URL: https://github.com/apache/spark/pull/22480#issuecomment-639212453


   @pquentin, yes, it's kind of difficult to avoid on the PySpark side for now. The problem isn't caused solely by our use of `fork()`; it also depends on other conditions. I didn't take a very close look at the time, but the error was thrown when a particular instance was pickled in the forked process.

[GitHub] [spark] HyukjinKwon commented on pull request #22480: [SPARK-25473][PYTHON][SS][TEST] ForeachWriter tests failed on Python 3.6 and macOS High Sierra

2020-06-04 Thread GitBox



HyukjinKwon commented on pull request #22480:
URL: https://github.com/apache/spark/pull/22480#issuecomment-639212453


   @pquentin, yes, it's kind of difficult to avoid on the PySpark side for now. The problem isn't caused solely by our use of `fork()`; it also depends on other conditions. I didn't take a very close look at the time, but the error was thrown when a particular instance was pickled.

[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-639202750


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/28172/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-639202742


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



SparkQA commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-639202723


   Kubernetes integration test status failure
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/28172/
   




[GitHub] [spark] AmplabJenkins commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-639202742








[GitHub] [spark] holdenk commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



holdenk commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-639201319


   > > > So @attilapiros, looking at the Jenkins console logs we aren't leaking any threads during testing (nor would I expect us to). But I'll add something to more aggressively stop the shuffle migration threads.
   > > 
   > > 
   > > It shows up when `BlockManager` is tested in `BlockManagerSuite`:
   > > ```
   > >  = POSSIBLE THREAD LEAK IN SUITE o.a.s.storage.BlockManagerSuite, thread names: rpc-boss-3-1, migrate-shuffle-to-BlockManagerId(exec2, localhost, 50804, None), shuffle-boss-9-1  , shuffle-boss-6-1 =
   > > ```
   > 
   > Gotcha, I was looking for the explicit decom test. I'll eagerly shut down the migrate-shuffle-to threads then.
   
   I think the latest changes have fixed this (e.g. `grep "THREAD LEAK" consoleFull | grep BlockManager` returns nothing). Worth noting: we do leak threads in ~283 tests, so I'm not sure how important this is.




[GitHub] [spark] HeartSaVioR commented on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-04 Thread GitBox



HeartSaVioR commented on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639200645


   > @HeartSaVioR After taking a further look: instead of dealing with the iterator, how about adding the validation for all state store operations in StateStoreProvider? We can get the key/value rows while loading the map. WDYT?
   
   It would be nice to see the proposed change as code to avoid misunderstanding, as I did in my previous comment (anything is OK, including a commit in your fork or a text comment). I'll try out my alternative (wrapping the State Store) and show the code change. Thanks!




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28720: [SPARK-31900][SPARK-SUBMIT] Client memory passed unvalidated to the JVM Xmx

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28720:
URL: https://github.com/apache/spark/pull/28720#issuecomment-639197112








[GitHub] [spark] AmplabJenkins commented on pull request #28720: [SPARK-31900][SPARK-SUBMIT] Client memory passed unvalidated to the JVM Xmx

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28720:
URL: https://github.com/apache/spark/pull/28720#issuecomment-639197112








[GitHub] [spark] SparkQA removed a comment on pull request #28720: [SPARK-31900][SPARK-SUBMIT] Client memory passed unvalidated to the JVM Xmx

2020-06-04 Thread GitBox



SparkQA removed a comment on pull request #28720:
URL: https://github.com/apache/spark/pull/28720#issuecomment-639149691


   **[Test build #123546 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123546/testReport)**
 for PR 28720 at commit 
[`87e1d67`](https://github.com/apache/spark/commit/87e1d67c5be394ae514e38a50958f88ecc721287).




[GitHub] [spark] SparkQA commented on pull request #28720: [SPARK-31900][SPARK-SUBMIT] Client memory passed unvalidated to the JVM Xmx

2020-06-04 Thread GitBox



SparkQA commented on pull request #28720:
URL: https://github.com/apache/spark/pull/28720#issuecomment-639196496


   **[Test build #123546 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123546/testReport)**
 for PR 28720 at commit 
[`87e1d67`](https://github.com/apache/spark/commit/87e1d67c5be394ae514e38a50958f88ecc721287).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



SparkQA commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-639195695


   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/28172/
   




[GitHub] [spark] AmplabJenkins commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-639187847








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-639187847








[GitHub] [spark] skambha commented on pull request #28707: [SPARK-31894][SS] Introduce UnsafeRow format validation for streaming state store

2020-06-04 Thread GitBox



skambha commented on pull request #28707:
URL: https://github.com/apache/spark/pull/28707#issuecomment-639187545


   > @skambha You can check the integrated tests in #28725. If we delete the validation, we'll get an NPE for [this test](https://github.com/apache/spark/pull/28725/files#diff-492f0d70824a58ef2ea94a54dc6f9707R79), and an assertion failure in the unsafe row for [this test](https://github.com/apache/spark/pull/28725/files#diff-492f0d70824a58ef2ea94a54dc6f9707R185). That is to say, we will get random failures when reusing a checkpoint written by an older Spark version.
   
   Thanks for adding the test. 




[GitHub] [spark] github-actions[bot] commented on pull request #27172: [WIP] [SPARK-29644][SQL] Fixed ByteType JDBCUtils to map to TinyInt at write read and ShortType on read

2020-06-04 Thread GitBox



github-actions[bot] commented on pull request #27172:
URL: https://github.com/apache/spark/pull/27172#issuecomment-639187219


   We're closing this PR because it hasn't been updated in a while. This isn't 
a judgement on the merit of the PR in any way. It's just a way of keeping the 
PR queue manageable.
   If you'd like to revive this PR, please reopen it and ask a committer to 
remove the Stale tag!




[GitHub] [spark] SparkQA removed a comment on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-04 Thread GitBox



SparkQA removed a comment on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-639134384


   **[Test build #123545 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123545/testReport)**
 for PR 28710 at commit 
[`7b01b63`](https://github.com/apache/spark/commit/7b01b63f9ce6549eaf248296b0d48e98a2dd7a25).




[GitHub] [spark] SparkQA commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-04 Thread GitBox



SparkQA commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-639187095


   **[Test build #123545 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123545/testReport)**
 for PR 28710 at commit 
[`7b01b63`](https://github.com/apache/spark/commit/7b01b63f9ce6549eaf248296b0d48e98a2dd7a25).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



SparkQA commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-639183660


   **[Test build #123548 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123548/testReport)**
 for PR 28708 at commit 
[`60bec89`](https://github.com/apache/spark/commit/60bec89a67253ec823d4497bc3eef8bbc30b7949).




[GitHub] [spark] siknezevic commented on pull request #27246: [SPARK-30536][CORE][SQL] Sort-merge join operator spilling performance improvements

2020-06-04 Thread GitBox



siknezevic commented on pull request #27246:
URL: https://github.com/apache/spark/pull/27246#issuecomment-639172725


   Thank you for the comments. I will address them soon.




[GitHub] [spark] jacobwu123 opened a new pull request #28731: [SPARK-31909][CORE] Add SPARK_SUBMIT_OPTS to Beeline Script

2020-06-04 Thread GitBox



jacobwu123 opened a new pull request #28731:
URL: https://github.com/apache/spark/pull/28731


   
   
   ### What changes were proposed in this pull request?
   
   Made the SPARK_SUBMIT_OPTS environment variable available to beeline.
   
   ### Why are the changes needed?
   
   Beeline does not pick up the krb5.conf setting specified in SPARK_SUBMIT_OPTS in spark-env.sh.
   
   ### Does this PR introduce _any_ user-facing change?
   
   No.
   
   ### How was this patch tested?
   
   ./dev/run-tests
   




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28731: [SPARK-31909][CORE] Add SPARK_SUBMIT_OPTS to Beeline Script

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28731:
URL: https://github.com/apache/spark/pull/28731#issuecomment-639158213


   Can one of the admins verify this patch?




[GitHub] [spark] AmplabJenkins commented on pull request #28731: [SPARK-31909][CORE] Add SPARK_SUBMIT_OPTS to Beeline Script

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28731:
URL: https://github.com/apache/spark/pull/28731#issuecomment-639158594


   Can one of the admins verify this patch?




[GitHub] [spark] AmplabJenkins commented on pull request #28731: [SPARK-31909][CORE] Add SPARK_SUBMIT_OPTS to Beeline Script

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28731:
URL: https://github.com/apache/spark/pull/28731#issuecomment-639158213


   Can one of the admins verify this patch?




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28730: [SPARK-31903][SQL][PYSPARK][R] Fix toPandas with Arrow enabled to show metrics in Query UI.

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28730:
URL: https://github.com/apache/spark/pull/28730#issuecomment-639152306








[GitHub] [spark] AmplabJenkins commented on pull request #28730: [SPARK-31903][SQL][PYSPARK][R] Fix toPandas with Arrow enabled to show metrics in Query UI.

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28730:
URL: https://github.com/apache/spark/pull/28730#issuecomment-639152306








[GitHub] [spark] SparkQA commented on pull request #28730: [SPARK-31903][SQL][PYSPARK][R] Fix toPandas with Arrow enabled to show metrics in Query UI.

2020-06-04 Thread GitBox



SparkQA commented on pull request #28730:
URL: https://github.com/apache/spark/pull/28730#issuecomment-639151923


   **[Test build #123547 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123547/testReport)**
 for PR 28730 at commit 
[`5705e15`](https://github.com/apache/spark/commit/5705e1523f108e66afcf266c066615503a98a7cb).




[GitHub] [spark] ueshin opened a new pull request #28730: [SPARK-31903][PYSPARK][R] Fix toPandas with Arrow enabled to show metrics in Query UI.

2020-06-04 Thread GitBox



ueshin opened a new pull request #28730:
URL: https://github.com/apache/spark/pull/28730


   ### What changes were proposed in this pull request?
   
   In `Dataset.collectAsArrowToR` and `Dataset.collectAsArrowToPython`, the code block for `serveToStream` runs in a separate thread, so `withAction` finishes as soon as it starts that thread. As a result, it doesn't collect the metrics of the actual action, and the Query UI shows the plan graph without metrics.
   
   We should call `serveToStream` first, then call `withAction` inside it.
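
The ordering problem can be reproduced in plain Python. In this sketch `with_action` and `serve_to_stream` are hypothetical stand-ins for `withAction` and `serveToStream` (not their real signatures or behavior), and a 0.2 s sleep stands in for the actual job. It only shows that a timing/metrics wrapper placed around a thread start measures the start, not the work done on the thread.

```python
import threading
import time
from contextlib import contextmanager
from typing import Callable, Dict, Iterator


@contextmanager
def with_action(metrics: Dict[str, float], name: str) -> Iterator[None]:
    # Hypothetical stand-in for withAction: records how long its block ran
    start = time.monotonic()
    try:
        yield
    finally:
        metrics[name] = time.monotonic() - start


def run_query() -> None:
    time.sleep(0.2)  # stands in for the real job whose metrics we want


def serve_to_stream(body: Callable[[], None]) -> threading.Thread:
    # Hypothetical stand-in for serveToStream: runs body on a separate thread
    thread = threading.Thread(target=body)
    thread.start()
    return thread


def buggy(metrics: Dict[str, float]) -> threading.Thread:
    # The wrapper covers only the thread start, so it exits almost immediately
    with with_action(metrics, "buggy"):
        thread = serve_to_stream(run_query)
    return thread


def fixed(metrics: Dict[str, float]) -> threading.Thread:
    # Start the thread first and run the wrapper inside the thread body
    def body() -> None:
        with with_action(metrics, "fixed"):
            run_query()
    return serve_to_stream(body)


metrics: Dict[str, float] = {}
buggy(metrics).join()
fixed(metrics).join()
print(metrics)
```

In the fixed variant the recorded duration covers the query itself, which mirrors the proposed reordering.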
   
   ### Why are the changes needed?
   
   Normally, when calling toPandas, the Query UI shows each plan node's metrics and the corresponding Stage ID and Task ID:
   
   ```py
   >>> df = spark.createDataFrame([(1, 10, 'abc'), (2, 20, 'def')], schema=['x', 'y', 'z'])
   >>> df.toPandas()
      x   y    z
   0  1  10  abc
   1  2  20  def
   ```
   
   ![Screen Shot 2020-06-03 at 4 47 07 PM](https://user-images.githubusercontent.com/506656/83815735-bec22380-a675-11ea-8ecc-bf2954731f35.png)
   
   But if Arrow execution is enabled, it shows only the plan nodes, and the duration is not correct:
   
   ```py
   >>> spark.conf.set('spark.sql.execution.arrow.pyspark.enabled', True)
   >>> df.toPandas()
      x   y    z
   0  1  10  abc
   1  2  20  def
   ```
   
   ![Screen Shot 2020-06-03 at 4 47 27 PM](https://user-images.githubusercontent.com/506656/83815804-de594c00-a675-11ea-933a-d0ffc0f534dd.png)
   
   ### Does this PR introduce _any_ user-facing change?
   
   Yes, the Query UI will show the plan with the correct metrics.
   
   ### How was this patch tested?
   
   I checked it manually on my local machine.
   
   ![Screen Shot 2020-06-04 at 3 19 41 PM](https://user-images.githubusercontent.com/506656/83816265-d77f0900-a676-11ea-84b8-2a8d80428bc6.png)
   
   




[GitHub] [spark] SparkQA commented on pull request #28720: [SPARK-31900][SPARK-SUBMIT] Client memory passed unvalidated to the JVM Xmx

2020-06-04 Thread GitBox



SparkQA commented on pull request #28720:
URL: https://github.com/apache/spark/pull/28720#issuecomment-639149691


   **[Test build #123546 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123546/testReport)**
 for PR 28720 at commit 
[`87e1d67`](https://github.com/apache/spark/commit/87e1d67c5be394ae514e38a50958f88ecc721287).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28720: [SPARK-31900][SPARK-SUBMIT] Client memory passed unvalidated to the JVM Xmx

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28720:
URL: https://github.com/apache/spark/pull/28720#issuecomment-639147791








[GitHub] [spark] AmplabJenkins removed a comment on pull request #28720: [SPARK-31900][SPARK-SUBMIT] Client memory passed unvalidated to the JVM Xmx

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28720:
URL: https://github.com/apache/spark/pull/28720#issuecomment-638456255


   Can one of the admins verify this patch?




[GitHub] [spark] AmplabJenkins commented on pull request #28720: [SPARK-31900][SPARK-SUBMIT] Client memory passed unvalidated to the JVM Xmx

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28720:
URL: https://github.com/apache/spark/pull/28720#issuecomment-639147791








[GitHub] [spark] gatorsmile commented on pull request #28720: [SPARK-31900][SPARK-SUBMIT] Client memory passed unvalidated to the JVM Xmx

2020-06-04 Thread GitBox



gatorsmile commented on pull request #28720:
URL: https://github.com/apache/spark/pull/28720#issuecomment-639147436


   ok to test




[GitHub] [spark] holdenk commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



holdenk commented on a change in pull request #28708:
URL: https://github.com/apache/spark/pull/28708#discussion_r435575853



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##
@@ -1790,6 +1822,108 @@ private[spark] class BlockManager(
 }
   }
 
+  private class ShuffleMigrationRunnable(peer: BlockManagerId) extends Runnable {
+    @volatile var running = true
+    override def run(): Unit = {
+      var migrating: Option[(Int, Long)] = None
+      val storageLevel = StorageLevel(
+        useDisk = true,
+        useMemory = false,
+        useOffHeap = false,
+        deserialized = false,
+        replication = 1)
+      logInfo(s"Starting migration thread for ${peer}")
+      // Once a block fails to transfer to an executor stop trying to transfer more blocks
+      try {
+        while (running) {
+          val migrating = Option(shufflesToMigrate.poll())
+          migrating match {
+            case None =>
+              logInfo("Nothing to migrate")
+              // Nothing to do right now, but maybe a transfer will fail or a new block
+              // will finish being committed.
+              val SLEEP_TIME_SECS = 1
+              Thread.sleep(SLEEP_TIME_SECS * 1000L)
+            case Some((shuffleId, mapId)) =>
+              logInfo(s"Trying to migrate shuffle ${shuffleId},${mapId} to ${peer}")
+              val blocks =
+                migratableResolver.getMigrationBlocks(shuffleId, mapId)
+              logInfo(s"Got migration sub-blocks ${blocks}")
+              blocks.foreach { case (blockId, buffer) =>
+                logInfo(s"Migrating sub-block ${blockId}")
+                blockTransferService.uploadBlockSync(
+                  peer.host,
+                  peer.port,
+                  peer.executorId,
+                  blockId,
+                  buffer,
+                  storageLevel,
+                  null)// class tag, we don't need for shuffle
+                logInfo(s"Migrated sub block ${blockId}")
+              }
+              logInfo(s"Migrated ${shuffleId},${mapId} to ${peer}")
+          }
+        }
+        // This catch is intentionally outside of the while running block.
+        // if we encounter errors migrating to an executor we want to stop.
+      } catch {
+        case e: Exception =>
+          migrating match {
+            case Some(shuffleMap) =>
+              logError("Error ${e} during migration, adding ${shuffleMap} back to migration queue")
+              shufflesToMigrate.add(shuffleMap)
+            case None =>
+              logError(s"Error ${e} while waiting for block to migrate")
+          }
+      }
+    }
+  }
+
+  private val migrationPeers = mutable.HashMap[BlockManagerId, ShuffleMigrationRunnable]()
+
+  /**
+   * Tries to offload all shuffle blocks that are registered with the shuffle service locally.
+   * Note: this does not delete the shuffle files in-case there is an in-progress fetch
+   * but rather shadows them.
+   * Requires an Indexed based shuffle resolver.
+   */
+  def offloadShuffleBlocks(): Unit = {
+    // Update the queue of shuffles to be migrated
+    logInfo("Offloading shuffle blocks")
+    val localShuffles = migratableResolver.getStoredShuffles()
+    logInfo(s"My local shuffles are ${localShuffles.toList}")
+    val newShufflesToMigrate = localShuffles.&~(migratingShuffles).toSeq

Review comment:
   This is for computing the change needed; readability isn't a big concern.
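
The design in this hunk (one runnable per peer polling a shared queue, idling when it is empty, stopping on the first failure and re-queueing the in-flight shuffle, plus a set difference to find newly stored shuffles) can be modelled in a few lines of Python. This is a hypothetical sketch, not Spark code: `ShuffleMigrator`, `offload`, `run` and the `upload` callback are illustrative names. `offload` uses Python's set difference for what `localShuffles.&~(migratingShuffles)` computes in Scala. One deliberate difference: the loop assigns to a single `current` variable, because in the hunk the inner `val migrating` declared in the `while` loop shadows the outer `var migrating`, so as far as the excerpt shows, the `catch` block only ever sees the outer value `None` and never re-queues anything.

```python
import queue
import threading
from typing import Callable, Hashable, Iterable, Set


class ShuffleMigrator:
    """Toy model of the per-peer migration loop; all names are hypothetical."""

    def __init__(self, upload: Callable[[str, Hashable], None]) -> None:
        self.upload = upload                   # assumed to raise on failure
        self.to_migrate = queue.Queue()        # shared work queue
        self.migrating: Set[Hashable] = set()  # everything queued so far
        self.lock = threading.Lock()

    def offload(self, local_shuffles: Iterable[Hashable]) -> Set[Hashable]:
        # Set difference (what Scala's `&~` computes): only shuffles not yet queued
        with self.lock:
            new = set(local_shuffles) - self.migrating
            self.migrating |= new
        for item in new:
            self.to_migrate.put(item)
        return new

    def run(self, peer: str, stop: threading.Event, idle: float = 0.01) -> None:
        current = None  # one variable, so the except block sees the in-flight item
        try:
            while not stop.is_set():
                try:
                    current = self.to_migrate.get(timeout=idle)
                except queue.Empty:
                    current = None             # nothing to do yet; poll again
                    continue
                self.upload(peer, current)
                current = None
        except Exception:
            # Stop using this peer, but re-queue the item so another peer can take it
            if current is not None:
                self.to_migrate.put(current)
```

Keeping the failed item's identity in one place is what lets a healthy peer pick it up after another peer's thread has given up.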






[GitHub] [spark] holdenk commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



holdenk commented on a change in pull request #28708:
URL: https://github.com/apache/spark/pull/28708#discussion_r435575539



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##
@@ -1790,6 +1822,108 @@ private[spark] class BlockManager(
 }
   }
 
+  private class ShuffleMigrationRunnable(peer: BlockManagerId) extends Runnable {
+@volatile var running = true
+override def run(): Unit = {
+  var migrating: Option[(Int, Long)] = None
+  val storageLevel = StorageLevel(
+useDisk = true,
+useMemory = false,
+useOffHeap = false,
+deserialized = false,
+replication = 1)
+  logInfo(s"Starting migration thread for ${peer}")
+  // Once a block fails to transfer to an executor stop trying to transfer more blocks
+  try {
+while (running) {
+  val migrating = Option(shufflesToMigrate.poll())
+  migrating match {
+case None =>
+  logInfo("Nothing to migrate")
+  // Nothing to do right now, but maybe a transfer will fail or a new block
+  // will finish being committed.
+  val SLEEP_TIME_SECS = 1
+  Thread.sleep(SLEEP_TIME_SECS * 1000L)
+case Some((shuffleId, mapId)) =>
+  logInfo(s"Trying to migrate shuffle ${shuffleId},${mapId} to ${peer}")
+  val blocks =
+migratableResolver.getMigrationBlocks(shuffleId, mapId)
+  logInfo(s"Got migration sub-blocks ${blocks}")
+  blocks.foreach { case (blockId, buffer) =>
+logInfo(s"Migrating sub-block ${blockId}")
+blockTransferService.uploadBlockSync(
+  peer.host,
+  peer.port,
+  peer.executorId,
+  blockId,
+  buffer,
+  storageLevel,
+  null)// class tag, we don't need for shuffle
+logInfo(s"Migrated sub block ${blockId}")
+  }
+  logInfo(s"Migrated ${shuffleId},${mapId} to ${peer}")
+  }
+}
+// This catch is intentionally outside of the while running block.
+// if we encounter errors migrating to an executor we want to stop.
+  } catch {
+case e: Exception =>
+  migrating match {
+case Some(shuffleMap) =>
+  logError("Error ${e} during migration, adding ${shuffleMap} back to migration queue")
+  shufflesToMigrate.add(shuffleMap)
+case None =>
+  logError(s"Error ${e} while waiting for block to migrate")
+  }
+  }
+}
+  }
+
+  private val migrationPeers = mutable.HashMap[BlockManagerId, ShuffleMigrationRunnable]()
+
+  /**
+   * Tries to offload all shuffle blocks that are registered with the shuffle service locally.
+   * Note: this does not delete the shuffle files in-case there is an in-progress fetch
+   * but rather shadows them.
+   * Requires an Indexed based shuffle resolver.
+   */
+  def offloadShuffleBlocks(): Unit = {
+// Update the queue of shuffles to be migrated
+logInfo("Offloading shuffle blocks")
+val localShuffles = migratableResolver.getStoredShuffles()
+logInfo(s"My local shuffles are ${localShuffles.toList}")

Review comment:
   Looking at it now, I think I'll just take it out. It was useful while I was doing dev but shouldn't be needed for any operations stuff. Good call on it maybe being too long in production environments.






[GitHub] [spark] holdenk commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



holdenk commented on a change in pull request #28708:
URL: https://github.com/apache/spark/pull/28708#discussion_r435575094



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##
@@ -1790,6 +1822,108 @@ private[spark] class BlockManager(
 }
   }
 
+  private class ShuffleMigrationRunnable(peer: BlockManagerId) extends Runnable {
+@volatile var running = true
+override def run(): Unit = {
+  var migrating: Option[(Int, Long)] = None
+  val storageLevel = StorageLevel(
+useDisk = true,
+useMemory = false,
+useOffHeap = false,
+deserialized = false,
+replication = 1)
+  logInfo(s"Starting migration thread for ${peer}")
+  // Once a block fails to transfer to an executor stop trying to transfer more blocks
+  try {
+while (running) {
+  val migrating = Option(shufflesToMigrate.poll())
+  migrating match {
+case None =>
+  logInfo("Nothing to migrate")
+  // Nothing to do right now, but maybe a transfer will fail or a new block
+  // will finish being committed.
+  val SLEEP_TIME_SECS = 1
+  Thread.sleep(SLEEP_TIME_SECS * 1000L)
+case Some((shuffleId, mapId)) =>
+  logInfo(s"Trying to migrate shuffle ${shuffleId},${mapId} to ${peer}")
+  val blocks =
+migratableResolver.getMigrationBlocks(shuffleId, mapId)
+  logInfo(s"Got migration sub-blocks ${blocks}")
+  blocks.foreach { case (blockId, buffer) =>
+logInfo(s"Migrating sub-block ${blockId}")
+blockTransferService.uploadBlockSync(
+  peer.host,
+  peer.port,
+  peer.executorId,
+  blockId,
+  buffer,
+  storageLevel,
+  null)// class tag, we don't need for shuffle
+logInfo(s"Migrated sub block ${blockId}")
+  }
+  logInfo(s"Migrated ${shuffleId},${mapId} to ${peer}")
+  }
+}
+// This catch is intentionally outside of the while running block.
+// if we encounter errors migrating to an executor we want to stop.
+  } catch {
+case e: Exception =>
+  migrating match {
+case Some(shuffleMap) =>
+  logError("Error ${e} during migration, adding ${shuffleMap} back to migration queue")
+  shufflesToMigrate.add(shuffleMap)
+case None =>
+  logError(s"Error ${e} while waiting for block to migrate")
+  }
+  }
+}
+  }
+
+  private val migrationPeers = mutable.HashMap[BlockManagerId, ShuffleMigrationRunnable]()
+
+  /**
+   * Tries to offload all shuffle blocks that are registered with the shuffle service locally.
+   * Note: this does not delete the shuffle files in-case there is an in-progress fetch
+   * but rather shadows them.
+   * Requires an Indexed based shuffle resolver.
+   */
+  def offloadShuffleBlocks(): Unit = {
+// Update the queue of shuffles to be migrated
+logInfo("Offloading shuffle blocks")
+val localShuffles = migratableResolver.getStoredShuffles()

Review comment:
   No. If we get a ClassCastException, we want to bubble it up, because there isn't anything we can do in that situation besides report it.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28729: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28729:
URL: https://github.com/apache/spark/pull/28729#issuecomment-639137871


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123544/
   Test FAILed.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28729: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28729:
URL: https://github.com/apache/spark/pull/28729#issuecomment-639137856


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #28729: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

2020-06-04 Thread GitBox



SparkQA removed a comment on pull request #28729:
URL: https://github.com/apache/spark/pull/28729#issuecomment-639122853


   **[Test build #123544 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123544/testReport)**
 for PR 28729 at commit 
[`3c35cf5`](https://github.com/apache/spark/commit/3c35cf5920c6e4216adcefc866bd518dfe635def).




[GitHub] [spark] AmplabJenkins commented on pull request #28729: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28729:
URL: https://github.com/apache/spark/pull/28729#issuecomment-639137856








[GitHub] [spark] SparkQA commented on pull request #28729: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

2020-06-04 Thread GitBox



SparkQA commented on pull request #28729:
URL: https://github.com/apache/spark/pull/28729#issuecomment-639137804


   **[Test build #123544 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123544/testReport)**
 for PR 28729 at commit 
[`3c35cf5`](https://github.com/apache/spark/commit/3c35cf5920c6e4216adcefc866bd518dfe635def).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28728: [SPARK-31879][SQL][test-java11] Make week-based pattern invalid for formatting too

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28728:
URL: https://github.com/apache/spark/pull/28728#issuecomment-639137018








[GitHub] [spark] AmplabJenkins commented on pull request #28728: [SPARK-31879][SQL][test-java11] Make week-based pattern invalid for formatting too

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28728:
URL: https://github.com/apache/spark/pull/28728#issuecomment-639137018








[GitHub] [spark] SparkQA removed a comment on pull request #28728: [SPARK-31879][SQL][test-java11] Make week-based pattern invalid for formatting too

2020-06-04 Thread GitBox



SparkQA removed a comment on pull request #28728:
URL: https://github.com/apache/spark/pull/28728#issuecomment-638992217


   **[Test build #123538 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123538/testReport)**
 for PR 28728 at commit 
[`d7fc6d9`](https://github.com/apache/spark/commit/d7fc6d9db1244f681066415b14e798820fc6f61e).




[GitHub] [spark] SparkQA commented on pull request #28728: [SPARK-31879][SQL][test-java11] Make week-based pattern invalid for formatting too

2020-06-04 Thread GitBox



SparkQA commented on pull request #28728:
URL: https://github.com/apache/spark/pull/28728#issuecomment-639136125


   **[Test build #123538 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123538/testReport)**
 for PR 28728 at commit 
[`d7fc6d9`](https://github.com/apache/spark/commit/d7fc6d9db1244f681066415b14e798820fc6f61e).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-639134843








[GitHub] [spark] AmplabJenkins commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-639134843








[GitHub] [spark] SparkQA commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-04 Thread GitBox



SparkQA commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-639134384


   **[Test build #123545 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123545/testReport)**
 for PR 28710 at commit 
[`7b01b63`](https://github.com/apache/spark/commit/7b01b63f9ce6549eaf248296b0d48e98a2dd7a25).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28729: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28729:
URL: https://github.com/apache/spark/pull/28729#issuecomment-639123368








[GitHub] [spark] AmplabJenkins commented on pull request #28729: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28729:
URL: https://github.com/apache/spark/pull/28729#issuecomment-639123368








[GitHub] [spark] SparkQA commented on pull request #28729: [SPARK-30808][SQL] Enable Java 8 time API in Thrift server

2020-06-04 Thread GitBox



SparkQA commented on pull request #28729:
URL: https://github.com/apache/spark/pull/28729#issuecomment-639122853


   **[Test build #123544 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123544/testReport)**
 for PR 28729 at commit 
[`3c35cf5`](https://github.com/apache/spark/commit/3c35cf5920c6e4216adcefc866bd518dfe635def).




[GitHub] [spark] holdenk commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



holdenk commented on a change in pull request #28708:
URL: https://github.com/apache/spark/pull/28708#discussion_r43809



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##
@@ -1790,6 +1822,108 @@ private[spark] class BlockManager(
 }
   }
 
+  private class ShuffleMigrationRunnable(peer: BlockManagerId) extends Runnable {
+@volatile var running = true
+override def run(): Unit = {
+  var migrating: Option[(Int, Long)] = None
+  val storageLevel = StorageLevel(
+useDisk = true,
+useMemory = false,
+useOffHeap = false,
+deserialized = false,
+replication = 1)
+  logInfo(s"Starting migration thread for ${peer}")
+  // Once a block fails to transfer to an executor stop trying to transfer more blocks
+  try {
+while (running) {
+  val migrating = Option(shufflesToMigrate.poll())
+  migrating match {
+case None =>
+  logInfo("Nothing to migrate")
+  // Nothing to do right now, but maybe a transfer will fail or a new block
+  // will finish being committed.
+  val SLEEP_TIME_SECS = 1
+  Thread.sleep(SLEEP_TIME_SECS * 1000L)
+case Some((shuffleId, mapId)) =>
+  logInfo(s"Trying to migrate shuffle ${shuffleId},${mapId} to ${peer}")
+  val blocks =
+migratableResolver.getMigrationBlocks(shuffleId, mapId)
+  logInfo(s"Got migration sub-blocks ${blocks}")
+  blocks.foreach { case (blockId, buffer) =>
+logInfo(s"Migrating sub-block ${blockId}")
+blockTransferService.uploadBlockSync(
+  peer.host,
+  peer.port,
+  peer.executorId,
+  blockId,
+  buffer,
+  storageLevel,
+  null)// class tag, we don't need for shuffle
+logInfo(s"Migrated sub block ${blockId}")
+  }
+  logInfo(s"Migrated ${shuffleId},${mapId} to ${peer}")

Review comment:
   We don't delete the file from the current host right away. Once the BlockUpdate message is processed on the master, readers of the block will be directed to the peer it has been migrated to.






[GitHub] [spark] AmplabJenkins removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-639118857








[GitHub] [spark] AmplabJenkins commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-639118857








[GitHub] [spark] SparkQA removed a comment on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



SparkQA removed a comment on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-639026461


   **[Test build #123542 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123542/testReport)**
 for PR 28708 at commit 
[`a904030`](https://github.com/apache/spark/commit/a904030d78ca9ad1e6da8de0359758cce8d58abb).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-639117314


   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/123543/
   Test FAILed.




[GitHub] [spark] SparkQA commented on pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



SparkQA commented on pull request #28708:
URL: https://github.com/apache/spark/pull/28708#issuecomment-639117770


   **[Test build #123542 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123542/testReport)**
 for PR 28708 at commit 
[`a904030`](https://github.com/apache/spark/commit/a904030d78ca9ad1e6da8de0359758cce8d58abb).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-639117310


   Merged build finished. Test FAILed.




[GitHub] [spark] SparkQA removed a comment on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-04 Thread GitBox



SparkQA removed a comment on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-639116348


   **[Test build #123543 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123543/testReport)**
 for PR 28710 at commit 
[`f42fa8e`](https://github.com/apache/spark/commit/f42fa8e58d88dcee0107af4459103b4b5e9d9d18).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-639116837








[GitHub] [spark] SparkQA commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-04 Thread GitBox



SparkQA commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-639117298


   **[Test build #123543 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123543/testReport)**
 for PR 28710 at commit 
[`f42fa8e`](https://github.com/apache/spark/commit/f42fa8e58d88dcee0107af4459103b4b5e9d9d18).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `class _ClassificationSummary(JavaWrapper):`
 * `class _TrainingSummary(JavaWrapper):`
 * `class _BinaryClassificationSummary(_ClassificationSummary):`
 * `class LogisticRegressionSummary(_ClassificationSummary):`
 * `class LogisticRegressionTrainingSummary(LogisticRegressionSummary, _TrainingSummary):`
 * `class BinaryLogisticRegressionSummary(_BinaryClassificationSummary,`




[GitHub] [spark] AmplabJenkins commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-639117310








[GitHub] [spark] AmplabJenkins commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-639116837








[GitHub] [spark] SparkQA commented on pull request #28710: [SPARK-31893][ML] Add a generic ClassificationSummary trait

2020-06-04 Thread GitBox



SparkQA commented on pull request #28710:
URL: https://github.com/apache/spark/pull/28710#issuecomment-639116348


   **[Test build #123543 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123543/testReport)**
 for PR 28710 at commit 
[`f42fa8e`](https://github.com/apache/spark/commit/f42fa8e58d88dcee0107af4459103b4b5e9d9d18).




[GitHub] [spark] AmplabJenkins removed a comment on pull request #28728: [SPARK-31879][SQL][test-java11] Make week-based pattern invalid for formatting too

2020-06-04 Thread GitBox



AmplabJenkins removed a comment on pull request #28728:
URL: https://github.com/apache/spark/pull/28728#issuecomment-639112128








[GitHub] [spark] AmplabJenkins commented on pull request #28728: [SPARK-31879][SQL][test-java11] Make week-based pattern invalid for formatting too

2020-06-04 Thread GitBox



AmplabJenkins commented on pull request #28728:
URL: https://github.com/apache/spark/pull/28728#issuecomment-639112128








[GitHub] [spark] SparkQA removed a comment on pull request #28728: [SPARK-31879][SQL][test-java11] Make week-based pattern invalid for formatting too

2020-06-04 Thread GitBox



SparkQA removed a comment on pull request #28728:
URL: https://github.com/apache/spark/pull/28728#issuecomment-638970542


   **[Test build #123537 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123537/testReport)** for PR 28728 at commit [`7671d96`](https://github.com/apache/spark/commit/7671d963215465dcc27dd69df966eba3bab2acea).




[GitHub] [spark] SparkQA commented on pull request #28728: [SPARK-31879][SQL][test-java11] Make week-based pattern invalid for formatting too

2020-06-04 Thread GitBox



SparkQA commented on pull request #28728:
URL: https://github.com/apache/spark/pull/28728#issuecomment-639111050


   **[Test build #123537 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/123537/testReport)** for PR 28728 at commit [`7671d96`](https://github.com/apache/spark/commit/7671d963215465dcc27dd69df966eba3bab2acea).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.




[GitHub] [spark] viirya commented on a change in pull request #28708: [SPARK-20629][CORE][K8S] Copy shuffle data when nodes are being shutdown

2020-06-04 Thread GitBox



viirya commented on a change in pull request #28708:
URL: https://github.com/apache/spark/pull/28708#discussion_r435522183



##
File path: core/src/main/scala/org/apache/spark/storage/BlockManager.scala
##
@@ -1790,6 +1822,108 @@ private[spark] class BlockManager(
 }
   }
 
+  private class ShuffleMigrationRunnable(peer: BlockManagerId) extends Runnable {
+    @volatile var running = true
+    override def run(): Unit = {
+      var migrating: Option[(Int, Long)] = None
+      val storageLevel = StorageLevel(
+        useDisk = true,
+        useMemory = false,
+        useOffHeap = false,
+        deserialized = false,
+        replication = 1)
+      logInfo(s"Starting migration thread for ${peer}")
+      // Once a block fails to transfer to an executor stop trying to transfer more blocks
+      try {
+        while (running) {
+          val migrating = Option(shufflesToMigrate.poll())
+          migrating match {
+            case None =>
+              logInfo("Nothing to migrate")
+              // Nothing to do right now, but maybe a transfer will fail or a new block
+              // will finish being committed.
+              val SLEEP_TIME_SECS = 1
+              Thread.sleep(SLEEP_TIME_SECS * 1000L)
+            case Some((shuffleId, mapId)) =>
+              logInfo(s"Trying to migrate shuffle ${shuffleId},${mapId} to ${peer}")
+              val blocks =
+                migratableResolver.getMigrationBlocks(shuffleId, mapId)
+              logInfo(s"Got migration sub-blocks ${blocks}")
+              blocks.foreach { case (blockId, buffer) =>
+                logInfo(s"Migrating sub-block ${blockId}")
+                blockTransferService.uploadBlockSync(
+                  peer.host,
+                  peer.port,
+                  peer.executorId,
+                  blockId,
+                  buffer,
+                  storageLevel,
+                  null)// class tag, we don't need for shuffle
+                logInfo(s"Migrated sub block ${blockId}")
+              }
+              logInfo(s"Migrated ${shuffleId},${mapId} to ${peer}")

Review comment:
   Once the block has been migrated to the peer here, does it exist on both the peer and the current block manager at the same time? If so, will a request for the shuffle block go to the peer or to the current block manager before the current one is decommissioned?
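The poll-or-sleep loop in the `ShuffleMigrationRunnable` hunk can be reduced to a standalone sketch. This is a simplified illustration, not Spark's implementation: `MigrationSketch`, `MigrationWorker`, `migrateAll`, and the `upload` callback are hypothetical names, and `upload` merely stands in for `blockTransferService.uploadBlockSync`. It shows the pieces the hunk relies on: a `@volatile` stop flag, wrapping `queue.poll()` in `Option` because `poll()` returns `null` on an empty queue, and a short sleep when nothing is queued.

```scala
import java.util.concurrent.ConcurrentLinkedQueue
import scala.collection.mutable.ArrayBuffer

// Hypothetical, simplified illustration of the poll-or-sleep migration loop;
// not Spark's BlockManager code.
object MigrationSketch {

  // Drains (shuffleId, mapId) pairs from `queue` and passes each to `upload`.
  // `upload` is a hypothetical stand-in for the real block transfer call.
  final class MigrationWorker(
      queue: ConcurrentLinkedQueue[(Int, Long)],
      upload: ((Int, Long)) => Unit,
      sleepMillis: Long = 10L) extends Runnable {

    // Set from another thread to stop the loop; @volatile makes the write visible.
    @volatile var running = true

    override def run(): Unit = {
      while (running) {
        // poll() returns null when the queue is empty, so wrap it in Option.
        Option(queue.poll()) match {
          case None =>
            // Nothing queued right now; back off briefly before polling again.
            Thread.sleep(sleepMillis)
          case Some(item) =>
            upload(item)
        }
      }
    }
  }

  // Runs one worker on its own thread until every item has been uploaded,
  // then stops it and returns the items in the order they were uploaded.
  def migrateAll(items: Seq[(Int, Long)]): List[(Int, Long)] = {
    val queue = new ConcurrentLinkedQueue[(Int, Long)]()
    items.foreach(item => queue.add(item))

    val uploaded = ArrayBuffer.empty[(Int, Long)]
    val worker = new MigrationWorker(
      queue,
      item => uploaded.synchronized { uploaded += item; () })

    val thread = new Thread(worker)
    thread.start()
    while (uploaded.synchronized(uploaded.size) < items.size) Thread.sleep(1L)
    worker.running = false
    thread.join()
    uploaded.synchronized(uploaded.toList)
  }
}
```

Sleeping on an empty queue keeps the loop simple; a `java.util.concurrent.LinkedBlockingQueue` with a timed `poll(timeout, unit)` would be one alternative that wakes as soon as new work arrives instead of waiting out the full sleep.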





