date:20200106

[GitHub] [spark] SparkQA removed a comment on issue #25024: [SPARK-27296][SQL] Allows Aggregator to be registered as a UDF

2020-01-06 Thread GitBox

SparkQA removed a comment on issue #25024: [SPARK-27296][SQL] Allows Aggregator 
to be registered as a UDF
URL: https://github.com/apache/spark/pull/25024#issuecomment-571249550
 
 
   **[Test build #116183 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116183/testReport)**
 for PR 25024 at commit 
[`986a3b4`](https://github.com/apache/spark/commit/986a3b45a1b90b6311c1047abdbcadc1c4d1f7d8).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] [spark] SparkQA commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference

2020-01-06 Thread GitBox

SparkQA commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage 
Level Scheduling code for reference
URL: https://github.com/apache/spark/pull/27053#issuecomment-571332914
 
 
   Kubernetes integration test starting
   URL: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder-K8s/20982/
   


[GitHub] [spark] AmplabJenkins removed a comment on issue #26231: [SPARK-29572][SQL] add v1 read fallback API in DS v2

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #26231: [SPARK-29572][SQL] add v1 read 
fallback API in DS v2
URL: https://github.com/apache/spark/pull/26231#issuecomment-571330348
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116182/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26231: [SPARK-29572][SQL] add v1 read fallback API in DS v2

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #26231: [SPARK-29572][SQL] add v1 read 
fallback API in DS v2
URL: https://github.com/apache/spark/pull/26231#issuecomment-571330331
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26231: [SPARK-29572][SQL] add v1 read fallback API in DS v2

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #26231: [SPARK-29572][SQL] add v1 read 
fallback API in DS v2
URL: https://github.com/apache/spark/pull/26231#issuecomment-571330348
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116182/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #27053: 
[WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference
URL: https://github.com/apache/spark/pull/27053#issuecomment-571329537
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116190/
   Test FAILed.


[GitHub] [spark] AmplabJenkins commented on issue #26231: [SPARK-29572][SQL] add v1 read fallback API in DS v2

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #26231: [SPARK-29572][SQL] add v1 read 
fallback API in DS v2
URL: https://github.com/apache/spark/pull/26231#issuecomment-571330331
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] SparkQA removed a comment on issue #26231: [SPARK-29572][SQL] add v1 read fallback API in DS v2

2020-01-06 Thread GitBox

SparkQA removed a comment on issue #26231: [SPARK-29572][SQL] add v1 read 
fallback API in DS v2
URL: https://github.com/apache/spark/pull/26231#issuecomment-571234526
 
 
   **[Test build #116182 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116182/testReport)**
 for PR 26231 at commit 
[`eccadc7`](https://github.com/apache/spark/commit/eccadc71d0d60a4c6acb4a6fe242f0913e0856c3).


[GitHub] [spark] SparkQA removed a comment on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference

2020-01-06 Thread GitBox

SparkQA removed a comment on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] 
Stage Level Scheduling code for reference
URL: https://github.com/apache/spark/pull/27053#issuecomment-571327718
 
 
   **[Test build #116190 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116190/testReport)**
 for PR 27053 at commit 
[`8edba2d`](https://github.com/apache/spark/commit/8edba2d7ecdf5b1cdc51f27b576f2a07b05cf69c).


[GitHub] [spark] AmplabJenkins removed a comment on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #27053: 
[WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference
URL: https://github.com/apache/spark/pull/27053#issuecomment-571329524
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] AmplabJenkins commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] 
Stage Level Scheduling code for reference
URL: https://github.com/apache/spark/pull/27053#issuecomment-571329524
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] SparkQA commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference

2020-01-06 Thread GitBox

SparkQA commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage 
Level Scheduling code for reference
URL: https://github.com/apache/spark/pull/27053#issuecomment-571329510
 
 
   **[Test build #116190 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116190/testReport)**
 for PR 27053 at commit 
[`8edba2d`](https://github.com/apache/spark/commit/8edba2d7ecdf5b1cdc51f27b576f2a07b05cf69c).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.


[GitHub] [spark] SparkQA commented on issue #26231: [SPARK-29572][SQL] add v1 read fallback API in DS v2

2020-01-06 Thread GitBox

SparkQA commented on issue #26231: [SPARK-29572][SQL] add v1 read fallback API 
in DS v2
URL: https://github.com/apache/spark/pull/26231#issuecomment-571329654
 
 
   **[Test build #116182 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116182/testReport)**
 for PR 26231 at commit 
[`eccadc7`](https://github.com/apache/spark/commit/eccadc71d0d60a4c6acb4a6fe242f0913e0856c3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.


[GitHub] [spark] AmplabJenkins commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] 
Stage Level Scheduling code for reference
URL: https://github.com/apache/spark/pull/27053#issuecomment-571329537
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116190/
   Test FAILed.


[GitHub] [spark] SparkQA commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference

2020-01-06 Thread GitBox

SparkQA commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage 
Level Scheduling code for reference
URL: https://github.com/apache/spark/pull/27053#issuecomment-571327718
 
 
   **[Test build #116190 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116190/testReport)**
 for PR 27053 at commit 
[`8edba2d`](https://github.com/apache/spark/commit/8edba2d7ecdf5b1cdc51f27b576f2a07b05cf69c).


[GitHub] [spark] viirya commented on a change in pull request #26993: [SPARK-30338][SQL] Avoid unnecessary InternalRow copies in ParquetRowConverter

2020-01-06 Thread GitBox

viirya commented on a change in pull request #26993: [SPARK-30338][SQL] Avoid 
unnecessary InternalRow copies in ParquetRowConverter
URL: https://github.com/apache/spark/pull/26993#discussion_r363491931
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetIOSuite.scala
 ##
 @@ -204,6 +204,23 @@ class ParquetIOSuite extends QueryTest with ParquetTest with SharedSparkSession
 }
   }
 
+  testStandardAndLegacyModes("array of struct") {
 
 Review comment:
   Do we have a test for array of struct of struct?


[GitHub] [spark] AmplabJenkins removed a comment on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #27053: 
[WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference
URL: https://github.com/apache/spark/pull/27053#issuecomment-571316985
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116189/
   Test FAILed.


[GitHub] [spark] SparkQA removed a comment on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference

2020-01-06 Thread GitBox

SparkQA removed a comment on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] 
Stage Level Scheduling code for reference
URL: https://github.com/apache/spark/pull/27053#issuecomment-571316219
 
 
   **[Test build #116189 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116189/testReport)**
 for PR 27053 at commit 
[`eece6ae`](https://github.com/apache/spark/commit/eece6ae77df9dfbf8b6c62d7f54c796bd2180175).


[GitHub] [spark] AmplabJenkins commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] 
Stage Level Scheduling code for reference
URL: https://github.com/apache/spark/pull/27053#issuecomment-571316985
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116189/
   Test FAILed.


[GitHub] [spark] SparkQA commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference

2020-01-06 Thread GitBox

SparkQA commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage 
Level Scheduling code for reference
URL: https://github.com/apache/spark/pull/27053#issuecomment-571316959
 
 
   **[Test build #116189 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116189/testReport)**
 for PR 27053 at commit 
[`eece6ae`](https://github.com/apache/spark/commit/eece6ae77df9dfbf8b6c62d7f54c796bd2180175).
* This patch **fails Python style tests**.
* This patch merges cleanly.
* This patch adds no public classes.


[GitHub] [spark] AmplabJenkins commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] 
Stage Level Scheduling code for reference
URL: https://github.com/apache/spark/pull/27053#issuecomment-571316975
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #27053: 
[WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference
URL: https://github.com/apache/spark/pull/27053#issuecomment-571316975
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] SparkQA commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference

2020-01-06 Thread GitBox

SparkQA commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage 
Level Scheduling code for reference
URL: https://github.com/apache/spark/pull/27053#issuecomment-571316219
 
 
   **[Test build #116189 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116189/testReport)**
 for PR 27053 at commit 
[`eece6ae`](https://github.com/apache/spark/commit/eece6ae77df9dfbf8b6c62d7f54c796bd2180175).


[GitHub] [spark] AmplabJenkins removed a comment on issue #26993: [SPARK-30338][SQL] Avoid unnecessary InternalRow copies in ParquetRowConverter

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #26993: [SPARK-30338][SQL] Avoid 
unnecessary InternalRow copies in ParquetRowConverter
URL: https://github.com/apache/spark/pull/26993#issuecomment-571313973
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20981/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26993: [SPARK-30338][SQL] Avoid unnecessary InternalRow copies in ParquetRowConverter

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #26993: [SPARK-30338][SQL] Avoid 
unnecessary InternalRow copies in ParquetRowConverter
URL: https://github.com/apache/spark/pull/26993#issuecomment-571313967
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] tgravescs commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage Level Scheduling code for reference

2020-01-06 Thread GitBox

tgravescs commented on issue #27053: [WIP][SPARK-27495][Core][YARN][k8s] Stage 
Level Scheduling code for reference
URL: https://github.com/apache/spark/pull/27053#issuecomment-571314271
 
 
   I added in the pyspark support


[GitHub] [spark] AmplabJenkins commented on issue #26993: [SPARK-30338][SQL] Avoid unnecessary InternalRow copies in ParquetRowConverter

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #26993: [SPARK-30338][SQL] Avoid unnecessary 
InternalRow copies in ParquetRowConverter
URL: https://github.com/apache/spark/pull/26993#issuecomment-571313967
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26993: [SPARK-30338][SQL] Avoid unnecessary InternalRow copies in ParquetRowConverter

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #26993: [SPARK-30338][SQL] Avoid unnecessary 
InternalRow copies in ParquetRowConverter
URL: https://github.com/apache/spark/pull/26993#issuecomment-571313973
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20981/
   Test PASSed.


[GitHub] [spark] SparkQA commented on issue #26993: [SPARK-30338][SQL] Avoid unnecessary InternalRow copies in ParquetRowConverter

2020-01-06 Thread GitBox

SparkQA commented on issue #26993: [SPARK-30338][SQL] Avoid unnecessary 
InternalRow copies in ParquetRowConverter
URL: https://github.com/apache/spark/pull/26993#issuecomment-571313406
 
 
   **[Test build #116188 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116188/testReport)**
 for PR 26993 at commit 
[`4651b2f`](https://github.com/apache/spark/commit/4651b2fd724a56515c087903284682c9ba947c31).


[GitHub] [spark] joshrosen-stripe commented on a change in pull request #26993: [SPARK-30338][SQL] Avoid unnecessary InternalRow copies in ParquetRowConverter

2020-01-06 Thread GitBox

joshrosen-stripe commented on a change in pull request #26993: 
[SPARK-30338][SQL] Avoid unnecessary InternalRow copies in ParquetRowConverter
URL: https://github.com/apache/spark/pull/26993#discussion_r363479947
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
 ##
 @@ -318,10 +318,33 @@ private[parquet] class ParquetRowConverter(
 new ParquetMapConverter(parquetType.asGroupType(), t, updater)
 
   case t: StructType =>
+val wrappedUpdater = {
+  // SPARK-30338: avoid unnecessary InternalRow copying for nested structs:
+  if (updater.isInstanceOf[RowUpdater]) {
+// `updater` is a RowUpdater, implying that the parent container is a struct.
+// We do NOT need to perform defensive copying here because either:
+//
+//   1. The path from the schema root to this field consists only of nested
 
 Review comment:
   Yes, that's right. After thinking about this some more, I think I've come up 
with a clearer explanation and have updated the code comment: 
https://github.com/apache/spark/pull/26993/commits/4651b2fd724a56515c087903284682c9ba947c31


[GitHub] [spark] tgravescs commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality wait time be the time since a TSM's available slots were fully utilized

2020-01-06 Thread GitBox

tgravescs commented on issue #26696: [WIP][SPARK-18886][CORE] Make locality 
wait time be the time since a TSM's available slots were fully utilized
URL: https://github.com/apache/spark/pull/26696#issuecomment-571311572
 
 
   > One remaining case that isn't handled:
   > Before any "all free resource" offer, all free resources are offered one 
by one and all not rejected.
   > This case should reset the timer, but won't with current impl.
   
   So I assume by this you mean the startup case, but I'm not sure that is true. You get an "all free resource" offer when you first call submitTasks.
   I think there are 2 cases: static allocation and dynamic allocation. With static allocation you generally get your executors before any application code starts, so it won't matter whether offers are made before that. With dynamic allocation you generally won't have any executors yet, so perhaps this is the case where submitTasks offers all free resources but no offers are made because there are no executors yet. Which case are you referring to?
   
   


[GitHub] [spark] tinhto-000 commented on a change in pull request #26955: [SPARK-30310] [Core] Resolve missing match case in SparkUncaughtExceptionHandler and added tests

2020-01-06 Thread GitBox

tinhto-000 commented on a change in pull request #26955: [SPARK-30310] [Core] 
Resolve missing match case in SparkUncaughtExceptionHandler and added tests
URL: https://github.com/apache/spark/pull/26955#discussion_r363475923
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/util/SparkUncaughtExceptionHandler.scala
 ##
 @@ -48,11 +48,17 @@ private[spark] class SparkUncaughtExceptionHandler(val exitOnUncaughtException:
 System.exit(SparkExitCode.OOM)
   case _ if exitOnUncaughtException =>
 System.exit(SparkExitCode.UNCAUGHT_EXCEPTION)
+  case _ =>
+// SPARK-30310: Don't System.exit() when exitOnUncaughtException is false
 }
   }
 } catch {
-  case oom: OutOfMemoryError => Runtime.getRuntime.halt(SparkExitCode.OOM)
-  case t: Throwable => Runtime.getRuntime.halt(SparkExitCode.UNCAUGHT_EXCEPTION_TWICE)
+  case oom: OutOfMemoryError =>
+logError(s"Uncaught OutOfMemoryError in thread $thread, process halted.", oom)
 
 Review comment:
   Thanks for the comment.  
   
   Well, the reason for the logError is that it wasn't obvious to users or devs why the worker would just disappear and show up as DEAD on the UI, and there was nothing in the worker log file to tell what happened. We couldn't find out why until we set SPARK_NO_DAEMONIZE=1 and examined the exit code.
   
   Is there any alternative to indicate the process halted unexpectedly?
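   A minimal sketch of the logging-before-halt idea, assuming a hand-written handler rather than Spark's actual `SparkUncaughtExceptionHandler`: `LoggingHaltHandler` is hypothetical, its `log` and `halt` functions are injected so the example can run without killing the JVM, and the exit codes are illustrative values. In real code `halt` would be `Runtime.getRuntime.halt(code)`.

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical handler: logs why the process is going down, then halts.
// `log` and `halt` are injected so the sketch can run without killing the JVM;
// production code would call Runtime.getRuntime.halt(code) instead.
class LoggingHaltHandler(log: String => Unit, halt: Int => Unit)
    extends Thread.UncaughtExceptionHandler {

  // Illustrative exit codes (assumed values for this sketch).
  private val OomExitCode = 52
  private val UncaughtExitCode = 50

  override def uncaughtException(thread: Thread, t: Throwable): Unit = t match {
    case oom: OutOfMemoryError =>
      log(s"Uncaught OutOfMemoryError in thread $thread, process halted: ${oom.getMessage}")
      halt(OomExitCode)
    case other =>
      log(s"Uncaught exception in thread $thread, process halted: $other")
      halt(UncaughtExitCode)
  }
}

// Record log lines and exit codes instead of really halting.
val logged = ArrayBuffer.empty[String]
val exitCodes = ArrayBuffer.empty[Int]
val handler = new LoggingHaltHandler(msg => logged += msg, code => exitCodes += code)

val worker = new Thread(() => throw new OutOfMemoryError("simulated"))
worker.setUncaughtExceptionHandler(handler)
worker.start()
worker.join()
```

   Because the log call happens before `halt`, a worker that disappears as DEAD leaves at least one line explaining why. Logging can itself fail under real memory pressure, so this is best effort rather than a guarantee.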



[GitHub] [spark] vanzin commented on a change in pull request #26440: [WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support

2020-01-06 Thread GitBox

vanzin commented on a change in pull request #26440: 
[WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & 
preemption support
URL: https://github.com/apache/spark/pull/26440#discussion_r363471020
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/scheduler/WorkerDecommissionSuite.scala
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import scala.concurrent.TimeoutException
+import scala.concurrent.duration._
+
+import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkException, SparkFunSuite}
+import org.apache.spark.internal.config
+import org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend
+import org.apache.spark.util.{RpcUtils, SerializableBuffer, ThreadUtils}
+
+class WorkerDecommissionSuite extends SparkFunSuite with LocalSparkContext {
+
+
+  override def beforeEach(): Unit = {
+    val conf = new SparkConf().setAppName("test").setMaster("local")
+      .set(config.Worker.WORKER_DECOMMISSION_ENABLED.key, "true")
+
+    sc = new SparkContext("local-cluster[2, 1, 1024]", "test", conf)
+  }
+
+  test("verify task with no decommissioning works as expected") {
+    val input = sc.parallelize(1 to 10)
+    input.count()
+    val sleepyRdd = input.mapPartitions{ x =>
+      Thread.sleep(100)
+      x
+    }
+    assert(sleepyRdd.count() === 10)
+  }
+
+  test("verify a task with all workers decommissioned succeeds") {
+    val input = sc.parallelize(1 to 10)
+    // Do a count to wait for the executors to be registered.
+    input.count()
+    val sleepyRdd = input.mapPartitions{ x =>
+      Thread.sleep(100)
+      x
+    }
+    // Start the task.
+    val asyncCount = sleepyRdd.countAsync()
+    // Give the job long enough to start.
+    Thread.sleep(20)
+    // Decommission all the executors, this should not halt the current task.
+    // The master passing message is tested with
 
 Review comment:
   tested with?



[GitHub] [spark] vanzin commented on a change in pull request #26440: [WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support

2020-01-06 Thread GitBox

vanzin commented on a change in pull request #26440: 
[WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & 
preemption support
URL: https://github.com/apache/spark/pull/26440#discussion_r363470570
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/scheduler/WorkerDecommissionSuite.scala
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import scala.concurrent.TimeoutException
+import scala.concurrent.duration._
+
+import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkException, SparkFunSuite}
+import org.apache.spark.internal.config
+import org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend
+import org.apache.spark.util.{RpcUtils, SerializableBuffer, ThreadUtils}
+
+class WorkerDecommissionSuite extends SparkFunSuite with LocalSparkContext {
+
 
 Review comment:
   too many empty lines



[GitHub] [spark] vanzin commented on a change in pull request #26440: [WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support

2020-01-06 Thread GitBox

vanzin commented on a change in pull request #26440: 
[WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & 
preemption support
URL: https://github.com/apache/spark/pull/26440#discussion_r363473266
 
 

 ##
 File path: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/decom.sh
 ##
 @@ -0,0 +1,38 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+
+set -ex
+export LOG=/dev/termination-log
 
 Review comment:
   `/dev/` looks like a very weird place for a log file.
   
   In fact, is this log file useful at all? Won't it go away as soon as the 
container stops? (I looked at the k8s page around these event handlers but it 
doesn't seem to explain where the output of these commands ends up.)



[GitHub] [spark] vanzin commented on a change in pull request #26440: [WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support

2020-01-06 Thread GitBox

vanzin commented on a change in pull request #26440: 
[WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & 
preemption support
URL: https://github.com/apache/spark/pull/26440#discussion_r363470002
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/util/SignalUtils.scala
 ##
 @@ -60,10 +60,11 @@ private[spark] object SignalUtils extends Logging {
     if (SystemUtils.IS_OS_UNIX) {
       try {
         val handler = handlers.getOrElseUpdate(signal, {
-          logInfo("Registered signal handler for " + signal)
+          logInfo("Registering signal handler for " + signal)
           new ActionHandler(new Signal(signal))
         })
         handler.register(action)
+        logInfo("Registered signal handler for " + signal)
 
 Review comment:
   This seems unnecessary. If registration fails you'll get an error message 
from the exception handler below, no? Then the previous message is enough for 
the success case.
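   To make that reasoning concrete, a hedged sketch of the `getOrElseUpdate` pattern in this hunk: `SignalRegistrySketch` is a hypothetical stand-in for `SignalUtils` that uses plain strings instead of `sun.misc.Signal`, and it rejects unknown names to mimic a failing `new Signal(...)`. The creation block logs once per signal, and a failure lands in the `catch`, so a second "Registered" line adds nothing.

```scala
import scala.collection.mutable

// Hypothetical stand-in for SignalUtils: one handler per signal name, each holding
// a list of actions. Plain strings replace sun.misc.Signal so the sketch runs anywhere.
object SignalRegistrySketch {
  val logs = mutable.ArrayBuffer.empty[String]
  private val knownSignals = Set("TERM", "INT", "HUP")
  private val handlers = mutable.HashMap.empty[String, mutable.ArrayBuffer[() => Boolean]]

  def register(signal: String)(action: => Boolean): Unit = synchronized {
    try {
      val handler = handlers.getOrElseUpdate(signal, {
        // Runs only when no handler exists yet for this signal.
        logs += s"Registering signal handler for $signal"
        // Mimics `new Signal(name)` rejecting an unknown signal name.
        if (!knownSignals.contains(signal)) {
          throw new IllegalArgumentException(s"Unknown signal: $signal")
        }
        mutable.ArrayBuffer.empty[() => Boolean]
      })
      handler += (() => action)
    } catch {
      // A failed registration is reported here, so success needs no second log line.
      case e: Exception =>
        logs += s"Failed to register signal handler for $signal: ${e.getMessage}"
    }
  }
}

SignalRegistrySketch.register("TERM")(true)
SignalRegistrySketch.register("TERM")(true)  // handler reused: nothing new logged
SignalRegistrySketch.register("BOGUS")(true) // creation fails: caught and logged
```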



[GitHub] [spark] vanzin commented on a change in pull request #26440: [WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support

2020-01-06 Thread GitBox

vanzin commented on a change in pull request #26440: 
[WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & 
preemption support
URL: https://github.com/apache/spark/pull/26440#discussion_r363471925
 
 

 ##
 File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/KubernetesConf.scala
 ##
 @@ -55,6 +55,9 @@ private[spark] abstract class KubernetesConf(val sparkConf: SparkConf) {
   }
   }
 
+  def workerDecommissioning: Boolean =
 
 Review comment:
   I'd avoid adding getters for simple configs.



[GitHub] [spark] vanzin commented on a change in pull request #26440: [WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support

2020-01-06 Thread GitBox

vanzin commented on a change in pull request #26440: 
[WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & 
preemption support
URL: https://github.com/apache/spark/pull/26440#discussion_r363470718
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/scheduler/WorkerDecommissionSuite.scala
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import scala.concurrent.TimeoutException
+import scala.concurrent.duration._
+
+import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkException, SparkFunSuite}
+import org.apache.spark.internal.config
+import org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend
+import org.apache.spark.util.{RpcUtils, SerializableBuffer, ThreadUtils}
+
+class WorkerDecommissionSuite extends SparkFunSuite with LocalSparkContext {
+
+
+  override def beforeEach(): Unit = {
+    val conf = new SparkConf().setAppName("test").setMaster("local")
+      .set(config.Worker.WORKER_DECOMMISSION_ENABLED.key, "true")
 
 Review comment:
   ` .set(config.Worker.WORKER_DECOMMISSION_ENABLED, true)`
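   For context on the suggestion: passing the typed entry lets the compiler check the value, while `.key` plus a string bypasses that check. A stripped-down model follows, with `ConfigEntrySketch` and `ConfSketch` as hypothetical stand-ins (they are not Spark's `ConfigEntry` or `SparkConf`) and an illustrative key name.

```scala
import scala.collection.mutable

// Hypothetical typed config entry: a key, a default, and converters for the value type.
final case class ConfigEntrySketch[T](
    key: String,
    default: T,
    parse: String => T,
    render: T => String)

// Hypothetical conf that stores everything as strings internally.
class ConfSketch {
  private val settings = mutable.HashMap.empty[String, String]

  // Untyped setter: any string is accepted, typos included.
  def set(key: String, value: String): ConfSketch = {
    settings(key) = value
    this
  }

  // Typed setter: the compiler checks the value against the entry's type.
  def set[T](entry: ConfigEntrySketch[T], value: T): ConfSketch = {
    settings(entry.key) = entry.render(value)
    this
  }

  def get[T](entry: ConfigEntrySketch[T]): T =
    settings.get(entry.key).map(entry.parse).getOrElse(entry.default)
}

// Illustrative key; not taken from Spark's config definitions.
val DecommissionEnabled = ConfigEntrySketch[Boolean](
  "example.worker.decommission.enabled", false, _.toBoolean, _.toString)

val conf = new ConfSketch().set(DecommissionEnabled, true)
// new ConfSketch().set(DecommissionEnabled, "ture")  // does not compile
```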



[GitHub] [spark] vanzin commented on a change in pull request #26440: [WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support

2020-01-06 Thread GitBox

vanzin commented on a change in pull request #26440: 
[WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & 
preemption support
URL: https://github.com/apache/spark/pull/26440#discussion_r363471618
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/scheduler/WorkerDecommissionSuite.scala
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import scala.concurrent.TimeoutException
+import scala.concurrent.duration._
+
+import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkException, SparkFunSuite}
+import org.apache.spark.internal.config
+import org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend
+import org.apache.spark.util.{RpcUtils, SerializableBuffer, ThreadUtils}
+
+class WorkerDecommissionSuite extends SparkFunSuite with LocalSparkContext {
+
+
+  override def beforeEach(): Unit = {
+    val conf = new SparkConf().setAppName("test").setMaster("local")
+      .set(config.Worker.WORKER_DECOMMISSION_ENABLED.key, "true")
+
+    sc = new SparkContext("local-cluster[2, 1, 1024]", "test", conf)
+  }
+
+  test("verify task with no decommissioning works as expected") {
+    val input = sc.parallelize(1 to 10)
+    input.count()
+    val sleepyRdd = input.mapPartitions{ x =>
+      Thread.sleep(100)
+      x
+    }
+    assert(sleepyRdd.count() === 10)
+  }
+
+  test("verify a task with all workers decommissioned succeeds") {
+    val input = sc.parallelize(1 to 10)
+    // Do a count to wait for the executors to be registered.
+    input.count()
+    val sleepyRdd = input.mapPartitions{ x =>
+      Thread.sleep(100)
+      x
+    }
+    // Start the task.
+    val asyncCount = sleepyRdd.countAsync()
+    // Give the job long enough to start.
+    Thread.sleep(20)
 
 Review comment:
   Is 20ms enough? I recommend installing a listener and waiting on the job 
start (or the first task start) event.
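   A sketch of the listener-plus-latch approach, under simplifying assumptions: in Spark the callback would be an `onJobStart` or `onTaskStart` override on a `SparkListener` registered with `sc.addSparkListener`, but here a hypothetical `FakeJob` fires the callback so the example runs without a cluster. The test thread blocks on a `CountDownLatch` instead of sleeping a fixed 20 ms, and the timeout only bounds a failing run.

```scala
import java.util.concurrent.{CountDownLatch, TimeUnit}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

// Hypothetical stand-in for a submitted job: it notifies listeners when it
// starts, then runs a slow task body and returns a count.
class FakeJob(onStart: Seq[() => Unit]) {
  def runAsync(): Future[Int] = Future {
    Thread.sleep(50)                        // scheduling delay of unknown length
    onStart.foreach(callback => callback()) // plays the role of onJobStart / onTaskStart
    Thread.sleep(100)                       // the "sleepy" task body
    10
  }
}

val jobStarted = new CountDownLatch(1)
val job = new FakeJob(Seq(() => jobStarted.countDown()))
val asyncCount = job.runAsync()

// Wait for the start event instead of guessing with Thread.sleep(20).
val startedInTime = jobStarted.await(10, TimeUnit.SECONDS)
// ... the real test would decommission the executors at this point ...
val result = Await.result(asyncCount, 10.seconds)
```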



[GitHub] [spark] vanzin commented on a change in pull request #26440: [WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support

2020-01-06 Thread GitBox

vanzin commented on a change in pull request #26440: 
[WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & 
preemption support
URL: https://github.com/apache/spark/pull/26440#discussion_r363472016
 
 

 ##
 File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
 ##
 @@ -192,6 +193,21 @@ private[spark] class BasicExecutorFeatureStep(
           .endResources()
         .build()
     }.getOrElse(executorContainer)
+    val containerWithLifecycle = kubernetesConf.workerDecommissioning match {
+      case false =>
 
 Review comment:
   if / else for a simple boolean
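   The suggested shape, sketched with a hypothetical `Container` case class and `withPreStop` helper in place of the fabric8 `ContainerBuilder` calls (the script path is also a placeholder): a plain `if`/`else` covers both branches without matching on `true`/`false`.

```scala
// Hypothetical, simplified container model.
final case class Container(name: String, preStopCommand: Option[Seq[String]] = None)

def withPreStop(c: Container, command: Seq[String]): Container =
  c.copy(preStopCommand = Some(command))

def containerWithLifecycle(base: Container, decommissioningEnabled: Boolean): Container =
  if (decommissioningEnabled) {
    // Attach the decommission script as a preStop hook (placeholder path).
    withPreStop(base, Seq("/opt/decom.sh"))
  } else {
    base
  }
```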



[GitHub] [spark] vanzin commented on a change in pull request #26440: [WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support

2020-01-06 Thread GitBox

vanzin commented on a change in pull request #26440: 
[WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & 
preemption support
URL: https://github.com/apache/spark/pull/26440#discussion_r363474668
 
 

 ##
 File path: resource-managers/kubernetes/core/src/main/scala/org/apache/spark/deploy/k8s/features/BasicExecutorFeatureStep.scala
 ##
 @@ -192,6 +193,21 @@ private[spark] class BasicExecutorFeatureStep(
           .endResources()
         .build()
     }.getOrElse(executorContainer)
+    val containerWithLifecycle = kubernetesConf.workerDecommissioning match {
+      case false =>
+        logInfo("Decommissioning not enabled, skipping shutdown script")
+        containerWithLimitCores
+      case true =>
+        logInfo("Adding decommission script to lifecycle")
+        new ContainerBuilder(containerWithLimitCores).withNewLifecycle()
+          .withNewPreStop()
 
 Review comment:
   Will this get triggered when Spark itself stops the executor (i.e. when you 
turn on dynamic allocation)? Does the code behave as it should in that case?



[GitHub] [spark] vanzin commented on a change in pull request #26440: [WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support

2020-01-06 Thread GitBox

vanzin commented on a change in pull request #26440: 
[WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & 
preemption support
URL: https://github.com/apache/spark/pull/26440#discussion_r363471411
 
 

 ##
 File path: core/src/test/scala/org/apache/spark/scheduler/WorkerDecommissionSuite.scala
 ##
 @@ -0,0 +1,76 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.scheduler
+
+import scala.concurrent.TimeoutException
+import scala.concurrent.duration._
+
+import org.apache.spark.{LocalSparkContext, SparkConf, SparkContext, SparkException, SparkFunSuite}
+import org.apache.spark.internal.config
+import org.apache.spark.scheduler.cluster.StandaloneSchedulerBackend
+import org.apache.spark.util.{RpcUtils, SerializableBuffer, ThreadUtils}
+
+class WorkerDecommissionSuite extends SparkFunSuite with LocalSparkContext {
+
+
+  override def beforeEach(): Unit = {
+    val conf = new SparkConf().setAppName("test").setMaster("local")
+      .set(config.Worker.WORKER_DECOMMISSION_ENABLED.key, "true")
+
+    sc = new SparkContext("local-cluster[2, 1, 1024]", "test", conf)
+  }
+
+  test("verify task with no decommissioning works as expected") {
+    val input = sc.parallelize(1 to 10)
+    input.count()
+    val sleepyRdd = input.mapPartitions{ x =>
+      Thread.sleep(100)
+      x
+    }
+    assert(sleepyRdd.count() === 10)
+  }
+
+  test("verify a task with all workers decommissioned succeeds") {
+    val input = sc.parallelize(1 to 10)
+    // Do a count to wait for the executors to be registered.
+    input.count()
+    val sleepyRdd = input.mapPartitions{ x =>
+      Thread.sleep(100)
+      x
+    }
+    // Start the task.
+    val asyncCount = sleepyRdd.countAsync()
+    // Give the job long enough to start.
+    Thread.sleep(20)
+    // Decommission all the executors, this should not halt the current task.
+    // The master passing message is tested with
+    val sched = sc.schedulerBackend.asInstanceOf[StandaloneSchedulerBackend]
+    val execs = sched.getExecutorIds()
+    execs.foreach(execId => sched.decommissionExecutor(execId))
+    val asyncCountResult = ThreadUtils.awaitResult(asyncCount, 10.seconds)
+    assert(asyncCountResult === 10)
+    // Try and launch task after decommissioning, this should fail
+    val postDecommissioned = input.map(x => x)
+    val postDecomAsyncCount = postDecommissioned.countAsync()
+    val thrown = intercept[java.util.concurrent.TimeoutException]{
+      val result = ThreadUtils.awaitResult(postDecomAsyncCount, 10.seconds)
+    }
 
 Review comment:
   I'd look for a better way to check this. This test will intentionally wait 
10 seconds doing nothing in the success case.



[GitHub] [spark] vanzin commented on a change in pull request #26440: [WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support

2020-01-06 Thread GitBox

vanzin commented on a change in pull request #26440: 
[WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & 
preemption support
URL: https://github.com/apache/spark/pull/26440#discussion_r363469587
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/scheduler/cluster/CoarseGrainedSchedulerBackend.scala
 ##
 @@ -402,6 +408,27 @@ class CoarseGrainedSchedulerBackend(scheduler: TaskSchedulerImpl, val rpcEnv: Rp
       scheduler.workerRemoved(workerId, host, message)
     }
 
+    /**
+     * Mark a given executor as decommissioned and stop making resource offers for it.
+     */
+    private def decommissionExecutor(executorId: String): Boolean = {
+      val shouldDisable = CoarseGrainedSchedulerBackend.this.synchronized {
+        // Only bother decommissioning executors which are alive.
+        if (isExecutorActive(executorId)) {
+          executorsPendingDecommission += executorId
 
 Review comment:
   I see you adding things to this set, but I didn't notice anywhere that 
removes the executor from it.
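   One possible fix, sketched with a hypothetical `DecommissionTracker` rather than the real `CoarseGrainedSchedulerBackend`: the code path that removes an executor also clears its pending-decommission mark, so the set does not grow as executors come and go.

```scala
import scala.collection.mutable

// Hypothetical tracker modeling the bookkeeping under review.
class DecommissionTracker {
  private val activeExecutors = mutable.HashSet.empty[String]
  private val executorsPendingDecommission = mutable.HashSet.empty[String]

  def addExecutor(id: String): Unit = synchronized { activeExecutors += id }

  // Mirrors decommissionExecutor: only alive executors are marked.
  def decommissionExecutor(id: String): Boolean = synchronized {
    if (activeExecutors.contains(id)) {
      executorsPendingDecommission += id
      true
    } else {
      false
    }
  }

  // The missing half: removing an executor also clears its pending mark.
  def removeExecutor(id: String): Unit = synchronized {
    activeExecutors -= id
    executorsPendingDecommission -= id
  }

  def pendingCount: Int = synchronized { executorsPendingDecommission.size }
}

val tracker = new DecommissionTracker
tracker.addExecutor("1")
tracker.decommissionExecutor("1")
tracker.removeExecutor("1")
```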



[GitHub] [spark] vanzin commented on a change in pull request #26440: [WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support

2020-01-06 Thread GitBox

vanzin commented on a change in pull request #26440: 
[WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & 
preemption support
URL: https://github.com/apache/spark/pull/26440#discussion_r363467451
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/executor/CoarseGrainedExecutorBackend.scala
 ##
 @@ -140,6 +144,16 @@ private[spark] class CoarseGrainedExecutorBackend(
       if (executor == null) {
         exitExecutor(1, "Received LaunchTask command but executor was null")
       } else {
+        if (decommissioned) {
+          logError("Asked to launch a task while decommissioned.")
+          driver match {
+            case Some(endpoint) =>
 
 Review comment:
   I think that instead of doing this here, it should be done in `onStart` 
where the driver reference is created. That means the decommission message is 
sent to the driver as soon as possible after the signal arrives, instead of 
waiting for the driver to try to use the executor for something.
   
   (That also means this block can go away and you can just keep the log 
message in `Executor.scala`.)
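   A sketch of the suggested flow, using hypothetical names throughout (`ExecutorSketch`, a plain `String => Unit` in place of the driver `RpcEndpointRef`, and a `"DecommissionExecutor"` string message): the signal handler notifies the driver as soon as a reference exists, and `onStart` covers a signal that arrives before the reference is created.

```scala
import scala.collection.mutable

// Hypothetical model of the executor backend. `connect` stands in for the
// driver endpoint lookup in onStart; the returned function plays the role of
// sending a message to the driver.
class ExecutorSketch(connect: () => (String => Unit)) {
  private var decommissioned = false
  private var driver: Option[String => Unit] = None

  def onStart(): Unit = synchronized {
    driver = Some(connect())
    // A signal that arrived before the driver reference existed is reported now.
    if (decommissioned) driver.foreach(send => send("DecommissionExecutor"))
  }

  // Called from the decommission signal handler.
  def onDecommissionSignal(): Unit = synchronized {
    if (!decommissioned) {
      decommissioned = true
      // Tell the driver right away instead of waiting for the next LaunchTask.
      driver.foreach(send => send("DecommissionExecutor"))
    }
  }
}

// Signal after startup: the driver hears about it immediately, and only once.
val sent = mutable.ArrayBuffer.empty[String]
val backend = new ExecutorSketch(() => (msg: String) => sent += msg)
backend.onStart()
backend.onDecommissionSignal()
backend.onDecommissionSignal()

// Signal before startup: the message goes out as soon as onStart connects.
val early = mutable.ArrayBuffer.empty[String]
val lateStart = new ExecutorSketch(() => (msg: String) => early += msg)
lateStart.onDecommissionSignal()
lateStart.onStart()
```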



[GitHub] [spark] vanzin commented on a change in pull request #26440: [WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support

2020-01-06 Thread GitBox

vanzin commented on a change in pull request #26440: 
[WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & 
preemption support
URL: https://github.com/apache/spark/pull/26440#discussion_r363468030
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/scheduler/ExecutorLossReason.scala
 ##
 @@ -58,3 +58,11 @@ private [spark] object LossReasonPending extends ExecutorLossReason("Pending los
 private[spark]
 case class SlaveLost(_message: String = "Slave lost", workerLost: Boolean = false)
   extends ExecutorLossReason(_message)
+
+/**
+ * A loss reason that means the worker is marked for decommissioning.
 
 Review comment:
   s/worker/executor



[GitHub] [spark] vanzin commented on a change in pull request #26440: [WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support

2020-01-06 Thread GitBox

vanzin commented on a change in pull request #26440: 
[WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & 
preemption support
URL: https://github.com/apache/spark/pull/26440#discussion_r363472397
 
 

 ##
 File path: resource-managers/kubernetes/docker/src/main/dockerfiles/spark/decom.sh
 ##
 @@ -0,0 +1,38 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+
+set -ex
+export LOG=/dev/termination-log
+echo "Asked to decommission" > ${LOG}
+# Find the pid to signal
+date | tee -a ${LOG}
+WORKER_PID=$(ps axf | grep java |grep org.apache.spark.executor.CoarseGrainedExecutorBackend | grep -v grep)
 
 Review comment:
   nit: space after `|`



[GitHub] [spark] AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after 
join is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-571305449
 
 
   Merged build finished. Test PASSed.



[GitHub] [spark] AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition 
after join is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-571305460
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20980/
   Test PASSed.



[GitHub] [spark] AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #27096: [SPARK-28148][SQL] Repartition after 
join is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-571305460
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20980/
   Test PASSed.

[GitHub] [spark] AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #27096: [SPARK-28148][SQL] Repartition 
after join is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-571305449
 
 
   Merged build finished. Test PASSed.

[GitHub] [spark] zero323 commented on a change in pull request #27109: [SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' sub-package

2020-01-06 Thread GitBox

zero323 commented on a change in pull request #27109: 
[SPARK-30434][PYTHON][SQL] Move pandas related functionalities into 'pandas' 
sub-package
URL: https://github.com/apache/spark/pull/27109#discussion_r363472878
 
 

 ##
 File path: python/pyspark/sql/dataframe.py
 ##
 @@ -31,23 +31,23 @@
 
 from pyspark import copy_func, since, _NoValue
 from pyspark.rdd import RDD, _load_from_socket, _local_iterator_from_socket, \
-ignore_unicode_prefix, PythonEvalType
-from pyspark.serializers import ArrowCollectSerializer, BatchedSerializer, PickleSerializer, \
+ignore_unicode_prefix
+from pyspark.serializers import BatchedSerializer, PickleSerializer, \
 UTF8Deserializer
 from pyspark.storagelevel import StorageLevel
 from pyspark.traceback_utils import SCCallSiteSync
 from pyspark.sql.types import _parse_datatype_json_string
 from pyspark.sql.column import Column, _to_seq, _to_list, _to_java_column
 from pyspark.sql.readwriter import DataFrameWriter
 from pyspark.sql.streaming import DataStreamWriter
-from pyspark.sql.types import IntegralType
 from pyspark.sql.types import *
-from pyspark.util import _exception_message
+from pyspark.sql.pandas.conversion import PandasConversionMixin
+from pyspark.sql.pandas.map_ops import PandasMapOpsMixin
 
 __all__ = ["DataFrame", "DataFrameNaFunctions", "DataFrameStatFunctions"]
 
 
-class DataFrame(object):
+class DataFrame(PandasMapOpsMixin, PandasConversionMixin):
 
 Review comment:
   In general, I am trying to get a better feel for the overall purpose of this
refactoring.
   
   As of now, there is no indication that any of these mixins will ever be used
outside the current context (`DataFrame` and `GroupedData`). That impression is
reinforced by explicit type checks
([here](https://github.com/apache/spark/blob/cfd78393e76f454503e7cf5416f6d56f1efffd0a/python/pyspark/sql/pandas/group_ops.py#L96)
 and
[here](https://github.com/apache/spark/blob/cfd78393e76f454503e7cf5416f6d56f1efffd0a/python/pyspark/sql/pandas/map_ops.py#L64)).
 So this doesn't really seem like a canonical use of mixins, especially since
the core `DataFrame` is not designed for extensibility.
   
   > Ah you mean API usages like:
   > 
   >df.pandas.mapInPandas(...)
   
   That's one possible approach, though not the one I was thinking about. I
assumed that the point is maintainability (though I am not sure, as the amount of
code moved, excluding docs, messages, and some static stuff, is negligible and
tightly coupled with `DataFrame` anyway).
   
   So a possible approach is either direct delegation
   
   def __init__(self, ...):
       ...
       self._pandasMapOpsMixin = PandasMapOpsMixin(self)
       ...
   
   def mapInPandas(self, udf):
       return self._pandasMapOpsMixin.mapInPandas(udf)
   
   or indirect delegation (by overriding `__getattr__`).
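A minimal, self-contained sketch of the two delegation styles described above. It assumes toy classes only: `MapOps`, `ExplicitFrame`, and `ForwardingFrame` are hypothetical stand-ins, not PySpark's `PandasMapOpsMixin` or `DataFrame`, and the pandas UDF is replaced by a plain Python function applied per row.

```python
class MapOps:
    """Hypothetical helper holding the pandas-related logic for a frame-like object."""

    def __init__(self, frame):
        self._frame = frame

    def mapInPandas(self, func):
        # Stand-in for the real logic: apply `func` to every row of the owner
        # and wrap the result in a new object of the owner's type.
        return type(self._frame)([func(row) for row in self._frame.rows])


class ExplicitFrame:
    """Direct delegation: each public method forwards to the helper explicitly."""

    def __init__(self, rows):
        self.rows = list(rows)
        self._map_ops = MapOps(self)

    def mapInPandas(self, func):
        return self._map_ops.mapInPandas(func)


class ForwardingFrame:
    """Indirect delegation: unknown attributes are looked up on the helper."""

    def __init__(self, rows):
        self.rows = list(rows)
        self._map_ops = MapOps(self)

    def __getattr__(self, name):
        # Python only calls this when normal lookup fails, so `rows` and
        # `_map_ops` never reach it once __init__ has run. The guard avoids
        # infinite recursion if `_map_ops` has not been set yet.
        if name == "_map_ops":
            raise AttributeError(name)
        return getattr(self._map_ops, name)
```

For example, `ExplicitFrame([1, 2, 3]).mapInPandas(lambda x: x * 10).rows` and `ForwardingFrame([1, 2, 3]).mapInPandas(lambda x: x * 10).rows` both give `[10, 20, 30]`. The explicit form keeps each forwarded method visible to readers, IDEs, and generated docs; `__getattr__` avoids the boilerplate but hides the API surface.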

[GitHub] [spark] SparkQA commented on issue #27096: [SPARK-28148][SQL] Repartition after join is not optimized away

2020-01-06 Thread GitBox

SparkQA commented on issue #27096: [SPARK-28148][SQL] Repartition after join is 
not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-571304820
 
 
   **[Test build #116187 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116187/testReport)** for PR 27096 at commit [`ff573e8`](https://github.com/apache/spark/commit/ff573e864c8271d157acd2ae3b62de5aeb03117a).

[GitHub] [spark] AmplabJenkins removed a comment on issue #27096: [SPARK-28148][CORE]: repartition after join is not optimized away

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #27096: [SPARK-28148][CORE]: 
repartition after join is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-570843855
 
 
   Can one of the admins verify this patch?

[GitHub] [spark] dbtsai commented on issue #27096: [SPARK-28148][CORE]: repartition after join is not optimized away

2020-01-06 Thread GitBox

dbtsai commented on issue #27096: [SPARK-28148][CORE]: repartition after join 
is not optimized away
URL: https://github.com/apache/spark/pull/27096#issuecomment-571303226
 
 
   Jenkins, ok to test.

[GitHub] [spark] AmplabJenkins removed a comment on issue #26956: [SPARK-30312][SQL] Preserve path permission and acl when truncate table

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #26956: [SPARK-30312][SQL] Preserve 
path permission and acl when truncate table
URL: https://github.com/apache/spark/pull/26956#issuecomment-571302669
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116181/
   Test PASSed.

[GitHub] [spark] AmplabJenkins commented on issue #26956: [SPARK-30312][SQL] Preserve path permission and acl when truncate table

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #26956: [SPARK-30312][SQL] Preserve path 
permission and acl when truncate table
URL: https://github.com/apache/spark/pull/26956#issuecomment-571302669
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116181/
   Test PASSed.

[GitHub] [spark] AmplabJenkins commented on issue #26956: [SPARK-30312][SQL] Preserve path permission and acl when truncate table

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #26956: [SPARK-30312][SQL] Preserve path 
permission and acl when truncate table
URL: https://github.com/apache/spark/pull/26956#issuecomment-571302657
 
 
   Merged build finished. Test PASSed.

[GitHub] [spark] AmplabJenkins removed a comment on issue #26956: [SPARK-30312][SQL] Preserve path permission and acl when truncate table

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #26956: [SPARK-30312][SQL] Preserve 
path permission and acl when truncate table
URL: https://github.com/apache/spark/pull/26956#issuecomment-571302657
 
 
   Merged build finished. Test PASSed.

[GitHub] [spark] SparkQA removed a comment on issue #26956: [SPARK-30312][SQL] Preserve path permission and acl when truncate table

2020-01-06 Thread GitBox

SparkQA removed a comment on issue #26956: [SPARK-30312][SQL] Preserve path 
permission and acl when truncate table
URL: https://github.com/apache/spark/pull/26956#issuecomment-571213066
 
 
   **[Test build #116181 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116181/testReport)** for PR 26956 at commit [`39fe234`](https://github.com/apache/spark/commit/39fe2343dbf332f12be95de2e845a8e197a87f73).

[GitHub] [spark] SparkQA commented on issue #26956: [SPARK-30312][SQL] Preserve path permission and acl when truncate table

2020-01-06 Thread GitBox

SparkQA commented on issue #26956: [SPARK-30312][SQL] Preserve path permission 
and acl when truncate table
URL: https://github.com/apache/spark/pull/26956#issuecomment-571302080
 
 
   **[Test build #116181 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116181/testReport)** for PR 26956 at commit [`39fe234`](https://github.com/apache/spark/commit/39fe2343dbf332f12be95de2e845a8e197a87f73).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.

[GitHub] [spark] AmplabJenkins removed a comment on issue #26682: [SPARK-29306][CORE] Stage Level Sched: Executors need to track what ResourceProfile they are created with

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #26682: [SPARK-29306][CORE] Stage 
Level Sched: Executors need to track what ResourceProfile they are created with 
URL: https://github.com/apache/spark/pull/26682#issuecomment-571299651
 
 
   Merged build finished. Test PASSed.

[GitHub] [spark] AmplabJenkins commented on issue #26682: [SPARK-29306][CORE] Stage Level Sched: Executors need to track what ResourceProfile they are created with

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #26682: [SPARK-29306][CORE] Stage Level Sched: 
Executors need to track what ResourceProfile they are created with 
URL: https://github.com/apache/spark/pull/26682#issuecomment-571299651
 
 
   Merged build finished. Test PASSed.

[GitHub] [spark] AmplabJenkins removed a comment on issue #26682: [SPARK-29306][CORE] Stage Level Sched: Executors need to track what ResourceProfile they are created with

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #26682: [SPARK-29306][CORE] Stage 
Level Sched: Executors need to track what ResourceProfile they are created with 
URL: https://github.com/apache/spark/pull/26682#issuecomment-571299660
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20979/
   Test PASSed.

[GitHub] [spark] AmplabJenkins commented on issue #26682: [SPARK-29306][CORE] Stage Level Sched: Executors need to track what ResourceProfile they are created with

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #26682: [SPARK-29306][CORE] Stage Level Sched: 
Executors need to track what ResourceProfile they are created with 
URL: https://github.com/apache/spark/pull/26682#issuecomment-571299660
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20979/
   Test PASSed.

[GitHub] [spark] SparkQA commented on issue #26682: [SPARK-29306][CORE] Stage Level Sched: Executors need to track what ResourceProfile they are created with

2020-01-06 Thread GitBox

SparkQA commented on issue #26682: [SPARK-29306][CORE] Stage Level Sched: 
Executors need to track what ResourceProfile they are created with 
URL: https://github.com/apache/spark/pull/26682#issuecomment-571299086
 
 
   **[Test build #116186 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116186/testReport)** for PR 26682 at commit [`6cb7023`](https://github.com/apache/spark/commit/6cb7023426b6f9d0806df07b5d58867ae472a04d).

[GitHub] [spark] dongjoon-hyun commented on issue #24457: [SPARK-27340][SS] Alias on TimeWindow expression may cause watermark metadata lost

2020-01-06 Thread GitBox

dongjoon-hyun commented on issue #24457: [SPARK-27340][SS] Alias on TimeWindow 
expression may cause watermark metadata lost
URL: https://github.com/apache/spark/pull/24457#issuecomment-571294006
 
 
   While preparing the 2.4.5 release, I just noticed that this was closed
recently, but we might still need to fix the underlying issue. The test case
failed on both `master` and `branch-2.4`.
   
   If watermarks are ignored, the internal state grows indefinitely. What do you
think about the reported issue, @tdas, @zsxwing, @cloud-fan, @HeartSaVioR,
@gatorsmile?
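To illustrate why losing the watermark matters, here is a simplified model of windowed aggregation state. It is a sketch under stated assumptions, not Spark's implementation: events arrive in timestamp order, windows are tumbling, the watermark is taken as the maximum event time seen minus a fixed delay, and a window's state is dropped once its end is at or before the watermark. The function `state_sizes` and its parameters are hypothetical.

```python
from collections import defaultdict


def state_sizes(event_times, window_size, delay=None):
    """Count events per tumbling window and report the number of open windows
    (the size of the aggregation state) after each event.

    event_times: event timestamps in seconds, assumed to arrive in order.
    window_size: tumbling window length in seconds.
    delay: watermark delay in seconds; None models a query whose watermark
           metadata was lost, so no window is ever evicted.
    """
    state = defaultdict(int)  # window start -> event count
    max_event_time = None
    sizes = []
    for t in event_times:
        start = t - t % window_size
        state[start] += 1
        max_event_time = t if max_event_time is None else max(max_event_time, t)
        if delay is not None:
            watermark = max_event_time - delay
            # In this model, a window ending at or before the watermark is
            # finalized, so its state can be dropped.
            expired = [s for s in state if s + window_size <= watermark]
            for s in expired:
                del state[s]
        sizes.append(len(state))
    return sizes
```

With one event every 5 seconds and 5-second windows, `state_sizes(range(0, 100, 5), 5)` grows by one window per event (ending at 20), whereas `state_sizes(range(0, 100, 5), 5, delay=10)` levels off at 3 open windows.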

[GitHub] [spark] AmplabJenkins removed a comment on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #26890: [SPARK-30039][SQL] CREATE 
FUNCTION should do multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#issuecomment-571284825
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20978/
   Test PASSed.

[GitHub] [spark] AmplabJenkins removed a comment on issue #24457: [SPARK-27340][SS] Alias on TimeWindow expression may cause watermark metadata lost

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #24457: [SPARK-27340][SS] Alias on 
TimeWindow expression may cause watermark metadata lost
URL: https://github.com/apache/spark/pull/24457#issuecomment-571283979
 
 
   Can one of the admins verify this patch?

[GitHub] [spark] AmplabJenkins removed a comment on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #26890: [SPARK-30039][SQL] CREATE 
FUNCTION should do multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#issuecomment-571284808
 
 
   Merged build finished. Test PASSed.

[GitHub] [spark] AmplabJenkins commented on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION 
should do multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#issuecomment-571284825
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20978/
   Test PASSed.

[GitHub] [spark] AmplabJenkins commented on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION 
should do multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#issuecomment-571284808
 
 
   Merged build finished. Test PASSed.

[GitHub] [spark] AmplabJenkins commented on issue #24457: [SPARK-27340][SS] Alias on TimeWindow expression may cause watermark metadata lost

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #24457: [SPARK-27340][SS] Alias on TimeWindow 
expression may cause watermark metadata lost
URL: https://github.com/apache/spark/pull/24457#issuecomment-571284551
 
 
   Can one of the admins verify this patch?

[GitHub] [spark] SparkQA commented on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution

2020-01-06 Thread GitBox

SparkQA commented on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do 
multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#issuecomment-571284101
 
 
   **[Test build #116185 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116185/testReport)** for PR 26890 at commit [`d9eb441`](https://github.com/apache/spark/commit/d9eb4410d5c77234746d13132f3cad5aa6092d1d).

[GitHub] [spark] AmplabJenkins removed a comment on issue #24457: [SPARK-27340][SS] Alias on TimeWindow expression may cause watermark metadata lost

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #24457: [SPARK-27340][SS] Alias on 
TimeWindow expression may cause watermark metadata lost
URL: https://github.com/apache/spark/pull/24457#issuecomment-531894048
 
 
   Can one of the admins verify this patch?

[GitHub] [spark] viirya commented on a change in pull request #26993: [SPARK-30338][SQL] Avoid unnecessary InternalRow copies in ParquetRowConverter

2020-01-06 Thread GitBox

viirya commented on a change in pull request #26993: [SPARK-30338][SQL] Avoid 
unnecessary InternalRow copies in ParquetRowConverter
URL: https://github.com/apache/spark/pull/26993#discussion_r363451729
 
 

 ##
 File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetRowConverter.scala
 ##
 @@ -318,10 +318,33 @@ private[parquet] class ParquetRowConverter(
 new ParquetMapConverter(parquetType.asGroupType(), t, updater)
 
   case t: StructType =>
+val wrappedUpdater = {
+  // SPARK-30338: avoid unnecessary InternalRow copying for nested structs:
+  if (updater.isInstanceOf[RowUpdater]) {
+// `updater` is a RowUpdater, implying that the parent container is a struct.
+// We do NOT need to perform defensive copying here because either:
+//
+//   1. The path from the schema root to this field consists only of nested
 
 Review comment:
   When we have a deeply nested struct inside an array, does it fall under the first case here?
   
   I think it is fine, because the element converter for the top-level struct
inside an array element will do the defensive copying. So the nested struct
converters will see a RowUpdater from their parent struct and don't need to do
defensive copying either.
   
   It may be good to also mention this in the doc.
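The copying argument above can be illustrated with a toy model. This is a sketch under stated assumptions: `MutableRow` is a hypothetical stand-in for a row buffer that a converter reuses (it is not Spark's `InternalRow`), and `copy.deepcopy` plays the role of a recursive defensive copy. Nested struct values are stored into their parent's buffer without copying, and only the array-element level makes a copy.

```python
import copy


class MutableRow:
    """Hypothetical stand-in for a row buffer that a converter reuses."""

    def __init__(self, num_fields):
        self.values = [None] * num_fields

    def deep_copy(self):
        # Recursively copies any nested MutableRow values as well.
        return copy.deepcopy(self)


def to_tuple(row):
    """Turn a (possibly nested) MutableRow into plain tuples for comparison."""
    return tuple(to_tuple(v) if isinstance(v, MutableRow) else v for v in row.values)


def convert_array(records, copy_elements):
    """Build array<struct<id, struct<v>>> from [(id, v), ...].

    One buffer for the element struct and one for the nested struct are
    reused for every element, as a converter that recycles its current row would.
    """
    element = MutableRow(2)
    nested = MutableRow(1)
    result = []
    for rec_id, value in records:
        # The nested struct's parent is a struct: store the buffer, no copy.
        nested.values[0] = value
        element.values[0] = rec_id
        element.values[1] = nested
        # The element's parent is an array: the only place a copy is made.
        result.append(element.deep_copy() if copy_elements else element)
    return [to_tuple(row) for row in result]
```

`convert_array([(1, "a"), (2, "b")], copy_elements=True)` returns `[(1, ("a",)), (2, ("b",))]`, while with `copy_elements=False` every entry aliases the same buffers and the result is `[(2, ("b",)), (2, ("b",))]`. One deep copy at the array boundary is enough in this model because everything nested below the element is copied along with it.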

[GitHub] [spark] AmplabJenkins commented on issue #24457: [SPARK-27340][SS] Alias on TimeWindow expression may cause watermark metadata lost

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #24457: [SPARK-27340][SS] Alias on TimeWindow 
expression may cause watermark metadata lost
URL: https://github.com/apache/spark/pull/24457#issuecomment-571283979
 
 
   Can one of the admins verify this patch?

[GitHub] [spark] dongjoon-hyun commented on a change in pull request #24457: [SPARK-27340][SS] Alias on TimeWindow expression may cause watermark metadata lost

2020-01-06 Thread GitBox

dongjoon-hyun commented on a change in pull request #24457: [SPARK-27340][SS] 
Alias on TimeWindow expression may cause watermark metadata lost
URL: https://github.com/apache/spark/pull/24457#discussion_r363451556
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/streaming/EventTimeWatermarkSuite.scala
 ##
 @@ -591,6 +591,17 @@ class EventTimeWatermarkSuite extends StreamTest with BeforeAndAfter with Matche
 }
   }
 
+  test("SPARK-27340: Alias on TimeWindow expression may cause watermark metadata lost") {
+val inputData = MemoryStream[Int]
+val aliasWindow = inputData.toDF()
+  .withColumn("eventTime", $"value".cast("timestamp"))
+  .withWatermark("eventTime", "10 seconds")
+  .select(window($"eventTime", "5 seconds") as 'aliasWindow)
+
+assert(aliasWindow.logicalPlan.output.exists(
+_.metadata.contains(EventTimeWatermark.delayKey)))
 
 Review comment:
   Since this test case seems to fail on the master branch (as of today), the
issue seems to still exist.

[GitHub] [spark] LiangchangZ opened a new pull request #24457: [SPARK-27340][SS] Alias on TimeWindow expression may cause watermark metadata lost

2020-01-06 Thread GitBox

LiangchangZ opened a new pull request #24457: [SPARK-27340][SS] Alias on 
TimeWindow expression may cause watermark metadata lost
URL: https://github.com/apache/spark/pull/24457
 
 
   ## What changes were proposed in this pull request?
   
   `window($"fooTime", "2 seconds").alias("fooWindow")` can generate the
expression tree `Alias(fooWindow) <- TimeWindow`. After being analyzed by the
TimeWindowing rule, the tree becomes
`Alias(fooWindow) <- Alias(window) <- Window(start, end)`. The `Alias(window)`
gets the watermark metadata when it is created:
   ```
   val windowStruct = Alias(getWindow(0, 1), WINDOW_COL_NAME)(
   exprId = windowAttr.exprId, explicitMetadata = Some(metadata))
   ``` 
   but `Alias(fooWindow)` is created before the TimeWindowing rule takes effect.
Its code path is:
   ```
   ...
   case ne: NamedExpression => Alias(expr, alias)(explicitMetadata = Some(ne.metadata))
   ...
   ```
   Before the TimeWindowing rule takes effect, `ne.metadata` is still empty, so
the watermark metadata is lost.
   
   We make `def name(alias: String)` return an `Alias` that gets its metadata
from its child automatically when no metadata is specified explicitly.
   
   Thanks to @LinhongLiu for helping analyze this problem!
   
   ## How was this patch tested?
   Added a unit test, and ran the example from the JIRA ticket as an integration
test: it now succeeds and no longer throws org.apache.spark.sql.AnalysisException.
   
   


[GitHub] [spark] vanzin commented on issue #26440: [WIP][SPARK-20628][CORE][K8S] Start to improve Spark decommissioning & preemption support

2020-01-06 Thread GitBox

vanzin commented on issue #26440: [WIP][SPARK-20628][CORE][K8S] Start to 
improve Spark decommissioning & preemption support
URL: https://github.com/apache/spark/pull/26440#issuecomment-571282142
 
 
   > Recommissioning is something I considered out of scope
   
   Recommissioning might be tricky, but perhaps it would be good to have a fail-safe so that executors exit on their own (or are killed by the driver) if the decommission never actually happens? The YARN API documentation at least leaves that behavior open as a possibility.
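   One possible shape for such a fail-safe, as a minimal Scala sketch: a one-shot timer that runs a callback (for example, exiting the executor or asking the driver to kill it) unless decommissioning is confirmed first. `DecommissionWatchdog`, `confirmDecommissioned` and the timeout value are hypothetical names for illustration only; this is not Spark's or YARN's API.

```scala
import java.util.concurrent.{Executors, TimeUnit}
import java.util.concurrent.atomic.AtomicBoolean

// Hypothetical fail-safe, not Spark's decommissioning code: if decommissioning is not
// confirmed within `timeoutMs`, run `onTimeout` (e.g. exit, or ask the driver to kill us).
final class DecommissionWatchdog(timeoutMs: Long, onTimeout: () => Unit) {
  private val scheduler = Executors.newSingleThreadScheduledExecutor()
  // Ensures exactly one of "confirmed" or "timed out" wins.
  private val settled = new AtomicBoolean(false)

  def start(): Unit = {
    scheduler.schedule(new Runnable {
      override def run(): Unit =
        try { if (settled.compareAndSet(false, true)) onTimeout() }
        finally scheduler.shutdown() // one-shot: release the timer thread
    }, timeoutMs, TimeUnit.MILLISECONDS)
    ()
  }

  // Call when the decommission actually happened; cancels the pending fail-safe.
  // Returns true if the confirmation arrived before the timeout fired.
  def confirmDecommissioned(): Boolean = {
    val wonRace = settled.compareAndSet(false, true)
    scheduler.shutdownNow()
    wonRace
  }
}
```

   The `AtomicBoolean` keeps the outcome race-free: whichever of the timeout or the confirmation arrives first wins, and the other becomes a no-op.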


[GitHub] [spark] gengliangwang commented on a change in pull request #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"

2020-01-06 Thread GitBox

gengliangwang commented on a change in pull request #24938: [SPARK-27946][SQL] 
Hive DDL to Spark DDL conversion USING "show create table"
URL: https://github.com/apache/spark/pull/24938#discussion_r363444748
 
 

 ##
 File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
 ##
 @@ -196,7 +196,7 @@ statement
 | SHOW PARTITIONS multipartIdentifier partitionSpec?          #showPartitions
 | SHOW identifier? FUNCTIONS
 (LIKE? (multipartIdentifier | pattern=STRING))?              #showFunctions
-| SHOW CREATE TABLE multipartIdentifier                       #showCreateTable
+| SHOW CREATE TABLE multipartIdentifier (AS SPARK)?           #showCreateTable
 
 Review comment:
   +1. The new proposal makes more sense!


[GitHub] [spark] joshrosen-stripe commented on issue #27089: [SPARK-30414][SQL] ParquetRowConverter optimizations: arrays, maps, plus misc. constant factors

2020-01-06 Thread GitBox

joshrosen-stripe commented on issue #27089: [SPARK-30414][SQL] 
ParquetRowConverter optimizations: arrays, maps, plus misc. constant factors
URL: https://github.com/apache/spark/pull/27089#issuecomment-571276738
 
 
   @cloud-fan @HyukjinKwon @dongjoon-hyun @viirya, could you take a look at this PR, which implements several small performance optimizations in `ParquetRowConverter`? These changes aim to improve performance when scanning very wide datasets with large numbers of columns, as well as datasets containing small maps and arrays. They are orthogonal and complementary to the changes in #26967.


[GitHub] [spark] tgravescs commented on issue #26682: [SPARK-29306][CORE] Stage Level Sched: Executors need to track what ResourceProfile they are created with

2020-01-06 Thread GitBox

tgravescs commented on issue #26682: [SPARK-29306][CORE] Stage Level Sched: 
Executors need to track what ResourceProfile they are created with 
URL: https://github.com/apache/spark/pull/26682#issuecomment-571276795
 
 
   Looks like a new test was added that my branch wasn't upmerged with; upmerging and looking at the failure.


[GitHub] [spark] gatorsmile commented on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"

2020-01-06 Thread GitBox

gatorsmile commented on issue #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL 
conversion USING "show create table"
URL: https://github.com/apache/spark/pull/24938#issuecomment-571276597
 
 
   cc @viirya Sorry for the late reply. Are you OK with addressing the comment above?


[GitHub] [spark] gatorsmile commented on a change in pull request #24938: [SPARK-27946][SQL] Hive DDL to Spark DDL conversion USING "show create table"

2020-01-06 Thread GitBox

gatorsmile commented on a change in pull request #24938: [SPARK-27946][SQL] 
Hive DDL to Spark DDL conversion USING "show create table"
URL: https://github.com/apache/spark/pull/24938#discussion_r363444039
 
 

 ##
 File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
 ##
 @@ -196,7 +196,7 @@ statement
 | SHOW PARTITIONS multipartIdentifier partitionSpec?          #showPartitions
 | SHOW identifier? FUNCTIONS
 (LIKE? (multipartIdentifier | pattern=STRING))?              #showFunctions
-| SHOW CREATE TABLE multipartIdentifier                       #showCreateTable
+| SHOW CREATE TABLE multipartIdentifier (AS SPARK)?           #showCreateTable
 
 Review comment:
   After rethinking this, let's be more aggressive here. Instead of producing Spark native DDL for existing Hive serde tables only on request (the `AS SPARK` clause in this diff), we can always try to show how to create the equivalent Spark native table, when possible. This would further simplify migration from Hive to Spark.
   
   For existing Spark users who prefer to keep Hive serde formats, we can introduce a new option, `AS SERDE`, which keeps the behavior of Spark 2.4 and earlier.
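   A rough Scala sketch of the semantics proposed above, under stated assumptions: the `ShowCreateTableSketch` object, its regex, and the `SparkNativeDdl`/`HiveSerdeDdl` modes are hypothetical and only illustrate the intent (Spark itself parses this statement through the ANTLR grammar in `SqlBase.g4`). Without a suffix the statement asks for Spark native DDL; `AS SERDE` asks for the Spark 2.4-style Hive serde DDL. The "if possible" fallback, for tables that cannot be expressed as Spark native DDL, is not modeled, and quoted identifiers are not handled.

```scala
// Hypothetical illustration of the proposed syntax; not Spark's parser.
object ShowCreateTableSketch {
  sealed trait OutputMode
  case object SparkNativeDdl extends OutputMode // proposed default: Spark native DDL when possible
  case object HiveSerdeDdl extends OutputMode   // `AS SERDE`: keep the Spark 2.4-and-earlier output

  final case class ShowCreateTable(table: String, mode: OutputMode)

  // Case-insensitive, whole-string match of: SHOW CREATE TABLE <name> [AS SERDE]
  private val Syntax = """(?i)SHOW\s+CREATE\s+TABLE\s+([\w.]+)(\s+AS\s+SERDE)?""".r

  def parse(sql: String): Option[ShowCreateTable] = sql.trim match {
    // An optional group that did not participate in the match is extracted as null.
    case Syntax(table, null) => Some(ShowCreateTable(table, SparkNativeDdl))
    case Syntax(table, _)    => Some(ShowCreateTable(table, HiveSerdeDdl))
    case _                   => None
  }

  def main(args: Array[String]): Unit = {
    println(parse("SHOW CREATE TABLE db.t"))          // Some(ShowCreateTable(db.t,SparkNativeDdl))
    println(parse("show create table db.t as serde")) // Some(ShowCreateTable(db.t,HiveSerdeDdl))
  }
}
```

   Making the Spark native output the default and the serde output opt-in is what lets most users migrate without changing their statements.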


[GitHub] [spark] AmplabJenkins removed a comment on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #26890: [SPARK-30039][SQL] CREATE 
FUNCTION should do multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#issuecomment-571275165
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116184/
   Test FAILed.


[GitHub] [spark] joshrosen-stripe edited a comment on issue #26993: [SPARK-30338][SQL] Avoid unnecessary InternalRow copies in ParquetRowConverter

2020-01-06 Thread GitBox

joshrosen-stripe edited a comment on issue #26993: [SPARK-30338][SQL] Avoid 
unnecessary InternalRow copies in ParquetRowConverter
URL: https://github.com/apache/spark/pull/26993#issuecomment-571273025
 
 
   @cloud-fan @dongjoon-hyun @viirya, could you take a look at this PR 
optimizing nested struct handling in `ParquetRowConverter`? I'm tagging this 
group because it looks like you've all helped to review recent changes to this 
file and I'd like some more eyes on this change.


[GitHub] [spark] SparkQA removed a comment on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution

2020-01-06 Thread GitBox

SparkQA removed a comment on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION 
should do multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#issuecomment-571274416
 
 
   **[Test build #116184 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116184/testReport)**
 for PR 26890 at commit 
[`3796a3a`](https://github.com/apache/spark/commit/3796a3a2388c04e3896a57973a133fc481a64578).


[GitHub] [spark] AmplabJenkins removed a comment on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #26890: [SPARK-30039][SQL] CREATE 
FUNCTION should do multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#issuecomment-571275154
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] SparkQA commented on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution

2020-01-06 Thread GitBox

SparkQA commented on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do 
multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#issuecomment-571275144
 
 
   **[Test build #116184 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116184/testReport)**
 for PR 26890 at commit 
[`3796a3a`](https://github.com/apache/spark/commit/3796a3a2388c04e3896a57973a133fc481a64578).
* This patch **fails Scala style tests**.
* This patch merges cleanly.
* This patch adds the following public classes _(experimental)_:
 * `case class CreateFunctionStatement(`


[GitHub] [spark] AmplabJenkins commented on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION 
should do multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#issuecomment-571275154
 
 
   Merged build finished. Test FAILed.


[GitHub] [spark] AmplabJenkins commented on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION 
should do multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#issuecomment-571275165
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116184/
   Test FAILed.


[GitHub] [spark] SparkQA commented on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution

2020-01-06 Thread GitBox

SparkQA commented on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do 
multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#issuecomment-571274416
 
 
   **[Test build #116184 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/116184/testReport)**
 for PR 26890 at commit 
[`3796a3a`](https://github.com/apache/spark/commit/3796a3a2388c04e3896a57973a133fc481a64578).


[GitHub] [spark] joshrosen-stripe commented on issue #26993: [SPARK-30338][SQL] Avoid unnecessary InternalRow copies in ParquetRowConverter

2020-01-06 Thread GitBox

joshrosen-stripe commented on issue #26993: [SPARK-30338][SQL] Avoid 
unnecessary InternalRow copies in ParquetRowConverter
URL: https://github.com/apache/spark/pull/26993#issuecomment-571273025
 
 
   @cloud-fan @dongjoon-hyun @viirya, could you take a look at this PR 
optimizing nested struct handling in `ParquetRowConverter`? I'm tagging this 
group because it looks like you've all helped to review recent changes to this 
file.


[GitHub] [spark] AmplabJenkins removed a comment on issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even table stats is empty

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #22721: 
[SPARK-19784][SPARK-25403][SQL] Refresh the table even table stats is empty
URL: https://github.com/apache/spark/pull/22721#issuecomment-571272335
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116179/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even table stats is empty

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #22721: 
[SPARK-19784][SPARK-25403][SQL] Refresh the table even table stats is empty
URL: https://github.com/apache/spark/pull/22721#issuecomment-571272321
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even table stats is empty

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #22721: [SPARK-19784][SPARK-25403][SQL] 
Refresh the table even table stats is empty
URL: https://github.com/apache/spark/pull/22721#issuecomment-571272321
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #22721: [SPARK-19784][SPARK-25403][SQL] Refresh the table even table stats is empty

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #22721: [SPARK-19784][SPARK-25403][SQL] 
Refresh the table even table stats is empty
URL: https://github.com/apache/spark/pull/22721#issuecomment-571272335
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/116179/
   Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #26890: [SPARK-30039][SQL] CREATE 
FUNCTION should do multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#issuecomment-571271761
 
 
   Merged build finished. Test PASSed.


[GitHub] [spark] AmplabJenkins removed a comment on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution

2020-01-06 Thread GitBox

AmplabJenkins removed a comment on issue #26890: [SPARK-30039][SQL] CREATE 
FUNCTION should do multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#issuecomment-571271770
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/20977/
   Test PASSed.


[GitHub] [spark] AmplabJenkins commented on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION should do multi-catalog resolution

2020-01-06 Thread GitBox

AmplabJenkins commented on issue #26890: [SPARK-30039][SQL] CREATE FUNCTION 
should do multi-catalog resolution
URL: https://github.com/apache/spark/pull/26890#issuecomment-571271761
 
 
   Merged build finished. Test PASSed.

