[GitHub] [spark] MaxGekk commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime
MaxGekk commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime URL: https://github.com/apache/spark/pull/28199#issuecomment-613241629 @cloud-fan This PR only changes comments. Can it be merged if the build succeeds? I just want to avoid rebasing my other PRs that might conflict with this one. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
dongjoon-hyun commented on a change in pull request #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#discussion_r407884363 ## File path: docs/configuration.md ## @@ -2939,6 +2939,10 @@ To specify a different configuration directory other than the default "SPARK_HOM you can set SPARK_CONF_DIR. Spark will use the configuration files (spark-defaults.conf, spark-env.sh, log4j.properties, etc) from this directory. +# Overriding classpath default directory + +To specify a different classpath directory other than the default "SPARK_HOME/jars" you can set SPARK_JARS_DIR. Any of the services under "SPARK_HOME/sbin" folder will use it. Review comment: As the PR title includes `bin` and `sbin`, this PR also changes the standalone ExternalShuffleService as follows. We had better mention that. ``` bin/spark-class org.apache.spark.deploy.ExternalShuffleService ```
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
dongjoon-hyun commented on a change in pull request #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#discussion_r407883142 ## File path: launcher/src/test/java/org/apache/spark/launcher/CommandBuilderUtilsSuite.java ## @@ -99,6 +99,11 @@ public void testJavaMajorVersion() { assertEquals(10, javaMajorVersion("10")); } + @Test + public void testGetJarsDir() { +assertEquals("/spark/home/jars", getJarsDir("/spark/home")); Review comment: Hi, @marblejenka . This seems to add only half of the test coverage. Since this PR's key contribution is to introduce `ENV_SPARK_JARS_DIR`, shall we have test coverage for `ENV_SPARK_JARS_DIR`, too?
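For context on the review comment above, here is a minimal sketch of an environment-variable-aware jars-dir helper. The names (`getJarsDir`, the `SPARK_JARS_DIR` variable) follow the review discussion, but the implementation is hypothetical; the actual helper in the launcher's `CommandBuilderUtils` may differ.

```scala
// Hypothetical sketch only: mirrors the behaviour discussed in the review,
// not the actual Spark launcher implementation.
object JarsDirSketch {
  val EnvSparkJarsDir = "SPARK_JARS_DIR"

  // Prefer the SPARK_JARS_DIR environment variable; otherwise fall back
  // to the default "<sparkHome>/jars" directory.
  def getJarsDir(sparkHome: String, env: Map[String, String] = sys.env): String =
    env.getOrElse(EnvSparkJarsDir, s"$sparkHome/jars")
}
```

A test covering both branches, as the comment suggests, would exercise `getJarsDir` once with the variable unset (expecting the default) and once with it set (expecting the override).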
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
dongjoon-hyun commented on a change in pull request #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#discussion_r407881833 ## File path: docs/configuration.md ## @@ -2939,6 +2939,10 @@ To specify a different configuration directory other than the default "SPARK_HOM you can set SPARK_CONF_DIR. Spark will use the configuration files (spark-defaults.conf, spark-env.sh, log4j.properties, etc) from this directory. +# Overriding classpath default directory + +To specify a different classpath directory other than the default "SPARK_HOME/jars" you can set SPARK_JARS_DIR. Any of the services under "SPARK_HOME/sbin" folder will use it. Review comment: Why do we limit to only `sbin` here? Is there a reason?
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
dongjoon-hyun commented on a change in pull request #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#discussion_r407881093 ## File path: docs/configuration.md ## @@ -2939,6 +2939,10 @@ To specify a different configuration directory other than the default "SPARK_HOM you can set SPARK_CONF_DIR. Spark will use the configuration files (spark-defaults.conf, spark-env.sh, log4j.properties, etc) from this directory. +# Overriding classpath default directory + +To specify a different classpath directory other than the default "SPARK_HOME/jars" you can set SPARK_JARS_DIR. Any of the services under "SPARK_HOME/sbin" folder will use it. Review comment: nit, `" you` -> `", you`
[GitHub] [spark] dongjoon-hyun commented on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
dongjoon-hyun commented on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#issuecomment-613237532 Thank you for pinging me, @maropu .
[GitHub] [spark] dongjoon-hyun commented on issue #28192: [SPARK-31420][WEBUI] Infinite timeline redraw in job details page
dongjoon-hyun commented on issue #28192: [SPARK-31420][WEBUI] Infinite timeline redraw in job details page URL: https://github.com/apache/spark/pull/28192#issuecomment-613237036 +1 for @gengliangwang's opinion on backporting.
[GitHub] [spark] cloud-fan edited a comment on issue #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp
cloud-fan edited a comment on issue #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp URL: https://github.com/apache/spark/pull/28189#issuecomment-613236857 Sorry I didn't run tests when backporting as it only has a very small conflict. I've fixed it in branch-3.0 after local tests pass. Thanks for catching @dongjoon-hyun !
[GitHub] [spark] dongjoon-hyun edited a comment on issue #27364: [SPARK-31394][K8S] Adds support for Kubernetes NFS volume mounts
dongjoon-hyun edited a comment on issue #27364: [SPARK-31394][K8S] Adds support for Kubernetes NFS volume mounts URL: https://github.com/apache/spark/pull/27364#issuecomment-613236056 If you are using non-root, that's good. The above link shows that NFS mounts do not work for non-root users, so I assumed that you might use root to avoid that. I also don't want to assume a root user. - https://discuss.kubernetes.io/t/nfs-mount-is-not-working/6458 Then, could you describe, with a `Minikube` example, how to verify your PR in the following section of the PR description? ``` ## How was this patch tested? Test cases were added just like the existing EmptyDir support. Also the code were ran against my Kubernetes and NFS using pyspark. ``` As you know, Apache Spark uses Minikube as the official test environment instead of GCE or EKS. Also, without additional configuration like SecurityContext, this PR doesn't work in vanilla `Minikube`.
[GitHub] [spark] AmplabJenkins commented on issue #28106: [SPARK-31335][SQL] Add try function support
AmplabJenkins commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613235282 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28106: [SPARK-31335][SQL] Add try function support
AmplabJenkins commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613235287 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25942/ Test PASSed.
[GitHub] [spark] yaooqinn commented on issue #28106: [SPARK-31335][SQL] Add try function support
yaooqinn commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613234916 Updated, thank you very much @maropu
[GitHub] [spark] SparkQA commented on issue #28106: [SPARK-31335][SQL] Add try function support
SparkQA commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613234973 **[Test build #121255 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121255/testReport)** for PR 28106 at commit [`0fc69ca`](https://github.com/apache/spark/commit/0fc69ca4eb2948307e87798dceee137719d1).
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp
dongjoon-hyun commented on a change in pull request #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp URL: https://github.com/apache/spark/pull/28189#discussion_r407874752 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ## @@ -172,17 +176,8 @@ object DateTimeUtils { * @return The number of micros since epoch from `java.sql.Timestamp`. */ def fromJavaTimestamp(t: Timestamp): SQLTimestamp = { -val era = if (t.before(julianCommonEraStart)) 0 else 1 -val localDateTime = LocalDateTime.of( - t.getYear + 1900, t.getMonth + 1, 1, - t.getHours, t.getMinutes, t.getSeconds, t.getNanos) - .`with`(ChronoField.ERA, era) - // Add days separately to convert dates existed in Julian calendar but not - // in Proleptic Gregorian calendar. For example, 1000-02-29 is valid date - // in Julian calendar because 1000 is a leap year but 1000 is not a leap - // year in Proleptic Gregorian calendar. And 1000-02-29 doesn't exist in it. - .plusDays(t.getDate - 1) // Returns the next valid date after `date.getDate - 1` days -instantToMicros(localDateTime.atZone(ZoneId.systemDefault).toInstant) +val micros = millisToMicros(t.getTime) + (t.getNanos / NANOS_PER_MICROS) % MICROS_PER_MILLIS Review comment: This seems to be missing in `branch-3.0`. ``` $ git grep millisToMicros sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala: val micros = millisToMicros(t.getTime) + (t.getNanos / NANOS_PER_MICROS) % MICROS_PER_MILLIS ```
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp
dongjoon-hyun commented on a change in pull request #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp URL: https://github.com/apache/spark/pull/28189#discussion_r407874752 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ## @@ -172,17 +176,8 @@ object DateTimeUtils { * @return The number of micros since epoch from `java.sql.Timestamp`. */ def fromJavaTimestamp(t: Timestamp): SQLTimestamp = { -val era = if (t.before(julianCommonEraStart)) 0 else 1 -val localDateTime = LocalDateTime.of( - t.getYear + 1900, t.getMonth + 1, 1, - t.getHours, t.getMinutes, t.getSeconds, t.getNanos) - .`with`(ChronoField.ERA, era) - // Add days separately to convert dates existed in Julian calendar but not - // in Proleptic Gregorian calendar. For example, 1000-02-29 is valid date - // in Julian calendar because 1000 is a leap year but 1000 is not a leap - // year in Proleptic Gregorian calendar. And 1000-02-29 doesn't exist in it. - .plusDays(t.getDate - 1) // Returns the next valid date after `date.getDate - 1` days -instantToMicros(localDateTime.atZone(ZoneId.systemDefault).toInstant) +val micros = millisToMicros(t.getTime) + (t.getNanos / NANOS_PER_MICROS) % MICROS_PER_MILLIS Review comment: This seems to be the root cause.
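The arithmetic in the new `fromJavaTimestamp` body quoted above can be illustrated standalone. This is a hedged sketch, not the Spark source: `getTime` already carries the milliseconds, so only the sub-millisecond remainder of `getNanos` needs to be added on top. The constants and helper name are assumptions mirroring the usual Spark definitions.

```scala
// Illustrative sketch of the micros computation from the diff above;
// constants and helper names are assumed, not copied from Spark.
object TimestampMicrosSketch {
  val NANOS_PER_MICROS  = 1000L
  val MICROS_PER_MILLIS = 1000L

  def millisToMicros(millis: Long): Long = millis * MICROS_PER_MILLIS

  // Convert millis to micros, then add the sub-millisecond part of the
  // nano field. The % MICROS_PER_MILLIS drops the microseconds that are
  // already counted inside getTime's millisecond value.
  def toMicros(timeMillis: Long, nanos: Int): Long =
    millisToMicros(timeMillis) + (nanos / NANOS_PER_MICROS) % MICROS_PER_MILLIS
}
```

For example, a timestamp with `getTime = 1234` ms and `getNanos = 234567890` yields `1234000 + 567 = 1234567` microseconds.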
[GitHub] [spark] dongjoon-hyun commented on issue #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp
dongjoon-hyun commented on issue #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp URL: https://github.com/apache/spark/pull/28189#issuecomment-613232736 Hi, @cloud-fan . This seems to break `branch-3.0`. Could you take a look at it once more? ``` [ERROR] [Error] /home/runner/work/spark/spark/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala:181: not found: value millisToMicros 700 [ERROR] one error found ```
[GitHub] [spark] maropu commented on issue #28106: [SPARK-31335][SQL] Add try function support
maropu commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613232614 No further comments from me beyond the existing ones.
[GitHub] [spark] maropu commented on a change in pull request #28106: [SPARK-31335][SQL] Add try function support
maropu commented on a change in pull request #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#discussion_r407873456 ## File path: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/expressions/MiscExpressionsSuite.scala ## @@ -89,4 +90,29 @@ class MiscExpressionsSuite extends SparkFunSuite with ExpressionEvalHelper { assert(outputCodegen.contains(s"Result of $inputExpr is 1")) assert(outputEval.contains(s"Result of $inputExpr is 1")) } + + test("try expression") { +intercept[RuntimeException] { Review comment: nit: could you check the error message, too, just in case?
[GitHub] [spark] maropu commented on a change in pull request #28106: [SPARK-31335][SQL] Add try function support
maropu commented on a change in pull request #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#discussion_r407872264 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ## @@ -198,3 +201,85 @@ case class TypeOf(child: Expression) extends UnaryExpression { defineCodeGen(ctx, ev, _ => s"""UTF8String.fromString(${child.dataType.catalogString})""") } } + +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ +_FUNC_(expr) - Evaluate an expression and handle certain types of runtime exceptions by returning NULL. +In cases where it is preferable that queries produce NULL instead of failing when corrupt or invalid data is encountered, the TRY function may be useful, especially when ANSI mode is on and the users need null-tolerant on certain columns or outputs. Review comment: nit: "the TRY function" -> "the `_FUNC_` function"?
[GitHub] [spark] maropu commented on a change in pull request #28106: [SPARK-31335][SQL] Add try function support
maropu commented on a change in pull request #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#discussion_r407872797 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/misc.scala ## @@ -198,3 +201,85 @@ case class TypeOf(child: Expression) extends UnaryExpression { defineCodeGen(ctx, ev, _ => s"""UTF8String.fromString(${child.dataType.catalogString})""") } } + +// scalastyle:off line.size.limit +@ExpressionDescription( + usage = """ +_FUNC_(expr) - Evaluate an expression and handle certain types of runtime exceptions by returning NULL. +In cases where it is preferable that queries produce NULL instead of failing when corrupt or invalid data is encountered, the TRY function may be useful, especially when ANSI mode is on and the users need null-tolerant on certain columns or outputs. +AnalysisExceptions will not be handled by this, typically runtime exceptions handled by TRY function are: + + * ArightmeticException - e.g. division by zero, numeric value out of range, + * NumberFormatException - e.g. invalid casting, + * IllegalArgumentException - e.g. invalid datetime pattern, missing format argument for string formatting, + * DateTimeException - e.g. invalid datetime values + * UnsupportedEncodingException - e.g. encode or decode string with invalid charset Review comment: Thanks for the update.
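The null-tolerant behaviour described in the docstring quoted above can be sketched on plain Scala values. This is only an illustration of the semantics under discussion; the actual `Try` expression in the PR wraps Catalyst expressions, and the docstring's exception list is longer than the three cases shown here.

```scala
// Semantics sketch only: evaluate a by-name argument and turn the kinds of
// runtime exceptions named in the docstring into an absent (null-like) result.
object TrySketch {
  def tryEval[T](thunk: => T): Option[T] =
    try Some(thunk)
    catch {
      case _: NumberFormatException    => None // e.g. invalid casting
      case _: ArithmeticException      => None // e.g. division by zero
      case _: IllegalArgumentException => None // e.g. bad datetime pattern
    }
}
```

Note that `NumberFormatException` extends `IllegalArgumentException`, so the more specific case is listed first; `AnalysisException`-style errors would deliberately not be caught.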
[GitHub] [spark] AmplabJenkins commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime
AmplabJenkins commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime URL: https://github.com/apache/spark/pull/28199#issuecomment-613231468 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25941/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime
AmplabJenkins commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime URL: https://github.com/apache/spark/pull/28199#issuecomment-613231464 Merged build finished. Test PASSed.
[GitHub] [spark] cloud-fan commented on issue #27701: [SPARK-30953][SQL] InsertAdaptiveSparkPlan should apply AQE on child plan of write commands
cloud-fan commented on issue #27701: [SPARK-30953][SQL] InsertAdaptiveSparkPlan should apply AQE on child plan of write commands URL: https://github.com/apache/spark/pull/27701#issuecomment-613231273 thanks, merging to master/3.0!
[GitHub] [spark] cloud-fan closed pull request #27701: [SPARK-30953][SQL] InsertAdaptiveSparkPlan should apply AQE on child plan of write commands
cloud-fan closed pull request #27701: [SPARK-30953][SQL] InsertAdaptiveSparkPlan should apply AQE on child plan of write commands URL: https://github.com/apache/spark/pull/27701
[GitHub] [spark] SparkQA commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime
SparkQA commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime URL: https://github.com/apache/spark/pull/28199#issuecomment-613231147 **[Test build #121254 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121254/testReport)** for PR 28199 at commit [`bd34313`](https://github.com/apache/spark/commit/bd34313b4b45a5a0998cc6075a4b2596086d2560).
[GitHub] [spark] AmplabJenkins removed a comment on issue #28106: [SPARK-31335][SQL] Add try function support
AmplabJenkins removed a comment on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613229646 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28106: [SPARK-31335][SQL] Add try function support
AmplabJenkins removed a comment on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613229656 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25940/ Test PASSed.
[GitHub] [spark] cloud-fan commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime
cloud-fan commented on issue #28199: [SPARK-31402][SQL][FOLLOWUP] Refine code comments in RebaseDateTime URL: https://github.com/apache/spark/pull/28199#issuecomment-613229841 retest this please
[GitHub] [spark] AmplabJenkins commented on issue #28106: [SPARK-31335][SQL] Add try function support
AmplabJenkins commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613229646 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
AmplabJenkins removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613229157 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121250/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #28106: [SPARK-31335][SQL] Add try function support
AmplabJenkins commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613229656 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25940/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
AmplabJenkins removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613229154 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
AmplabJenkins commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613229157 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121250/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
SparkQA removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613221968 **[Test build #121250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121250/testReport)** for PR 28210 at commit [`8ecdf33`](https://github.com/apache/spark/commit/8ecdf33aab595c00f6ef7b5094e802d2e6c4c934).
[GitHub] [spark] AmplabJenkins commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
AmplabJenkins commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613229154 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #28106: [SPARK-31335][SQL] Add try function support
SparkQA commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613229275 **[Test build #121253 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121253/testReport)** for PR 28106 at commit [`bc59b6c`](https://github.com/apache/spark/commit/bc59b6cfd505e8cff98ecdbf4070e9072fead3db).
[GitHub] [spark] SparkQA commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
SparkQA commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613228943 **[Test build #121250 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121250/testReport)** for PR 28210 at commit [`8ecdf33`](https://github.com/apache/spark/commit/8ecdf33aab595c00f6ef7b5094e802d2e6c4c934). * This patch **fails PySpark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
HyukjinKwon commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613228942 Should we backport this and SPARK-31186, @viirya and @ueshin?
[GitHub] [spark] cloud-fan closed pull request #28198: [SPARK-31392][SQL][3.0] Support CalendarInterval to be reflect to Ca…
cloud-fan closed pull request #28198: [SPARK-31392][SQL][3.0] Support CalendarInterval to be reflect to Ca… URL: https://github.com/apache/spark/pull/28198
[GitHub] [spark] HyukjinKwon closed pull request #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
HyukjinKwon closed pull request #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210
[GitHub] [spark] maropu commented on issue #28106: [SPARK-31335][SQL] Add try function support
maropu commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613229093 retest this please
[GitHub] [spark] HyukjinKwon commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
HyukjinKwon commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613228761 The last comment was comment only. Merged to master and branch-3.0.
[GitHub] [spark] cloud-fan commented on issue #28198: [SPARK-31392][SQL][3.0] Support CalendarInterval to be reflect to Ca…
cloud-fan commented on issue #28198: [SPARK-31392][SQL][3.0] Support CalendarInterval to be reflect to Ca… URL: https://github.com/apache/spark/pull/28198#issuecomment-613228680 thanks, merging to 3.0!
[GitHub] [spark] maropu commented on a change in pull request #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in SQLMetricsSuite run with both codegen on and off
maropu commented on a change in pull request #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in SQLMetricsSuite run with both codegen on and off URL: https://github.com/apache/spark/pull/28173#discussion_r407869788 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsTestUtils.scala ## @@ -214,14 +214,16 @@ trait SQLMetricsTestUtils extends SQLTestUtils { protected def testSparkPlanMetrics( df: DataFrame, expectedNumOfJobs: Int, - expectedMetrics: Map[Long, (String, Map[String, Any])]): Unit = { + expectedMetrics: Map[Long, (String, Map[String, Any])], + enableWholeStage: Boolean = false): Unit = { val expectedMetricsPredicates = expectedMetrics.mapValues { case (nodeName, nodeMetrics) => (nodeName, nodeMetrics.mapValues(expectedMetricValue => (actualMetricValue: Any) => { actualMetricValue.toString.matches(expectedMetricValue.toString) })) } -testSparkPlanMetricsWithPredicates(df, expectedNumOfJobs, expectedMetricsPredicates) +testSparkPlanMetricsWithPredicates(df, expectedNumOfJobs, expectedMetricsPredicates, Review comment: Ah, the same comment applies here: https://github.com/apache/spark/pull/28173#discussion_r406877479 Could you check this, @sririshindra?
[GitHub] [spark] maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable URL: https://github.com/apache/spark/pull/28123#discussion_r407868316 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketInJoin.scala ## @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.bucketing + +import org.apache.spark.sql.catalyst.catalog.BucketSpec +import org.apache.spark.sql.catalyst.plans.Inner +import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.internal.SQLConf + +/** + * This rule injects a hint if one side of two bucketed tables can be coalesced + * when the two bucketed tables are inner-joined and they differ in the number of buckets. 
+ */ +object CoalesceBucketInJoin extends Rule[LogicalPlan] { + val JOIN_HINT_COALESCED_NUM_BUCKETS: String = "JoinHintCoalescedNumBuckets" + + private val sqlConf = SQLConf.get + + private def isPlanEligible(plan: LogicalPlan): Boolean = { +def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = { + p(plan) && plan.children.forall(forall(_)(p)) +} + +forall(plan) { + case _: Filter | _: Project | _: LogicalRelation => true + case _ => false +} + } + + private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = { +if (isPlanEligible(plan)) { + plan.collectFirst { +case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _) + if r.bucketSpec.nonEmpty && !r.options.contains(JOIN_HINT_COALESCED_NUM_BUCKETS) => + r.bucketSpec.get + } +} else { + None +} + } + + private def mayCoalesce(numBuckets1: Int, numBuckets2: Int): Option[Int] = { +assert(numBuckets1 != numBuckets2) +val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2)) +// A bucket can be coalesced only if the bigger number of buckets is divisible by the smaller +// number of buckets because bucket id is calculated by modding the total number of buckets. +if ((large % small == 0) && + (large - small) <= sqlConf.getConf(SQLConf.COALESCE_BUCKET_IN_JOIN_MAX_NUM_BUCKETS_DIFF)) { + Some(small) +} else { + None +} + } + + private def addBucketHint(plan: LogicalPlan, hint: (String, String)): LogicalPlan = { +plan.transformUp { + case l @ LogicalRelation(r: HadoopFsRelation, _, _, _) => +l.copy(relation = r.copy(options = r.options + hint)(r.sparkSession)) Review comment: ~How about using `PlanLater` there?~ Ah, I see. We need to think more about this... This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
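The divisibility check in `mayCoalesce` above relies on a modular-arithmetic fact: when the larger bucket count is a multiple of the smaller one, a row's bucket id under the larger count, re-modded by the smaller count, equals its bucket id computed directly with the smaller count. A small standalone check of this property (plain Scala, independent of Spark; the names here are illustrative):

```scala
// Verifies the property behind mayCoalesce's `large % small == 0` test:
// if `large` is a multiple of `small`, then (hash % large) % small == hash % small,
// so the larger table's buckets can be safely merged into `small` buckets.
object CoalesceCheck {
  // Non-negative bucket id, as bucket ids are computed by modding the hash.
  def bucketId(hash: Int, numBuckets: Int): Int =
    ((hash % numBuckets) + numBuckets) % numBuckets

  def coalescible(small: Int, large: Int): Boolean =
    large % small == 0 &&
      (0 until 1000).forall { h =>
        bucketId(bucketId(h, large), small) == bucketId(h, small)
      }
}
```

For instance, `coalescible(4, 8)` holds, while `coalescible(4, 10)` does not (10 is not a multiple of 4, and e.g. hash 6 lands in bucket 6 of 10 but bucket 2 of 4).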
[GitHub] [spark] turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task
turboFei edited a comment on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task URL: https://github.com/apache/spark/pull/26339#issuecomment-613226194 > IIUC, only one task attempt should commit at a time. If so, will there still be conflicts with task speculation when the first task commits successfully? Got it. It seems that we need to judge whether the commit operation was performed in SparkHadoopMapRedUtil.commitTask. Then we need to revert the performed commit status if an exception happens when renaming dynamicStagingTaskFiles.
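The recovery idea sketched in the comment above (commit, then undo the commit status if renaming the staged files fails) can be illustrated in plain Scala. This is a hypothetical sketch, not Spark's actual commit protocol; the function names and structure are invented for illustration only.

```scala
// Illustrative sketch (names are hypothetical, not Spark's API) of the idea
// discussed above: track whether the task commit was performed, and revert
// it if renaming the dynamic staging files subsequently fails.
object CommitSketch {
  def commitWithRollback(
      commit: () => Unit,             // e.g. the commitTask step
      renameStagingFiles: () => Unit, // e.g. renaming staged partition files
      revertCommit: () => Unit        // undo the recorded commit status
  ): Boolean = {
    var committed = false
    try {
      commit()
      committed = true
      renameStagingFiles()
      true
    } catch {
      case _: Exception =>
        if (committed) revertCommit() // only revert if commit already happened
        false
    }
  }
}
```

On a rename failure after a successful commit, the rollback callback runs and the operation reports failure, which matches the "revert the performed commit status" suggestion.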
[GitHub] [spark] cloud-fan commented on issue #28205: [SPARK-31439][SQL] Fix perf regression of fromJavaDate
cloud-fan commented on issue #28205: [SPARK-31439][SQL] Fix perf regression of fromJavaDate URL: https://github.com/apache/spark/pull/28205#issuecomment-613227589 can we fix the conflicts?
[GitHub] [spark] maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable URL: https://github.com/apache/spark/pull/28123#discussion_r407868316 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketInJoin.scala ## @@ -0,0 +1,110 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.sql.execution.bucketing + +import org.apache.spark.sql.catalyst.catalog.BucketSpec +import org.apache.spark.sql.catalyst.plans.Inner +import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project} +import org.apache.spark.sql.catalyst.rules.Rule +import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation} +import org.apache.spark.sql.internal.SQLConf + +/** + * This rule injects a hint if one side of two bucketed tables can be coalesced + * when the two bucketed tables are inner-joined and they differ in the number of buckets. 
+ */ +object CoalesceBucketInJoin extends Rule[LogicalPlan] { + val JOIN_HINT_COALESCED_NUM_BUCKETS: String = "JoinHintCoalescedNumBuckets" + + private val sqlConf = SQLConf.get + + private def isPlanEligible(plan: LogicalPlan): Boolean = { +def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = { + p(plan) && plan.children.forall(forall(_)(p)) +} + +forall(plan) { + case _: Filter | _: Project | _: LogicalRelation => true + case _ => false +} + } + + private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = { +if (isPlanEligible(plan)) { + plan.collectFirst { +case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _) + if r.bucketSpec.nonEmpty && !r.options.contains(JOIN_HINT_COALESCED_NUM_BUCKETS) => + r.bucketSpec.get + } +} else { + None +} + } + + private def mayCoalesce(numBuckets1: Int, numBuckets2: Int): Option[Int] = { +assert(numBuckets1 != numBuckets2) +val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2)) +// A bucket can be coalesced only if the bigger number of buckets is divisible by the smaller +// number of buckets because bucket id is calculated by modding the total number of buckets. +if ((large % small == 0) && + (large - small) <= sqlConf.getConf(SQLConf.COALESCE_BUCKET_IN_JOIN_MAX_NUM_BUCKETS_DIFF)) { + Some(small) +} else { + None +} + } + + private def addBucketHint(plan: LogicalPlan, hint: (String, String)): LogicalPlan = { +plan.transformUp { + case l @ LogicalRelation(r: HadoopFsRelation, _, _, _) => +l.copy(relation = r.copy(options = r.options + hint)(r.sparkSession)) Review comment: How about using `PlanLater` there? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. 
[GitHub] [spark] maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable URL: https://github.com/apache/spark/pull/28123#discussion_r407867656

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketInJoin.scala (new file, @@ -0,0 +1,110 @@):

```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements.  See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License.  You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution.bucketing

import org.apache.spark.sql.catalyst.catalog.BucketSpec
import org.apache.spark.sql.catalyst.plans.Inner
import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
import org.apache.spark.sql.internal.SQLConf

/**
 * This rule injects a hint if one side of two bucketed tables can be coalesced
 * when the two bucketed tables are inner-joined and they differ in the number of buckets.
 */
object CoalesceBucketInJoin extends Rule[LogicalPlan] {
  val JOIN_HINT_COALESCED_NUM_BUCKETS: String = "JoinHintCoalescedNumBuckets"

  private val sqlConf = SQLConf.get

  private def isPlanEligible(plan: LogicalPlan): Boolean = {
    def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = {
      p(plan) && plan.children.forall(forall(_)(p))
    }

    forall(plan) {
      case _: Filter | _: Project | _: LogicalRelation => true
      case _ => false
    }
  }

  private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = {
    if (isPlanEligible(plan)) {
      plan.collectFirst {
        case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _)
            if r.bucketSpec.nonEmpty && !r.options.contains(JOIN_HINT_COALESCED_NUM_BUCKETS) =>
          r.bucketSpec.get
      }
    } else {
      None
    }
  }

  private def mayCoalesce(numBuckets1: Int, numBuckets2: Int): Option[Int] = {
    assert(numBuckets1 != numBuckets2)
    val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2))
    // A bucket can be coalesced only if the bigger number of buckets is divisible by the smaller
    // number of buckets because bucket id is calculated by modding the total number of buckets.
    if ((large % small == 0) &&
        (large - small) <= sqlConf.getConf(SQLConf.COALESCE_BUCKET_IN_JOIN_MAX_NUM_BUCKETS_DIFF)) {
      Some(small)
    } else {
      None
    }
  }

  private def addBucketHint(plan: LogicalPlan, hint: (String, String)): LogicalPlan = {
    plan.transformUp {
      case l @ LogicalRelation(r: HadoopFsRelation, _, _, _) =>
        l.copy(relation = r.copy(options = r.options + hint)(r.sparkSession))
    }
  }

  def apply(plan: LogicalPlan): LogicalPlan = {
    if (!sqlConf.getConf(SQLConf.COALESCE_BUCKET_IN_JOIN_ENABLED)) {
      return plan
    }

    plan transform {
      case join: Join if join.joinType == Inner =>
```

Review comment: Yea, I also think we can support all join types except for full outer joins.
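The divisibility check in `mayCoalesce` above is worth a concrete illustration. Below is a small standalone sketch (Python, not Spark code) of why coalescing is only safe when the larger bucket count is a multiple of the smaller one: a bucket id is `hash(key) % numBuckets`, and re-mapping a bucket of the larger layout with `% small` agrees with hashing directly into the smaller layout only under divisibility.

```python
# Standalone sketch of the divisibility rule in mayCoalesce (Python, not
# Spark code). Bucket id = hash(key) % numBuckets, so every key landing in
# bucket b of the larger table lands in bucket b % small of the coalesced
# layout -- consistently only when large % small == 0.

def bucket_id(key_hash: int, num_buckets: int) -> int:
    return key_hash % num_buckets

def coalesced_bucket(b: int, small: int) -> int:
    # Map a bucket of the larger table onto the smaller bucket layout.
    return b % small

small, large = 4, 8  # large % small == 0, so coalescing is safe
for key_hash in range(100):
    b_large = bucket_id(key_hash, large)
    b_small = bucket_id(key_hash, small)
    # Re-bucketing the large side agrees with hashing directly into the
    # small layout, so matching join keys still meet in the same bucket.
    assert coalesced_bucket(b_large, small) == b_small
```

With a non-divisible pair such as 8 and 3 buckets the two mappings disagree for some hashes, which is exactly why `mayCoalesce` rejects those pairs.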
[GitHub] [spark] turboFei commented on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task
turboFei commented on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task URL: https://github.com/apache/spark/pull/26339#issuecomment-613226194

> IIUC, only one task attempt should commit at a time. If so, will there still be conflicts with task speculation when the first task commits successfully?

Got it. It seems we need to check whether the commit operation was performed in SparkHadoopMapRedUtil.commitTask, and then revert the commit status if an exception happens while renaming the dynamicStagingTaskFiles.
[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable URL: https://github.com/apache/spark/pull/28123#discussion_r407866696

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketInJoin.scala (same new-file diff context as maropu's review comment above; this one is on `addBucketHint`):

```scala
private def addBucketHint(plan: LogicalPlan, hint: (String, String)): LogicalPlan = {
  plan.transformUp {
    case l @ LogicalRelation(r: HadoopFsRelation, _, _, _) =>
      l.copy(relation = r.copy(options = r.options + hint)(r.sparkSession))
  }
}
```

Review comment:

> You cannot extract a join plan pattern, e.g., `Join(LogicalRelation(HadoopFsRelation), LogicalRelation(HadoopFsRelation))`, just like `ScanOperation`?

If I match a `Join` in `FileSourceStrategy`, I need to return a `SparkPlan` for the `Join`, no?
[GitHub] [spark] maropu commented on a change in pull request #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off
maropu commented on a change in pull request #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off URL: https://github.com/apache/spark/pull/28173#discussion_r407866066

File path: sql/core/src/test/scala/org/apache/spark/sql/execution/metric/SQLMetricsTestUtils.scala (@@ -214,14 +214,16 @@):

```scala
protected def testSparkPlanMetrics(
    df: DataFrame,
    expectedNumOfJobs: Int,
    expectedMetrics: Map[Long, (String, Map[String, Any])],
    enableWholeStage: Boolean = false): Unit = {
  val expectedMetricsPredicates = expectedMetrics.mapValues { case (nodeName, nodeMetrics) =>
    (nodeName, nodeMetrics.mapValues(expectedMetricValue =>
      (actualMetricValue: Any) => {
        actualMetricValue.toString.matches(expectedMetricValue.toString)
      }))
  }
  testSparkPlanMetricsWithPredicates(df, expectedNumOfJobs, expectedMetricsPredicates,
    enableWholeStage)
}
```

Review comment: Is it not enough just to run the tests here with both modes, codegen and non-codegen?

```scala
Seq(true, false).foreach { codegenEnabled =>
  testSparkPlanMetricsWithPredicates(df, expectedNumOfJobs, expectedMetricsPredicates,
    codegenEnabled)
}
```
[GitHub] [spark] cloud-fan closed pull request #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp
cloud-fan closed pull request #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp URL: https://github.com/apache/spark/pull/28189
[GitHub] [spark] cloud-fan commented on issue #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp
cloud-fan commented on issue #28189: [SPARK-31426][SQL] Fix perf regressions of toJavaTimestamp/fromJavaTimestamp URL: https://github.com/apache/spark/pull/28189#issuecomment-613224495 thanks, merging to master/3.0!
[GitHub] [spark] AmplabJenkins commented on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
AmplabJenkins commented on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#issuecomment-613224073 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
AmplabJenkins commented on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#issuecomment-613224081 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121241/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
AmplabJenkins removed a comment on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#issuecomment-613224081 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121241/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off
AmplabJenkins removed a comment on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off URL: https://github.com/apache/spark/pull/28173#issuecomment-613224026 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25939/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off
AmplabJenkins removed a comment on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off URL: https://github.com/apache/spark/pull/28173#issuecomment-613224019 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
AmplabJenkins removed a comment on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#issuecomment-613224073 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off
AmplabJenkins commented on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off URL: https://github.com/apache/spark/pull/28173#issuecomment-613224019 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off
AmplabJenkins commented on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off URL: https://github.com/apache/spark/pull/28173#issuecomment-613224026 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25939/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer
SparkQA commented on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer URL: https://github.com/apache/spark/pull/28188#issuecomment-613223702 **[Test build #121251 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121251/testReport)** for PR 28188 at commit [`e22caed`](https://github.com/apache/spark/commit/e22caed2136b21a3694b94cb68d713d37aa52fa5).
[GitHub] [spark] SparkQA commented on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off
SparkQA commented on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off URL: https://github.com/apache/spark/pull/28173#issuecomment-613223712 **[Test build #121252 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121252/testReport)** for PR 28173 at commit [`8ac7fe5`](https://github.com/apache/spark/commit/8ac7fe54a56ea06035dff901a81e93eb8b5ded6f).
[GitHub] [spark] bmarcott commented on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer
bmarcott commented on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer URL: https://github.com/apache/spark/pull/28188#issuecomment-613223634 Meanwhile, I added a test case which matches the scenario you described in the other PR. I changed the code so that launching a task is still required to reset the timer, but not launching a task no longer prevents resetting on a follow-up offer. This passes the new test and the previous ones. It seems better to remove the requirement to launch a task, but I'm OK with this since it isn't worse than the legacy locality wait.
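The timer semantics bmarcott describes can be made concrete with a toy model (hypothetical Python sketch, not Spark's actual TaskSetManager API): an offer that launches a task resets the locality wait timer, and an offer that launches nothing neither resets it nor blocks a later launching offer from resetting it.

```python
# Toy locality-wait timer with an injectable clock so the behavior is easy
# to demonstrate deterministically.

class LocalityWaitTimer:
    def __init__(self, wait_seconds, clock):
        self.wait_seconds = wait_seconds
        self.clock = clock              # time source, injected for clarity
        self.last_reset = clock()

    def on_resource_offer(self, launched_task: bool) -> None:
        if launched_task:
            self.last_reset = self.clock()  # reset only when a task launches

    def expired(self) -> bool:
        return self.clock() - self.last_reset >= self.wait_seconds

now = [0.0]
timer = LocalityWaitTimer(wait_seconds=3.0, clock=lambda: now[0])

now[0] = 2.0
timer.on_resource_offer(launched_task=False)  # no reset here...
now[0] = 2.5
timer.on_resource_offer(launched_task=True)   # ...and it does not block this one
now[0] = 5.0
assert not timer.expired()  # 5.0 - 2.5 < 3.0: the 2.5s launch reset the timer
now[0] = 6.0
assert timer.expired()      # 6.0 - 2.5 >= 3.0: wait elapsed, fall back a level
```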
[GitHub] [spark] SparkQA removed a comment on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
SparkQA removed a comment on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#issuecomment-613176864 **[Test build #121241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121241/testReport)** for PR 28200 at commit [`a128f01`](https://github.com/apache/spark/commit/a128f0100a23a7a9ad7f6133b1c7704c824270ba).
[GitHub] [spark] SparkQA commented on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir
SparkQA commented on issue #28200: [SPARK-31432][DEPLOY] bin/sbin scripts should allow to customize jars dir URL: https://github.com/apache/spark/pull/28200#issuecomment-613223580 **[Test build #121241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121241/testReport)** for PR 28200 at commit [`a128f01`](https://github.com/apache/spark/commit/a128f0100a23a7a9ad7f6133b1c7704c824270ba).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] maropu commented on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off
maropu commented on issue #28173: [SPARK-31389][SQL][TESTS] Ensure all tests in org.apache.spark.sql.execution.metric.SQLMetricsSuite run with both codegen on and off URL: https://github.com/apache/spark/pull/28173#issuecomment-613223510 retest this please
[GitHub] [spark] imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
imback82 commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable URL: https://github.com/apache/spark/pull/28123#discussion_r407864011

File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketInJoin.scala (same new-file diff context as the earlier review comments; this one is on the join-type check in `apply`):

```scala
plan transform {
  case join: Join if join.joinType == Inner =>
```

Review comment: Thanks for pointing this out. Actually, this should apply to any join type; we don't check the join types when buckets are used. This optimization would help in equi-join cases, but I don't think it will negatively impact the non-equi-join cases (please correct me if I am wrong here).
[GitHub] [spark] AmplabJenkins removed a comment on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
AmplabJenkins removed a comment on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch URL: https://github.com/apache/spark/pull/27649#issuecomment-613222587 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
AmplabJenkins removed a comment on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch URL: https://github.com/apache/spark/pull/27649#issuecomment-613222593 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121230/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer
AmplabJenkins removed a comment on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer URL: https://github.com/apache/spark/pull/28188#issuecomment-61353 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25938/ Test PASSed.
[GitHub] [spark] manuzhang commented on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task
manuzhang commented on issue #26339: [SPARK-27194][SPARK-29302][SQL] Fix the issue that for dynamic partition overwrite a task would conflict with its speculative task URL: https://github.com/apache/spark/pull/26339#issuecomment-613222514 IIUC, only one task attempt should commit at a time. If so, will there still be conflicts with task speculation when the first task commits successfully?
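The single-committer protocol manuzhang refers to can be modeled in a few lines. Below is a toy first-asker-wins authorizer in the spirit of Spark's OutputCommitCoordinator (a hedged sketch, not its actual API): for each partition only the first attempt that asks is allowed to commit, so a speculative duplicate is denied.

```python
# Toy model of first-asker-wins commit authorization per partition.

class CommitCoordinator:
    def __init__(self):
        self._authorized = {}  # partition id -> winning attempt id

    def can_commit(self, partition: int, attempt: int) -> bool:
        # setdefault records the first asker and leaves later askers out.
        winner = self._authorized.setdefault(partition, attempt)
        return winner == attempt

coord = CommitCoordinator()
assert coord.can_commit(partition=0, attempt=1) is True   # first asker wins
assert coord.can_commit(partition=0, attempt=2) is False  # speculative attempt denied
assert coord.can_commit(partition=0, attempt=1) is True   # winner may ask again
```

The PR discussion is about what happens after authorization: if the authorized attempt's rename fails partway, its partially performed commit must be reverted, which this toy model deliberately does not cover.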
[GitHub] [spark] AmplabJenkins removed a comment on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer
AmplabJenkins removed a comment on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer URL: https://github.com/apache/spark/pull/28188#issuecomment-61349 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
AmplabJenkins commented on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch URL: https://github.com/apache/spark/pull/27649#issuecomment-613222593 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121230/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
AmplabJenkins commented on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch URL: https://github.com/apache/spark/pull/27649#issuecomment-613222587 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer
AmplabJenkins commented on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer URL: https://github.com/apache/spark/pull/28188#issuecomment-61353 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25938/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer
AmplabJenkins commented on issue #28188: [SPARK-18886][CORE][FOLLOWUP] remove requirement to launch a task to reset locality wait timer URL: https://github.com/apache/spark/pull/28188#issuecomment-61349 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
SparkQA removed a comment on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch URL: https://github.com/apache/spark/pull/27649#issuecomment-613136567 **[Test build #121230 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121230/testReport)** for PR 27649 at commit [`6406e36`](https://github.com/apache/spark/commit/6406e36eb34377983aaf113495ca16b1553317a3).
[GitHub] [spark] MaxGekk commented on issue #28205: [SPARK-31439][SQL] Fix perf regression of fromJavaDate
MaxGekk commented on issue #28205: [SPARK-31439][SQL] Fix perf regression of fromJavaDate URL: https://github.com/apache/spark/pull/28205#issuecomment-613222161 @cloud-fan @HyukjinKwon Please review the PR.
[GitHub] [spark] SparkQA commented on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch
SparkQA commented on issue #27649: [SPARK-30900][SS] FileStreamSource: Avoid reading compact metadata log twice if the query restarts from compact batch URL: https://github.com/apache/spark/pull/27649#issuecomment-613222015 **[Test build #121230 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121230/testReport)** for PR 27649 at commit [`6406e36`](https://github.com/apache/spark/commit/6406e36eb34377983aaf113495ca16b1553317a3).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.
[GitHub] [spark] SparkQA commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
SparkQA commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613221968 **[Test build #121250 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/121250/testReport)** for PR 28210 at commit [`8ecdf33`](https://github.com/apache/spark/commit/8ecdf33aab595c00f6ef7b5094e802d2e6c4c934).
[GitHub] [spark] maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable
maropu commented on a change in pull request #28123: [SPARK-31350][SQL] Coalesce bucketed tables for join if applicable URL: https://github.com/apache/spark/pull/28123#discussion_r407862565

## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/bucketing/CoalesceBucketInJoin.scala

## @@ -0,0 +1,110 @@
```scala
/*
 * Licensed to the Apache Software Foundation (ASF) under one or more
 * contributor license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright ownership.
 * The ASF licenses this file to You under the Apache License, Version 2.0
 * (the "License"); you may not use this file except in compliance with
 * the License. You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.execution.bucketing

import org.apache.spark.sql.catalyst.catalog.BucketSpec
import org.apache.spark.sql.catalyst.plans.Inner
import org.apache.spark.sql.catalyst.plans.logical.{Filter, Join, LogicalPlan, Project}
import org.apache.spark.sql.catalyst.rules.Rule
import org.apache.spark.sql.execution.datasources.{HadoopFsRelation, LogicalRelation}
import org.apache.spark.sql.internal.SQLConf

/**
 * This rule injects a hint if one side of two bucketed tables can be coalesced
 * when the two bucketed tables are inner-joined and they differ in the number of buckets.
 */
object CoalesceBucketInJoin extends Rule[LogicalPlan] {
  val JOIN_HINT_COALESCED_NUM_BUCKETS: String = "JoinHintCoalescedNumBuckets"

  private val sqlConf = SQLConf.get

  private def isPlanEligible(plan: LogicalPlan): Boolean = {
    def forall(plan: LogicalPlan)(p: LogicalPlan => Boolean): Boolean = {
      p(plan) && plan.children.forall(forall(_)(p))
    }

    forall(plan) {
      case _: Filter | _: Project | _: LogicalRelation => true
      case _ => false
    }
  }

  private def getBucketSpec(plan: LogicalPlan): Option[BucketSpec] = {
    if (isPlanEligible(plan)) {
      plan.collectFirst {
        case _ @ LogicalRelation(r: HadoopFsRelation, _, _, _)
            if r.bucketSpec.nonEmpty && !r.options.contains(JOIN_HINT_COALESCED_NUM_BUCKETS) =>
          r.bucketSpec.get
      }
    } else {
      None
    }
  }

  private def mayCoalesce(numBuckets1: Int, numBuckets2: Int): Option[Int] = {
    assert(numBuckets1 != numBuckets2)
    val (small, large) = (math.min(numBuckets1, numBuckets2), math.max(numBuckets1, numBuckets2))
    // A bucket can be coalesced only if the bigger number of buckets is divisible by the smaller
    // number of buckets because bucket id is calculated by modding the total number of buckets.
    if ((large % small == 0) &&
        (large - small) <= sqlConf.getConf(SQLConf.COALESCE_BUCKET_IN_JOIN_MAX_NUM_BUCKETS_DIFF)) {
      Some(small)
    } else {
      None
    }
  }

  private def addBucketHint(plan: LogicalPlan, hint: (String, String)): LogicalPlan = {
    plan.transformUp {
      case l @ LogicalRelation(r: HadoopFsRelation, _, _, _) =>
        l.copy(relation = r.copy(options = r.options + hint)(r.sparkSession))
```

Review comment:
> for example, if I see a Join in a strategy, I need to act on it, and if I see a scan node, I cannot traverse up the TreeNode

You cannot extract a join plan pattern, e.g., `Join(LogicalRelation(HadoopFsRelation), LogicalRelation(HadoopFsRelation))`, just like `ScanOperation`?

> Modifying BucketSpec is not desirable either, right?

Yea, `BucketSpec` is an item in the catalog, so I think we cannot put optimization hints in it.
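The divisibility requirement in `mayCoalesce` can be checked with a small standalone sketch (illustrative only, not Spark code): a bucket id is the row's hash modulo the number of buckets, so re-modding from `large` down to `small` buckets agrees with bucketing directly into `small` buckets only when `large % small == 0`:

```scala
// Standalone illustration of why the larger bucket count must be divisible
// by the smaller one when coalescing bucketed tables for a join.
object BucketCoalesceCheck {
  // Non-negative modulo, since bucket ids must lie in [0, numBuckets).
  def bucketId(hash: Int, numBuckets: Int): Int =
    ((hash % numBuckets) + numBuckets) % numBuckets

  // Bucket a row into `large` buckets, then coalesce that bucket into `small` buckets.
  def coalesced(hash: Int, large: Int, small: Int): Int =
    bucketId(bucketId(hash, large), small)
}
```

For example, coalescing 8 buckets down to 4 is consistent: `coalesced(h, 8, 4) == bucketId(h, 4)` for every hash. Coalescing 6 down to 4 is not: a row with hash 10 falls in bucket 4 of 6, which re-mods to bucket 0, while bucketing directly into 4 gives bucket 2.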
[GitHub] [spark] huaxingao commented on issue #28209: [SPARK-31319][SQL][FOLLOW-UP] Add a SQL example for UDAF
huaxingao commented on issue #28209: [SPARK-31319][SQL][FOLLOW-UP] Add a SQL example for UDAF URL: https://github.com/apache/spark/pull/28209#issuecomment-613221208 Thanks! @HyukjinKwon @maropu
[GitHub] [spark] yaooqinn commented on issue #28198: [SPARK-31392][SQL][3.0] Support CalendarInterval to be reflect to Ca…
yaooqinn commented on issue #28198: [SPARK-31392][SQL][3.0] Support CalendarInterval to be reflect to Ca… URL: https://github.com/apache/spark/pull/28198#issuecomment-613220853 cc @dongjoon-hyun @cloud-fan @HyukjinKwon, the backport PR is ready for review, thanks
[GitHub] [spark] AmplabJenkins removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
AmplabJenkins removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613220581 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
AmplabJenkins removed a comment on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613220587 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25937/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption
AmplabJenkins removed a comment on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption URL: https://github.com/apache/spark/pull/25965#issuecomment-613220348 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
AmplabJenkins commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613220587 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder-K8s/25937/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution.
AmplabJenkins commented on issue #28210: [SPARK-31441] Support duplicated column names for toPandas with arrow execution. URL: https://github.com/apache/spark/pull/28210#issuecomment-613220581 Merged build finished. Test PASSed.
[GitHub] [spark] yaooqinn commented on issue #28106: [SPARK-31335][SQL] Add try function support
yaooqinn commented on issue #28106: [SPARK-31335][SQL] Add try function support URL: https://github.com/apache/spark/pull/28106#issuecomment-613220617 retest this please
[GitHub] [spark] AmplabJenkins removed a comment on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption
AmplabJenkins removed a comment on issue #25965: [SPARK-26425][SS] Add more constraint checks in file streaming source to avoid checkpoint corruption URL: https://github.com/apache/spark/pull/25965#issuecomment-613220356 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/121235/ Test PASSed.