[GitHub] [spark] sandeep-katta commented on issue #25399: [SPARK-28670][SQL] create function should thrown Exception if the resource is not found

2019-08-11 Thread GitBox
sandeep-katta commented on issue #25399: [SPARK-28670][SQL] create function 
should thrown Exception if the resource is not found
URL: https://github.com/apache/spark/pull/25399#issuecomment-520306332
 
 
   **Hive** 
   
   In both cases (temporary and permanent), the query execution fails
   
   **case i: temporary function**
   
   jdbc:hive2://vm1:21066/> create temporary function addm AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/AddDoublesUDF1.jar';
   INFO  : Executing command(queryId=omm_20190812133851_e58dd117-e8b1-40b6-8659-5ad14eddfdd6): create temporary function addm AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/AddDoublesUDF1.jar'--0; Current sessionId=dcdc1020-3f73-4af2-95db-e834abda2020
   **ERROR : File does not exist: hdfs://hacluster/user/AddDoublesUDF1.jar**
   Error: Error while processing statement: FAILED: Execution Error, return code -101 from **org.apache.hadoop.hive.ql.exec.FunctionTask. java.io.FileNotFoundException:** File does not exist: hdfs://hacluster/user/AddDoublesUDF1.jar (state=08S01,code=-101)
   
   **case ii: Permanent function**
   
   jdbc:hive2://vm1:21066/> create function addm AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/AddDoublesUDF1.jar';
   INFO  : Executing command(queryId=omm_20190812133902_54e39039-b678-493e-93c2-8c09ce5bcfc0): create function addm AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/AddDoublesUDF1.jar'--0; Current sessionId=dcdc1020-3f73-4af2-95db-e834abda2020
   INFO  : Starting task [Stage-0:FUNC] in serial mode
   **ERROR : File does not exist: hdfs://hacluster/user/AddDoublesUDF1.jar**
   ERROR : Failed to register default.addm using class com.huawei.bigdata.hive.example.udf.AddDoublesUDF
   Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)
   
   **Presto:**
   Presto has no concept of temporary or permanent functions; the user needs to implement the UDF as a plugin, place it in the plugin folder, and restart the Presto server.
   
   Details about Presto UDFs are [here](https://www.qubole.com/blog/plugging-in-presto-udfs/).
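   
   For reference, a minimal, self-contained sketch of the fail-fast check the PR title asks for - not Spark's actual code, and the helper name `checkFunctionResource` is purely illustrative - using only standard Hadoop `FileSystem` APIs:
   
   ```scala
   import java.net.URI
   import org.apache.hadoop.conf.Configuration
   import org.apache.hadoop.fs.Path
   
   object FunctionResourceCheck {
     /** Fail at CREATE FUNCTION time if the jar/file resource is missing. */
     def checkFunctionResource(uri: String, conf: Configuration): Unit = {
       val path = new Path(new URI(uri))
       val fs = path.getFileSystem(conf) // resolves hdfs://, file://, etc.
       if (!fs.exists(path)) {
         throw new java.io.FileNotFoundException(s"File does not exist: $uri")
       }
     }
   }
   ```
   
   Running such a check while the CREATE FUNCTION statement is processed would surface the same `FileNotFoundException` Hive reports above, instead of deferring the failure to the first use of the function.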





[GitHub] [spark] dilipbiswal commented on issue #25331: [SPARK-27768][SQL] Infinity, -Infinity, NaN should be recognized in a case insensitive manner.

2019-08-11 Thread GitBox
dilipbiswal commented on issue #25331: [SPARK-27768][SQL] Infinity, -Infinity, 
NaN should be recognized in a case insensitive manner.
URL: https://github.com/apache/spark/pull/25331#issuecomment-520306306
 
 
   gentle ping @dongjoon-hyun 





[GitHub] [spark] AmplabJenkins removed a comment on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #24936: [SPARK-24634][SS] Add a new 
metric regarding number of rows later than watermark
URL: https://github.com/apache/spark/pull/24936#issuecomment-510987092
 
 
   Can one of the admins verify this patch?





[GitHub] [spark] SparkQA commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark

2019-08-11 Thread GitBox
SparkQA commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding 
number of rows later than watermark
URL: https://github.com/apache/spark/pull/24936#issuecomment-520305959
 
 
   **[Test build #108952 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108952/testReport)**
 for PR 24936 at commit 
[`54d159f`](https://github.com/apache/spark/commit/54d159fec0203f4edf615c8fd552df6c1f0b604f).





[GitHub] [spark] AmplabJenkins removed a comment on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25201: [SPARK-28419][SQL] Enable 
SparkThriftServer support proxy user's authentication .
URL: https://github.com/apache/spark/pull/25201#issuecomment-520304855
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108948/
   Test FAILed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25201: [SPARK-28419][SQL] Enable 
SparkThriftServer support proxy user's authentication .
URL: https://github.com/apache/spark/pull/25201#issuecomment-520304851
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] SparkQA removed a comment on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .

2019-08-11 Thread GitBox
SparkQA removed a comment on issue #25201: [SPARK-28419][SQL] Enable 
SparkThriftServer support proxy user's authentication .
URL: https://github.com/apache/spark/pull/25201#issuecomment-520294055
 
 
   **[Test build #108948 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108948/testReport)**
 for PR 25201 at commit 
[`ac87ffc`](https://github.com/apache/spark/commit/ac87ffc95e0f95c147a725396447518811472c8a).





[GitHub] [spark] HyukjinKwon closed pull request #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor

2019-08-11 Thread GitBox
HyukjinKwon closed pull request #25411: [SPARK-28652][TESTS][K8S] Add python 
version check for executor
URL: https://github.com/apache/spark/pull/25411
 
 
   





[GitHub] [spark] AmplabJenkins commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .

2019-08-11 Thread GitBox
AmplabJenkins commented on issue #25201: [SPARK-28419][SQL] Enable 
SparkThriftServer support proxy user's authentication .
URL: https://github.com/apache/spark/pull/25201#issuecomment-520304855
 
 
   Test FAILed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108948/
   Test FAILed.





[GitHub] [spark] AmplabJenkins commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .

2019-08-11 Thread GitBox
AmplabJenkins commented on issue #25201: [SPARK-28419][SQL] Enable 
SparkThriftServer support proxy user's authentication .
URL: https://github.com/apache/spark/pull/25201#issuecomment-520304851
 
 
   Merged build finished. Test FAILed.





[GitHub] [spark] SparkQA commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .

2019-08-11 Thread GitBox
SparkQA commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer 
support proxy user's authentication .
URL: https://github.com/apache/spark/pull/25201#issuecomment-520304737
 
 
   **[Test build #108948 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108948/testReport)**
 for PR 25201 at commit 
[`ac87ffc`](https://github.com/apache/spark/commit/ac87ffc95e0f95c147a725396447518811472c8a).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] jzhuge commented on a change in pull request #25372: [SPARK-28640][SQL] Only give warning when session catalog is not defined

2019-08-11 Thread GitBox
jzhuge commented on a change in pull request #25372: [SPARK-28640][SQL] Only 
give warning when session catalog is not defined
URL: https://github.com/apache/spark/pull/25372#discussion_r312784188
 
 

 ##
 File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalog/v2/LookupCatalog.scala
 ##
 @@ -62,6 +62,9 @@ trait LookupCatalog extends Logging {
     try {
       Some(lookupCatalog(SESSION_CATALOG_NAME))
     } catch {
+      case _: CatalogNotFoundException =>
+        logWarning("Session catalog is not defined")
+        None
 
 Review comment:
   @dongjoon-hyun Thanks for the review. Your command line is not the case I 
tried to fix in the PR. In your case, the stack trace is helpful.
   
   It seems that the current master has session catalog defined by default, so 
here is the command line to reproduce my case:
   ```
   $ bin/spark-shell --master 'local[*]' --conf spark.sql.catalog.session=
   ...
   Spark context available as 'sc' (master = local[*], app id = local-1565588237201).
   Spark session available as 'spark'.
   ...
   scala> spark.sessionState.analyzer.sessionCatalog
   ...
   2019-08-11 22:37:24,216 ERROR [main] hive.HiveSessionStateBuilder$$anon$1 (Logging.scala:logError(94)) - Cannot load v2 session catalog
   org.apache.spark.SparkException: Cannot find catalog plugin class for catalog 'session':
     at org.apache.spark.sql.catalog.v2.Catalogs.load(Catalogs.java:81)
   ...
   res0: Option[org.apache.spark.sql.catalog.v2.CatalogPlugin] = None
   ```
   Here the stack trace does not add any information. And I am concerned that if any rule uses the session catalog, we will see this long stack trace again and again.
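   
   To make the intended behavior concrete, here is a self-contained sketch (stubbed stand-ins, not the real Spark classes) of the lookup-with-warning pattern the diff adds:
   
   ```scala
   // Stubs so the sketch compiles on its own.
   class CatalogNotFoundException(msg: String) extends Exception(msg)
   trait CatalogPlugin
   
   object LookupCatalogSketch {
     val SESSION_CATALOG_NAME = "session"
   
     // Simulates Catalogs.load failing because no plugin class is configured.
     def lookupCatalog(name: String): CatalogPlugin =
       throw new CatalogNotFoundException(
         s"Cannot find catalog plugin class for catalog '$name'")
   
     // The pattern under review: degrade a missing session catalog to a
     // one-line warning plus None instead of a propagated stack trace.
     def sessionCatalog: Option[CatalogPlugin] =
       try Some(lookupCatalog(SESSION_CATALOG_NAME))
       catch {
         case _: CatalogNotFoundException =>
           Console.err.println("WARN: Session catalog is not defined")
           None
       }
   
     def main(args: Array[String]): Unit =
       println(sessionCatalog) // prints the warning once, then "None"
   }
   ```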








[GitHub] [spark] AmplabJenkins removed a comment on issue #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25414: [SPARK-28688][SQL][TEST] Skip 
test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 
11
URL: https://github.com/apache/spark/pull/25414#issuecomment-520303874
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14023/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25414: [SPARK-28688][SQL][TEST] Skip 
test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 
11
URL: https://github.com/apache/spark/pull/25414#issuecomment-520303872
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11

2019-08-11 Thread GitBox
SparkQA commented on issue #25414: [SPARK-28688][SQL][TEST] Skip test `read 
hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11
URL: https://github.com/apache/spark/pull/25414#issuecomment-520304097
 
 
   **[Test build #108951 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108951/testReport)**
 for PR 25414 at commit 
[`978a21a`](https://github.com/apache/spark/commit/978a21af5250c31525a4ba2fab96175a1e93871b).





[GitHub] [spark] HyukjinKwon commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor

2019-08-11 Thread GitBox
HyukjinKwon commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python 
version check for executor
URL: https://github.com/apache/spark/pull/25411#issuecomment-520303909
 
 
   Merged to master.





[GitHub] [spark] AmplabJenkins commented on issue #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11

2019-08-11 Thread GitBox
AmplabJenkins commented on issue #25414: [SPARK-28688][SQL][TEST] Skip test 
`read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11
URL: https://github.com/apache/spark/pull/25414#issuecomment-520303872
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11

2019-08-11 Thread GitBox
AmplabJenkins commented on issue #25414: [SPARK-28688][SQL][TEST] Skip test 
`read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11
URL: https://github.com/apache/spark/pull/25414#issuecomment-520303874
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14023/
   Test PASSed.





[GitHub] [spark] HyukjinKwon commented on issue #25405: [SPARK-28685][SQL][TEST] Test HMS 2.0.0+ in VersionsSuite/HiveClientSuites on JDK 11

2019-08-11 Thread GitBox
HyukjinKwon commented on issue #25405: [SPARK-28685][SQL][TEST] Test HMS 2.0.0+ 
in VersionsSuite/HiveClientSuites on JDK 11
URL: https://github.com/apache/spark/pull/25405#issuecomment-520303800
 
 
   @shaneknapp .. actually, adding `test-java11` to the PR builder has turned out to be a bit important now. @wangyum, mind filing a JIRA?
   
   For some context, as an FYI: for JDK 11 support, I asked about the feasibility of a Hive 2.3.6 release in [this Hive dev thread](http://mail-archives.apache.org/mod_mbox/hive-dev/201908.mbox/%3CCANQiJeV3VM0iVp%2BgTwKPpx9dHeXe0BcLAicmWU5EPDOptb%2B_%2BQ%40mail.gmail.com%3E). Thankfully, the response seems positive. @wangyum quickly started working on it, and @alanfgates (from Hive) is actively cooperative - thanks again.
   
   The two communities are therefore cooperating via [SPARK-28684](https://issues.apache.org/jira/browse/SPARK-28684) and [HIVE-22096](https://issues.apache.org/jira/browse/HIVE-22096). It might be much easier if we could test JDK 11 in the PR builder on the Spark side.





[GitHub] [spark] wangyum opened a new pull request #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11

2019-08-11 Thread GitBox
wangyum opened a new pull request #25414: [SPARK-28688][SQL][TEST] Skip test 
`read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11
URL: https://github.com/apache/spark/pull/25414
 
 
   ## What changes were proposed in this pull request?
   
   This PR skips the test `read hive materialized view` for Hive 3.0 and later in `VersionsSuite.scala` on JDK 11, because [HIVE-19383](https://issues.apache.org/jira/browse/HIVE-19383) added an [ArrayList$SubList serializer](https://github.com/apache/hive/blob/ae4df627952610dbec029b099f0964908b3a4f25/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java#L383) that is incompatible with JDK 11:
   ```java
   java.lang.RuntimeException: java.lang.NoSuchFieldException: parentOffset
     at org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:389)
     at org.apache.hadoop.hive.ql.exec.SerializationUtilities$1.create(SerializationUtilities.java:235)
   ...
   ```
   
![image](https://issues.apache.org/jira/secure/attachment/12977250/12977250_screenshot-2.png)
   
![image](https://issues.apache.org/jira/secure/attachment/12977249/12977249_screenshot-1.png)
   
   ## How was this patch tested?
   
   manual tests
   **Test on JDK 11**:
   ```
   ...
   [info] - 2.3: sql read hive materialized view (1 second, 253 milliseconds)
   ...
   [info] - 3.0: sql read hive materialized view !!! CANCELED !!! (31 milliseconds)
   [info]   org.apache.commons.lang3.SystemUtils.isJavaVersionAtLeast(JAVA_9) was true, and "[3.0]" did not equal "[2.3]" (VersionsSuite.scala:624)
   ...
   [info] - 3.1: sql read hive materialized view !!! CANCELED !!! (1 millisecond)
   [info]   org.apache.commons.lang3.SystemUtils.isJavaVersionAtLeast(JAVA_9) was true, and "[3.1]" did not equal "[2.3]" (VersionsSuite.scala:624)
   ...
   ```
   
   **Test on JDK 1.8**:
   ```
   ...
   [info] - 2.3: sql read hive materialized view (1 second, 444 milliseconds)
   ...
   [info] - 3.0: sql read hive materialized view (3 seconds, 100 milliseconds)
   ...
   [info] - 3.1: sql read hive materialized view (2 seconds, 941 milliseconds)
   ...
   ```
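   
   A hedged sketch of the kind of guard implied by the CANCELED output above, built from ScalaTest's `assume` and the `SystemUtils` check it reports; the real change lives in `VersionsSuite.scala` and may differ in detail:
   
   ```scala
   import org.apache.commons.lang3.{JavaVersion, SystemUtils}
   import org.scalatest.FunSuite
   
   class MaterializedViewSkipSketch extends FunSuite {
     // Illustrative stand-in for the per-client Hive version that
     // VersionsSuite iterates over ("2.3", "3.0", "3.1", ...).
     private val version = "3.0"
   
     test(s"$version: sql read hive materialized view") {
       // On JDK 9+, Hive 3.x's Kryo serialization of ArrayList$SubList fails
       // with NoSuchFieldException: parentOffset (HIVE-19383), so cancel the
       // test unless we are on the 2.3 client.
       assume(!SystemUtils.isJavaVersionAtLeast(JavaVersion.JAVA_9) || version == "2.3")
       // ... original test body ...
     }
   }
   ```
   
   With such a guard, JDK 8 runs the test for all three client versions while JDK 11 cancels the 3.0 and 3.1 cases, matching the outputs shown above.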
   





[GitHub] [spark] AmplabJenkins removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add 
python version check for executor
URL: https://github.com/apache/spark/pull/25411#issuecomment-520301172
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14022/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add 
python version check for executor
URL: https://github.com/apache/spark/pull/25411#issuecomment-520301169
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor

2019-08-11 Thread GitBox
SparkQA commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version 
check for executor
URL: https://github.com/apache/spark/pull/25411#issuecomment-520301164
 
 
   Kubernetes integration test status success
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/14022/
   





[GitHub] [spark] AmplabJenkins commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor

2019-08-11 Thread GitBox
AmplabJenkins commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python 
version check for executor
URL: https://github.com/apache/spark/pull/25411#issuecomment-520301169
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor

2019-08-11 Thread GitBox
AmplabJenkins commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python 
version check for executor
URL: https://github.com/apache/spark/pull/25411#issuecomment-520301172
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14022/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25333: [SPARK-28597][SS] Add config 
to retry spark streaming's meta log when it met error
URL: https://github.com/apache/spark/pull/25333#issuecomment-520299294
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108943/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25333: [SPARK-28597][SS] Add config 
to retry spark streaming's meta log when it met error
URL: https://github.com/apache/spark/pull/25333#issuecomment-520299290
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error

2019-08-11 Thread GitBox
AmplabJenkins commented on issue #25333: [SPARK-28597][SS] Add config to retry 
spark streaming's meta log when it met error
URL: https://github.com/apache/spark/pull/25333#issuecomment-520299290
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error

2019-08-11 Thread GitBox
AmplabJenkins commented on issue #25333: [SPARK-28597][SS] Add config to retry 
spark streaming's meta log when it met error
URL: https://github.com/apache/spark/pull/25333#issuecomment-520299294
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108943/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error

2019-08-11 Thread GitBox
SparkQA removed a comment on issue #25333: [SPARK-28597][SS] Add config to 
retry spark streaming's meta log when it met error
URL: https://github.com/apache/spark/pull/25333#issuecomment-520279181
 
 
   **[Test build #108943 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108943/testReport)**
 for PR 25333 at commit 
[`757491e`](https://github.com/apache/spark/commit/757491e3433fcd68d852abcfff26dfe07a2e07f4).





[GitHub] [spark] SparkQA commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor

2019-08-11 Thread GitBox
SparkQA commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version 
check for executor
URL: https://github.com/apache/spark/pull/25411#issuecomment-520299074
 
 
   Kubernetes integration test starting
   URL: 
https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/14022/
   





[GitHub] [spark] SparkQA commented on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error

2019-08-11 Thread GitBox
SparkQA commented on issue #25333: [SPARK-28597][SS] Add config to retry spark 
streaming's meta log when it met error
URL: https://github.com/apache/spark/pull/25333#issuecomment-520299105
 
 
   **[Test build #108943 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108943/testReport)**
 for PR 25333 at commit 
[`757491e`](https://github.com/apache/spark/commit/757491e3433fcd68d852abcfff26dfe07a2e07f4).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add 
python version check for executor
URL: https://github.com/apache/spark/pull/25411#issuecomment-520298813
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add 
python version check for executor
URL: https://github.com/apache/spark/pull/25411#issuecomment-520298817
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108950/
   Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor

2019-08-11 Thread GitBox
AmplabJenkins commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python 
version check for executor
URL: https://github.com/apache/spark/pull/25411#issuecomment-520298817
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108950/
   Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor

2019-08-11 Thread GitBox
SparkQA removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add python 
version check for executor
URL: https://github.com/apache/spark/pull/25411#issuecomment-520297350
 
 
   **[Test build #108950 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108950/testReport)**
 for PR 25411 at commit 
[`bbc2e70`](https://github.com/apache/spark/commit/bbc2e708844fbcb18eacbfc404d75f67f17818d3).





[GitHub] [spark] AmplabJenkins commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor

2019-08-11 Thread GitBox
AmplabJenkins commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python 
version check for executor
URL: https://github.com/apache/spark/pull/25411#issuecomment-520298813
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor

2019-08-11 Thread GitBox
SparkQA commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version 
check for executor
URL: https://github.com/apache/spark/pull/25411#issuecomment-520298780
 
 
   **[Test build #108950 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108950/testReport)**
 for PR 25411 at commit 
[`bbc2e70`](https://github.com/apache/spark/commit/bbc2e708844fbcb18eacbfc404d75f67f17818d3).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] HyukjinKwon commented on a change in pull request #25360: [SPARK-28280][PYTHON][SQL][TESTS][FOLLOW-UP] Add UDF cases into group by clause in 'udf-group-by.sql'

2019-08-11 Thread GitBox
HyukjinKwon commented on a change in pull request #25360: 
[SPARK-28280][PYTHON][SQL][TESTS][FOLLOW-UP] Add UDF cases into group by clause 
in 'udf-group-by.sql'
URL: https://github.com/apache/spark/pull/25360#discussion_r312779019
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-group-by.sql
 ##
 @@ -20,29 +20,25 @@ SELECT 'foo', COUNT(udf(a)) FROM testData GROUP BY 1;
 SELECT 'foo' FROM testData WHERE a = 0 GROUP BY udf(1);
 
 -- Aggregate grouped by literals (hash aggregate).
-SELECT 'foo', udf(APPROX_COUNT_DISTINCT(udf(a))) FROM testData WHERE a = 0 GROUP BY 1;
+SELECT 'foo', udf(APPROX_COUNT_DISTINCT(udf(a))) FROM testData WHERE a = 0 GROUP BY udf(1);
 
 -- Aggregate grouped by literals (sort aggregate).
-SELECT 'foo', MAX(STRUCT(udf(a))) FROM testData WHERE a = 0 GROUP BY 1;
+SELECT 'foo', MAX(STRUCT(udf(a))) FROM testData WHERE a = 0 GROUP BY udf(1);
 
 -- Aggregate with complex GroupBy expressions.
 SELECT udf(a + b), udf(COUNT(b)) FROM testData GROUP BY a + b;
 SELECT udf(a + 2), udf(COUNT(b)) FROM testData GROUP BY a + 1;
-
--- [SPARK-28445] Inconsistency between Scala and Python/Panda udfs when groupby with udf() is used
--- The following query will make Scala UDF work, but Python and Pandas udfs will fail with an AnalysisException.
--- The query should be added after SPARK-28445.
--- SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 1);
+SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 1);
 
 Review comment:
   @skonto, looks all fine except this one. Let's fix and I'll get this in.





[GitHub] [spark] HyukjinKwon commented on a change in pull request #25371: [SPARK-28393][SQL][PYTHON][TESTS] Convert and port 'pgSQL/join.sql' into UDF test base

2019-08-11 Thread GitBox
HyukjinKwon commented on a change in pull request #25371: 
[SPARK-28393][SQL][PYTHON][TESTS] Convert and port 'pgSQL/join.sql' into UDF 
test base
URL: https://github.com/apache/spark/pull/25371#discussion_r312778802
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/udf/pgSQL/udf-join.sql
 ##
 @@ -0,0 +1,2081 @@
+--
+-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+--
+--
+-- JOIN
+-- Test JOIN clauses
+-- https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/join.sql
+--
+-- This test file was converted from pgSQL/join.sql.
+
+CREATE OR REPLACE TEMPORARY VIEW INT4_TBL AS SELECT * FROM
+  (VALUES (0), (123456), (-123456), (2147483647), (-2147483647))
+  AS v(f1);
+CREATE OR REPLACE TEMPORARY VIEW INT8_TBL AS SELECT * FROM
+  (VALUES
+(123, 456),
+(123, 4567890123456789),
+(4567890123456789, 123),
+(4567890123456789, 4567890123456789),
+(4567890123456789, -4567890123456789))
+  AS v(q1, q2);
+CREATE OR REPLACE TEMPORARY VIEW FLOAT8_TBL AS SELECT * FROM
+  (VALUES (0.0), (1004.30), (-34.84),
+    (cast('1.2345678901234e+200' as double)), (cast('1.2345678901234e-200' as double)))
+  AS v(f1);
+CREATE OR REPLACE TEMPORARY VIEW TEXT_TBL AS SELECT * FROM
+  (VALUES ('doh!'), ('hi de ho neighbor'))
+  AS v(f1);
+CREATE OR REPLACE TEMPORARY VIEW tenk2 AS SELECT * FROM tenk1;
+
+CREATE TABLE J1_TBL (
+  i integer,
+  j integer,
+  t string
+) USING parquet;
+
+CREATE TABLE J2_TBL (
+  i integer,
+  k integer
+) USING parquet;
+
+
+INSERT INTO J1_TBL VALUES (1, 4, 'one');
+INSERT INTO J1_TBL VALUES (2, 3, 'two');
+INSERT INTO J1_TBL VALUES (3, 2, 'three');
+INSERT INTO J1_TBL VALUES (4, 1, 'four');
+INSERT INTO J1_TBL VALUES (5, 0, 'five');
+INSERT INTO J1_TBL VALUES (6, 6, 'six');
+INSERT INTO J1_TBL VALUES (7, 7, 'seven');
+INSERT INTO J1_TBL VALUES (8, 8, 'eight');
+INSERT INTO J1_TBL VALUES (0, NULL, 'zero');
+INSERT INTO J1_TBL VALUES (NULL, NULL, 'null');
+INSERT INTO J1_TBL VALUES (NULL, 0, 'zero');
+
+INSERT INTO J2_TBL VALUES (1, -1);
+INSERT INTO J2_TBL VALUES (2, 2);
+INSERT INTO J2_TBL VALUES (3, -3);
+INSERT INTO J2_TBL VALUES (2, 4);
+INSERT INTO J2_TBL VALUES (5, -5);
+INSERT INTO J2_TBL VALUES (5, -5);
+INSERT INTO J2_TBL VALUES (0, NULL);
+INSERT INTO J2_TBL VALUES (NULL, NULL);
+INSERT INTO J2_TBL VALUES (NULL, 0);
+
+-- [SPARK-20856] Do not need onerow because it only used for test statement using nested joins
+-- useful in some tests below
+-- create temp table onerow();
+-- insert into onerow default values;
+-- analyze onerow;
+
+
+--
+-- CORRELATION NAMES
+-- Make sure that table/column aliases are supported
+-- before diving into more complex join syntax.
+--
+
+SELECT udf('') AS `xxx`, udf(i), udf(j), udf(t)
 
 Review comment:
   @huaxingao, it seems this file almost always adds a single `udf(...)` per expression. Can we use other combinations in general? For instance,
   
   ```
   udf(...)
   udf(udf(...))
   ```
   
   or 
   
   ```
   on (udf(...) = ...)
   on (... = udf(...))
   on (udf(...) = udf(udf(...)))
   
   ```





[GitHub] [spark] AmplabJenkins removed a comment on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25409: [SPARK-28414][WEBUI] UI 
updates to show resource info in Standalone
URL: https://github.com/apache/spark/pull/25409#issuecomment-520297336
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108945/
   Test PASSed.





[GitHub] [spark] AmplabJenkins removed a comment on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25409: [SPARK-28414][WEBUI] UI 
updates to show resource info in Standalone
URL: https://github.com/apache/spark/pull/25409#issuecomment-520297334
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] AmplabJenkins commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone

2019-08-11 Thread GitBox
AmplabJenkins commented on issue #25409: [SPARK-28414][WEBUI] UI updates to 
show resource info in Standalone
URL: https://github.com/apache/spark/pull/25409#issuecomment-520297336
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108945/
   Test PASSed.





[GitHub] [spark] SparkQA commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor

2019-08-11 Thread GitBox
SparkQA commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version 
check for executor
URL: https://github.com/apache/spark/pull/25411#issuecomment-520297350
 
 
   **[Test build #108950 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108950/testReport)**
 for PR 25411 at commit 
[`bbc2e70`](https://github.com/apache/spark/commit/bbc2e708844fbcb18eacbfc404d75f67f17818d3).





[GitHub] [spark] AmplabJenkins commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone

2019-08-11 Thread GitBox
AmplabJenkins commented on issue #25409: [SPARK-28414][WEBUI] UI updates to 
show resource info in Standalone
URL: https://github.com/apache/spark/pull/25409#issuecomment-520297334
 
 
   Merged build finished. Test PASSed.





[GitHub] [spark] SparkQA removed a comment on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone

2019-08-11 Thread GitBox
SparkQA removed a comment on issue #25409: [SPARK-28414][WEBUI] UI updates to 
show resource info in Standalone
URL: https://github.com/apache/spark/pull/25409#issuecomment-520283851
 
 
   **[Test build #108945 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108945/testReport)**
 for PR 25409 at commit 
[`fbb4382`](https://github.com/apache/spark/commit/fbb4382d282be5e299e64da88e8ead3bd1d45292).





[GitHub] [spark] viirya commented on a change in pull request #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor

2019-08-11 Thread GitBox
viirya commented on a change in pull request #25411: [SPARK-28652][TESTS][K8S] 
Add python version check for executor
URL: https://github.com/apache/spark/pull/25411#discussion_r312778434
 
 

 ##
 File path: resource-managers/kubernetes/integration-tests/tests/pyfiles.py
 ##
 @@ -35,4 +36,11 @@
 # Begin of Python container checks
 version_check(sys.argv[1], 2 if sys.argv[1] == "python" else 3)
 
+# Check python executable at executors
+spark.catalog.registerFunction("getSysVer",
+                               lambda: "%d.%d" % sys.version_info[:2],
+                               StringType())
+[row] = spark.sql("SELECT getSysVer()").collect()
+driverVersion = "%d.%d" % sys.version_info[:2]
 
 Review comment:
   Yes.
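   
   For readers skimming the diff above, a Scala analog of the same pattern - evaluate a runtime property through a UDF (executor side) and compare it with the property read directly on the driver - purely as an illustration, not part of this PR:
   
   ```scala
   import org.apache.spark.sql.SparkSession
   
   object RuntimeVersionCheck {
     def main(args: Array[String]): Unit = {
       val spark = SparkSession.builder().appName("version-check").getOrCreate()
       // Evaluated on an executor when the query runs.
       spark.udf.register("getScalaVer", () => scala.util.Properties.versionNumberString)
       val executorVersion = spark.sql("SELECT getScalaVer()").head().getString(0)
       // The same property, read directly on the driver.
       val driverVersion = scala.util.Properties.versionNumberString
       assert(executorVersion == driverVersion,
         s"executor Scala $executorVersion != driver Scala $driverVersion")
       spark.stop()
     }
   }
   ```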





[GitHub] [spark] SparkQA commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone

2019-08-11 Thread GitBox
SparkQA commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show 
resource info in Standalone
URL: https://github.com/apache/spark/pull/25409#issuecomment-520297134
 
 
   **[Test build #108945 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108945/testReport)**
 for PR 25409 at commit 
[`fbb4382`](https://github.com/apache/spark/commit/fbb4382d282be5e299e64da88e8ead3bd1d45292).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.





[GitHub] [spark] dongjoon-hyun edited a comment on issue #25404: [SPARK-28683][BUILD] Upgrade Scala to 2.12.9

2019-08-11 Thread GitBox
dongjoon-hyun edited a comment on issue #25404: [SPARK-28683][BUILD] Upgrade 
Scala to 2.12.9
URL: https://github.com/apache/spark/pull/25404#issuecomment-520296446
 
 
   Hi, All. Let's reuse SPARK-28683 for 2.12.10. I reopened the JIRA issue for 
that.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on issue #25404: [SPARK-28683][BUILD] Upgrade Scala to 2.12.9

2019-08-11 Thread GitBox
dongjoon-hyun commented on issue #25404: [SPARK-28683][BUILD] Upgrade Scala to 
2.12.9
URL: https://github.com/apache/spark/pull/25404#issuecomment-520296446
 
 
   Hi, All. Let's reuse SPARK-28683 for 2.12.10. I reopened it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] imback82 commented on a change in pull request #25247: [SPARK-28319][SQL] Implement SHOW TABLES for Data Source V2 Tables

2019-08-11 Thread GitBox
imback82 commented on a change in pull request #25247: [SPARK-28319][SQL] 
Implement SHOW TABLES for Data Source V2 Tables
URL: https://github.com/apache/spark/pull/25247#discussion_r312777236
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2SQLSuite.scala
 ##
 @@ -1700,6 +1704,126 @@ class DataSourceV2SQLSuite extends QueryTest with SharedSQLContext with BeforeAn
     }
   }
 
+  test("ShowTables: using v2 catalog") {
+    spark.sql("CREATE TABLE testcat.db.table_name (id bigint, data string) USING foo")
+    spark.sql("CREATE TABLE testcat.n1.n2.db.table_name (id bigint, data string) USING foo")
+
+    runShowTablesSql("SHOW TABLES FROM testcat.db", Seq(Row("db", "table_name")))
+
+    runShowTablesSql(
+      "SHOW TABLES FROM testcat.n1.n2.db",
+      Seq(Row("n1.n2.db", "table_name")))
+  }
+
+  test("ShowTables: using v2 catalog with a pattern") {
+    spark.sql("CREATE TABLE testcat.db.table (id bigint, data string) USING foo")
+    spark.sql("CREATE TABLE testcat.db.table_name_1 (id bigint, data string) USING foo")
+    spark.sql("CREATE TABLE testcat.db.table_name_2 (id bigint, data string) USING foo")
+    spark.sql("CREATE TABLE testcat.db2.table_name_2 (id bigint, data string) USING foo")
+
+    runShowTablesSql(
+      "SHOW TABLES FROM testcat.db",
+      Seq(
+        Row("db", "table"),
+        Row("db", "table_name_1"),
+        Row("db", "table_name_2")))
+
+    runShowTablesSql(
+      "SHOW TABLES FROM testcat.db LIKE '*name*'",
+      Seq(Row("db", "table_name_1"), Row("db", "table_name_2")))
+
+    runShowTablesSql(
+      "SHOW TABLES FROM testcat.db LIKE '*2'",
+      Seq(Row("db", "table_name_2")))
+  }
+
+  test("ShowTables: using v2 catalog, namespace doesn't exist") {
+    runShowTablesSql("SHOW TABLES FROM testcat.unknown", Seq())
 
 Review comment:
   @cloud-fan as far as I understand, throwing `NoSuchNamespaceException` is optional in v2:
   ```
  /**
   * List the tables in a namespace from the catalog.
   *
   * If the catalog supports views, this must return identifiers for only tables and not views.
   *
   * @param namespace a multi-part namespace
   * @return an array of Identifiers for tables
   * @throws NoSuchNamespaceException If the namespace does not exist (optional).
   */
  Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException;
   ```
   
   I can update `TestInMemoryTableCatalog` to throw `NoSuchNamespaceException` when no namespace exists for the created tables. However, I am not sure this is the right approach, since a namespace can exist without tables - in v1 you could run `CREATE DATABASE db` without creating any tables in `db` - although I don't think that scenario is supported in v2 yet.
   
   Please advise how this needs to be handled. Thanks!
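   For illustration, a minimal sketch of the stricter behavior (the `namespaceExists` helper and the shape of `tables` are hypothetical, not taken from this PR):
   
   ```
   import scala.collection.JavaConverters._
   import org.apache.spark.sql.catalog.v2.Identifier
   import org.apache.spark.sql.catalyst.analysis.NoSuchNamespaceException
   
   // Hypothetical sketch only: make listTables strict about unknown namespaces.
   // `tables` is the catalog's identifier-keyed map; throwing here is optional
   // per the listTables contract quoted above.
   override def listTables(namespace: Array[String]): Array[Identifier] = {
     val found = tables.keySet.asScala.filter(_.namespace.sameElements(namespace))
     if (found.isEmpty && !namespaceExists(namespace)) {
       throw new NoSuchNamespaceException(namespace)
     }
     found.toArray
   }
   ```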
   
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25149: [SPARK-28383][SQL] SHOW CREATE 
TABLE is not supported on a temporary view
URL: https://github.com/apache/spark/pull/25149#issuecomment-520295435
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14021/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25149: [SPARK-28383][SQL] SHOW CREATE 
TABLE is not supported on a temporary view
URL: https://github.com/apache/spark/pull/25149#issuecomment-520295432
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on issue #24601: [SPARK-27702][K8S] Allow using some alternatives for service accounts

2019-08-11 Thread GitBox
dongjoon-hyun commented on issue #24601: [SPARK-27702][K8S] Allow using some 
alternatives for service accounts
URL: https://github.com/apache/spark/pull/24601#issuecomment-520295526
 
 
   Sorry for being late, @Udbhav30. In that case, could you describe the test procedure in the `How was this patch tested?` section of the PR description? For example, how to create a new service account and how to use it?
   > i am unable to simulate this from minikube as there will always be a 
default service account so i am not sure if i could write any test case for 
this. 


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view

2019-08-11 Thread GitBox
AmplabJenkins commented on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE 
is not supported on a temporary view
URL: https://github.com/apache/spark/pull/25149#issuecomment-520295432
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view

2019-08-11 Thread GitBox
AmplabJenkins commented on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE 
is not supported on a temporary view
URL: https://github.com/apache/spark/pull/25149#issuecomment-520295435
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14021/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view

2019-08-11 Thread GitBox
dongjoon-hyun commented on a change in pull request #25149: [SPARK-28383][SQL] 
SHOW CREATE TABLE is not supported on a temporary view
URL: https://github.com/apache/spark/pull/25149#discussion_r312776947
 
 

 ##
 File path: 
sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala
 ##
 @@ -159,7 +159,9 @@ abstract class SQLViewSuite extends QueryTest with SQLTestUtils {
       Thread.currentThread().getContextClassLoader.getResource("data/files/employee.dat")
       assertNoSuchTable(s"""LOAD DATA LOCAL INPATH "$dataFilePath" INTO TABLE $viewName""")
       assertNoSuchTable(s"TRUNCATE TABLE $viewName")
-      assertNoSuchTable(s"SHOW CREATE TABLE $viewName")
+      intercept[AnalysisException] {
 
 Review comment:
   As you know, we always need to check the actual error message, because asserting only on `AnalysisException` can hide regressions.
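   A sketch of the stricter pattern within this suite (message text taken from the diff in this PR):
   
   ```
   // Assert on the message, not just the exception type, so a regression that
   // changes the error cannot hide behind a generic AnalysisException.
   val e = intercept[AnalysisException] {
     sql(s"SHOW CREATE TABLE $viewName")
   }
   assert(e.message.contains(
     s"SHOW CREATE TABLE is not supported on a temporary view: $viewName"))
   ```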


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view

2019-08-11 Thread GitBox
dongjoon-hyun commented on a change in pull request #25149: [SPARK-28383][SQL] 
SHOW CREATE TABLE is not supported on a temporary view
URL: https://github.com/apache/spark/pull/25149#discussion_r312776831
 
 

 ##
 File path: 
sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala
 ##
 @@ -949,16 +949,22 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableComman
 
   override def run(sparkSession: SparkSession): Seq[Row] = {
     val catalog = sparkSession.sessionState.catalog
-    val tableMetadata = catalog.getTableMetadata(table)
-
-    // TODO: unify this after we unify the CREATE TABLE syntax for hive serde and data source table.
-    val stmt = if (DDLUtils.isDatasourceTable(tableMetadata)) {
-      showCreateDataSourceTable(tableMetadata)
+    if (catalog.isTemporaryTable(table)) {
+      throw new AnalysisException(
+        s"SHOW CREATE TABLE is not supported on a temporary view: ${table.identifier}")
     } else {
-      showCreateHiveTable(tableMetadata)
-    }
+      val tableMetadata = catalog.getTableMetadata(table)
 
-    Seq(Row(stmt))
+      // TODO: unify this after we unify the
 
 Review comment:
   Hi, @wangyum.
   I know this `TODO` was not introduced by this PR, but this is a good chance to turn it into an IDed TODO, which will help other contributors pick up the issue. Could you file a JIRA for this and use that JIRA ID here, please?
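   For reference, the IDed form would look like the following once a JIRA exists (the ID below is a placeholder, not a real ticket):
   
   ```
   // TODO(SPARK-XXXXX): unify this after we unify the CREATE TABLE syntax
   // for hive serde and data source table.
   ```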


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view

2019-08-11 Thread GitBox
SparkQA commented on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not 
supported on a temporary view
URL: https://github.com/apache/spark/pull/25149#issuecomment-520294838
 
 
   **[Test build #108949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108949/testReport)** for PR 25149 at commit [`38cf574`](https://github.com/apache/spark/commit/38cf57471416e74769e88c04aaae288e1c4be309).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view

2019-08-11 Thread GitBox
dongjoon-hyun commented on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE 
is not supported on a temporary view
URL: https://github.com/apache/spark/pull/25149#issuecomment-520294719
 
 
   Retest this please.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on issue #25229: [SPARK-27900][K8s] Add jvm oom flag

2019-08-11 Thread GitBox
dongjoon-hyun commented on issue #25229: [SPARK-27900][K8s] Add jvm oom flag
URL: https://github.com/apache/spark/pull/25229#issuecomment-520294492
 
 
   Thank you, @skonto !


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .

2019-08-11 Thread GitBox
SparkQA commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer 
support proxy user's authentication .
URL: https://github.com/apache/spark/pull/25201#issuecomment-520294055
 
 
   **[Test build #108948 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108948/testReport)** for PR 25201 at commit [`ac87ffc`](https://github.com/apache/spark/commit/ac87ffc95e0f95c147a725396447518811472c8a).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`

2019-08-11 Thread GitBox
dongjoon-hyun commented on a change in pull request #25408: [SPARK-28687][SQL] 
Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`
URL: https://github.com/apache/spark/pull/25408#discussion_r312776086
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ##
 @@ -455,6 +456,22 @@ object DateTimeUtils {
 (MICROSECONDS.toSeconds(localTimestamp(microsec, timeZone)) % 60).toInt
   }
 
+  /**
+   * Returns seconds, including fractional parts, multiplied by 1000. The timestamp
 
 Review comment:
   +1 for @HyukjinKwon's advice. My reasons are here, @MaxGekk:
   - https://github.com/apache/spark/pull/25408#discussion_r312775959

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25201: [SPARK-28419][SQL] Enable 
SparkThriftServer support proxy user's authentication .
URL: https://github.com/apache/spark/pull/25201#issuecomment-520293856
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25201: [SPARK-28419][SQL] Enable 
SparkThriftServer support proxy user's authentication .
URL: https://github.com/apache/spark/pull/25201#issuecomment-520293859
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14020/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .

2019-08-11 Thread GitBox
AmplabJenkins commented on issue #25201: [SPARK-28419][SQL] Enable 
SparkThriftServer support proxy user's authentication .
URL: https://github.com/apache/spark/pull/25201#issuecomment-520293856
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .

2019-08-11 Thread GitBox
AmplabJenkins commented on issue #25201: [SPARK-28419][SQL] Enable 
SparkThriftServer support proxy user's authentication .
URL: https://github.com/apache/spark/pull/25201#issuecomment-520293859
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14020/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`

2019-08-11 Thread GitBox
dongjoon-hyun commented on a change in pull request #25408: [SPARK-28687][SQL] 
Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`
URL: https://github.com/apache/spark/pull/25408#discussion_r312775959
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ##
 @@ -455,6 +456,22 @@ object DateTimeUtils {
 (MICROSECONDS.toSeconds(localTimestamp(microsec, timeZone)) % 60).toInt
   }
 
+  /**
+   * Returns seconds, including fractional parts, multiplied by 1 000. The timestamp
 
 Review comment:
   Let me ask you this way, @MaxGekk.
   1. If you think this is better, why not the Java style `1_000_000`? (AFAIK, that has been supported since JDK 7.)
   2. Does Apache Spark use `1 000 000` or `1 000` anywhere in our code today?
   3. I cannot find any explanation in this PR of why you are introducing this new style to the Apache Spark community.
   4. Finally, this is the Apache Spark doc, not the `PostgreSQL` doc. A blind copy cannot justify the claim that this style is better.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] SparkQA commented on issue #25354: [SPARK-28612][SQL] Add DataFrameWriterV2 API

2019-08-11 Thread GitBox
SparkQA commented on issue #25354: [SPARK-28612][SQL] Add DataFrameWriterV2 API
URL: https://github.com/apache/spark/pull/25354#issuecomment-520293335
 
 
   **[Test build #108947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108947/testReport)** for PR 25354 at commit [`4538721`](https://github.com/apache/spark/commit/45387211d054400017ac83627b7b40887f614d16).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins commented on issue #25413: Merge pull request #1 from apache/master

2019-08-11 Thread GitBox
AmplabJenkins commented on issue #25413: Merge pull request #1 from 
apache/master
URL: https://github.com/apache/spark/pull/25413#issuecomment-520292459
 
 
   Can one of the admins verify this patch?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] pvk2727 closed pull request #25413: Merge pull request #1 from apache/master

2019-08-11 Thread GitBox
pvk2727 closed pull request #25413: Merge pull request #1 from apache/master
URL: https://github.com/apache/spark/pull/25413
 
 
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] pvk2727 opened a new pull request #25413: Merge pull request #1 from apache/master

2019-08-11 Thread GitBox
pvk2727 opened a new pull request #25413: Merge pull request #1 from 
apache/master
URL: https://github.com/apache/spark/pull/25413
 
 
   Send Pull Request
   
   ## What changes were proposed in this pull request?
   
   (Please fill in changes proposed in this fix)
   
   ## How was this patch tested?
   
   (Please explain how this patch was tested. E.g. unit tests, integration 
tests, manual tests)
   (If this patch involves UI changes, please attach a screenshot; otherwise, 
remove this)
   
   Please review https://spark.apache.org/contributing.html before opening a 
pull request.
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter

2019-08-11 Thread GitBox
HyukjinKwon commented on a change in pull request #25407: 
[SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter
URL: https://github.com/apache/spark/pull/25407#discussion_r312774692
 
 

 ##
 File path: docs/structured-streaming-programming-guide.md
 ##
 @@ -2251,13 +2251,13 @@ When the streaming query is started, Spark calls the function or the object's
 
 - The close() method (if it exists) is called if an open() method exists and returns successfully (irrespective of the return value), except if the JVM or Python process crashes in the middle.
 
-- **Note:** The partitionId and epochId in the open() method can be used to deduplicate generated data 
-  when failures cause reprocessing of some input data. This depends on the execution mode of the query. 
-  If the streaming query is being executed in the micro-batch mode, then every partition represented 
-  by a unique tuple (partition_id, epoch_id) is guaranteed to have the same data. 
-  Hence, (partition_id, epoch_id) can be used to deduplicate and/or transactionally commit 
-  data and achieve exactly-once guarantees. However, if the streaming query is being executed 
-  in the continuous mode, then this guarantee does not hold and therefore should not be used for deduplication.
+- **Note:** Spark doesn't guarantee same output for (partitionId, epochId) on failure, so deduplication
+  cannot be achieved with (partitionId, epochId). e.g. source provides different number of
+  partitions for some reason, Spark optimization changes number of partitions, etc.
+  Refer SPARK-28650 for more details. `epochId` can still be used for deduplication, but there's less
 
 Review comment:
   Just to match the other docs:
   
   ```
   See [SPARK-28650](https://issues.apache.org/jira/browse/SPARK-28650) for 
more details.
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] dongjoon-hyun commented on issue #25412: [SPARK-28691][EXAMPLES] DirectKafkaWordCount support kafka with kerberos

2019-08-11 Thread GitBox
dongjoon-hyun commented on issue #25412: [SPARK-28691][EXAMPLES] DirectKafkaWordCount support kafka with kerberos
URL: https://github.com/apache/spark/pull/25412#issuecomment-520291947
 
 
   Hi, @hddong. Thank you for making a PR.
   - Since `DirectKafkaWordCount.scala` and `JavaDirectKafkaWordCount.java` are a pair kept for language parity, this PR should update both of them consistently.
   - And, just out of curiosity, why do you want a Kerberized example in `DirectKafkaWordCount`? Do you want to change other examples like `StructuredKafkaWordCount.scala`, too?
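   For readers following along: a Kerberized Kafka consumer typically differs from the existing example only in a few client properties (a hedged sketch; the broker address is illustrative, and "kafka" as the service name is a common default, not universal):
   
   ```
   import org.apache.kafka.common.serialization.StringDeserializer
   
   // The usual DirectKafkaWordCount params plus the SASL/Kerberos client settings.
   val kafkaParams = Map[String, Object](
     "bootstrap.servers" -> "broker1:9092",
     "key.deserializer" -> classOf[StringDeserializer],
     "value.deserializer" -> classOf[StringDeserializer],
     "group.id" -> "example",
     "security.protocol" -> "SASL_PLAINTEXT",
     "sasl.kerberos.service.name" -> "kafka")
   ```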


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter

2019-08-11 Thread GitBox
HyukjinKwon commented on a change in pull request #25407: 
[SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter
URL: https://github.com/apache/spark/pull/25407#discussion_r312774510
 
 

 ##
 File path: docs/structured-streaming-programming-guide.md
 ##
 @@ -2251,13 +2251,13 @@ When the streaming query is started, Spark calls the function or the object's
 
 - The close() method (if it exists) is called if an open() method exists and returns successfully (irrespective of the return value), except if the JVM or Python process crashes in the middle.
 
-- **Note:** The partitionId and epochId in the open() method can be used to deduplicate generated data 
-  when failures cause reprocessing of some input data. This depends on the execution mode of the query. 
-  If the streaming query is being executed in the micro-batch mode, then every partition represented 
-  by a unique tuple (partition_id, epoch_id) is guaranteed to have the same data. 
-  Hence, (partition_id, epoch_id) can be used to deduplicate and/or transactionally commit 
-  data and achieve exactly-once guarantees. However, if the streaming query is being executed 
-  in the continuous mode, then this guarantee does not hold and therefore should not be used for deduplication.
+- **Note:** Spark doesn't guarantee same output for (partitionId, epochId) on failure, so deduplication
+  cannot be achieved with (partitionId, epochId). e.g. source provides different number of
+  partitions for some reason, Spark optimization changes number of partitions, etc.
+  Refer SPARK-28650 for more details. `epochId` can still be used for deduplication, but there's less
+  benefit to leverage this, as the chance for Spark to successfully write all partitions and fail to checkpoint
 
 Review comment:
   Using the epoch for deduplication seems not very useful given this description. Should we maybe just remove it?
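   For readers of the thread, the API in question (a minimal sketch; the class name and console sink are illustrative only):
   
   ```
   import org.apache.spark.sql.ForeachWriter
   
   // open() receives the (partitionId, epochId) pair whose stability across
   // retries is exactly what SPARK-28650 corrects in the docs.
   class ConsoleWriter extends ForeachWriter[String] {
     override def open(partitionId: Long, epochId: Long): Boolean = true // false skips the partition
     override def process(value: String): Unit = println(value)
     override def close(errorOrNull: Throwable): Unit = ()
   }
   ```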


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gczsjdy commented on a change in pull request #25342: [SPARK-28571][CORE][SHUFFLE] Use the shuffle writer plugin for the SortShuffleWriter

2019-08-11 Thread GitBox
gczsjdy commented on a change in pull request #25342: 
[SPARK-28571][CORE][SHUFFLE] Use the shuffle writer plugin for the 
SortShuffleWriter
URL: https://github.com/apache/spark/pull/25342#discussion_r312773982
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala
 ##
 @@ -46,7 +47,8 @@ private[spark] class DiskBlockObjectWriter(
     writeMetrics: ShuffleWriteMetricsReporter,
     val blockId: BlockId = null)
   extends OutputStream
-  with Logging {
+  with Logging
+  with PairsWriter {
 
 Review comment:
   nit: add `override` to the function that now comes from `PairsWriter`.
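   Concretely (a sketch; the trait method is assumed to be `write(key, value)`, matching this PR's `PairsWriter` usage, and the body shown mirrors the current class, with member names assumed):
   
   ```
   // Once DiskBlockObjectWriter mixes in PairsWriter, its existing write(key, value)
   // becomes an implementation of the trait method and should be marked as such.
   override def write(key: Any, value: Any): Unit = {
     if (!streamOpen) {
       open()
     }
     objOut.writeKey(key)
     objOut.writeValue(value)
     recordWritten()
   }
   ```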


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gczsjdy commented on a change in pull request #25342: [SPARK-28571][CORE][SHUFFLE] Use the shuffle writer plugin for the SortShuffleWriter

2019-08-11 Thread GitBox
gczsjdy commented on a change in pull request #25342: 
[SPARK-28571][CORE][SHUFFLE] Use the shuffle writer plugin for the 
SortShuffleWriter
URL: https://github.com/apache/spark/pull/25342#discussion_r312773349
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/util/collection/PairsWriter.scala
 ##
 @@ -0,0 +1,23 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.collection
+
+private[spark] trait PairsWriter {
 
 Review comment:
   nit: add docs on where this can be used?
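   Something along these lines, perhaps (a sketch of the requested doc; the `write` signature is inferred from the call sites in this PR):
   
   ```
   /**
    * An abstraction over writers that accept key-value pairs, so that callers such
    * as ExternalSorter can write either to local disk (DiskBlockObjectWriter) or to
    * an arbitrary partition writer from the shuffle writer plugin
    * (ShufflePartitionPairsWriter).
    */
   private[spark] trait PairsWriter {
     def write(key: Any, value: Any): Unit
   }
   ```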


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] gczsjdy commented on a change in pull request #25342: [SPARK-28571][CORE][SHUFFLE] Use the shuffle writer plugin for the SortShuffleWriter

2019-08-11 Thread GitBox
gczsjdy commented on a change in pull request #25342: 
[SPARK-28571][CORE][SHUFFLE] Use the shuffle writer plugin for the 
SortShuffleWriter
URL: https://github.com/apache/spark/pull/25342#discussion_r312773647
 
 

 ##
 File path: 
core/src/main/scala/org/apache/spark/util/collection/ShufflePartitionPairsWriter.scala
 ##
 @@ -0,0 +1,98 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.spark.util.collection
+
+import java.io.{Closeable, FilterOutputStream, OutputStream}
+
+import org.apache.spark.serializer.{SerializationStream, SerializerInstance, SerializerManager}
+import org.apache.spark.shuffle.ShuffleWriteMetricsReporter
+import org.apache.spark.shuffle.api.ShufflePartitionWriter
+import org.apache.spark.storage.BlockId
+
+/**
+ * A key-value writer inspired by {@link DiskBlockObjectWriter} that pushes the bytes to an
+ * arbitrary partition writer instead of writing to local disk through the block manager.
+ */
+ */
+private[spark] class ShufflePartitionPairsWriter(
 
 Review comment:
   Should this instead be in the `o.a.s.s` package?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] cloud-fan commented on a change in pull request #25247: [SPARK-28319][SQL] Implement SHOW TABLES for Data Source V2 Tables

2019-08-11 Thread GitBox
cloud-fan commented on a change in pull request #25247: [SPARK-28319][SQL] 
Implement SHOW TABLES for Data Source V2 Tables
URL: https://github.com/apache/spark/pull/25247#discussion_r312773091
 
 

 ##
 File path: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2SQLSuite.scala
 ##
 @@ -1700,6 +1704,126 @@ class DataSourceV2SQLSuite extends QueryTest with SharedSQLContext with BeforeAn
     }
   }
 
+  test("ShowTables: using v2 catalog") {
+    spark.sql("CREATE TABLE testcat.db.table_name (id bigint, data string) USING foo")
+    spark.sql("CREATE TABLE testcat.n1.n2.db.table_name (id bigint, data string) USING foo")
+
+    runShowTablesSql("SHOW TABLES FROM testcat.db", Seq(Row("db", "table_name")))
+
+    runShowTablesSql(
+      "SHOW TABLES FROM testcat.n1.n2.db",
+      Seq(Row("n1.n2.db", "table_name")))
+  }
+
+  test("ShowTables: using v2 catalog with a pattern") {
+    spark.sql("CREATE TABLE testcat.db.table (id bigint, data string) USING foo")
+    spark.sql("CREATE TABLE testcat.db.table_name_1 (id bigint, data string) USING foo")
+    spark.sql("CREATE TABLE testcat.db.table_name_2 (id bigint, data string) USING foo")
+    spark.sql("CREATE TABLE testcat.db2.table_name_2 (id bigint, data string) USING foo")
+
+    runShowTablesSql(
+      "SHOW TABLES FROM testcat.db",
+      Seq(
+        Row("db", "table"),
+        Row("db", "table_name_1"),
+        Row("db", "table_name_2")))
+
+    runShowTablesSql(
+      "SHOW TABLES FROM testcat.db LIKE '*name*'",
+      Seq(Row("db", "table_name_1"), Row("db", "table_name_2")))
+
+    runShowTablesSql(
+      "SHOW TABLES FROM testcat.db LIKE '*2'",
+      Seq(Row("db", "table_name_2")))
+  }
+
+  test("ShowTables: using v2 catalog, namespace doesn't exist") {
+    runShowTablesSql("SHOW TABLES FROM testcat.unknown", Seq())
 
 Review comment:
   In current Spark, `SHOW TABLES FROM non-existing-db` fails. Shall we follow that behavior?
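   (For reference, a hedged sketch of the v1 behavior being cited; in the v1 session catalog, `listTables` goes through `requireDbExists`:)
   
   ```
   import org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException
   
   // v1: listing tables in a missing database throws, rather than returning Seq().
   intercept[NoSuchDatabaseException] {
     sql("SHOW TABLES FROM db_that_does_not_exist")
   }
   ```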


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter

2019-08-11 Thread GitBox
HyukjinKwon commented on a change in pull request #25407: 
[SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter
URL: https://github.com/apache/spark/pull/25407#discussion_r312773068
 
 

 ##
 File path: docs/structured-streaming-programming-guide.md
 ##
 @@ -2251,13 +2251,13 @@ When the streaming query is started, Spark calls the function or the object's
 
 - The close() method (if it exists) is called if an open() method exists and returns successfully (irrespective of the return value), except if the JVM or Python process crashes in the middle.
 
-- **Note:** The partitionId and epochId in the open() method can be used to deduplicate generated data 
-  when failures cause reprocessing of some input data. This depends on the execution mode of the query. 
-  If the streaming query is being executed in the micro-batch mode, then every partition represented 
-  by a unique tuple (partition_id, epoch_id) is guaranteed to have the same data. 
-  Hence, (partition_id, epoch_id) can be used to deduplicate and/or transactionally commit 
-  data and achieve exactly-once guarantees. However, if the streaming query is being executed 
-  in the continuous mode, then this guarantee does not hold and therefore should not be used for deduplication.
+- **Note:** Spark doesn't guarantee same output for (partitionId, epochId) on failure, so deduplication
+  cannot be achieved with (partitionId, epochId). e.g. source provides different number of
+  partitions for some reason, Spark optimization changes number of partitions, etc.
 
 Review comment:
   typo: some reasons


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter

2019-08-11 Thread GitBox
HyukjinKwon commented on a change in pull request #25407: 
[SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter
URL: https://github.com/apache/spark/pull/25407#discussion_r312772909
 
 

 ##
 File path: docs/structured-streaming-programming-guide.md
 ##
 @@ -2251,13 +2251,13 @@ When the streaming query is started, Spark calls the function or the object's
 
 - The close() method (if it exists) is called if an open() method exists and returns successfully (irrespective of the return value), except if the JVM or Python process crashes in the middle.
 
-- **Note:** The partitionId and epochId in the open() method can be used to deduplicate generated data 
-  when failures cause reprocessing of some input data. This depends on the execution mode of the query. 
-  If the streaming query is being executed in the micro-batch mode, then every partition represented 
-  by a unique tuple (partition_id, epoch_id) is guaranteed to have the same data. 
-  Hence, (partition_id, epoch_id) can be used to deduplicate and/or transactionally commit 
-  data and achieve exactly-once guarantees. However, if the streaming query is being executed 
-  in the continuous mode, then this guarantee does not hold and therefore should not be used for deduplication.
+- **Note:** Spark doesn't guarantee same output for (partitionId, epochId) on failure, so deduplication
+  cannot be achieved with (partitionId, epochId). e.g. source provides different number of
+  partitions for some reason, Spark optimization changes number of partitions, etc.
 
 Review comment:
   typo same reason?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter

2019-08-11 Thread GitBox
HyukjinKwon commented on a change in pull request #25407: 
[SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter
URL: https://github.com/apache/spark/pull/25407#discussion_r312772909
 
 

 ##
 File path: docs/structured-streaming-programming-guide.md
 ##
 @@ -2251,13 +2251,13 @@ When the streaming query is started, Spark calls the function or the object's
 
 - The close() method (if it exists) is called if an open() method exists and returns successfully (irrespective of the return value), except if the JVM or Python process crashes in the middle.
 
-- **Note:** The partitionId and epochId in the open() method can be used to deduplicate generated data 
-  when failures cause reprocessing of some input data. This depends on the execution mode of the query. 
-  If the streaming query is being executed in the micro-batch mode, then every partition represented 
-  by a unique tuple (partition_id, epoch_id) is guaranteed to have the same data. 
-  Hence, (partition_id, epoch_id) can be used to deduplicate and/or transactionally commit 
-  data and achieve exactly-once guarantees. However, if the streaming query is being executed 
-  in the continuous mode, then this guarantee does not hold and therefore should not be used for deduplication.
+- **Note:** Spark doesn't guarantee same output for (partitionId, epochId) on failure, so deduplication
+  cannot be achieved with (partitionId, epochId). e.g. source provides different number of
+  partitions for some reason, Spark optimization changes number of partitions, etc.
 
 Review comment:
   typo some reasons


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on a change in pull request #25383: [SPARK-13677][ML] Implement Tree-Based Feature Transformation for ML

2019-08-11 Thread GitBox
zhengruifeng commented on a change in pull request #25383: [SPARK-13677][ML] 
Implement Tree-Based Feature Transformation for ML
URL: https://github.com/apache/spark/pull/25383#discussion_r312771157
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/tree/treeModels.scala
 ##
 @@ -78,6 +78,28 @@ private[spark] trait DecisionTreeModel {
 
   /** Convert to spark.mllib DecisionTreeModel (losing some information) */
   private[spark] def toOld: OldDecisionTreeModel
+
+  /** Returns an iterator that traverses (DFS, left to right) the leaves
+   *  in the subtree of this node.
+   */
+  private def leafIterator(node: Node): Iterator[LeafNode] = {
+    node match {
+      case l: LeafNode => Iterator.single(l)
+      case n: InternalNode =>
+        leafIterator(n.leftChild) ++ leafIterator(n.rightChild)
+    }
+  }
+
+  @transient private lazy val leafIndices: Map[LeafNode, Int] = {
 
 Review comment:
   I had implemented another leaf transformation on the .mllib side in https://github.com/apache/spark/pull/11520, and it used the sorted `leafId` as the output.
   However, on the .ml side the `LeafNode` class does not contain an id and is exposed to the end user, so I tend to leave the current `LeafNode` class alone.
   As to the extra memory pressure, I think its size, O(#numLeaves * #numTrees), is much smaller than the model itself.
   WDYT @srowen 
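   For what it's worth, a hedged sketch of how `leafIndices` can be built from the iterator above (the actual body is cut off in this hunk, so this is an assumption, not the PR's code):
   
   ```
   // Assign each leaf its position in the DFS, left-to-right traversal, so a
   // predicted LeafNode can be mapped to a stable column index.
   @transient private lazy val leafIndices: Map[LeafNode, Int] =
     leafIterator(rootNode).zipWithIndex.toMap
   ```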


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter

2019-08-11 Thread GitBox
HyukjinKwon commented on a change in pull request #25407: 
[SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter
URL: https://github.com/apache/spark/pull/25407#discussion_r312772609
 
 

 ##
 File path: docs/structured-streaming-programming-guide.md
 ##
 @@ -2251,13 +2251,13 @@ When the streaming query is started, Spark calls the function or the object’s
 
 - The close() method (if it exists) is called if an open() method exists and returns successfully (irrespective of the return value), except if the JVM or Python process crashes in the middle.
 
-- **Note:** The partitionId and epochId in the open() method can be used to deduplicate generated data 
-  when failures cause reprocessing of some input data. This depends on the execution mode of the query. 
-  If the streaming query is being executed in the micro-batch mode, then every partition represented 
-  by a unique tuple (partition_id, epoch_id) is guaranteed to have the same data. 
-  Hence, (partition_id, epoch_id) can be used to deduplicate and/or transactionally commit 
-  data and achieve exactly-once guarantees. However, if the streaming query is being executed 
-  in the continuous mode, then this guarantee does not hold and therefore should not be used for deduplication.
+- **Note:** Spark doesn't guarantee same output for (partitionId, epochId) on failure, so deduplication
 
 Review comment:
   No big deal, but I usually avoid abbreviations in the doc: `doesn't` -> `does not`.
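
   For readers following along, a minimal sketch of where the two ids surface
   in the `ForeachWriter` lifecycle (the class name and the println sink are
   illustrative, not from this PR):

   ```scala
   import org.apache.spark.sql.ForeachWriter

   class LoggingWriter extends ForeachWriter[String] {
     override def open(partitionId: Long, epochId: Long): Boolean = {
       // Each micro-batch partition arrives tagged with these two ids, but per
       // the corrected note, the same (partitionId, epochId) pair is not
       // guaranteed to carry the same rows after a failure, so it must not be
       // used as a deduplication key.
       println(s"open: partition=$partitionId epoch=$epochId")
       true // returning true asks Spark to process this partition
     }
     override def process(value: String): Unit = println(value)
     override def close(errorOrNull: Throwable): Unit = {
       // Runs whenever open() returned successfully, unless the process crashed.
     }
   }
   ```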


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`

2019-08-11 Thread GitBox
HyukjinKwon commented on a change in pull request #25408: [SPARK-28687][SQL] 
Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`
URL: https://github.com/apache/spark/pull/25408#discussion_r312772428
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala
 ##
 @@ -455,6 +456,22 @@ object DateTimeUtils {
     (MICROSECONDS.toSeconds(localTimestamp(microsec, timeZone)) % 60).toInt
   }
 
+  /**
+   * Returns seconds, including fractional parts, multiplied by 1000. The timestamp
 
 Review comment:
   @MaxGekk, if 
https://github.com/apache/spark/pull/25408#discussion_r312748606 matters, we 
could use 1,000 or 1,000,000, I believe.
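
   To make the wording concrete (illustrative values and assumed semantics,
   not taken from the PR), "seconds, including fractional parts, multiplied by
   1,000" would behave like:

   ```scala
   // Seconds-of-minute field together with its fractional part, e.g. 17.123456 s.
   val secondsWithFraction = BigDecimal("17.123456")
   val milliseconds = secondsWithFraction * 1000     // 17123.456000
   val microseconds = secondsWithFraction * 1000000  // 17123456.000000
   ```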


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`

2019-08-11 Thread GitBox
HyukjinKwon commented on a change in pull request #25408: [SPARK-28687][SQL] 
Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`
URL: https://github.com/apache/spark/pull/25408#discussion_r312772288
 
 

 ##
 File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala
 ##
 @@ -1876,3 +1930,22 @@ case class Decade(child: Expression) extends UnaryExpression with ImplicitCastIn
     defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)")
   }
 }
+
+case class Epoch(child: Expression, timeZoneId: Option[String] = None)
+  extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression {
+
+  override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType)
+  override def dataType: DataType = DecimalType(20, 6)
 
 Review comment:
   @MaxGekk, out of curiosity, why is it `DecimalType(20, 6)`?
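
   One plausible reading (my assumption, not something stated in the PR): the
   epoch is microseconds divided by 10^6 kept to 6 fractional digits, and the
   largest Long-backed timestamp needs at most 13 integer digits of seconds, so
   precision 20 with scale 6 covers the whole representable range:

   ```scala
   val maxMicros   = Long.MaxValue          // 9223372036854775807 microseconds
   val maxSeconds  = maxMicros / 1000000L   // 9223372036854, i.e. 13 digits
   val totalDigits = maxSeconds.toString.length + 6 // 13 integer + 6 fractional
   println(totalDigits) // 19, which fits within a precision of 20
   ```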


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ChenjunZou commented on a change in pull request #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator field leak

2019-08-11 Thread GitBox
ChenjunZou commented on a change in pull request #23083: [SPARK-26114][CORE] 
ExternalSorter's readingIterator field leak
URL: https://github.com/apache/spark/pull/23083#discussion_r312770904
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/util/CompletionIterator.scala
 ##
 @@ -25,11 +25,14 @@ private[spark]
 abstract class CompletionIterator[ +A, +I <: Iterator[A]](sub: I) extends Iterator[A] {
 
   private[this] var completed = false
-  def next(): A = sub.next()
+  private[this] var iter = sub
+  def next(): A = iter.next()
   def hasNext: Boolean = {
-    val r = sub.hasNext
+    val r = iter.hasNext
     if (!r && !completed) {
       completed = true
+      // reassign to release resources of highly resource consuming iterators early
+      iter = Iterator.empty.asInstanceOf[I]
 
 Review comment:
   Thanks, szhem :)
   Your UT explains it all.
   At first I misunderstood `sub` as `CompletionIterator(val sub)`.
   It is hidden, well done!
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] zhengruifeng commented on a change in pull request #25383: [SPARK-13677][ML] Implement Tree-Based Feature Transformation for ML

2019-08-11 Thread GitBox
zhengruifeng commented on a change in pull request #25383: [SPARK-13677][ML] 
Implement Tree-Based Feature Transformation for ML
URL: https://github.com/apache/spark/pull/25383#discussion_r312771512
 
 

 ##
 File path: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala
 ##
 @@ -455,7 +508,19 @@ private[ml] object GBTClassifierParams {
     Array("logistic").map(_.toLowerCase(Locale.ROOT))
 }
 
-private[ml] trait GBTClassifierParams extends GBTParams with HasVarianceImpurity {
+private[ml] trait GBTClassifierParams extends GBTParams with HasVarianceImpurity
+  with ProbabilisticClassifierParams {
+
+  override protected def validateAndTransformSchema(
 
 Review comment:
   Good point, I will look into it.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`

2019-08-11 Thread GitBox
HyukjinKwon commented on a change in pull request #25408: [SPARK-28687][SQL] 
Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`
URL: https://github.com/apache/spark/pull/25408#discussion_r312771506
 
 

 ##
 File path: sql/core/src/test/resources/sql-tests/inputs/pgSQL/date.sql
 ##
 @@ -228,9 +228,9 @@ SELECT f1 - date '2000-01-01' AS `Days From 2K` FROM DATE_TBL;
 -- test extract!
 --
 -- epoch
---
--- SELECT EXTRACT(EPOCH FROM DATE'1970-01-01'); --  0
--- SELECT EXTRACT(EPOCH FROM TIMESTAMP   '1970-01-01'); --  0
 
 Review comment:
   Seems fixed as of https://github.com/apache/spark/pull/25357. Does this still fail?


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org




[GitHub] [spark] SparkQA commented on issue #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs

2019-08-11 Thread GitBox
SparkQA commented on issue #25368: [SPARK-28635][SQL] create CatalogManager to 
track registered v2 catalogs
URL: https://github.com/apache/spark/pull/25368#issuecomment-520287517
 
 
   **[Test build #108946 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108946/testReport)**
 for PR 25368 at commit 
[`b07790d`](https://github.com/apache/spark/commit/b07790d133346d24ef92695ae1a61ad755e988cf).


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25368: [SPARK-28635][SQL] create 
CatalogManager to track registered v2 catalogs
URL: https://github.com/apache/spark/pull/25368#issuecomment-520287324
 
 
   Merged build finished. Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] AmplabJenkins removed a comment on issue #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs

2019-08-11 Thread GitBox
AmplabJenkins removed a comment on issue #25368: [SPARK-28635][SQL] create 
CatalogManager to track registered v2 catalogs
URL: https://github.com/apache/spark/pull/25368#issuecomment-520287328
 
 
   Test PASSed.
   Refer to this link for build results (access rights to CI server needed): 
   
https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14019/
   Test PASSed.


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] HyukjinKwon commented on a change in pull request #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone

2019-08-11 Thread GitBox
HyukjinKwon commented on a change in pull request #25409: [SPARK-28414][WEBUI] 
UI updates to show resource info in Standalone
URL: https://github.com/apache/spark/pull/25409#discussion_r312770930
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala
 ##
 @@ -242,6 +243,22 @@ private[deploy] class Worker(
           System.exit(1)
         }
     }
+    resources.foreach { case (rName, _) =>
 
 Review comment:
   nit:
   
   ```scala
   resources.keys.foreach { rName =>
     resourcesUsed(rName) = new ResourceInformation(rName, Array.empty[String])
   }
   ```


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] ChenjunZou commented on a change in pull request #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator field leak

2019-08-11 Thread GitBox
ChenjunZou commented on a change in pull request #23083: [SPARK-26114][CORE] 
ExternalSorter's readingIterator field leak
URL: https://github.com/apache/spark/pull/23083#discussion_r312770904
 
 

 ##
 File path: core/src/main/scala/org/apache/spark/util/CompletionIterator.scala
 ##
 @@ -25,11 +25,14 @@ private[spark]
 abstract class CompletionIterator[ +A, +I <: Iterator[A]](sub: I) extends Iterator[A] {
 
   private[this] var completed = false
-  def next(): A = sub.next()
+  private[this] var iter = sub
+  def next(): A = iter.next()
   def hasNext: Boolean = {
-    val r = sub.hasNext
+    val r = iter.hasNext
     if (!r && !completed) {
       completed = true
+      // reassign to release resources of highly resource consuming iterators early
+      iter = Iterator.empty.asInstanceOf[I]
 
 Review comment:
   Thanks, szhem :)
   Your UT explains it all.
   At first I misunderstood `sub` as `CompletionIterator(val sub)`.
   BTW, `sub` is absolutely not a strong reference type; do you know what 
`sub`'s reference type is?
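
   For what it is worth, a stripped-down, self-contained sketch of the pattern
   (not the actual Spark class): both the constructor parameter and the `iter`
   field are ordinary strong JVM references, which is exactly why the explicit
   reassignment is needed before the GC can reclaim the drained iterator:

   ```scala
   class MiniCompletionIterator[A](private var iter: Iterator[A])(completion: () => Unit)
       extends Iterator[A] {
     private var completed = false
     def next(): A = iter.next()
     def hasNext: Boolean = {
       val r = iter.hasNext
       if (!r && !completed) {
         completed = true
         iter = Iterator.empty // drop the strong reference so the GC can act
         completion()
       }
       r
     }
   }

   val it = new MiniCompletionIterator(Iterator(1, 2, 3))(() => println("done"))
   println(it.sum) // prints "done" during the final hasNext, then 6
   ```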
   


This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org


