[GitHub] [spark] sandeep-katta commented on issue #25399: [SPARK-28670][SQL] create function should thrown Exception if the resource is not found
sandeep-katta commented on issue #25399: [SPARK-28670][SQL] create function should thrown Exception if the resource is not found
URL: https://github.com/apache/spark/pull/25399#issuecomment-520306332

**Hive**

In both cases (temporary and permanent), query execution fails.

**case i: Temporary function**

jdbc:hive2://vm1:21066/> create temporary function addm AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/AddDoublesUDF1.jar';
INFO : Executing command(queryId=omm_20190812133851_e58dd117-e8b1-40b6-8659-5ad14eddfdd6): create temporary function addm AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/AddDoublesUDF1.jar'--0; Current sessionId=dcdc1020-3f73-4af2-95db-e834abda2020
**ERROR : File does not exist: hdfs://hacluster/user/AddDoublesUDF1.jar**
Error: Error while processing statement: FAILED: Execution Error, return code -101 from **org.apache.hadoop.hive.ql.exec.FunctionTask. java.io.FileNotFoundException:** File does not exist: hdfs://hacluster/user/AddDoublesUDF1.jar (state=08S01,code=-101)

**case ii: Permanent function**

jdbc:hive2://vm1:21066/> create function addm AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/AddDoublesUDF1.jar';
INFO : Executing command(queryId=omm_20190812133902_54e39039-b678-493e-93c2-8c09ce5bcfc0): create function addm AS 'com.huawei.bigdata.hive.example.udf.AddDoublesUDF' using jar 'hdfs://hacluster/user/AddDoublesUDF1.jar'--0; Current sessionId=dcdc1020-3f73-4af2-95db-e834abda2020
INFO : Starting task [Stage-0:FUNC] in serial mode
**ERROR : File does not exist: hdfs://hacluster/user/AddDoublesUDF1.jar**
ERROR : Failed to register default.addm using class com.huawei.bigdata.hive.example.udf.AddDoublesUDF
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.FunctionTask (state=08S01,code=1)

**Presto:**

Presto has no concept of temporary or permanent functions. The user needs to implement the UDF as a plugin, put it in the plugin folder, and restart the Presto server. Details about Presto UDFs are [here](https://www.qubole.com/blog/plugging-in-presto-udfs/)

This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
With regards, Apache Git Services
-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
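The Hive behaviour reported in this comment (fail fast with a `FileNotFoundException` when the jar is missing) could be sketched as a check that runs before a function is registered. This is only an illustration, not Spark's or Hive's actual implementation: the class `ResourceCheck`, the method `requireResource`, and the use of a local filesystem path in place of an `hdfs://` URI are all assumptions made to keep the example self-contained.

```java
import java.io.FileNotFoundException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/** Hypothetical helper: fail fast when a function's resource is missing. */
public class ResourceCheck {

    /**
     * Returns the path if it exists; otherwise throws, mirroring the
     * "File does not exist" error that Hive reports in the logs.
     * Assumption: a local path stands in for the hdfs:// URI.
     */
    public static Path requireResource(String location) throws FileNotFoundException {
        Path path = Paths.get(location);
        if (!Files.exists(path)) {
            throw new FileNotFoundException("File does not exist: " + location);
        }
        return path;
    }

    public static void main(String[] args) throws Exception {
        // Create a temporary stand-in for the UDF jar, check it, then delete it.
        Path jar = Files.createTempFile("AddDoublesUDF", ".jar");
        System.out.println("found: " + requireResource(jar.toString()));
        Files.delete(jar);
        try {
            requireResource(jar.toString());
        } catch (FileNotFoundException e) {
            System.out.println("rejected: " + e.getMessage());
        }
    }
}
```

Running the check at `CREATE FUNCTION` time, rather than at first use, is the design point under discussion: the user sees the error on the statement that introduced the bad path.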
[GitHub] [spark] dilipbiswal commented on issue #25331: [SPARK-27768][SQL] Infinity, -Infinity, NaN should be recognized in a case insensitive manner.
dilipbiswal commented on issue #25331: [SPARK-27768][SQL] Infinity, -Infinity, NaN should be recognized in a case insensitive manner. URL: https://github.com/apache/spark/pull/25331#issuecomment-520306306 gentle ping @dongjoon-hyun
[GitHub] [spark] AmplabJenkins removed a comment on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark
AmplabJenkins removed a comment on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark URL: https://github.com/apache/spark/pull/24936#issuecomment-510987092 Can one of the admins verify this patch?
[GitHub] [spark] SparkQA commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark
SparkQA commented on issue #24936: [SPARK-24634][SS] Add a new metric regarding number of rows later than watermark URL: https://github.com/apache/spark/pull/24936#issuecomment-520305959 **[Test build #108952 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108952/testReport)** for PR 24936 at commit [`54d159f`](https://github.com/apache/spark/commit/54d159fec0203f4edf615c8fd552df6c1f0b604f).
[GitHub] [spark] AmplabJenkins removed a comment on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .
AmplabJenkins removed a comment on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication . URL: https://github.com/apache/spark/pull/25201#issuecomment-520304855 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108948/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .
AmplabJenkins removed a comment on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication . URL: https://github.com/apache/spark/pull/25201#issuecomment-520304851 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .
SparkQA removed a comment on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication . URL: https://github.com/apache/spark/pull/25201#issuecomment-520294055 **[Test build #108948 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108948/testReport)** for PR 25201 at commit [`ac87ffc`](https://github.com/apache/spark/commit/ac87ffc95e0f95c147a725396447518811472c8a).
[GitHub] [spark] HyukjinKwon closed pull request #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor
HyukjinKwon closed pull request #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor URL: https://github.com/apache/spark/pull/25411
[GitHub] [spark] AmplabJenkins commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .
AmplabJenkins commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication . URL: https://github.com/apache/spark/pull/25201#issuecomment-520304855 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108948/ Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .
AmplabJenkins commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication . URL: https://github.com/apache/spark/pull/25201#issuecomment-520304851 Merged build finished. Test FAILed.
[GitHub] [spark] SparkQA commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .
SparkQA commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication . URL: https://github.com/apache/spark/pull/25201#issuecomment-520304737 **[Test build #108948 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108948/testReport)** for PR 25201 at commit [`ac87ffc`](https://github.com/apache/spark/commit/ac87ffc95e0f95c147a725396447518811472c8a).
* This patch **fails Spark unit tests**.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] jzhuge commented on a change in pull request #25372: [SPARK-28640][SQL] Only give warning when session catalog is not defined
jzhuge commented on a change in pull request #25372: [SPARK-28640][SQL] Only give warning when session catalog is not defined
URL: https://github.com/apache/spark/pull/25372#discussion_r312784188

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalog/v2/LookupCatalog.scala ##
@@ -62,6 +62,9 @@ trait LookupCatalog extends Logging {
     try {
       Some(lookupCatalog(SESSION_CATALOG_NAME))
     } catch {
+      case _: CatalogNotFoundException =>
+        logWarning("Session catalog is not defined")
+        None

Review comment:
@dongjoon-hyun Thanks for the review. Your command line is not the case I tried to fix in the PR. In your case, the stack trace is helpful.

It seems that the current master has session catalog defined by default, so here is the command line to reproduce my case:
```
$ bin/spark-shell --master 'local[*]' --conf spark.sql.catalog.session=
...
Spark context available as 'sc' (master = local[*], app id = local-1565588237201).
Spark session available as 'spark'.
...
scala> spark.sessionState.analyzer.sessionCatalog
...
2019-08-11 22:37:24,216 ERROR [main] hive.HiveSessionStateBuilder$$anon$1 (Logging.scala:logError(94)) - Cannot load v2 session catalog
org.apache.spark.SparkException: Cannot find catalog plugin class for catalog 'session':
	at org.apache.spark.sql.catalog.v2.Catalogs.load(Catalogs.java:81)
...
res0: Option[org.apache.spark.sql.catalog.v2.CatalogPlugin] = None
```
Here the stack trace does not add more information. And I am concerned that if any rule uses session catalog, we will see this long stack trace again and again.
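The pattern proposed in the diff (treat "catalog not found" as an expected condition, log a single warning line without a stack trace, and return an empty result) might look like the following in plain Java. This is a sketch under stated assumptions, not the Spark code: `SessionCatalogLookup`, its nested `CatalogNotFoundException`, and the `loader` parameter are hypothetical stand-ins for `LookupCatalog`, Spark's exception class, and `lookupCatalog`.

```java
import java.util.Optional;
import java.util.function.Function;
import java.util.logging.Logger;

/** Hypothetical stand-in for the LookupCatalog trait discussed above. */
public class SessionCatalogLookup {
    private static final Logger LOG =
        Logger.getLogger(SessionCatalogLookup.class.getName());

    /** Hypothetical stand-in for Spark's CatalogNotFoundException. */
    public static class CatalogNotFoundException extends RuntimeException {
        public CatalogNotFoundException(String message) {
            super(message);
        }
    }

    /**
     * Looks up a catalog by name using the supplied loader.
     * A missing catalog is expected, so it is logged as a one-line warning
     * (no stack trace) and mapped to Optional.empty().
     * In this sketch, any other exception propagates to the caller.
     */
    public static <T> Optional<T> lookup(String name, Function<String, T> loader) {
        try {
            return Optional.of(loader.apply(name));
        } catch (CatalogNotFoundException e) {
            LOG.warning("Session catalog is not defined");
            return Optional.empty();
        }
    }
}
```

Catching only the specific exception type keeps the log short for the expected case while leaving unexpected failures visible.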
[GitHub] [spark] AmplabJenkins removed a comment on issue #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11
AmplabJenkins removed a comment on issue #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11 URL: https://github.com/apache/spark/pull/25414#issuecomment-520303874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14023/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11
AmplabJenkins removed a comment on issue #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11 URL: https://github.com/apache/spark/pull/25414#issuecomment-520303872 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11
SparkQA commented on issue #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11 URL: https://github.com/apache/spark/pull/25414#issuecomment-520304097 **[Test build #108951 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108951/testReport)** for PR 25414 at commit [`978a21a`](https://github.com/apache/spark/commit/978a21af5250c31525a4ba2fab96175a1e93871b).
[GitHub] [spark] HyukjinKwon commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor
HyukjinKwon commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor URL: https://github.com/apache/spark/pull/25411#issuecomment-520303909 Merged to master.
[GitHub] [spark] AmplabJenkins commented on issue #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11
AmplabJenkins commented on issue #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11 URL: https://github.com/apache/spark/pull/25414#issuecomment-520303872 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11
AmplabJenkins commented on issue #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11 URL: https://github.com/apache/spark/pull/25414#issuecomment-520303874 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14023/ Test PASSed.
[GitHub] [spark] HyukjinKwon commented on issue #25405: [SPARK-28685][SQL][TEST] Test HMS 2.0.0+ in VersionsSuite/HiveClientSuites on JDK 11
HyukjinKwon commented on issue #25405: [SPARK-28685][SQL][TEST] Test HMS 2.0.0+ in VersionsSuite/HiveClientSuites on JDK 11
URL: https://github.com/apache/spark/pull/25405#issuecomment-520303800

@shaneknapp .. actually, adding `test-java11` to the PR builder has turned out to be rather important now.. @wangyum, mind filing a JIRA?

For some context as FYI, for JDK 11 support, I asked about the feasibility of a Hive 2.3.6 release in [this Hive dev thread](http://mail-archives.apache.org/mod_mbox/hive-dev/201908.mbox/%3CCANQiJeV3VM0iVp%2BgTwKPpx9dHeXe0BcLAicmWU5EPDOptb%2B_%2BQ%40mail.gmail.com%3E). Thankfully, the response seems positive. @wangyum quickly started working on that and @alanfgates (from Hive) is actively cooperating - thanks again.

The two communities are therefore cooperating on [SPARK-28684](https://issues.apache.org/jira/browse/SPARK-28684) and [HIVE-22096](https://issues.apache.org/jira/browse/HIVE-22096). It might be much easier if we could test JDK 11 in the PR builder on the Spark side.
[GitHub] [spark] wangyum opened a new pull request #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11
wangyum opened a new pull request #25414: [SPARK-28688][SQL][TEST] Skip test `read hive materialized view` since Hive 3.0 in VersionsSuite.scala on JDK 11
URL: https://github.com/apache/spark/pull/25414

## What changes were proposed in this pull request?

This PR skips the test `read hive materialized view` for Hive 3.0 and later in `VersionsSuite.scala` on JDK 11, because [HIVE-19383](https://issues.apache.org/jira/browse/HIVE-19383) added [ArrayList$SubList](https://github.com/apache/hive/blob/ae4df627952610dbec029b099f0964908b3a4f25/ql/src/java/org/apache/hadoop/hive/ql/exec/SerializationUtilities.java#L383), which is incompatible with JDK 11:
```java
java.lang.RuntimeException: java.lang.NoSuchFieldException: parentOffset
	at org.apache.hadoop.hive.ql.exec.SerializationUtilities$ArrayListSubListSerializer.<init>(SerializationUtilities.java:389)
	at org.apache.hadoop.hive.ql.exec.SerializationUtilities$1.create(SerializationUtilities.java:235)
...
```
![image](https://issues.apache.org/jira/secure/attachment/12977250/12977250_screenshot-2.png)
![image](https://issues.apache.org/jira/secure/attachment/12977249/12977249_screenshot-1.png)

## How was this patch tested?

manual tests

**Test on JDK 11**:
```
...
[info] - 2.3: sql read hive materialized view (1 second, 253 milliseconds)
...
[info] - 3.0: sql read hive materialized view !!! CANCELED !!! (31 milliseconds)
[info]   org.apache.commons.lang3.SystemUtils.isJavaVersionAtLeast(JAVA_9) was true, and "[3.0]" did not equal "[2.3]" (VersionsSuite.scala:624)
...
[info] - 3.1: sql read hive materialized view !!! CANCELED !!! (1 millisecond)
[info]   org.apache.commons.lang3.SystemUtils.isJavaVersionAtLeast(JAVA_9) was true, and "[3.1]" did not equal "[2.3]" (VersionsSuite.scala:624)
...
```

**Test on JDK 1.8**:
```
...
[info] - 2.3: sql read hive materialized view (1 second, 444 milliseconds)
...
[info] - 3.0: sql read hive materialized view (3 seconds, 100 milliseconds)
...
[info] - 3.1: sql read hive materialized view (2 seconds, 941 milliseconds)
...
```
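The CANCELED messages in the JDK 11 output suggest the gating rule: on Java 9 or later the test runs only against Hive 2.3, and on older JDKs it always runs. The sketch below restates that rule in plain Java. It is inferred from the log lines only, not copied from `VersionsSuite.scala`; the class `JdkGate` and its methods are hypothetical, and the JDK version is read from the standard `java.specification.version` system property rather than through `SystemUtils`.

```java
/** Hypothetical restatement of the skip rule inferred from the test output. */
public class JdkGate {

    /** Parses a java.specification.version value: "1.8" gives 8, "11" gives 11. */
    public static int featureVersion(String specVersion) {
        // Pre-Java 9 versions are written as "1.x"; drop the "1." prefix.
        String v = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = v.indexOf('.');
        return Integer.parseInt(dot < 0 ? v : v.substring(0, dot));
    }

    /**
     * Assumed rule: on Java 9+ the test only runs against Hive 2.3;
     * on older JDKs it runs for every Hive version.
     */
    public static boolean shouldRun(String hiveVersion, String specVersion) {
        boolean java9OrLater = featureVersion(specVersion) >= 9;
        return !java9OrLater || hiveVersion.equals("2.3");
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        for (String hive : new String[] {"2.3", "3.0", "3.1"}) {
            System.out.println(hive + ": " + (shouldRun(hive, spec) ? "run" : "cancel"));
        }
    }
}
```

Cancelling rather than failing keeps the JDK 11 build green while the incompatibility is addressed on the Hive side.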
[GitHub] [spark] AmplabJenkins removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor
AmplabJenkins removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor URL: https://github.com/apache/spark/pull/25411#issuecomment-520301172 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14022/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor
AmplabJenkins removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor URL: https://github.com/apache/spark/pull/25411#issuecomment-520301169 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor
SparkQA commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor URL: https://github.com/apache/spark/pull/25411#issuecomment-520301164 Kubernetes integration test status success URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/14022/
[GitHub] [spark] AmplabJenkins commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor
AmplabJenkins commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor URL: https://github.com/apache/spark/pull/25411#issuecomment-520301169 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor
AmplabJenkins commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor URL: https://github.com/apache/spark/pull/25411#issuecomment-520301172 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14022/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error
AmplabJenkins removed a comment on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error URL: https://github.com/apache/spark/pull/25333#issuecomment-520299294 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108943/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error
AmplabJenkins removed a comment on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error URL: https://github.com/apache/spark/pull/25333#issuecomment-520299290 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error
AmplabJenkins commented on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error URL: https://github.com/apache/spark/pull/25333#issuecomment-520299290 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error
AmplabJenkins commented on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error URL: https://github.com/apache/spark/pull/25333#issuecomment-520299294 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108943/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error
SparkQA removed a comment on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error URL: https://github.com/apache/spark/pull/25333#issuecomment-520279181 **[Test build #108943 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108943/testReport)** for PR 25333 at commit [`757491e`](https://github.com/apache/spark/commit/757491e3433fcd68d852abcfff26dfe07a2e07f4).
[GitHub] [spark] SparkQA commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor
SparkQA commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor URL: https://github.com/apache/spark/pull/25411#issuecomment-520299074 Kubernetes integration test starting URL: https://amplab.cs.berkeley.edu/jenkins/job/testing-k8s-prb-make-spark-distribution-unified/14022/
[GitHub] [spark] SparkQA commented on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error
SparkQA commented on issue #25333: [SPARK-28597][SS] Add config to retry spark streaming's meta log when it met error URL: https://github.com/apache/spark/pull/25333#issuecomment-520299105 **[Test build #108943 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108943/testReport)** for PR 25333 at commit [`757491e`](https://github.com/apache/spark/commit/757491e3433fcd68d852abcfff26dfe07a2e07f4). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor
AmplabJenkins removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor URL: https://github.com/apache/spark/pull/25411#issuecomment-520298813 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor
AmplabJenkins removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor URL: https://github.com/apache/spark/pull/25411#issuecomment-520298817 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108950/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor
AmplabJenkins commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor URL: https://github.com/apache/spark/pull/25411#issuecomment-520298817 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108950/ Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor
SparkQA removed a comment on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor URL: https://github.com/apache/spark/pull/25411#issuecomment-520297350 **[Test build #108950 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108950/testReport)** for PR 25411 at commit [`bbc2e70`](https://github.com/apache/spark/commit/bbc2e708844fbcb18eacbfc404d75f67f17818d3).
[GitHub] [spark] AmplabJenkins commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor
AmplabJenkins commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor URL: https://github.com/apache/spark/pull/25411#issuecomment-520298813 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor
SparkQA commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor URL: https://github.com/apache/spark/pull/25411#issuecomment-520298780 **[Test build #108950 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108950/testReport)** for PR 25411 at commit [`bbc2e70`](https://github.com/apache/spark/commit/bbc2e708844fbcb18eacbfc404d75f67f17818d3). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25360: [SPARK-28280][PYTHON][SQL][TESTS][FOLLOW-UP] Add UDF cases into group by clause in 'udf-group-by.sql'
HyukjinKwon commented on a change in pull request #25360: [SPARK-28280][PYTHON][SQL][TESTS][FOLLOW-UP] Add UDF cases into group by clause in 'udf-group-by.sql' URL: https://github.com/apache/spark/pull/25360#discussion_r312779019

## File path: sql/core/src/test/resources/sql-tests/inputs/udf/udf-group-by.sql ##
@@ -20,29 +20,25 @@ SELECT 'foo', COUNT(udf(a)) FROM testData GROUP BY 1;
 SELECT 'foo' FROM testData WHERE a = 0 GROUP BY udf(1);
 -- Aggregate grouped by literals (hash aggregate).
-SELECT 'foo', udf(APPROX_COUNT_DISTINCT(udf(a))) FROM testData WHERE a = 0 GROUP BY 1;
+SELECT 'foo', udf(APPROX_COUNT_DISTINCT(udf(a))) FROM testData WHERE a = 0 GROUP BY udf(1);
 -- Aggregate grouped by literals (sort aggregate).
-SELECT 'foo', MAX(STRUCT(udf(a))) FROM testData WHERE a = 0 GROUP BY 1;
+SELECT 'foo', MAX(STRUCT(udf(a))) FROM testData WHERE a = 0 GROUP BY udf(1);
 -- Aggregate with complex GroupBy expressions.
 SELECT udf(a + b), udf(COUNT(b)) FROM testData GROUP BY a + b;
 SELECT udf(a + 2), udf(COUNT(b)) FROM testData GROUP BY a + 1;
-
--- [SPARK-28445] Inconsistency between Scala and Python/Panda udfs when groupby with udf() is used
--- The following query will make Scala UDF work, but Python and Pandas udfs will fail with an AnalysisException.
--- The query should be added after SPARK-28445.
--- SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 1);
+SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 1);

Review comment:
@skonto, everything looks fine except this one. Let's fix it and I'll get this in.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25371: [SPARK-28393][SQL][PYTHON][TESTS] Convert and port 'pgSQL/join.sql' into UDF test base
HyukjinKwon commented on a change in pull request #25371: [SPARK-28393][SQL][PYTHON][TESTS] Convert and port 'pgSQL/join.sql' into UDF test base URL: https://github.com/apache/spark/pull/25371#discussion_r312778802

## File path: sql/core/src/test/resources/sql-tests/inputs/udf/pgSQL/udf-join.sql ##
@@ -0,0 +1,2081 @@
+--
+-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+--
+--
+-- JOIN
+-- Test JOIN clauses
+-- https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/join.sql
+--
+-- This test file was converted from pgSQL/join.sql.
+
+CREATE OR REPLACE TEMPORARY VIEW INT4_TBL AS SELECT * FROM
+  (VALUES (0), (123456), (-123456), (2147483647), (-2147483647))
+  AS v(f1);
+CREATE OR REPLACE TEMPORARY VIEW INT8_TBL AS SELECT * FROM
+  (VALUES
+    (123, 456),
+    (123, 4567890123456789),
+    (4567890123456789, 123),
+    (4567890123456789, 4567890123456789),
+    (4567890123456789, -4567890123456789))
+  AS v(q1, q2);
+CREATE OR REPLACE TEMPORARY VIEW FLOAT8_TBL AS SELECT * FROM
+  (VALUES (0.0), (1004.30), (-34.84),
+    (cast('1.2345678901234e+200' as double)), (cast('1.2345678901234e-200' as double)))
+  AS v(f1);
+CREATE OR REPLACE TEMPORARY VIEW TEXT_TBL AS SELECT * FROM
+  (VALUES ('doh!'), ('hi de ho neighbor'))
+  AS v(f1);
+CREATE OR REPLACE TEMPORARY VIEW tenk2 AS SELECT * FROM tenk1;
+
+CREATE TABLE J1_TBL (
+  i integer,
+  j integer,
+  t string
+) USING parquet;
+
+CREATE TABLE J2_TBL (
+  i integer,
+  k integer
+) USING parquet;
+
+
+INSERT INTO J1_TBL VALUES (1, 4, 'one');
+INSERT INTO J1_TBL VALUES (2, 3, 'two');
+INSERT INTO J1_TBL VALUES (3, 2, 'three');
+INSERT INTO J1_TBL VALUES (4, 1, 'four');
+INSERT INTO J1_TBL VALUES (5, 0, 'five');
+INSERT INTO J1_TBL VALUES (6, 6, 'six');
+INSERT INTO J1_TBL VALUES (7, 7, 'seven');
+INSERT INTO J1_TBL VALUES (8, 8, 'eight');
+INSERT INTO J1_TBL VALUES (0, NULL, 'zero');
+INSERT INTO J1_TBL VALUES (NULL, NULL, 'null');
+INSERT INTO J1_TBL VALUES (NULL, 0, 'zero');
+
+INSERT INTO J2_TBL VALUES (1, -1);
+INSERT INTO J2_TBL VALUES (2, 2);
+INSERT INTO J2_TBL VALUES (3, -3);
+INSERT INTO J2_TBL VALUES (2, 4);
+INSERT INTO J2_TBL VALUES (5, -5);
+INSERT INTO J2_TBL VALUES (5, -5);
+INSERT INTO J2_TBL VALUES (0, NULL);
+INSERT INTO J2_TBL VALUES (NULL, NULL);
+INSERT INTO J2_TBL VALUES (NULL, 0);
+
+-- [SPARK-20856] Do not need onerow because it only used for test statement using nested joins
+-- useful in some tests below
+-- create temp table onerow();
+-- insert into onerow default values;
+-- analyze onerow;
+
+
+--
+-- CORRELATION NAMES
+-- Make sure that table/column aliases are supported
+-- before diving into more complex join syntax.
+--
+
+SELECT udf('') AS `xxx`, udf(i), udf(j), udf(t)

Review comment:
@huaxingao, it seems this file adds `udf(...)` just once in almost every possible place. Can we use other combinations in general? For instance,

```
udf(...)
udf(udf(...))
```

or

```
on (udf(...) = ...)
on (... = udf(...))
on (udf(...) = udf(udf(...)))
```
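The combinations suggested in this review comment can be enumerated mechanically. The following is only a sketch of that idea in Python; `wrap` and `join_condition_variants` are hypothetical helper names that do not exist in the Spark test suite, and the column references are illustrative.

```python
def wrap(expr, depth):
    """Wrap a SQL expression in `depth` nested udf(...) calls."""
    for _ in range(depth):
        expr = "udf(%s)" % expr
    return expr


def join_condition_variants(left, right):
    """Return the three ON-clause shapes named in the review comment."""
    return [
        "on (%s = %s)" % (wrap(left, 1), right),            # udf on the left only
        "on (%s = %s)" % (left, wrap(right, 1)),            # udf on the right only
        "on (%s = %s)" % (wrap(left, 1), wrap(right, 2)),   # udf vs. nested udf
    ]


# Hypothetical usage with the i columns of the two join tables.
for clause in join_condition_variants("J1_TBL.i", "J2_TBL.i"):
    print(clause)
```

Rotating through such variants, rather than adding a single `udf(...)` everywhere, is one way to read the reviewer's request; the actual choice of combinations is left to the PR author.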
[GitHub] [spark] AmplabJenkins removed a comment on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone
AmplabJenkins removed a comment on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone URL: https://github.com/apache/spark/pull/25409#issuecomment-520297336 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108945/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone
AmplabJenkins removed a comment on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone URL: https://github.com/apache/spark/pull/25409#issuecomment-520297334 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone
AmplabJenkins commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone URL: https://github.com/apache/spark/pull/25409#issuecomment-520297336 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/108945/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor
SparkQA commented on issue #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor URL: https://github.com/apache/spark/pull/25411#issuecomment-520297350 **[Test build #108950 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108950/testReport)** for PR 25411 at commit [`bbc2e70`](https://github.com/apache/spark/commit/bbc2e708844fbcb18eacbfc404d75f67f17818d3).
[GitHub] [spark] AmplabJenkins commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone
AmplabJenkins commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone URL: https://github.com/apache/spark/pull/25409#issuecomment-520297334 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone
SparkQA removed a comment on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone URL: https://github.com/apache/spark/pull/25409#issuecomment-520283851 **[Test build #108945 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108945/testReport)** for PR 25409 at commit [`fbb4382`](https://github.com/apache/spark/commit/fbb4382d282be5e299e64da88e8ead3bd1d45292).
[GitHub] [spark] viirya commented on a change in pull request #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor
viirya commented on a change in pull request #25411: [SPARK-28652][TESTS][K8S] Add python version check for executor URL: https://github.com/apache/spark/pull/25411#discussion_r312778434

## File path: resource-managers/kubernetes/integration-tests/tests/pyfiles.py ##
@@ -35,4 +36,11 @@
 # Begin of Python container checks
 version_check(sys.argv[1], 2 if sys.argv[1] == "python" else 3)
+# Check python executable at executors
+spark.catalog.registerFunction("getSysVer",
+                               lambda: "%d.%d" % sys.version_info[:2], StringType())
+[row] = spark.sql("SELECT getSysVer()").collect()
+driverVersion = "%d.%d" % sys.version_info[:2]

Review comment:
Yes.
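The quoted hunk stops right after `driverVersion` is computed, so the comparison itself is not shown. As a rough sketch of what such a check could look like, assuming the executor-side string is the single value returned by the `getSysVer` UDF, the following plain-Python version runs without a Spark cluster; `sys_ver` and `check_versions` are hypothetical names, not part of the PR.

```python
import sys


def sys_ver():
    """Return 'major.minor' for the interpreter that runs this function."""
    return "%d.%d" % sys.version_info[:2]


def check_versions(driver_version, executor_version):
    """Raise if the driver and executor 'major.minor' strings differ."""
    if driver_version != executor_version:
        raise RuntimeError(
            "Python version mismatch: driver %s, executor %s"
            % (driver_version, executor_version))
    return True


driver_version = sys_ver()
# Assumption: in the integration test, the second argument would be the
# value collected from the executor-side UDF rather than a local call.
check_versions(driver_version, sys_ver())
```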
[GitHub] [spark] SparkQA commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone
SparkQA commented on issue #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone URL: https://github.com/apache/spark/pull/25409#issuecomment-520297134 **[Test build #108945 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108945/testReport)** for PR 25409 at commit [`fbb4382`](https://github.com/apache/spark/commit/fbb4382d282be5e299e64da88e8ead3bd1d45292). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] dongjoon-hyun edited a comment on issue #25404: [SPARK-28683][BUILD] Upgrade Scala to 2.12.9
dongjoon-hyun edited a comment on issue #25404: [SPARK-28683][BUILD] Upgrade Scala to 2.12.9 URL: https://github.com/apache/spark/pull/25404#issuecomment-520296446 Hi, All. Let's reuse SPARK-28683 for 2.12.10. I reopened the JIRA issue for that.
[GitHub] [spark] dongjoon-hyun commented on issue #25404: [SPARK-28683][BUILD] Upgrade Scala to 2.12.9
dongjoon-hyun commented on issue #25404: [SPARK-28683][BUILD] Upgrade Scala to 2.12.9 URL: https://github.com/apache/spark/pull/25404#issuecomment-520296446 Hi, All. Let's reuse SPARK-28683 for 2.12.10. I reopened it.
[GitHub] [spark] imback82 commented on a change in pull request #25247: [SPARK-28319][SQL] Implement SHOW TABLES for Data Source V2 Tables
imback82 commented on a change in pull request #25247: [SPARK-28319][SQL] Implement SHOW TABLES for Data Source V2 Tables URL: https://github.com/apache/spark/pull/25247#discussion_r312777236

## File path: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2SQLSuite.scala ##
@@ -1700,6 +1704,126 @@ class DataSourceV2SQLSuite extends QueryTest with SharedSQLContext with BeforeAn
     }
   }
+  test("ShowTables: using v2 catalog") {
+    spark.sql("CREATE TABLE testcat.db.table_name (id bigint, data string) USING foo")
+    spark.sql("CREATE TABLE testcat.n1.n2.db.table_name (id bigint, data string) USING foo")
+
+    runShowTablesSql("SHOW TABLES FROM testcat.db", Seq(Row("db", "table_name")))
+
+    runShowTablesSql(
+      "SHOW TABLES FROM testcat.n1.n2.db",
+      Seq(Row("n1.n2.db", "table_name")))
+  }
+
+  test("ShowTables: using v2 catalog with a pattern") {
+    spark.sql("CREATE TABLE testcat.db.table (id bigint, data string) USING foo")
+    spark.sql("CREATE TABLE testcat.db.table_name_1 (id bigint, data string) USING foo")
+    spark.sql("CREATE TABLE testcat.db.table_name_2 (id bigint, data string) USING foo")
+    spark.sql("CREATE TABLE testcat.db2.table_name_2 (id bigint, data string) USING foo")
+
+    runShowTablesSql(
+      "SHOW TABLES FROM testcat.db",
+      Seq(
+        Row("db", "table"),
+        Row("db", "table_name_1"),
+        Row("db", "table_name_2")))
+
+    runShowTablesSql(
+      "SHOW TABLES FROM testcat.db LIKE '*name*'",
+      Seq(Row("db", "table_name_1"), Row("db", "table_name_2")))
+
+    runShowTablesSql(
+      "SHOW TABLES FROM testcat.db LIKE '*2'",
+      Seq(Row("db", "table_name_2")))
+  }
+
+  test("ShowTables: using v2 catalog, namespace doesn't exist") {
+    runShowTablesSql("SHOW TABLES FROM testcat.unknown", Seq())

Review comment:
@cloud-fan as far as I understand, throwing `NoSuchNamespaceException` is optional in v2:

```
/**
 * List the tables in a namespace from the catalog.
 *
 * If the catalog supports views, this must return identifiers for only tables and not views.
 *
 * @param namespace a multi-part namespace
 * @return an array of Identifiers for tables
 * @throws NoSuchNamespaceException If the namespace does not exist (optional).
 */
Identifier[] listTables(String[] namespace) throws NoSuchNamespaceException;
```

I can update `TestInMemoryTableCatalog` to throw `NoSuchNamespaceException` when none of the created tables belongs to the requested namespace. However, I am not sure this is the right approach, since a namespace can exist without any tables: in v1, you could run `CREATE DATABASE db` without creating any tables in `db`, although I don't think this scenario is supported in v2 yet. Please advise how this should be handled. Thanks!
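The concern raised here, that an empty namespace should not be confused with a missing one, can be illustrated with a small in-memory catalog. This is a hypothetical Python analogue written for illustration only; it is not Spark's `TestInMemoryTableCatalog`, and the class and method names below are assumptions. The key design choice is tracking namespaces separately from tables.

```python
class NoSuchNamespaceException(Exception):
    """Stand-in exception, defined here so the sketch is self-contained."""


class InMemoryCatalog:
    def __init__(self):
        self._namespaces = set()  # namespaces known to exist, even if empty
        self._tables = {}         # (namespace tuple, table name) -> schema

    def create_namespace(self, namespace):
        self._namespaces.add(tuple(namespace))

    def create_table(self, namespace, name, schema=None):
        ns = tuple(namespace)
        self._namespaces.add(ns)  # creating a table implies its namespace
        self._tables[(ns, name)] = schema

    def list_tables(self, namespace):
        ns = tuple(namespace)
        if ns not in self._namespaces:
            raise NoSuchNamespaceException(".".join(ns))
        return sorted(name for (table_ns, name) in self._tables if table_ns == ns)


catalog = InMemoryCatalog()
catalog.create_table(["db"], "table_name")
catalog.create_namespace(["empty_db"])
print(catalog.list_tables(["db"]))        # one table
print(catalog.list_tables(["empty_db"]))  # empty list, no exception
```

If namespaces were instead inferred only from existing tables, `empty_db` above would raise, which is the ambiguity the comment asks about.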
[GitHub] [spark] AmplabJenkins removed a comment on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view
AmplabJenkins removed a comment on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view URL: https://github.com/apache/spark/pull/25149#issuecomment-520295435 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14021/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view
AmplabJenkins removed a comment on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view URL: https://github.com/apache/spark/pull/25149#issuecomment-520295432 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on issue #24601: [SPARK-27702][K8S] Allow using some alternatives for service accounts
dongjoon-hyun commented on issue #24601: [SPARK-27702][K8S] Allow using some alternatives for service accounts URL: https://github.com/apache/spark/pull/24601#issuecomment-520295526

Sorry for being late, @Udbhav30. In that case, could you describe the test procedure in the `How was this patch tested?` section of the PR description? For example, how to create a new service account and how to use it?

> i am unable to simulate this from minikube as there will always be a default service account so i am not sure if i could write any test case for this.
[GitHub] [spark] AmplabJenkins commented on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view
AmplabJenkins commented on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view URL: https://github.com/apache/spark/pull/25149#issuecomment-520295432 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view
AmplabJenkins commented on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view URL: https://github.com/apache/spark/pull/25149#issuecomment-520295435 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14021/ Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view
dongjoon-hyun commented on a change in pull request #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view URL: https://github.com/apache/spark/pull/25149#discussion_r312776947 ## File path: sql/core/src/test/scala/org/apache/spark/sql/execution/SQLViewSuite.scala ## @@ -159,7 +159,9 @@ abstract class SQLViewSuite extends QueryTest with SQLTestUtils { Thread.currentThread().getContextClassLoader.getResource("data/files/employee.dat") assertNoSuchTable(s"""LOAD DATA LOCAL INPATH "$dataFilePath" INTO TABLE $viewName""") assertNoSuchTable(s"TRUNCATE TABLE $viewName") - assertNoSuchTable(s"SHOW CREATE TABLE $viewName") + intercept[AnalysisException] { Review comment: As you already know, we always need to check the actual error message, because checking only for `AnalysisException` can hide regressions.
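To illustrate the point about asserting on the message rather than only the exception type, here is a minimal sketch. The `intercept` helper, the `AnalysisException` class, and `showCreateTable` below are hypothetical stand-ins written for this example; they are not ScalaTest's `intercept` or Spark's classes, and only the message text is taken from the PR diff:

```java
/** Hypothetical test helper illustrating message checks; not ScalaTest or Spark code. */
final class InterceptSketch {

    /** Hypothetical stand-in for Spark's AnalysisException. */
    static class AnalysisException extends RuntimeException {
        AnalysisException(String message) {
            super(message);
        }
    }

    /** Runs the body and returns the thrown exception of the expected type, or fails. */
    static <T extends Throwable> T intercept(Class<T> type, Runnable body) {
        try {
            body.run();
        } catch (Throwable t) {
            if (type.isInstance(t)) {
                return type.cast(t);
            }
            throw new AssertionError("Expected " + type.getName() + " but got " + t, t);
        }
        throw new AssertionError("Expected " + type.getName() + " but nothing was thrown");
    }

    /** Hypothetical command under test: rejects temporary views with a specific message. */
    static void showCreateTable(String name, boolean isTemporaryView) {
        if (isTemporaryView) {
            throw new AnalysisException(
                "SHOW CREATE TABLE is not supported on a temporary view: " + name);
        }
    }

    public static void main(String[] args) {
        AnalysisException e =
            intercept(AnalysisException.class, () -> showCreateTable("v1", true));
        // Checking the text pins down WHICH analysis error occurred; a type-only
        // check would also pass if an unrelated AnalysisException were thrown.
        if (!e.getMessage().contains("is not supported on a temporary view")) {
            throw new AssertionError("Unexpected message: " + e.getMessage());
        }
    }
}
```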
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view
dongjoon-hyun commented on a change in pull request #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view URL: https://github.com/apache/spark/pull/25149#discussion_r312776831 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/command/tables.scala ## @@ -949,16 +949,22 @@ case class ShowCreateTableCommand(table: TableIdentifier) extends RunnableComman override def run(sparkSession: SparkSession): Seq[Row] = { val catalog = sparkSession.sessionState.catalog -val tableMetadata = catalog.getTableMetadata(table) - -// TODO: unify this after we unify the CREATE TABLE syntax for hive serde and data source table. -val stmt = if (DDLUtils.isDatasourceTable(tableMetadata)) { - showCreateDataSourceTable(tableMetadata) +if (catalog.isTemporaryTable(table)) { + throw new AnalysisException( +s"SHOW CREATE TABLE is not supported on a temporary view: ${table.identifier}") } else { - showCreateHiveTable(tableMetadata) -} + val tableMetadata = catalog.getTableMetadata(table) -Seq(Row(stmt)) + // TODO: unify this after we unify the Review comment: Hi, @wangyum . I know that this `TODO` was not introduced in this PR, but this is a good chance to turn it into an `IDed TODO`. It will help other contributors pick up this issue. Could you file a JIRA for this and use that JIRA ID here, please?
[GitHub] [spark] SparkQA commented on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view
SparkQA commented on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view URL: https://github.com/apache/spark/pull/25149#issuecomment-520294838 **[Test build #108949 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108949/testReport)** for PR 25149 at commit [`38cf574`](https://github.com/apache/spark/commit/38cf57471416e74769e88c04aaae288e1c4be309).
[GitHub] [spark] dongjoon-hyun commented on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view
dongjoon-hyun commented on issue #25149: [SPARK-28383][SQL] SHOW CREATE TABLE is not supported on a temporary view URL: https://github.com/apache/spark/pull/25149#issuecomment-520294719 Retest this please.
[GitHub] [spark] dongjoon-hyun commented on issue #25229: [SPARK-27900][K8s] Add jvm oom flag
dongjoon-hyun commented on issue #25229: [SPARK-27900][K8s] Add jvm oom flag URL: https://github.com/apache/spark/pull/25229#issuecomment-520294492 Thank you, @skonto !
[GitHub] [spark] SparkQA commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .
SparkQA commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication . URL: https://github.com/apache/spark/pull/25201#issuecomment-520294055 **[Test build #108948 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108948/testReport)** for PR 25201 at commit [`ac87ffc`](https://github.com/apache/spark/commit/ac87ffc95e0f95c147a725396447518811472c8a).
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`
dongjoon-hyun commented on a change in pull request #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()` URL: https://github.com/apache/spark/pull/25408#discussion_r312776086 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ## @@ -455,6 +456,22 @@ object DateTimeUtils { (MICROSECONDS.toSeconds(localTimestamp(microsec, timeZone)) % 60).toInt } + /** + * Returns seconds, including fractional parts, multiplied by 1000. The timestamp Review comment: +1 for @HyukjinKwon 's advice . My reasons are here, @MaxGekk . - https://github.com/apache/spark/pull/25408#discussion_r312775959
[GitHub] [spark] AmplabJenkins removed a comment on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .
AmplabJenkins removed a comment on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication . URL: https://github.com/apache/spark/pull/25201#issuecomment-520293856 Merged build finished. Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`
dongjoon-hyun commented on a change in pull request #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()` URL: https://github.com/apache/spark/pull/25408#discussion_r312776086 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ## @@ -455,6 +456,22 @@ object DateTimeUtils { (MICROSECONDS.toSeconds(localTimestamp(microsec, timeZone)) % 60).toInt } + /** + * Returns seconds, including fractional parts, multiplied by 1000. The timestamp Review comment: +1 for @HyukjinKwon . My reasons are here, @MaxGekk . - https://github.com/apache/spark/pull/25408#discussion_r312775959
[GitHub] [spark] AmplabJenkins removed a comment on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .
AmplabJenkins removed a comment on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication . URL: https://github.com/apache/spark/pull/25201#issuecomment-520293859 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14020/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .
AmplabJenkins commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication . URL: https://github.com/apache/spark/pull/25201#issuecomment-520293856 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication .
AmplabJenkins commented on issue #25201: [SPARK-28419][SQL] Enable SparkThriftServer support proxy user's authentication . URL: https://github.com/apache/spark/pull/25201#issuecomment-520293859 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14020/ Test PASSed.
[GitHub] [spark] dongjoon-hyun commented on a change in pull request #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`
dongjoon-hyun commented on a change in pull request #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()` URL: https://github.com/apache/spark/pull/25408#discussion_r312775959 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ## @@ -455,6 +456,22 @@ object DateTimeUtils { (MICROSECONDS.toSeconds(localTimestamp(microsec, timeZone)) % 60).toInt } + /** + * Returns seconds, including fractional parts, multiplied by 1 000. The timestamp Review comment: Let me ask it this way, @MaxGekk . 1. If you think this is better, why not the Java style `1_000_000`? (AFAIK, this has been supported since JDK7.) 2. Does Apache Spark have `1 000 000` or `1 000` anywhere in our code? 3. I don't find any explanation in this PR of why you are introducing this new style to the Apache Spark community. 4. Finally, this is the Apache Spark doc, not the `PostgreSQL` doc. A blind `copy` cannot justify the claim that this is better.
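For reference, the Java style mentioned in point 1 looks like the sketch below. Underscores between digits in numeric literals are part of the language since Java 7 and do not change the value. The class, constant, and method names are hypothetical and are not taken from Spark's `DateTimeUtils`; the sketch also assumes a UTC timestamp and ignores time zones, which the real method does not:

```java
/** Hypothetical illustration of underscore-separated literals; not Spark's DateTimeUtils. */
final class NumericLiteralSketch {
    // 1_000L and 1_000_000L are the same values as 1000L and 1000000L.
    static final long MILLIS_PER_SECOND = 1_000L;
    static final long MICROS_PER_SECOND = 1_000_000L;

    /**
     * Seconds of the minute, including the fractional part, multiplied by 1000.
     * Assumes microsSinceEpoch is in UTC (an assumption of this sketch).
     */
    static double milliseconds(long microsSinceEpoch) {
        // floorMod keeps the result non-negative for timestamps before the epoch.
        long microsOfMinute = Math.floorMod(microsSinceEpoch, 60 * MICROS_PER_SECOND);
        return microsOfMinute / 1_000.0;
    }

    public static void main(String[] args) {
        // One minute and 1.5 seconds after the epoch.
        System.out.println(milliseconds(61_500_000L));
    }
}
```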
[GitHub] [spark] SparkQA commented on issue #25354: [SPARK-28612][SQL] Add DataFrameWriterV2 API
SparkQA commented on issue #25354: [SPARK-28612][SQL] Add DataFrameWriterV2 API URL: https://github.com/apache/spark/pull/25354#issuecomment-520293335 **[Test build #108947 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108947/testReport)** for PR 25354 at commit [`4538721`](https://github.com/apache/spark/commit/45387211d054400017ac83627b7b40887f614d16).
[GitHub] [spark] AmplabJenkins commented on issue #25413: Merge pull request #1 from apache/master
AmplabJenkins commented on issue #25413: Merge pull request #1 from apache/master URL: https://github.com/apache/spark/pull/25413#issuecomment-520292459 Can one of the admins verify this patch?
[GitHub] [spark] pvk2727 closed pull request #25413: Merge pull request #1 from apache/master
pvk2727 closed pull request #25413: Merge pull request #1 from apache/master URL: https://github.com/apache/spark/pull/25413
[GitHub] [spark] pvk2727 opened a new pull request #25413: Merge pull request #1 from apache/master
pvk2727 opened a new pull request #25413: Merge pull request #1 from apache/master URL: https://github.com/apache/spark/pull/25413 Send Pull Request ## What changes were proposed in this pull request? (Please fill in changes proposed in this fix) ## How was this patch tested? (Please explain how this patch was tested. E.g. unit tests, integration tests, manual tests) (If this patch involves UI changes, please attach a screenshot; otherwise, remove this) Please review https://spark.apache.org/contributing.html before opening a pull request.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter
HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter URL: https://github.com/apache/spark/pull/25407#discussion_r312774692 ## File path: docs/structured-streaming-programming-guide.md ## @@ -2251,13 +2251,13 @@ When the streaming query is started, Spark calls the function or the object’s - The close() method (if it exists) is called if an open() method exists and returns successfully (irrespective of the return value), except if the JVM or Python process crashes in the middle. -- **Note:** The partitionId and epochId in the open() method can be used to deduplicate generated data - when failures cause reprocessing of some input data. This depends on the execution mode of the query. - If the streaming query is being executed in the micro-batch mode, then every partition represented - by a unique tuple (partition_id, epoch_id) is guaranteed to have the same data. - Hence, (partition_id, epoch_id) can be used to deduplicate and/or transactionally commit - data and achieve exactly-once guarantees. However, if the streaming query is being executed - in the continuous mode, then this guarantee does not hold and therefore should not be used for deduplication. +- **Note:** Spark doesn't guarantee same output for (partitionId, epochId) on failure, so deduplication + cannot be achieved with (partitionId, epochId). e.g. source provides different number of + partitions for some reason, Spark optimization changes number of partitions, etc. + Refer SPARK-28650 for more details. `epochId` can still be used for deduplication, but there's less Review comment: Just to match the other docs: ``` See [SPARK-28650](https://issues.apache.org/jira/browse/SPARK-28650) for more details. ```
[GitHub] [spark] dongjoon-hyun commented on issue #25412: [SPARK-28691][EXAMPLES]DirectKafkaWordCount supoort kafka with kerberos
dongjoon-hyun commented on issue #25412: [SPARK-28691][EXAMPLES]DirectKafkaWordCount supoort kafka with kerberos URL: https://github.com/apache/spark/pull/25412#issuecomment-520291947 Hi, @hddong . Thank you for making a PR. - Since `DirectKafkaWordCount.scala` and `JavaDirectKafkaWordCount.java` are a pair for language parity, this PR should update them together consistently. - And, just out of curiosity, why do you want to have a Kerberized example at `DirectKafkaWordCount`? Do you want to change other examples like `StructuredKafkaWordCount.scala`, too?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter
HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter URL: https://github.com/apache/spark/pull/25407#discussion_r312774510 ## File path: docs/structured-streaming-programming-guide.md ## @@ -2251,13 +2251,13 @@ When the streaming query is started, Spark calls the function or the object’s - The close() method (if it exists) is called if an open() method exists and returns successfully (irrespective of the return value), except if the JVM or Python process crashes in the middle. -- **Note:** The partitionId and epochId in the open() method can be used to deduplicate generated data - when failures cause reprocessing of some input data. This depends on the execution mode of the query. - If the streaming query is being executed in the micro-batch mode, then every partition represented - by a unique tuple (partition_id, epoch_id) is guaranteed to have the same data. - Hence, (partition_id, epoch_id) can be used to deduplicate and/or transactionally commit - data and achieve exactly-once guarantees. However, if the streaming query is being executed - in the continuous mode, then this guarantee does not hold and therefore should not be used for deduplication. +- **Note:** Spark doesn't guarantee same output for (partitionId, epochId) on failure, so deduplication + cannot be achieved with (partitionId, epochId). e.g. source provides different number of + partitions for some reason, Spark optimization changes number of partitions, etc. + Refer SPARK-28650 for more details. `epochId` can still be used for deduplication, but there's less + benefit to leverage this, as the chance for Spark to successfully write all partitions and fail to checkpoint Review comment: Using the epoch does not seem very useful given the description. Should we maybe just remove it?
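The proposed note says that (partitionId, epochId) cannot be used for deduplication because the number of partitions may differ between the original attempt and the retry. A small sketch of why that matters, under stated assumptions: the modulo partitioner and the record values below are hypothetical and chosen only for illustration, and this is not how Spark assigns records to partitions:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not Spark's partitioning logic. */
final class PartitionDedupSketch {

    /** Splits records into numPartitions buckets by value modulo numPartitions. */
    static List<List<Integer>> partition(List<Integer> records, int numPartitions) {
        List<List<Integer>> parts = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) {
            parts.add(new ArrayList<>());
        }
        for (int record : records) {
            parts.get(Math.floorMod(record, numPartitions)).add(record);
        }
        return parts;
    }

    public static void main(String[] args) {
        List<Integer> epochData = List.of(0, 1, 2, 3, 4, 5);

        // First attempt of an epoch runs with 2 partitions, the retry with 3.
        List<List<Integer>> firstAttempt = partition(epochData, 2);
        List<List<Integer>> retry = partition(epochData, 3);

        // Partition 0 holds different records in the two attempts, so a sink that
        // skips "partition 0 of this epoch" as already written would drop some
        // records and write others twice.
        System.out.println("first attempt, partition 0: " + firstAttempt.get(0));
        System.out.println("retry,         partition 0: " + retry.get(0));
    }
}
```

In this toy setup the same partition id for the same epoch covers different records in the two attempts, which is the situation the revised note warns about.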
[GitHub] [spark] gczsjdy commented on a change in pull request #25342: [SPARK-28571][CORE][SHUFFLE] Use the shuffle writer plugin for the SortShuffleWriter
gczsjdy commented on a change in pull request #25342: [SPARK-28571][CORE][SHUFFLE] Use the shuffle writer plugin for the SortShuffleWriter URL: https://github.com/apache/spark/pull/25342#discussion_r312773982 ## File path: core/src/main/scala/org/apache/spark/storage/DiskBlockObjectWriter.scala ## @@ -46,7 +47,8 @@ private[spark] class DiskBlockObjectWriter( writeMetrics: ShuffleWriteMetricsReporter, val blockId: BlockId = null) extends OutputStream - with Logging { + with Logging + with PairsWriter { Review comment: nit: add `override` to one function
[GitHub] [spark] gczsjdy commented on a change in pull request #25342: [SPARK-28571][CORE][SHUFFLE] Use the shuffle writer plugin for the SortShuffleWriter
gczsjdy commented on a change in pull request #25342: [SPARK-28571][CORE][SHUFFLE] Use the shuffle writer plugin for the SortShuffleWriter URL: https://github.com/apache/spark/pull/25342#discussion_r312773349 ## File path: core/src/main/scala/org/apache/spark/util/collection/PairsWriter.scala ## @@ -0,0 +1,23 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util.collection + +private[spark] trait PairsWriter { Review comment: nit: add docs on where this can be used?
[GitHub] [spark] gczsjdy commented on a change in pull request #25342: [SPARK-28571][CORE][SHUFFLE] Use the shuffle writer plugin for the SortShuffleWriter
gczsjdy commented on a change in pull request #25342: [SPARK-28571][CORE][SHUFFLE] Use the shuffle writer plugin for the SortShuffleWriter URL: https://github.com/apache/spark/pull/25342#discussion_r312773647 ## File path: core/src/main/scala/org/apache/spark/util/collection/ShufflePartitionPairsWriter.scala ## @@ -0,0 +1,98 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.spark.util.collection + +import java.io.{Closeable, FilterOutputStream, OutputStream} + +import org.apache.spark.serializer.{SerializationStream, SerializerInstance, SerializerManager} +import org.apache.spark.shuffle.ShuffleWriteMetricsReporter +import org.apache.spark.shuffle.api.ShufflePartitionWriter +import org.apache.spark.storage.BlockId + +/** + * A key-value writer inspired by {@link DiskBlockObjectWriter} that pushes the bytes to an + * arbitrary partition writer instead of writing to local disk through the block manager. + */ +private[spark] class ShufflePartitionPairsWriter( Review comment: Should this instead be in the `o.a.s.s` package?
[GitHub] [spark] cloud-fan commented on a change in pull request #25247: [SPARK-28319][SQL] Implement SHOW TABLES for Data Source V2 Tables
cloud-fan commented on a change in pull request #25247: [SPARK-28319][SQL] Implement SHOW TABLES for Data Source V2 Tables URL: https://github.com/apache/spark/pull/25247#discussion_r312773091 ## File path: sql/core/src/test/scala/org/apache/spark/sql/sources/v2/DataSourceV2SQLSuite.scala ## @@ -1700,6 +1704,126 @@ class DataSourceV2SQLSuite extends QueryTest with SharedSQLContext with BeforeAn } } + test("ShowTables: using v2 catalog") { +spark.sql("CREATE TABLE testcat.db.table_name (id bigint, data string) USING foo") +spark.sql("CREATE TABLE testcat.n1.n2.db.table_name (id bigint, data string) USING foo") + +runShowTablesSql("SHOW TABLES FROM testcat.db", Seq(Row("db", "table_name"))) + +runShowTablesSql( + "SHOW TABLES FROM testcat.n1.n2.db", + Seq(Row("n1.n2.db", "table_name"))) + } + + test("ShowTables: using v2 catalog with a pattern") { +spark.sql("CREATE TABLE testcat.db.table (id bigint, data string) USING foo") +spark.sql("CREATE TABLE testcat.db.table_name_1 (id bigint, data string) USING foo") +spark.sql("CREATE TABLE testcat.db.table_name_2 (id bigint, data string) USING foo") +spark.sql("CREATE TABLE testcat.db2.table_name_2 (id bigint, data string) USING foo") + +runShowTablesSql( + "SHOW TABLES FROM testcat.db", + Seq( +Row("db", "table"), +Row("db", "table_name_1"), +Row("db", "table_name_2"))) + +runShowTablesSql( + "SHOW TABLES FROM testcat.db LIKE '*name*'", + Seq(Row("db", "table_name_1"), Row("db", "table_name_2"))) + +runShowTablesSql( + "SHOW TABLES FROM testcat.db LIKE '*2'", + Seq(Row("db", "table_name_2"))) + } + + test("ShowTables: using v2 catalog, namespace doesn't exist") { +runShowTablesSql("SHOW TABLES FROM testcat.unknown", Seq()) Review comment: In current Spark, `SHOW TABLES FROM non-existing-db` fails. Shall we follow that behavior here?
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter
HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter URL: https://github.com/apache/spark/pull/25407#discussion_r312773068 ## File path: docs/structured-streaming-programming-guide.md ## @@ -2251,13 +2251,13 @@ When the streaming query is started, Spark calls the function or the object’s - The close() method (if it exists) is called if an open() method exists and returns successfully (irrespective of the return value), except if the JVM or Python process crashes in the middle. -- **Note:** The partitionId and epochId in the open() method can be used to deduplicate generated data - when failures cause reprocessing of some input data. This depends on the execution mode of the query. - If the streaming query is being executed in the micro-batch mode, then every partition represented - by a unique tuple (partition_id, epoch_id) is guaranteed to have the same data. - Hence, (partition_id, epoch_id) can be used to deduplicate and/or transactionally commit - data and achieve exactly-once guarantees. However, if the streaming query is being executed - in the continuous mode, then this guarantee does not hold and therefore should not be used for deduplication. +- **Note:** Spark doesn't guarantee same output for (partitionId, epochId) on failure, so deduplication + cannot be achieved with (partitionId, epochId). e.g. source provides different number of + partitions for some reason, Spark optimization changes number of partitions, etc. Review comment: typo: some reasons This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter
HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter URL: https://github.com/apache/spark/pull/25407#discussion_r312772909 ## File path: docs/structured-streaming-programming-guide.md ## @@ -2251,13 +2251,13 @@ When the streaming query is started, Spark calls the function or the object’s - The close() method (if it exists) is called if an open() method exists and returns successfully (irrespective of the return value), except if the JVM or Python process crashes in the middle. -- **Note:** The partitionId and epochId in the open() method can be used to deduplicate generated data - when failures cause reprocessing of some input data. This depends on the execution mode of the query. - If the streaming query is being executed in the micro-batch mode, then every partition represented - by a unique tuple (partition_id, epoch_id) is guaranteed to have the same data. - Hence, (partition_id, epoch_id) can be used to deduplicate and/or transactionally commit - data and achieve exactly-once guarantees. However, if the streaming query is being executed - in the continuous mode, then this guarantee does not hold and therefore should not be used for deduplication. +- **Note:** Spark doesn't guarantee same output for (partitionId, epochId) on failure, so deduplication + cannot be achieved with (partitionId, epochId). e.g. source provides different number of + partitions for some reason, Spark optimization changes number of partitions, etc. Review comment: typo same reason? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter
HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter URL: https://github.com/apache/spark/pull/25407#discussion_r312772909 ## File path: docs/structured-streaming-programming-guide.md ## @@ -2251,13 +2251,13 @@ When the streaming query is started, Spark calls the function or the object’s - The close() method (if it exists) is called if an open() method exists and returns successfully (irrespective of the return value), except if the JVM or Python process crashes in the middle. -- **Note:** The partitionId and epochId in the open() method can be used to deduplicate generated data - when failures cause reprocessing of some input data. This depends on the execution mode of the query. - If the streaming query is being executed in the micro-batch mode, then every partition represented - by a unique tuple (partition_id, epoch_id) is guaranteed to have the same data. - Hence, (partition_id, epoch_id) can be used to deduplicate and/or transactionally commit - data and achieve exactly-once guarantees. However, if the streaming query is being executed - in the continuous mode, then this guarantee does not hold and therefore should not be used for deduplication. +- **Note:** Spark doesn't guarantee same output for (partitionId, epochId) on failure, so deduplication + cannot be achieved with (partitionId, epochId). e.g. source provides different number of + partitions for some reason, Spark optimization changes number of partitions, etc. Review comment: typo some reasons This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhengruifeng commented on a change in pull request #25383: [SPARK-13677][ML] Implement Tree-Based Feature Transformation for ML
zhengruifeng commented on a change in pull request #25383: [SPARK-13677][ML] Implement Tree-Based Feature Transformation for ML
URL: https://github.com/apache/spark/pull/25383#discussion_r312771157

## File path: mllib/src/main/scala/org/apache/spark/ml/tree/treeModels.scala ##

@@ -78,6 +78,28 @@ private[spark] trait DecisionTreeModel {
   /** Convert to spark.mllib DecisionTreeModel (losing some information) */
   private[spark] def toOld: OldDecisionTreeModel
+
+  /** Returns an iterator that traverses (DFS, left to right) the leaves
+   * in the subtree of this node.
+   */
+  private def leafIterator(node: Node): Iterator[LeafNode] = {
+    node match {
+      case l: LeafNode => Iterator.single(l)
+      case n: InternalNode =>
+        leafIterator(n.leftChild) ++ leafIterator(n.rightChild)
+    }
+  }
+
+  @transient private lazy val leafIndices: Map[LeafNode, Int] = {

Review comment:
I had implemented another leaf transformation on the .mllib side (https://github.com/apache/spark/pull/11520), and it used the sorted `leafId` as the output. However, on the .ml side, the `LeafNode` class does not contain an Id and is exposed to the end user, so I tend to leave the current `LeafNode` class alone. As to the extra memory pressure, I think its size, O(#numLeaves * #numTrees), is much smaller than the model itself. WDYT @srowen
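The leaf-index approach discussed in this comment can be tried in isolation. The following is a minimal sketch, not the PR's code: `Node`, `LeafNode`, and `InternalNode` are simplified stand-ins for the spark.ml classes, and `LeafIndexSketch` is a hypothetical name. It assumes that leaves are plain (non-case) classes, so map keys compare by reference identity.

```scala
// Simplified stand-ins for the spark.ml tree node classes (hypothetical, for illustration only).
// Plain classes are used so that two leaves with equal predictions remain distinct map keys.
sealed trait Node
final class LeafNode(val prediction: Double) extends Node
final class InternalNode(val leftChild: Node, val rightChild: Node) extends Node

object LeafIndexSketch {
  /** Traverses the leaves of the subtree rooted at `node` depth-first, left to right. */
  def leafIterator(node: Node): Iterator[LeafNode] = node match {
    case l: LeafNode => Iterator.single(l)
    case n: InternalNode => leafIterator(n.leftChild) ++ leafIterator(n.rightChild)
  }

  /** Assigns each leaf its position in the DFS order; the map holds one entry per leaf. */
  def leafIndices(root: Node): Map[LeafNode, Int] =
    leafIterator(root).zipWithIndex.toMap
}
```

Under these assumptions the map stores one entry per leaf of a tree, which matches the O(#numLeaves * #numTrees) estimate for an ensemble given in the comment.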
[GitHub] [spark] zhengruifeng commented on a change in pull request #25383: [SPARK-13677][ML] Implement Tree-Based Feature Transformation for ML
zhengruifeng commented on a change in pull request #25383: [SPARK-13677][ML] Implement Tree-Based Feature Transformation for ML
URL: https://github.com/apache/spark/pull/25383#discussion_r312771157

## File path: mllib/src/main/scala/org/apache/spark/ml/tree/treeModels.scala ##

@@ -78,6 +78,28 @@ private[spark] trait DecisionTreeModel {
   /** Convert to spark.mllib DecisionTreeModel (losing some information) */
   private[spark] def toOld: OldDecisionTreeModel
+
+  /** Returns an iterator that traverses (DFS, left to right) the leaves
+   * in the subtree of this node.
+   */
+  private def leafIterator(node: Node): Iterator[LeafNode] = {
+    node match {
+      case l: LeafNode => Iterator.single(l)
+      case n: InternalNode =>
+        leafIterator(n.leftChild) ++ leafIterator(n.rightChild)
+    }
+  }
+
+  @transient private lazy val leafIndices: Map[LeafNode, Int] = {

Review comment:
I had implemented another leaf transformation on the .mllib side (https://github.com/apache/spark/pull/11520), and it used the sorted `leafId` as the output. However, on the .ml side, the `LeafNode` class does not contain an Id and is exposed to the end user, so I tend to leave the current `LeafNode` class alone. As to the extra memory pressure, I think it is much smaller than the model itself. WDYT @srowen
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter
HyukjinKwon commented on a change in pull request #25407: [SPARK-28650][SS][DOC] Correct explanation of guarantee for ForeachWriter URL: https://github.com/apache/spark/pull/25407#discussion_r312772609 ## File path: docs/structured-streaming-programming-guide.md ## @@ -2251,13 +2251,13 @@ When the streaming query is started, Spark calls the function or the object’s - The close() method (if it exists) is called if an open() method exists and returns successfully (irrespective of the return value), except if the JVM or Python process crashes in the middle. -- **Note:** The partitionId and epochId in the open() method can be used to deduplicate generated data - when failures cause reprocessing of some input data. This depends on the execution mode of the query. - If the streaming query is being executed in the micro-batch mode, then every partition represented - by a unique tuple (partition_id, epoch_id) is guaranteed to have the same data. - Hence, (partition_id, epoch_id) can be used to deduplicate and/or transactionally commit - data and achieve exactly-once guarantees. However, if the streaming query is being executed - in the continuous mode, then this guarantee does not hold and therefore should not be used for deduplication. +- **Note:** Spark doesn't guarantee same output for (partitionId, epochId) on failure, so deduplication Review comment: no big deal but I usually avoid abbreviation in the doc. `doesn't` -> `does not` This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
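The open/process/close lifecycle that the quoted guide passage describes can be illustrated without a Spark dependency. This is a minimal sketch under stated assumptions: `WriterSketch` is a hypothetical stand-in that mirrors the shape of Spark's `ForeachWriter`, and `LifecycleSketch.runPartition` is an invented driver that approximates the call order described in the guide; it is not Spark's implementation.

```scala
// Hypothetical stand-in mirroring the shape of Spark's ForeachWriter (open / process / close).
trait WriterSketch[T] {
  def open(partitionId: Long, epochId: Long): Boolean
  def process(value: T): Unit
  def close(errorOrNull: Throwable): Unit
}

object LifecycleSketch {
  /** Drives one partition of one epoch through the writer in the order the guide describes. */
  def runPartition[T](
      w: WriterSketch[T],
      partitionId: Long,
      epochId: Long,
      rows: Iterator[T]): Unit = {
    // If open itself throws, close is not called: open must return successfully first.
    val accepted = w.open(partitionId, epochId)
    var error: Throwable = null
    try {
      // Rows are processed only when open returned true.
      if (accepted) rows.foreach(w.process)
    } catch {
      case t: Throwable =>
        error = t
        throw t
    } finally {
      // close runs whether open returned true or false.
      w.close(error)
    }
  }
}
```

The sketch passes `partitionId` and `epochId` to `open` only to mirror the signature; per the revised note in the diff, the pair should not be treated as a deduplication key.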
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`
HyukjinKwon commented on a change in pull request #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`
URL: https://github.com/apache/spark/pull/25408#discussion_r312772428

## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/util/DateTimeUtils.scala ##

@@ -455,6 +456,22 @@ object DateTimeUtils {
     (MICROSECONDS.toSeconds(localTimestamp(microsec, timeZone)) % 60).toInt
   }
+  /**
+   * Returns seconds, including fractional parts, multiplied by 1000. The timestamp

Review comment:
@MaxGekk, if https://github.com/apache/spark/pull/25408#discussion_r312748606 matters, we could write 1,000 or 1,000,000, I believe.
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`
HyukjinKwon commented on a change in pull request #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()` URL: https://github.com/apache/spark/pull/25408#discussion_r312772288 ## File path: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/datetimeExpressions.scala ## @@ -1876,3 +1930,22 @@ case class Decade(child: Expression) extends UnaryExpression with ImplicitCastIn defineCodeGen(ctx, ev, c => s"$dtu.getDecade($c)") } } + +case class Epoch(child: Expression, timeZoneId: Option[String] = None) +extends UnaryExpression with ImplicitCastInputTypes with TimeZoneAwareExpression { + + override def inputTypes: Seq[AbstractDataType] = Seq(TimestampType) + override def dataType: DataType = DecimalType(20, 6) Review comment: @MaxGekk, Out of curiosity, why is it `DecimalType(20, 6)`? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
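The thread does not record an answer to this question. As background only, the arithmetic behind a scale of 6 can be sketched as follows, assuming the timestamp is held as a `Long` count of microseconds since the Unix epoch; `EpochSketch` and `epochSeconds` are hypothetical names, not part of the PR.

```scala
import java.math.{BigDecimal => JBigDecimal}

object EpochSketch {
  /** Converts microseconds since the Unix epoch to seconds, keeping 6 fractional digits. */
  def epochSeconds(micros: Long): JBigDecimal =
    JBigDecimal.valueOf(micros, 6) // unscaled value = micros, scale = 6

  /** Number of decimal digits needed for the largest possible Long microsecond count. */
  def maxPrecision: Int = epochSeconds(Long.MaxValue).precision
}
```

Under that assumption every `Long` value fits in 19 decimal digits at scale 6, so a precision of 20 would leave one digit of headroom. Whether that was the author's reasoning is not stated in the thread.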
[GitHub] [spark] ChenjunZou commented on a change in pull request #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator field leak
ChenjunZou commented on a change in pull request #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator field leak
URL: https://github.com/apache/spark/pull/23083#discussion_r312770904

## File path: core/src/main/scala/org/apache/spark/util/CompletionIterator.scala ##

@@ -25,11 +25,14 @@ private[spark] abstract class CompletionIterator[+A, +I <: Iterator[A]](sub: I) extends Iterator[A] {
   private[this] var completed = false
-  def next(): A = sub.next()
+  private[this] var iter = sub
+  def next(): A = iter.next()
   def hasNext: Boolean = {
-    val r = sub.hasNext
+    val r = iter.hasNext
     if (!r && !completed) {
       completed = true
+      // reassign to release resources of highly resource consuming iterators early
+      iter = Iterator.empty.asInstanceOf[I]

Review comment:
Thanks, szhem :) Your UT explains it all. At first I misunderstood `sub` as `CompletionIterator(val sub)`. Hidden, well done!
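The pattern in the diff above, copying the constructor argument into a mutable field and clearing it once the iterator is exhausted, can be reproduced in a standalone sketch. `ReleasingIterator` is a hypothetical, simplified class and not Spark's `CompletionIterator`; the `completion()` hook is included on the assumption that the real class invokes a callback at this point.

```scala
// Hypothetical, simplified version of the pattern in the diff above; not Spark's actual class.
abstract class ReleasingIterator[A](sub: Iterator[A]) extends Iterator[A] {
  private[this] var completed = false
  // `sub` is referenced only in this initializer, so the compiler does not keep it as a field;
  // `iter` is then the only reference this object holds to the underlying iterator.
  private[this] var iter: Iterator[A] = sub

  def next(): A = iter.next()

  def hasNext: Boolean = {
    val r = iter.hasNext
    if (!r && !completed) {
      completed = true
      // Drop the reference so the exhausted iterator, and anything it holds, can be collected.
      iter = Iterator.empty
      completion()
    }
    r
  }

  /** Called once, when the underlying iterator is first found to be exhausted. */
  def completion(): Unit
}
```

Because the sketch fixes the underlying type to `Iterator[A]`, `Iterator.empty` can be assigned directly; the cast to `I` in the diff is needed there only because the field has the type parameter `I`.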
[GitHub] [spark] zhengruifeng commented on a change in pull request #25383: [SPARK-13677][ML] Implement Tree-Based Feature Transformation for ML
zhengruifeng commented on a change in pull request #25383: [SPARK-13677][ML] Implement Tree-Based Feature Transformation for ML URL: https://github.com/apache/spark/pull/25383#discussion_r312771512 ## File path: mllib/src/main/scala/org/apache/spark/ml/tree/treeParams.scala ## @@ -455,7 +508,19 @@ private[ml] object GBTClassifierParams { Array("logistic").map(_.toLowerCase(Locale.ROOT)) } -private[ml] trait GBTClassifierParams extends GBTParams with HasVarianceImpurity { +private[ml] trait GBTClassifierParams extends GBTParams with HasVarianceImpurity + with ProbabilisticClassifierParams { + + override protected def validateAndTransformSchema( Review comment: Good point, I will look into it. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()`
HyukjinKwon commented on a change in pull request #25408: [SPARK-28687][SQL] Support `epoch`, `isoyear`, `milliseconds` and `microseconds` at `extract()` URL: https://github.com/apache/spark/pull/25408#discussion_r312771506 ## File path: sql/core/src/test/resources/sql-tests/inputs/pgSQL/date.sql ## @@ -228,9 +228,9 @@ SELECT f1 - date '2000-01-01' AS `Days From 2K` FROM DATE_TBL; -- test extract! -- -- epoch --- --- SELECT EXTRACT(EPOCH FROM DATE'1970-01-01'); -- 0 --- SELECT EXTRACT(EPOCH FROM TIMESTAMP '1970-01-01'); -- 0 Review comment: Seems fixed as of https://github.com/apache/spark/pull/25357 . Does this still fail? This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] zhengruifeng commented on a change in pull request #25383: [SPARK-13677][ML] Implement Tree-Based Feature Transformation for ML
zhengruifeng commented on a change in pull request #25383: [SPARK-13677][ML] Implement Tree-Based Feature Transformation for ML
URL: https://github.com/apache/spark/pull/25383#discussion_r312771157

## File path: mllib/src/main/scala/org/apache/spark/ml/tree/treeModels.scala ##

@@ -78,6 +78,28 @@ private[spark] trait DecisionTreeModel {
   /** Convert to spark.mllib DecisionTreeModel (losing some information) */
   private[spark] def toOld: OldDecisionTreeModel
+
+  /** Returns an iterator that traverses (DFS, left to right) the leaves
+   * in the subtree of this node.
+   */
+  private def leafIterator(node: Node): Iterator[LeafNode] = {
+    node match {
+      case l: LeafNode => Iterator.single(l)
+      case n: InternalNode =>
+        leafIterator(n.leftChild) ++ leafIterator(n.rightChild)
+    }
+  }
+
+  @transient private lazy val leafIndices: Map[LeafNode, Int] = {

Review comment:
I had implemented another leaf transformation on the .mllib side (https://github.com/apache/spark/pull/11520/files), and it used the sorted `leafId` as the output. However, on the .ml side, the `LeafNode` class does not contain an Id and is exposed to the end user, so I tend to leave the current `LeafNode` class alone. As to the extra memory pressure, I think it is much smaller than the model itself. WDYT @srowen
[GitHub] [spark] SparkQA commented on issue #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs
SparkQA commented on issue #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs URL: https://github.com/apache/spark/pull/25368#issuecomment-520287517 **[Test build #108946 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/108946/testReport)** for PR 25368 at commit [`b07790d`](https://github.com/apache/spark/commit/b07790d133346d24ef92695ae1a61ad755e988cf). This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs
AmplabJenkins removed a comment on issue #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs URL: https://github.com/apache/spark/pull/25368#issuecomment-520287324 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs
AmplabJenkins removed a comment on issue #25368: [SPARK-28635][SQL] create CatalogManager to track registered v2 catalogs URL: https://github.com/apache/spark/pull/25368#issuecomment-520287328 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/14019/ Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] HyukjinKwon commented on a change in pull request #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone
HyukjinKwon commented on a change in pull request #25409: [SPARK-28414][WEBUI] UI updates to show resource info in Standalone
URL: https://github.com/apache/spark/pull/25409#discussion_r312770930

## File path: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ##

@@ -242,6 +243,22 @@ private[deploy] class Worker(
       System.exit(1)
     }
   }
+    resources.foreach { case (rName, _) =>

Review comment:
nit:

```scala
resources.keys.foreach { rName =>
  resourcesUsed(rName) = new ResourceInformation(rName, Array.empty[String])
}
```
[GitHub] [spark] ChenjunZou commented on a change in pull request #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator field leak
ChenjunZou commented on a change in pull request #23083: [SPARK-26114][CORE] ExternalSorter's readingIterator field leak
URL: https://github.com/apache/spark/pull/23083#discussion_r312770904

## File path: core/src/main/scala/org/apache/spark/util/CompletionIterator.scala ##

@@ -25,11 +25,14 @@ private[spark] abstract class CompletionIterator[+A, +I <: Iterator[A]](sub: I) extends Iterator[A] {
   private[this] var completed = false
-  def next(): A = sub.next()
+  private[this] var iter = sub
+  def next(): A = iter.next()
   def hasNext: Boolean = {
-    val r = sub.hasNext
+    val r = iter.hasNext
     if (!r && !completed) {
       completed = true
+      // reassign to release resources of highly resource consuming iterators early
+      iter = Iterator.empty.asInstanceOf[I]

Review comment:
Thanks, szhem :) Your UT explains it all. At first I misunderstood `sub` as `CompletionIterator(val sub)`. BTW, `sub` is clearly not a strong reference here; do you know what kind of reference it is?