[GitHub] [spark] AmplabJenkins removed a comment on issue #24847: [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1
AmplabJenkins removed a comment on issue #24847: [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1 URL: https://github.com/apache/spark/pull/24847#issuecomment-501130458 Merged build finished. Test PASSed. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] [spark] AmplabJenkins removed a comment on issue #24847: [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1
AmplabJenkins removed a comment on issue #24847: [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1 URL: https://github.com/apache/spark/pull/24847#issuecomment-501130461 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11648/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24847: [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1
SparkQA commented on issue #24847: [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1 URL: https://github.com/apache/spark/pull/24847#issuecomment-501130937 **[Test build #106403 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106403/testReport)** for PR 24847 at commit [`fe8f5b6`](https://github.com/apache/spark/commit/fe8f5b6091f11248f00f9231ac926fc675ce8f9b).
[GitHub] [spark] dongjoon-hyun commented on issue #24835: [SPARK-27993][SQL] Port HIVE-12981 to hive-thriftserver
dongjoon-hyun commented on issue #24835: [SPARK-27993][SQL] Port HIVE-12981 to hive-thriftserver URL: https://github.com/apache/spark/pull/24835#issuecomment-501130810 cc @gatorsmile
[GitHub] [spark] AmplabJenkins commented on issue #24847: [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1
AmplabJenkins commented on issue #24847: [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1 URL: https://github.com/apache/spark/pull/24847#issuecomment-501130458 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24847: [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1
AmplabJenkins commented on issue #24847: [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1 URL: https://github.com/apache/spark/pull/24847#issuecomment-501130461 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11648/ Test PASSed.
[GitHub] [spark] dongjoon-hyun opened a new pull request #24847: [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1
dongjoon-hyun opened a new pull request #24847: [SPARK-28013][BUILD][SS] Upgrade to Kafka 2.2.1
URL: https://github.com/apache/spark/pull/24847

## What changes were proposed in this pull request?

This PR aims to update the Kafka dependency to 2.2.1 to bring the following improvements and bug fixes to the Apache Spark 3.0.0 release.

https://issues.apache.org/jira/projects/KAFKA/versions/12345010

## How was this patch tested?

Pass the Jenkins.
[GitHub] [spark] jiangxb1987 commented on a change in pull request #24841: [SPARK-27369][CORE] Setup resources when Standalone Worker starts up
jiangxb1987 commented on a change in pull request #24841: [SPARK-27369][CORE] Setup resources when Standalone Worker starts up
URL: https://github.com/apache/spark/pull/24841#discussion_r292753205

## File path: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ##
@@ -220,6 +225,38 @@ private[deploy] class Worker(
     metricsSystem.getServletHandlers.foreach(webUi.attachHandler)
   }

+  // TODO if we're starting up multi workers under the same host, discovery script won't work.
+  private def setupWorkerResources(): Unit = {
+    try {
+      resources = resourceFile.map { rFile =>
+        ResourceDiscoverer.parseAllocatedFromJsonFile(rFile)
+      }.getOrElse {
+        if (resourceDiscoveryScript.isEmpty) {

Review comment: We don't want to reuse the discoveryScript config?
[GitHub] [spark] wangyum commented on issue #24835: [SPARK-27993][SQL] Port HIVE-12981 to hive-thriftserver
wangyum commented on issue #24835: [SPARK-27993][SQL] Port HIVE-12981 to hive-thriftserver URL: https://github.com/apache/spark/pull/24835#issuecomment-501128531 @dongjoon-hyun I updated how I tested it.
[GitHub] [spark] jiangxb1987 commented on a change in pull request #24841: [SPARK-27369][CORE] Setup resources when Standalone Worker starts up
jiangxb1987 commented on a change in pull request #24841: [SPARK-27369][CORE] Setup resources when Standalone Worker starts up
URL: https://github.com/apache/spark/pull/24841#discussion_r292752935

## File path: core/src/main/scala/org/apache/spark/deploy/worker/Worker.scala ##
@@ -220,6 +225,38 @@ private[deploy] class Worker(
     metricsSystem.getServletHandlers.foreach(webUi.attachHandler)
   }

+  // TODO if we're starting up multi workers under the same host, discovery script won't work.

Review comment: How would the `resourceFile` work under your approach in the scenario of multiple workers launched on the same host? We need to prevent multiple workers from accessing the same external resource address.
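
The requirement stated in this review comment (workers on one host must not share an external resource address) can be illustrated with a minimal sketch. This is not the code in PR #24841 and not Spark's implementation; the object name `ResourceSplit`, the method `assign`, and the round-robin policy are all hypothetical, chosen only to show what "disjoint addresses per worker" means.

```scala
// Hypothetical illustration only: none of these names exist in Spark.
object ResourceSplit {
  /**
   * Assign each distinct address to exactly one worker index in
   * [0, numWorkers), round-robin, so no two workers share an address.
   */
  def assign(addresses: Seq[String], numWorkers: Int): Map[Int, Seq[String]] = {
    require(numWorkers > 0, "numWorkers must be positive")
    val byWorker: Map[Int, Seq[String]] = addresses.distinct.zipWithIndex
      .groupBy { case (_, i) => i % numWorkers }
      .map { case (worker, pairs) => worker -> pairs.map(_._1) }
    // Workers that received no address still get an empty entry.
    (0 until numWorkers).map(w => w -> byWorker.getOrElse(w, Seq.empty[String])).toMap
  }
}
```

With four addresses `"0"` through `"3"` and two workers, worker 0 would hold `"0"` and `"2"` and worker 1 would hold `"1"` and `"3"`. Whether such a split belongs in a per-worker `resourceFile`, in the discovery script, or elsewhere is exactly the open question in the review.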
[GitHub] [spark] dongjoon-hyun edited a comment on issue #24835: [SPARK-27993][SQL] Port HIVE-12981 to hive-thriftserver
dongjoon-hyun edited a comment on issue #24835: [SPARK-27993][SQL] Port HIVE-12981 to hive-thriftserver
URL: https://github.com/apache/spark/pull/24835#issuecomment-501125305

Ur, then, it sounds like no one has tested this yet. Could you ping someone to help with this?

> ### How was this patch tested?
> Kerberos related changes, so manually test
[GitHub] [spark] dongjoon-hyun commented on issue #24835: [SPARK-27993][SQL] Port HIVE-12981 to hive-thriftserver
dongjoon-hyun commented on issue #24835: [SPARK-27993][SQL] Port HIVE-12981 to hive-thriftserver
URL: https://github.com/apache/spark/pull/24835#issuecomment-501125305

Ur, then, it sounds like no one has tested this yet.

> ### How was this patch tested?
> Kerberos related changes, so manually test
[GitHub] [spark] SparkQA commented on issue #24829: [WIP][SPARK-27988][SQL][TEST] Port AGGREGATES.sql [Part 3]
SparkQA commented on issue #24829: [WIP][SPARK-27988][SQL][TEST] Port AGGREGATES.sql [Part 3] URL: https://github.com/apache/spark/pull/24829#issuecomment-501125135 **[Test build #106402 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106402/testReport)** for PR 24829 at commit [`0a425c4`](https://github.com/apache/spark/commit/0a425c41b26225512cb9d0e8cb58986d76513f6c).
[GitHub] [spark] wangyum commented on a change in pull request #24829: [WIP][SPARK-27988][SQL][TEST] Port AGGREGATES.sql [Part 3]
wangyum commented on a change in pull request #24829: [WIP][SPARK-27988][SQL][TEST] Port AGGREGATES.sql [Part 3]
URL: https://github.com/apache/spark/pull/24829#discussion_r292750106

## File path: sql/core/src/test/resources/sql-tests/inputs/pgSQL/aggregates_part3.sql ##
@@ -0,0 +1,284 @@
+--
+-- Portions Copyright (c) 1996-2019, PostgreSQL Global Development Group
+--
+--
+-- AGGREGATES [Part 3]
+-- https://github.com/postgres/postgres/blob/REL_12_BETA1/src/test/regress/sql/aggregates.sql#L352-L605
+
+create temporary view varchar_tbl as select * from values
+  ('a'),
+  ('A'),
+  ('1'),
+  ('2'),
+  ('3'),
+  (''),
+  -- ('cd'),
+  ('c')
+  as varchar_tbl(f1);
+
+-- We do not support inheritance tree, skip related tests.
+-- try it on an inheritance tree
+-- create table minmaxtest(f1 int);
+-- create table minmaxtest1() inherits (minmaxtest);
+-- create table minmaxtest2() inherits (minmaxtest);
+-- create table minmaxtest3() inherits (minmaxtest);
+-- create index minmaxtesti on minmaxtest(f1);
+-- create index minmaxtest1i on minmaxtest1(f1);
+-- create index minmaxtest2i on minmaxtest2(f1 desc);
+-- create index minmaxtest3i on minmaxtest3(f1) where f1 is not null;
+
+-- insert into minmaxtest values(11), (12);
+-- insert into minmaxtest1 values(13), (14);
+-- insert into minmaxtest2 values(15), (16);
+-- insert into minmaxtest3 values(17), (18);
+
+-- explain (costs off)
+-- select min(f1), max(f1) from minmaxtest;
+-- select min(f1), max(f1) from minmaxtest;
+
+-- DISTINCT doesn't do anything useful here, but it shouldn't fail
+-- explain (costs off)
+-- select distinct min(f1), max(f1) from minmaxtest;
+-- select distinct min(f1), max(f1) from minmaxtest;
+
+-- drop table minmaxtest cascade;
+
+-- [SPARK-9830] It is not allowed to use an aggregate function in the argument of another aggregate function
+-- check for correct detection of nested-aggregate errors
+-- select max(min(unique1)) from tenk1;
+-- select (select max(min(unique1)) from int8_tbl) from tenk1;
+
+-- These tests only test the explain. Skip these tests.
+--
+-- Test removal of redundant GROUP BY columns
+--
+
+-- create temp table t1 (a int, b int, c int, d int, primary key (a, b));
+-- create temp table t2 (x int, y int, z int, primary key (x, y));
+-- create temp table t3 (a int, b int, c int, primary key(a, b) deferrable);
+
+-- Non-primary-key columns can be removed from GROUP BY
+-- explain (costs off) select * from t1 group by a,b,c,d;
+
+-- No removal can happen if the complete PK is not present in GROUP BY
+-- explain (costs off) select a,c from t1 group by a,c,d;
+
+-- Test removal across multiple relations
+-- explain (costs off) select *
+-- from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+-- group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.y,t2.z;
+
+-- Test case where t1 can be optimized but not t2
+-- explain (costs off) select t1.*,t2.x,t2.z
+-- from t1 inner join t2 on t1.a = t2.x and t1.b = t2.y
+-- group by t1.a,t1.b,t1.c,t1.d,t2.x,t2.z;
+
+-- Cannot optimize when PK is deferrable
+-- explain (costs off) select * from t3 group by a,b,c;
+
+-- drop table t1;
+-- drop table t2;
+-- drop table t3;
+
+-- [SPARK-27974] Add built-in Aggregate Function: array_agg
+--
+-- Test combinations of DISTINCT and/or ORDER BY
+--
+
+-- select array_agg(a order by b)
+-- from (values (1,4),(2,3),(3,1),(4,2)) v(a,b);
+-- select array_agg(a order by a)
+-- from (values (1,4),(2,3),(3,1),(4,2)) v(a,b);
+-- select array_agg(a order by a desc)
+-- from (values (1,4),(2,3),(3,1),(4,2)) v(a,b);
+-- select array_agg(b order by a desc)
+-- from (values (1,4),(2,3),(3,1),(4,2)) v(a,b);
+
+-- select array_agg(distinct a)
+-- from (values (1),(2),(1),(3),(null),(2)) v(a);
+-- select array_agg(distinct a order by a)
+-- from (values (1),(2),(1),(3),(null),(2)) v(a);
+-- select array_agg(distinct a order by a desc)
+-- from (values (1),(2),(1),(3),(null),(2)) v(a);
+-- select array_agg(distinct a order by a desc nulls last)
+-- from (values (1),(2),(1),(3),(null),(2)) v(a);
+
+-- multi-arg aggs, strict/nonstrict, distinct/order by
+
+select aggfstr(a,b,c)

Review comment: OK.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24829: [WIP][SPARK-27988][SQL][TEST] Port AGGREGATES.sql [Part 3]
AmplabJenkins removed a comment on issue #24829: [WIP][SPARK-27988][SQL][TEST] Port AGGREGATES.sql [Part 3] URL: https://github.com/apache/spark/pull/24829#issuecomment-501124768 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24829: [WIP][SPARK-27988][SQL][TEST] Port AGGREGATES.sql [Part 3]
AmplabJenkins commented on issue #24829: [WIP][SPARK-27988][SQL][TEST] Port AGGREGATES.sql [Part 3] URL: https://github.com/apache/spark/pull/24829#issuecomment-501124768 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24829: [WIP][SPARK-27988][SQL][TEST] Port AGGREGATES.sql [Part 3]
AmplabJenkins removed a comment on issue #24829: [WIP][SPARK-27988][SQL][TEST] Port AGGREGATES.sql [Part 3] URL: https://github.com/apache/spark/pull/24829#issuecomment-501124772 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11647/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24829: [WIP][SPARK-27988][SQL][TEST] Port AGGREGATES.sql [Part 3]
AmplabJenkins commented on issue #24829: [WIP][SPARK-27988][SQL][TEST] Port AGGREGATES.sql [Part 3] URL: https://github.com/apache/spark/pull/24829#issuecomment-501124772 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11647/ Test PASSed.
[GitHub] [spark] wangyum commented on issue #24835: [SPARK-27993][SQL] Port HIVE-12981 to hive-thriftserver
wangyum commented on issue #24835: [SPARK-27993][SQL] Port HIVE-12981 to hive-thriftserver URL: https://github.com/apache/spark/pull/24835#issuecomment-501124395 I haven't seen the error yet, because our `hive-thriftserver` does not support user impersonation.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
AmplabJenkins removed a comment on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501123521 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106399/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
AmplabJenkins removed a comment on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501123515 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
AmplabJenkins commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501123521 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106399/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
AmplabJenkins commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501123515 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
SparkQA commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
URL: https://github.com/apache/spark/pull/24741#issuecomment-501123204

**[Test build #106399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106399/testReport)** for PR 24741 at commit [`2568288`](https://github.com/apache/spark/commit/2568288b58324e595e87b9b05bb1e821626e700a).
* This patch passes all tests.
* This patch merges cleanly.
* This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
SparkQA removed a comment on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501091714 **[Test build #106399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106399/testReport)** for PR 24741 at commit [`2568288`](https://github.com/apache/spark/commit/2568288b58324e595e87b9b05bb1e821626e700a).
[GitHub] [spark] cxzl25 commented on issue #24846: [SPARK-28012][SQL] Hive UDF supports literal struct type
cxzl25 commented on issue #24846: [SPARK-28012][SQL] Hive UDF supports literal struct type URL: https://github.com/apache/spark/pull/24846#issuecomment-501120807 Current problem: ![image](https://user-images.githubusercontent.com/3898450/59324353-1a7ef880-8d11-11e9-819a-30a8712d7b16.png) Fix: ![image](https://user-images.githubusercontent.com/3898450/59324368-2ec2f580-8d11-11e9-9365-d2efcd195b14.png)
[GitHub] [spark] AmplabJenkins removed a comment on issue #24846: [SPARK-28012][SQL] Hive UDF supports literal struct type
AmplabJenkins removed a comment on issue #24846: [SPARK-28012][SQL] Hive UDF supports literal struct type URL: https://github.com/apache/spark/pull/24846#issuecomment-501120516 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24846: [SPARK-28012][SQL] Hive UDF supports literal struct type
AmplabJenkins commented on issue #24846: [SPARK-28012][SQL] Hive UDF supports literal struct type URL: https://github.com/apache/spark/pull/24846#issuecomment-501120608 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24846: [SPARK-28012][SQL] Hive UDF supports literal struct type
AmplabJenkins commented on issue #24846: [SPARK-28012][SQL] Hive UDF supports literal struct type URL: https://github.com/apache/spark/pull/24846#issuecomment-501120516 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24846: [SPARK-28012][SQL] Hive UDF supports literal struct type
AmplabJenkins removed a comment on issue #24846: [SPARK-28012][SQL] Hive UDF supports literal struct type URL: https://github.com/apache/spark/pull/24846#issuecomment-501119637 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24846: [SPARK-28012][SQL] Hive UDF supports literal struct type
AmplabJenkins commented on issue #24846: [SPARK-28012][SQL] Hive UDF supports literal struct type URL: https://github.com/apache/spark/pull/24846#issuecomment-501119637 Can one of the admins verify this patch?
[GitHub] [spark] cxzl25 opened a new pull request #24846: [SPARK-28012][SQL] Hive UDF supports literal struct type
cxzl25 opened a new pull request #24846: [SPARK-28012][SQL] Hive UDF supports literal struct type
URL: https://github.com/apache/spark/pull/24846

## What changes were proposed in this pull request?

Currently, calling a Hive UDF with a literal struct-type parameter reports an error:

No handler for Hive UDF 'xxxUDF': java.lang.RuntimeException: Hive doesn't support the constant type [StructType(StructField(name,StringType,true), StructField(value,DecimalType(3,1),true))]

## How was this patch tested?

Manual test and existing tests.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1
AmplabJenkins removed a comment on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1 URL: https://github.com/apache/spark/pull/24843#issuecomment-501113812 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106400/ Test FAILed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1
AmplabJenkins removed a comment on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1 URL: https://github.com/apache/spark/pull/24843#issuecomment-501113811 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1
AmplabJenkins commented on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1 URL: https://github.com/apache/spark/pull/24843#issuecomment-501113812 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106400/ Test FAILed.
[GitHub] [spark] SparkQA removed a comment on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1
SparkQA removed a comment on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1 URL: https://github.com/apache/spark/pull/24843#issuecomment-501098820 **[Test build #106400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106400/testReport)** for PR 24843 at commit [`8fe52eb`](https://github.com/apache/spark/commit/8fe52eb5a934edd82b837c598bbff6e01974255a).
[GitHub] [spark] SparkQA commented on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1
SparkQA commented on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1 URL: https://github.com/apache/spark/pull/24843#issuecomment-501113690 **[Test build #106400 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106400/testReport)** for PR 24843 at commit [`8fe52eb`](https://github.com/apache/spark/commit/8fe52eb5a934edd82b837c598bbff6e01974255a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] AmplabJenkins commented on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1
AmplabJenkins commented on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1 URL: https://github.com/apache/spark/pull/24843#issuecomment-501113811 Merged build finished. Test FAILed.
[GitHub] [spark] AmplabJenkins commented on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField
AmplabJenkins commented on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField URL: https://github.com/apache/spark/pull/24599#issuecomment-501112668 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField
AmplabJenkins removed a comment on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField URL: https://github.com/apache/spark/pull/24599#issuecomment-501112670 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106398/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField
AmplabJenkins removed a comment on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField URL: https://github.com/apache/spark/pull/24599#issuecomment-501112668 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField
AmplabJenkins commented on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField URL: https://github.com/apache/spark/pull/24599#issuecomment-501112670 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106398/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField
SparkQA commented on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField URL: https://github.com/apache/spark/pull/24599#issuecomment-501112394 **[Test build #106398 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106398/testReport)** for PR 24599 at commit [`3aab8bf`](https://github.com/apache/spark/commit/3aab8bf953fab5e26ebe83f252efb63a9a10d469). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] SparkQA removed a comment on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField
SparkQA removed a comment on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField URL: https://github.com/apache/spark/pull/24599#issuecomment-501079753 **[Test build #106398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106398/testReport)** for PR 24599 at commit [`3aab8bf`](https://github.com/apache/spark/commit/3aab8bf953fab5e26ebe83f252efb63a9a10d469).
[GitHub] [spark] sharmabhaskar commented on issue #13231: [SPARK-15453] [SQL] Sort Merge Join to use bucketing metadata to optimize query plan
sharmabhaskar commented on issue #13231: [SPARK-15453] [SQL] Sort Merge Join to use bucketing metadata to optimize query plan URL: https://github.com/apache/spark/pull/13231#issuecomment-501110631
@tejasapatil I am facing the same issue while joining two bucketed tables. I am using Spark 2.2 in the MapR distribution. I have two tables:
table A: bucketed on key_column (20 buckets)
table B: partitioned on year and bucketed on key_column (20 buckets)
But while joining the two tables on key_column, the query does both a sort and an exchange:
[count#1311L])
+- *Project
+- *SortMergeJoin [key_column#1079], [key_column#1218], Inner
sort step::- *Sort [key_column#1079 ASC NULLS FIRST], false, 0
exchange step:: +- Exchange hashpartitioning(key_column#1079, 200)
: +- *Filter isnotnull(key_column#1079)
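The plan in this comment contains `Exchange hashpartitioning(key_column#1079, 200)`; 200 is the default value of `spark.sql.shuffle.partitions`, which suggests the 20-bucket layout was not used for this join. As a rough illustration of why matching bucket counts can, in principle, make that shuffle unnecessary, the pure-Python sketch below assigns rows to buckets and joins bucket by bucket. It is not Spark code: `bucket_id`, `bucketize`, and `bucketed_join` are hypothetical helpers, and `zlib.crc32` is only a stand-in for Spark's actual hash function.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 20  # both tables in the comment use 20 buckets


def bucket_id(key, num_buckets=NUM_BUCKETS):
    # Stand-in hash (assumption): Spark uses its own hash function, not CRC32.
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def bucketize(rows, num_buckets=NUM_BUCKETS):
    # Group (key, value) rows by bucket id, roughly as bucketed files would be.
    buckets = defaultdict(list)
    for key, value in rows:
        buckets[bucket_id(key, num_buckets)].append((key, value))
    return buckets


def bucketed_join(left_rows, right_rows, num_buckets=NUM_BUCKETS):
    # Equal keys hash to the same bucket id on both sides, so bucket i of the
    # left table only ever needs to meet bucket i of the right table.
    left = bucketize(left_rows, num_buckets)
    right = bucketize(right_rows, num_buckets)
    joined = []
    for i in range(num_buckets):
        right_by_key = defaultdict(list)
        for key, value in right.get(i, []):
            right_by_key[key].append(value)
        for key, left_value in left.get(i, []):
            for right_value in right_by_key.get(key, []):
                joined.append((key, left_value, right_value))
    return joined
```

Because each bucket pair is joined independently, no row has to move between buckets; the sketch only holds when both sides use the same key, the same hash, and the same bucket count.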
[GitHub] [spark] yuhuali1989 commented on issue #23993: [SPARK-26742][K8s][branch-2.4] Update k8s client version to 4.1.2
yuhuali1989 commented on issue #23993: [SPARK-26742][K8s][branch-2.4] Update k8s client version to 4.1.2 URL: https://github.com/apache/spark/pull/23993#issuecomment-501110196 Hi, I reported an issue with kubernetes-client 4.1.2: it introduces a non-daemon thread that blocks JVM exit. Any idea how to fix it? Thanks. For more details, please see https://issues.apache.org/jira/browse/SPARK-27812 By the way, I believe updating kubernetes-client is necessary. Without this version, I would have to fix another issue with AWS EKS.
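The JVM exits only after all non-daemon threads have finished, which is why a stray non-daemon thread can keep a process alive. Python's `threading` module makes the same daemon/non-daemon distinction, so the effect can be sketched by analogy as below. This is only an illustration, not kubernetes-client or Spark code; `threads_blocking_exit` and `start_worker` are hypothetical helpers.

```python
import threading


def threads_blocking_exit():
    # Alive, non-daemon threads other than the main thread keep the process
    # running after the main thread returns.
    main = threading.main_thread()
    return [
        t for t in threading.enumerate()
        if t is not main and t.is_alive() and not t.daemon
    ]


def start_worker(stop_event, daemon):
    # Hypothetical background worker that idles until asked to stop.
    worker = threading.Thread(target=stop_event.wait, daemon=daemon, name="worker")
    worker.start()
    return worker
```

A worker started with `daemon=False` shows up in `threads_blocking_exit()` until its stop event is set; the same worker started with `daemon=True` never does, which mirrors the usual remedy of marking background threads as daemon or shutting them down explicitly.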
[GitHub] [spark] AmplabJenkins removed a comment on issue #24792: [SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint
AmplabJenkins removed a comment on issue #24792: [SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint URL: https://github.com/apache/spark/pull/24792#issuecomment-501106379 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24792: [SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint
AmplabJenkins removed a comment on issue #24792: [SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint URL: https://github.com/apache/spark/pull/24792#issuecomment-501106384 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11646/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24792: [SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint
AmplabJenkins commented on issue #24792: [SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint URL: https://github.com/apache/spark/pull/24792#issuecomment-501106379 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24792: [SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint
AmplabJenkins commented on issue #24792: [SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint URL: https://github.com/apache/spark/pull/24792#issuecomment-501106384 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11646/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24792: [SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint
SparkQA commented on issue #24792: [SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint URL: https://github.com/apache/spark/pull/24792#issuecomment-501105447 **[Test build #106401 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106401/testReport)** for PR 24792 at commit [`a912b87`](https://github.com/apache/spark/commit/a912b87893d22e603d334429b3ca3de644ca2780).
[GitHub] [spark] dongjoon-hyun closed pull request #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField
dongjoon-hyun closed pull request #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField URL: https://github.com/apache/spark/pull/24599
[GitHub] [spark] SparkQA commented on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1
SparkQA commented on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1 URL: https://github.com/apache/spark/pull/24843#issuecomment-501098820 **[Test build #106400 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106400/testReport)** for PR 24843 at commit [`8fe52eb`](https://github.com/apache/spark/commit/8fe52eb5a934edd82b837c598bbff6e01974255a).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1
AmplabJenkins removed a comment on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1 URL: https://github.com/apache/spark/pull/24843#issuecomment-501098434 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11645/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1
AmplabJenkins removed a comment on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1 URL: https://github.com/apache/spark/pull/24843#issuecomment-501098429 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1
AmplabJenkins commented on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1 URL: https://github.com/apache/spark/pull/24843#issuecomment-501098434 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11645/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1
AmplabJenkins commented on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1 URL: https://github.com/apache/spark/pull/24843#issuecomment-501098429 Merged build finished. Test PASSed.
[GitHub] [spark] beliefer commented on a change in pull request #24792: [SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint
beliefer commented on a change in pull request #24792: [SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint URL: https://github.com/apache/spark/pull/24792#discussion_r292722871
## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
## @@ -735,7 +735,7 @@ colTypeList
;
colType
-: identifier dataType (COMMENT STRING)?
+: identifier dataType (COMMENT STRING)? (DEFAULT defaultExpression=expression)?
Review comment: @lipzhu It's worth referencing, but we need to consider the actual situation in Spark SQL. Thanks.
[GitHub] [spark] AmplabJenkins commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
AmplabJenkins commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501093342 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
AmplabJenkins removed a comment on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501093347 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106397/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
AmplabJenkins commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501093347 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106397/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
AmplabJenkins removed a comment on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501093342 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
SparkQA removed a comment on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501055068 **[Test build #106397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106397/testReport)** for PR 24741 at commit [`62b76e9`](https://github.com/apache/spark/commit/62b76e9de5a9bcfd1e9cbb47e3864002e49504ea).
[GitHub] [spark] SparkQA commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
SparkQA commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501093007 **[Test build #106397 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106397/testReport)** for PR 24741 at commit [`62b76e9`](https://github.com/apache/spark/commit/62b76e9de5a9bcfd1e9cbb47e3864002e49504ea). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] beliefer commented on a change in pull request #24792: [SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint
beliefer commented on a change in pull request #24792: [SPARK-27943][SPARK-27953][SQL] Add new feature create table could specify column with default constraint URL: https://github.com/apache/spark/pull/24792#discussion_r292722871
## File path: sql/catalyst/src/main/antlr4/org/apache/spark/sql/catalyst/parser/SqlBase.g4
## @@ -735,7 +735,7 @@ colTypeList
;
colType
-: identifier dataType (COMMENT STRING)?
+: identifier dataType (COMMENT STRING)? (DEFAULT defaultExpression=expression)?
Review comment: @lipzhu It's worth referencing, but we need to consider the actual situation in Spark SQL.
[GitHub] [spark] SparkQA commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
SparkQA commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501091714 **[Test build #106399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106399/testReport)** for PR 24741 at commit [`2568288`](https://github.com/apache/spark/commit/2568288b58324e595e87b9b05bb1e821626e700a).
[GitHub] [spark] AmplabJenkins removed a comment on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
AmplabJenkins removed a comment on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501091227 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11644/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
AmplabJenkins removed a comment on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501091221 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
AmplabJenkins commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501091221 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
AmplabJenkins commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501091227 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11644/ Test PASSed.
[GitHub] [spark] jzhuge commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
jzhuge commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501090383 Rebase and squash
[GitHub] [spark] jiangxb1987 closed pull request #24809: [SPARK-21136][SQL] Disallow FROM-only statements and show better warnings for Hive-style single-from statements
jiangxb1987 closed pull request #24809: [SPARK-21136][SQL] Disallow FROM-only statements and show better warnings for Hive-style single-from statements URL: https://github.com/apache/spark/pull/24809
[GitHub] [spark] AmplabJenkins removed a comment on issue #24845: Map ByteType to SMALLINT when using JDBC with PostgreSQL
AmplabJenkins removed a comment on issue #24845: Map ByteType to SMALLINT when using JDBC with PostgreSQL URL: https://github.com/apache/spark/pull/24845#issuecomment-501084057 Can one of the admins verify this patch?
[GitHub] [spark] jiangxb1987 commented on issue #24809: [SPARK-21136][SQL] Disallow FROM-only statements and show better warnings for Hive-style single-from statements
jiangxb1987 commented on issue #24809: [SPARK-21136][SQL] Disallow FROM-only statements and show better warnings for Hive-style single-from statements URL: https://github.com/apache/spark/pull/24809#issuecomment-501085279 Thanks, merging to master!
[GitHub] [spark] AmplabJenkins commented on issue #24845: Map ByteType to SMALLINT when using JDBC with PostgreSQL
AmplabJenkins commented on issue #24845: Map ByteType to SMALLINT when using JDBC with PostgreSQL URL: https://github.com/apache/spark/pull/24845#issuecomment-501085001 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24845: Map ByteType to SMALLINT when using JDBC with PostgreSQL
AmplabJenkins commented on issue #24845: Map ByteType to SMALLINT when using JDBC with PostgreSQL URL: https://github.com/apache/spark/pull/24845#issuecomment-501084057 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins removed a comment on issue #24845: Map ByteType to SMALLINT when using JDBC with PostgreSQL
AmplabJenkins removed a comment on issue #24845: Map ByteType to SMALLINT when using JDBC with PostgreSQL URL: https://github.com/apache/spark/pull/24845#issuecomment-501083697 Can one of the admins verify this patch?
[GitHub] [spark] AmplabJenkins commented on issue #24845: Map ByteType to SMALLINT when using JDBC with PostgreSQL
AmplabJenkins commented on issue #24845: Map ByteType to SMALLINT when using JDBC with PostgreSQL URL: https://github.com/apache/spark/pull/24845#issuecomment-501083697 Can one of the admins verify this patch?
[GitHub] [spark] mojodna opened a new pull request #24845: Map ByteType to SMALLINT
mojodna opened a new pull request #24845: Map ByteType to SMALLINT URL: https://github.com/apache/spark/pull/24845

## What changes were proposed in this pull request?

PostgreSQL doesn't have `TINYINT`, which would map directly, but `SMALLINT`s are sufficient for uni-directional translation.

A side-effect of this fix is that `AggregatedDialect` is now usable with multiple dialects targeting `jdbc:postgresql`, as `PostgresDialect.getJDBCType` no longer throws (for which reason backporting this fix would be lovely): https://github.com/apache/spark/blob/1217996f1574f758d81c4e3846452d24b35b/sql/core/src/main/scala/org/apache/spark/sql/jdbc/AggregatedDialect.scala#L42

`dialects.flatMap` throws on the first attempt to get a JDBC type, preventing subsequent dialects in the chain from providing an alternative.

## How was this patch tested?

Unit tests.
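The failure mode described in this pull request can be illustrated with a minimal, self-contained sketch. It does not use Spark's real classes: `DataType`, `Dialect`, `ThrowingPostgres`, `FixedPostgres`, `CustomDialect` and `aggregate` are all hypothetical stand-ins, and `aggregate` only assumes that the aggregated lookup is a strict `flatMap` over the dialect chain, as the description suggests.

```scala
import scala.util.Try

object DialectSketch {
  // Simplified stand-ins for illustration only; NOT Spark's real types.
  sealed trait DataType
  case object ByteType extends DataType
  case object ShortType extends DataType

  trait Dialect {
    def getJDBCType(dt: DataType): Option[String]
  }

  // Mimics the behaviour described before the fix: ByteType throws.
  object ThrowingPostgres extends Dialect {
    def getJDBCType(dt: DataType): Option[String] = dt match {
      case ShortType => Some("SMALLINT")
      case ByteType =>
        throw new IllegalArgumentException("Unsupported type: ByteType")
    }
  }

  // Mimics the behaviour after the fix: ByteType maps to SMALLINT.
  object FixedPostgres extends Dialect {
    def getJDBCType(dt: DataType): Option[String] = dt match {
      case ByteType | ShortType => Some("SMALLINT")
    }
  }

  // A hypothetical second dialect in the chain that could offer an alternative.
  object CustomDialect extends Dialect {
    def getJDBCType(dt: DataType): Option[String] = dt match {
      case ByteType => Some("SMALLINT")
      case _        => None
    }
  }

  // Assumed shape of the aggregated lookup: a strict flatMap over all
  // dialects, so an exception from any one of them aborts the whole lookup.
  def aggregate(dialects: List[Dialect], dt: DataType): Option[String] =
    dialects.flatMap(_.getJDBCType(dt)).headOption

  def main(args: Array[String]): Unit = {
    // The throwing dialect prevents CustomDialect from ever being consulted.
    println(Try(aggregate(List(ThrowingPostgres, CustomDialect), ByteType)).isFailure) // true
    // Once the dialect returns a mapping, the chain resolves normally.
    println(aggregate(List(FixedPostgres, CustomDialect), ByteType)) // Some(SMALLINT)
  }
}
```

Because `List.flatMap` evaluates every element eagerly, the order of the dialects does not matter here: a single throwing dialect anywhere in the chain is enough to fail the lookup.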
[GitHub] [spark] srowen commented on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1
srowen commented on issue #24843: [SPARK-28004][UI] Update jquery to 3.4.1 URL: https://github.com/apache/spark/pull/24843#issuecomment-501081448 Ah right. The history server. There's a problem there. I'll look into it.
[GitHub] [spark] SparkQA commented on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField
SparkQA commented on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField URL: https://github.com/apache/spark/pull/24599#issuecomment-501079753 **[Test build #106398 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106398/testReport)** for PR 24599 at commit [`3aab8bf`](https://github.com/apache/spark/commit/3aab8bf953fab5e26ebe83f252efb63a9a10d469).
[GitHub] [spark] AmplabJenkins commented on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField
AmplabJenkins commented on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField URL: https://github.com/apache/spark/pull/24599#issuecomment-501079368 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11643/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField
AmplabJenkins removed a comment on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField URL: https://github.com/apache/spark/pull/24599#issuecomment-501079362 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField
AmplabJenkins removed a comment on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField URL: https://github.com/apache/spark/pull/24599#issuecomment-501079368 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution-unified/11643/ Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField
AmplabJenkins commented on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField URL: https://github.com/apache/spark/pull/24599#issuecomment-501079362 Merged build finished. Test PASSed.
[GitHub] [spark] viirya commented on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField
viirya commented on issue #24599: [SPARK-27701][SQL] Extend NestedColumnAliasing to general nested field cases including GetArrayStructField URL: https://github.com/apache/spark/pull/24599#issuecomment-501078595 Thanks @dongjoon-hyun! Merged the benchmark results now.
[GitHub] [spark] gengliangwang commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2
gengliangwang commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2 URL: https://github.com/apache/spark/pull/24327#discussion_r292710387 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala ## @@ -0,0 +1,227 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.sql.execution.datasources.v2.parquet + +import java.net.URI +import java.util.TimeZone + +import org.apache.hadoop.fs.Path +import org.apache.hadoop.mapreduce._ +import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl +import org.apache.parquet.filter2.compat.FilterCompat +import org.apache.parquet.filter2.predicate.{FilterApi, FilterPredicate} +import org.apache.parquet.format.converter.ParquetMetadataConverter.SKIP_ROW_GROUPS +import org.apache.parquet.hadoop.{ParquetFileReader, ParquetInputFormat, ParquetInputSplit, ParquetRecordReader} + +import org.apache.spark.TaskContext +import org.apache.spark.broadcast.Broadcast +import org.apache.spark.internal.Logging +import org.apache.spark.sql.catalyst.InternalRow +import org.apache.spark.sql.catalyst.expressions.UnsafeRow +import org.apache.spark.sql.catalyst.util.DateTimeUtils +import org.apache.spark.sql.execution.datasources.{PartitionedFile, RecordReaderIterator} +import org.apache.spark.sql.execution.datasources.parquet._ +import org.apache.spark.sql.execution.datasources.v2._ +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.sources.Filter +import org.apache.spark.sql.sources.v2.reader.{InputPartition, PartitionReader} +import org.apache.spark.sql.types.{AtomicType, StructType} +import org.apache.spark.sql.vectorized.ColumnarBatch +import org.apache.spark.util.SerializableConfiguration + +/** + * A factory used to create Parquet readers. + * + * @param sqlConf SQL configuration. + * @param broadcastedConf Broadcast serializable Hadoop Configuration. + * @param dataSchema Schema of Parquet files. + * @param readDataSchema Required schema of Parquet files. + * @param partitionSchema Schema of partitions. + * @param filters Filters to be pushed down in the batch scan. 
+ */ +case class ParquetPartitionReaderFactory( +sqlConf: SQLConf, +broadcastedConf: Broadcast[SerializableConfiguration], +dataSchema: StructType, +readDataSchema: StructType, +partitionSchema: StructType, +filters: Array[Filter]) extends FilePartitionReaderFactory with Logging { + private val isCaseSensitive = sqlConf.caseSensitiveAnalysis + private val resultSchema = StructType(partitionSchema.fields ++ readDataSchema.fields) + private val enableOffHeapColumnVector = sqlConf.offHeapColumnVectorEnabled + private val enableVectorizedReader: Boolean = sqlConf.parquetVectorizedReaderEnabled && +resultSchema.forall(_.dataType.isInstanceOf[AtomicType]) + private val enableRecordFilter: Boolean = sqlConf.parquetRecordFilterEnabled + private val timestampConversion: Boolean = sqlConf.isParquetINT96TimestampConversion + private val capacity = sqlConf.parquetVectorizedReaderBatchSize + private val enableParquetFilterPushDown: Boolean = sqlConf.parquetFilterPushDown + private val pushDownDate = sqlConf.parquetFilterPushDownDate + private val pushDownTimestamp = sqlConf.parquetFilterPushDownTimestamp + private val pushDownDecimal = sqlConf.parquetFilterPushDownDecimal + private val pushDownStringStartWith = sqlConf.parquetFilterPushDownStringStartWith + private val pushDownInFilterThreshold = sqlConf.parquetFilterPushDownInFilterThreshold + + override def supportColumnarReads(partition: InputPartition): Boolean = { +sqlConf.parquetVectorizedReaderEnabled && sqlConf.wholeStageEnabled && + resultSchema.length <= sqlConf.wholeStageMaxNumFields && + resultSchema.forall(_.dataType.isInstanceOf[AtomicType]) + } + + override def buildReader(file: PartitionedFile): PartitionReader[InternalRow] = { +val reader = if (enableVectorizedReader) { + createVectorizedReader(file) +} else { + createRowBaseReader(file) +} + +val fileReader = new PartitionReader[InternalRow] { + override def next():
[GitHub] [spark] gengliangwang commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2
gengliangwang commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2 URL: https://github.com/apache/spark/pull/24327#discussion_r292709027 ## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetWriteBuilder.scala ## @@ -0,0 +1,116 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + *http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. 
+ */ +package org.apache.spark.sql.execution.datasources.v2.parquet + +import org.apache.hadoop.mapred.JobConf +import org.apache.hadoop.mapreduce.{Job, OutputCommitter, TaskAttemptContext} +import org.apache.parquet.hadoop.{ParquetOutputCommitter, ParquetOutputFormat} +import org.apache.parquet.hadoop.ParquetOutputFormat.JobSummaryLevel +import org.apache.parquet.hadoop.codec.CodecConfig +import org.apache.parquet.hadoop.util.ContextUtil + +import org.apache.spark.internal.Logging +import org.apache.spark.sql.Row +import org.apache.spark.sql.execution.datasources.{OutputWriter, OutputWriterFactory} +import org.apache.spark.sql.execution.datasources.parquet._ +import org.apache.spark.sql.execution.datasources.v2.FileWriteBuilder +import org.apache.spark.sql.internal.SQLConf +import org.apache.spark.sql.types._ +import org.apache.spark.sql.util.CaseInsensitiveStringMap + +class ParquetWriteBuilder( +options: CaseInsensitiveStringMap, +paths: Seq[String], +formatName: String, +supportsDataType: DataType => Boolean) + extends FileWriteBuilder(options, paths, formatName, supportsDataType) with Logging { + + override def prepareWrite( + sqlConf: SQLConf, + job: Job, + options: Map[String, String], + dataSchema: StructType): OutputWriterFactory = { +val parquetOptions = new ParquetOptions(options, sqlConf) + +val conf = ContextUtil.getConfiguration(job) + +val committerClass = + conf.getClass( +SQLConf.PARQUET_OUTPUT_COMMITTER_CLASS.key, +classOf[ParquetOutputCommitter], +classOf[OutputCommitter]) + +if (conf.get(SQLConf.PARQUET_OUTPUT_COMMITTER_CLASS.key) == null) { + logInfo("Using default output committer for Parquet: " + +classOf[ParquetOutputCommitter].getCanonicalName) +} else { + logInfo("Using user defined output committer for Parquet: " + committerClass.getCanonicalName) +} + +conf.setClass( + SQLConf.OUTPUT_COMMITTER_CLASS.key, + committerClass, + classOf[OutputCommitter]) + +// We're not really using `ParquetOutputFormat[Row]` for writing data here, because we 
override +// it in `ParquetOutputWriter` to support appending and dynamic partitioning. The reason why +// we set it here is to setup the output committer class to `ParquetOutputCommitter`, which is +// bundled with `ParquetOutputFormat[Row]`. +job.setOutputFormatClass(classOf[ParquetOutputFormat[Row]]) + +ParquetOutputFormat.setWriteSupportClass(job, classOf[ParquetWriteSupport]) + +// This metadata is useful for keeping UDTs like Vector/Matrix. +ParquetWriteSupport.setSchema(dataSchema, conf) + +// Sets flags for `ParquetWriteSupport`, which converts Catalyst schema to Parquet +// schema and writes actual rows to Parquet files. +conf.set(SQLConf.PARQUET_WRITE_LEGACY_FORMAT.key, sqlConf.writeLegacyParquetFormat.toString) + +conf.set(SQLConf.PARQUET_OUTPUT_TIMESTAMP_TYPE.key, sqlConf.parquetOutputTimestampType.toString) + +// Sets compression scheme +conf.set(ParquetOutputFormat.COMPRESSION, parquetOptions.compressionCodecClassName) + +// SPARK-15719: Disables writing Parquet summary files by default. Review comment: I think it is consistent with V1 here. The value of `parquet.summary.metadata.level` is `ALL` by default. As per SPARK-15719, we should set it as `NONE` by default in Spark. If users set the conf `parquet.summary.metadata.level` as `ALL` or `COMMON_ONLY` explicitly, Spark should write metadata files. This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact
[GitHub] [spark] HyukjinKwon commented on issue #24838: [SPARK-27995][PYTHON] Note the difference between str of Python 2 and 3 at Arrow optimized
HyukjinKwon commented on issue #24838: [SPARK-27995][PYTHON] Note the difference between str of Python 2 and 3 at Arrow optimized URL: https://github.com/apache/spark/pull/24838#issuecomment-501074437 Oh, right - we noticed this and fixed it that way in Pandas UDF, right :-).
[GitHub] [spark] BryanCutler commented on issue #24838: [SPARK-27995][PYTHON] Note the difference between str of Python 2 and 3 at Arrow optimized
BryanCutler commented on issue #24838: [SPARK-27995][PYTHON] Note the difference between str of Python 2 and 3 at Arrow optimized URL: https://github.com/apache/spark/pull/24838#issuecomment-501073830 +1 on the note. Also, specifying the schema as string type should work too.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24809: [SPARK-21136][SQL] Disallow FROM-only statements and show better warnings for Hive-style single-from statements
AmplabJenkins removed a comment on issue #24809: [SPARK-21136][SQL] Disallow FROM-only statements and show better warnings for Hive-style single-from statements URL: https://github.com/apache/spark/pull/24809#issuecomment-501072989 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106396/ Test PASSed.
[GitHub] [spark] AmplabJenkins removed a comment on issue #24809: [SPARK-21136][SQL] Disallow FROM-only statements and show better warnings for Hive-style single-from statements
AmplabJenkins removed a comment on issue #24809: [SPARK-21136][SQL] Disallow FROM-only statements and show better warnings for Hive-style single-from statements URL: https://github.com/apache/spark/pull/24809#issuecomment-501072983 Merged build finished. Test PASSed.
[GitHub] [spark] AmplabJenkins commented on issue #24809: [SPARK-21136][SQL] Disallow FROM-only statements and show better warnings for Hive-style single-from statements
AmplabJenkins commented on issue #24809: [SPARK-21136][SQL] Disallow FROM-only statements and show better warnings for Hive-style single-from statements URL: https://github.com/apache/spark/pull/24809#issuecomment-501072983 Merged build finished. Test PASSed.
[GitHub] [spark] SparkQA removed a comment on issue #24809: [SPARK-21136][SQL] Disallow FROM-only statements and show better warnings for Hive-style single-from statements
SparkQA removed a comment on issue #24809: [SPARK-21136][SQL] Disallow FROM-only statements and show better warnings for Hive-style single-from statements URL: https://github.com/apache/spark/pull/24809#issuecomment-501027626 **[Test build #106396 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106396/testReport)** for PR 24809 at commit [`7b174e5`](https://github.com/apache/spark/commit/7b174e52b6f7bad2ea85f6fa844af3f1e1ffbbb1).
[GitHub] [spark] AmplabJenkins commented on issue #24809: [SPARK-21136][SQL] Disallow FROM-only statements and show better warnings for Hive-style single-from statements
AmplabJenkins commented on issue #24809: [SPARK-21136][SQL] Disallow FROM-only statements and show better warnings for Hive-style single-from statements URL: https://github.com/apache/spark/pull/24809#issuecomment-501072989 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/106396/ Test PASSed.
[GitHub] [spark] SparkQA commented on issue #24809: [SPARK-21136][SQL] Disallow FROM-only statements and show better warnings for Hive-style single-from statements
SparkQA commented on issue #24809: [SPARK-21136][SQL] Disallow FROM-only statements and show better warnings for Hive-style single-from statements URL: https://github.com/apache/spark/pull/24809#issuecomment-501072612 **[Test build #106396 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106396/testReport)** for PR 24809 at commit [`7b174e5`](https://github.com/apache/spark/commit/7b174e52b6f7bad2ea85f6fa844af3f1e1ffbbb1). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] [spark] rdblue commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2
rdblue commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2 URL: https://github.com/apache/spark/pull/24327#discussion_r292705294
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetPartitionReaderFactory.scala ##
@@ -0,0 +1,227 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.datasources.v2.parquet
+
+import java.net.URI
+import java.util.TimeZone
+
+import org.apache.hadoop.fs.Path
+import org.apache.hadoop.mapreduce._
+import org.apache.hadoop.mapreduce.task.TaskAttemptContextImpl
+import org.apache.parquet.filter2.compat.FilterCompat
+import org.apache.parquet.filter2.predicate.{FilterApi, FilterPredicate}
+import org.apache.parquet.format.converter.ParquetMetadataConverter.SKIP_ROW_GROUPS
+import org.apache.parquet.hadoop.{ParquetFileReader, ParquetInputFormat, ParquetInputSplit, ParquetRecordReader}
+
+import org.apache.spark.TaskContext
+import org.apache.spark.broadcast.Broadcast
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.catalyst.InternalRow
+import org.apache.spark.sql.catalyst.expressions.UnsafeRow
+import org.apache.spark.sql.catalyst.util.DateTimeUtils
+import org.apache.spark.sql.execution.datasources.{PartitionedFile, RecordReaderIterator}
+import org.apache.spark.sql.execution.datasources.parquet._
+import org.apache.spark.sql.execution.datasources.v2._
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.sources.Filter
+import org.apache.spark.sql.sources.v2.reader.{InputPartition, PartitionReader}
+import org.apache.spark.sql.types.{AtomicType, StructType}
+import org.apache.spark.sql.vectorized.ColumnarBatch
+import org.apache.spark.util.SerializableConfiguration
+
+/**
+ * A factory used to create Parquet readers.
+ *
+ * @param sqlConf SQL configuration.
+ * @param broadcastedConf Broadcast serializable Hadoop Configuration.
+ * @param dataSchema Schema of Parquet files.
+ * @param readDataSchema Required schema of Parquet files.
+ * @param partitionSchema Schema of partitions.
+ * @param filters Filters to be pushed down in the batch scan.
+ */
+case class ParquetPartitionReaderFactory(
+    sqlConf: SQLConf,
+    broadcastedConf: Broadcast[SerializableConfiguration],
+    dataSchema: StructType,
+    readDataSchema: StructType,
+    partitionSchema: StructType,
+    filters: Array[Filter]) extends FilePartitionReaderFactory with Logging {
+  private val isCaseSensitive = sqlConf.caseSensitiveAnalysis
+  private val resultSchema = StructType(partitionSchema.fields ++ readDataSchema.fields)
+  private val enableOffHeapColumnVector = sqlConf.offHeapColumnVectorEnabled
+  private val enableVectorizedReader: Boolean = sqlConf.parquetVectorizedReaderEnabled &&
+    resultSchema.forall(_.dataType.isInstanceOf[AtomicType])
+  private val enableRecordFilter: Boolean = sqlConf.parquetRecordFilterEnabled
+  private val timestampConversion: Boolean = sqlConf.isParquetINT96TimestampConversion
+  private val capacity = sqlConf.parquetVectorizedReaderBatchSize
+  private val enableParquetFilterPushDown: Boolean = sqlConf.parquetFilterPushDown
+  private val pushDownDate = sqlConf.parquetFilterPushDownDate
+  private val pushDownTimestamp = sqlConf.parquetFilterPushDownTimestamp
+  private val pushDownDecimal = sqlConf.parquetFilterPushDownDecimal
+  private val pushDownStringStartWith = sqlConf.parquetFilterPushDownStringStartWith
+  private val pushDownInFilterThreshold = sqlConf.parquetFilterPushDownInFilterThreshold
+
+  override def supportColumnarReads(partition: InputPartition): Boolean = {
+    sqlConf.parquetVectorizedReaderEnabled && sqlConf.wholeStageEnabled &&
+      resultSchema.length <= sqlConf.wholeStageMaxNumFields &&
+      resultSchema.forall(_.dataType.isInstanceOf[AtomicType])
+  }
+
+  override def buildReader(file: PartitionedFile): PartitionReader[InternalRow] = {
+    val reader = if (enableVectorizedReader) {
+      createVectorizedReader(file)
+    } else {
+      createRowBaseReader(file)
+    }
+
+    val fileReader = new PartitionReader[InternalRow] {
+      override def next(): Boolean =
[GitHub] [spark] rdblue commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2
rdblue commented on a change in pull request #24327: [SPARK-27418][SQL] Migrate Parquet to File Data Source V2 URL: https://github.com/apache/spark/pull/24327#discussion_r292704984
## File path: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/parquet/ParquetWriteBuilder.scala ##
@@ -0,0 +1,116 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ *    http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.spark.sql.execution.datasources.v2.parquet
+
+import org.apache.hadoop.mapred.JobConf
+import org.apache.hadoop.mapreduce.{Job, OutputCommitter, TaskAttemptContext}
+import org.apache.parquet.hadoop.{ParquetOutputCommitter, ParquetOutputFormat}
+import org.apache.parquet.hadoop.ParquetOutputFormat.JobSummaryLevel
+import org.apache.parquet.hadoop.codec.CodecConfig
+import org.apache.parquet.hadoop.util.ContextUtil
+
+import org.apache.spark.internal.Logging
+import org.apache.spark.sql.Row
+import org.apache.spark.sql.execution.datasources.{OutputWriter, OutputWriterFactory}
+import org.apache.spark.sql.execution.datasources.parquet._
+import org.apache.spark.sql.execution.datasources.v2.FileWriteBuilder
+import org.apache.spark.sql.internal.SQLConf
+import org.apache.spark.sql.types._
+import org.apache.spark.sql.util.CaseInsensitiveStringMap
+
+class ParquetWriteBuilder(
+    options: CaseInsensitiveStringMap,
+    paths: Seq[String],
+    formatName: String,
+    supportsDataType: DataType => Boolean)
+  extends FileWriteBuilder(options, paths, formatName, supportsDataType) with Logging {
+
+  override def prepareWrite(
+      sqlConf: SQLConf,
+      job: Job,
+      options: Map[String, String],
+      dataSchema: StructType): OutputWriterFactory = {
+    val parquetOptions = new ParquetOptions(options, sqlConf)
+
+    val conf = ContextUtil.getConfiguration(job)
+
+    val committerClass =
+      conf.getClass(
+        SQLConf.PARQUET_OUTPUT_COMMITTER_CLASS.key,
+        classOf[ParquetOutputCommitter],
+        classOf[OutputCommitter])
+
+    if (conf.get(SQLConf.PARQUET_OUTPUT_COMMITTER_CLASS.key) == null) {
+      logInfo("Using default output committer for Parquet: " +
+        classOf[ParquetOutputCommitter].getCanonicalName)
+    } else {
+      logInfo("Using user defined output committer for Parquet: " + committerClass.getCanonicalName)
+    }
+
+    conf.setClass(
+      SQLConf.OUTPUT_COMMITTER_CLASS.key,
+      committerClass,
+      classOf[OutputCommitter])
+
+    // We're not really using `ParquetOutputFormat[Row]` for writing data here, because we override
+    // it in `ParquetOutputWriter` to support appending and dynamic partitioning. The reason why
+    // we set it here is to setup the output committer class to `ParquetOutputCommitter`, which is
+    // bundled with `ParquetOutputFormat[Row]`.
+    job.setOutputFormatClass(classOf[ParquetOutputFormat[Row]])
+
+    ParquetOutputFormat.setWriteSupportClass(job, classOf[ParquetWriteSupport])
+
+    // This metadata is useful for keeping UDTs like Vector/Matrix.
+    ParquetWriteSupport.setSchema(dataSchema, conf)
+
+    // Sets flags for `ParquetWriteSupport`, which converts Catalyst schema to Parquet
+    // schema and writes actual rows to Parquet files.
+    conf.set(SQLConf.PARQUET_WRITE_LEGACY_FORMAT.key, sqlConf.writeLegacyParquetFormat.toString)
+
+    conf.set(SQLConf.PARQUET_OUTPUT_TIMESTAMP_TYPE.key, sqlConf.parquetOutputTimestampType.toString)
+
+    // Sets compression scheme
+    conf.set(ParquetOutputFormat.COMPRESSION, parquetOptions.compressionCodecClassName)
+
+    // SPARK-15719: Disables writing Parquet summary files by default.
Review comment:
Why should v2 support deprecated metadata files?
[GitHub] [spark] SparkQA commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
SparkQA commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501055068 **[Test build #106397 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/106397/testReport)** for PR 24741 at commit [`62b76e9`](https://github.com/apache/spark/commit/62b76e9de5a9bcfd1e9cbb47e3864002e49504ea).
[GitHub] [spark] AmplabJenkins commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation
AmplabJenkins commented on issue #24741: [SPARK-27322][SQL] DataSourceV2 table relation URL: https://github.com/apache/spark/pull/24741#issuecomment-501054608 Merged build finished. Test PASSed.