[jira] [Created] (SPARK-46207) Support MergeInto in DataFrameWriterV2
Huaxin Gao created SPARK-46207: -- Summary: Support MergeInto in DataFrameWriterV2 Key: SPARK-46207 URL: https://issues.apache.org/jira/browse/SPARK-46207 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 4.0.0 Reporter: Huaxin Gao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44060) Code-gen for build side outer shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-44060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao reassigned SPARK-44060: -- Assignee: Szehon Ho > Code-gen for build side outer shuffled hash join > > > Key: SPARK-44060 > URL: https://issues.apache.org/jira/browse/SPARK-44060 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.5.0 > Reporter: Szehon Ho > Assignee: Szehon Ho > Priority: Major > > Here, build side outer join means LEFT OUTER join with build left, or RIGHT > OUTER join with build right. > As a followup for https://github.com/apache/spark/pull/41398/ SPARK-36612 > (non-codegen build-side outer shuffled hash join), this task is to add > code-gen for it.
[jira] [Resolved] (SPARK-44060) Code-gen for build side outer shuffled hash join
[ https://issues.apache.org/jira/browse/SPARK-44060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-44060. Fix Version/s: 3.5.0 Resolution: Fixed > Code-gen for build side outer shuffled hash join > > > Key: SPARK-44060 > URL: https://issues.apache.org/jira/browse/SPARK-44060 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.5.0 > Reporter: Szehon Ho > Assignee: Szehon Ho > Priority: Major > Fix For: 3.5.0 > > > Here, build side outer join means LEFT OUTER join with build left, or RIGHT > OUTER join with build right. > As a followup for https://github.com/apache/spark/pull/41398/ SPARK-36612 > (non-codegen build-side outer shuffled hash join), this task is to add > code-gen for it.
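The algorithm the ticket describes can be sketched in plain Scala (no Spark internals, and not the generated code itself): for a LEFT OUTER join with build left, hash the build side, probe with the streamed side while marking matched build rows, then emit the unmatched build rows padded with nulls. All names below are illustrative.

```scala
// Plain-Scala sketch of a build-side outer hash join: LEFT OUTER join with
// the LEFT side as the build side. None stands in for a null-padded row.
case class Row(key: Int, value: String)

def buildSideLeftOuterJoin(
    buildLeft: Seq[Row],
    probeRight: Seq[Row]): Seq[(Row, Option[Row])] = {
  val table = buildLeft.groupBy(_.key)                  // build phase
  val matched = scala.collection.mutable.Set[Int]()     // keys of matched build rows
  val hits = probeRight.flatMap { r =>                  // probe phase
    table.get(r.key).toSeq.flatten.map { l =>
      matched += l.key
      (l, Some(r))
    }
  }
  // outer phase: build rows that never matched, padded with None
  val misses = buildLeft.filterNot(l => matched(l.key)).map(l => (l, None))
  hits ++ misses
}
```

The extra bookkeeping (tracking which build rows matched) is what distinguishes this from the inner-join case and what the code-gen work has to account for.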
[jira] [Created] (SPARK-44149) Support DataFrame Merge API
Huaxin Gao created SPARK-44149: -- Summary: Support DataFrame Merge API Key: SPARK-44149 URL: https://issues.apache.org/jira/browse/SPARK-44149 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Huaxin Gao
[jira] [Created] (SPARK-43417) Improve CBO stats
Huaxin Gao created SPARK-43417: -- Summary: Improve CBO stats Key: SPARK-43417 URL: https://issues.apache.org/jira/browse/SPARK-43417 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.5.0 Reporter: Huaxin Gao When experimenting with the DS V2 column stats, we identified areas where we could potentially improve. For instance, we can probably propagate Union NDV, and add min/max for the varchar columns.
[jira] [Resolved] (SPARK-42470) Remove unused declarations from Hive module
[ https://issues.apache.org/jira/browse/SPARK-42470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-42470. Fix Version/s: 3.5.0 Assignee: Yang Jie Resolution: Fixed > Remove unused declarations from Hive module > --- > > Key: SPARK-42470 > URL: https://issues.apache.org/jira/browse/SPARK-42470 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.5.0 > Reporter: Yang Jie > Assignee: Yang Jie > Priority: Minor > Fix For: 3.5.0
[jira] [Assigned] (SPARK-40045) The order of filtering predicates is not reasonable
[ https://issues.apache.org/jira/browse/SPARK-40045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao reassigned SPARK-40045: -- Assignee: caican > The order of filtering predicates is not reasonable > --- > > Key: SPARK-40045 > URL: https://issues.apache.org/jira/browse/SPARK-40045 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.1.2, 3.2.0, 3.3.0 > Reporter: caican > Assignee: caican > Priority: Major > Fix For: 3.4.0 > > > {code:java} > select id, data FROM testcat.ns1.ns2.table > where id =2 > and md5(data) = '8cde774d6f7333752ed72cacddb05126' > and trim(data) = 'a' {code} > Based on the SQL, we currently get the filters in the following order: > {code:java} > // `(md5(cast(data#23 as binary)) = 8cde774d6f7333752ed72cacddb05126)) AND > (trim(data#23, None) = a))` comes before `(id#22L = 2)` > == Physical Plan == *(1) Project [id#22L, data#23] > +- *(1) Filter isnotnull(data#23) AND isnotnull(id#22L)) AND > (md5(cast(data#23 as binary)) = 8cde774d6f7333752ed72cacddb05126)) AND > (trim(data#23, None) = a)) AND (id#22L = 2)) > +- BatchScan[id#22L, data#23] class > org.apache.spark.sql.connector.InMemoryTable$InMemoryBatchScan{code} > With this predicate order, all rows have to participate in the evaluation, > even rows that fail the later, cheaper filtering criteria, which may cause > Spark tasks to execute slowly. > > So I think that expensive filtering predicates should automatically be > placed to the far right, so that rows that fail the cheaper criteria are not > evaluated against them. 
> > As shown below: > {noformat} > // `(id#22L = 2)` comes before `(md5(cast(data#23 as binary)) = > 8cde774d6f7333752ed72cacddb05126)) AND (trim(data#23, None) = a))` > == Physical Plan == *(1) Project [id#22L, data#23] > +- *(1) Filter isnotnull(data#23) AND isnotnull(id#22L)) AND (id#22L = > 2) AND (md5(cast(data#23 as binary)) = 8cde774d6f7333752ed72cacddb05126)) AND > (trim(data#23, None) = a))) > +- BatchScan[id#22L, data#23] class > org.apache.spark.sql.connector.InMemoryTable$InMemoryBatchScan{noformat}
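The reordering the reporter asks for amounts to sorting conjuncts by an estimated evaluation cost so cheap comparisons run first and can short-circuit expensive ones. A plain-Scala sketch (not Spark's optimizer; the cost numbers are made up for the example):

```scala
// Toy predicate ADT: a cheap id comparison vs. an expensive md5 comparison.
sealed trait Pred {
  def cost: Int
  def eval(id: Long, data: String): Boolean
}
case class IdEquals(v: Long) extends Pred {
  val cost: Int = 1
  def eval(id: Long, data: String): Boolean = id == v
}
case class Md5Equals(hex: String) extends Pred {
  val cost: Int = 100  // made-up cost weight for the example
  def eval(id: Long, data: String): Boolean = {
    val d = java.security.MessageDigest.getInstance("MD5")
      .digest(data.getBytes("UTF-8"))
    d.map(b => "%02x".format(b)).mkString == hex
  }
}

// Stable sort by cost: cheap predicates move to the front.
def reorder(preds: Seq[Pred]): Seq[Pred] = preds.sortBy(_.cost)

// forall short-circuits, so a failing cheap predicate skips the md5 work.
def evalAll(preds: Seq[Pred], id: Long, data: String): Boolean =
  reorder(preds).forall(_.eval(id, data))
```

In the ticket's example, `id = 2` would be evaluated before the `md5`/`trim` conjuncts, so rows with other ids never pay for the hash.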
[jira] [Resolved] (SPARK-42188) Force SBT protobuf version to match Maven on branch 3.2 and 3.3
[ https://issues.apache.org/jira/browse/SPARK-42188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-42188. Assignee: Steve Vaughan Resolution: Fixed > Force SBT protobuf version to match Maven on branch 3.2 and 3.3 > --- > > Key: SPARK-42188 > URL: https://issues.apache.org/jira/browse/SPARK-42188 > Project: Spark > Issue Type: Bug > Components: Build > Affects Versions: 3.3.1, 3.2.3 > Reporter: Steve Vaughan > Assignee: Steve Vaughan > Priority: Major > Fix For: 3.2.4, 3.3.2 > > > Update SparkBuild.scala to force SBT's use of protobuf-java to match the Maven > version. The Maven dependencyManagement section forces protobuf-java to use > 2.5.0, but SBT is using 3.14.0. > Snippet from the Maven dependency tree: > {noformat} > [INFO] +- com.google.crypto.tink:tink:jar:1.6.0:compile > [INFO] | +- com.google.protobuf:protobuf-java:jar:2.5.0:compile <--- 2.x > [INFO] | \- com.google.code.gson:gson:jar:2.8.6:compile{noformat} > Snippet from the SBT dependency tree: > {noformat} > [info] +-com.google.crypto.tink:tink:1.6.0 > [info] | +-com.google.code.gson:gson:2.8.6 > [info] | +-com.google.protobuf:protobuf-java:3.14.0 <--- 3.x{noformat} > The fix is updating SparkBuild.scala just like SPARK-11538 did with guava. > In addition, we should comment on the need to keep the top-level pom.xml and > SparkBuild.scala in sync, as was done in SPARK-41247.
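The shape of such a fix, hedged as a sketch rather than the exact SparkBuild.scala patch (the real file's settings layout differs): sbt's `dependencyOverrides` pins a transitive dependency's version without adding a direct dependency, which mirrors Maven's `<dependencyManagement>` behavior.

```scala
// Sketch of an sbt settings fragment; placement within SparkBuild.scala
// is illustrative. Keep this in sync with the top-level pom.xml.
dependencyOverrides += "com.google.protobuf" % "protobuf-java" % "2.5.0"
```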
[jira] [Resolved] (SPARK-42134) Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes
[ https://issues.apache.org/jira/browse/SPARK-42134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-42134. Fix Version/s: 3.3.2 3.4.0 Assignee: Peter Toth Resolution: Fixed > Fix getPartitionFiltersAndDataFilters() to handle filters without referenced > attributes > --- > > Key: SPARK-42134 > URL: https://issues.apache.org/jira/browse/SPARK-42134 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.4.0 > Reporter: Peter Toth > Assignee: Peter Toth > Priority: Major > Fix For: 3.3.2, 3.4.0
[jira] [Resolved] (SPARK-42031) Clean up remove methods that do not need override
[ https://issues.apache.org/jira/browse/SPARK-42031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-42031. Fix Version/s: 3.4.0 Assignee: Yang Jie Resolution: Fixed > Clean up remove methods that do not need override > - > > Key: SPARK-42031 > URL: https://issues.apache.org/jira/browse/SPARK-42031 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL > Affects Versions: 3.4.0 > Reporter: Yang Jie > Assignee: Yang Jie > Priority: Minor > Fix For: 3.4.0 > > > Java 8 began to provide a default remove method implementation for the > `java.util.Iterator` interface. > https://github.com/openjdk/jdk/blob/9a9add8825a040565051a09010b29b099c2e7d49/jdk/src/share/classes/java/util/Iterator.java#L92-L94
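A minimal demonstration of why those overrides were removable: since Java 8, `java.util.Iterator#remove` has a default body that throws `UnsupportedOperationException`, so an implementation that does not support removal needs no explicit override. The class below is a toy example, not Spark code.

```scala
// A single-element java.util.Iterator that deliberately omits remove():
// the Java 8 default implementation throws UnsupportedOperationException.
class SingleIterator[T](value: T) extends java.util.Iterator[T] {
  private var consumed = false
  override def hasNext: Boolean = !consumed
  override def next(): T = { consumed = true; value }
  // no remove() override needed
}

// Helper: returns true iff calling remove() throws the expected exception.
def removeThrows(it: java.util.Iterator[_]): Boolean =
  try { it.remove(); false }
  catch { case _: UnsupportedOperationException => true }
```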
[jira] [Created] (SPARK-41378) Support Column Stats in DS V2
Huaxin Gao created SPARK-41378: -- Summary: Support Column Stats in DS V2 Key: SPARK-41378 URL: https://issues.apache.org/jira/browse/SPARK-41378 Project: Spark Issue Type: New Feature Components: SQL Affects Versions: 3.4.0 Reporter: Huaxin Gao
[jira] [Created] (SPARK-40946) Introduce a new DataSource V2 interface SupportsPushDownClusterKeys
Huaxin Gao created SPARK-40946: -- Summary: Introduce a new DataSource V2 interface SupportsPushDownClusterKeys Key: SPARK-40946 URL: https://issues.apache.org/jira/browse/SPARK-40946 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Huaxin Gao A mix-in interface for ScanBuilder. Data sources can implement this interface to push down all the join or aggregate keys to the data source. A return value of true indicates that the data source will return input partitions clustered on those keys. A return value of false indicates that the data source makes no such guarantee, even though it may still report a partitioning that may or may not be compatible with the given clustering keys; in that case it is Spark's responsibility to group the input partitions where applicable.
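Since this ticket only proposes the interface, the following is a purely hypothetical sketch of what the description implies, with stand-in types so it is self-contained; the real interface, names, and package would live in Spark's DataSource V2 API and could differ entirely.

```scala
// Stand-in for Spark's column reference type (assumption, not Spark's class).
case class NamedReference(fieldNames: Seq[String])

trait ScanBuilder

// Hypothetical mix-in per the ticket's description.
trait SupportsPushDownClusterKeys extends ScanBuilder {
  /** True iff the source guarantees input partitions clustered on the keys. */
  def pushClusterKeys(keys: Array[NamedReference]): Boolean
}

// A source that cannot honor the clustering keys returns false, leaving
// Spark responsible for grouping the input partitions itself.
class NoopScanBuilder extends SupportsPushDownClusterKeys {
  def pushClusterKeys(keys: Array[NamedReference]): Boolean = false
}
```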
[jira] [Updated] (SPARK-40429) Only set KeyGroupedPartitioning when the referenced column is in the output
[ https://issues.apache.org/jira/browse/SPARK-40429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-40429: --- Description: {code:java} sql(s"CREATE TABLE $tbl (id bigint, data string) PARTITIONED BY (id)") sql(s"INSERT INTO $tbl VALUES (1, 'a'), (2, 'b'), (3, 'c')") checkAnswer( spark.table(tbl).select("index", "_partition"), Seq(Row(0, "3"), Row(0, "2"), Row(0, "1")) ) {code} failed with ScalaTestFailureLocation: org.apache.spark.sql.QueryTest at (QueryTest.scala:226) org.scalatest.exceptions.TestFailedException: AttributeSet(id#994L) was not empty The optimized logical plan has missing inputs: RelationV2[index#998, _partition#999] testcat.t > Only set KeyGroupedPartitioning when the referenced column is in the output > --- > > Key: SPARK-40429 > URL: https://issues.apache.org/jira/browse/SPARK-40429 > Project: Spark > Issue Type: Bug > Components: SQL > Affects Versions: 3.3.0, 3.4.0 > Reporter: Huaxin Gao > Priority: Minor > > {code:java} > sql(s"CREATE TABLE $tbl (id bigint, data string) PARTITIONED BY (id)") > sql(s"INSERT INTO $tbl VALUES (1, 'a'), (2, 'b'), (3, 'c')") > checkAnswer( > spark.table(tbl).select("index", "_partition"), > Seq(Row(0, "3"), Row(0, "2"), Row(0, "1")) > ) > {code} > failed with > ScalaTestFailureLocation: org.apache.spark.sql.QueryTest at > (QueryTest.scala:226) > org.scalatest.exceptions.TestFailedException: AttributeSet(id#994L) was not > empty The optimized logical plan has missing inputs: > RelationV2[index#998, _partition#999] testcat.t
[jira] [Created] (SPARK-40429) Only set KeyGroupedPartitioning when the referenced column is in the output
Huaxin Gao created SPARK-40429: -- Summary: Only set KeyGroupedPartitioning when the referenced column is in the output Key: SPARK-40429 URL: https://issues.apache.org/jira/browse/SPARK-40429 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0, 3.4.0 Reporter: Huaxin Gao
[jira] [Created] (SPARK-40293) Make the V2 table error message more meaningful
Huaxin Gao created SPARK-40293: -- Summary: Make the V2 table error message more meaningful Key: SPARK-40293 URL: https://issues.apache.org/jira/browse/SPARK-40293 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Huaxin Gao When the V2 catalog is not configured, Spark fails to access/create a table using the V2 API and silently falls back to attempting the same operation using the V1 API. This happens frequently among users. We want to have a better error message so that users can fix the configuration/usage issue by themselves.
[jira] [Resolved] (SPARK-40113) Refactor ParquetScanBuilder DataSourceV2 interface implementation
[ https://issues.apache.org/jira/browse/SPARK-40113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-40113. Fix Version/s: 3.4.0 Assignee: miracle Resolution: Fixed > Refactor ParquetScanBuilder DataSourceV2 interface implementation > > > Key: SPARK-40113 > URL: https://issues.apache.org/jira/browse/SPARK-40113 > Project: Spark > Issue Type: Improvement > Components: Optimizer > Affects Versions: 3.3.0 > Reporter: Mars > Assignee: miracle > Priority: Minor > Fix For: 3.4.0 > > > Currently the `FileScanBuilder` interface is not fully implemented in > `ParquetScanBuilder`, unlike `OrcScanBuilder`, `AvroScanBuilder`, and > `CSVScanBuilder`. > To unify the logic of the code and make it clearer, this part of the > implementation is unified.
[jira] [Resolved] (SPARK-40064) Use V2 Filter in SupportsOverwrite
[ https://issues.apache.org/jira/browse/SPARK-40064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-40064. Fix Version/s: 3.4.0 Assignee: Huaxin Gao Resolution: Fixed > Use V2 Filter in SupportsOverwrite > -- > > Key: SPARK-40064 > URL: https://issues.apache.org/jira/browse/SPARK-40064 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: Huaxin Gao > Assignee: Huaxin Gao > Priority: Major > Fix For: 3.4.0 > > > Add V2 Filter support in SupportsOverwrite
[jira] [Created] (SPARK-40064) Use V2 Filter in SupportsOverwrite
Huaxin Gao created SPARK-40064: -- Summary: Use V2 Filter in SupportsOverwrite Key: SPARK-40064 URL: https://issues.apache.org/jira/browse/SPARK-40064 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Huaxin Gao Add V2 Filter support in SupportsOverwrite
[jira] [Updated] (SPARK-39528) Use V2 Filter in SupportsRuntimeFiltering
[ https://issues.apache.org/jira/browse/SPARK-39528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-39528: --- Parent: SPARK-36555 Issue Type: Sub-task (was: Improvement) > Use V2 Filter in SupportsRuntimeFiltering > - > > Key: SPARK-39528 > URL: https://issues.apache.org/jira/browse/SPARK-39528 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: Huaxin Gao > Assignee: Huaxin Gao > Priority: Major > Fix For: 3.4.0 > > > Currently, SupportsRuntimeFiltering uses v1 filter. We should use v2 filter > instead.
[jira] [Updated] (SPARK-39966) Use V2 Filter in SupportsDelete
[ https://issues.apache.org/jira/browse/SPARK-39966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-39966: --- Parent: SPARK-36555 Issue Type: Sub-task (was: Improvement) > Use V2 Filter in SupportsDelete > --- > > Key: SPARK-39966 > URL: https://issues.apache.org/jira/browse/SPARK-39966 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: Huaxin Gao > Assignee: Huaxin Gao > Priority: Major > Fix For: 3.4.0 > > > Spark currently uses V1 Filter in SupportsDelete. Add V2 Filter support.
[jira] [Resolved] (SPARK-39966) Use V2 Filter in SupportsDelete
[ https://issues.apache.org/jira/browse/SPARK-39966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39966. Fix Version/s: 3.4.0 Assignee: Huaxin Gao (was: Apache Spark) Resolution: Fixed > Use V2 Filter in SupportsDelete > --- > > Key: SPARK-39966 > URL: https://issues.apache.org/jira/browse/SPARK-39966 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.4.0 > Reporter: Huaxin Gao > Assignee: Huaxin Gao > Priority: Major > Fix For: 3.4.0 > > > Spark currently uses V1 Filter in SupportsDelete. Add V2 Filter support.
[jira] [Created] (SPARK-39966) Use V2 Filter in SupportsDelete
Huaxin Gao created SPARK-39966: -- Summary: Use V2 Filter in SupportsDelete Key: SPARK-39966 URL: https://issues.apache.org/jira/browse/SPARK-39966 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Huaxin Gao Spark currently uses V1 Filter in SupportsDelete. Add V2 Filter support.
[jira] [Resolved] (SPARK-39914) Add DS V2 Filter to V1 Filter conversion
[ https://issues.apache.org/jira/browse/SPARK-39914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39914. Fix Version/s: 3.4.0 Assignee: Huaxin Gao Resolution: Fixed > Add DS V2 Filter to V1 Filter conversion > > > Key: SPARK-39914 > URL: https://issues.apache.org/jira/browse/SPARK-39914 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: Huaxin Gao > Assignee: Huaxin Gao > Priority: Minor > Fix For: 3.4.0 > > > Add a util method to convert a DS V2 Filter to a V1 Filter.
[jira] [Resolved] (SPARK-39909) Organize the check of push down information for JDBCV2Suite
[ https://issues.apache.org/jira/browse/SPARK-39909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39909. Fix Version/s: 3.4.0 Resolution: Fixed > Organize the check of push down information for JDBCV2Suite > --- > > Key: SPARK-39909 > URL: https://issues.apache.org/jira/browse/SPARK-39909 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: jiaan.geng > Assignee: miracle > Priority: Major > Fix For: 3.4.0 > > > Currently, JDBCV2Suite has many test cases whose checks of the push-down > information look unclean. > For example: > {code:java} > checkPushedInfo(df, > "PushedFilters: [DEPT IS NOT NULL, DEPT > 1], PushedLimit: LIMIT 1, ") > {code} > Changing it to the following looks better: > {code:java} > checkPushedInfo(df, > "PushedFilters: [DEPT IS NOT NULL, DEPT > 1]", > "PushedLimit: LIMIT 1") > {code}
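The varargs-style check the ticket proposes can be approximated as below, with a plain `String` standing in for the explain output (the real helper lives in JDBCV2Suite and takes a DataFrame):

```scala
// Sketch of a varargs check helper: each expected fragment is asserted
// independently against the (stand-in) explain string.
def checkPushedInfo(explain: String, expected: String*): Unit =
  expected.foreach { fragment =>
    assert(explain.contains(fragment), s"missing pushed info: $fragment")
  }
```

Because each fragment is checked on its own, tests no longer need to encode incidental separators such as the trailing `", "` in the single-string form.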
[jira] [Assigned] (SPARK-39909) Organize the check of push down information for JDBCV2Suite
[ https://issues.apache.org/jira/browse/SPARK-39909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao reassigned SPARK-39909: -- Assignee: miracle > Organize the check of push down information for JDBCV2Suite > --- > > Key: SPARK-39909 > URL: https://issues.apache.org/jira/browse/SPARK-39909 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: jiaan.geng > Assignee: miracle > Priority: Major > > Currently, JDBCV2Suite has many test cases whose checks of the push-down > information look unclean. > For example: > {code:java} > checkPushedInfo(df, > "PushedFilters: [DEPT IS NOT NULL, DEPT > 1], PushedLimit: LIMIT 1, ") > {code} > Changing it to the following looks better: > {code:java} > checkPushedInfo(df, > "PushedFilters: [DEPT IS NOT NULL, DEPT > 1]", > "PushedLimit: LIMIT 1") > {code}
[jira] [Commented] (SPARK-39909) Organize the check of push down information for JDBCV2Suite
[ https://issues.apache.org/jira/browse/SPARK-39909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573091#comment-17573091 ] Huaxin Gao commented on SPARK-39909: Hi Chen Liang, do you have a Jira id? > Organize the check of push down information for JDBCV2Suite > --- > > Key: SPARK-39909 > URL: https://issues.apache.org/jira/browse/SPARK-39909 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: jiaan.geng > Priority: Major > > Currently, JDBCV2Suite has many test cases whose checks of the push-down > information look unclean. > For example: > {code:java} > checkPushedInfo(df, > "PushedFilters: [DEPT IS NOT NULL, DEPT > 1], PushedLimit: LIMIT 1, ") > {code} > Changing it to the following looks better: > {code:java} > checkPushedInfo(df, > "PushedFilters: [DEPT IS NOT NULL, DEPT > 1]", > "PushedLimit: LIMIT 1") > {code}
[jira] [Updated] (SPARK-39914) Add DS V2 Filter to V1 Filter conversion
[ https://issues.apache.org/jira/browse/SPARK-39914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-39914: --- Summary: Add DS V2 Filter to V1 Filter conversion (was: Add DS V2 Filter to V2 Filter conversion) > Add DS V2 Filter to V1 Filter conversion > > > Key: SPARK-39914 > URL: https://issues.apache.org/jira/browse/SPARK-39914 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: Huaxin Gao > Priority: Minor > > Add a util method to convert a DS V2 Filter to a V1 Filter.
[jira] [Created] (SPARK-39914) Add DS V2 Filter to V2 Filter conversion
Huaxin Gao created SPARK-39914: -- Summary: Add DS V2 Filter to V2 Filter conversion Key: SPARK-39914 URL: https://issues.apache.org/jira/browse/SPARK-39914 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Huaxin Gao Add a util method to convert a DS V2 Filter to a V1 Filter.
[jira] [Created] (SPARK-39857) V2ExpressionBuilder uses the wrong LiteralValue data type for In predicate
Huaxin Gao created SPARK-39857: -- Summary: V2ExpressionBuilder uses the wrong LiteralValue data type for In predicate Key: SPARK-39857 URL: https://issues.apache.org/jira/browse/SPARK-39857 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Huaxin Gao When building the V2 In Predicate in V2ExpressionBuilder, InSet.dataType (which is BooleanType) is used to build the LiteralValue; InSet.child.dataType should be used instead.
[jira] [Resolved] (SPARK-39812) Simplify code to construct AggregateExpression with toAggregateExpression
[ https://issues.apache.org/jira/browse/SPARK-39812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39812. Fix Version/s: 3.4.0 Assignee: jiaan.geng Resolution: Fixed > Simplify code to construct AggregateExpression with toAggregateExpression > - > > Key: SPARK-39812 > URL: https://issues.apache.org/jira/browse/SPARK-39812 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.4.0 > Reporter: jiaan.geng > Assignee: jiaan.geng > Priority: Major > Fix For: 3.4.0 > > > Currently, Spark provides toAggregateExpression to simplify the code, > but developers still use AggregateExpression.apply in many places.
[jira] [Resolved] (SPARK-39784) Put Literal values on the right side of the data source filter after translating Catalyst Expression to data source filter
[ https://issues.apache.org/jira/browse/SPARK-39784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39784. Fix Version/s: 3.4.0 Assignee: Huaxin Gao Resolution: Fixed > Put Literal values on the right side of the data source filter after > translating Catalyst Expression to data source filter > -- > > Key: SPARK-39784 > URL: https://issues.apache.org/jira/browse/SPARK-39784 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.4.0 > Reporter: Huaxin Gao > Assignee: Huaxin Gao > Priority: Minor > Fix For: 3.4.0 > > > After translating an Expression to a data source filter, we want the Literal > value to be on the right side of the filter. > For example, given 1 > a, after translating to Predicate we want to have a < 1.
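The normalization in this ticket can be sketched with a toy expression form (not Spark's Catalyst or Predicate classes): when a comparison has the literal on the left, flip the operator so the translated filter reads column-op-literal, e.g. `1 > a` becomes `a < 1`.

```scala
// Toy comparison operators with their flipped (commuted) counterparts.
sealed trait BinOp { def flip: BinOp }
case object Lt   extends BinOp { def flip = Gt }
case object Gt   extends BinOp { def flip = Lt }
case object LtEq extends BinOp { def flip = GtEq }
case object GtEq extends BinOp { def flip = LtEq }
case object EqOp extends BinOp { def flip = EqOp }  // equality is symmetric

case class Predicate(left: String, op: BinOp, right: String)

// If only the left operand is a literal, swap the operands and flip the op.
def literalOnRight(isLiteral: String => Boolean)(p: Predicate): Predicate =
  if (isLiteral(p.left) && !isLiteral(p.right))
    Predicate(p.right, p.op.flip, p.left)
  else p
```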
[jira] [Resolved] (SPARK-39759) Implement listIndexes in JDBC (H2 dialect)
[ https://issues.apache.org/jira/browse/SPARK-39759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39759. Assignee: BingKun Pan Resolution: Fixed > Implement listIndexes in JDBC (H2 dialect) > -- > > Key: SPARK-39759 > URL: https://issues.apache.org/jira/browse/SPARK-39759 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: BingKun Pan > Assignee: BingKun Pan > Priority: Minor > Fix For: 3.4.0
[jira] [Created] (SPARK-39784) Literal values should be on the right side of the data source filter
Huaxin Gao created SPARK-39784: -- Summary: Literal values should be on the right side of the data source filter Key: SPARK-39784 URL: https://issues.apache.org/jira/browse/SPARK-39784 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Huaxin Gao After translating an Expression to a data source filter, we want the Literal value to be on the right side of the filter. For example, given 1 > a, after translating to Predicate we want to have a < 1.
[jira] [Resolved] (SPARK-39704) Implement createIndex & dropIndex & IndexExists in JDBC (H2 dialect)
[ https://issues.apache.org/jira/browse/SPARK-39704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39704. Fix Version/s: 3.4.0 Assignee: BingKun Pan Resolution: Fixed > Implement createIndex & dropIndex & IndexExists in JDBC (H2 dialect) > > > Key: SPARK-39704 > URL: https://issues.apache.org/jira/browse/SPARK-39704 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.3.0 > Reporter: BingKun Pan > Assignee: BingKun Pan > Priority: Minor > Fix For: 3.4.0
[jira] [Resolved] (SPARK-39711) Remove redundant trait: BeforeAndAfterAll & BeforeAndAfterEach & Logging
[ https://issues.apache.org/jira/browse/SPARK-39711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39711. Fix Version/s: 3.4.0 Assignee: BingKun Pan Resolution: Fixed > Remove redundant trait: BeforeAndAfterAll & BeforeAndAfterEach & Logging > > > Key: SPARK-39711 > URL: https://issues.apache.org/jira/browse/SPARK-39711 > Project: Spark > Issue Type: Test > Components: Tests > Affects Versions: 3.3.0 > Reporter: BingKun Pan > Assignee: BingKun Pan > Priority: Minor > Fix For: 3.4.0 > > > SparkFunSuite is declared as follows: > {code:java} > abstract class SparkFunSuite > extends AnyFunSuite > with BeforeAndAfterAll > with BeforeAndAfterEach > with ThreadAudit > with Logging > {code} > Some suites extend SparkFunSuite and also mix in BeforeAndAfterAll, > BeforeAndAfterEach, or Logging, which is redundant.
[jira] [Resolved] (SPARK-39724) Remove duplicate `.setAccessible(true)` in `kvstore.KVTypeInfo`
[ https://issues.apache.org/jira/browse/SPARK-39724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39724. Fix Version/s: 3.4.0 Assignee: Yang Jie Resolution: Fixed > Remove duplicate `.setAccessible(true)` in `kvstore.KVTypeInfo` > > > Key: SPARK-39724 > URL: https://issues.apache.org/jira/browse/SPARK-39724 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.4.0 > > > {code:java} > for (Method m : type.getDeclaredMethods()) { > KVIndex idx = m.getAnnotation(KVIndex.class); > if (idx != null) { > checkIndex(idx, indices); > Preconditions.checkArgument(m.getParameterTypes().length == 0, > "Annotated method %s::%s should not have any parameters.", > type.getName(), m.getName()); > m.setAccessible(true); > indices.put(idx.value(), idx); > m.setAccessible(true); > accessors.put(idx.value(), new MethodAccessor(m)); > } {code} > The above code has duplicate calls to `.setAccessible(true)`. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
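The fix is simply to drop one of the two calls: a single `setAccessible(true)` on a `Method` object is enough for all later invocations. A self-contained illustration of that property (not the Spark code itself):

```java
import java.lang.reflect.Method;

public class SetAccessibleOnce {
    private String hidden() { return "ok"; }

    // Reflectively invoke the private method twice after a single
    // setAccessible(true) call; no second call is needed before reuse.
    public static String invokeHidden() {
        try {
            Method m = SetAccessibleOnce.class.getDeclaredMethod("hidden");
            m.setAccessible(true); // one call suffices
            SetAccessibleOnce target = new SetAccessibleOnce();
            return (String) m.invoke(target) + (String) m.invoke(target);
        } catch (ReflectiveOperationException e) {
            throw new RuntimeException(e);
        }
    }
}
```

Both invocations succeed, which is why the duplicated call in `KVTypeInfo` was safe to remove.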
[jira] [Resolved] (SPARK-39633) Dataframe options for time travel via `timestampAsOf` should respect both formats of specifying timestamp
[ https://issues.apache.org/jira/browse/SPARK-39633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39633. Fix Version/s: 3.4.0 3.3.0 Assignee: Prashant Singh Resolution: Fixed > Dataframe options for time travel via `timestampAsOf` should respect both > formats of specifying timestamp > - > > Key: SPARK-39633 > URL: https://issues.apache.org/jira/browse/SPARK-39633 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Prashant Singh >Assignee: Prashant Singh >Priority: Minor > Fix For: 3.4.0, 3.3.0 > > > presently spark sql query for time travel like : > {{SELECT * from \{table} TIMESTAMP AS OF 1548751078 }} > works correctly, which is what is specified in sql grammar as well (((FOR > SYSTEM_VERSION) | VERSION) AS OF version=(INTEGER_VALUE | STRING)), but when > trying to do the same via dataframe option `timestampAsOf` the code fails > with : > {quote}[info] org.apache.spark.sql.AnalysisException: '1548751078' is not a > valid timestamp expression for time travel. 
> [info] at > org.apache.spark.sql.errors.QueryCompilationErrors$.invalidTimestampExprForTimeTravel(QueryCompilationErrors.scala:2413) > [info] at > org.apache.spark.sql.catalyst.analysis.TimeTravelSpec$.create(TimeTravelSpec.scala:55) > [info] at > org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:128) > [info] at > org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:209) > [info] at scala.Option.flatMap(Option.scala:271) > [info] at > org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207) > [info] at > org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) > [info] at > org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.load(SupportsCatalogOptionsSuite.scala:365) > [info] at > org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.$anonfun$new$33(SupportsCatalogOptionsSuite.scala:329) > [info] at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:133) > [info] at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:158) > [info] at > org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.$anonfun$new$30(SupportsCatalogOptionsSuite.scala:329) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1490) > [info] at > org.apache.spark.sql.test.SQLTestUtilsBase.withTable(SQLTestUtils.scala:306) > [info] at > org.apache.spark.sql.test.SQLTestUtilsBase.withTable$(SQLTestUtils.scala:304) > [info] at > org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.withTable(SupportsCatalogOptionsSuite.scala:44) > [info] at > org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.$anonfun$new$26(SupportsCatalogOptionsSuite.scala:309) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] 
at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190) > [info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:203) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182) > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39633) Dataframe options for time travel via `timestampAsOf` should respect both formats of specifying timestamp
[ https://issues.apache.org/jira/browse/SPARK-39633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-39633: --- Issue Type: Improvement (was: Bug) > Dataframe options for time travel via `timestampAsOf` should respect both > formats of specifying timestamp > - > > Key: SPARK-39633 > URL: https://issues.apache.org/jira/browse/SPARK-39633 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Prashant Singh >Priority: Minor > > presently spark sql query for time travel like : > {{SELECT * from \{table} TIMESTAMP AS OF 1548751078 }} > works correctly, which is what is specified in sql grammar as well (((FOR > SYSTEM_VERSION) | VERSION) AS OF version=(INTEGER_VALUE | STRING)), but when > trying to do the same via dataframe option `timestampAsOf` the code fails > with : > {quote}[info] org.apache.spark.sql.AnalysisException: '1548751078' is not a > valid timestamp expression for time travel. > [info] at > org.apache.spark.sql.errors.QueryCompilationErrors$.invalidTimestampExprForTimeTravel(QueryCompilationErrors.scala:2413) > [info] at > org.apache.spark.sql.catalyst.analysis.TimeTravelSpec$.create(TimeTravelSpec.scala:55) > [info] at > org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:128) > [info] at > org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:209) > [info] at scala.Option.flatMap(Option.scala:271) > [info] at > org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207) > [info] at > org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171) > [info] at > org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.load(SupportsCatalogOptionsSuite.scala:365) > [info] at > org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.$anonfun$new$33(SupportsCatalogOptionsSuite.scala:329) > [info] at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:133) > [info] at 
org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:158) > [info] at > org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.$anonfun$new$30(SupportsCatalogOptionsSuite.scala:329) > [info] at > scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) > [info] at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1490) > [info] at > org.apache.spark.sql.test.SQLTestUtilsBase.withTable(SQLTestUtils.scala:306) > [info] at > org.apache.spark.sql.test.SQLTestUtilsBase.withTable$(SQLTestUtils.scala:304) > [info] at > org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.withTable(SupportsCatalogOptionsSuite.scala:44) > [info] at > org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.$anonfun$new$26(SupportsCatalogOptionsSuite.scala:309) > [info] at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85) > [info] at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83) > [info] at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104) > [info] at org.scalatest.Transformer.apply(Transformer.scala:22) > [info] at org.scalatest.Transformer.apply(Transformer.scala:20) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190) > [info] at > org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:203) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200) > [info] at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200) > [info] at > org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182) > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
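A hedged sketch of the behavior the fix enables (the class and method names here are hypothetical): the option value is tried first as epoch seconds and falls back to the `yyyy-MM-dd HH:mm:ss` timestamp form, so both `1548751078` and `'2019-01-29 00:37:58'` are accepted, matching what the SQL grammar already allows.

```java
import java.sql.Timestamp;
import java.time.Instant;

public class TimeTravelOption {
    // Accept either epoch seconds or a "yyyy-MM-dd HH:mm:ss" timestamp string.
    public static Instant parse(String value) {
        try {
            // Integer form: "1548751078" -> epoch seconds
            return Instant.ofEpochSecond(Long.parseLong(value));
        } catch (NumberFormatException notANumber) {
            // String form: "2019-01-29 00:37:58" (interpreted in the JVM time zone)
            return Timestamp.valueOf(value).toInstant();
        }
    }
}
```

Under this sketch, `parse("1548751078")` no longer fails with "not a valid timestamp expression for time travel".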
[jira] [Created] (SPARK-39528) Use V2 Filter in SupportsRuntimeFiltering
Huaxin Gao created SPARK-39528: -- Summary: Use V2 Filter in SupportsRuntimeFiltering Key: SPARK-39528 URL: https://issues.apache.org/jira/browse/SPARK-39528 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Huaxin Gao Currently, SupportsRuntimeFiltering uses v1 filter. We should use v2 filter instead. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39417) Handle Null partition values in PartitioningUtils
[ https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39417. Fix Version/s: 3.3.0 3.4.0 Resolution: Fixed > Handle Null partition values in PartitioningUtils > - > > Key: SPARK-39417 > URL: https://issues.apache.org/jira/browse/SPARK-39417 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Prashant Singh >Assignee: Prashant Singh >Priority: Major > Fix For: 3.3.0, 3.4.0 > > > For partitions with null values we get an NPE on partition discovery; earlier we > used to get `DEFAULT_PARTITION_NAME` > > {quote} [info] java.lang.NullPointerException: > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362) > [info] at > org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355) > [info] at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286) > [info] at scala.collection.Iterator.foreach(Iterator.scala:943) > [info] at scala.collection.Iterator.foreach$(Iterator.scala:943){quote} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
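A minimal sketch of the intended handling (the helper name is hypothetical; the placeholder constant mirrors the `__HIVE_DEFAULT_PARTITION__` convention referenced as `DEFAULT_PARTITION_NAME` above): a null partition value is rendered as the default partition name instead of being dereferenced.

```java
public class PartitionPathFragment {
    // Conventional placeholder for null partition values (assumption:
    // matches Spark's DEFAULT_PARTITION_NAME).
    static final String DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__";

    // Render "col=value", substituting the placeholder when the value is
    // null rather than calling toString() on it (the NPE in the report).
    public static String fragment(String col, Object value) {
        String rendered = (value == null) ? DEFAULT_PARTITION_NAME : value.toString();
        return col + "=" + rendered;
    }
}
```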
[jira] [Resolved] (SPARK-39393) Parquet data source only supports push-down predicate filters for non-repeated primitive types
[ https://issues.apache.org/jira/browse/SPARK-39393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39393. Fix Version/s: 3.1.3 3.3.0 3.2.2 3.4.0 Assignee: Amin Borjian Resolution: Fixed > Parquet data source only supports push-down predicate filters for > non-repeated primitive types > -- > > Key: SPARK-39393 > URL: https://issues.apache.org/jira/browse/SPARK-39393 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1 >Reporter: Amin Borjian >Assignee: Amin Borjian >Priority: Major > Labels: parquet > Fix For: 3.1.3, 3.3.0, 3.2.2, 3.4.0 > > > I use an example to illustrate the problem. The reason for the problem and > the problem-solving approach are stated below. > Assume follow Protocol buffer schema: > {code:java} > message Model { > string name = 1; > repeated string keywords = 2; > } > {code} > Suppose a parquet file is created from a set of records in the above format > with the help of the {{parquet-protobuf}} library. > Using Spark version 3.0.2 or older, we could run the following query using > {{{}spark-shell{}}}: > {code:java} > val data = spark.read.parquet("/path/to/parquet") > data.registerTempTable("models") > spark.sql("select * from models where array_contains(keywords, > 'X')").show(false) > {code} > But after updating Spark, we get the following error: > {code:java} > Caused by: java.lang.IllegalArgumentException: FilterPredicates do not > currently support repeated columns. Column keywords is repeated. 
> at > org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:176) > at > org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:149) > at > org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:89) > at > org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:56) > at > org.apache.parquet.filter2.predicate.Operators$NotEq.accept(Operators.java:192) > at > org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:61) > at > org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:95) > at > org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:45) > at > org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:149) > at > org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:72) > at > org.apache.parquet.hadoop.ParquetFileReader.filterRowGroups(ParquetFileReader.java:870) > at > org.apache.parquet.hadoop.ParquetFileReader.(ParquetFileReader.java:789) > at > org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:657) > at > org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162) > at > org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140) > at > org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:373) > at > org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127) > ... > {code} > At first it seems the problem is the parquet library. 
But in fact, our > problem is because of this line that has been around since 2014 (based on Git > history): > [Parquet Schema Compatibility > Validator|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/filter2/predicate/SchemaCompatibilityValidator.java#L194] > After some check, I notice that the cause of the problem is due to a change > in the data filtering conditions: > {code:java} > spark.sql("select * from log where array_contains(keywords, > 'X')").explain(true); > // Spark 3.0.2 and older > == Physical Plan == > ... > +- FileScan parquet [link#0,keywords#1] > DataFilters: [array_contains(keywords#1, Google)] > PushedFilters: [] > ... > // Spark 3.1.0 and newer > == Physical Plan == ... > +- FileScan parquet [link#0,keywords#1] > DataFilters: [isnotnull(keywords#1), array_contains(keywords#1, Google)] > PushedFilters: [IsNotNull(keywords)] > ...{code} > It's good that the filtering section has become smarter. Unfortunately, due > to unfamiliarity with code base, I could not find the exact location of the > change and
[jira] [Resolved] (SPARK-39413) Capitalize sql keywords in JDBCV2Suite
[ https://issues.apache.org/jira/browse/SPARK-39413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39413. Fix Version/s: 3.4.0 Assignee: jiaan.geng Resolution: Fixed > Capitalize sql keywords in JDBCV2Suite > -- > > Key: SPARK-39413 > URL: https://issues.apache.org/jira/browse/SPARK-39413 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.4.0 > > > JDBCV2Suite has some test cases that use SQL keywords without capitalizing them. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39390) Hide and optimize `viewAcls`/`viewAclsGroups`/`modifyAcls`/`modifyAclsGroups` from INFO log
[ https://issues.apache.org/jira/browse/SPARK-39390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39390. Fix Version/s: 3.4.0 Assignee: qian Resolution: Fixed > Hide and optimize `viewAcls`/`viewAclsGroups`/`modifyAcls`/`modifyAclsGroups` > from INFO log > --- > > Key: SPARK-39390 > URL: https://issues.apache.org/jira/browse/SPARK-39390 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: qian >Assignee: qian >Priority: Minor > Fix For: 3.4.0 > > > This issue aims to hide and optimize > `viewAcls`/`viewAclsGroups`/`modifyAcls`/`modifyAclsGroups` from INFO log. > {code:java} > 2022-06-02 22:02:48.328 - stderr> 22/06/03 05:02:48 INFO SecurityManager: > SecurityManager: authentication disabled; ui acls disabled; users with view > permissions: Set(root); groups with view permissions: Set(); users with > modify permissions: Set(root); groups with modify permissions: Set(){code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39312) Use Parquet in predicate for Spark In filter
Huaxin Gao created SPARK-39312: -- Summary: Use Parquet in predicate for Spark In filter Key: SPARK-39312 URL: https://issues.apache.org/jira/browse/SPARK-39312 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4 Reporter: Huaxin Gao Now that Parquet supports a native in predicate, we want to simplify the current In filter pushdown by using Parquet's native in predicate. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37219) support AS OF syntax
[ https://issues.apache.org/jira/browse/SPARK-37219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537627#comment-17537627 ] Huaxin Gao commented on SPARK-37219: Correct. > support AS OF syntax > > > Key: SPARK-37219 > URL: https://issues.apache.org/jira/browse/SPARK-37219 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.3.0 > > > https://docs.databricks.com/delta/quick-start.html#query-an-earlier-version-of-the-table-time-travel > Delta Lake time travel allows user to query an older snapshot of a Delta > table. To query an older version of a table, user needs to specify a version > or timestamp in a SELECT statement using AS OF syntax as the follows > SELECT * FROM default.people10m VERSION AS OF 0; > SELECT * FROM default.people10m TIMESTAMP AS OF '2019-01-29 00:37:58'; > This ticket is opened to add AS OF syntax in Spark -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-37219) support AS OF syntax
[ https://issues.apache.org/jira/browse/SPARK-37219 ] Huaxin Gao deleted comment on SPARK-37219. > support AS OF syntax > > > Key: SPARK-37219 > URL: https://issues.apache.org/jira/browse/SPARK-37219 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.3.0 > > > https://docs.databricks.com/delta/quick-start.html#query-an-earlier-version-of-the-table-time-travel > Delta Lake time travel allows user to query an older snapshot of a Delta > table.
To query an older version of a table, user needs to specify a version > or timestamp in a SELECT statement using AS OF syntax as the follows > SELECT * FROM default.people10m VERSION AS OF 0; > SELECT * FROM default.people10m TIMESTAMP AS OF '2019-01-29 00:37:58'; > This ticket is opened to add AS OF syntax in Spark -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-37219) support AS OF syntax
[ https://issues.apache.org/jira/browse/SPARK-37219 ] Huaxin Gao deleted comment on SPARK-37219. > support AS OF syntax > > > Key: SPARK-37219 > URL: https://issues.apache.org/jira/browse/SPARK-37219 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > Fix For: 3.3.0 > > > https://docs.databricks.com/delta/quick-start.html#query-an-earlier-version-of-the-table-time-travel > Delta Lake time travel allows user to query an older snapshot of a Delta > table.
To query an older version of a table, user needs to specify a version > or timestamp in a SELECT statement using AS OF syntax as the follows > SELECT * FROM default.people10m VERSION AS OF 0; > SELECT * FROM default.people10m TIMESTAMP AS OF '2019-01-29 00:37:58'; > This ticket is opened to add AS OF syntax in Spark -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39156) Remove ParquetLogRedirector usage from ParquetFileFormat
[ https://issues.apache.org/jira/browse/SPARK-39156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39156. Fix Version/s: 3.4.0 Assignee: Yang Jie Resolution: Fixed > Remove ParquetLogRedirector usage from ParquetFileFormat > > > Key: SPARK-39156 > URL: https://issues.apache.org/jira/browse/SPARK-39156 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.4.0 > > > Spark only uses parquet 1.12.2 and no longer relies on parquet version 1.6, so > it seems that the ParquetLogRedirector is no longer needed. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39162) Jdbc dialect should decide which function could be pushed down.
[ https://issues.apache.org/jira/browse/SPARK-39162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39162. Fix Version/s: 3.4.0 Assignee: jiaan.geng Resolution: Fixed > Jdbc dialect should decide which function could be pushed down. > --- > > Key: SPARK-39162 > URL: https://issues.apache.org/jira/browse/SPARK-39162 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.4.0 > > > Regardless of whether the functions are ANSI or not, support for them varies > across databases. > So we should add a new API to JdbcDialect so that the dialect can decide > which functions can be pushed down. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37259) JDBC read is always going to wrap the query in a select statement
[ https://issues.apache.org/jira/browse/SPARK-37259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-37259. Fix Version/s: 3.4.0 Assignee: Peter Toth Resolution: Fixed > JDBC read is always going to wrap the query in a select statement > - > > Key: SPARK-37259 > URL: https://issues.apache.org/jira/browse/SPARK-37259 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.2 >Reporter: Kevin Appel >Assignee: Peter Toth >Priority: Major > Fix For: 3.4.0 > > > The read jdbc is wrapping the query it sends to the database server inside a > select statement and there is no way to override this currently. > Initially I ran into this issue when trying to run a CTE query against SQL > server and it fails, the details of the failure is in these cases: > [https://github.com/microsoft/mssql-jdbc/issues/1340] > [https://github.com/microsoft/mssql-jdbc/issues/1657] > [https://github.com/microsoft/sql-spark-connector/issues/147] > https://issues.apache.org/jira/browse/SPARK-32825 > https://issues.apache.org/jira/browse/SPARK-34928 > I started to patch the code to get the query to run and ran into a few > different items, if there is a way to add these features to allow this code > path to run, this would be extremely helpful to running these type of edge > case queries. These are basic examples here the actual queries are much more > complex and would require significant time to rewrite. 
> Inside JDBCOptions.scala the query is being set to either, using the dbtable > this allows the query to be passed without modification > > {code:java} > name.trim > or > s"(${subquery}) SPARK_GEN_SUBQ_${curId.getAndIncrement()}" > {code} > > Inside JDBCRelation.scala this is going to try to get the schema for this > query, and this ends up running dialect.getSchemaQuery which is doing: > {code:java} > s"SELECT * FROM $table WHERE 1=0"{code} > Overriding the dialect here and initially just passing back the $table gets > passed here and to the next issue which is in the compute function in > JDBCRDD.scala > > {code:java} > val sqlText = s"SELECT $columnList FROM ${options.tableOrQuery} > $myTableSampleClause" + s" $myWhereClause $getGroupByClause $myLimitClause" > > {code} > > For these two queries, about a CTE query and using temp tables, finding out > the schema is difficult without actually running the query and for the temp > table if you run it in the schema check that will have the table now exist > and fail when it runs the actual query. 
> > The way I patched these is by doing these two items: > JDBCRDD.scala (compute) > > {code:java} > val runQueryAsIs = options.parameters.getOrElse("runQueryAsIs", > "false").toBoolean > val sqlText = if (runQueryAsIs) { > s"${options.tableOrQuery}" > } else { > s"SELECT $columnList FROM ${options.tableOrQuery} $myWhereClause" > } > {code} > JDBCRelation.scala (getSchema) > {code:java} > val useCustomSchema = jdbcOptions.parameters.getOrElse("useCustomSchema", > "false").toBoolean > if (useCustomSchema) { > val myCustomSchema = jdbcOptions.parameters.getOrElse("customSchema", > "").toString > val newSchema = CatalystSqlParser.parseTableSchema(myCustomSchema) > logInfo(s"Going to return the new $newSchema because useCustomSchema is > $useCustomSchema and passed in $myCustomSchema") > newSchema > } else { > val tableSchema = JDBCRDD.resolveTable(jdbcOptions) > jdbcOptions.customSchema match { > case Some(customSchema) => JdbcUtils.getCustomSchema( > tableSchema, customSchema, resolver) > case None => tableSchema > } > }{code} > > This is allowing the query to run as is, by using the dbtable option and then > provide a custom schema that will bypass the dialect schema check > > Test queries > > {code:java} > query1 = """ > SELECT 1 as DummyCOL > """ > query2 = """ > WITH DummyCTE AS > ( > SELECT 1 as DummyCOL > ) > SELECT * > FROM DummyCTE > """ > query3 = """ > (SELECT * > INTO #Temp1a > FROM > (SELECT @@VERSION as version) data > ) > (SELECT * > FROM > #Temp1a) > """ > {code} > > Test schema > > {code:java} > schema1 = """ > DummyXCOL INT > """ > schema2 = """ > DummyXCOL STRING > """ > {code} > > Test code > > {code:java} > jdbcDFWorking = ( > spark.read.format("jdbc") > .option("url", > f"jdbc:sqlserver://{server}:{port};databaseName={database};") > .option("user", user) > .option("password", password) > .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver") > .option("dbtable", queryx) > .option("customSchema", schemax) > .option("useCustomSchema", 
"true") > .option("runQueryAsIs", "true") >
[jira] [Resolved] (SPARK-39116) Replace double negation in exists with forall
[ https://issues.apache.org/jira/browse/SPARK-39116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-39116. Fix Version/s: 3.4.0 Assignee: Yang Jie (was: Apache Spark) Resolution: Fixed > Replace double negation in exists with forall > -- > > Key: SPARK-39116 > URL: https://issues.apache.org/jira/browse/SPARK-39116 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.4.0 > > > Some code in Spark looks as follows: > {code:java} > !Seq(1, 2).exists(x => !condition(x)) {code} > which can be replaced with > {code:java} > Seq(1, 2).forall(x => condition(x)) {code} > for code simplification > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
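The equivalence behind this cleanup, shown with Java streams for a runnable illustration (the Spark change itself is in Scala): negating an existence check of the negated condition gives the same result as checking the condition for all elements.

```java
import java.util.List;
import java.util.function.Predicate;

public class ForallRewrite {
    // Before: !xs.exists(x => !p(x)) -- double negation
    public static <T> boolean doubleNegation(List<T> xs, Predicate<T> p) {
        return !xs.stream().anyMatch(x -> !p.test(x));
    }

    // After: xs.forall(p) -- same result, simpler to read
    public static <T> boolean forall(List<T> xs, Predicate<T> p) {
        return xs.stream().allMatch(p);
    }
}
```

Both forms agree on every input, including the empty list (vacuously true), which is why the rewrite is a pure simplification.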
[jira] [Updated] (SPARK-39011) V2 Filter to ORC Predicate support
[ https://issues.apache.org/jira/browse/SPARK-39011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-39011: --- Summary: V2 Filter to ORC Predicate support (was: V2 Filter to ORC Filter support) > V2 Filter to ORC Predicate support > -- > > Key: SPARK-39011 > URL: https://issues.apache.org/jira/browse/SPARK-39011 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4 >Reporter: Huaxin Gao >Priority: Major > > add V2 filter to ORC predicate support -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39011) V2 Filter to ORC Filter support
Huaxin Gao created SPARK-39011: -- Summary: V2 Filter to ORC Filter support Key: SPARK-39011 URL: https://issues.apache.org/jira/browse/SPARK-39011 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4 Reporter: Huaxin Gao add V2 filter to ORC predicate support -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39010) V2 Filter to Parquet Predicate support
Huaxin Gao created SPARK-39010: -- Summary: V2 Filter to Parquet Predicate support Key: SPARK-39010 URL: https://issues.apache.org/jira/browse/SPARK-39010 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4 Reporter: Huaxin Gao Add support for V2 Filter to Parquet Predicate -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38950) Return Array of Predicate for SupportsPushDownCatalystFilters.pushedFilters
Huaxin Gao created SPARK-38950: -- Summary: Return Array of Predicate for SupportsPushDownCatalystFilters.pushedFilters Key: SPARK-38950 URL: https://issues.apache.org/jira/browse/SPARK-38950 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3.0, 3.4.0 Reporter: Huaxin Gao in SupportsPushDownCatalystFilters, change {code:java} def pushedFilters: Array[Filter] {code} to {code:java} def pushedFilters: Array[Predicate] {code} -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38865) Update document of JDBC options for pushDownAggregate and pushDownLimit
[ https://issues.apache.org/jira/browse/SPARK-38865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-38865. Fix Version/s: 3.3.0 3.4.0 Assignee: jiaan.geng Resolution: Fixed > Update document of JDBC options for pushDownAggregate and pushDownLimit > --- > > Key: SPARK-38865 > URL: https://issues.apache.org/jira/browse/SPARK-38865 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0, 3.4.0 > > > Because the DS v2 pushdown framework refactored, we need to add more doc in > sql-data-sources-jdbc.md to reflect the new changes. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38825) Add a test to cover parquet notIn filter
Huaxin Gao created SPARK-38825: -- Summary: Add a test to cover parquet notIn filter Key: SPARK-38825 URL: https://issues.apache.org/jira/browse/SPARK-38825 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.4.0 Reporter: Huaxin Gao Add a test to cover parquet filter notIn -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38779) Unify the pushed operator checking between FileSource test suite and JDBC test suite
Huaxin Gao created SPARK-38779: -- Summary: Unify the pushed operator checking between FileSource test suite and JDBC test suite Key: SPARK-38779 URL: https://issues.apache.org/jira/browse/SPARK-38779 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0, 3.4.0 Reporter: Huaxin Gao In JDBCV2Suite, we use checkPushedInfo to check the pushed down operators. Will do the same for FileSourceAggregatePushDownSuite {code:java} private def checkPushedInfo(df: DataFrame, expectedPlanFragment: String): Unit = { df.queryExecution.optimizedPlan.collect { case _: DataSourceV2ScanRelation => checkKeywordsExistsInExplain(df, expectedPlanFragment) } } {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38643) Validate input dataset of ml.regression
[ https://issues.apache.org/jira/browse/SPARK-38643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-38643. Assignee: zhengruifeng Resolution: Fixed > Validate input dataset of ml.regression > --- > > Key: SPARK-38643 > URL: https://issues.apache.org/jira/browse/SPARK-38643 > Project: Spark > Issue Type: Sub-task > Components: ML >Affects Versions: 3.4.0 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38546) replace deprecated ChiSqSelector with UnivariateFeatureSelector
[ https://issues.apache.org/jira/browse/SPARK-38546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-38546. Resolution: Implemented > replace deprecated ChiSqSelector with UnivariateFeatureSelector > --- > > Key: SPARK-38546 > URL: https://issues.apache.org/jira/browse/SPARK-38546 > Project: Spark > Issue Type: Improvement > Components: Examples >Affects Versions: 3.1.2, 3.2.0, 3.2.1 >Reporter: qian >Priority: Major > > UnivariateFeatureSelector was added and ChiSqSelector was labeled as > deprecated in > SPARK-34080 > So we need replace deprecated ChiSqSelector with UnivariateFeatureSelector. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38414) Remove redundant SuppressWarnings
[ https://issues.apache.org/jira/browse/SPARK-38414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-38414. Fix Version/s: 3.3.0 Assignee: Yang Jie Resolution: Fixed > Remove redundant SuppressWarnings > - > > Key: SPARK-38414 > URL: https://issues.apache.org/jira/browse/SPARK-38414 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-38269) Clean up redundant type cast
[ https://issues.apache.org/jira/browse/SPARK-38269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-38269. Fix Version/s: 3.3.0 Assignee: Yang Jie Resolution: Fixed > Clean up redundant type cast > > > Key: SPARK-38269 > URL: https://issues.apache.org/jira/browse/SPARK-38269 > Project: Spark > Issue Type: Improvement > Components: Spark Core, SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-36553) KMeans fails with NegativeArraySizeException for K = 50000 after issue #27758 was introduced
[ https://issues.apache.org/jira/browse/SPARK-36553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-36553. Fix Version/s: 3.1.3 3.3.0 3.2.2 Assignee: zhengruifeng Resolution: Fixed > KMeans fails with NegativeArraySizeException for K = 50000 after issue #27758 > was introduced > > > Key: SPARK-36553 > URL: https://issues.apache.org/jira/browse/SPARK-36553 > Project: Spark > Issue Type: Bug > Components: ML, MLlib, PySpark >Affects Versions: 3.1.1 >Reporter: Anders Rydbirk >Assignee: zhengruifeng >Priority: Major > Fix For: 3.1.3, 3.3.0, 3.2.2 > > > We are running KMeans on approximately 350M rows of x, y, z coordinates using > the following configuration: > {code:java} > KMeans( > featuresCol='features', > predictionCol='centroid_id', > k=50000, > initMode='k-means||', > initSteps=2, > tol=0.5, > maxIter=20, > seed=SEED, > distanceMeasure='euclidean' > ) > {code} > When using Spark 3.0.0 this worked fine, but when upgrading to 3.1.1 we are > consistently getting errors unless we reduce K. 
> Stacktrace: > > {code:java} > An error occurred while calling o167.fit.An error occurred while calling > o167.fit.: java.lang.NegativeArraySizeException: -897458648 at > scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:194) at > scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:191) at > scala.Array$.ofDim(Array.scala:221) at > org.apache.spark.mllib.clustering.DistanceMeasure.computeStatistics(DistanceMeasure.scala:52) > at > org.apache.spark.mllib.clustering.KMeans.runAlgorithmWithWeight(KMeans.scala:280) > at org.apache.spark.mllib.clustering.KMeans.runWithWeight(KMeans.scala:231) > at org.apache.spark.ml.clustering.KMeans.$anonfun$fit$1(KMeans.scala:354) at > org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191) > at scala.util.Try$.apply(Try.scala:213) at > org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191) > at org.apache.spark.ml.clustering.KMeans.fit(KMeans.scala:329) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native > Method) at > java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown > Source) at > java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown > Source) at java.base/java.lang.reflect.Method.invoke(Unknown Source) at > py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at > py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at > py4j.Gateway.invoke(Gateway.java:282) at > py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at > py4j.commands.CallCommand.execute(CallCommand.java:79) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.base/java.lang.Thread.run(Unknown Source) > {code} > > The issue is introduced by > [#27758|#diff-725d4624ddf4db9cc51721c2ddaef50a1bc30e7b471e0439da28c5b5582efdfdR52]] > which significantly reduces the maximum value of K. 
Snippet of the line that > throws the error from [DistanceMeasure.scala:|#L52] > {code:java} > val packedValues = Array.ofDim[Double](k * (k + 1) / 2) > {code} > > *What we have tried:* > * Reducing iterations > * Reducing input volume > * Reducing K > Only reducing K has yielded success. > > *Possible workaround:* > # Roll back to Spark 3.0.0 since a KMeansModel generated with 3.0.0 cannot > be loaded in 3.1.1. > # Reduce K. Currently trying with 45000. > > *What we don't understand*: > Given the line of code above, we do not understand why we would get an > integer overflow. > For K=50,000, packedValues should be allocated with the size of 1,250,025,000 > < (2^31) and not result in a negative array size. > > *Suggested resolution:* > I'm not strong in the inner workings of KMeans, but my immediate thought > would be to add a fallback to the previous logic for K larger than a set > threshold if the optimisation is to stay in place, as it breaks compatibility > from 3.0.0 to 3.1.1 for edge cases. > > Please let me know if more information is needed; this is my first time > raising a bug for an OS project. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
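The reporter's arithmetic checks the final size but not the intermediate product: in `k * (k + 1) / 2` the multiplication is evaluated first, in 32-bit `Int` arithmetic, and `50000 * 50001 = 2,500,050,000` already exceeds `Int.MaxValue` (2,147,483,647). A Python sketch emulating JVM `Int` semantics reproduces the exact negative size from the stacktrace:

```python
def to_int32(n):
    """Wrap an arbitrary-precision int to a signed 32-bit value, like a JVM Int."""
    n &= 0xFFFFFFFF
    return n - (1 << 32) if n >= (1 << 31) else n

k = 50_000
product = to_int32(k * (k + 1))  # wraps: 2,500,050,000 -> -1,794,917,296
size = product // 2              # the division is exact, so the sign survives
print(size)                      # -897458648, matching the NegativeArraySizeException
```

So the overflow happens at the multiply, before the `/ 2` can bring the value back under 2^31; writing the expression as `k / 2 * (k + 1)` (or computing in `Long`) would avoid it for even `k`.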
[jira] [Commented] (SPARK-38357) StackOverflowError with OR(data filter, partition filter)
[ https://issues.apache.org/jira/browse/SPARK-38357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17499326#comment-17499326 ] Huaxin Gao commented on SPARK-38357: I will submit a PR soon. > StackOverflowError with OR(data filter, partition filter) > - > > Key: SPARK-38357 > URL: https://issues.apache.org/jira/browse/SPARK-38357 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Huaxin Gao >Priority: Major > > If the filter has OR and contains both data filter and partition filter, > e.g. p is partition col and id is data col > {code:java} > SELECT * FROM tmp WHERE (p = 0 AND id > 0) OR (p = 1 AND id = 2) > {code} > throws StackOverflowError -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-38357) StackOverflowError with OR(data filter, partition filter)
Huaxin Gao created SPARK-38357: -- Summary: StackOverflowError with OR(data filter, partition filter) Key: SPARK-38357 URL: https://issues.apache.org/jira/browse/SPARK-38357 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1 Reporter: Huaxin Gao If the filter has OR and contains both data filter and partition filter, e.g. p is partition col and id is data col {code:java} SELECT * FROM tmp WHERE (p = 0 AND id > 0) OR (p = 1 AND id = 2) {code} throws StackOverflowError -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
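As an illustration of the standard technique involved (this is a toy sketch, not Spark's actual patch): to prune partitions under a mixed OR, each disjunct's data-column conjuncts can be weakened to `True`, yielding a partition-only predicate that is implied by the original and therefore safe for pruning. Predicates here are hypothetical nested tuples:

```python
# Predicate representation: ("and"|"or", left, right) or ("leaf", column, op_text)
PARTITION_COLS = {"p"}

def partition_only(pred):
    # Weaken a predicate so it references only partition columns by
    # replacing data-column leaves with True. Sound for pruning because
    # the weakened predicate is implied by the original one.
    if pred[0] == "leaf":
        _, col, _ = pred
        return pred if col in PARTITION_COLS else True
    op, l, r = pred
    l, r = partition_only(l), partition_only(r)
    if op == "and":
        if l is True:
            return r
        if r is True:
            return l
        return ("and", l, r)
    # "or": if either side degenerates to True, nothing can be pruned
    if l is True or r is True:
        return True
    return ("or", l, r)

# (p = 0 AND id > 0) OR (p = 1 AND id = 2)
pred = ("or",
        ("and", ("leaf", "p", "= 0"), ("leaf", "id", "> 0")),
        ("and", ("leaf", "p", "= 1"), ("leaf", "id", "= 2")))
print(partition_only(pred))  # keeps only: p = 0 OR p = 1
```

Crucially this transformation is a bounded rewrite; the StackOverflowError in the ticket suggests the actual splitting logic recursed without such a bound.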
[jira] [Resolved] (SPARK-38100) Remove unused method in `Decimal`
[ https://issues.apache.org/jira/browse/SPARK-38100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-38100. Fix Version/s: 3.2.2 3.3 Resolution: Fixed > Remove unused method in `Decimal` > - > > Key: SPARK-38100 > URL: https://issues.apache.org/jira/browse/SPARK-38100 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Trivial > Fix For: 3.2.2, 3.3 > > > there is a unused method `overflowException` in > `org.apache.spark.sql.types.Decimal`. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-38100) Remove unused method in `Decimal`
[ https://issues.apache.org/jira/browse/SPARK-38100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao reassigned SPARK-38100: -- Assignee: Yang Jie > Remove unused method in `Decimal` > - > > Key: SPARK-38100 > URL: https://issues.apache.org/jira/browse/SPARK-38100 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Trivial > Fix For: 3.2.2, 3.3 > > > there is a unused method `overflowException` in > `org.apache.spark.sql.types.Decimal`. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-30062) bug with DB2Driver using mode("overwrite") option("truncate",True)
[ https://issues.apache.org/jira/browse/SPARK-30062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao reassigned SPARK-30062: -- Assignee: Ivan Karol > bug with DB2Driver using mode("overwrite") option("truncate",True) > -- > > Key: SPARK-30062 > URL: https://issues.apache.org/jira/browse/SPARK-30062 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.4 >Reporter: Guy Huinen >Assignee: Ivan Karol >Priority: Major > Labels: db2, pyspark > Fix For: 3.2.2, 3.3 > > > using DB2Driver with mode("overwrite") and option("truncate",True) gives a SQL > error > > {code:java} > dfClient.write\ > .format("jdbc")\ > .mode("overwrite")\ > .option('driver', 'com.ibm.db2.jcc.DB2Driver')\ > .option("url","jdbc:db2://")\ > .option("user","xxx")\ > .option("password","")\ > .option("dbtable","")\ > .option("truncate",True)\{code} > > gives the error below > in summary I believe the semicolon is misplaced or malformatted > > {code:java} > EXPO.EXPO#CMR_STG;IMMEDIATE{code} > > > full error > {code:java} > An error occurred while calling o47.save. 
: > com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, > SQLSTATE=42601, SQLERRMC=END-OF-STATEMENT;LE EXPO.EXPO#CMR_STG;IMMEDIATE, > DRIVER=4.19.77 at com.ibm.db2.jcc.am.b4.a(b4.java:747) at > com.ibm.db2.jcc.am.b4.a(b4.java:66) at com.ibm.db2.jcc.am.b4.a(b4.java:135) > at com.ibm.db2.jcc.am.kh.c(kh.java:2788) at > com.ibm.db2.jcc.am.kh.d(kh.java:2776) at > com.ibm.db2.jcc.am.kh.b(kh.java:2143) at com.ibm.db2.jcc.t4.ab.i(ab.java:226) > at com.ibm.db2.jcc.t4.ab.c(ab.java:48) at com.ibm.db2.jcc.t4.p.b(p.java:38) > at com.ibm.db2.jcc.t4.av.h(av.java:124) at > com.ibm.db2.jcc.am.kh.ak(kh.java:2138) at > com.ibm.db2.jcc.am.kh.a(kh.java:3325) at com.ibm.db2.jcc.am.kh.c(kh.java:765) > at com.ibm.db2.jcc.am.kh.executeUpdate(kh.java:744) at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.truncateTable(JdbcUtils.scala:113) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:56) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at > 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) > at > org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676) at > org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271) at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at > py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at > py4j.Gateway.invoke(Gateway.java:282) at > py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at > py4j.commands.CallCommand.execute(CallCommand.java:79) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at > java.lang.Thread.run(Thread.java:748){code}
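The `;IMMEDIATE` fragment in the SQLERRMC hints at the cause: DB2's TRUNCATE statement requires the `IMMEDIATE` keyword *inside* the statement, and here it ended up after a statement terminator. A hedged sketch of the dialect-aware statement building involved (`truncate_query` is a hypothetical helper, not Spark's exact `JdbcDialect` API):

```python
def truncate_query(table, dialect="generic"):
    # DB2 requires IMMEDIATE within the same statement; tacking it on
    # after a semicolon produces the SQLCODE=-104 syntax error above.
    if dialect == "db2":
        return f"TRUNCATE TABLE {table} IMMEDIATE"
    return f"TRUNCATE TABLE {table}"

print(truncate_query("EXPO.EXPO#CMR_STG", dialect="db2"))
# TRUNCATE TABLE EXPO.EXPO#CMR_STG IMMEDIATE
```

Spark's dialect layer exists precisely so per-database quirks like this stay out of the generic JDBC writer path.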
[jira] [Resolved] (SPARK-30062) bug with DB2Driver using mode("overwrite") option("truncate",True)
[ https://issues.apache.org/jira/browse/SPARK-30062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-30062. Fix Version/s: 3.2.2 3.3 Resolution: Fixed > bug with DB2Driver using mode("overwrite") option("truncate",True) > -- > > Key: SPARK-30062 > URL: https://issues.apache.org/jira/browse/SPARK-30062 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 2.4.4 >Reporter: Guy Huinen >Priority: Major > Labels: db2, pyspark > Fix For: 3.2.2, 3.3 > > > using DB2Driver with mode("overwrite") and option("truncate",True) gives a SQL > error > > {code:java} > dfClient.write\ > .format("jdbc")\ > .mode("overwrite")\ > .option('driver', 'com.ibm.db2.jcc.DB2Driver')\ > .option("url","jdbc:db2://")\ > .option("user","xxx")\ > .option("password","")\ > .option("dbtable","")\ > .option("truncate",True)\{code} > > gives the error below > in summary I believe the semicolon is misplaced or malformatted > > {code:java} > EXPO.EXPO#CMR_STG;IMMEDIATE{code} > > > full error > {code:java} > An error occurred while calling o47.save. 
: > com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, > SQLSTATE=42601, SQLERRMC=END-OF-STATEMENT;LE EXPO.EXPO#CMR_STG;IMMEDIATE, > DRIVER=4.19.77 at com.ibm.db2.jcc.am.b4.a(b4.java:747) at > com.ibm.db2.jcc.am.b4.a(b4.java:66) at com.ibm.db2.jcc.am.b4.a(b4.java:135) > at com.ibm.db2.jcc.am.kh.c(kh.java:2788) at > com.ibm.db2.jcc.am.kh.d(kh.java:2776) at > com.ibm.db2.jcc.am.kh.b(kh.java:2143) at com.ibm.db2.jcc.t4.ab.i(ab.java:226) > at com.ibm.db2.jcc.t4.ab.c(ab.java:48) at com.ibm.db2.jcc.t4.p.b(p.java:38) > at com.ibm.db2.jcc.t4.av.h(av.java:124) at > com.ibm.db2.jcc.am.kh.ak(kh.java:2138) at > com.ibm.db2.jcc.am.kh.a(kh.java:3325) at com.ibm.db2.jcc.am.kh.c(kh.java:765) > at com.ibm.db2.jcc.am.kh.executeUpdate(kh.java:744) at > org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.truncateTable(JdbcUtils.scala:113) > at > org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:56) > at > org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68) > at > org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127) > at > org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155) > at > org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) > at > org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at > org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at > 
org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80) > at > org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676) > at > org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73) > at > org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676) at > org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285) > at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271) at > sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) at > py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at > py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at > py4j.Gateway.invoke(Gateway.java:282) at > py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at > py4j.commands.CallCommand.execute(CallCommand.java:79) at > py4j.GatewayConnection.run(GatewayConnection.java:238) at >
[jira] [Commented] (SPARK-37963) Need to update Partition URI after renaming table in InMemoryCatalog
[ https://issues.apache.org/jira/browse/SPARK-37963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17479813#comment-17479813 ] Huaxin Gao commented on SPARK-37963: Changed the fix version to 3.2.2 for now. Will change back if RC2 fails. > Need to update Partition URI after renaming table in InMemoryCatalog > > > Key: SPARK-37963 > URL: https://issues.apache.org/jira/browse/SPARK-37963 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.3.0, 3.2.2 > > > After renaming a partitioned table, select from the new table from > InMemoryCatalog will get an empty result. > The following checkAnswer will fail as the result is empty. > {code:java} > sql(s"create table foo(i int, j int) using PARQUET partitioned by (j)") > sql("insert into table foo partition(j=2) values (1)") > sql(s"alter table foo rename to bar") > checkAnswer(spark.table("bar"), Row(1, 2)) {code} > To fix the bug, we need to update Partition URI after renaming a table in > InMemoryCatalog > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37963) Need to update Partition URI after renaming table in InMemoryCatalog
[ https://issues.apache.org/jira/browse/SPARK-37963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-37963: --- Fix Version/s: 3.2.2 (was: 3.2.1) > Need to update Partition URI after renaming table in InMemoryCatalog > > > Key: SPARK-37963 > URL: https://issues.apache.org/jira/browse/SPARK-37963 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Gengliang Wang >Assignee: Gengliang Wang >Priority: Major > Fix For: 3.3.0, 3.2.2 > > > After renaming a partitioned table, select from the new table from > InMemoryCatalog will get an empty result. > The following checkAnswer will fail as the result is empty. > {code:java} > sql(s"create table foo(i int, j int) using PARQUET partitioned by (j)") > sql("insert into table foo partition(j=2) values (1)") > sql(s"alter table foo rename to bar") > checkAnswer(spark.table("bar"), Row(1, 2)) {code} > To fix the bug, we need to update Partition URI after renaming a table in > InMemoryCatalog > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
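A toy model of the bookkeeping the fix adds: renaming a partitioned table in an in-memory catalog must also rewrite each partition's location URI, otherwise subsequent scans read the old (now empty) paths and return no rows. The class and warehouse paths below are simplified stand-ins for `InMemoryCatalog`, not its real interface:

```python
class ToyInMemoryCatalog:
    """Toy model: table name -> {partition_spec: location_uri}."""
    def __init__(self):
        self.partitions = {}

    def create_table(self, name, specs):
        self.partitions[name] = {spec: f"/warehouse/{name}/{spec}" for spec in specs}

    def rename_table(self, old, new):
        parts = self.partitions.pop(old)
        # The fix: rewrite every partition URI, not just the table entry.
        self.partitions[new] = {
            spec: uri.replace(f"/warehouse/{old}/", f"/warehouse/{new}/", 1)
            for spec, uri in parts.items()
        }

cat = ToyInMemoryCatalog()
cat.create_table("foo", ["j=2"])
cat.rename_table("foo", "bar")
print(cat.partitions["bar"]["j=2"])  # /warehouse/bar/j=2
```

Without the URI rewrite inside `rename_table`, the partition for `bar` would still point at `/warehouse/foo/j=2`, which is exactly the empty-result symptom in the repro above.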
[jira] [Resolved] (SPARK-37959) Fix the UT of checking norm in KMeans & BiKMeans
[ https://issues.apache.org/jira/browse/SPARK-37959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-37959. Fix Version/s: 3.2.1 3.3.0 Assignee: zhengruifeng (was: Apache Spark) Resolution: Fixed > Fix the UT of checking norm in KMeans & BiKMeans > > > Key: SPARK-37959 > URL: https://issues.apache.org/jira/browse/SPARK-37959 > Project: Spark > Issue Type: Improvement > Components: ML >Affects Versions: 3.3.0 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Minor > Fix For: 3.2.1, 3.3.0 > > > In KMeansSuite and BisectingKMeansSuite, there are some unused lines: > > {code:java} > model1.clusterCenters.forall(Vectors.norm(_, 2) == 1.0) {code} > > For cosine distance, the norm of the centering vector should be 1, so the norm > checking is meaningful; > For euclidean distance, the norm checking is meaningless. > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
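The distinction is easy to check numerically: under cosine distance cluster centers are unit-normalized, so `norm(center) == 1.0` is a real invariant, while under euclidean distance centers are plain means whose norm is arbitrary. A quick sketch:

```python
import math

def norm(v):
    return math.sqrt(sum(x * x for x in v))

def normalize(v):
    n = norm(v)
    return [x / n for x in v]

# Under cosine distance, centers are unit-normalized, so the check bites:
centers_cosine = [normalize([3.0, 4.0]), normalize([1.0, 1.0])]
assert all(abs(norm(c) - 1.0) < 1e-9 for c in centers_cosine)

# Under euclidean distance, a center is a plain mean with arbitrary norm:
print(norm([3.0, 4.0]))  # 5.0
```

This is why the assertion belongs only in the cosine-distance test cases.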
[jira] [Created] (SPARK-37923) Generate partition transforms for BucketSpec inside parser
Huaxin Gao created SPARK-37923: -- Summary: Generate partition transforms for BucketSpec inside parser Key: SPARK-37923 URL: https://issues.apache.org/jira/browse/SPARK-37923 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.3 Reporter: Huaxin Gao We currently generate partition transforms for BucketSpec in Analyzer. It's cleaner to do this inside Parser. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37818) Add option for show create table command
[ https://issues.apache.org/jira/browse/SPARK-37818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-37818: --- Fix Version/s: 3.2.1 > Add option for show create table command > > > Key: SPARK-37818 > URL: https://issues.apache.org/jira/browse/SPARK-37818 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.0 >Reporter: PengLei >Assignee: PengLei >Priority: Trivial > Fix For: 3.2.1, 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36717) Wrong order of variable initialization may lead to incorrect behavior
[ https://issues.apache.org/jira/browse/SPARK-36717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-36717: --- Fix Version/s: (was: 3.2.0) > Wrong order of variable initialization may lead to incorrect behavior > - > > Key: SPARK-36717 > URL: https://issues.apache.org/jira/browse/SPARK-36717 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Jianmeng Li >Assignee: Jianmeng Li >Priority: Minor > Fix For: 3.1.3, 3.0.4, 3.2.1, 3.3.0 > > > Incorrect order of variable initialization may lead to incorrect behavior, > Related code: > [TorrentBroadcast.scala|https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala#L94] > , TorrentBroadCast will get wrong checksumEnabled value after > initialization, this may not be what we need, we can move L94 front of > setConf(SparkEnv.get.conf) to avoid this. > Supplement: > Snippet 1: > {code:java} > class Broadcast { > def setConf(): Unit = { > checksumEnabled = true > } > setConf() > var checksumEnabled = false > } > println(new Broadcast().checksumEnabled){code} > output: > {code:java} > false{code} > Snippet 2: > {code:java} > class Broadcast { > var checksumEnabled = false > def setConf(): Unit = { > checksumEnabled = true > } > setConf() > } > println(new Broadcast().checksumEnabled){code} > output: > {code:java} > true{code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36717) Wrong order of variable initialization may lead to incorrect behavior
[ https://issues.apache.org/jira/browse/SPARK-36717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-36717: --- Fix Version/s: 3.2.1 > Wrong order of variable initialization may lead to incorrect behavior > - > > Key: SPARK-36717 > URL: https://issues.apache.org/jira/browse/SPARK-36717 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Jianmeng Li >Assignee: Jianmeng Li >Priority: Minor > Fix For: 3.2.0, 3.1.3, 3.0.4, 3.2.1, 3.3.0 > > > An incorrect order of variable initialization may lead to incorrect behavior. > Related code: > [TorrentBroadcast.scala|https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala#L94] > TorrentBroadcast reads the wrong checksumEnabled value after > initialization, which may not be what we need; we can move L94 in front of > setConf(SparkEnv.get.conf) to avoid this. > Supplement: > Snippet 1: > {code:java} > class Broadcast { > def setConf(): Unit = { > checksumEnabled = true > } > setConf() > var checksumEnabled = false > } > println(new Broadcast().checksumEnabled){code} > output: > {code:java} > false{code} > Snippet 2: > {code:java} > class Broadcast { > var checksumEnabled = false > def setConf(): Unit = { > checksumEnabled = true > } > setConf() > } > println(new Broadcast().checksumEnabled){code} > output: > {code:java} > true{code}
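The two Scala snippets above can be mirrored in Python to show the same hazard: a method invoked during construction writes a field, and a later "default" assignment then clobbers it. This is an illustrative analogue, not Spark code; the class and field names are hypothetical.

```python
class Broadcast:
    """Mirrors Snippet 1: the default assignment runs AFTER set_conf()."""
    def __init__(self):
        self.set_conf()                # writes checksum_enabled = True ...
        self.checksum_enabled = False  # ... which the "default" then clobbers

    def set_conf(self):
        self.checksum_enabled = True


class FixedBroadcast:
    """Mirrors Snippet 2: default first, then set_conf() overrides it."""
    def __init__(self):
        self.checksum_enabled = False  # default declared first
        self.set_conf()                # configuration wins, as intended

    def set_conf(self):
        self.checksum_enabled = True


print(Broadcast().checksum_enabled)       # the buggy ordering yields False
print(FixedBroadcast().checksum_enabled)  # the fixed ordering yields True
```

The fix proposed in the issue is exactly this reordering: declare the field before the constructor body calls the method that configures it.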
[jira] [Updated] (SPARK-36979) Add RewriteLateralSubquery rule into nonExcludableRules
[ https://issues.apache.org/jira/browse/SPARK-36979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-36979: --- Fix Version/s: 3.2.1 (was: 3.2.0) > Add RewriteLateralSubquery rule into nonExcludableRules > --- > > Key: SPARK-36979 > URL: https://issues.apache.org/jira/browse/SPARK-36979 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: XiDuo You >Assignee: XiDuo You >Priority: Minor > Fix For: 3.2.1 > > > Lateral Join has no meaning without rule `RewriteLateralSubquery`. So now if > we set > `spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.RewriteLateralSubquery`, > the lateral join query will fail with: > {code:java} > java.lang.AssertionError: assertion failed: No plan for LateralJoin > lateral-subquery#218 > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-33277) Python/Pandas UDF right after off-heap vectorized reader could cause executor crash.
[ https://issues.apache.org/jira/browse/SPARK-33277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-33277: --- Fix Version/s: 3.2.1 > Python/Pandas UDF right after off-heap vectorized reader could cause executor > crash. > > > Key: SPARK-33277 > URL: https://issues.apache.org/jira/browse/SPARK-33277 > Project: Spark > Issue Type: Bug > Components: PySpark, SQL >Affects Versions: 2.4.7, 3.0.1 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Fix For: 2.4.8, 3.0.2, 3.1.0, 3.2.1 > > > A Python/Pandas UDF right after the off-heap vectorized reader could cause an > executor crash. > E.g.,: > {code:java} > spark.range(0, 10, 1, 1).write.parquet(path) > spark.conf.set("spark.sql.columnVector.offheap.enabled", True) > def f(x): > return 0 > fUdf = udf(f, LongType()) > spark.read.parquet(path).select(fUdf('id')).head() > {code} > This is because the Python evaluation consumes the parent iterator in a > separate thread, and it keeps consuming data from the parent even after the task > ends and the parent is closed. If an off-heap column vector exists in the > parent iterator, this can cause a segmentation fault that crashes the executor.
[jira] [Updated] (SPARK-36464) Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream for Writing Over 2GB Data
[ https://issues.apache.org/jira/browse/SPARK-36464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-36464: --- Fix Version/s: 3.2.0 > Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream > for Writing Over 2GB Data > -- > > Key: SPARK-36464 > URL: https://issues.apache.org/jira/browse/SPARK-36464 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.2, 3.3.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Major > Fix For: 3.2.0, 3.1.3, 3.0.4, 3.2.1 > > > The `size` method of `ChunkedByteBufferOutputStream` returns a `Long` value; > however, the underlying `_size` variable is initialized as an `Int`. > That causes an overflow and returns a negative size when more than 2 GB of data is > written into `ChunkedByteBufferOutputStream`.
[jira] [Updated] (SPARK-36464) Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream for Writing Over 2GB Data
[ https://issues.apache.org/jira/browse/SPARK-36464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-36464: --- Fix Version/s: 3.2.1 (was: 3.2.0) > Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream > for Writing Over 2GB Data > -- > > Key: SPARK-36464 > URL: https://issues.apache.org/jira/browse/SPARK-36464 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.2, 3.3.0 >Reporter: Kazuyuki Tanimura >Assignee: Kazuyuki Tanimura >Priority: Major > Fix For: 3.1.3, 3.0.4, 3.2.1 > > > The `size` method of `ChunkedByteBufferOutputStream` returns a `Long` value; > however, the underlying `_size` variable is initialized as an `Int`. > That causes an overflow and returns a negative size when more than 2 GB of data is > written into `ChunkedByteBufferOutputStream`.
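The overflow described in SPARK-36464 is plain 32-bit arithmetic: accumulating byte counts in a JVM `Int` wraps negative once the total passes 2 GiB. A minimal sketch, emulating JVM `Int` wraparound in Python (the helper name is ours, not Spark's):

```python
def jvm_int_add(a: int, b: int) -> int:
    """Add with 32-bit two's-complement wraparound, like a JVM Int."""
    s = (a + b) & 0xFFFFFFFF
    return s - (1 << 32) if s >= (1 << 31) else s

size = 0
one_gib = 1 << 30
for _ in range(3):                  # "write" 3 GiB in 1 GiB chunks
    size = jvm_int_add(size, one_gib)

print(size)  # -1073741824: the running size wrapped negative past 2 GiB
```

Declaring the accumulator as a 64-bit `Long` (the type the `size` method already returns) is the fix the issue describes.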
[jira] [Updated] (SPARK-30789) Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE
[ https://issues.apache.org/jira/browse/SPARK-30789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-30789: --- Fix Version/s: (was: 3.2.0) > Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE > -- > > Key: SPARK-30789 > URL: https://issues.apache.org/jira/browse/SPARK-30789 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > Fix For: 3.2.1 > > > All of LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE should support IGNORE NULLS > | RESPECT NULLS. For example: > {code:java} > LEAD (value_expr [, offset ]) > [ IGNORE NULLS | RESPECT NULLS ] > OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} > > {code:java} > LAG (value_expr [, offset ]) > [ IGNORE NULLS | RESPECT NULLS ] > OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} > > {code:java} > NTH_VALUE (expr, offset) > [ IGNORE NULLS | RESPECT NULLS ] > OVER > ( [ PARTITION BY window_partition ] > [ ORDER BY window_ordering > frame_clause ] ){code} > > *Oracle:* > [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/NTH_VALUE.html#GUID-F8A0E88C-67E5-4AA6-9515-95D03A7F9EA0] > *Redshift* > [https://docs.aws.amazon.com/redshift/latest/dg/r_WF_NTH.html] > *Presto* > [https://prestodb.io/docs/current/functions/window.html] > *DB2* > [https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1513.htm] > *Teradata* > [https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/GjCT6l7trjkIEjt~7Dhx4w] > *Snowflake* > [https://docs.snowflake.com/en/sql-reference/functions/lead.html] > [https://docs.snowflake.com/en/sql-reference/functions/lag.html] > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-34399) Add file commit time to metrics and shown in SQL Tab UI
[ https://issues.apache.org/jira/browse/SPARK-34399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-34399: --- Fix Version/s: 3.2.1 (was: 3.2.0) > Add file commit time to metrics and shown in SQL Tab UI > --- > > Key: SPARK-34399 > URL: https://issues.apache.org/jira/browse/SPARK-34399 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.2.1 > > > Add file commit time to metrics and shown in SQL Tab UI -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35714) Bug fix for deadlock during the executor shutdown
[ https://issues.apache.org/jira/browse/SPARK-35714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-35714: --- Fix Version/s: 3.2.1 > Bug fix for deadlock during the executor shutdown > - > > Key: SPARK-35714 > URL: https://issues.apache.org/jira/browse/SPARK-35714 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.1.2 >Reporter: Wan Kun >Assignee: Wan Kun >Priority: Minor > Fix For: 3.0.3, 3.2.0, 3.1.3, 3.2.1 > > Attachments: three_thread_lock.log > > > When an executor receives a TERM signal (specifically the second TERM signal), it locks > the java.lang.Shutdown class and then calls the Shutdown.exit() method to exit the JVM. > Shutdown calls SparkShutdownHook to shut down the executor. > During the executor shutdown phase, a RemoteProcessDisconnected event is > sent to the RPC inbox, and then WorkerWatcher tries to call > System.exit(-1) again. > Because java.lang.Shutdown is already locked, a deadlock occurs.
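The shape of this deadlock is two exit paths contending for one non-reentrant shutdown lock that the first path never releases. A minimal Python sketch of the pattern (an analogue, not Spark or JVM code; the names are hypothetical, and a timeout is used so the example terminates instead of hanging):

```python
import threading

# Stands in for the monitor on the java.lang.Shutdown class.
shutdown_lock = threading.Lock()

# First exit path (the TERM signal handler) takes the lock. In the real bug
# it never releases it, because the shutdown hooks it runs try to exit again.
first = shutdown_lock.acquire(timeout=1)

# Second exit path (WorkerWatcher calling System.exit(-1)) contends for the
# same lock. Without a timeout this call would block forever -- the deadlock.
second = shutdown_lock.acquire(timeout=1)

print(first, second)  # True False: the second exit path could not proceed
```

The fix is to ensure the second path never re-enters the JVM exit sequence while the first still holds the shutdown lock.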
[jira] [Updated] (SPARK-30789) Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE
[ https://issues.apache.org/jira/browse/SPARK-30789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-30789: --- Fix Version/s: 3.2.1 > Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE > -- > > Key: SPARK-30789 > URL: https://issues.apache.org/jira/browse/SPARK-30789 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > Fix For: 3.2.0, 3.2.1 > > > All of LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE should support IGNORE NULLS > | RESPECT NULLS. For example: > {code:java} > LEAD (value_expr [, offset ]) > [ IGNORE NULLS | RESPECT NULLS ] > OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} > > {code:java} > LAG (value_expr [, offset ]) > [ IGNORE NULLS | RESPECT NULLS ] > OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code} > > {code:java} > NTH_VALUE (expr, offset) > [ IGNORE NULLS | RESPECT NULLS ] > OVER > ( [ PARTITION BY window_partition ] > [ ORDER BY window_ordering > frame_clause ] ){code} > > *Oracle:* > [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/NTH_VALUE.html#GUID-F8A0E88C-67E5-4AA6-9515-95D03A7F9EA0] > *Redshift* > [https://docs.aws.amazon.com/redshift/latest/dg/r_WF_NTH.html] > *Presto* > [https://prestodb.io/docs/current/functions/window.html] > *DB2* > [https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1513.htm] > *Teradata* > [https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/GjCT6l7trjkIEjt~7Dhx4w] > *Snowflake* > [https://docs.snowflake.com/en/sql-reference/functions/lead.html] > [https://docs.snowflake.com/en/sql-reference/functions/lag.html] > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37818) Add option for show create table command
[ https://issues.apache.org/jira/browse/SPARK-37818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472183#comment-17472183 ] Huaxin Gao commented on SPARK-37818: [~Gengliang.Wang] version 3.2.2 doesn't exist yet. I will just set the version to 3.3.0 for now. Will update the version to 3.2.2 later. > Add option for show create table command > > > Key: SPARK-37818 > URL: https://issues.apache.org/jira/browse/SPARK-37818 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.0 >Reporter: PengLei >Assignee: PengLei >Priority: Trivial > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37818) Add option for show create table command
[ https://issues.apache.org/jira/browse/SPARK-37818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-37818: --- Fix Version/s: (was: 3.2.1) > Add option for show create table command > > > Key: SPARK-37818 > URL: https://issues.apache.org/jira/browse/SPARK-37818 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.0 >Reporter: PengLei >Assignee: PengLei >Priority: Trivial > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37818) Add option for show create table command
[ https://issues.apache.org/jira/browse/SPARK-37818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472181#comment-17472181 ] Huaxin Gao commented on SPARK-37818: [~Gengliang.Wang] I am drafting the 3.2.1 voting email now. I will need to change the fixed version to 3.2.2, otherwise, the list of bug fixes will contain this one. I will change this back to 3.2.1 if RC1 doesn't pass. > Add option for show create table command > > > Key: SPARK-37818 > URL: https://issues.apache.org/jira/browse/SPARK-37818 > Project: Spark > Issue Type: Documentation > Components: Documentation >Affects Versions: 3.3.0 >Reporter: PengLei >Assignee: PengLei >Priority: Trivial > Fix For: 3.2.1, 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37802) composite field name like `field name` doesn't work with Aggregate push down
[ https://issues.apache.org/jira/browse/SPARK-37802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-37802: --- Fix Version/s: 3.2.1 (was: 3.2.0) > composite field name like `field name` doesn't work with Aggregate push down > > > Key: SPARK-37802 > URL: https://issues.apache.org/jira/browse/SPARK-37802 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > Fix For: 3.2.1, 3.3.0 > > > {code:java} > sql("SELECT SUM(`field name`) FROM h2.test.table") > org.apache.spark.sql.catalyst.parser.ParseException: > extraneous input 'name' expecting (line 1, pos 9) > at > org.apache.spark.sql.catalyst.parser.ParseErrorListener$.syntaxError(ParseDriver.scala:212) > at > org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41) > at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544) > at > org.antlr.v4.runtime.DefaultErrorStrategy.reportUnwantedToken(DefaultErrorStrategy.java:377) > at > org.antlr.v4.runtime.DefaultErrorStrategy.singleTokenDeletion(DefaultErrorStrategy.java:548) > at > org.antlr.v4.runtime.DefaultErrorStrategy.recoverInline(DefaultErrorStrategy.java:467) > at org.antlr.v4.runtime.Parser.match(Parser.java:206) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser.singleMultipartIdentifier(SqlBaseParser.java:519) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37802) composite field name like `field name` doesn't work with Aggregate push down
[ https://issues.apache.org/jira/browse/SPARK-37802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-37802: --- Fix Version/s: 3.2.0 > composite field name like `field name` doesn't work with Aggregate push down > > > Key: SPARK-37802 > URL: https://issues.apache.org/jira/browse/SPARK-37802 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > Fix For: 3.2.0, 3.3.0 > > > {code:java} > sql("SELECT SUM(`field name`) FROM h2.test.table") > org.apache.spark.sql.catalyst.parser.ParseException: > extraneous input 'name' expecting (line 1, pos 9) > at > org.apache.spark.sql.catalyst.parser.ParseErrorListener$.syntaxError(ParseDriver.scala:212) > at > org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41) > at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544) > at > org.antlr.v4.runtime.DefaultErrorStrategy.reportUnwantedToken(DefaultErrorStrategy.java:377) > at > org.antlr.v4.runtime.DefaultErrorStrategy.singleTokenDeletion(DefaultErrorStrategy.java:548) > at > org.antlr.v4.runtime.DefaultErrorStrategy.recoverInline(DefaultErrorStrategy.java:467) > at org.antlr.v4.runtime.Parser.match(Parser.java:206) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser.singleMultipartIdentifier(SqlBaseParser.java:519) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37527) Translate more standard aggregate functions for pushdown
[ https://issues.apache.org/jira/browse/SPARK-37527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-37527. Fix Version/s: 3.3.0 Assignee: jiaan.geng Resolution: Fixed > Translate more standard aggregate functions for pushdown > > > Key: SPARK-37527 > URL: https://issues.apache.org/jira/browse/SPARK-37527 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.3.0 > > > Currently, Spark aggregate pushdown translates some standard aggregate > functions so that they can be compiled into the SQL dialect of a specific database. > After this work, users can override JdbcDialect.compileAggregate to > implement aggregate functions supported by a particular database.
[jira] [Assigned] (SPARK-37802) composite field name like `field name` doesn't work with Aggregate push down
[ https://issues.apache.org/jira/browse/SPARK-37802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao reassigned SPARK-37802: -- Assignee: Huaxin Gao > composite field name like `field name` doesn't work with Aggregate push down > > > Key: SPARK-37802 > URL: https://issues.apache.org/jira/browse/SPARK-37802 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > > {code:java} > sql("SELECT SUM(`field name`) FROM h2.test.table") > org.apache.spark.sql.catalyst.parser.ParseException: > extraneous input 'name' expecting (line 1, pos 9) > at > org.apache.spark.sql.catalyst.parser.ParseErrorListener$.syntaxError(ParseDriver.scala:212) > at > org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41) > at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544) > at > org.antlr.v4.runtime.DefaultErrorStrategy.reportUnwantedToken(DefaultErrorStrategy.java:377) > at > org.antlr.v4.runtime.DefaultErrorStrategy.singleTokenDeletion(DefaultErrorStrategy.java:548) > at > org.antlr.v4.runtime.DefaultErrorStrategy.recoverInline(DefaultErrorStrategy.java:467) > at org.antlr.v4.runtime.Parser.match(Parser.java:206) > at > org.apache.spark.sql.catalyst.parser.SqlBaseParser.singleMultipartIdentifier(SqlBaseParser.java:519) > {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37802) composite field name like `field name` doesn't work with Aggregate push down
Huaxin Gao created SPARK-37802: -- Summary: composite field name like `field name` doesn't work with Aggregate push down Key: SPARK-37802 URL: https://issues.apache.org/jira/browse/SPARK-37802 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0, 3.3.0 Reporter: Huaxin Gao {code:java} sql("SELECT SUM(`field name`) FROM h2.test.table") org.apache.spark.sql.catalyst.parser.ParseException: extraneous input 'name' expecting (line 1, pos 9) at org.apache.spark.sql.catalyst.parser.ParseErrorListener$.syntaxError(ParseDriver.scala:212) at org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41) at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544) at org.antlr.v4.runtime.DefaultErrorStrategy.reportUnwantedToken(DefaultErrorStrategy.java:377) at org.antlr.v4.runtime.DefaultErrorStrategy.singleTokenDeletion(DefaultErrorStrategy.java:548) at org.antlr.v4.runtime.DefaultErrorStrategy.recoverInline(DefaultErrorStrategy.java:467) at org.antlr.v4.runtime.Parser.match(Parser.java:206) at org.apache.spark.sql.catalyst.parser.SqlBaseParser.singleMultipartIdentifier(SqlBaseParser.java:519) {code} -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37627) Add sorted column in BucketTransform
Huaxin Gao created SPARK-37627: -- Summary: Add sorted column in BucketTransform Key: SPARK-37627 URL: https://issues.apache.org/jira/browse/SPARK-37627 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Huaxin Gao In V1, we can create a table with sorted buckets like the following: {code:java} sql("CREATE TABLE tbl(a INT, b INT) USING parquet " + "CLUSTERED BY (a) SORTED BY (b) INTO 5 BUCKETS") {code} However, creating a table with sorted buckets in V2 fails with an exception: {code:java} org.apache.spark.sql.AnalysisException: Cannot convert bucketing with sort columns to a transform. {code} We should be able to create tables with sorted buckets in V2.
[jira] [Resolved] (SPARK-37545) V2 CreateTableAsSelect command should qualify location
[ https://issues.apache.org/jira/browse/SPARK-37545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-37545. Fix Version/s: 3.3.0 Assignee: Terry Kim Resolution: Fixed > V2 CreateTableAsSelect command should qualify location > -- > > Key: SPARK-37545 > URL: https://issues.apache.org/jira/browse/SPARK-37545 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Terry Kim >Assignee: Terry Kim >Priority: Major > Fix For: 3.3.0 > > > V2 CreateTableAsSelect command should qualify location. Currently, > > {code:java} > spark.sql("CREATE TABLE testcat.t USING foo LOCATION '/tmp/foo' AS SELECT id > FROM source") > spark.sql("DESCRIBE EXTENDED testcat.t").show(false) > {code} > displays the location as `/tmp/foo` whereas V1 command displays/stores it as > qualified (`[file:/tmp/foo|file:///tmp/foo]`). > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37546) V2 ReplaceTableAsSelect command should qualify location
Huaxin Gao created SPARK-37546: -- Summary: V2 ReplaceTableAsSelect command should qualify location Key: SPARK-37546 URL: https://issues.apache.org/jira/browse/SPARK-37546 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.3.0 Reporter: Huaxin Gao V2 ReplaceTableAsSelect command should qualify location. Currently, {code:java} spark.sql("REPLACE TABLE testcat.t USING foo LOCATION '/tmp/foo' AS SELECT id FROM source") spark.sql("DESCRIBE EXTENDED testcat.t").show(false) {code} displays the location as `/tmp/foo` whereas V1 command displays/stores it as qualified (`file:/tmp/foo`). -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37330) Migrate ReplaceTableStatement to v2 command
[ https://issues.apache.org/jira/browse/SPARK-37330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-37330. Fix Version/s: 3.3.0 Assignee: dch nguyen Resolution: Fixed > Migrate ReplaceTableStatement to v2 command > --- > > Key: SPARK-37330 > URL: https://issues.apache.org/jira/browse/SPARK-37330 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: dch nguyen >Assignee: dch nguyen >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37523) Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified
[ https://issues.apache.org/jira/browse/SPARK-37523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao updated SPARK-37523: --- Affects Version/s: 3.2.1 > Support optimize skewed partitions in Distribution and Ordering if > numPartitions is not specified > - > > Key: SPARK-37523 > URL: https://issues.apache.org/jira/browse/SPARK-37523 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1, 3.3.0 >Reporter: Huaxin Gao >Priority: Major > > When repartitioning in distribution and sort, if the data source requests > a specific number of partitions, we should not optimize the repartition. However, > if the data source does not request a specific number of partitions, Spark > should optimize the repartition and split skewed partitions if necessary.
[jira] [Created] (SPARK-37523) Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified
Huaxin Gao created SPARK-37523: -- Summary: Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified Key: SPARK-37523 URL: https://issues.apache.org/jira/browse/SPARK-37523 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Huaxin Gao When repartitioning in distribution and sort, if the data source requests a specific number of partitions, we should not optimize the repartition. However, if the data source does not request a specific number of partitions, Spark should optimize the repartition and split skewed partitions if necessary.
[jira] [Assigned] (SPARK-37496) Migrate ReplaceTableAsSelectStatement to v2 command
[ https://issues.apache.org/jira/browse/SPARK-37496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao reassigned SPARK-37496: -- Assignee: Huaxin Gao > Migrate ReplaceTableAsSelectStatement to v2 command > --- > > Key: SPARK-37496 > URL: https://issues.apache.org/jira/browse/SPARK-37496 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37496) Migrate ReplaceTableAsSelectStatement to v2 command
[ https://issues.apache.org/jira/browse/SPARK-37496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Huaxin Gao resolved SPARK-37496. Resolution: Fixed > Migrate ReplaceTableAsSelectStatement to v2 command > --- > > Key: SPARK-37496 > URL: https://issues.apache.org/jira/browse/SPARK-37496 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org