[jira] [Created] (SPARK-46207) Support MergeInto in DataFrameWriterV2

2023-12-01 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-46207:
--

 Summary: Support MergeInto in DataFrameWriterV2
 Key: SPARK-46207
 URL: https://issues.apache.org/jira/browse/SPARK-46207
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 4.0.0
Reporter: Huaxin Gao
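
For illustration, a hedged sketch of the kind of DataFrame-level merge this could
enable (API names are hypothetical, modeled on the SQL MERGE INTO statement, not
a committed design):

{code:scala}
// Hypothetical API sketch: upsert `source` rows into a target table.
source
  .mergeInto("catalog.db.target", $"source.id" === $"target.id")
  .whenMatched().updateAll()
  .whenNotMatched().insertAll()
  .merge()
{code}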









[jira] [Assigned] (SPARK-44060) Code-gen for build side outer shuffled hash join

2023-06-30 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao reassigned SPARK-44060:
--

Assignee: Szehon Ho

> Code-gen for build side outer shuffled hash join
> 
>
> Key: SPARK-44060
> URL: https://issues.apache.org/jira/browse/SPARK-44060
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>Priority: Major
>
> Here, build side outer join means LEFT OUTER join with build left, or RIGHT 
> OUTER join with build right.
> As a follow-up to https://github.com/apache/spark/pull/41398/ SPARK-36612 
> (non-codegen build-side outer shuffled hash join), this task is to add 
> code-gen for it.
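
For illustration, a hedged example of a query shape this work targets (table
names are hypothetical): the SHUFFLE_HASH hint on the left relation asks for a
shuffled hash join built on that side, i.e. a LEFT OUTER join with build left.

{code:scala}
// Hypothetical tables; the hinted (left) relation becomes the build side
// of the shuffled hash join in this LEFT OUTER join.
spark.sql("""
  SELECT /*+ SHUFFLE_HASH(l) */ l.id, r.value
  FROM left_tbl l LEFT OUTER JOIN right_tbl r ON l.id = r.id
""")
{code}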






[jira] [Resolved] (SPARK-44060) Code-gen for build side outer shuffled hash join

2023-06-30 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44060?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-44060.

Fix Version/s: 3.5.0
   Resolution: Fixed

> Code-gen for build side outer shuffled hash join
> 
>
> Key: SPARK-44060
> URL: https://issues.apache.org/jira/browse/SPARK-44060
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>Priority: Major
> Fix For: 3.5.0
>
>
> Here, build side outer join means LEFT OUTER join with build left, or RIGHT 
> OUTER join with build right.
> As a follow-up to https://github.com/apache/spark/pull/41398/ SPARK-36612 
> (non-codegen build-side outer shuffled hash join), this task is to add 
> code-gen for it.






[jira] [Created] (SPARK-44149) Support DataFrame Merge API

2023-06-22 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-44149:
--

 Summary: Support DataFrame Merge API
 Key: SPARK-44149
 URL: https://issues.apache.org/jira/browse/SPARK-44149
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Huaxin Gao









[jira] [Created] (SPARK-43417) Improve CBO stats

2023-05-08 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-43417:
--

 Summary: Improve CBO stats
 Key: SPARK-43417
 URL: https://issues.apache.org/jira/browse/SPARK-43417
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.5.0
Reporter: Huaxin Gao


When experimenting with the DS V2 column stats, we identified areas where we 
could potentially improve. For instance, we can probably propagate NDV through 
Union, and add min/max for varchar columns.
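
As a minimal sketch of the Union NDV idea (not Spark's actual estimator, just
the bounds the description relies on):

{code:scala}
// When merging column stats across the children of a Union, the number of
// distinct values is at least the largest child NDV (those values all
// survive) and at most the sum of the child NDVs (children may be disjoint).
def propagateUnionNdv(childNdvs: Seq[BigInt]): (BigInt, BigInt) = {
  val lower = childNdvs.max
  val upper = childNdvs.sum
  (lower, upper)
}
{code}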






[jira] [Resolved] (SPARK-42470) Remove unused declarations from Hive module

2023-02-17 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-42470.

Fix Version/s: 3.5.0
 Assignee: Yang Jie
   Resolution: Fixed

> Remove unused declarations from Hive module
> ---
>
> Key: SPARK-42470
> URL: https://issues.apache.org/jira/browse/SPARK-42470
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.5.0
>
>







[jira] [Assigned] (SPARK-40045) The order of filtering predicates is not reasonable

2023-02-07 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao reassigned SPARK-40045:
--

Assignee: caican

> The order of filtering predicates is not reasonable
> ---
>
> Key: SPARK-40045
> URL: https://issues.apache.org/jira/browse/SPARK-40045
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.2.0, 3.3.0
>Reporter: caican
>Assignee: caican
>Priority: Major
> Fix For: 3.4.0
>
>
> {code:java}
> select id, data FROM testcat.ns1.ns2.table
> where id =2
> and md5(data) = '8cde774d6f7333752ed72cacddb05126'
> and trim(data) = 'a' {code}
> Based on the SQL, we currently get the filters in the following order:
> {code:java}
> // `(md5(cast(data#23 as binary)) = 8cde774d6f7333752ed72cacddb05126)) AND 
> (trim(data#23, None) = a))` comes before `(id#22L = 2)`
> == Physical Plan == *(1) Project [id#22L, data#23]
>  +- *(1) Filter isnotnull(data#23) AND isnotnull(id#22L)) AND 
> (md5(cast(data#23 as binary)) = 8cde774d6f7333752ed72cacddb05126)) AND 
> (trim(data#23, None) = a)) AND (id#22L = 2))
>     +- BatchScan[id#22L, data#23] class 
> org.apache.spark.sql.connector.InMemoryTable$InMemoryBatchScan{code}
> In this predicate order, all rows have to participate in evaluating every 
> predicate, even rows that do not meet the cheaper criteria, and this may 
> cause Spark tasks to execute slowly.
>  
> So I think that filtering predicates that are expensive to evaluate (such as 
> md5() and trim() above) should automatically be placed to the far right, so 
> that rows failing the cheaper criteria are not evaluated against them.
>  
> As shown below:
> {noformat}
> //  `(id#22L = 2)` comes before `(md5(cast(data#23 as binary)) = 
> 8cde774d6f7333752ed72cacddb05126)) AND (trim(data#23, None) = a))`
> == Physical Plan == *(1) Project [id#22L, data#23]
>  +- *(1) Filter isnotnull(data#23) AND isnotnull(id#22L)) AND (id#22L = 
> 2) AND (md5(cast(data#23 as binary)) = 8cde774d6f7333752ed72cacddb05126)) AND 
> (trim(data#23, None) = a)))
>     +- BatchScan[id#22L, data#23] class 
> org.apache.spark.sql.connector.InMemoryTable$InMemoryBatchScan{noformat}
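
For illustration, a minimal sketch of the proposed reordering (not Spark's
optimizer code; the cost function is hypothetical): sort the conjuncts so that
cheap comparisons such as {{id = 2}} run before expensive expressions such as
{{md5()}} and {{trim()}}.

{code:scala}
import org.apache.spark.sql.catalyst.expressions.Expression

// Hypothetical helper: order conjunctive predicates by an estimated
// per-row evaluation cost so the cheapest filters run first.
def reorderConjuncts(conjuncts: Seq[Expression],
                     cost: Expression => Int): Seq[Expression] =
  conjuncts.sortBy(cost)
{code}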






[jira] [Resolved] (SPARK-42188) Force SBT protobuf version to match Maven on branch 3.2 and 3.3

2023-01-25 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-42188.

  Assignee: Steve Vaughan
Resolution: Fixed

> Force SBT protobuf version to match Maven on branch 3.2 and 3.3
> ---
>
> Key: SPARK-42188
> URL: https://issues.apache.org/jira/browse/SPARK-42188
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.1, 3.2.3
>Reporter: Steve Vaughan
>Assignee: Steve Vaughan
>Priority: Major
> Fix For: 3.2.4, 3.3.2
>
>
> Update SparkBuild.scala to force SBT's protobuf-java version to match the 
> Maven version. The Maven dependencyManagement section forces protobuf-java to 
> use 2.5.0, but SBT is using 3.14.0.
> Snippet from Maven dependency tree
>  
> {noformat}
> [INFO] +- com.google.crypto.tink:tink:jar:1.6.0:compile
> [INFO] |  +- com.google.protobuf:protobuf-java:jar:2.5.0:compile<--- 2.x
> [INFO] |  \- com.google.code.gson:gson:jar:2.8.6:compile{noformat}
>   Snippet from SBT dependency tree
> {noformat}
> [info]   +-com.google.crypto.tink:tink:1.6.0
> [info]   | +-com.google.code.gson:gson:2.8.6
> [info]   | +-com.google.protobuf:protobuf-java:3.14.0   <--- 
> 3.x{noformat}
> The fix is to update SparkBuild.scala just like SPARK-11538 did with guava. 
> In addition, we should add a comment on the need to keep the top-level 
> pom.xml and SparkBuild.scala in sync, as was done in SPARK-41247.
>  
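
A hedged sketch of the kind of override involved (the exact patch may differ):
in an SBT build definition such as project/SparkBuild.scala, the dependency can
be pinned to the version Maven's dependencyManagement resolves.

{code:scala}
// Force SBT to resolve the same protobuf-java version as Maven (sketch only).
dependencyOverrides += "com.google.protobuf" % "protobuf-java" % "2.5.0"
{code}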






[jira] [Resolved] (SPARK-42134) Fix getPartitionFiltersAndDataFilters() to handle filters without referenced attributes

2023-01-20 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-42134.

Fix Version/s: 3.3.2
   3.4.0
 Assignee: Peter Toth
   Resolution: Fixed

> Fix getPartitionFiltersAndDataFilters() to handle filters without referenced 
> attributes
> ---
>
> Key: SPARK-42134
> URL: https://issues.apache.org/jira/browse/SPARK-42134
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Major
> Fix For: 3.3.2, 3.4.0
>
>







[jira] [Resolved] (SPARK-42031) Clean up remove methods that do not need override

2023-01-12 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42031?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-42031.

Fix Version/s: 3.4.0
 Assignee: Yang Jie
   Resolution: Fixed

> Clean up remove methods that do not need override
> -
>
> Key: SPARK-42031
> URL: https://issues.apache.org/jira/browse/SPARK-42031
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>
> Java 8 began to provide the default remove method implementation for the 
> `java.util.Iterator` interface.
> https://github.com/openjdk/jdk/blob/9a9add8825a040565051a09010b29b099c2e7d49/jdk/src/share/classes/java/util/Iterator.java#L92-L94
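
As a minimal sketch of why such overrides are redundant (illustrative class,
not Spark code): an implementation of java.util.Iterator that defines no
remove() still throws, because the default method linked above supplies that
behavior.

{code:scala}
class SingleIterator[T](value: T) extends java.util.Iterator[T] {
  private var done = false
  override def hasNext: Boolean = !done
  override def next(): T = { done = true; value }
  // no remove() override needed: the Java 8 interface default throws
}

new SingleIterator(1).remove() // throws UnsupportedOperationException
{code}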






[jira] [Created] (SPARK-41378) Support Column Stats in DS V2

2022-12-04 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-41378:
--

 Summary: Support Column Stats in DS V2
 Key: SPARK-41378
 URL: https://issues.apache.org/jira/browse/SPARK-41378
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.4.0
Reporter: Huaxin Gao









[jira] [Created] (SPARK-40946) Introduce a new DataSource V2 interface SupportsPushDownClusterKeys

2022-10-27 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-40946:
--

 Summary: Introduce a new DataSource V2 interface 
SupportsPushDownClusterKeys
 Key: SPARK-40946
 URL: https://issues.apache.org/jira/browse/SPARK-40946
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Huaxin Gao


A mix-in interface for ScanBuilder. Data sources can implement this interface 
to have all the join or aggregate keys pushed down to them. A return value of 
true indicates that the data source will return input partitions that follow 
the clustering keys. Otherwise, a false return value indicates the data source 
doesn't make such a guarantee, even though it may still report a partitioning 
that may or may not be compatible with the given clustering keys, and it is 
then Spark's responsibility to group the input partitions where possible.
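
A hypothetical Scala rendering of such a mix-in (the real DS V2 interfaces are
Java, and the method name below is illustrative, taken from the description
rather than a merged API):

{code:scala}
import org.apache.spark.sql.connector.expressions.NamedReference
import org.apache.spark.sql.connector.read.ScanBuilder

trait SupportsPushDownClusterKeys extends ScanBuilder {
  // Returns true iff input partitions will follow the pushed clustering keys.
  def pushClusterKeys(clusterKeys: Array[NamedReference]): Boolean
}
{code}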






[jira] [Updated] (SPARK-40429) Only set KeyGroupedPartitioning when the referenced column is in the output

2022-09-14 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-40429:
---
Description: 

{code:java}
  sql(s"CREATE TABLE $tbl (id bigint, data string) PARTITIONED BY (id)")
  sql(s"INSERT INTO $tbl VALUES (1, 'a'), (2, 'b'), (3, 'c')")
  checkAnswer(
spark.table(tbl).select("index", "_partition"),
Seq(Row(0, "3"), Row(0, "2"), Row(0, "1"))
  )
{code}

failed with 
ScalaTestFailureLocation: org.apache.spark.sql.QueryTest at 
(QueryTest.scala:226)
org.scalatest.exceptions.TestFailedException: AttributeSet(id#994L) was not 
empty The optimized logical plan has missing inputs:
RelationV2[index#998, _partition#999] testcat.t


> Only set KeyGroupedPartitioning when the referenced column is in the output
> ---
>
> Key: SPARK-40429
> URL: https://issues.apache.org/jira/browse/SPARK-40429
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0, 3.4.0
>Reporter: Huaxin Gao
>Priority: Minor
>
> {code:java}
>   sql(s"CREATE TABLE $tbl (id bigint, data string) PARTITIONED BY (id)")
>   sql(s"INSERT INTO $tbl VALUES (1, 'a'), (2, 'b'), (3, 'c')")
>   checkAnswer(
> spark.table(tbl).select("index", "_partition"),
> Seq(Row(0, "3"), Row(0, "2"), Row(0, "1"))
>   )
> {code}
> failed with 
> ScalaTestFailureLocation: org.apache.spark.sql.QueryTest at 
> (QueryTest.scala:226)
> org.scalatest.exceptions.TestFailedException: AttributeSet(id#994L) was not 
> empty The optimized logical plan has missing inputs:
> RelationV2[index#998, _partition#999] testcat.t






[jira] [Created] (SPARK-40429) Only set KeyGroupedPartitioning when the referenced column is in the output

2022-09-14 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-40429:
--

 Summary: Only set KeyGroupedPartitioning when the referenced 
column is in the output
 Key: SPARK-40429
 URL: https://issues.apache.org/jira/browse/SPARK-40429
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0, 3.4.0
Reporter: Huaxin Gao









[jira] [Created] (SPARK-40293) Make the V2 table error message more meaningful

2022-08-31 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-40293:
--

 Summary: Make the V2 table error message more meaningful
 Key: SPARK-40293
 URL: https://issues.apache.org/jira/browse/SPARK-40293
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Huaxin Gao


When the V2 catalog is not configured, Spark fails to access/create a table 
using the V2 API and silently falls back to attempting the same operation using 
the V1 API. This happens frequently among users. We want to have a better error 
message so that users can fix the configuration/usage issue by themselves.
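
For context, a hedged illustration of the configuration whose absence triggers
the fallback (catalog name and implementation class are hypothetical):

{code:scala}
// Register a V2 catalog implementation under the name "mycat"; without such
// an entry, references to mycat cannot resolve through the V2 API.
spark.conf.set("spark.sql.catalog.mycat", "com.example.MyCatalogImpl")
spark.sql("CREATE TABLE mycat.ns.tbl (id BIGINT) USING parquet")
{code}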








[jira] [Resolved] (SPARK-40113) Refactor ParquetScanBuilder DataSourceV2 interface implementation

2022-08-30 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-40113.

Fix Version/s: 3.4.0
 Assignee: miracle
   Resolution: Fixed

> Refactor ParquetScanBuilder DataSourceV2 interface implementation
> 
>
> Key: SPARK-40113
> URL: https://issues.apache.org/jira/browse/SPARK-40113
> Project: Spark
>  Issue Type: Improvement
>  Components: Optimizer
>Affects Versions: 3.3.0
>Reporter: Mars
>Assignee: miracle
>Priority: Minor
> Fix For: 3.4.0
>
>
> Currently, the `FileScanBuilder` interface is not fully implemented in 
> `ParquetScanBuilder`, unlike 
> `OrcScanBuilder`, `AvroScanBuilder`, and `CSVScanBuilder`.
> In order to unify the logic of the code and make it clearer, this part of the 
> implementation should be unified.






[jira] [Resolved] (SPARK-40064) Use V2 Filter in SupportsOverwrite

2022-08-15 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40064?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-40064.

Fix Version/s: 3.4.0
 Assignee: Huaxin Gao
   Resolution: Fixed

> Use V2 Filter in SupportsOverwrite
> --
>
> Key: SPARK-40064
> URL: https://issues.apache.org/jira/browse/SPARK-40064
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.4.0
>
>
>  Add V2 Filter support in SupportsOverwrite






[jira] [Created] (SPARK-40064) Use V2 Filter in SupportsOverwrite

2022-08-12 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-40064:
--

 Summary: Use V2 Filter in SupportsOverwrite
 Key: SPARK-40064
 URL: https://issues.apache.org/jira/browse/SPARK-40064
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Huaxin Gao


 Add V2 Filter support in SupportsOverwrite






[jira] [Updated] (SPARK-39528) Use V2 Filter in SupportsRuntimeFiltering

2022-08-12 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-39528:
---
Parent: SPARK-36555
Issue Type: Sub-task  (was: Improvement)

> Use V2 Filter in SupportsRuntimeFiltering
> -
>
> Key: SPARK-39528
> URL: https://issues.apache.org/jira/browse/SPARK-39528
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, SupportsRuntimeFiltering uses v1 filter. We should use v2 filter 
> instead.






[jira] [Updated] (SPARK-39966) Use V2 Filter in SupportsDelete

2022-08-12 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-39966:
---
Parent: SPARK-36555
Issue Type: Sub-task  (was: Improvement)

> Use V2 Filter in SupportsDelete
> ---
>
> Key: SPARK-39966
> URL: https://issues.apache.org/jira/browse/SPARK-39966
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.4.0
>
>
> Spark currently uses V1 Filter in SupportsDelete. Add V2 Filter support.






[jira] [Resolved] (SPARK-39966) Use V2 Filter in SupportsDelete

2022-08-11 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39966?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39966.

Fix Version/s: 3.4.0
 Assignee: Huaxin Gao  (was: Apache Spark)
   Resolution: Fixed

> Use V2 Filter in SupportsDelete
> ---
>
> Key: SPARK-39966
> URL: https://issues.apache.org/jira/browse/SPARK-39966
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.4.0
>
>
> Spark currently uses V1 Filter in SupportsDelete. Add V2 Filter support.






[jira] [Created] (SPARK-39966) Use V2 Filter in SupportsDelete

2022-08-03 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-39966:
--

 Summary: Use V2 Filter in SupportsDelete
 Key: SPARK-39966
 URL: https://issues.apache.org/jira/browse/SPARK-39966
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Huaxin Gao


Spark currently uses V1 Filter in SupportsDelete. Add V2 Filter support.






[jira] [Resolved] (SPARK-39914) Add DS V2 Filter to V1 Filter conversion

2022-08-01 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39914.

Fix Version/s: 3.4.0
 Assignee: Huaxin Gao
   Resolution: Fixed

> Add DS V2 Filter to V1 Filter conversion
> 
>
> Key: SPARK-39914
> URL: https://issues.apache.org/jira/browse/SPARK-39914
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Minor
> Fix For: 3.4.0
>
>
> add util method to convert DS V2 Filter to V1 Filter
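
A hedged sketch of such a utility (heavily simplified; a real version would
handle the full set of predicate names):

{code:scala}
import org.apache.spark.sql.connector.expressions.NamedReference
import org.apache.spark.sql.connector.expressions.filter.Predicate
import org.apache.spark.sql.sources.{Filter, IsNotNull}

// Map a DS V2 Predicate back to a V1 Filter, returning None when the
// predicate has no V1 equivalent.
def toV1(p: Predicate): Option[Filter] = p.name() match {
  case "IS_NOT_NULL" => p.children() match {
    case Array(ref: NamedReference) => Some(IsNotNull(ref.toString))
    case _ => None
  }
  case _ => None // other predicate kinds elided in this sketch
}
{code}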






[jira] [Resolved] (SPARK-39909) Organize the check of push down information for JDBCV2Suite

2022-07-29 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39909.

Fix Version/s: 3.4.0
   Resolution: Fixed

> Organize the check of push down information for JDBCV2Suite
> ---
>
> Key: SPARK-39909
> URL: https://issues.apache.org/jira/browse/SPARK-39909
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: miracle
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, JDBCV2Suite has many test cases whose checks of the push-down 
> information do not look clean.
> For example,
> {code:java}
> checkPushedInfo(df,
>   "PushedFilters: [DEPT IS NOT NULL, DEPT > 1], PushedLimit: LIMIT 1, ")
> {code}
> Changing it to the following looks better.
> {code:java}
> checkPushedInfo(df,
>   "PushedFilters: [DEPT IS NOT NULL, DEPT > 1]",
>   "PushedLimit: LIMIT 1")
> {code}
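
A hedged sketch of the varargs shape this implies (not the actual suite code):
each expected fragment is asserted independently against the plan text.

{code:scala}
import org.apache.spark.sql.DataFrame

def checkPushedInfo(df: DataFrame, expectedPlanFragments: String*): Unit = {
  val plan = df.queryExecution.executedPlan.toString
  expectedPlanFragments.foreach { fragment =>
    assert(plan.contains(fragment), s"plan does not contain: $fragment")
  }
}
{code}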






[jira] [Assigned] (SPARK-39909) Organize the check of push down information for JDBCV2Suite

2022-07-29 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao reassigned SPARK-39909:
--

Assignee: miracle

> Organize the check of push down information for JDBCV2Suite
> ---
>
> Key: SPARK-39909
> URL: https://issues.apache.org/jira/browse/SPARK-39909
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: miracle
>Priority: Major
>
> Currently, JDBCV2Suite has many test cases whose checks of the push-down 
> information do not look clean.
> For example,
> {code:java}
> checkPushedInfo(df,
>   "PushedFilters: [DEPT IS NOT NULL, DEPT > 1], PushedLimit: LIMIT 1, ")
> {code}
> Changing it to the following looks better.
> {code:java}
> checkPushedInfo(df,
>   "PushedFilters: [DEPT IS NOT NULL, DEPT > 1]",
>   "PushedLimit: LIMIT 1")
> {code}






[jira] [Commented] (SPARK-39909) Organize the check of push down information for JDBCV2Suite

2022-07-29 Thread Huaxin Gao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-39909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17573091#comment-17573091
 ] 

Huaxin Gao commented on SPARK-39909:


Hi Chen Liang, do you have a Jira ID?

> Organize the check of push down information for JDBCV2Suite
> ---
>
> Key: SPARK-39909
> URL: https://issues.apache.org/jira/browse/SPARK-39909
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Priority: Major
>
> Currently, JDBCV2Suite has many test cases whose checks of the push-down 
> information do not look clean.
> For example,
> {code:java}
> checkPushedInfo(df,
>   "PushedFilters: [DEPT IS NOT NULL, DEPT > 1], PushedLimit: LIMIT 1, ")
> {code}
> Changing it to the following looks better.
> {code:java}
> checkPushedInfo(df,
>   "PushedFilters: [DEPT IS NOT NULL, DEPT > 1]",
>   "PushedLimit: LIMIT 1")
> {code}






[jira] [Updated] (SPARK-39914) Add DS V2 Filter to V1 Filter conversion

2022-07-28 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-39914:
---
Summary: Add DS V2 Filter to V1 Filter conversion  (was: Add DS V2 Filter 
to V2 Filter conversion)

> Add DS V2 Filter to V1 Filter conversion
> 
>
> Key: SPARK-39914
> URL: https://issues.apache.org/jira/browse/SPARK-39914
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Huaxin Gao
>Priority: Minor
>
> add util method to convert DS V2 Filter to V1 Filter






[jira] [Created] (SPARK-39914) Add DS V2 Filter to V2 Filter conversion

2022-07-28 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-39914:
--

 Summary: Add DS V2 Filter to V2 Filter conversion
 Key: SPARK-39914
 URL: https://issues.apache.org/jira/browse/SPARK-39914
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4.0
Reporter: Huaxin Gao


add util method to convert DS V2 Filter to V1 Filter






[jira] [Created] (SPARK-39857) V2ExpressionBuilder uses the wrong LiteralValue data type for In predicate

2022-07-24 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-39857:
--

 Summary: V2ExpressionBuilder uses the wrong LiteralValue data type 
for In predicate
 Key: SPARK-39857
 URL: https://issues.apache.org/jira/browse/SPARK-39857
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: Huaxin Gao


When building a V2 In Predicate in V2ExpressionBuilder, InSet.dataType (which is 
BooleanType) is used to build the LiteralValue; InSet.child.dataType should be 
used instead.
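
A minimal sketch of the fix described above (not the actual patch): for a
predicate such as {{data IN ('a', 'b')}}, each literal must carry the element
type of the IN list, not the BooleanType of the whole predicate.

{code:scala}
import org.apache.spark.sql.connector.expressions.LiteralValue
import org.apache.spark.sql.types.{BooleanType, StringType}

val wrong = LiteralValue("a", BooleanType) // InSet.dataType: type of the predicate
val right = LiteralValue("a", StringType)  // InSet.child.dataType: type of the elements
{code}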






[jira] [Resolved] (SPARK-39812) Simplify code to construct AggregateExpression with toAggregateExpression

2022-07-23 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39812.

Fix Version/s: 3.4.0
 Assignee: jiaan.geng
   Resolution: Fixed

> Simplify code to construct AggregateExpression with toAggregateExpression
> -
>
> Key: SPARK-39812
> URL: https://issues.apache.org/jira/browse/SPARK-39812
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently, Spark provides the toAggregateExpression method to simplify the code.
> But developers still use AggregateExpression.apply in many places.






[jira] [Resolved] (SPARK-39784) Put Literal values on the right side of the data source filter after translating Catalyst Expression to data source filter

2022-07-22 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39784?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39784.

Fix Version/s: 3.4.0
 Assignee: Huaxin Gao
   Resolution: Fixed

> Put Literal values on the right side of the data source filter after 
> translating Catalyst Expression to data source filter
> --
>
> Key: SPARK-39784
> URL: https://issues.apache.org/jira/browse/SPARK-39784
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Minor
> Fix For: 3.4.0
>
>
> After translating a Catalyst Expression to a data source filter, we want the 
> Literal value to be on the right side of the filter.
> For example: 1 > a
> After translating to a Predicate, we want to have a < 1
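
A minimal sketch of the normalization, assuming a hypothetical helper: when the
literal sits on the left of a binary comparison, the operator is mirrored so
the literal lands on the right.

{code:scala}
// 1 > a  becomes  a < 1; symmetric operators are left unchanged.
def mirror(op: String): String = op match {
  case ">"  => "<"
  case ">=" => "<="
  case "<"  => ">"
  case "<=" => ">="
  case other => other // =, <=> etc. are symmetric
}
{code}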






[jira] [Resolved] (SPARK-39759) Implement listIndexes in JDBC (H2 dialect)

2022-07-19 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39759?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39759.

  Assignee: BingKun Pan
Resolution: Fixed

> Implement listIndexes in JDBC (H2 dialect)
> --
>
> Key: SPARK-39759
> URL: https://issues.apache.org/jira/browse/SPARK-39759
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Created] (SPARK-39784) Literal values should be on the right side of the data source filter

2022-07-14 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-39784:
--

 Summary: Literal values should be on the right side of the data 
source filter
 Key: SPARK-39784
 URL: https://issues.apache.org/jira/browse/SPARK-39784
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Huaxin Gao


After translating a Catalyst Expression to a data source filter, we want the 
Literal value to be on the right side of the filter.
For example: 1 > a
After translating to a Predicate, we want to have a < 1







[jira] [Resolved] (SPARK-39704) Implement createIndex & dropIndex & IndexExists in JDBC (H2 dialect)

2022-07-13 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39704.

Fix Version/s: 3.4.0
 Assignee: BingKun Pan
   Resolution: Fixed

> Implement createIndex & dropIndex & IndexExists in JDBC (H2 dialect)
> 
>
> Key: SPARK-39704
> URL: https://issues.apache.org/jira/browse/SPARK-39704
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-39711) Remove redundant trait: BeforeAndAfterAll & BeforeAndAfterEach & Logging

2022-07-11 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39711.

Fix Version/s: 3.4.0
 Assignee: BingKun Pan
   Resolution: Fixed

> Remove redundant trait: BeforeAndAfterAll & BeforeAndAfterEach & Logging
> 
>
> Key: SPARK-39711
> URL: https://issues.apache.org/jira/browse/SPARK-39711
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 3.4.0
>
>
> SparkFunSuite is declared as follows:
> {code:java}
> abstract class SparkFunSuite
> extends AnyFunSuite
> with BeforeAndAfterAll
> with BeforeAndAfterEach
> with ThreadAudit
> with Logging
> {code}
> Some suites extend SparkFunSuite and at the same time mix in BeforeAndAfterAll, 
> BeforeAndAfterEach, or Logging, which is redundant.






[jira] [Resolved] (SPARK-39724) Remove duplicate `.setAccessible(true)` in `kvstore.KVTypeInfo`

2022-07-09 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39724?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39724.

Fix Version/s: 3.4.0
 Assignee: Yang Jie
   Resolution: Fixed

> Remove duplicate `.setAccessible(true)`  in `kvstore.KVTypeInfo`
> 
>
> Key: SPARK-39724
> URL: https://issues.apache.org/jira/browse/SPARK-39724
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>
> {code:java}
>     for (Method m : type.getDeclaredMethods()) {
>       KVIndex idx = m.getAnnotation(KVIndex.class);
>       if (idx != null) {
>         checkIndex(idx, indices);
>         Preconditions.checkArgument(m.getParameterTypes().length == 0,
>           "Annotated method %s::%s should not have any parameters.", 
> type.getName(), m.getName());
>         m.setAccessible(true);
>         indices.put(idx.value(), idx);
>         m.setAccessible(true);
>         accessors.put(idx.value(), new MethodAccessor(m));
>       } {code}
> The above code has duplicate calls to `.setAccessible(true)`.
>  






[jira] [Resolved] (SPARK-39633) Dataframe options for time travel via `timestampAsOf` should respect both formats of specifying timestamp

2022-06-30 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39633.

Fix Version/s: 3.4.0
   3.3.0
 Assignee: Prashant Singh
   Resolution: Fixed

> Dataframe options for time travel via `timestampAsOf` should respect both 
> formats of specifying timestamp
> -
>
> Key: SPARK-39633
> URL: https://issues.apache.org/jira/browse/SPARK-39633
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Prashant Singh
>Assignee: Prashant Singh
>Priority: Minor
> Fix For: 3.4.0, 3.3.0
>
>
> Presently, a Spark SQL query for time travel like
> {{SELECT * from \{table} TIMESTAMP AS OF 1548751078}}
> works correctly, which is what is specified in the SQL grammar as well (((FOR 
> SYSTEM_VERSION) | VERSION) AS OF version=(INTEGER_VALUE | STRING)). But when 
> trying to do the same via the dataframe option `timestampAsOf`, the code fails 
> with:
> {quote}[info]   org.apache.spark.sql.AnalysisException: '1548751078' is not a 
> valid timestamp expression for time travel.
> [info]   at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.invalidTimestampExprForTimeTravel(QueryCompilationErrors.scala:2413)
> [info]   at 
> org.apache.spark.sql.catalyst.analysis.TimeTravelSpec$.create(TimeTravelSpec.scala:55)
> [info]   at 
> org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:128)
> [info]   at 
> org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:209)
> [info]   at scala.Option.flatMap(Option.scala:271)
> [info]   at 
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207)
> [info]   at 
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
> [info]   at 
> org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.load(SupportsCatalogOptionsSuite.scala:365)
> [info]   at 
> org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.$anonfun$new$33(SupportsCatalogOptionsSuite.scala:329)
> [info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:133)
> [info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:158)
> [info]   at 
> org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.$anonfun$new$30(SupportsCatalogOptionsSuite.scala:329)
> [info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> [info]   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1490)
> [info]   at 
> org.apache.spark.sql.test.SQLTestUtilsBase.withTable(SQLTestUtils.scala:306)
> [info]   at 
> org.apache.spark.sql.test.SQLTestUtilsBase.withTable$(SQLTestUtils.scala:304)
> [info]   at 
> org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.withTable(SupportsCatalogOptionsSuite.scala:44)
> [info]   at 
> org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.$anonfun$new$26(SupportsCatalogOptionsSuite.scala:309)
> [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
> [info]   at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:203)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
> {quote}
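
For reference, the failing usage reconstructed as a hedged sketch (table name
illustrative): the SQL path accepts the integer form, while the DataFrameReader
option path rejected it before this fix.

{code:scala}
spark.read
  .option("timestampAsOf", "1548751078") // accepted in SQL, rejected here pre-fix
  .table("default.people10m")
{code}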






[jira] [Updated] (SPARK-39633) Dataframe options for time travel via `timestampAsOf` should respect both formats of specifying timestamp

2022-06-30 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-39633:
---
Issue Type: Improvement  (was: Bug)

> Dataframe options for time travel via `timestampAsOf` should respect both 
> formats of specifying timestamp
> -
>
> Key: SPARK-39633
> URL: https://issues.apache.org/jira/browse/SPARK-39633
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Prashant Singh
>Priority: Minor
>
> Presently, a Spark SQL query for time travel like
> {{SELECT * from \{table} TIMESTAMP AS OF 1548751078}}
> works correctly, which is what is specified in the SQL grammar as well (((FOR 
> SYSTEM_VERSION) | VERSION) AS OF version=(INTEGER_VALUE | STRING)). But when 
> trying to do the same via the dataframe option `timestampAsOf`, the code fails 
> with:
> {quote}[info]   org.apache.spark.sql.AnalysisException: '1548751078' is not a 
> valid timestamp expression for time travel.
> [info]   at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.invalidTimestampExprForTimeTravel(QueryCompilationErrors.scala:2413)
> [info]   at 
> org.apache.spark.sql.catalyst.analysis.TimeTravelSpec$.create(TimeTravelSpec.scala:55)
> [info]   at 
> org.apache.spark.sql.execution.datasources.v2.DataSourceV2Utils$.loadV2Source(DataSourceV2Utils.scala:128)
> [info]   at 
> org.apache.spark.sql.DataFrameReader.$anonfun$load$1(DataFrameReader.scala:209)
> [info]   at scala.Option.flatMap(Option.scala:271)
> [info]   at 
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:207)
> [info]   at 
> org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:171)
> [info]   at 
> org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.load(SupportsCatalogOptionsSuite.scala:365)
> [info]   at 
> org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.$anonfun$new$33(SupportsCatalogOptionsSuite.scala:329)
> [info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:133)
> [info]   at org.apache.spark.sql.QueryTest.checkAnswer(QueryTest.scala:158)
> [info]   at 
> org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.$anonfun$new$30(SupportsCatalogOptionsSuite.scala:329)
> [info]   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
> [info]   at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1490)
> [info]   at 
> org.apache.spark.sql.test.SQLTestUtilsBase.withTable(SQLTestUtils.scala:306)
> [info]   at 
> org.apache.spark.sql.test.SQLTestUtilsBase.withTable$(SQLTestUtils.scala:304)
> [info]   at 
> org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.withTable(SupportsCatalogOptionsSuite.scala:44)
> [info]   at 
> org.apache.spark.sql.connector.SupportsCatalogOptionsSuite.$anonfun$new$26(SupportsCatalogOptionsSuite.scala:309)
> [info]   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
> [info]   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
> [info]   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:22)
> [info]   at org.scalatest.Transformer.apply(Transformer.scala:20)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike$$anon$1.apply(AnyFunSuiteLike.scala:190)
> [info]   at 
> org.apache.spark.SparkFunSuite.withFixture(SparkFunSuite.scala:203)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.invokeWithFixture$1(AnyFunSuiteLike.scala:188)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.$anonfun$runTest$1(AnyFunSuiteLike.scala:200)
> [info]   at org.scalatest.SuperEngine.runTestImpl(Engine.scala:306)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest(AnyFunSuiteLike.scala:200)
> [info]   at 
> org.scalatest.funsuite.AnyFunSuiteLike.runTest$(AnyFunSuiteLike.scala:182)
> {quote}






[jira] [Created] (SPARK-39528) Use V2 Filter in SupportsRuntimeFiltering

2022-06-20 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-39528:
--

 Summary: Use V2 Filter in SupportsRuntimeFiltering
 Key: SPARK-39528
 URL: https://issues.apache.org/jira/browse/SPARK-39528
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Huaxin Gao


Currently, SupportsRuntimeFiltering uses v1 filter. We should use v2 filter 
instead.
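
A hypothetical Scala rendering of the V2-filter variant this asks for (the real
DS V2 interfaces are Java; the shape below mirrors the V1 counterpart and is
not necessarily the final API):

{code:scala}
import org.apache.spark.sql.connector.expressions.NamedReference
import org.apache.spark.sql.connector.expressions.filter.Predicate
import org.apache.spark.sql.connector.read.Scan

trait SupportsRuntimeV2Filtering extends Scan {
  def filterAttributes(): Array[NamedReference]
  def filter(predicates: Array[Predicate]): Unit
}
{code}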






[jira] [Resolved] (SPARK-39417) Handle Null partition values in PartitioningUtils

2022-06-09 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39417.

Fix Version/s: 3.3.0
   3.4.0
   Resolution: Fixed

> Handle Null partition values in PartitioningUtils
> -
>
> Key: SPARK-39417
> URL: https://issues.apache.org/jira/browse/SPARK-39417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Prashant Singh
>Assignee: Prashant Singh
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
>
> For partitions with null values we get an NPE on partition discovery; earlier 
> we used to get `DEFAULT_PARTITION_NAME`.
>  
> {quote} [info]   java.lang.NullPointerException:
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.removeLeadingZerosFromNumberTypePartition(PartitioningUtils.scala:362)
> [info]   at 
> org.apache.spark.sql.execution.datasources.PartitioningUtils$.$anonfun$getPathFragment$1(PartitioningUtils.scala:355)
> [info]   at 
> scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:286)
> [info]   at scala.collection.Iterator.foreach(Iterator.scala:943)
> [info]   at scala.collection.Iterator.foreach$(Iterator.scala:943){quote}
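
A minimal sketch of the null-safe handling (not the actual patch): fall back to
the default partition name instead of dereferencing a null partition value.

{code:scala}
// Hive's conventional placeholder for null partition values.
val DEFAULT_PARTITION_NAME = "__HIVE_DEFAULT_PARTITION__"

def partitionValueString(value: Any): String =
  if (value == null) DEFAULT_PARTITION_NAME else value.toString
{code}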






[jira] [Resolved] (SPARK-39393) Parquet data source only supports push-down predicate filters for non-repeated primitive types

2022-06-08 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39393.

Fix Version/s: 3.1.3
   3.3.0
   3.2.2
   3.4.0
 Assignee: Amin Borjian
   Resolution: Fixed

> Parquet data source only supports push-down predicate filters for 
> non-repeated primitive types
> --
>
> Key: SPARK-39393
> URL: https://issues.apache.org/jira/browse/SPARK-39393
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1
>Reporter: Amin Borjian
>Assignee: Amin Borjian
>Priority: Major
>  Labels: parquet
> Fix For: 3.1.3, 3.3.0, 3.2.2, 3.4.0
>
>
> I use an example to illustrate the problem. The reason for the problem and 
> the problem-solving approach are stated below.
> Assume follow Protocol buffer schema:
> {code:java}
> message Model {
>  string name = 1;
>  repeated string keywords = 2;
> }
> {code}
> Suppose a parquet file is created from a set of records in the above format 
> with the help of the {{parquet-protobuf}} library.
> Using Spark version 3.0.2 or older, we could run the following query using 
> {{spark-shell}}:
> {code:java}
> val data = spark.read.parquet("/path/to/parquet")
> data.registerTempTable("models")
> spark.sql("select * from models where array_contains(keywords, 
> 'X')").show(false)
> {code}
> But after updating Spark, we get the following error:
> {code:java}
> Caused by: java.lang.IllegalArgumentException: FilterPredicates do not 
> currently support repeated columns. Column keywords is repeated.
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumn(SchemaCompatibilityValidator.java:176)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validateColumnFilterPredicate(SchemaCompatibilityValidator.java:149)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:89)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.visit(SchemaCompatibilityValidator.java:56)
>   at 
> org.apache.parquet.filter2.predicate.Operators$NotEq.accept(Operators.java:192)
>   at 
> org.apache.parquet.filter2.predicate.SchemaCompatibilityValidator.validate(SchemaCompatibilityValidator.java:61)
>   at 
> org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:95)
>   at 
> org.apache.parquet.filter2.compat.RowGroupFilter.visit(RowGroupFilter.java:45)
>   at 
> org.apache.parquet.filter2.compat.FilterCompat$FilterPredicateCompat.accept(FilterCompat.java:149)
>   at 
> org.apache.parquet.filter2.compat.RowGroupFilter.filterRowGroups(RowGroupFilter.java:72)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.filterRowGroups(ParquetFileReader.java:870)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.(ParquetFileReader.java:789)
>   at 
> org.apache.parquet.hadoop.ParquetFileReader.open(ParquetFileReader.java:657)
>   at 
> org.apache.parquet.hadoop.ParquetRecordReader.initializeInternalReader(ParquetRecordReader.java:162)
>   at 
> org.apache.parquet.hadoop.ParquetRecordReader.initialize(ParquetRecordReader.java:140)
>   at 
> org.apache.spark.sql.execution.datasources.parquet.ParquetFileFormat.$anonfun$buildReaderWithPartitionValues$2(ParquetFileFormat.scala:373)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:127)
> ...
> {code}
> At first it seems the problem is in the parquet library. But in fact, our 
> problem is caused by this line, which has been around since 2014 (based on Git 
> history):
> [Parquet Schema Compatibility 
> Validator|https://github.com/apache/parquet-mr/blob/master/parquet-column/src/main/java/org/apache/parquet/filter2/predicate/SchemaCompatibilityValidator.java#L194]
> After some checking, I noticed that the cause of the problem is a change 
> in the data filtering conditions:
> {code:java}
> spark.sql("select * from log where array_contains(keywords, 
> 'X')").explain(true);
> // Spark 3.0.2 and older
> == Physical Plan ==
> ... 
> +- FileScan parquet [link#0,keywords#1]
>   DataFilters: [array_contains(keywords#1, Google)]
>   PushedFilters: []
>   ...
> // Spark 3.1.0 and newer
> == Physical Plan == ... 
> +- FileScan parquet [link#0,keywords#1]
>   DataFilters: [isnotnull(keywords#1),  array_contains(keywords#1, Google)]
>   PushedFilters: [IsNotNull(keywords)]
>   ...{code}
> It's good that the filtering section has become smarter. Unfortunately, due 
> to unfamiliarity with the code base, I could not find the exact location of the 
> change and
[jira] [Resolved] (SPARK-39413) Capitalize sql keywords in JDBCV2Suite

2022-06-08 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39413.

Fix Version/s: 3.4.0
 Assignee: jiaan.geng
   Resolution: Fixed

> Capitalize sql keywords in JDBCV2Suite
> --
>
> Key: SPARK-39413
> URL: https://issues.apache.org/jira/browse/SPARK-39413
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.0
>
>
> JDBCV2Suite has some test cases which use SQL keywords without capitalization.






[jira] [Resolved] (SPARK-39390) Hide and optimize `viewAcls`/`viewAclsGroups`/`modifyAcls`/`modifyAclsGroups` from INFO log

2022-06-06 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39390?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39390.

Fix Version/s: 3.4.0
 Assignee: qian
   Resolution: Fixed

> Hide and optimize `viewAcls`/`viewAclsGroups`/`modifyAcls`/`modifyAclsGroups` 
> from INFO log
> ---
>
> Key: SPARK-39390
> URL: https://issues.apache.org/jira/browse/SPARK-39390
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: qian
>Assignee: qian
>Priority: Minor
> Fix For: 3.4.0
>
>
> This issue aims to hide and optimize 
> `viewAcls`/`viewAclsGroups`/`modifyAcls`/`modifyAclsGroups` from the INFO log.
> {code:java}
> 2022-06-02 22:02:48.328 - stderr> 22/06/03 05:02:48 INFO SecurityManager: 
> SecurityManager: authentication disabled; ui acls disabled; users  with view 
> permissions: Set(root); groups with view permissions: Set(); users  with 
> modify permissions: Set(root); groups with modify permissions: Set(){code}






[jira] [Created] (SPARK-39312) Use Parquet in predicate for Spark In filter

2022-05-26 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-39312:
--

 Summary: Use Parquet in predicate for Spark In filter
 Key: SPARK-39312
 URL: https://issues.apache.org/jira/browse/SPARK-39312
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4
Reporter: Huaxin Gao


Since Parquet now supports a native in predicate, we want to simplify the 
current In predicate filter pushdown by using Parquet's native in predicate.
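
A hedged illustration of the underlying parquet-mr call (not the actual Spark
patch; the column name is hypothetical): the native {{in}} builds one predicate
for the whole value set instead of a chain of OR'ed equality filters.

{code:scala}
import org.apache.parquet.filter2.predicate.FilterApi
import org.apache.parquet.io.api.Binary
import scala.jdk.CollectionConverters._

val column = FilterApi.binaryColumn("name")
val values = Set("a", "b", "c").map(Binary.fromString(_)).asJava
val inPredicate = FilterApi.in(column, values) // single native IN predicate
{code}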






[jira] [Commented] (SPARK-37219) support AS OF syntax

2022-05-16 Thread Huaxin Gao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37219?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17537627#comment-17537627
 ] 

Huaxin Gao commented on SPARK-37219:


Correct.

> support AS OF syntax
> 
>
> Key: SPARK-37219
> URL: https://issues.apache.org/jira/browse/SPARK-37219
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.3.0
>
>
> https://docs.databricks.com/delta/quick-start.html#query-an-earlier-version-of-the-table-time-travel
> Delta Lake time travel allows users to query an older snapshot of a Delta 
> table. To query an older version of a table, a user needs to specify a version 
> or timestamp in a SELECT statement using the AS OF syntax, as follows:
> SELECT * FROM default.people10m VERSION AS OF 0;
> SELECT * FROM default.people10m TIMESTAMP AS OF '2019-01-29 00:37:58';
> This ticket is opened to add AS OF syntax in Spark






[jira] (SPARK-37219) support AS OF syntax

2022-05-16 Thread Huaxin Gao (Jira)


[ https://issues.apache.org/jira/browse/SPARK-37219 ]


Huaxin Gao deleted comment on SPARK-37219:


was (Author: JIRAUSER284812):

> support AS OF syntax
> 
>
> Key: SPARK-37219
> URL: https://issues.apache.org/jira/browse/SPARK-37219
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.3.0
>
>
> https://docs.databricks.com/delta/quick-start.html#query-an-earlier-version-of-the-table-time-travel
> Delta Lake time travel allows user to query an older snapshot of a Delta 
> table. To query an older version of a table, user needs to specify a version 
> or timestamp in a SELECT statement using AS OF syntax as the follows
> SELECT * FROM default.people10m VERSION AS OF 0;
> SELECT * FROM default.people10m TIMESTAMP AS OF '2019-01-29 00:37:58';
> This ticket is opened to add AS OF syntax in Spark



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-37219) support AS OF syntax

2022-05-16 Thread Huaxin Gao (Jira)


[ https://issues.apache.org/jira/browse/SPARK-37219 ]


Huaxin Gao deleted comment on SPARK-37219:


was (Author: JIRAUSER284812):

> support AS OF syntax
> 
>
> Key: SPARK-37219
> URL: https://issues.apache.org/jira/browse/SPARK-37219
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.3.0
>
>
> https://docs.databricks.com/delta/quick-start.html#query-an-earlier-version-of-the-table-time-travel
> Delta Lake time travel allows user to query an older snapshot of a Delta 
> table. To query an older version of a table, user needs to specify a version 
> or timestamp in a SELECT statement using AS OF syntax as the follows
> SELECT * FROM default.people10m VERSION AS OF 0;
> SELECT * FROM default.people10m TIMESTAMP AS OF '2019-01-29 00:37:58';
> This ticket is opened to add AS OF syntax in Spark



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39156) Remove ParquetLogRedirector usage from ParquetFileFormat

2022-05-15 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39156?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39156.

Fix Version/s: 3.4.0
 Assignee: Yang Jie
   Resolution: Fixed

> Remove ParquetLogRedirector usage from ParquetFileFormat
> 
>
> Key: SPARK-39156
> URL: https://issues.apache.org/jira/browse/SPARK-39156
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>
> Spark only uses Parquet 1.12.2 and no longer relies on Parquet 1.6, so it 
> seems that the ParquetLogRedirector is no longer needed.



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-39162) Jdbc dialect should decide which function could be pushed down.

2022-05-14 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39162?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39162.

Fix Version/s: 3.4.0
 Assignee: jiaan.geng
   Resolution: Fixed

> Jdbc dialect should decide which function could be pushed down.
> ---
>
> Key: SPARK-39162
> URL: https://issues.apache.org/jira/browse/SPARK-39162
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.0
>
>
> Regardless of whether the functions are ANSI or not, support for them varies 
> across databases and is often uncertain.
> So we should add a new API to JdbcDialect so that each JDBC dialect can 
> decide which functions can be pushed down.
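> A minimal sketch of what such a dialect hook could look like (the method 
> name and default are illustrative, not the final API):
> {code:java}
> abstract class JdbcDialect extends Serializable {
>   // Each dialect overrides this to advertise the functions it can compile.
>   def isSupportedFunction(funcName: String): Boolean = false
> }
> {code}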



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37259) JDBC read is always going to wrap the query in a select statement

2022-05-06 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-37259.

Fix Version/s: 3.4.0
 Assignee: Peter Toth
   Resolution: Fixed

> JDBC read is always going to wrap the query in a select statement
> -
>
> Key: SPARK-37259
> URL: https://issues.apache.org/jira/browse/SPARK-37259
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: Kevin Appel
>Assignee: Peter Toth
>Priority: Major
> Fix For: 3.4.0
>
>
> The JDBC read wraps the query it sends to the database server inside a 
> SELECT statement, and there is currently no way to override this.
> Initially I ran into this issue when trying to run a CTE query against SQL 
> Server and it fails; the details of the failure are in these issues:
> [https://github.com/microsoft/mssql-jdbc/issues/1340]
> [https://github.com/microsoft/mssql-jdbc/issues/1657]
> [https://github.com/microsoft/sql-spark-connector/issues/147]
> https://issues.apache.org/jira/browse/SPARK-32825
> https://issues.apache.org/jira/browse/SPARK-34928
> I started to patch the code to get the query to run and ran into a few 
> different items; if there is a way to add these features to allow this code 
> path to run, it would be extremely helpful for running these types of 
> edge-case queries. These are basic examples; the actual queries are much more 
> complex and would require significant time to rewrite.
> Inside JDBCOptions.scala the query is set to one of the following (using 
> dbtable allows the query to be passed without modification):
>  
> {code:java}
> name.trim
> or
> s"(${subquery}) SPARK_GEN_SUBQ_${curId.getAndIncrement()}"
> {code}
>  
> Inside JDBCRelation.scala this tries to get the schema for the query, and it 
> ends up running dialect.getSchemaQuery, which does:
> {code:java}
> s"SELECT * FROM $table WHERE 1=0"{code}
> Overriding the dialect here to initially just pass back the $table gets past 
> this point and leads to the next issue, which is in the compute function in 
> JDBCRDD.scala
>  
> {code:java}
> val sqlText = s"SELECT $columnList FROM ${options.tableOrQuery} 
> $myTableSampleClause" + s" $myWhereClause $getGroupByClause $myLimitClause"
>  
> {code}
>  
> For these two queries (a CTE query and one using temp tables), finding out 
> the schema is difficult without actually running the query; and for the temp 
> table, if you run it during the schema check, the table will already exist 
> and the actual query will then fail.
> The way I patched these is by doing these two items:
> JDBCRDD.scala (compute)
>  
> {code:java}
>     val runQueryAsIs = options.parameters.getOrElse("runQueryAsIs", 
> "false").toBoolean
>     val sqlText = if (runQueryAsIs) {
>       s"${options.tableOrQuery}"
>     } else {
>       s"SELECT $columnList FROM ${options.tableOrQuery} $myWhereClause"
>     }
> {code}
> JDBCRelation.scala (getSchema)
> {code:java}
> val useCustomSchema = jdbcOptions.parameters.getOrElse("useCustomSchema", 
> "false").toBoolean
>     if (useCustomSchema) {
>       val myCustomSchema = jdbcOptions.parameters.getOrElse("customSchema", 
> "").toString
>       val newSchema = CatalystSqlParser.parseTableSchema(myCustomSchema)
>       logInfo(s"Going to return the new $newSchema because useCustomSchema is 
> $useCustomSchema and passed in $myCustomSchema")
>       newSchema
>     } else {
>       val tableSchema = JDBCRDD.resolveTable(jdbcOptions)
>       jdbcOptions.customSchema match {
>       case Some(customSchema) => JdbcUtils.getCustomSchema(
>         tableSchema, customSchema, resolver)
>       case None => tableSchema
>       }
>     }{code}
>  
> This is allowing the query to run as is, by using the dbtable option and then 
> provide a custom schema that will bypass the dialect schema check
>  
> Test queries
>  
> {code:java}
> query1 = """ 
> SELECT 1 as DummyCOL
> """
> query2 = """ 
> WITH DummyCTE AS
> (
> SELECT 1 as DummyCOL
> )
> SELECT *
> FROM DummyCTE
> """
> query3 = """
> (SELECT *
> INTO #Temp1a
> FROM
> (SELECT @@VERSION as version) data
> )
> (SELECT *
> FROM
> #Temp1a)
> """
> {code}
>  
> Test schema
>  
> {code:java}
> schema1 = """
> DummyXCOL INT
> """
> schema2 = """
> DummyXCOL STRING
> """
> {code}
>  
> Test code
>  
> {code:java}
> jdbcDFWorking = (
>     spark.read.format("jdbc")
>     .option("url", 
> f"jdbc:sqlserver://{server}:{port};databaseName={database};")
>     .option("user", user)
>     .option("password", password)
>     .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
>     .option("dbtable", queryx)
>     .option("customSchema", schemax)
>     .option("useCustomSchema", "true")
>     .option("runQueryAsIs", "true")
>     

[jira] [Resolved] (SPARK-39116) Replace double negation in exists with forall

2022-05-06 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-39116.

Fix Version/s: 3.4.0
 Assignee: Yang Jie  (was: Apache Spark)
   Resolution: Fixed

> Replace double negation in exists with forall
> --
>
> Key: SPARK-39116
> URL: https://issues.apache.org/jira/browse/SPARK-39116
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.4.0
>
>
> Some code in Spark looks like the following:
> {code:java}
> !Seq(1, 2).exists(x => !condition(x)) {code}
> can be replaced with 
> {code:java}
> Seq(1, 2).forall(x => condition(x)) {code}
> for code simplification. (By De Morgan's law, `!xs.exists(x => !p(x))` is 
> equivalent to `xs.forall(p)`.)
>  



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-39011) V2 Filter to ORC Predicate support

2022-04-25 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-39011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-39011:
---
Summary: V2 Filter to ORC Predicate support  (was: V2 Filter to ORC Filter 
support)

> V2 Filter to ORC Predicate support
> --
>
> Key: SPARK-39011
> URL: https://issues.apache.org/jira/browse/SPARK-39011
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4
>Reporter: Huaxin Gao
>Priority: Major
>
> add V2 filter to ORC predicate support



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39011) V2 Filter to ORC Filter support

2022-04-25 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-39011:
--

 Summary: V2 Filter to ORC Filter support
 Key: SPARK-39011
 URL: https://issues.apache.org/jira/browse/SPARK-39011
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4
Reporter: Huaxin Gao


add V2 filter to ORC predicate support



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-39010) V2 Filter to Parquet Predicate support

2022-04-25 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-39010:
--

 Summary: V2 Filter to Parquet Predicate support
 Key: SPARK-39010
 URL: https://issues.apache.org/jira/browse/SPARK-39010
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.4
Reporter: Huaxin Gao


Add support for V2 Filter to Parquet Predicate



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38950) Return Array of Predicate for SupportsPushDownCatalystFilters.pushedFilters

2022-04-19 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-38950:
--

 Summary: Return Array of Predicate for 
SupportsPushDownCatalystFilters.pushedFilters
 Key: SPARK-38950
 URL: https://issues.apache.org/jira/browse/SPARK-38950
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0, 3.4.0
Reporter: Huaxin Gao


in SupportsPushDownCatalystFilters, change


{code:java}
def pushedFilters: Array[Filter]
{code}


to


{code:java}
def pushedFilters: Array[Predicate]
{code}




--
This message was sent by Atlassian Jira
(v8.20.7#820007)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38865) Update document of JDBC options for pushDownAggregate and pushDownLimit

2022-04-12 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38865?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-38865.

Fix Version/s: 3.3.0
   3.4.0
 Assignee: jiaan.geng
   Resolution: Fixed

> Update document of JDBC options for pushDownAggregate and pushDownLimit
> ---
>
> Key: SPARK-38865
> URL: https://issues.apache.org/jira/browse/SPARK-38865
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.3.0, 3.4.0
>
>
> Because the DS v2 pushdown framework was refactored, we need to add more 
> documentation in sql-data-sources-jdbc.md to reflect the new changes.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38825) Add a test to cover parquet notIn filter

2022-04-07 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-38825:
--

 Summary: Add a test to cover parquet notIn filter
 Key: SPARK-38825
 URL: https://issues.apache.org/jira/browse/SPARK-38825
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Huaxin Gao


Add a test to cover parquet filter notIn



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38779) Unify the pushed operator checking between FileSource test suite and JDBC test suite

2022-04-03 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-38779:
--

 Summary: Unify the pushed operator checking between FileSource 
test suite and JDBC test suite
 Key: SPARK-38779
 URL: https://issues.apache.org/jira/browse/SPARK-38779
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0, 3.4.0
Reporter: Huaxin Gao


In JDBCV2Suite, we use checkPushedInfo to check the pushed-down operators. We 
will do the same for FileSourceAggregatePushDownSuite:


{code:java}
  private def checkPushedInfo(df: DataFrame, expectedPlanFragment: String): Unit = {
    df.queryExecution.optimizedPlan.collect {
      case _: DataSourceV2ScanRelation =>
        checkKeywordsExistsInExplain(df, expectedPlanFragment)
    }
  }
{code}
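
A call could look like the following (the expected plan fragment is 
illustrative):

{code:java}
checkPushedInfo(df, "PushedFilters: [ID IS NOT NULL, ID > 1]")
{code}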




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38643) Validate input dataset of ml.regression

2022-03-25 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-38643.

  Assignee: zhengruifeng
Resolution: Fixed

> Validate input dataset of ml.regression
> ---
>
> Key: SPARK-38643
> URL: https://issues.apache.org/jira/browse/SPARK-38643
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.4.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38546) replace deprecated ChiSqSelector with UnivariateFeatureSelector

2022-03-15 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38546?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-38546.

Resolution: Implemented

> replace deprecated ChiSqSelector with UnivariateFeatureSelector
> ---
>
> Key: SPARK-38546
> URL: https://issues.apache.org/jira/browse/SPARK-38546
> Project: Spark
>  Issue Type: Improvement
>  Components: Examples
>Affects Versions: 3.1.2, 3.2.0, 3.2.1
>Reporter: qian
>Priority: Major
>
> UnivariateFeatureSelector was added and ChiSqSelector was labeled as 
> deprecated in SPARK-34080, so we need to replace the deprecated ChiSqSelector 
> with UnivariateFeatureSelector.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38414) Remove redundant SuppressWarnings

2022-03-07 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-38414.

Fix Version/s: 3.3.0
 Assignee: Yang Jie
   Resolution: Fixed

> Remove redundant SuppressWarnings
> -
>
> Key: SPARK-38414
> URL: https://issues.apache.org/jira/browse/SPARK-38414
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Trivial
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38269) Clean up redundant type cast

2022-03-02 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38269?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-38269.

Fix Version/s: 3.3.0
 Assignee: Yang Jie
   Resolution: Fixed

> Clean up redundant type cast
> 
>
> Key: SPARK-38269
> URL: https://issues.apache.org/jira/browse/SPARK-38269
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36553) KMeans fails with NegativeArraySizeException for K = 50000 after issue #27758 was introduced

2022-03-02 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36553?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-36553.

Fix Version/s: 3.1.3
   3.3.0
   3.2.2
 Assignee: zhengruifeng
   Resolution: Fixed

> KMeans fails with NegativeArraySizeException for K = 50000 after issue #27758 
> was introduced
> 
>
> Key: SPARK-36553
> URL: https://issues.apache.org/jira/browse/SPARK-36553
> Project: Spark
>  Issue Type: Bug
>  Components: ML, MLlib, PySpark
>Affects Versions: 3.1.1
>Reporter: Anders Rydbirk
>Assignee: zhengruifeng
>Priority: Major
> Fix For: 3.1.3, 3.3.0, 3.2.2
>
>
> We are running KMeans on approximately 350M rows of x, y, z coordinates using 
> the following configuration:
> {code:java}
> KMeans(
>   featuresCol='features',
>   predictionCol='centroid_id',
> >   k=50000,
>   initMode='k-means||',
>   initSteps=2,
>   tol=0.5,
>   maxIter=20,
>   seed=SEED,
>   distanceMeasure='euclidean'
> )
> {code}
> When using Spark 3.0.0 this worked fine, but  when upgrading to 3.1.1 we are 
> consistently getting errors unless we reduce K.
> Stacktrace:
>  
> {code:java}
> An error occurred while calling o167.fit.An error occurred while calling 
> o167.fit.: java.lang.NegativeArraySizeException: -897458648 at 
> scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:194) at 
> scala.reflect.ManifestFactory$DoubleManifest.newArray(Manifest.scala:191) at 
> scala.Array$.ofDim(Array.scala:221) at 
> org.apache.spark.mllib.clustering.DistanceMeasure.computeStatistics(DistanceMeasure.scala:52)
>  at 
> org.apache.spark.mllib.clustering.KMeans.runAlgorithmWithWeight(KMeans.scala:280)
>  at org.apache.spark.mllib.clustering.KMeans.runWithWeight(KMeans.scala:231) 
> at org.apache.spark.ml.clustering.KMeans.$anonfun$fit$1(KMeans.scala:354) at 
> org.apache.spark.ml.util.Instrumentation$.$anonfun$instrumented$1(Instrumentation.scala:191)
>  at scala.util.Try$.apply(Try.scala:213) at 
> org.apache.spark.ml.util.Instrumentation$.instrumented(Instrumentation.scala:191)
>  at org.apache.spark.ml.clustering.KMeans.fit(KMeans.scala:329) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native 
> Method) at 
> java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(Unknown 
> Source) at 
> java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(Unknown 
> Source) at java.base/java.lang.reflect.Method.invoke(Unknown Source) at 
> py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at 
> py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at 
> py4j.Gateway.invoke(Gateway.java:282) at 
> py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at 
> py4j.commands.CallCommand.execute(CallCommand.java:79) at 
> py4j.GatewayConnection.run(GatewayConnection.java:238) at 
> java.base/java.lang.Thread.run(Unknown Source)
> {code}
>  
> The issue is introduced by 
> [#27758|#diff-725d4624ddf4db9cc51721c2ddaef50a1bc30e7b471e0439da28c5b5582efdfdR52]]
>  which significantly reduces the maximum value of K. Snippet of the line that 
> throws the error, from [DistanceMeasure.scala:|#L52]]:
> {code:java}
> val packedValues = Array.ofDim[Double](k * (k + 1) / 2)
> {code}
>  
> *What we have tried:*
>  * Reducing iterations
>  * Reducing input volume
>  * Reducing K
> Only reducing K have yielded success.
>  
> *Possible workaround:*
>  # Roll back to Spark 3.0.0 since a KMeansModel generated with 3.0.0 cannot 
> be loaded in 3.1.1.
>  # Reduce K. Currently trying with 45000.
>  
> *What we don't understand*:
> Given the line of code above, we do not understand why we would get an 
> integer overflow.
> For K=50,000, packedValues should be allocated with the size of 1,250,025,000 
> < (2^31) and not result in a negative array size.
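>  
> A likely explanation for the reported value: in the allocation above, the 
> product `k * (k + 1)` is evaluated in Int arithmetic before the division. 
> 50000 * 50001 = 2,500,050,000 already exceeds Int.MaxValue and wraps to 
> -1,794,917,296; halving that gives exactly the reported -897,458,648. A 
> minimal demonstration in Scala:
> {code:java}
> val k = 50000
> k * (k + 1) / 2        // Int arithmetic: -897458648 (wraps before dividing)
> k.toLong * (k + 1) / 2 // Long arithmetic: 1250025000
> {code}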
>  
> *Suggested resolution:*
> I'm not strong in the inner workings of KMeans, but my immediate thought 
> would be to add a fallback to the previous logic for K larger than a set 
> threshold if the optimisation is to stay in place, as it breaks compatibility 
> from 3.0.0 to 3.1.1 for edge cases.
>  
> Please let me know if more information is needed; this is my first time 
> raising a bug for an OSS project.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-38357) StackOverflowError with OR(data filter, partition filter)

2022-02-28 Thread Huaxin Gao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17499326#comment-17499326
 ] 

Huaxin Gao commented on SPARK-38357:


I will submit a PR soon.

> StackOverflowError with OR(data filter, partition filter)
> -
>
> Key: SPARK-38357
> URL: https://issues.apache.org/jira/browse/SPARK-38357
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.1
>Reporter: Huaxin Gao
>Priority: Major
>
> If the filter has an OR and contains both a data filter and a partition 
> filter, e.g. where p is a partition column and id is a data column:
> {code:java}
> SELECT * FROM tmp WHERE (p = 0 AND id > 0) OR (p = 1 AND id = 2) 
> {code}
> throws StackOverflowError



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-38357) StackOverflowError with OR(data filter, partition filter)

2022-02-28 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-38357:
--

 Summary: StackOverflowError with OR(data filter, partition filter)
 Key: SPARK-38357
 URL: https://issues.apache.org/jira/browse/SPARK-38357
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.1
Reporter: Huaxin Gao


If the filter has an OR and contains both a data filter and a partition 
filter, e.g. where p is a partition column and id is a data column:

{code:java}
SELECT * FROM tmp WHERE (p = 0 AND id > 0) OR (p = 1 AND id = 2) 
{code}

throws StackOverflowError




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-38100) Remove unused method in `Decimal`

2022-02-03 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-38100.

Fix Version/s: 3.2.2
   3.3
   Resolution: Fixed

> Remove unused method in `Decimal`
> -
>
> Key: SPARK-38100
> URL: https://issues.apache.org/jira/browse/SPARK-38100
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Priority: Trivial
> Fix For: 3.2.2, 3.3
>
>
> there is an unused method `overflowException` in 
> `org.apache.spark.sql.types.Decimal`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-38100) Remove unused method in `Decimal`

2022-02-03 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-38100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao reassigned SPARK-38100:
--

Assignee: Yang Jie

> Remove unused method in `Decimal`
> -
>
> Key: SPARK-38100
> URL: https://issues.apache.org/jira/browse/SPARK-38100
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Trivial
> Fix For: 3.2.2, 3.3
>
>
> there is an unused method `overflowException` in 
> `org.apache.spark.sql.types.Decimal`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30062) bug with DB2Driver using mode("overwrite") option("truncate",True)

2022-01-26 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao reassigned SPARK-30062:
--

Assignee: Ivan Karol

> bug with DB2Driver using mode("overwrite") option("truncate",True)
> --
>
> Key: SPARK-30062
> URL: https://issues.apache.org/jira/browse/SPARK-30062
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
>Reporter: Guy Huinen
>Assignee: Ivan Karol
>Priority: Major
>  Labels: db2, pyspark
> Fix For: 3.2.2, 3.3
>
>
> Using DB2Driver with mode("overwrite") and option("truncate", True) gives an 
> SQL error.
>  
> {code:java}
> dfClient.write\
>  .format("jdbc")\
>  .mode("overwrite")\
>  .option('driver', 'com.ibm.db2.jcc.DB2Driver')\
>  .option("url","jdbc:db2://")\
>  .option("user","xxx")\
>  .option("password","")\
>  .option("dbtable","")\
>  .option("truncate",True)\{code}
>  
>  gives the error below;
> in summary, I believe the semicolon is misplaced or malformed
>  
> {code:java}
> EXPO.EXPO#CMR_STG;IMMEDIATE{code}
>  
>  
> full error
> {code:java}
> An error occurred while calling o47.save. : 
> com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, 
> SQLSTATE=42601, SQLERRMC=END-OF-STATEMENT;LE EXPO.EXPO#CMR_STG;IMMEDIATE, 
> DRIVER=4.19.77 at com.ibm.db2.jcc.am.b4.a(b4.java:747) at 
> com.ibm.db2.jcc.am.b4.a(b4.java:66) at com.ibm.db2.jcc.am.b4.a(b4.java:135) 
> at com.ibm.db2.jcc.am.kh.c(kh.java:2788) at 
> com.ibm.db2.jcc.am.kh.d(kh.java:2776) at 
> com.ibm.db2.jcc.am.kh.b(kh.java:2143) at com.ibm.db2.jcc.t4.ab.i(ab.java:226) 
> at com.ibm.db2.jcc.t4.ab.c(ab.java:48) at com.ibm.db2.jcc.t4.p.b(p.java:38) 
> at com.ibm.db2.jcc.t4.av.h(av.java:124) at 
> com.ibm.db2.jcc.am.kh.ak(kh.java:2138) at 
> com.ibm.db2.jcc.am.kh.a(kh.java:3325) at com.ibm.db2.jcc.am.kh.c(kh.java:765) 
> at com.ibm.db2.jcc.am.kh.executeUpdate(kh.java:744) at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.truncateTable(JdbcUtils.scala:113)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:56)
>  at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) 
> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>  at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676) at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
>  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271) at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at 
> py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at 
> py4j.Gateway.invoke(Gateway.java:282) at 
> py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at 
> py4j.commands.CallCommand.execute(CallCommand.java:79) at 
> py4j.GatewayConnection.run(GatewayConnection.java:238) at 
> java.lang.Thread.run(Thread.java:748){code}

[jira] [Resolved] (SPARK-30062) bug with DB2Driver using mode("overwrite") option("truncate",True)

2022-01-25 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30062?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-30062.

Fix Version/s: 3.2.2
   3.3
   Resolution: Fixed

> bug with DB2Driver using mode("overwrite") option("truncate",True)
> --
>
> Key: SPARK-30062
> URL: https://issues.apache.org/jira/browse/SPARK-30062
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.4.4
>Reporter: Guy Huinen
>Priority: Major
>  Labels: db2, pyspark
> Fix For: 3.2.2, 3.3
>
>
> Using DB2Driver with mode("overwrite") and option("truncate", True) gives an 
> SQL error.
>  
> {code:java}
> dfClient.write\
>  .format("jdbc")\
>  .mode("overwrite")\
>  .option('driver', 'com.ibm.db2.jcc.DB2Driver')\
>  .option("url","jdbc:db2://")\
>  .option("user","xxx")\
>  .option("password","")\
>  .option("dbtable","")\
>  .option("truncate",True)\{code}
>  
>  gives the error below;
> in summary, I believe the semicolon is misplaced or malformed
>  
> {code:java}
> EXPO.EXPO#CMR_STG;IMMEDIATE{code}
>  
>  
> full error
> {code:java}
> An error occurred while calling o47.save. : 
> com.ibm.db2.jcc.am.SqlSyntaxErrorException: DB2 SQL Error: SQLCODE=-104, 
> SQLSTATE=42601, SQLERRMC=END-OF-STATEMENT;LE EXPO.EXPO#CMR_STG;IMMEDIATE, 
> DRIVER=4.19.77 at com.ibm.db2.jcc.am.b4.a(b4.java:747) at 
> com.ibm.db2.jcc.am.b4.a(b4.java:66) at com.ibm.db2.jcc.am.b4.a(b4.java:135) 
> at com.ibm.db2.jcc.am.kh.c(kh.java:2788) at 
> com.ibm.db2.jcc.am.kh.d(kh.java:2776) at 
> com.ibm.db2.jcc.am.kh.b(kh.java:2143) at com.ibm.db2.jcc.t4.ab.i(ab.java:226) 
> at com.ibm.db2.jcc.t4.ab.c(ab.java:48) at com.ibm.db2.jcc.t4.p.b(p.java:38) 
> at com.ibm.db2.jcc.t4.av.h(av.java:124) at 
> com.ibm.db2.jcc.am.kh.ak(kh.java:2138) at 
> com.ibm.db2.jcc.am.kh.a(kh.java:3325) at com.ibm.db2.jcc.am.kh.c(kh.java:765) 
> at com.ibm.db2.jcc.am.kh.executeUpdate(kh.java:744) at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.truncateTable(JdbcUtils.scala:113)
>  at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcRelationProvider.createRelation(JdbcRelationProvider.scala:56)
>  at 
> org.apache.spark.sql.execution.datasources.SaveIntoDataSourceCommand.run(SaveIntoDataSourceCommand.scala:45)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:70)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:68)
>  at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.doExecute(commands.scala:86)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:80)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:80) 
> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:676)
>  at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>  at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:676) at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:285)
>  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:271) at 
> sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  at java.lang.reflect.Method.invoke(Method.java:498) at 
> py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244) at 
> py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:357) at 
> py4j.Gateway.invoke(Gateway.java:282) at 
> py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132) at 
> py4j.commands.CallCommand.execute(CallCommand.java:79) at 
> py4j.GatewayConnection.run(GatewayConnection.java:238) at 
> 

[jira] [Commented] (SPARK-37963) Need to update Partition URI after renaming table in InMemoryCatalog

2022-01-20 Thread Huaxin Gao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37963?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17479813#comment-17479813
 ] 

Huaxin Gao commented on SPARK-37963:


Changed the fix version to 3.2.2 for now. Will change back if RC2 fails.

> Need to update Partition URI after renaming table in InMemoryCatalog
> 
>
> Key: SPARK-37963
> URL: https://issues.apache.org/jira/browse/SPARK-37963
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
>
> After renaming a partitioned table, selecting from the new table in 
> InMemoryCatalog returns an empty result.
> The following checkAnswer will fail as the result is empty.
> {code:java}
> sql(s"create table foo(i int, j int) using PARQUET partitioned by (j)")
> sql("insert into table foo partition(j=2) values (1)")
> sql(s"alter table foo rename to bar")
> checkAnswer(spark.table("bar"), Row(1, 2)) {code}
> To fix the bug, we need to update Partition URI after renaming a table in 
> InMemoryCatalog
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37963) Need to update Partition URI after renaming table in InMemoryCatalog

2022-01-20 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-37963:
---
Fix Version/s: 3.2.2
   (was: 3.2.1)

> Need to update Partition URI after renaming table in InMemoryCatalog
> 
>
> Key: SPARK-37963
> URL: https://issues.apache.org/jira/browse/SPARK-37963
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.0, 3.2.2
>
>
> After renaming a partitioned table, selecting from the new table in 
> InMemoryCatalog returns an empty result.
> The following checkAnswer will fail as the result is empty.
> {code:java}
> sql(s"create table foo(i int, j int) using PARQUET partitioned by (j)")
> sql("insert into table foo partition(j=2) values (1)")
> sql(s"alter table foo rename to bar")
> checkAnswer(spark.table("bar"), Row(1, 2)) {code}
> To fix the bug, we need to update Partition URI after renaming a table in 
> InMemoryCatalog
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37959) Fix the UT of checking norm in KMeans & BiKMeans

2022-01-19 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-37959.

Fix Version/s: 3.2.1
   3.3.0
 Assignee: zhengruifeng  (was: Apache Spark)
   Resolution: Fixed

> Fix the UT of checking norm in KMeans & BiKMeans
> 
>
> Key: SPARK-37959
> URL: https://issues.apache.org/jira/browse/SPARK-37959
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.3.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
> Fix For: 3.2.1, 3.3.0
>
>
> In KMeansSuite and BisectingKMeansSuite, there are some unused lines:
>  
> {code:java}
> model1.clusterCenters.forall(Vectors.norm(_, 2) == 1.0) {code}
>  
> For cosine distance, the norm of each cluster center should be 1, so the norm 
> check is meaningful;
> for Euclidean distance, the norm check is meaningless.
>  
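> A minimal sketch of how the unused expression could become a real assertion 
> for the cosine-distance case (the tolerance is illustrative):
> {code:java}
> assert(model1.clusterCenters.forall(v => math.abs(Vectors.norm(v, 2) - 1.0) < 1e-6))
> {code}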



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37923) Generate partition transforms for BucketSpec inside parser

2022-01-16 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-37923:
--

 Summary: Generate partition transforms for BucketSpec inside parser
 Key: SPARK-37923
 URL: https://issues.apache.org/jira/browse/SPARK-37923
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3
Reporter: Huaxin Gao


We currently generate partition transforms for BucketSpec in Analyzer. It's 
cleaner to do this inside Parser.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37818) Add option for show create table command

2022-01-12 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-37818:
---
Fix Version/s: 3.2.1

> Add option for show create table command
> 
>
> Key: SPARK-37818
> URL: https://issues.apache.org/jira/browse/SPARK-37818
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: PengLei
>Assignee: PengLei
>Priority: Trivial
> Fix For: 3.2.1, 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36717) Wrong order of variable initialization may lead to incorrect behavior

2022-01-10 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-36717:
---
Fix Version/s: (was: 3.2.0)

> Wrong order of variable initialization may lead to incorrect behavior
> -
>
> Key: SPARK-36717
> URL: https://issues.apache.org/jira/browse/SPARK-36717
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Jianmeng Li
>Assignee: Jianmeng Li
>Priority: Minor
> Fix For: 3.1.3, 3.0.4, 3.2.1, 3.3.0
>
>
> Incorrect order of variable initialization may lead to incorrect behavior. 
> Related code: 
> [TorrentBroadcast.scala|https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala#L94]
> TorrentBroadcast gets the wrong checksumEnabled value after initialization, 
> which may not be what we need; we can move L94 in front of 
> setConf(SparkEnv.get.conf) to avoid this.
> Supplement:
> Snippet 1:
> {code:java}
> class Broadcast {
>   def setConf(): Unit = {
> checksumEnabled = true
>   }
>   setConf()
>   var checksumEnabled = false
> }
> println(new Broadcast().checksumEnabled){code}
> output:
> {code:java}
> false{code}
> Snippet 2:
> {code:java}
> class Broadcast {
>   var checksumEnabled = false
>   def setConf(): Unit = {
> checksumEnabled = true
>   }
>   setConf()
> }
> println(new Broadcast().checksumEnabled){code}
> output: 
> {code:java}
> true{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36717) Wrong order of variable initialization may lead to incorrect behavior

2022-01-10 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-36717:
---
Fix Version/s: 3.2.1

> Wrong order of variable initialization may lead to incorrect behavior
> -
>
> Key: SPARK-36717
> URL: https://issues.apache.org/jira/browse/SPARK-36717
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Jianmeng Li
>Assignee: Jianmeng Li
>Priority: Minor
> Fix For: 3.2.0, 3.1.3, 3.0.4, 3.2.1, 3.3.0
>
>
> Incorrect order of variable initialization may lead to incorrect behavior. 
> Related code: 
> [TorrentBroadcast.scala|https://github.com/apache/spark/blob/0494dc90af48ce7da0625485a4dc6917a244d580/core/src/main/scala/org/apache/spark/broadcast/TorrentBroadcast.scala#L94]
> TorrentBroadcast gets the wrong checksumEnabled value after initialization, 
> which may not be what we need; we can move L94 in front of 
> setConf(SparkEnv.get.conf) to avoid this.
> Supplement:
> Snippet 1:
> {code:java}
> class Broadcast {
>   def setConf(): Unit = {
> checksumEnabled = true
>   }
>   setConf()
>   var checksumEnabled = false
> }
> println(new Broadcast().checksumEnabled){code}
> output:
> {code:java}
> false{code}
> Snippet 2:
> {code:java}
> class Broadcast {
>   var checksumEnabled = false
>   def setConf(): Unit = {
> checksumEnabled = true
>   }
>   setConf()
> }
> println(new Broadcast().checksumEnabled){code}
> output: 
> {code:java}
> true{code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36979) Add RewriteLateralSubquery rule into nonExcludableRules

2022-01-10 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-36979:
---
Fix Version/s: 3.2.1
   (was: 3.2.0)

> Add RewriteLateralSubquery rule into nonExcludableRules
> ---
>
> Key: SPARK-36979
> URL: https://issues.apache.org/jira/browse/SPARK-36979
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Assignee: XiDuo You
>Priority: Minor
> Fix For: 3.2.1
>
>
> Lateral Join has no meaning without rule `RewriteLateralSubquery`. So now if 
> we set 
> `spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.RewriteLateralSubquery`,
>  the lateral join query will fail with:
> {code:java}
> java.lang.AssertionError: assertion failed: No plan for LateralJoin 
> lateral-subquery#218
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33277) Python/Pandas UDF right after off-heap vectorized reader could cause executor crash.

2022-01-10 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-33277:
---
Fix Version/s: 3.2.1

> Python/Pandas UDF right after off-heap vectorized reader could cause executor 
> crash.
> 
>
> Key: SPARK-33277
> URL: https://issues.apache.org/jira/browse/SPARK-33277
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 2.4.7, 3.0.1
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
> Fix For: 2.4.8, 3.0.2, 3.1.0, 3.2.1
>
>
> Python/Pandas UDF right after off-heap vectorized reader could cause executor 
> crash.
> E.g.,:
> {code:java}
> from pyspark.sql.functions import udf
> from pyspark.sql.types import LongType
>
> spark.range(0, 10, 1, 1).write.parquet(path)
> spark.conf.set("spark.sql.columnVector.offheap.enabled", True)
>
> def f(x):
>     return 0
>
> fUdf = udf(f, LongType())
> spark.read.parquet(path).select(fUdf('id')).head()
> {code}
> This is because the Python evaluation consumes the parent iterator in a 
> separate thread, and it consumes more data from the parent even after the task 
> ends and the parent is closed. If an off-heap column vector exists in the 
> parent iterator, it can cause a segmentation fault, which crashes the executor.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36464) Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream for Writing Over 2GB Data

2022-01-10 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-36464:
---
Fix Version/s: 3.2.0

> Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream 
> for Writing Over 2GB Data
> --
>
> Key: SPARK-36464
> URL: https://issues.apache.org/jira/browse/SPARK-36464
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.2, 3.3.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Major
> Fix For: 3.2.0, 3.1.3, 3.0.4, 3.2.1
>
>
> The `size` method of `ChunkedByteBufferOutputStream` returns a `Long` value; 
> however, the underlying `_size` variable is initialized as an `Int`.
> That causes an overflow and returns a negative size when more than 2GB of 
> data is written into `ChunkedByteBufferOutputStream`.
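>  
> A minimal sketch of the bug pattern (names simplified for illustration; not 
> the actual implementation):
> {code:java}
> class ChunkedOutputStream {
>   private var _size = 0                  // Int accumulator: wraps past 2GB
>   def write(bytes: Array[Byte]): Unit = { _size += bytes.length }
>   def size: Long = _size                 // widening only on read is too late
> }
> {code}
> The fix is to make the accumulator a Long: `private var _size = 0L`.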



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36464) Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream for Writing Over 2GB Data

2022-01-10 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-36464:
---
Fix Version/s: 3.2.1
   (was: 3.2.0)

> Fix Underlying Size Variable Initialization in ChunkedByteBufferOutputStream 
> for Writing Over 2GB Data
> --
>
> Key: SPARK-36464
> URL: https://issues.apache.org/jira/browse/SPARK-36464
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.8, 3.0.3, 3.1.2, 3.3.0
>Reporter: Kazuyuki Tanimura
>Assignee: Kazuyuki Tanimura
>Priority: Major
> Fix For: 3.1.3, 3.0.4, 3.2.1
>
>
> The `size` method of `ChunkedByteBufferOutputStream` returns a `Long` value; 
> however, the underlying `_size` variable is initialized as an `Int`.
> That causes an overflow and returns a negative size when more than 2GB of 
> data is written into `ChunkedByteBufferOutputStream`.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30789) Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE

2022-01-10 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-30789:
---
Fix Version/s: (was: 3.2.0)

> Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE
> --
>
> Key: SPARK-30789
> URL: https://issues.apache.org/jira/browse/SPARK-30789
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.2.1
>
>
> All of LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE should support IGNORE NULLS 
> | RESPECT NULLS. For example:
> {code:java}
> LEAD (value_expr [, offset ])
> [ IGNORE NULLS | RESPECT NULLS ]
> OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code}
>  
> {code:java}
> LAG (value_expr [, offset ])
> [ IGNORE NULLS | RESPECT NULLS ]
> OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code}
>  
> {code:java}
> NTH_VALUE (expr, offset)
> [ IGNORE NULLS | RESPECT NULLS ]
> OVER
> ( [ PARTITION BY window_partition ]
> [ ORDER BY window_ordering 
>  frame_clause ] ){code}
>  
> *Oracle:*
> [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/NTH_VALUE.html#GUID-F8A0E88C-67E5-4AA6-9515-95D03A7F9EA0]
> *Redshift*
> [https://docs.aws.amazon.com/redshift/latest/dg/r_WF_NTH.html]
> *Presto*
> [https://prestodb.io/docs/current/functions/window.html]
> *DB2*
> [https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1513.htm]
> *Teradata*
> [https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/GjCT6l7trjkIEjt~7Dhx4w]
> *Snowflake*
> [https://docs.snowflake.com/en/sql-reference/functions/lead.html]
> [https://docs.snowflake.com/en/sql-reference/functions/lag.html]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-34399) Add file commit time to metrics and shown in SQL Tab UI

2022-01-10 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-34399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-34399:
---
Fix Version/s: 3.2.1
   (was: 3.2.0)

> Add file commit time to metrics and shown in SQL Tab UI
> ---
>
> Key: SPARK-34399
> URL: https://issues.apache.org/jira/browse/SPARK-34399
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: angerszhu
>Assignee: angerszhu
>Priority: Major
> Fix For: 3.2.1
>
>
> Add file commit time to metrics and shown in SQL Tab UI



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-35714) Bug fix for deadlock during the executor shutdown

2022-01-10 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-35714:
---
Fix Version/s: 3.2.1

> Bug fix for deadlock during the executor shutdown
> -
>
> Key: SPARK-35714
> URL: https://issues.apache.org/jira/browse/SPARK-35714
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: Wan Kun
>Assignee: Wan Kun
>Priority: Minor
> Fix For: 3.0.3, 3.2.0, 3.1.3, 3.2.1
>
> Attachments: three_thread_lock.log
>
>
> When an executor receives a TERM signal, the second TERM signal locks the 
> java.lang.Shutdown class and then calls Shutdown.exit() to exit the JVM.
> Shutdown invokes SparkShutdownHook to shut down the executor.
> During the executor shutdown phase, a RemoteProcessDisconnected event is 
> sent to the RPC inbox, and WorkerWatcher then tries to call 
> System.exit(-1) again.
> Because java.lang.Shutdown is already locked, a deadlock occurs.
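> For context, calling System.exit() from inside a JVM shutdown hook is a
> classic self-deadlock. A standalone sketch (not Spark code) reproducing the
> pattern:
> {code:java}
> // The first exit() holds java.lang.Shutdown's class lock while running
> // shutdown hooks; a hook that calls exit() again blocks forever.
> object ExitDeadlockDemo {
>   def main(args: Array[String]): Unit = {
>     Runtime.getRuntime.addShutdownHook(new Thread(() => {
>       System.exit(-1) // blocks: the Shutdown lock is held by the first exit
>     }))
>     System.exit(0) // runs the hooks while holding the lock -> deadlock
>   }
> }
> {code}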



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30789) Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE

2022-01-10 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30789?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-30789:
---
Fix Version/s: 3.2.1

> Support (IGNORE | RESPECT) NULLS for LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE
> --
>
> Key: SPARK-30789
> URL: https://issues.apache.org/jira/browse/SPARK-30789
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.2.0, 3.2.1
>
>
> All of LEAD/LAG/NTH_VALUE/FIRST_VALUE/LAST_VALUE should support IGNORE NULLS 
> | RESPECT NULLS. For example:
> {code:java}
> LEAD (value_expr [, offset ])
> [ IGNORE NULLS | RESPECT NULLS ]
> OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code}
>  
> {code:java}
> LAG (value_expr [, offset ])
> [ IGNORE NULLS | RESPECT NULLS ]
> OVER ( [ PARTITION BY window_partition ] ORDER BY window_ordering ){code}
>  
> {code:java}
> NTH_VALUE (expr, offset)
> [ IGNORE NULLS | RESPECT NULLS ]
> OVER
> ( [ PARTITION BY window_partition ]
> [ ORDER BY window_ordering 
>  frame_clause ] ){code}
>  
> *Oracle:*
> [https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/NTH_VALUE.html#GUID-F8A0E88C-67E5-4AA6-9515-95D03A7F9EA0]
> *Redshift*
> [https://docs.aws.amazon.com/redshift/latest/dg/r_WF_NTH.html]
> *Presto*
> [https://prestodb.io/docs/current/functions/window.html]
> *DB2*
> [https://www.ibm.com/support/knowledgecenter/SSGU8G_14.1.0/com.ibm.sqls.doc/ids_sqs_1513.htm]
> *Teradata*
> [https://docs.teradata.com/r/756LNiPSFdY~4JcCCcR5Cw/GjCT6l7trjkIEjt~7Dhx4w]
> *Snowflake*
> [https://docs.snowflake.com/en/sql-reference/functions/lead.html]
> [https://docs.snowflake.com/en/sql-reference/functions/lag.html]
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37818) Add option for show create table command

2022-01-10 Thread Huaxin Gao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472183#comment-17472183
 ] 

Huaxin Gao commented on SPARK-37818:


[~Gengliang.Wang] version 3.2.2 doesn't exist yet. I will just set the version 
to 3.3.0 for now and update it to 3.2.2 later.

> Add option for show create table command
> 
>
> Key: SPARK-37818
> URL: https://issues.apache.org/jira/browse/SPARK-37818
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: PengLei
>Assignee: PengLei
>Priority: Trivial
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37818) Add option for show create table command

2022-01-10 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-37818:
---
Fix Version/s: (was: 3.2.1)

> Add option for show create table command
> 
>
> Key: SPARK-37818
> URL: https://issues.apache.org/jira/browse/SPARK-37818
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: PengLei
>Assignee: PengLei
>Priority: Trivial
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37818) Add option for show create table command

2022-01-10 Thread Huaxin Gao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37818?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17472181#comment-17472181
 ] 

Huaxin Gao commented on SPARK-37818:


[~Gengliang.Wang] I am drafting the 3.2.1 voting email now. I need to change 
the fix version to 3.2.2; otherwise the list of bug fixes will include this 
one. I will change it back to 3.2.1 if RC1 doesn't pass.

> Add option for show create table command
> 
>
> Key: SPARK-37818
> URL: https://issues.apache.org/jira/browse/SPARK-37818
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: PengLei
>Assignee: PengLei
>Priority: Trivial
> Fix For: 3.2.1, 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37802) composite field name like `field name` doesn't work with Aggregate push down

2022-01-10 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-37802:
---
Fix Version/s: 3.2.1
   (was: 3.2.0)

> composite field name like `field name` doesn't work with Aggregate push down
> 
>
> Key: SPARK-37802
> URL: https://issues.apache.org/jira/browse/SPARK-37802
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Minor
> Fix For: 3.2.1, 3.3.0
>
>
> {code:java}
> sql("SELECT SUM(`field name`) FROM h2.test.table")
> org.apache.spark.sql.catalyst.parser.ParseException: 
> extraneous input 'name' expecting <EOF>(line 1, pos 9)
>   at 
> org.apache.spark.sql.catalyst.parser.ParseErrorListener$.syntaxError(ParseDriver.scala:212)
>   at 
> org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)
>   at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)
>   at 
> org.antlr.v4.runtime.DefaultErrorStrategy.reportUnwantedToken(DefaultErrorStrategy.java:377)
>   at 
> org.antlr.v4.runtime.DefaultErrorStrategy.singleTokenDeletion(DefaultErrorStrategy.java:548)
>   at 
> org.antlr.v4.runtime.DefaultErrorStrategy.recoverInline(DefaultErrorStrategy.java:467)
>   at org.antlr.v4.runtime.Parser.match(Parser.java:206)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser.singleMultipartIdentifier(SqlBaseParser.java:519)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37802) composite field name like `field name` doesn't work with Aggregate push down

2022-01-09 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-37802:
---
Fix Version/s: 3.2.0

> composite field name like `field name` doesn't work with Aggregate push down
> 
>
> Key: SPARK-37802
> URL: https://issues.apache.org/jira/browse/SPARK-37802
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Minor
> Fix For: 3.2.0, 3.3.0
>
>
> {code:java}
> sql("SELECT SUM(`field name`) FROM h2.test.table")
> org.apache.spark.sql.catalyst.parser.ParseException: 
> extraneous input 'name' expecting <EOF>(line 1, pos 9)
>   at 
> org.apache.spark.sql.catalyst.parser.ParseErrorListener$.syntaxError(ParseDriver.scala:212)
>   at 
> org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)
>   at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)
>   at 
> org.antlr.v4.runtime.DefaultErrorStrategy.reportUnwantedToken(DefaultErrorStrategy.java:377)
>   at 
> org.antlr.v4.runtime.DefaultErrorStrategy.singleTokenDeletion(DefaultErrorStrategy.java:548)
>   at 
> org.antlr.v4.runtime.DefaultErrorStrategy.recoverInline(DefaultErrorStrategy.java:467)
>   at org.antlr.v4.runtime.Parser.match(Parser.java:206)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser.singleMultipartIdentifier(SqlBaseParser.java:519)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37527) Translate more standard aggregate functions for pushdown

2022-01-06 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-37527.

Fix Version/s: 3.3.0
 Assignee: jiaan.geng
   Resolution: Fixed

> Translate more standard aggregate functions for pushdown
> 
>
> Key: SPARK-37527
> URL: https://issues.apache.org/jira/browse/SPARK-37527
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently, Spark aggregate pushdown translates some standard aggregate 
> functions so that they can be compiled into the SQL dialect of a specific 
> database.
> After this work, users can override JdbcDialect.compileAggregate to 
> implement aggregate functions supported by a particular database.
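> A minimal sketch of such an override (the dialect object, URL prefix, and
> the VARIANCE spelling are hypothetical):
> {code:java}
> import org.apache.spark.sql.connector.expressions.aggregate.{AggregateFunc, GeneralAggregateFunc}
> import org.apache.spark.sql.jdbc.JdbcDialect
>
> // Rewrite a pushed-down VAR_SAMP into the VARIANCE spelling this database
> // understands; defer everything else to Spark's default translation.
> object MyDbDialect extends JdbcDialect {
>   override def canHandle(url: String): Boolean = url.startsWith("jdbc:mydb")
>
>   override def compileAggregate(aggFunction: AggregateFunc): Option[String] =
>     aggFunction match {
>       case f: GeneralAggregateFunc if f.name == "VAR_SAMP" =>
>         Some(s"VARIANCE(${f.children.map(_.describe).mkString(", ")})")
>       case _ => super.compileAggregate(aggFunction)
>     }
> }
> {code}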



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37802) composite field name like `field name` doesn't work with Aggregate push down

2022-01-03 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao reassigned SPARK-37802:
--

Assignee: Huaxin Gao

> composite field name like `field name` doesn't work with Aggregate push down
> 
>
> Key: SPARK-37802
> URL: https://issues.apache.org/jira/browse/SPARK-37802
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Minor
>
> {code:java}
> sql("SELECT SUM(`field name`) FROM h2.test.table")
> org.apache.spark.sql.catalyst.parser.ParseException: 
> extraneous input 'name' expecting <EOF>(line 1, pos 9)
>   at 
> org.apache.spark.sql.catalyst.parser.ParseErrorListener$.syntaxError(ParseDriver.scala:212)
>   at 
> org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)
>   at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)
>   at 
> org.antlr.v4.runtime.DefaultErrorStrategy.reportUnwantedToken(DefaultErrorStrategy.java:377)
>   at 
> org.antlr.v4.runtime.DefaultErrorStrategy.singleTokenDeletion(DefaultErrorStrategy.java:548)
>   at 
> org.antlr.v4.runtime.DefaultErrorStrategy.recoverInline(DefaultErrorStrategy.java:467)
>   at org.antlr.v4.runtime.Parser.match(Parser.java:206)
>   at 
> org.apache.spark.sql.catalyst.parser.SqlBaseParser.singleMultipartIdentifier(SqlBaseParser.java:519)
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37802) composite field name like `field name` doesn't work with Aggregate push down

2022-01-02 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-37802:
--

 Summary: composite field name like `field name` doesn't work with 
Aggregate push down
 Key: SPARK-37802
 URL: https://issues.apache.org/jira/browse/SPARK-37802
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0, 3.3.0
Reporter: Huaxin Gao



{code:java}
sql("SELECT SUM(`field name`) FROM h2.test.table")

org.apache.spark.sql.catalyst.parser.ParseException: 
extraneous input 'name' expecting <EOF>(line 1, pos 9)

at 
org.apache.spark.sql.catalyst.parser.ParseErrorListener$.syntaxError(ParseDriver.scala:212)
at 
org.antlr.v4.runtime.ProxyErrorListener.syntaxError(ProxyErrorListener.java:41)
at org.antlr.v4.runtime.Parser.notifyErrorListeners(Parser.java:544)
at 
org.antlr.v4.runtime.DefaultErrorStrategy.reportUnwantedToken(DefaultErrorStrategy.java:377)
at 
org.antlr.v4.runtime.DefaultErrorStrategy.singleTokenDeletion(DefaultErrorStrategy.java:548)
at 
org.antlr.v4.runtime.DefaultErrorStrategy.recoverInline(DefaultErrorStrategy.java:467)
at org.antlr.v4.runtime.Parser.match(Parser.java:206)
at 
org.apache.spark.sql.catalyst.parser.SqlBaseParser.singleMultipartIdentifier(SqlBaseParser.java:519)
{code}
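
The underlying issue is that the pushdown path re-parses a rendered column 
name, and an unquoted space breaks the parser. A sketch of the backtick 
quoting that lets such names round-trip (quoteIfNeeded here is illustrative, 
not necessarily the helper the fix uses):

{code:java}
// Backtick-quote an identifier part unless it is a plain word, escaping
// embedded backticks, so "field name" survives SQL generation intact.
def quoteIfNeeded(part: String): String =
  if (part.matches("[a-zA-Z0-9_]+")) part
  else s"`${part.replace("`", "``")}`"

quoteIfNeeded("field name") // returns `field name` (with backticks)
{code}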




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37627) Add sorted column in BucketTransform

2021-12-12 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-37627:
--

 Summary: Add sorted column in BucketTransform
 Key: SPARK-37627
 URL: https://issues.apache.org/jira/browse/SPARK-37627
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Huaxin Gao


In V1, we can create a table with sorted buckets like the following:

{code:java}
  sql("CREATE TABLE tbl(a INT, b INT) USING parquet " +
"CLUSTERED BY (a) SORTED BY (b) INTO 5 BUCKETS")
{code}

However, creating a table with sorted buckets in V2 fails with an exception:

{code:java}
org.apache.spark.sql.AnalysisException: Cannot convert bucketing with sort 
columns to a transform.
{code}

We should be able to create tables with sorted buckets in V2.
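
For comparison, bucketing without sort columns already works through the V2 
writer; the sorted variant is what this ticket adds (df and the catalog name 
below are assumptions for illustration):

{code:java}
import org.apache.spark.sql.functions.bucket

// Works today: a bucket transform via DataFrameWriterV2.
df.writeTo("testcat.ns.tbl")
  .partitionedBy(bucket(5, df("a")))
  .create()

// Hypothetical once this ticket lands (sortedBy is a made-up name):
// df.writeTo("testcat.ns.tbl").partitionedBy(bucket(5, df("a"))).sortedBy(df("b")).create()
{code}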




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37545) V2 CreateTableAsSelect command should qualify location

2021-12-04 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37545?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-37545.

Fix Version/s: 3.3.0
 Assignee: Terry Kim
   Resolution: Fixed

> V2 CreateTableAsSelect command should qualify location
> --
>
> Key: SPARK-37545
> URL: https://issues.apache.org/jira/browse/SPARK-37545
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.3.0
>
>
> V2 CreateTableAsSelect command should qualify location. Currently, 
>  
> {code:java}
> spark.sql("CREATE TABLE testcat.t USING foo LOCATION '/tmp/foo' AS SELECT id 
> FROM source")
> spark.sql("DESCRIBE EXTENDED testcat.t").show(false)
> {code}
> displays the location as `/tmp/foo`, whereas the V1 command displays/stores 
> it qualified (`file:/tmp/foo`).
>  
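> Qualifying the location generally means resolving it against the Hadoop
> filesystem, as the V1 path does; a minimal sketch of that step (method
> placement is illustrative):
> {code:java}
> import java.net.URI
> import org.apache.hadoop.conf.Configuration
> import org.apache.hadoop.fs.Path
>
> // Resolve a raw location string to a fully qualified URI, e.g.
> // "/tmp/foo" -> "file:/tmp/foo" on a local filesystem.
> def qualify(location: String, hadoopConf: Configuration): URI = {
>   val path = new Path(location)
>   path.getFileSystem(hadoopConf).makeQualified(path).toUri
> }
> {code}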



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37546) V2 ReplaceTableAsSelect command should qualify location

2021-12-04 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-37546:
--

 Summary: V2 ReplaceTableAsSelect command should qualify location
 Key: SPARK-37546
 URL: https://issues.apache.org/jira/browse/SPARK-37546
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.3.0
Reporter: Huaxin Gao


V2 ReplaceTableAsSelect command should qualify location. Currently, 


{code:java}
spark.sql("REPLACE TABLE testcat.t USING foo LOCATION '/tmp/foo' AS SELECT id 
FROM source")
spark.sql("DESCRIBE EXTENDED testcat.t").show(false)
{code}

displays the location as `/tmp/foo`, whereas the V1 command displays/stores it 
qualified (`file:/tmp/foo`).




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37330) Migrate ReplaceTableStatement to v2 command

2021-12-03 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-37330.

Fix Version/s: 3.3.0
 Assignee: dch nguyen
   Resolution: Fixed

> Migrate ReplaceTableStatement to v2 command
> ---
>
> Key: SPARK-37330
> URL: https://issues.apache.org/jira/browse/SPARK-37330
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37523) Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

2021-12-03 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao updated SPARK-37523:
---
Affects Version/s: 3.2.1

> Support optimize skewed partitions in Distribution and Ordering if 
> numPartitions is not specified
> -
>
> Key: SPARK-37523
> URL: https://issues.apache.org/jira/browse/SPARK-37523
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.1, 3.3.0
>Reporter: Huaxin Gao
>Priority: Major
>
> When repartitioning for distribution and ordering, if the data source 
> requests a specific number of partitions, Spark should not optimize the 
> repartition. However, if the data source does not request a specific number 
> of partitions, Spark should optimize the repartition and split skewed 
> partitions if necessary.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37523) Support optimize skewed partitions in Distribution and Ordering if numPartitions is not specified

2021-12-02 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-37523:
--

 Summary: Support optimize skewed partitions in Distribution and 
Ordering if numPartitions is not specified
 Key: SPARK-37523
 URL: https://issues.apache.org/jira/browse/SPARK-37523
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: Huaxin Gao


When repartitioning for distribution and ordering, if the data source requests 
a specific number of partitions, Spark should not optimize the repartition. 
However, if the data source does not request a specific number of partitions, 
Spark should optimize the repartition and split skewed partitions if necessary.
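
On the DSv2 side, "does not request a specific number of partitions" 
corresponds to RequiresDistributionAndOrdering.requiredNumPartitions() 
returning 0. A minimal sketch of a write that leaves the partition count open 
(the clustering key is hypothetical):

{code:java}
import org.apache.spark.sql.connector.distributions.{Distribution, Distributions}
import org.apache.spark.sql.connector.expressions.{Expression, Expressions, SortOrder}
import org.apache.spark.sql.connector.write.{RequiresDistributionAndOrdering, Write}

// Requests clustering by "key" but no fixed partition count, leaving Spark
// free to split skewed partitions as this ticket proposes.
class MyWrite extends Write with RequiresDistributionAndOrdering {
  override def requiredDistribution(): Distribution =
    Distributions.clustered(Array[Expression](Expressions.identity("key")))

  override def requiredOrdering(): Array[SortOrder] = Array.empty[SortOrder]

  override def requiredNumPartitions(): Int = 0 // 0 = no specific request
}
{code}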



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37496) Migrate ReplaceTableAsSelectStatement to v2 command

2021-12-01 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao reassigned SPARK-37496:
--

Assignee: Huaxin Gao

> Migrate ReplaceTableAsSelectStatement to v2 command
> ---
>
> Key: SPARK-37496
> URL: https://issues.apache.org/jira/browse/SPARK-37496
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37496) Migrate ReplaceTableAsSelectStatement to v2 command

2021-12-01 Thread Huaxin Gao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Huaxin Gao resolved SPARK-37496.

Resolution: Fixed

> Migrate ReplaceTableAsSelectStatement to v2 command
> ---
>
> Key: SPARK-37496
> URL: https://issues.apache.org/jira/browse/SPARK-37496
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


