[jira] [Comment Edited] (SPARK-33638) Full support of V2 table creation in Structured Streaming writer path

2020-12-03 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243803#comment-17243803
 ] 

Jungtaek Lim edited comment on SPARK-33638 at 12/4/20, 7:56 AM:


I don't agree with handling this in DataStreamWriter, hence I changed the 
title. My claim is that we should design DataStreamWriterV2, nothing else.

I also don't agree that we need to deal with partition column verification in 
such a way. DataFrameWriterV2 does this nicely by branching the path between 
appending/overwriting/truncating a table vs. creating/replacing a table, and 
enforcing the latter whenever the configuration for creating a table is 
provided. I think this is much clearer for end users than making them worry 
about the impact.

For sure, even if we address it with DataStreamWriterV2, we still need to deal 
with the consistency in DataStreamWriter.toTable(). Given that DataStreamWriterV2 
would take its place and be recommended for table writes, that would be less 
important.


was (Author: kabhwan):
I don't agree with handling this in DataStreamWriter, hence I changed the 
title. My claim is that we should design DataStreamWriterV2, nothing else.

I also don't agree that we need to deal with partition column verification in 
such a way. DataFrameWriterV2 does this nicely by branching the path between 
appending/overwriting/truncating a table vs. creating/replacing a table, and 
enforcing the latter whenever the configuration for creating a table is 
provided. I think this is much clearer for end users than making them worry 
about the impact.

For sure, even if we address it with DataStreamWriterV2, we still need to deal 
with the consistency in DataStreamWriter.toTable(). Given that DataStreamWriterV2 
would take its place and be recommended for table writes, that would be less 
important.

> Full support of V2 table creation in Structured Streaming writer path
> -
>
> Key: SPARK-33638
> URL: https://issues.apache.org/jira/browse/SPARK-33638
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Yuanjian Li
>Priority: Blocker
>
> Currently, we want to add support for "create if not exists" in the 
> DataStreamWriter.toTable API. Since the file format in streaming doesn't 
> support DSv2 for now, the current implementation mainly focuses on V1 
> support. More work is needed for full support of V2 table creation.
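A minimal usage sketch of the API under discussion (Spark 3.1), assuming an
active SparkSession, an existing streaming DataFrame named inputDf, and an
illustrative checkpoint path; with the behaviour described above, a missing
table goes through the V1 creation path, and full V2 table creation is what
this ticket tracks.

{code:scala}
// Hedged sketch: write a streaming DataFrame to a table by name.
// `inputDf` and the checkpoint path are assumptions for illustration.
val query = inputDf.writeStream
  .option("checkpointLocation", "/tmp/ckpt-spark-33638")  // illustrative path
  .toTable("default.events_sink")  // created if it does not exist (V1 path today)

query.awaitTermination()
{code}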



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33638) Full support of V2 table creation in Structured Streaming writer path

2020-12-03 Thread Jungtaek Lim (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243803#comment-17243803
 ] 

Jungtaek Lim commented on SPARK-33638:
--

I don't agree with handling this in DataStreamWriter, hence I changed the 
title. My claim is that we should design DataStreamWriterV2, nothing else.

I also don't agree that we need to deal with partition column verification in 
such a way. DataFrameWriterV2 does this nicely by branching the path between 
appending/overwriting/truncating a table vs. creating/replacing a table, and 
enforcing the latter whenever the configuration for creating a table is 
provided. I think this is much clearer for end users than making them worry 
about the impact.

For sure, even if we address it with DataStreamWriterV2, we still need to deal 
with the consistency in DataStreamWriter.toTable(). Given that DataStreamWriterV2 
would take its place and be recommended for table writes, that would be less 
important.
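For context, a hedged batch-side illustration of the branching the comment
refers to: DataFrameWriterV2 (available since Spark 3.0) separates the verbs
that require an existing table from the verbs that create or replace one, so
provider and partitioning configuration only apply on the create/replace path.
The DataFrame `df` and the table name are assumptions for illustration.

{code:scala}
import org.apache.spark.sql.functions.col

// create/replace path: table metadata (provider, partitioning) comes from the writer
df.writeTo("catalog.db.events")
  .using("parquet")
  .partitionedBy(col("country"))
  .createOrReplace()

// append path: the table must already exist, and its own metadata applies
df.writeTo("catalog.db.events").append()
{code}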

> Full support of V2 table creation in Structured Streaming writer path
> -
>
> Key: SPARK-33638
> URL: https://issues.apache.org/jira/browse/SPARK-33638
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Yuanjian Li
>Priority: Blocker
>
> Currently, we want to add support for "create if not exists" in the 
> DataStreamWriter.toTable API. Since the file format in streaming doesn't 
> support DSv2 for now, the current implementation mainly focuses on V1 
> support. More work is needed for full support of V2 table creation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33638) Full support of V2 table creation in Structured Streaming writer path

2020-12-03 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim updated SPARK-33638:
-
Summary: Full support of V2 table creation in Structured Streaming writer 
path  (was: Full support of V2 table creation in DataStreamWriter.toTable API)

> Full support of V2 table creation in Structured Streaming writer path
> -
>
> Key: SPARK-33638
> URL: https://issues.apache.org/jira/browse/SPARK-33638
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Yuanjian Li
>Priority: Blocker
>
> Currently, we want to add support for "create if not exists" in the 
> DataStreamWriter.toTable API. Since the file format in streaming doesn't 
> support DSv2 for now, the current implementation mainly focuses on V1 
> support. More work is needed for full support of V2 table creation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33656) Add option to keep container after tests finish for DockerJDBCIntegrationSuites for debug

2020-12-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33656.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30601
[https://github.com/apache/spark/pull/30601]

> Add option to keep container after tests finish for 
> DockerJDBCIntegrationSuites for debug
> -
>
> Key: SPARK-33656
> URL: https://issues.apache.org/jira/browse/SPARK-33656
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Fix For: 3.1.0
>
>
> DockerJDBCIntegrationSuites (e.g. DB2IntegrationSuite, 
> PostgresIntegrationSuite) launch a Docker container that is removed after 
> the tests finish.
> If we had an option to keep the container, it would be useful for debugging.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33577) Add support for V1Table in stream writer table API

2020-12-03 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-33577.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30521
[https://github.com/apache/spark/pull/30521]

> Add support for V1Table in stream writer table API
> --
>
> Key: SPARK-33577
> URL: https://issues.apache.org/jira/browse/SPARK-33577
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Yuanjian Li
>Assignee: Yuanjian Li
>Priority: Major
> Fix For: 3.1.0
>
>
> After SPARK-32896, we have a table API for the stream writer, but it only 
> supports DataSource V2 tables. Here we add the following enhancements:
>  * Create non-existing tables by default
>  * Support both managed and external V1Tables



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33577) Add support for V1Table in stream writer table API

2020-12-03 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim reassigned SPARK-33577:


Assignee: Yuanjian Li

> Add support for V1Table in stream writer table API
> --
>
> Key: SPARK-33577
> URL: https://issues.apache.org/jira/browse/SPARK-33577
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.0.0
>Reporter: Yuanjian Li
>Assignee: Yuanjian Li
>Priority: Major
>
> After SPARK-32896, we have a table API for the stream writer, but it only 
> supports DataSource V2 tables. Here we add the following enhancements:
>  * Create non-existing tables by default
>  * Support both managed and external V1Tables



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33659) Document the current behavior for DataStreamWriter.toTable API

2020-12-03 Thread Yuanjian Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243795#comment-17243795
 ] 

Yuanjian Li commented on SPARK-33659:
-

I'm working on this.

> Document the current behavior for DataStreamWriter.toTable API
> --
>
> Key: SPARK-33659
> URL: https://issues.apache.org/jira/browse/SPARK-33659
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Yuanjian Li
>Priority: Blocker
>
> Follow-up work for SPARK-33577 that needs to be done before the 3.1 release. As 
> we don't have full support for V2 tables created through the API, the 
> following documentation work is needed:
>  * Figure out the effects when the configurations (provider/partitionBy) 
> conflict with an existing table, and document them in the javadoc of {{toTable}}. I 
> think you'll need to make a matrix and describe which takes effect (table vs 
> input) - creating a table vs the table already exists, DSv1 vs DSv2 (all 4 
> situations should be documented; a sketch of such a conflict follows after this list).
>  * Document the lack of functionality for creating a V2 table in the javadoc of 
> {{toTable}}, and guide users to ensure the table is created beforehand to 
> avoid an unintended/insufficient table being created.
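A hypothetical conflict scenario that such a matrix would need to cover (the
table name, the input DataFrame `inputDf`, and the checkpoint path are
illustrative only): the table already exists with its own provider and
partitioning, while the streaming writer supplies different ones, and the
javadoc should state which side takes effect.

{code:scala}
spark.sql("CREATE TABLE events (ts TIMESTAMP, country STRING) USING parquet PARTITIONED BY (country)")

val query = inputDf.writeStream
  .format("orc")                                    // conflicts with the table's provider (parquet)
  .partitionBy("ts")                                // conflicts with the table's partitioning (country)
  .option("checkpointLocation", "/tmp/ckpt-33659")  // illustrative path
  .toTable("events")                                // which configuration wins is what needs documenting
{code}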



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33659) Document the current behavior for DataStreamWriter.toTable API

2020-12-03 Thread Yuanjian Li (Jira)
Yuanjian Li created SPARK-33659:
---

 Summary: Document the current behavior for 
DataStreamWriter.toTable API
 Key: SPARK-33659
 URL: https://issues.apache.org/jira/browse/SPARK-33659
 Project: Spark
  Issue Type: Improvement
  Components: Structured Streaming
Affects Versions: 3.1.0
Reporter: Yuanjian Li


Follow-up work for SPARK-33577 that needs to be done before the 3.1 release. As 
we don't have full support for V2 tables created through the API, the following 
documentation work is needed:
 * Figure out the effects when the configurations (provider/partitionBy) 
conflict with an existing table, and document them in the javadoc of {{toTable}}. I 
think you'll need to make a matrix and describe which takes effect (table vs 
input) - creating a table vs the table already exists, DSv1 vs DSv2 (all 4 
situations should be documented).
 * Document the lack of functionality for creating a V2 table in the javadoc of 
{{toTable}}, and guide users to ensure the table is created beforehand to 
avoid an unintended/insufficient table being created.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-12-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33571.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/30596

> Handling of hybrid to proleptic calendar when reading and writing Parquet 
> data not working correctly
> 
>
> Key: SPARK-33571
> URL: https://issues.apache.org/jira/browse/SPARK-33571
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Simon
>Priority: Major
> Fix For: 3.1.0
>
>
> The handling of old dates written with older Spark versions (<2.4.6) using 
> the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working 
> correctly.
> From what I understand it should work like this:
>  * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 
> 1900-01-01T00:00:00Z
>  * Only applies when reading or writing parquet files
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead`
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should 
> show the same values in Spark 3.0.1. with for example `df.show()` as they did 
> in Spark 2.4.5
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `CORRECTED`, the dates and timestamps 
> should show different values in Spark 3.0.1 (with, for example, `df.show()`) 
> than they did in Spark 2.4.5
>  * When writing parquet files with Spark > 3.0.0 which contain dates or 
> timestamps before the above mentioned moments in time, a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite`
> First of all I'm not 100% sure all of this is correct. I've been unable to 
> find any clear documentation on the expected behavior. The understanding I 
> have was pieced together from the mailing list 
> ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html]), 
> the blog post linked there, and looking at the Spark code.
> From our testing we're seeing several issues:
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5, 
> that contains fields of type `TimestampType` which contain timestamps before 
> the above mentioned moments in time, without `datetimeRebaseModeInRead` set, 
> doesn't raise the `SparkUpgradeException`; it succeeds without any changes to 
> the resulting dataframe compared to that dataframe in Spark 2.4.5
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5, 
> that contains fields of type `TimestampType` or `DateType` which contain 
> dates or timestamps before the above mentioned moments in time, with 
> `datetimeRebaseModeInRead` set to `LEGACY`, results in the same values in the 
> dataframe as when using `CORRECTED`, so it seems like no rebasing is 
> happening.
> I've made some scripts to help with testing/show the behavior, it uses 
> pyspark 2.4.5, 2.4.6 and 3.0.1. You can find them here 
> [https://github.com/simonvanderveldt/spark3-rebasemode-issue]. I'll post the 
> outputs in a comment below as well.
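A minimal reading-side sketch of the expected behaviour described above,
assuming Spark 3.0.x, an active SparkSession named spark, the legacy rebase
config name below, and an illustrative path to a file written by Spark 2.4.5
that contains pre-1582-10-15 dates.

{code:scala}
val path = "/data/written-by-spark-2.4.5"  // illustrative path, not from the report

// default mode ("EXCEPTION"): reading such a file should raise SparkUpgradeException
// spark.read.parquet(path).show()

spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")
spark.read.parquet(path).show()   // expected: same values as Spark 2.4.5

spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "CORRECTED")
spark.read.parquet(path).show()   // expected: values differ from Spark 2.4.5
{code}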



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33658) Suggest using datetime conversion functions for invalid ANSI casting

2020-12-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33658.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30603
[https://github.com/apache/spark/pull/30603]

> Suggest using datetime conversion functions for invalid ANSI casting
> 
>
> Key: SPARK-33658
> URL: https://issues.apache.org/jira/browse/SPARK-33658
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> In ANSI mode, an explicit cast between datetime types and numeric types is not 
> allowed.
> Now that we have introduced the new functions 
> UNIX_SECONDS/UNIX_MILLIS/UNIX_MICROS/UNIX_DATE/DATE_FROM_UNIX_DATE, we can 
> show suggestions to users so that they can perform these type conversions 
> precisely and easily in ANSI mode.
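A hedged illustration of the suggestion, assuming Spark 3.1, an active
SparkSession named spark, and spark.sql.ansi.enabled=true: the forbidden cast
is replaced by one of the dedicated conversion functions named above.

{code:scala}
spark.conf.set("spark.sql.ansi.enabled", "true")

// Under ANSI mode this cast is expected to be rejected:
// spark.sql("SELECT CAST(TIMESTAMP'2020-12-04 00:00:00' AS BIGINT)")

// The dedicated functions express the conversion explicitly instead:
spark.sql("SELECT UNIX_SECONDS(TIMESTAMP'2020-12-04 00:00:00')").show()
spark.sql("SELECT DATE_FROM_UNIX_DATE(18600)").show()
{code}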



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33430) Support namespaces in JDBC v2 Table Catalog

2020-12-03 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33430.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30473
[https://github.com/apache/spark/pull/30473]

> Support namespaces in JDBC v2 Table Catalog
> ---
>
> Key: SPARK-33430
> URL: https://issues.apache.org/jira/browse/SPARK-33430
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.1.0
>
>
> When I extend JDBCTableCatalogSuite by 
> org.apache.spark.sql.execution.command.v2.ShowTablesSuite, for instance:
> {code:scala}
> import org.apache.spark.sql.execution.command.v2.ShowTablesSuite
> class JDBCTableCatalogSuite extends ShowTablesSuite {
>   override def version: String = "JDBC V2"
>   override def catalog: String = "h2"
> ...
> {code}
> some tests from JDBCTableCatalogSuite fail with:
> {code}
> [info] - SHOW TABLES JDBC V2: show an existing table *** FAILED *** (2 
> seconds, 502 milliseconds)
> [info]   org.apache.spark.sql.AnalysisException: Cannot use catalog h2: does 
> not support namespaces;
> [info]   at 
> org.apache.spark.sql.connector.catalog.CatalogV2Implicits$CatalogHelper.asNamespaceCatalog(CatalogV2Implicits.scala:83)
> [info]   at 
> org.apache.spark.sql.catalyst.analysis.ResolveCatalogs$$anonfun$apply$1.applyOrElse(ResolveCatalogs.scala:208)
> [info]   at 
> org.apache.spark.sql.catalyst.analysis.ResolveCatalogs$$anonfun$apply$1.applyOrElse(ResolveCatalogs.scala:34)
> {code}
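For reference, a sketch of how an "h2" JDBC V2 catalog is typically wired up in
these suites; the catalog class name and option keys reflect my reading of the
Spark 3.1 code base and the connection values are illustrative, so treat this
as an assumption rather than the suite's exact setup.

{code:scala}
spark.conf.set("spark.sql.catalog.h2",
  "org.apache.spark.sql.execution.datasources.v2.jdbc.JDBCTableCatalog")
spark.conf.set("spark.sql.catalog.h2.url", "jdbc:h2:mem:testdb0")
spark.conf.set("spark.sql.catalog.h2.driver", "org.h2.Driver")

// With namespace support in the catalog, a command like this should resolve:
spark.sql("SHOW TABLES IN h2.test").show()
{code}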



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33430) Support namespaces in JDBC v2 Table Catalog

2020-12-03 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33430:
---

Assignee: Huaxin Gao

> Support namespaces in JDBC v2 Table Catalog
> ---
>
> Key: SPARK-33430
> URL: https://issues.apache.org/jira/browse/SPARK-33430
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Huaxin Gao
>Priority: Major
>
> When I extend JDBCTableCatalogSuite by 
> org.apache.spark.sql.execution.command.v2.ShowTablesSuite, for instance:
> {code:scala}
> import org.apache.spark.sql.execution.command.v2.ShowTablesSuite
> class JDBCTableCatalogSuite extends ShowTablesSuite {
>   override def version: String = "JDBC V2"
>   override def catalog: String = "h2"
> ...
> {code}
> some tests from JDBCTableCatalogSuite fail with:
> {code}
> [info] - SHOW TABLES JDBC V2: show an existing table *** FAILED *** (2 
> seconds, 502 milliseconds)
> [info]   org.apache.spark.sql.AnalysisException: Cannot use catalog h2: does 
> not support namespaces;
> [info]   at 
> org.apache.spark.sql.connector.catalog.CatalogV2Implicits$CatalogHelper.asNamespaceCatalog(CatalogV2Implicits.scala:83)
> [info]   at 
> org.apache.spark.sql.catalyst.analysis.ResolveCatalogs$$anonfun$apply$1.applyOrElse(ResolveCatalogs.scala:208)
> [info]   at 
> org.apache.spark.sql.catalyst.analysis.ResolveCatalogs$$anonfun$apply$1.applyOrElse(ResolveCatalogs.scala:34)
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33638) Full support of V2 table creation in DataStreamWriter.toTable API

2020-12-03 Thread Yuanjian Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuanjian Li updated SPARK-33638:

Priority: Blocker  (was: Major)

> Full support of V2 table creation in DataStreamWriter.toTable API
> -
>
> Key: SPARK-33638
> URL: https://issues.apache.org/jira/browse/SPARK-33638
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.1.0
>Reporter: Yuanjian Li
>Priority: Blocker
>
> Currently, we want to add support for "create if not exists" in the 
> DataStreamWriter.toTable API. Since the file format in streaming doesn't 
> support DSv2 for now, the current implementation mainly focuses on V1 
> support. More work is needed for full support of V2 table creation.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33142) SQL temp view should store SQL text as well

2020-12-03 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33142:
---

Assignee: Linhong Liu

> SQL temp view should store SQL text as well
> ---
>
> Key: SPARK-33142
> URL: https://issues.apache.org/jira/browse/SPARK-33142
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Leanken.Lin
>Assignee: Linhong Liu
>Priority: Major
> Fix For: 3.1.0
>
>
> TODO



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33142) SQL temp view should store SQL text as well

2020-12-03 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33142.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30567
[https://github.com/apache/spark/pull/30567]

> SQL temp view should store SQL text as well
> ---
>
> Key: SPARK-33142
> URL: https://issues.apache.org/jira/browse/SPARK-33142
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Leanken.Lin
>Priority: Major
> Fix For: 3.1.0
>
>
> TODO



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33647) cache table not working for persisted view

2020-12-03 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-33647:
---

Assignee: Linhong Liu

> cache table not working for persisted view
> --
>
> Key: SPARK-33647
> URL: https://issues.apache.org/jira/browse/SPARK-33647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Linhong Liu
>Assignee: Linhong Liu
>Priority: Major
>
> In `CacheManager`, tables (including views) are cached by their logical plans, 
> and `QueryPlan.sameResult` is used to look up the cache. But a PersistedView wraps 
> the child plan with a `View`, which always leads to a false `sameResult` check.
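A minimal repro sketch of the gap described above (table and view names are
illustrative, and an active SparkSession named spark is assumed): cache a
persisted view, then check whether reads of the view are actually served from
the cache.

{code:scala}
spark.range(10).write.saveAsTable("t")          // illustrative base table
spark.sql("CREATE VIEW v AS SELECT id FROM t")  // persisted (non-temporary) view
spark.sql("CACHE TABLE v")

// If the View wrapper defeats the sameResult check, this scan will not hit the
// cached data even though `v` was just cached.
spark.table("v").count()
println(spark.catalog.isCached("v"))
{code}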



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33647) cache table not working for persisted view

2020-12-03 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33647.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30567
[https://github.com/apache/spark/pull/30567]

> cache table not working for persisted view
> --
>
> Key: SPARK-33647
> URL: https://issues.apache.org/jira/browse/SPARK-33647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Linhong Liu
>Assignee: Linhong Liu
>Priority: Major
> Fix For: 3.1.0
>
>
> In `CacheManager`, tables (including views) are cached by their logical plans, 
> and `QueryPlan.sameResult` is used to look up the cache. But a PersistedView wraps 
> the child plan with a `View`, which always leads to a false `sameResult` check.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33647) cache table not working for persisted view

2020-12-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33647?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243758#comment-17243758
 ] 

Apache Spark commented on SPARK-33647:
--

User 'linhongliu-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/30567

> cache table not working for persisted view
> --
>
> Key: SPARK-33647
> URL: https://issues.apache.org/jira/browse/SPARK-33647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Linhong Liu
>Priority: Major
>
> In `CacheManager`, tables (including views) are cached by their logical plans, 
> and `QueryPlan.sameResult` is used to look up the cache. But a PersistedView wraps 
> the child plan with a `View`, which always leads to a false `sameResult` check.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-29625) Spark Structure Streaming Kafka Wrong Reset Offset twice

2020-12-03 Thread leesf (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29625?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243748#comment-17243748
 ] 

leesf commented on SPARK-29625:
---

Hi [~sanysand...@gmail.com], any updates here? How did you solve the error? 
Thanks.

> Spark Structure Streaming Kafka Wrong Reset Offset twice
> 
>
> Key: SPARK-29625
> URL: https://issues.apache.org/jira/browse/SPARK-29625
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Sandish Kumar HN
>Priority: Major
>
> Spark Structure Streaming Kafka Reset Offset twice, once with right offsets 
> and second time with very old offsets 
> {code}
> [2019-10-28 19:27:40,013] \{bash_operator.py:128} INFO - 19/10/28 19:27:40 
> INFO Fetcher: [Consumer clientId=consumer-1, 
> groupId=spark-kafka-source-cfacf6b7-b0aa-443f-b01d-b17212087545--1376165614-driver-0]
>  Resetting offset for partition topic-151 to offset 0.
> [2019-10-28 19:27:40,013] \{bash_operator.py:128} INFO - 19/10/28 19:27:40 
> INFO Fetcher: [Consumer clientId=consumer-1, 
> groupId=spark-kafka-source-cfacf6b7-b0aa-443f-b01d-b17212087545--1376165614-driver-0]
>  Resetting offset for partition topic-118 to offset 0.
> [2019-10-28 19:27:40,013] \{bash_operator.py:128} INFO - 19/10/28 19:27:40 
> INFO Fetcher: [Consumer clientId=consumer-1, 
> groupId=spark-kafka-source-cfacf6b7-b0aa-443f-b01d-b17212087545--1376165614-driver-0]
>  Resetting offset for partition topic-85 to offset 0.
> [2019-10-28 19:27:40,013] \{bash_operator.py:128} INFO - 19/10/28 19:27:40 
> INFO Fetcher: [Consumer clientId=consumer-1, 
> groupId=spark-kafka-source-cfacf6b7-b0aa-443f-b01d-b17212087545--1376165614-driver-0]
>  Resetting offset for partition topic-52 to offset 122677634.
> [2019-10-28 19:27:40,013] \{bash_operator.py:128} INFO - 19/10/28 19:27:40 
> INFO Fetcher: [Consumer clientId=consumer-1, 
> groupId=spark-kafka-source-cfacf6b7-b0aa-443f-b01d-b17212087545--1376165614-driver-0]
>  Resetting offset for partition topic-19 to offset 0.
> [2019-10-28 19:27:40,013] \{bash_operator.py:128} INFO - 19/10/28 19:27:40 
> INFO Fetcher: [Consumer clientId=consumer-1, 
> groupId=spark-kafka-source-cfacf6b7-b0aa-443f-b01d-b17212087545--1376165614-driver-0]
>  Resetting offset for partition topic-52 to offset 120504922.*
> [2019-10-28 19:27:40,153] \{bash_operator.py:128} INFO - 19/10/28 19:27:40 
> INFO ContextCleaner: Cleaned accumulator 810
> {code}
> which is causing a Data loss issue.  
> {code}
> [2019-10-28 19:27:40,351] \{bash_operator.py:128} INFO - 19/10/28 19:27:40 
> ERROR StreamExecution: Query [id = d62ca9e4-6650-454f-8691-a3d576d1e4ba, 
> runId = 3946389f-222b-495c-9ab2-832c0422cbbb] terminated with error
> [2019-10-28 19:27:40,351] \{bash_operator.py:128} INFO - 
> java.lang.IllegalStateException: Partition topic-52's offset was changed from 
> 122677598 to 120504922, some data may have been missed.
> [2019-10-28 19:27:40,351] \{bash_operator.py:128} INFO - Some data may have 
> been lost because they are not available in Kafka any more; either the
> [2019-10-28 19:27:40,351] \{bash_operator.py:128} INFO -  data was aged out 
> by Kafka or the topic may have been deleted before all the data in the
> [2019-10-28 19:27:40,351] \{bash_operator.py:128} INFO -  topic was 
> processed. If you don't want your streaming query to fail on such cases, set 
> the
> [2019-10-28 19:27:40,351] \{bash_operator.py:128} INFO -  source option 
> "failOnDataLoss" to "false".
> [2019-10-28 19:27:40,351] \{bash_operator.py:128} INFO - 
> [2019-10-28 19:27:40,351] \{bash_operator.py:128} INFO -  at 
> org.apache.spark.sql.kafka010.KafkaSource.org$apache$spark$sql$kafka010$KafkaSource$$reportDataLoss(KafkaSource.scala:329)
> [2019-10-28 19:27:40,351] \{bash_operator.py:128} INFO -  at 
> org.apache.spark.sql.kafka010.KafkaSource$$anonfun$8.apply(KafkaSource.scala:283)
> [2019-10-28 19:27:40,351] \{bash_operator.py:128} INFO -  at 
> org.apache.spark.sql.kafka010.KafkaSource$$anonfun$8.apply(KafkaSource.scala:281)
> [2019-10-28 19:27:40,351] \{bash_operator.py:128} INFO -  at 
> scala.collection.TraversableLike$$anonfun$filterImpl$1.apply(TraversableLike.scala:248)
> [2019-10-28 19:27:40,351] \{bash_operator.py:128} INFO -  at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> [2019-10-28 19:27:40,351] \{bash_operator.py:128} INFO -  at 
> scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> [2019-10-28 19:27:40,351] \{bash_operator.py:128} INFO -  at 
> scala.collection.TraversableLike$class.filterImpl(TraversableLike.scala:247)
> [2019-10-28 19:27:40,351] \{bash_operator.py:128} INFO -  at 
> scala.collection.TraversableLike$class.filter(TraversableLike.scala:259)
> [2019-10-28 19:27:

[jira] [Updated] (SPARK-24607) Distribute by rand() can lead to data inconsistency

2020-12-03 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-24607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-24607:
-
Labels: bulk-closed correctness  (was: bulk-closed)

> Distribute by rand() can lead to data inconsistency
> ---
>
> Key: SPARK-24607
> URL: https://issues.apache.org/jira/browse/SPARK-24607
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.1
>Reporter: zenglinxi
>Priority: Major
>  Labels: bulk-closed, correctness
>
> Noticed the following queries can give different results:
> {code:java}
> select count(*) from tbl;
> select count(*) from (select * from tbl distribute by rand()) a;{code}
> This issue was first reported by someone using Kylin to build a cube with 
> HiveSQL that includes "distribute by rand": data inconsistency may happen 
> during fault tolerance operations. Since Spark has a similar fault tolerance 
> mechanism, I think it's also a hidden, serious problem in Spark SQL.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24607) Distribute by rand() can lead to data inconsistency

2020-12-03 Thread Kent Yao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243743#comment-17243743
 ] 

Kent Yao edited comment on SPARK-24607 at 12/4/20, 6:17 AM:


This could happen when the map stage retries: the same record in a map task may 
target different reduce tasks across task attempts.

This could result in an incomplete result set when a non-deterministic 
expression, e.g. rand(), is introduced in jobs that need a shuffle, e.g. 
aggregates or a sort-merge join.

We may need a random but replayable function to handle these use cases, because 
this is a common way for users to deal with data skew.

Otherwise, we may forbid non-deterministic functions from being used in 
shuffle-related operations.

cc [~cloud_fan] [~ulysses] [~maropu]


was (Author: qin yao):
This could happen when the map stage retries: the same record in a map task may 
target different reduce tasks across task attempts.

This could result in an incomplete result set when a non-deterministic 
expression, e.g. rand(), is introduced in jobs that need a shuffle, e.g. 
aggregates or a sort-merge join.

We may need a random but replayable function to handle these use cases, because 
this is a common way for users to deal with data skew.

Otherwise, we may forbid non-deterministic functions from being used in 
shuffle-related operations.

cc [~cloud_fan] [~ulysses] [~maropu]

> Distribute by rand() can lead to data inconsistency
> ---
>
> Key: SPARK-24607
> URL: https://issues.apache.org/jira/browse/SPARK-24607
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.1
>Reporter: zenglinxi
>Priority: Major
>  Labels: bulk-closed
>
> Noticed the following queries can give different results:
> {code:java}
> select count(*) from tbl;
> select count(*) from (select * from tbl distribute by rand()) a;{code}
> This issue was first reported by someone using Kylin to build a cube with 
> HiveSQL that includes "distribute by rand": data inconsistency may happen 
> during fault tolerance operations. Since Spark has a similar fault tolerance 
> mechanism, I think it's also a hidden, serious problem in Spark SQL.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24607) Distribute by rand() can lead to data inconsistency

2020-12-03 Thread Kent Yao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-24607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243743#comment-17243743
 ] 

Kent Yao commented on SPARK-24607:
--

This could happen when the map stage retries: the same record in a map task may 
target different reduce tasks across task attempts.

This could result in an incomplete result set when a non-deterministic 
expression, e.g. rand(), is introduced in jobs that need a shuffle, e.g. 
aggregates or a sort-merge join.

We may need a random but replayable function to handle these use cases, because 
this is a common way for users to deal with data skew.

Otherwise, we may forbid non-deterministic functions from being used in 
shuffle-related operations.

cc [~cloud_fan] [~ulysses] [~maropu]
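A hedged sketch of the "random but replayable" idea: derive the distribution
key deterministically from the row itself (e.g. a hash of a column) instead of
rand(), so a retried map task sends each record to the same reducer. The table
and column names are illustrative, and an active SparkSession named spark is
assumed.

{code:scala}
import org.apache.spark.sql.functions.{col, hash, lit, pmod}

// deterministic spread across 200 buckets, replayable across task attempts
val redistributed = spark.table("tbl")
  .repartition(200, pmod(hash(col("id")), lit(200)))

// compared with the non-deterministic variant from the description:
// spark.table("tbl").repartition(200, org.apache.spark.sql.functions.rand())
{code}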

> Distribute by rand() can lead to data inconsistency
> ---
>
> Key: SPARK-24607
> URL: https://issues.apache.org/jira/browse/SPARK-24607
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.2.0, 2.3.1
>Reporter: zenglinxi
>Priority: Major
>  Labels: bulk-closed
>
> Noticed the following queries can give different results:
> {code:java}
> select count(*) from tbl;
> select count(*) from (select * from tbl distribute by rand()) a;{code}
> This issue was first reported by someone using Kylin to build a cube with 
> HiveSQL that includes "distribute by rand": data inconsistency may happen 
> during fault tolerance operations. Since Spark has a similar fault tolerance 
> mechanism, I think it's also a hidden, serious problem in Spark SQL.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33658) Suggest using datetime conversion functions for invalid ANSI casting

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33658:


Assignee: Gengliang Wang  (was: Apache Spark)

> Suggest using datetime conversion functions for invalid ANSI casting
> 
>
> Key: SPARK-33658
> URL: https://issues.apache.org/jira/browse/SPARK-33658
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> In ANSI mode, an explicit cast between datetime types and numeric types is not 
> allowed.
> Now that we have introduced the new functions 
> UNIX_SECONDS/UNIX_MILLIS/UNIX_MICROS/UNIX_DATE/DATE_FROM_UNIX_DATE, we can 
> show suggestions to users so that they can perform these type conversions 
> precisely and easily in ANSI mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33658) Suggest using datetime conversion functions for invalid ANSI casting

2020-12-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33658?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243690#comment-17243690
 ] 

Apache Spark commented on SPARK-33658:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30603

> Suggest using datetime conversion functions for invalid ANSI casting
> 
>
> Key: SPARK-33658
> URL: https://issues.apache.org/jira/browse/SPARK-33658
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> In ANSI mode, an explicit cast between datetime types and numeric types is not 
> allowed.
> Now that we have introduced the new functions 
> UNIX_SECONDS/UNIX_MILLIS/UNIX_MICROS/UNIX_DATE/DATE_FROM_UNIX_DATE, we can 
> show suggestions to users so that they can perform these type conversions 
> precisely and easily in ANSI mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33658) Suggest using datetime conversion functions for invalid ANSI casting

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33658:


Assignee: Apache Spark  (was: Gengliang Wang)

> Suggest using datetime conversion functions for invalid ANSI casting
> 
>
> Key: SPARK-33658
> URL: https://issues.apache.org/jira/browse/SPARK-33658
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> In ANSI mode, an explicit cast between datetime types and numeric types is not 
> allowed.
> Now that we have introduced the new functions 
> UNIX_SECONDS/UNIX_MILLIS/UNIX_MICROS/UNIX_DATE/DATE_FROM_UNIX_DATE, we can 
> show suggestions to users so that they can perform these type conversions 
> precisely and easily in ANSI mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33658) Suggest using datetime conversion functions for invalid ANSI casting

2020-12-03 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-33658:
--

 Summary: Suggest using datetime conversion functions for invalid 
ANSI casting
 Key: SPARK-33658
 URL: https://issues.apache.org/jira/browse/SPARK-33658
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


In ANSI mode, an explicit cast between datetime types and numeric types is not 
allowed.
Now that we have introduced the new functions 
UNIX_SECONDS/UNIX_MILLIS/UNIX_MICROS/UNIX_DATE/DATE_FROM_UNIX_DATE, we can show 
suggestions to users so that they can perform these type conversions precisely 
and easily in ANSI mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33657) After spark sql is executed to generate hdfs data, the relevant status information is printed

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33657:


Assignee: (was: Apache Spark)

>  After spark sql is executed to generate hdfs data, the relevant status 
> information is printed
> --
>
> Key: SPARK-33657
> URL: https://issues.apache.org/jira/browse/SPARK-33657
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: guihuawen
>Priority: Major
>
> After Spark SQL executes and the data has been generated on HDFS, printing the 
> relevant status information would be very user-friendly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33657) After spark sql is executed to generate hdfs data, the relevant status information is printed

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33657:


Assignee: (was: Apache Spark)

>  After spark sql is executed to generate hdfs data, the relevant status 
> information is printed
> --
>
> Key: SPARK-33657
> URL: https://issues.apache.org/jira/browse/SPARK-33657
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: guihuawen
>Priority: Major
>
> After Spark SQL executes and the data has been generated on HDFS, printing the 
> relevant status information would be very user-friendly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33657) After spark sql is executed to generate hdfs data, the relevant status information is printed

2020-12-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243687#comment-17243687
 ] 

Apache Spark commented on SPARK-33657:
--

User 'guixiaowen' has created a pull request for this issue:
https://github.com/apache/spark/pull/30602

>  After spark sql is executed to generate hdfs data, the relevant status 
> information is printed
> --
>
> Key: SPARK-33657
> URL: https://issues.apache.org/jira/browse/SPARK-33657
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: guihuawen
>Priority: Major
>
> After Spark SQL executes and the data has been generated on HDFS, printing the 
> relevant status information would be very user-friendly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33657) After spark sql is executed to generate hdfs data, the relevant status information is printed

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33657:


Assignee: Apache Spark

>  After spark sql is executed to generate hdfs data, the relevant status 
> information is printed
> --
>
> Key: SPARK-33657
> URL: https://issues.apache.org/jira/browse/SPARK-33657
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: guihuawen
>Assignee: Apache Spark
>Priority: Major
>
> After Spark SQL executes and the data has been generated on HDFS, printing the 
> relevant status information would be very user-friendly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33632) to_date doesn't behave as documented

2020-12-03 Thread Liu Neng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243681#comment-17243681
 ] 

Liu Neng edited comment on SPARK-33632 at 12/4/20, 3:46 AM:


This is not an issue; you may have misunderstood the docs.

You should use the pattern m/d/yy; the parsing mode is determined by the count 
of the letter 'y'.

Below is the source code from DateTimeFormatterBuilder.

!image-2020-12-04-11-45-10-379.png!


was (Author: qwe1398775315):
You should use the pattern m/d/yy; the parsing mode is determined by the count 
of the letter 'y'.

Below is the source code from DateTimeFormatterBuilder.

!image-2020-12-04-11-45-10-379.png!

> to_date doesn't behave as documented
> 
>
> Key: SPARK-33632
> URL: https://issues.apache.org/jira/browse/SPARK-33632
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Frank Oosterhuis
>Priority: Major
> Attachments: image-2020-12-04-11-45-10-379.png
>
>
> I'm trying to use to_date on a string formatted as "10/31/20".
> Expected output is "2020-10-31".
> Actual output is "0020-01-31".
> The 
> [documentation|https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html]
>  suggests 2020 or 20 as input for "y".
> Example below. Expected behaviour is included in the udf.
> {code:scala}
> import java.sql.Date
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.functions.{to_date, udf}
> object ToDate {
>   val toDate = udf((date: String) => {
> val split = date.split("/")
> val month = "%02d".format(split(0).toInt)
> val day = "%02d".format(split(1).toInt)
> val year = split(2).toInt + 2000
> Date.valueOf(s"${year}-${month}-${day}")
>   })
>   def main(args: Array[String]): Unit = {
> val spark = SparkSession.builder().master("local[2]").getOrCreate()
> spark.sparkContext.setLogLevel("ERROR")
> import spark.implicits._
> Seq("1/1/20", "10/31/20")
>   .toDF("raw")
>   .withColumn("to_date", to_date($"raw", "m/d/y"))
>   .withColumn("udf", toDate($"raw"))
>   .show
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33632) to_date doesn't behave as documented

2020-12-03 Thread Liu Neng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liu Neng updated SPARK-33632:
-
Attachment: image-2020-12-04-11-45-10-379.png

> to_date doesn't behave as documented
> 
>
> Key: SPARK-33632
> URL: https://issues.apache.org/jira/browse/SPARK-33632
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Frank Oosterhuis
>Priority: Major
> Attachments: image-2020-12-04-11-45-10-379.png
>
>
> I'm trying to use to_date on a string formatted as "10/31/20".
> Expected output is "2020-10-31".
> Actual output is "0020-01-31".
> The 
> [documentation|https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html]
>  suggests 2020 or 20 as input for "y".
> Example below. Expected behaviour is included in the udf.
> {code:scala}
> import java.sql.Date
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.functions.{to_date, udf}
> object ToDate {
>   val toDate = udf((date: String) => {
> val split = date.split("/")
> val month = "%02d".format(split(0).toInt)
> val day = "%02d".format(split(1).toInt)
> val year = split(2).toInt + 2000
> Date.valueOf(s"${year}-${month}-${day}")
>   })
>   def main(args: Array[String]): Unit = {
> val spark = SparkSession.builder().master("local[2]").getOrCreate()
> spark.sparkContext.setLogLevel("ERROR")
> import spark.implicits._
> Seq("1/1/20", "10/31/20")
>   .toDF("raw")
>   .withColumn("to_date", to_date($"raw", "m/d/y"))
>   .withColumn("udf", toDate($"raw"))
>   .show
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33632) to_date doesn't behave as documented

2020-12-03 Thread Liu Neng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243681#comment-17243681
 ] 

Liu Neng commented on SPARK-33632:
--

You should use the pattern m/d/yy; the parsing mode is determined by the count 
of the letter 'y'.

Below is the source code from DateTimeFormatterBuilder.

!image-2020-12-04-11-45-10-379.png!
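A hedged check of the advice above on Spark 3.0.x, assuming an active
SparkSession named `spark`; note that among the datetime pattern letters,
upper-case 'M' is month-of-year while lower-case 'm' is minute-of-hour, and
'yy' parses a two-digit year.

{code:scala}
import org.apache.spark.sql.functions.to_date
import spark.implicits._

Seq("1/1/20", "10/31/20")
  .toDF("raw")
  .withColumn("parsed", to_date($"raw", "M/d/yy"))
  .show()
// expected: 2020-01-01 and 2020-10-31
{code}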

> to_date doesn't behave as documented
> 
>
> Key: SPARK-33632
> URL: https://issues.apache.org/jira/browse/SPARK-33632
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.1
>Reporter: Frank Oosterhuis
>Priority: Major
> Attachments: image-2020-12-04-11-45-10-379.png
>
>
> I'm trying to use to_date on a string formatted as "10/31/20".
> Expected output is "2020-10-31".
> Actual output is "0020-01-31".
> The 
> [documentation|https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html]
>  suggests 2020 or 20 as input for "y".
> Example below. Expected behaviour is included in the udf.
> {code:scala}
> import java.sql.Date
> import org.apache.spark.sql.SparkSession
> import org.apache.spark.sql.functions.{to_date, udf}
> object ToDate {
>   val toDate = udf((date: String) => {
> val split = date.split("/")
> val month = "%02d".format(split(0).toInt)
> val day = "%02d".format(split(1).toInt)
> val year = split(2).toInt + 2000
> Date.valueOf(s"${year}-${month}-${day}")
>   })
>   def main(args: Array[String]): Unit = {
> val spark = SparkSession.builder().master("local[2]").getOrCreate()
> spark.sparkContext.setLogLevel("ERROR")
> import spark.implicits._
> Seq("1/1/20", "10/31/20")
>   .toDF("raw")
>   .withColumn("to_date", to_date($"raw", "m/d/y"))
>   .withColumn("udf", toDate($"raw"))
>   .show
>   }
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33657) After spark sql is executed to generate hdfs data, the relevant status information is printed

2020-12-03 Thread guihuawen (Jira)
guihuawen created SPARK-33657:
-

 Summary:  After spark sql is executed to generate hdfs data, the 
relevant status information is printed
 Key: SPARK-33657
 URL: https://issues.apache.org/jira/browse/SPARK-33657
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.0
Reporter: guihuawen


After Spark SQL executes and the data has been generated on HDFS, printing the 
relevant status information would be very user-friendly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33649) Improve the doc of spark.sql.ansi.enabled

2020-12-03 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-33649.

Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30593
[https://github.com/apache/spark/pull/30593]

> Improve the doc of spark.sql.ansi.enabled
> -
>
> Key: SPARK-33649
> URL: https://issues.apache.org/jira/browse/SPARK-33649
> Project: Spark
>  Issue Type: New Feature
>  Components: Documentation, SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> As there are more and more new features under the SQL configuration 
> spark.sql.ansi.enabled, we should make it clearer:
> 1. what exactly it is
> 2. where users can find all the features of the ANSI mode
> 3. whether all the features come exactly from the SQL standard
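A small hedged illustration of why the doc matters: the same expression changes
behaviour when the flag is flipped (the example assumes Spark 3.1 semantics for
an invalid string-to-int cast and an active SparkSession named spark).

{code:scala}
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("SELECT CAST('abc' AS INT)").show()     // non-ANSI: returns NULL

spark.conf.set("spark.sql.ansi.enabled", "true")
// spark.sql("SELECT CAST('abc' AS INT)").show()  // ANSI: expected to fail at runtime
{code}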



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33656) Add option to keep container after tests finish for DockerJDBCIntegrationSuites for debug

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33656:


Assignee: Kousuke Saruta  (was: Apache Spark)

> Add option to keep container after tests finish for 
> DockerJDBCIntegrationSuites for debug
> -
>
> Key: SPARK-33656
> URL: https://issues.apache.org/jira/browse/SPARK-33656
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> DockerJDBCIntegrationSuites (e.g. DB2IntegrationSuite, 
> PostgresIntegrationSuite) launch a Docker container that is removed after 
> the tests finish.
> If we had an option to keep the container, it would be useful for debugging.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33656) Add option to keep container after tests finish for DockerJDBCIntegrationSuites for debug

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33656:


Assignee: Apache Spark  (was: Kousuke Saruta)

> Add option to keep container after tests finish for 
> DockerJDBCIntegrationSuites for debug
> -
>
> Key: SPARK-33656
> URL: https://issues.apache.org/jira/browse/SPARK-33656
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Minor
>
> DockerJDBCIntegrationSuites (e.g. DB2IntegrationSuite, 
> PostgresIntegrationSuite) launch a docker container which is removed after 
> tests finish.
> If we had an option to keep the container, it would be useful for debugging.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33656) Add option to keep container after tests finish for DockerJDBCIntegrationSuites for debug

2020-12-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243665#comment-17243665
 ] 

Apache Spark commented on SPARK-33656:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/30601

> Add option to keep container after tests finish for 
> DockerJDBCIntegrationSuites for debug
> -
>
> Key: SPARK-33656
> URL: https://issues.apache.org/jira/browse/SPARK-33656
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
>
> DockerJDBCIntegrationSuites (e.g. DB2IntegrationSuite, 
> PostgresIntegrationSuite) launch a docker container which is removed after 
> tests finish.
> If we had an option to keep the container, it would be useful for debugging.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33656) Add option to keep container after tests finish for DockerJDBCIntegrationSuites for debug

2020-12-03 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-33656:
--

 Summary: Add option to keep container after tests finish for 
DockerJDBCIntegrationSuites for debug
 Key: SPARK-33656
 URL: https://issues.apache.org/jira/browse/SPARK-33656
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 3.1.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


DockerJDBCIntegrationSuites (e.g. DB2IntegrationSuite, 
PostgresIntegrationSuite) launch a docker container which is removed after 
tests finish.
If we had an option to keep the container, it would be useful for debugging.
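
A minimal sketch of how such an option could work, assuming a hypothetical system 
property name (the actual flag introduced by the change may differ):

{code:scala}
object KeepContainerSketch {
  // Hypothetical property; the real flag name added for the suites may differ.
  private val keepContainer: Boolean =
    sys.props.get("spark.test.docker.keepContainer").exists(_.toBoolean)

  // Called from the suite's afterAll(); `stopAndRemove` is whatever normally
  // stops and removes the docker container.
  def cleanUp(stopAndRemove: () => Unit): Unit = {
    if (keepContainer) {
      println("Keeping the docker container for debugging; remove it manually when done.")
    } else {
      stopAndRemove()
    }
  }
}
{code}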



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33655) Thrift server : FETCH_PRIOR does not cause to reiterate from start position.

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33655:


Assignee: Apache Spark

> Thrift server : FETCH_PRIOR does not cause to reiterate from start position. 
> -
>
> Key: SPARK-33655
> URL: https://issues.apache.org/jira/browse/SPARK-33655
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Dooyoung Hwang
>Assignee: Apache Spark
>Priority: Major
>
> Currently, when a client requests FETCH_PRIOR from the Thrift server, the server 
> re-iterates from the start position. Because the Thrift server caches a query result 
> in an array, FETCH_PRIOR can be implemented without re-iterating over the result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33655) Thrift server : FETCH_PRIOR does not cause to reiterate from start position.

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33655:


Assignee: (was: Apache Spark)

> Thrift server : FETCH_PRIOR does not cause to reiterate from start position. 
> -
>
> Key: SPARK-33655
> URL: https://issues.apache.org/jira/browse/SPARK-33655
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Dooyoung Hwang
>Priority: Major
>
> Currently, when a client requests FETCH_PRIOR from the Thrift server, the server 
> re-iterates from the start position. Because the Thrift server caches a query result 
> in an array, FETCH_PRIOR can be implemented without re-iterating over the result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33655) Thrift server : FETCH_PRIOR does not cause to reiterate from start position.

2020-12-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243656#comment-17243656
 ] 

Apache Spark commented on SPARK-33655:
--

User 'Dooyoung-Hwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30600

> Thrift server : FETCH_PRIOR does not cause to reiterate from start position. 
> -
>
> Key: SPARK-33655
> URL: https://issues.apache.org/jira/browse/SPARK-33655
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Dooyoung Hwang
>Priority: Major
>
> Currently, when a client requests FETCH_PRIOR from the Thrift server, the server 
> re-iterates from the start position. Because the Thrift server caches a query result 
> in an array, FETCH_PRIOR can be implemented without re-iterating over the result.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32405) Apply table options while creating tables in JDBC Table Catalog

2020-12-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243654#comment-17243654
 ] 

Apache Spark commented on SPARK-32405:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/30599

> Apply table options while creating tables in JDBC Table Catalog
> ---
>
> Key: SPARK-32405
> URL: https://issues.apache.org/jira/browse/SPARK-32405
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.1.0
>
>
> We need to add an API to `JdbcDialect` to generate the SQL statement to 
> specify table options.
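
A rough sketch of the kind of dialect hook described above; the method name and 
signature are assumptions for illustration, not the actual API added by the PR:

{code:scala}
// Hypothetical dialect-level hook that turns user-supplied table options into the
// trailing clause of a CREATE TABLE statement.
object JdbcCreateTableSketch {
  def createTableOptionsClause(options: Map[String, String]): String =
    if (options.isEmpty) ""
    else options.map { case (k, v) => s"$k=$v" }.mkString(" ", " ", "")

  def createTableSql(table: String, schemaSql: String, options: Map[String, String]): String =
    s"CREATE TABLE $table ($schemaSql)" + createTableOptionsClause(options)
}

// Example output (purely illustrative, MySQL-style options):
//   CREATE TABLE t (id INT) ENGINE=InnoDB
{code}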



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32405) Apply table options while creating tables in JDBC Table Catalog

2020-12-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32405?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243653#comment-17243653
 ] 

Apache Spark commented on SPARK-32405:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/30599

> Apply table options while creating tables in JDBC Table Catalog
> ---
>
> Key: SPARK-32405
> URL: https://issues.apache.org/jira/browse/SPARK-32405
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.1.0
>
>
> We need to add an API to `JdbcDialect` to generate the SQL statement to 
> specify table options.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33654) Migrate CACHE TABLE to new resolution framework

2020-12-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33654?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243637#comment-17243637
 ] 

Apache Spark commented on SPARK-33654:
--

User 'imback82' has created a pull request for this issue:
https://github.com/apache/spark/pull/30598

> Migrate CACHE TABLE to new resolution framework
> ---
>
> Key: SPARK-33654
> URL: https://issues.apache.org/jira/browse/SPARK-33654
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Priority: Minor
>
> Migrate CACHE TABLE to new resolution framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33654) Migrate CACHE TABLE to new resolution framework

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33654:


Assignee: (was: Apache Spark)

> Migrate CACHE TABLE to new resolution framework
> ---
>
> Key: SPARK-33654
> URL: https://issues.apache.org/jira/browse/SPARK-33654
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Priority: Minor
>
> Migrate CACHE TABLE to new resolution framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33654) Migrate CACHE TABLE to new resolution framework

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33654?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33654:


Assignee: Apache Spark

> Migrate CACHE TABLE to new resolution framework
> ---
>
> Key: SPARK-33654
> URL: https://issues.apache.org/jira/browse/SPARK-33654
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Terry Kim
>Assignee: Apache Spark
>Priority: Minor
>
> Migrate CACHE TABLE to new resolution framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33655) Thrift server : FETCH_PRIOR does not cause to reiterate from start position.

2020-12-03 Thread Dooyoung Hwang (Jira)
Dooyoung Hwang created SPARK-33655:
--

 Summary: Thrift server : FETCH_PRIOR does not cause to reiterate 
from start position. 
 Key: SPARK-33655
 URL: https://issues.apache.org/jira/browse/SPARK-33655
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Dooyoung Hwang


Currently, when a client requests FETCH_PRIOR from the Thrift server, the server 
re-iterates from the start position. Because the Thrift server caches a query result 
in an array, FETCH_PRIOR can be implemented without re-iterating over the result.
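
A minimal sketch of the idea, assuming the result rows are already buffered in memory 
(class and method names are illustrative, not the actual Thrift server code):

{code:scala}
// Illustrative cursor over a cached result buffer: FETCH_PRIOR just moves the
// offset backwards instead of re-running the query from the start.
class CachedResultCursor[T](cached: IndexedSeq[T]) {
  private var offset = 0

  def fetchNext(maxRows: Int): IndexedSeq[T] = {
    val rows = cached.slice(offset, offset + maxRows)
    offset += rows.length
    rows
  }

  def fetchPrior(maxRows: Int): IndexedSeq[T] = {
    val start = math.max(0, offset - maxRows)
    val rows = cached.slice(start, offset)
    offset = start
    rows
  }
}
{code}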



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33654) Migrate CACHE TABLE to new resolution framework

2020-12-03 Thread Terry Kim (Jira)
Terry Kim created SPARK-33654:
-

 Summary: Migrate CACHE TABLE to new resolution framework
 Key: SPARK-33654
 URL: https://issues.apache.org/jira/browse/SPARK-33654
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Terry Kim


Migrate CACHE TABLE to new resolution framework



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33653) DSv2: REFRESH TABLE should recache the table itself

2020-12-03 Thread Chao Sun (Jira)
Chao Sun created SPARK-33653:


 Summary: DSv2: REFRESH TABLE should recache the table itself
 Key: SPARK-33653
 URL: https://issues.apache.org/jira/browse/SPARK-33653
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Chao Sun


As "CACHE TABLE" is supported in DSv2 now, we should also recache the table 
itself in "REFRESH TABLE" command, to match the behavior in DSv1.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33652) DSv2: DeleteFrom should refresh cache

2020-12-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243624#comment-17243624
 ] 

Apache Spark commented on SPARK-33652:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/30597

> DSv2: DeleteFrom should refresh cache
> -
>
> Key: SPARK-33652
> URL: https://issues.apache.org/jira/browse/SPARK-33652
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Priority: Major
>
> Currently DeleteFrom in DSv2 doesn't refresh the cache, which could lead to 
> correctness issues if the cache becomes stale and is queried afterwards.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33652) DSv2: DeleteFrom should refresh cache

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33652:


Assignee: (was: Apache Spark)

> DSv2: DeleteFrom should refresh cache
> -
>
> Key: SPARK-33652
> URL: https://issues.apache.org/jira/browse/SPARK-33652
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Priority: Major
>
> Currently DeleteFrom in DSv2 doesn't refresh the cache, which could lead to 
> correctness issues if the cache becomes stale and is queried afterwards.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33652) DSv2: DeleteFrom should refresh cache

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33652:


Assignee: Apache Spark

> DSv2: DeleteFrom should refresh cache
> -
>
> Key: SPARK-33652
> URL: https://issues.apache.org/jira/browse/SPARK-33652
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Assignee: Apache Spark
>Priority: Major
>
> Currently DeleteFrom in DSv2 doesn't refresh the cache, which could lead to 
> correctness issues if the cache becomes stale and is queried afterwards.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33652) DSv2: DeleteFrom should refresh cache

2020-12-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243623#comment-17243623
 ] 

Apache Spark commented on SPARK-33652:
--

User 'sunchao' has created a pull request for this issue:
https://github.com/apache/spark/pull/30597

> DSv2: DeleteFrom should refresh cache
> -
>
> Key: SPARK-33652
> URL: https://issues.apache.org/jira/browse/SPARK-33652
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Priority: Major
>
> Currently DeleteFrom in DSv2 doesn't refresh the cache, which could lead to 
> correctness issues if the cache becomes stale and is queried afterwards.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33652) DSv2: DeleteFrom should refresh cache

2020-12-03 Thread Chao Sun (Jira)
Chao Sun created SPARK-33652:


 Summary: DSv2: DeleteFrom should refresh cache
 Key: SPARK-33652
 URL: https://issues.apache.org/jira/browse/SPARK-33652
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Chao Sun


Currently DeleteFrom in DSv2 doesn't refresh the cache, which could lead to 
correctness issues if the cache becomes stale and is queried afterwards.
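
An illustrative repro of the concern, assuming a DSv2 catalog named testcat is 
configured and `spark` is an active SparkSession (all names are made up):

{code:scala}
// Cache the table, delete a row through DSv2 DELETE FROM, then query again.
// Without a cache refresh in DeleteFrom, the last query may still serve the
// deleted row from the stale cached plan.
spark.sql("CACHE TABLE testcat.ns.tbl")
spark.sql("DELETE FROM testcat.ns.tbl WHERE id = 1")
spark.sql("SELECT * FROM testcat.ns.tbl WHERE id = 1").show()
{code}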



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33652) DSv2: DeleteFrom should refresh cache

2020-12-03 Thread Chao Sun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chao Sun updated SPARK-33652:
-
Parent: SPARK-33507
Issue Type: Sub-task  (was: Improvement)

> DSv2: DeleteFrom should refresh cache
> -
>
> Key: SPARK-33652
> URL: https://issues.apache.org/jira/browse/SPARK-33652
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Chao Sun
>Priority: Major
>
> Currently DeleteFrom in DSv2 doesn't refresh the cache, which could lead to 
> correctness issues if the cache becomes stale and is queried afterwards.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32968) Column pruning for CsvToStructs

2020-12-03 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243610#comment-17243610
 ] 

L. C. Hsieh commented on SPARK-32968:
-

You could try to help on the tickets without an assignee. Thanks.

> Column pruning for CsvToStructs
> ---
>
> Key: SPARK-32968
> URL: https://issues.apache.org/jira/browse/SPARK-32968
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> We could do column pruning for the CsvToStructs expression if we only require 
> some of its fields.
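
For example, a query shaped like the sketch below consumes only field `a` of the parsed 
struct, so the CSV parser would not need to materialize the remaining fields (schema and 
column names are made up for illustration):

{code:scala}
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.from_csv
import org.apache.spark.sql.types._

object CsvPruningSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("csv-pruning").getOrCreate()
    import spark.implicits._

    val schema = new StructType()
      .add("a", IntegerType).add("b", StringType).add("c", DoubleType)

    val df = Seq("1,x,2.0", "2,y,3.5").toDF("value")
    // Only `csv.a` is selected, so CsvToStructs could be pruned to parse just field `a`.
    val pruned = df
      .select(from_csv($"value", schema, Map.empty[String, String]).as("csv"))
      .select($"csv.a")
    pruned.show()
    spark.stop()
  }
}
{code}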



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32968) Column pruning for CsvToStructs

2020-12-03 Thread L. C. Hsieh (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243608#comment-17243608
 ] 

L. C. Hsieh commented on SPARK-32968:
-

Sorry, but I am already working on it.

> Column pruning for CsvToStructs
> ---
>
> Key: SPARK-32968
> URL: https://issues.apache.org/jira/browse/SPARK-32968
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> We could do column pruning for the CsvToStructs expression if we only require 
> some of its fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33650) Misleading error from ALTER TABLE .. PARTITION for non-supported partition management table

2020-12-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33650:
-

Assignee: Maxim Gekk

> Misleading error from ALTER TABLE .. PARTITION for non-supported partition 
> management table
> ---
>
> Key: SPARK-33650
> URL: https://issues.apache.org/jira/browse/SPARK-33650
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
>
> For a V2 table that doesn't support partition management, ALTER TABLE .. 
> ADD/DROP PARTITION throws a misleading exception:
> {code:java}
> PartitionSpecs are not resolved;;
> 'AlterTableAddPartition [UnresolvedPartitionSpec(Map(id -> 1),None)], false
> +- ResolvedTable 
> org.apache.spark.sql.connector.InMemoryTableCatalog@2fd64b11, ns1.ns2.tbl, 
> org.apache.spark.sql.connector.InMemoryTable@5d3ff859
> org.apache.spark.sql.AnalysisException: PartitionSpecs are not resolved;;
> 'AlterTableAddPartition [UnresolvedPartitionSpec(Map(id -> 1),None)], false
> +- ResolvedTable 
> org.apache.spark.sql.connector.InMemoryTableCatalog@2fd64b11, ns1.ns2.tbl, 
> org.apache.spark.sql.connector.InMemoryTable@5d3ff859
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:49)
> {code}
> The error should say that the table doesn't support partition management.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33650) Misleading error from ALTER TABLE .. PARTITION for non-supported partition management table

2020-12-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33650.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30594
[https://github.com/apache/spark/pull/30594]

> Misleading error from ALTER TABLE .. PARTITION for non-supported partition 
> management table
> ---
>
> Key: SPARK-33650
> URL: https://issues.apache.org/jira/browse/SPARK-33650
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 3.1.0
>
>
> For a V2 table that doesn't support partition management, ALTER TABLE .. 
> ADD/DROP PARTITION throws a misleading exception:
> {code:java}
> PartitionSpecs are not resolved;;
> 'AlterTableAddPartition [UnresolvedPartitionSpec(Map(id -> 1),None)], false
> +- ResolvedTable 
> org.apache.spark.sql.connector.InMemoryTableCatalog@2fd64b11, ns1.ns2.tbl, 
> org.apache.spark.sql.connector.InMemoryTable@5d3ff859
> org.apache.spark.sql.AnalysisException: PartitionSpecs are not resolved;;
> 'AlterTableAddPartition [UnresolvedPartitionSpec(Map(id -> 1),None)], false
> +- ResolvedTable 
> org.apache.spark.sql.connector.InMemoryTableCatalog@2fd64b11, ns1.ns2.tbl, 
> org.apache.spark.sql.connector.InMemoryTable@5d3ff859
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:49)
> {code}
> The error should say that the table doesn't support partition management.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33520) make CrossValidator/TrainValidateSplit/OneVsRest Reader/Writer support Python backend estimator/evaluator

2020-12-03 Thread Weichen Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weichen Xu resolved SPARK-33520.

Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30471
[https://github.com/apache/spark/pull/30471]

> make CrossValidator/TrainValidateSplit/OneVsRest Reader/Writer support Python 
> backend estimator/evaluator
> -
>
> Key: SPARK-33520
> URL: https://issues.apache.org/jira/browse/SPARK-33520
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
> Fix For: 3.1.0
>
>
> Currently, PySpark supports third-party libraries defining Python-backend 
> estimators/evaluators, i.e., estimators that inherit from `Estimator` instead of 
> `JavaEstimator` and can only be used in PySpark.
> CrossValidator and TrainValidateSplit support tuning these Python-backend 
> estimators, but cannot support saving/loading, because the CrossValidator and 
> TrainValidateSplit writer implementation uses JavaMLWriter, which requires 
> converting the nested estimator and evaluator into Java instances.
> OneVsRest saving/loading currently only supports Java-backend classifiers due to a 
> similar issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33520) make CrossValidator/TrainValidateSplit/OneVsRest Reader/Writer support Python backend estimator/evaluator

2020-12-03 Thread Weichen Xu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weichen Xu reassigned SPARK-33520:
--

Assignee: Weichen Xu

> make CrossValidator/TrainValidateSplit/OneVsRest Reader/Writer support Python 
> backend estimator/evaluator
> -
>
> Key: SPARK-33520
> URL: https://issues.apache.org/jira/browse/SPARK-33520
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
>
> Currently, PySpark supports third-party libraries defining Python-backend 
> estimators/evaluators, i.e., estimators that inherit from `Estimator` instead of 
> `JavaEstimator` and can only be used in PySpark.
> CrossValidator and TrainValidateSplit support tuning these Python-backend 
> estimators, but cannot support saving/loading, because the CrossValidator and 
> TrainValidateSplit writer implementation uses JavaMLWriter, which requires 
> converting the nested estimator and evaluator into Java instances.
> OneVsRest saving/loading currently only supports Java-backend classifiers due to a 
> similar issue.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-32968) Column pruning for CsvToStructs

2020-12-03 Thread Yesheng Ma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243599#comment-17243599
 ] 

Yesheng Ma edited comment on SPARK-32968 at 12/4/20, 12:07 AM:
---

Looks like it is similar to https://issues.apache.org/jira/browse/SPARK-32958 
and I can help out if necessary.


was (Author: manifoldqaq):
Looks like it is similar to https://issues.apache.org/jira/browse/SPARK-32958 
and I can take a look.

> Column pruning for CsvToStructs
> ---
>
> Key: SPARK-32968
> URL: https://issues.apache.org/jira/browse/SPARK-32968
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> We could do column pruning for the CsvToStructs expression if we only require 
> some of its fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32968) Column pruning for CsvToStructs

2020-12-03 Thread Yesheng Ma (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243599#comment-17243599
 ] 

Yesheng Ma commented on SPARK-32968:


Looks like it is similar to https://issues.apache.org/jira/browse/SPARK-32958 
and I can take a look.

> Column pruning for CsvToStructs
> ---
>
> Key: SPARK-32968
> URL: https://issues.apache.org/jira/browse/SPARK-32968
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
>
> We could do column pruning for the CsvToStructs expression if we only require 
> some of its fields.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33295) Upgrade ORC to 1.6.6

2020-12-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33295:
--
Target Version/s: 3.2.0

>  Upgrade ORC to 1.6.6
> -
>
> Key: SPARK-33295
> URL: https://issues.apache.org/jira/browse/SPARK-33295
> Project: Spark
>  Issue Type: New Feature
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Jaanai Zhang
>Assignee: Dongjoon Hyun
>Priority: Major
>
> support zstd compression algorithm for ORC format
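
For reference, once an ORC release with zstd support is picked up, writing 
zstd-compressed ORC would presumably look like the sketch below (`df` is any 
DataFrame, the path is hypothetical):

{code:scala}
// "compression" is the standard ORC writer option; "zstd" assumes ORC 1.6.x support.
df.write
  .option("compression", "zstd")
  .orc("/tmp/orc-zstd-output")
{code}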



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33295) Upgrade ORC to 1.6.6

2020-12-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-33295:
--
Affects Version/s: (was: 3.1.0)
   3.2.0

>  Upgrade ORC to 1.6.6
> -
>
> Key: SPARK-33295
> URL: https://issues.apache.org/jira/browse/SPARK-33295
> Project: Spark
>  Issue Type: New Feature
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Jaanai Zhang
>Priority: Major
>
> support zstd compression algorithm for ORC format



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33295) Upgrade ORC to 1.6.6

2020-12-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33295:
-

Assignee: Dongjoon Hyun

>  Upgrade ORC to 1.6.6
> -
>
> Key: SPARK-33295
> URL: https://issues.apache.org/jira/browse/SPARK-33295
> Project: Spark
>  Issue Type: New Feature
>  Components: Build
>Affects Versions: 3.2.0
>Reporter: Jaanai Zhang
>Assignee: Dongjoon Hyun
>Priority: Major
>
> support zstd compression algorithm for ORC format



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-12-03 Thread Maxim Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243441#comment-17243441
 ] 

Maxim Gekk commented on SPARK-33571:


I opened the PR [https://github.com/apache/spark/pull/30596] with some 
improvements for config docs. [~hyukjin.kwon] [~cloud_fan] could you review it, 
please?

> Handling of hybrid to proleptic calendar when reading and writing Parquet 
> data not working correctly
> 
>
> Key: SPARK-33571
> URL: https://issues.apache.org/jira/browse/SPARK-33571
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Simon
>Priority: Major
>
> The handling of old dates written with older Spark versions (<2.4.6) using 
> the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working 
> correctly.
> From what I understand it should work like this:
>  * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 
> 1900-01-01T00:00:00Z
>  * Only applies when reading or writing parquet files
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead`
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should 
> show the same values in Spark 3.0.1. with for example `df.show()` as they did 
> in Spark 2.4.5
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps 
> should show different values in Spark 3.0.1. with for example `df.show()` as 
> they did in Spark 2.4.5
> * When writing parquet files with Spark > 3.0.0 which contain dates or 
> timestamps before the above mentioned moment in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite`
> First of all I'm not 100% sure all of this is correct. I've been unable to 
> find any clear documentation on the expected behavior. The understanding I 
> have was pieced together from the mailing list 
> ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)]
>  the blog post linked there and looking at the Spark code.
> From our testing we're seeing several issues:
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` which contain timestamps before 
> the above mentioned moments in time without `datetimeRebaseModeInRead` set 
> doesn't raise the `SparkUpgradeException`, it succeeds without any changes to 
> the resulting dataframe compared to that dataframe in Spark 2.4.5
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` or `DateType` which contain 
> dates or timestamps before the above mentioned moments in time with 
> `datetimeRebaseModeInRead` set to `LEGACY` results in the same values in the 
> dataframe as when using `CORRECTED`, so it seems like no rebasing is 
> happening.
> I've made some scripts to help with testing/show the behavior, it uses 
> pyspark 2.4.5, 2.4.6 and 3.0.1. You can find them here 
> [https://github.com/simonvanderveldt/spark3-rebasemode-issue]. I'll post the 
> outputs in a comment below as well.
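
As a hedged illustration of the knobs discussed above (config keys as understood for 
Spark 3.0.x; the paths are hypothetical):

{code:scala}
import org.apache.spark.sql.SparkSession

object RebaseModeSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("rebase-sketch").getOrCreate()
    // LEGACY rebases old dates/timestamps from the hybrid Julian+Gregorian calendar to
    // the proleptic Gregorian one on read; CORRECTED reads the stored values as-is;
    // EXCEPTION (the default) should raise SparkUpgradeException for ambiguous values.
    spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInRead", "LEGACY")
    val legacyDf = spark.read.parquet("/data/written-by-spark-2.4.5") // hypothetical path
    legacyDf.show()

    spark.conf.set("spark.sql.legacy.parquet.datetimeRebaseModeInWrite", "CORRECTED")
    legacyDf.write.mode("overwrite").parquet("/tmp/proleptic-output")
    spark.stop()
  }
}
{code}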



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33571:


Assignee: (was: Apache Spark)

> Handling of hybrid to proleptic calendar when reading and writing Parquet 
> data not working correctly
> 
>
> Key: SPARK-33571
> URL: https://issues.apache.org/jira/browse/SPARK-33571
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Simon
>Priority: Major
>
> The handling of old dates written with older Spark versions (<2.4.6) using 
> the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working 
> correctly.
> From what I understand it should work like this:
>  * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 
> 1900-01-01T00:00:00Z
>  * Only applies when reading or writing parquet files
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead`
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should 
> show the same values in Spark 3.0.1. with for example `df.show()` as they did 
> in Spark 2.4.5
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps 
> should show different values in Spark 3.0.1. with for example `df.show()` as 
> they did in Spark 2.4.5
> * When writing parquet files with Spark > 3.0.0 which contain dates or 
> timestamps before the above mentioned moment in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite`
> First of all I'm not 100% sure all of this is correct. I've been unable to 
> find any clear documentation on the expected behavior. The understanding I 
> have was pieced together from the mailing list 
> ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)]
>  the blog post linked there and looking at the Spark code.
> From our testing we're seeing several issues:
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` which contain timestamps before 
> the above mentioned moments in time without `datetimeRebaseModeInRead` set 
> doesn't raise the `SparkUpgradeException`, it succeeds without any changes to 
> the resulting dataframe compared to that dataframe in Spark 2.4.5
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` or `DateType` which contain 
> dates or timestamps before the above mentioned moments in time with 
> `datetimeRebaseModeInRead` set to `LEGACY` results in the same values in the 
> dataframe as when using `CORRECTED`, so it seems like no rebasing is 
> happening.
> I've made some scripts to help with testing/show the behavior, it uses 
> pyspark 2.4.5, 2.4.6 and 3.0.1. You can find them here 
> [https://github.com/simonvanderveldt/spark3-rebasemode-issue]. I'll post the 
> outputs in a comment below as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33571:


Assignee: Apache Spark

> Handling of hybrid to proleptic calendar when reading and writing Parquet 
> data not working correctly
> 
>
> Key: SPARK-33571
> URL: https://issues.apache.org/jira/browse/SPARK-33571
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Simon
>Assignee: Apache Spark
>Priority: Major
>
> The handling of old dates written with older Spark versions (<2.4.6) using 
> the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working 
> correctly.
> From what I understand it should work like this:
>  * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 
> 1900-01-01T00:00:00Z
>  * Only applies when reading or writing parquet files
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead`
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should 
> show the same values in Spark 3.0.1. with for example `df.show()` as they did 
> in Spark 2.4.5
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps 
> should show different values in Spark 3.0.1. with for example `df.show()` as 
> they did in Spark 2.4.5
> * When writing parquet files with Spark > 3.0.0 which contain dates or 
> timestamps before the above mentioned moment in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite`
> First of all I'm not 100% sure all of this is correct. I've been unable to 
> find any clear documentation on the expected behavior. The understanding I 
> have was pieced together from the mailing list 
> ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)]
>  the blog post linked there and looking at the Spark code.
> From our testing we're seeing several issues:
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` which contain timestamps before 
> the above mentioned moments in time without `datetimeRebaseModeInRead` set 
> doesn't raise the `SparkUpgradeException`, it succeeds without any changes to 
> the resulting dataframe compared to that dataframe in Spark 2.4.5
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` or `DateType` which contain 
> dates or timestamps before the above mentioned moments in time with 
> `datetimeRebaseModeInRead` set to `LEGACY` results in the same values in the 
> dataframe as when using `CORRECTED`, so it seems like no rebasing is 
> happening.
> I've made some scripts to help with testing/show the behavior, it uses 
> pyspark 2.4.5, 2.4.6 and 3.0.1. You can find them here 
> [https://github.com/simonvanderveldt/spark3-rebasemode-issue]. I'll post the 
> outputs in a comment below as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33571) Handling of hybrid to proleptic calendar when reading and writing Parquet data not working correctly

2020-12-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243440#comment-17243440
 ] 

Apache Spark commented on SPARK-33571:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30596

> Handling of hybrid to proleptic calendar when reading and writing Parquet 
> data not working correctly
> 
>
> Key: SPARK-33571
> URL: https://issues.apache.org/jira/browse/SPARK-33571
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.0.0, 3.0.1
>Reporter: Simon
>Priority: Major
>
> The handling of old dates written with older Spark versions (<2.4.6) using 
> the hybrid calendar in Spark 3.0.0 and 3.0.1 seems to be broken/not working 
> correctly.
> From what I understand it should work like this:
>  * Only relevant for `DateType` before 1582-10-15 or `TimestampType` before 
> 1900-01-01T00:00:00Z
>  * Only applies when reading or writing parquet files
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInRead`
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `LEGACY` the dates and timestamps should 
> show the same values in Spark 3.0.1. with for example `df.show()` as they did 
> in Spark 2.4.5
>  * When reading parquet files written with Spark < 2.4.6 which contain dates 
> or timestamps before the above mentioned moments in time and 
> `datetimeRebaseModeInRead` is set to `CORRECTED` the dates and timestamps 
> should show different values in Spark 3.0.1. with for example `df.show()` as 
> they did in Spark 2.4.5
> * When writing parquet files with Spark > 3.0.0 which contain dates or 
> timestamps before the above mentioned moment in time a 
> `SparkUpgradeException` should be raised informing the user to choose either 
> `LEGACY` or `CORRECTED` for the `datetimeRebaseModeInWrite`
> First of all I'm not 100% sure all of this is correct. I've been unable to 
> find any clear documentation on the expected behavior. The understanding I 
> have was pieced together from the mailing list 
> ([http://apache-spark-user-list.1001560.n3.nabble.com/Spark-3-0-1-new-Proleptic-Gregorian-calendar-td38914.html)]
>  the blog post linked there and looking at the Spark code.
> From our testing we're seeing several issues:
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` which contain timestamps before 
> the above mentioned moments in time without `datetimeRebaseModeInRead` set 
> doesn't raise the `SparkUpgradeException`, it succeeds without any changes to 
> the resulting dataframe compared to that dataframe in Spark 2.4.5
>  * Reading parquet data with Spark 3.0.1 that was written with Spark 2.4.5. 
> that contains fields of type `TimestampType` or `DateType` which contain 
> dates or timestamps before the above mentioned moments in time with 
> `datetimeRebaseModeInRead` set to `LEGACY` results in the same values in the 
> dataframe as when using `CORRECTED`, so it seems like no rebasing is 
> happening.
> I've made some scripts to help with testing/show the behavior, it uses 
> pyspark 2.4.5, 2.4.6 and 3.0.1. You can find them here 
> [https://github.com/simonvanderveldt/spark3-rebasemode-issue]. I'll post the 
> outputs in a comment below as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33651) allow CREATE EXTERNAL TABLE with LOCATION for data source tables

2020-12-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243413#comment-17243413
 ] 

Apache Spark commented on SPARK-33651:
--

User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/30595

> allow CREATE EXTERNAL TABLE with LOCATION for data source tables
> 
>
> Key: SPARK-33651
> URL: https://issues.apache.org/jira/browse/SPARK-33651
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33651) allow CREATE EXTERNAL TABLE with LOCATION for data source tables

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33651:


Assignee: Apache Spark  (was: Wenchen Fan)

> allow CREATE EXTERNAL TABLE with LOCATION for data source tables
> 
>
> Key: SPARK-33651
> URL: https://issues.apache.org/jira/browse/SPARK-33651
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33651) allow CREATE EXTERNAL TABLE with LOCATION for data source tables

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33651:


Assignee: Wenchen Fan  (was: Apache Spark)

> allow CREATE EXTERNAL TABLE with LOCATION for data source tables
> 
>
> Key: SPARK-33651
> URL: https://issues.apache.org/jira/browse/SPARK-33651
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33651) allow CREATE EXTERNAL TABLE with LOCATION for data source tables

2020-12-03 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-33651:

Summary: allow CREATE EXTERNAL TABLE with LOCATION for data source tables  
(was: allow CREATE EXTERNAL TABLE without LOCATION for data source tables)

> allow CREATE EXTERNAL TABLE with LOCATION for data source tables
> 
>
> Key: SPARK-33651
> URL: https://issues.apache.org/jira/browse/SPARK-33651
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33651) allow CREATE EXTERNAL TABLE without LOCATION for data source tables

2020-12-03 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-33651:
---

 Summary: allow CREATE EXTERNAL TABLE without LOCATION for data 
source tables
 Key: SPARK-33651
 URL: https://issues.apache.org/jira/browse/SPARK-33651
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Wenchen Fan
Assignee: Wenchen Fan






--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33634) use Analyzer in PlanResolutionSuite

2020-12-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33634.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30574
[https://github.com/apache/spark/pull/30574]

> use Analyzer in PlanResolutionSuite
> ---
>
> Key: SPARK-33634
> URL: https://issues.apache.org/jira/browse/SPARK-33634
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33623) Add canDeleteWhere to SupportsDelete

2020-12-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-33623.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 30562
[https://github.com/apache/spark/pull/30562]

> Add canDeleteWhere to SupportsDelete
> 
>
> Key: SPARK-33623
> URL: https://issues.apache.org/jira/browse/SPARK-33623
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
> Fix For: 3.1.0
>
>
> The only way to support delete statements right now is to implement 
> \{{SupportsDelete}}. According to its Javadoc, that interface is meant for 
> cases where we can delete data without much effort (e.g. deleting a 
> complete partition of a Hive table). It is clear we need a more sophisticated 
> API for row-level deletes. That's why it would be beneficial to add a method 
> to \{{SupportsDelete}} so that Spark can check whether a source can easily delete 
> data using only filters, or whether it will need a full rewrite later on. This 
> way, we have more control in the future.
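
A simplified sketch of the capability check described above; the real interface lives 
in Spark's DSv2 connector API and may differ in details:

{code:scala}
import org.apache.spark.sql.sources.Filter

// Before planning a metadata-only delete, Spark can ask the source whether the
// given filters are enough, and fall back to a row-level rewrite path otherwise.
trait SupportsDeleteSketch {
  def canDeleteWhere(filters: Array[Filter]): Boolean = true
  def deleteWhere(filters: Array[Filter]): Unit
}
{code}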



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33623) Add canDeleteWhere to SupportsDelete

2020-12-03 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-33623:
-

Assignee: Anton Okolnychyi

> Add canDeleteWhere to SupportsDelete
> 
>
> Key: SPARK-33623
> URL: https://issues.apache.org/jira/browse/SPARK-33623
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Anton Okolnychyi
>Assignee: Anton Okolnychyi
>Priority: Major
>
> The only way to support delete statements right now is to implement 
> \{{SupportsDelete}}. According to its Javadoc, that interface is meant for 
> cases where we can delete data without much effort (e.g. deleting a 
> complete partition of a Hive table). It is clear we need a more sophisticated 
> API for row-level deletes. That's why it would be beneficial to add a method 
> to \{{SupportsDelete}} so that Spark can check whether a source can easily delete 
> data using only filters, or whether it will need a full rewrite later on. This 
> way, we have more control in the future.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33650) Misleading error from ALTER TABLE .. PARTITION for non-supported partition management table

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33650:


Assignee: Apache Spark

> Misleading error from ALTER TABLE .. PARTITION for non-supported partition 
> management table
> ---
>
> Key: SPARK-33650
> URL: https://issues.apache.org/jira/browse/SPARK-33650
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Assignee: Apache Spark
>Priority: Major
>
> For a V2 table that doesn't support partition management, ALTER TABLE .. 
> ADD/DROP PARTITION throws a misleading exception:
> {code:java}
> PartitionSpecs are not resolved;;
> 'AlterTableAddPartition [UnresolvedPartitionSpec(Map(id -> 1),None)], false
> +- ResolvedTable 
> org.apache.spark.sql.connector.InMemoryTableCatalog@2fd64b11, ns1.ns2.tbl, 
> org.apache.spark.sql.connector.InMemoryTable@5d3ff859
> org.apache.spark.sql.AnalysisException: PartitionSpecs are not resolved;;
> 'AlterTableAddPartition [UnresolvedPartitionSpec(Map(id -> 1),None)], false
> +- ResolvedTable 
> org.apache.spark.sql.connector.InMemoryTableCatalog@2fd64b11, ns1.ns2.tbl, 
> org.apache.spark.sql.connector.InMemoryTable@5d3ff859
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:49)
> {code}
> The error should say that the table doesn't support partition management.
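A hypothetical sketch of the kind of early check that would produce a clearer
message (the helper name and the error text are illustrative, not the actual
fix):

{code:scala}
import org.apache.spark.sql.connector.catalog.{SupportsPartitionManagement, Table}

// Illustrative only: fail fast with a targeted message when the resolved table
// cannot manage partitions. In Spark itself such a check would live in the
// analysis phase and raise an AnalysisException; a plain exception is used here
// so the sketch stands alone.
def checkPartitionSupport(table: Table): Unit = table match {
  case _: SupportsPartitionManagement => // ok, proceed with ADD/DROP PARTITION
  case other =>
    throw new UnsupportedOperationException(
      s"Table ${other.name} does not support partition management.")
}
{code}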



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33650) Misleading error from ALTER TABLE .. PARTITION for non-supported partition management table

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33650:


Assignee: (was: Apache Spark)

> Misleading error from ALTER TABLE .. PARTITION for non-supported partition 
> management table
> ---
>
> Key: SPARK-33650
> URL: https://issues.apache.org/jira/browse/SPARK-33650
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> For a V2 table that doesn't support partition management, ALTER TABLE .. 
> ADD/DROP PARTITION throws a misleading exception:
> {code:java}
> PartitionSpecs are not resolved;;
> 'AlterTableAddPartition [UnresolvedPartitionSpec(Map(id -> 1),None)], false
> +- ResolvedTable 
> org.apache.spark.sql.connector.InMemoryTableCatalog@2fd64b11, ns1.ns2.tbl, 
> org.apache.spark.sql.connector.InMemoryTable@5d3ff859
> org.apache.spark.sql.AnalysisException: PartitionSpecs are not resolved;;
> 'AlterTableAddPartition [UnresolvedPartitionSpec(Map(id -> 1),None)], false
> +- ResolvedTable 
> org.apache.spark.sql.connector.InMemoryTableCatalog@2fd64b11, ns1.ns2.tbl, 
> org.apache.spark.sql.connector.InMemoryTable@5d3ff859
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:49)
> {code}
> The error should say that the table doesn't support partition management.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33650) Misleading error from ALTER TABLE .. PARTITION for non-supported partition management table

2020-12-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243372#comment-17243372
 ] 

Apache Spark commented on SPARK-33650:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/30594

> Misleading error from ALTER TABLE .. PARTITION for non-supported partition 
> management table
> ---
>
> Key: SPARK-33650
> URL: https://issues.apache.org/jira/browse/SPARK-33650
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Maxim Gekk
>Priority: Major
>
> For a V2 table that doesn't support partition management, ALTER TABLE .. 
> ADD/DROP PARTITION throws a misleading exception:
> {code:java}
> PartitionSpecs are not resolved;;
> 'AlterTableAddPartition [UnresolvedPartitionSpec(Map(id -> 1),None)], false
> +- ResolvedTable 
> org.apache.spark.sql.connector.InMemoryTableCatalog@2fd64b11, ns1.ns2.tbl, 
> org.apache.spark.sql.connector.InMemoryTable@5d3ff859
> org.apache.spark.sql.AnalysisException: PartitionSpecs are not resolved;;
> 'AlterTableAddPartition [UnresolvedPartitionSpec(Map(id -> 1),None)], false
> +- ResolvedTable 
> org.apache.spark.sql.connector.InMemoryTableCatalog@2fd64b11, ns1.ns2.tbl, 
> org.apache.spark.sql.connector.InMemoryTable@5d3ff859
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50)
>   at 
> org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:49)
> {code}
> The error should say that the table doesn't support partition management.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33650) Misleading error from ALTER TABLE .. PARTITION for non-supported partition management table

2020-12-03 Thread Maxim Gekk (Jira)
Maxim Gekk created SPARK-33650:
--

 Summary: Misleading error from ALTER TABLE .. PARTITION for 
non-supported partition management table
 Key: SPARK-33650
 URL: https://issues.apache.org/jira/browse/SPARK-33650
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.1.0
Reporter: Maxim Gekk


For a V2 table that doesn't support partition management, ALTER TABLE .. 
ADD/DROP PARTITION throws a misleading exception:
{code:java}
PartitionSpecs are not resolved;;
'AlterTableAddPartition [UnresolvedPartitionSpec(Map(id -> 1),None)], false
+- ResolvedTable org.apache.spark.sql.connector.InMemoryTableCatalog@2fd64b11, 
ns1.ns2.tbl, org.apache.spark.sql.connector.InMemoryTable@5d3ff859

org.apache.spark.sql.AnalysisException: PartitionSpecs are not resolved;;
'AlterTableAddPartition [UnresolvedPartitionSpec(Map(id -> 1),None)], false
+- ResolvedTable org.apache.spark.sql.connector.InMemoryTableCatalog@2fd64b11, 
ns1.ns2.tbl, org.apache.spark.sql.connector.InMemoryTable@5d3ff859

at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis(CheckAnalysis.scala:50)
at 
org.apache.spark.sql.catalyst.analysis.CheckAnalysis.failAnalysis$(CheckAnalysis.scala:49)
{code}

The error should say that the table doesn't support partition management.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-33629) spark.buffer.size not applied in driver from pyspark

2020-12-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-33629:
-
Fix Version/s: (was: 3.2.0)
   3.1.0

> spark.buffer.size not applied in driver from pyspark
> 
>
> Key: SPARK-33629
> URL: https://issues.apache.org/jira/browse/SPARK-33629
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.2, 3.1.0
>
>
> The problem has been discovered here: 
> [https://github.com/apache/spark/pull/30389#issuecomment-729524618]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33629) spark.buffer.size not applied in driver from pyspark

2020-12-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-33629:


Assignee: Gabor Somogyi

> spark.buffer.size not applied in driver from pyspark
> 
>
> Key: SPARK-33629
> URL: https://issues.apache.org/jira/browse/SPARK-33629
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
>
> The problem has been discovered here: 
> [https://github.com/apache/spark/pull/30389#issuecomment-729524618]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33629) spark.buffer.size not applied in driver from pyspark

2020-12-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-33629.
--
Fix Version/s: 3.2.0
   3.0.2
   Resolution: Fixed

Issue resolved by pull request 30592
[https://github.com/apache/spark/pull/30592]

> spark.buffer.size not applied in driver from pyspark
> 
>
> Key: SPARK-33629
> URL: https://issues.apache.org/jira/browse/SPARK-33629
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Gabor Somogyi
>Assignee: Gabor Somogyi
>Priority: Major
> Fix For: 3.0.2, 3.2.0
>
>
> The problem has been discovered here: 
> [https://github.com/apache/spark/pull/30389#issuecomment-729524618]
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27733) Upgrade to Avro 1.10.1

2020-12-03 Thread Ismaël Mejía (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ismaël Mejía updated SPARK-27733:
-
Summary: Upgrade to Avro 1.10.1  (was: Upgrade to Avro 1.10.0)

> Upgrade to Avro 1.10.1
> --
>
> Key: SPARK-27733
> URL: https://issues.apache.org/jira/browse/SPARK-27733
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, SQL
>Affects Versions: 3.1.0
>Reporter: Ismaël Mejía
>Priority: Major
>
> Avro 1.9.2 was released with many nice features, including reduced size (1MB 
> less), removed dependencies (no paranamer, no shaded guava), and security 
> updates, so it is probably a worthwhile upgrade.
> Avro 1.10.0 has since been released and this upgrade is still not done.
> As of 2020/08 there is still a blocker: Hive-related transitive dependencies 
> bring in older versions of Avro, so this remains blocked until HIVE-21737 is 
> solved.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33649) Improve the doc of spark.sql.ansi.enabled

2020-12-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243280#comment-17243280
 ] 

Apache Spark commented on SPARK-33649:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30593

> Improve the doc of spark.sql.ansi.enabled
> -
>
> Key: SPARK-33649
> URL: https://issues.apache.org/jira/browse/SPARK-33649
> Project: Spark
>  Issue Type: New Feature
>  Components: Documentation, SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> As there are more and more new features under the SQL configuration 
> spark.sql.ansi.enabled, we should make the documentation clearer about:
> 1. what exactly it is
> 2. where users can find all the features of the ANSI mode
> 3. whether all the features come exactly from the SQL standard
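For readers unfamiliar with the flag, a minimal illustration of the kind of
behaviour the documentation should spell out (assumes a spark-shell session; the
exact error raised varies across Spark versions):

{code:scala}
// Illustrative only: one of the behaviours gated by spark.sql.ansi.enabled.
spark.conf.set("spark.sql.ansi.enabled", "false")
spark.sql("SELECT CAST('not a number' AS INT)").show()  // prints null

spark.conf.set("spark.sql.ansi.enabled", "true")
// Under ANSI mode the same invalid cast fails at runtime instead of
// silently returning null.
spark.sql("SELECT CAST('not a number' AS INT)").show()  // throws a runtime error
{code}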



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33649) Improve the doc of spark.sql.ansi.enabled

2020-12-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243282#comment-17243282
 ] 

Apache Spark commented on SPARK-33649:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30593

> Improve the doc of spark.sql.ansi.enabled
> -
>
> Key: SPARK-33649
> URL: https://issues.apache.org/jira/browse/SPARK-33649
> Project: Spark
>  Issue Type: New Feature
>  Components: Documentation, SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> As there are more and more new features under the SQL configuration 
> spark.sql.ansi.enabled, we should make the documentation clearer about:
> 1. what exactly it is
> 2. where users can find all the features of the ANSI mode
> 3. whether all the features come exactly from the SQL standard



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33649) Improve the doc of spark.sql.ansi.enabled

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33649:


Assignee: Apache Spark  (was: Gengliang Wang)

> Improve the doc of spark.sql.ansi.enabled
> -
>
> Key: SPARK-33649
> URL: https://issues.apache.org/jira/browse/SPARK-33649
> Project: Spark
>  Issue Type: New Feature
>  Components: Documentation, SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> As there are more and more new features under the SQL configuration 
> spark.sql.ansi.enabled, we should make the documentation clearer about:
> 1. what exactly it is
> 2. where users can find all the features of the ANSI mode
> 3. whether all the features come exactly from the SQL standard



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33649) Improve the doc of spark.sql.ansi.enabled

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33649:


Assignee: Gengliang Wang  (was: Apache Spark)

> Improve the doc of spark.sql.ansi.enabled
> -
>
> Key: SPARK-33649
> URL: https://issues.apache.org/jira/browse/SPARK-33649
> Project: Spark
>  Issue Type: New Feature
>  Components: Documentation, SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> As there are more and more new features under the SQL configuration 
> spark.sql.ansi.enabled, we should make the documentation clearer about:
> 1. what exactly it is
> 2. where users can find all the features of the ANSI mode
> 3. whether all the features come exactly from the SQL standard



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-33649) Improve the doc of spark.sql.ansi.enabled

2020-12-03 Thread Gengliang Wang (Jira)
Gengliang Wang created SPARK-33649:
--

 Summary: Improve the doc of spark.sql.ansi.enabled
 Key: SPARK-33649
 URL: https://issues.apache.org/jira/browse/SPARK-33649
 Project: Spark
  Issue Type: New Feature
  Components: Documentation, SQL
Affects Versions: 3.1.0
Reporter: Gengliang Wang
Assignee: Gengliang Wang


As there are more and more new features under the SQL configuration 
spark.sql.ansi.enabled, we should make the documentation clearer about:
1. what exactly it is
2. where users can find all the features of the ANSI mode
3. whether all the features come exactly from the SQL standard



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30098) Add a configuration to use default datasource as provider for CREATE TABLE command

2020-12-03 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-30098:

Description: 
Change the default provider from `hive` to the value of 
`spark.sql.sources.default` for the "CREATE TABLE" command, to make it 
consistent with the DataFrameWriter.saveAsTable API, w.r.t. the new config. (By 
default we don't change the table provider.)

Also, it is friendlier to end users, since Spark is well known for using 
parquet (the default value of `spark.sql.sources.default`) as its default I/O format.

  was:
Change the default provider from `hive` to the value of 
`spark.sql.sources.default` for the "CREATE TABLE" command, to make it 
consistent with the DataFrameWriter.saveAsTable API.

Also, it is friendlier to end users, since Spark is well known for using 
parquet (the default value of `spark.sql.sources.default`) as its default I/O format.


> Add a configuration to use default datasource as provider for CREATE TABLE 
> command
> --
>
> Key: SPARK-30098
> URL: https://issues.apache.org/jira/browse/SPARK-30098
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.1.0
>
>
> Change the default provider from `hive` to the value of 
> `spark.sql.sources.default` for the "CREATE TABLE" command, to make it 
> consistent with the DataFrameWriter.saveAsTable API, w.r.t. the new config. (By 
> default we don't change the table provider.)
> Also, it is friendlier to end users, since Spark is well known for using 
> parquet (the default value of `spark.sql.sources.default`) as its default I/O 
> format.
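A minimal illustration of the behavioural difference the new configuration
controls (assumes a spark-shell session; the name of the config that gates the
switch is defined in the linked PR and is not spelled out here):

{code:scala}
// Illustrative only: with the new default-datasource behaviour enabled, a
// CREATE TABLE without a USING clause picks up spark.sql.sources.default
// (parquet by default) instead of creating a Hive SerDe table, i.e. it
// behaves like the explicit form below.
spark.sql("CREATE TABLE events (id BIGINT, ts TIMESTAMP)")
spark.sql("CREATE TABLE events_explicit (id BIGINT, ts TIMESTAMP) USING parquet")
// With the legacy behaviour, the first statement would create a Hive table.
{code}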



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-30098) Add a configuration to use default datasource as provider for CREATE TABLE command

2020-12-03 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-30098:

Summary: Add a configuration to use default datasource as provider for 
CREATE TABLE command  (was: Use default datasource as provider for CREATE TABLE 
command)

> Add a configuration to use default datasource as provider for CREATE TABLE 
> command
> --
>
> Key: SPARK-30098
> URL: https://issues.apache.org/jira/browse/SPARK-30098
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.1.0
>
>
> Change the default provider from `hive` to the value of 
> `spark.sql.sources.default` for the "CREATE TABLE" command, to make it 
> consistent with the DataFrameWriter.saveAsTable API.
> Also, it is friendlier to end users, since Spark is well known for using 
> parquet (the default value of `spark.sql.sources.default`) as its default I/O 
> format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-30098) Use default datasource as provider for CREATE TABLE command

2020-12-03 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-30098.
-
Fix Version/s: (was: 3.0.0)
   3.1.0
   Resolution: Fixed

Issue resolved by pull request 30554
[https://github.com/apache/spark/pull/30554]

> Use default datasource as provider for CREATE TABLE command
> ---
>
> Key: SPARK-30098
> URL: https://issues.apache.org/jira/browse/SPARK-30098
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.1.0
>
>
> Change the default provider from `hive` to the value of 
> `spark.sql.sources.default` for the "CREATE TABLE" command, to make it 
> consistent with the DataFrameWriter.saveAsTable API.
> Also, it is friendlier to end users, since Spark is well known for using 
> parquet (the default value of `spark.sql.sources.default`) as its default I/O 
> format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-30098) Use default datasource as provider for CREATE TABLE command

2020-12-03 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-30098:
---

Assignee: Wenchen Fan

> Use default datasource as provider for CREATE TABLE command
> ---
>
> Key: SPARK-30098
> URL: https://issues.apache.org/jira/browse/SPARK-30098
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Assignee: Wenchen Fan
>Priority: Major
> Fix For: 3.0.0
>
>
> Change the default provider from `hive` to the value of 
> `spark.sql.sources.default` for the "CREATE TABLE" command, to make it 
> consistent with the DataFrameWriter.saveAsTable API.
> Also, it is friendlier to end users, since Spark is well known for using 
> parquet (the default value of `spark.sql.sources.default`) as its default I/O 
> format.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-33646) Add new function DATE_FROM_UNIX_DATE and UNIX_DATE

2020-12-03 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-33646.
-
Fix Version/s: 3.1.0
   Resolution: Fixed

> Add new function DATE_FROM_UNIX_DATE and UNIX_DATE
> --
>
> Key: SPARK-33646
> URL: https://issues.apache.org/jira/browse/SPARK-33646
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.1.0
>
>
> h2. What changes were proposed in this pull request?
> Add new functions DATE_FROM_UNIX_DATE and UNIX_DATE for conversion between 
> Date type and Numeric types.
> h2. Why are the changes needed?
> 1. Explicit conversion between Date type and Numeric types is disallowed in 
> ANSI mode. We need to provide new functions for users to complete the 
> conversion.
> 2. We have introduced new functions from BigQuery for conversion between 
> Timestamp type and Numeric types: TIMESTAMP_SECONDS, TIMESTAMP_MILLIS, 
> TIMESTAMP_MICROS, UNIX_SECONDS, UNIX_MILLIS, and UNIX_MICROS. It makes sense 
> to add functions for conversion between Date type and Numeric types as well.
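Expected usage, modelled on the BigQuery functions of the same name that the
description points to (illustrative; day counts are relative to 1970-01-01):

{code:scala}
// Illustrative only: conversion between the Date type and an integer day count.
spark.sql("SELECT DATE_FROM_UNIX_DATE(1)").show()         // 1970-01-02
spark.sql("SELECT UNIX_DATE(DATE'1970-01-02')").show()    // 1
{code}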



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33646) Add new function DATE_FROM_UNIX_DATE and UNIX_DATE

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33646:


Assignee: Gengliang Wang  (was: Apache Spark)

> Add new function DATE_FROM_UNIX_DATE and UNIX_DATE
> --
>
> Key: SPARK-33646
> URL: https://issues.apache.org/jira/browse/SPARK-33646
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> h2. What changes were proposed in this pull request?
> Add new functions DATE_FROM_UNIX_DATE and UNIX_DATE for conversion between 
> Date type and Numeric types.
> h2. Why are the changes needed?
> 1. Explicit conversion between Date type and Numeric types is disallowed in 
> ANSI mode. We need to provide new functions for users to complete the 
> conversion.
> 2. We have introduced new functions from BigQuery for conversion between 
> Timestamp type and Numeric types: TIMESTAMP_SECONDS, TIMESTAMP_MILLIS, 
> TIMESTAMP_MICROS, UNIX_SECONDS, UNIX_MILLIS, and UNIX_MICROS. It makes sense 
> to add functions for conversion between Date type and Numeric types as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33646) Add new function DATE_FROM_UNIX_DATE and UNIX_DATE

2020-12-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33646?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17243198#comment-17243198
 ] 

Apache Spark commented on SPARK-33646:
--

User 'gengliangwang' has created a pull request for this issue:
https://github.com/apache/spark/pull/30588

> Add new function DATE_FROM_UNIX_DATE and UNIX_DATE
> --
>
> Key: SPARK-33646
> URL: https://issues.apache.org/jira/browse/SPARK-33646
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
>
> h2. What changes were proposed in this pull request?
> Add new functions DATE_FROM_UNIX_DATE and UNIX_DATE for conversion between 
> Date type and Numeric types.
> h2. Why are the changes needed?
> 1. Explicit conversion between Date type and Numeric types is disallowed in 
> ANSI mode. We need to provide new functions for users to complete the 
> conversion.
> 2. We have introduced new functions from BigQuery for conversion between 
> Timestamp type and Numeric types: TIMESTAMP_SECONDS, TIMESTAMP_MILLIS, 
> TIMESTAMP_MICROS, UNIX_SECONDS, UNIX_MILLIS, and UNIX_MICROS. It makes sense 
> to add functions for conversion between Date type and Numeric types as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-33646) Add new function DATE_FROM_UNIX_DATE and UNIX_DATE

2020-12-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-33646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-33646:


Assignee: Apache Spark  (was: Gengliang Wang)

> Add new function DATE_FROM_UNIX_DATE and UNIX_DATE
> --
>
> Key: SPARK-33646
> URL: https://issues.apache.org/jira/browse/SPARK-33646
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Gengliang Wang
>Assignee: Apache Spark
>Priority: Major
>
> h2. What changes were proposed in this pull request?
> Add new functions DATE_FROM_UNIX_DATE and UNIX_DATE for conversion between 
> Date type and Numeric types.
> h2. Why are the changes needed?
> 1. Explicit conversion between Date type and Numeric types is disallowed in 
> ANSI mode. We need to provide new functions for users to complete the 
> conversion.
> 2. We have introduced new functions from BigQuery for conversion between 
> Timestamp type and Numeric types: TIMESTAMP_SECONDS, TIMESTAMP_MILLIS, 
> TIMESTAMP_MICROS, UNIX_SECONDS, UNIX_MILLIS, and UNIX_MICROS. It makes sense 
> to add functions for conversion between Date type and Numeric types as well.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


