[jira] [Created] (SPARK-37266) Optimize the analysis for view text of persist view and fix security vulnerabilities caused by sql tampering

2021-11-10 Thread jiaan.geng (Jira)
jiaan.geng created SPARK-37266:
--

 Summary: Optimize the analysis for view text of persist view and 
fix security vulnerabilities caused by sql tampering 
 Key: SPARK-37266
 URL: https://issues.apache.org/jira/browse/SPARK-37266
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.3.0
Reporter: jiaan.geng


The current implementation of persist view creates a Hive table that stores the view text.
The view text is just a query string, so an attacker may tamper with it through various means.
For example:
{code:java}
select * from tab1
{code}
could be tampered with and turned into
{code:java}
drop table tab1
{code}
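
As a rough illustration (table and view names below are hypothetical, not from this issue), the sketch shows that a persistent view stores only its SQL text, which is re-parsed and re-analyzed every time the view is read, so whatever string is stored is what gets executed:
{code:scala}
// Minimal sketch, assuming a Hive-backed metastore; names are illustrative only.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().enableHiveSupport().getOrCreate()

spark.sql("CREATE TABLE IF NOT EXISTS tab1 (id INT) USING parquet")
// The view definition is persisted in the metastore as a plain query string.
spark.sql("CREATE VIEW v1 AS SELECT * FROM tab1")
// Reading the view re-parses and re-analyzes whatever text is currently stored,
// so a tampered string would be executed as-is at this point.
spark.sql("SELECT * FROM v1").show()
{code}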




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37266) Optimize the analysis for view text of persistent view and fix security vulnerabilities caused by sql tampering

2021-11-10 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-37266:
---
Description: 
The current implementation of a persistent view creates a Hive table that stores the view text.
The view text is just a query string, so an attacker may tamper with it through various means.
For example:
{code:java}
select * from tab1
{code}
could be tampered with and turned into
{code:java}
drop table tab1
{code}


  was:
The current implementation of persist view creates a Hive table that stores the view text.
The view text is just a query string, so an attacker may tamper with it through various means.
For example:
{code:java}
select * from tab1
{code}
could be tampered with and turned into
{code:java}
drop table tab1
{code}



> Optimize the analysis for view text of persistent view and fix security 
> vulnerabilities caused by sql tampering 
> 
>
> Key: SPARK-37266
> URL: https://issues.apache.org/jira/browse/SPARK-37266
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> The current implementation of a persistent view creates a Hive table that stores the view text.
> The view text is just a query string, so an attacker may tamper with it through various means.
> For example:
> {code:java}
> select * from tab1
> {code}
> could be tampered with and turned into
> {code:java}
> drop table tab1
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37266) Optimize the analysis for view text of persistent view and fix security vulnerabilities caused by sql tampering

2021-11-10 Thread jiaan.geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiaan.geng updated SPARK-37266:
---
Summary: Optimize the analysis for view text of persistent view and fix 
security vulnerabilities caused by sql tampering   (was: Optimize the analysis 
for view text of persist view and fix security vulnerabilities caused by sql 
tampering )

> Optimize the analysis for view text of persistent view and fix security 
> vulnerabilities caused by sql tampering 
> 
>
> Key: SPARK-37266
> URL: https://issues.apache.org/jira/browse/SPARK-37266
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> The current implementation of persist view creates a Hive table that stores the view text.
> The view text is just a query string, so an attacker may tamper with it through various means.
> For example:
> {code:java}
> select * from tab1
> {code}
> could be tampered with and turned into
> {code:java}
> drop table tab1
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37267) OptimizeSkewInRebalancePartitions support optimize non-root node

2021-11-10 Thread XiDuo You (Jira)
XiDuo You created SPARK-37267:
-

 Summary: OptimizeSkewInRebalancePartitions support optimize 
non-root node
 Key: SPARK-37267
 URL: https://issues.apache.org/jira/browse/SPARK-37267
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.3.0
Reporter: XiDuo You


`OptimizeSkewInRebalancePartitions` is currently applied only when `RebalancePartitions` 
is the root node, but sometimes we want a local sort after the `RebalancePartitions`, 
which can improve the compression ratio.

After SPARK-36184, it is easy to validate whether the rule introduces an extra 
shuffle, and the output partitioning is ensured by `AQEShuffleReadExec.outputPartitioning`.

So it is safe to make `OptimizeSkewInRebalancePartitions` support optimizing 
non-root nodes.
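
As a rough illustration of the motivating write pattern (table, column, and path names below are made up, not from this issue): a rebalance followed by a local sort before writing, so that similar rows land next to each other and compress better.
{code:scala}
// Minimal sketch, assuming Spark 3.2+ where the REBALANCE hint produces a
// RebalancePartitions node; names are illustrative only.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()

spark.sql("SELECT /*+ REBALANCE */ * FROM events")
  .sortWithinPartitions("event_date") // local sort sits above the rebalance, so
                                      // RebalancePartitions is no longer the root node
  .write
  .parquet("/tmp/events_compacted")
{code}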



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37267) OptimizeSkewInRebalancePartitions support optimize non-root node

2021-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441578#comment-17441578
 ] 

Apache Spark commented on SPARK-37267:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/34542

> OptimizeSkewInRebalancePartitions support optimize non-root node
> 
>
> Key: SPARK-37267
> URL: https://issues.apache.org/jira/browse/SPARK-37267
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Priority: Major
>
> `OptimizeSkewInRebalancePartitions` is currently applied only when 
> `RebalancePartitions` is the root node, but sometimes we want a local sort 
> after the `RebalancePartitions`, which can improve the compression ratio.
> After SPARK-36184, it is easy to validate whether the rule introduces an 
> extra shuffle, and the output partitioning is ensured by 
> `AQEShuffleReadExec.outputPartitioning`.
> So it is safe to make `OptimizeSkewInRebalancePartitions` support optimizing 
> non-root nodes.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37267) OptimizeSkewInRebalancePartitions support optimize non-root node

2021-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37267:


Assignee: (was: Apache Spark)

> OptimizeSkewInRebalancePartitions support optimize non-root node
> 
>
> Key: SPARK-37267
> URL: https://issues.apache.org/jira/browse/SPARK-37267
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Priority: Major
>
> `OptimizeSkewInRebalancePartitions` is currently applied only when 
> `RebalancePartitions` is the root node, but sometimes we want a local sort 
> after the `RebalancePartitions`, which can improve the compression ratio.
> After SPARK-36184, it is easy to validate whether the rule introduces an 
> extra shuffle, and the output partitioning is ensured by 
> `AQEShuffleReadExec.outputPartitioning`.
> So it is safe to make `OptimizeSkewInRebalancePartitions` support optimizing 
> non-root nodes.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37267) OptimizeSkewInRebalancePartitions support optimize non-root node

2021-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37267:


Assignee: Apache Spark

> OptimizeSkewInRebalancePartitions support optimize non-root node
> 
>
> Key: SPARK-37267
> URL: https://issues.apache.org/jira/browse/SPARK-37267
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: XiDuo You
>Assignee: Apache Spark
>Priority: Major
>
> `OptimizeSkewInRebalancePartitions` is currently applied only when 
> `RebalancePartitions` is the root node, but sometimes we want a local sort 
> after the `RebalancePartitions`, which can improve the compression ratio.
> After SPARK-36184, it is easy to validate whether the rule introduces an 
> extra shuffle, and the output partitioning is ensured by 
> `AQEShuffleReadExec.outputPartitioning`.
> So it is safe to make `OptimizeSkewInRebalancePartitions` support optimizing 
> non-root nodes.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37266) Optimize the analysis for view text of persistent view and fix security vulnerabilities caused by sql tampering

2021-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441580#comment-17441580
 ] 

Apache Spark commented on SPARK-37266:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/34543

> Optimize the analysis for view text of persistent view and fix security 
> vulnerabilities caused by sql tampering 
> 
>
> Key: SPARK-37266
> URL: https://issues.apache.org/jira/browse/SPARK-37266
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> The current implementation of a persistent view creates a Hive table that stores the view text.
> The view text is just a query string, so an attacker may tamper with it through various means.
> For example:
> {code:java}
> select * from tab1
> {code}
> could be tampered with and turned into
> {code:java}
> drop table tab1
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37266) Optimize the analysis for view text of persistent view and fix security vulnerabilities caused by sql tampering

2021-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37266:


Assignee: Apache Spark

> Optimize the analysis for view text of persistent view and fix security 
> vulnerabilities caused by sql tampering 
> 
>
> Key: SPARK-37266
> URL: https://issues.apache.org/jira/browse/SPARK-37266
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>
> The current implementation of a persistent view creates a Hive table that stores the view text.
> The view text is just a query string, so an attacker may tamper with it through various means.
> For example:
> {code:java}
> select * from tab1
> {code}
> could be tampered with and turned into
> {code:java}
> drop table tab1
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37266) Optimize the analysis for view text of persistent view and fix security vulnerabilities caused by sql tampering

2021-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37266:


Assignee: (was: Apache Spark)

> Optimize the analysis for view text of persistent view and fix security 
> vulnerabilities caused by sql tampering 
> 
>
> Key: SPARK-37266
> URL: https://issues.apache.org/jira/browse/SPARK-37266
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: jiaan.geng
>Priority: Major
>
> The current implementation of a persistent view creates a Hive table that stores the view text.
> The view text is just a query string, so an attacker may tamper with it through various means.
> For example:
> {code:java}
> select * from tab1
> {code}
> could be tampered with and turned into
> {code:java}
> drop table tab1
> {code}



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37022) Use black as a formatter for the whole PySpark codebase.

2021-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-37022:


Assignee: Maciej Szymkiewicz

> Use black as a formatter for the whole PySpark codebase.
> 
>
> Key: SPARK-37022
> URL: https://issues.apache.org/jira/browse/SPARK-37022
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Major
> Attachments: black-diff-stats.txt, pyproject.toml
>
>
> [{{black}}|https://github.com/psf/black] is a popular Python code formatter. 
> It is used by a number of projects, both small and large, including prominent 
> ones, like pandas, scikit-learn, Django or SQLAlchemy. Black is already used 
> to format {{pyspark.pandas}} and (though not enforced) stub files.
> We should consider using black to enforce formatting of all PySpark files. 
> There are multiple reasons to do that:
>  - Consistency: black is already used across the existing codebase, and 
> black-formatted chunks of code have already been added to modules other than 
> pyspark.pandas as a result of type hint inlining (SPARK-36845).
>  - Lower cost of contributing and reviewing: Formatting can be automatically 
> enforced and applied.
>  - Simplify reviews: In general, black-formatted code produces small and 
> highly readable diffs.
>  - Reduce effort required to maintain patched forks: smaller diffs + 
> predictable formatting.
> Risks:
>  - Initial reformatting requires quite significant changes.
>  - Applying black will break blame in GitHub UI (for git in general see 
> [Avoiding ruining git 
> blame|https://black.readthedocs.io/en/stable/guides/introducing_black_to_your_project.html?highlight=blame#avoiding-ruining-git-blame]).
> Additional steps:
>  - To simplify backporting, black will have to be applied to all active 
> branches.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37022) Use black as a formatter for the whole PySpark codebase.

2021-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37022.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34297
[https://github.com/apache/spark/pull/34297]

> Use black as a formatter for the whole PySpark codebase.
> 
>
> Key: SPARK-37022
> URL: https://issues.apache.org/jira/browse/SPARK-37022
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: black-diff-stats.txt, pyproject.toml
>
>
> [{{black}}|https://github.com/psf/black] is a popular Python code formatter. 
> It is used by a number of projects, both small and large, including prominent 
> ones, like pandas, scikit-learn, Django or SQLAlchemy. Black is already used 
> to format {{pyspark.pandas}} and (though not enforced) stub files.
> We should consider using black to enforce formatting of all PySpark files. 
> There are multiple reasons to do that:
>  - Consistency: black is already used across the existing codebase, and 
> black-formatted chunks of code have already been added to modules other than 
> pyspark.pandas as a result of type hint inlining (SPARK-36845).
>  - Lower cost of contributing and reviewing: Formatting can be automatically 
> enforced and applied.
>  - Simplify reviews: In general, black-formatted code produces small and 
> highly readable diffs.
>  - Reduce effort required to maintain patched forks: smaller diffs + 
> predictable formatting.
> Risks:
>  - Initial reformatting requires quite significant changes.
>  - Applying black will break blame in GitHub UI (for git in general see 
> [Avoiding ruining git 
> blame|https://black.readthedocs.io/en/stable/guides/introducing_black_to_your_project.html?highlight=blame#avoiding-ruining-git-blame]).
> Additional steps:
>  - To simplify backporting, black will have to be applied to all active 
> branches.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37022) Use black as a formatter for the whole PySpark codebase.

2021-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441610#comment-17441610
 ] 

Apache Spark commented on SPARK-37022:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/34544

> Use black as a formatter for the whole PySpark codebase.
> 
>
> Key: SPARK-37022
> URL: https://issues.apache.org/jira/browse/SPARK-37022
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: black-diff-stats.txt, pyproject.toml
>
>
> [{{black}}|https://github.com/psf/black] is a popular Python code formatter. 
> It is used by a number of projects, both small and large, including prominent 
> ones, like pandas, scikit-learn, Django or SQLAlchemy. Black is already used 
> to format {{pyspark.pandas}} and (though not enforced) stub files.
> We should consider using black to enforce formatting of all PySpark files. 
> There are multiple reasons to do that:
>  - Consistency: black is already used across the existing codebase, and 
> black-formatted chunks of code have already been added to modules other than 
> pyspark.pandas as a result of type hint inlining (SPARK-36845).
>  - Lower cost of contributing and reviewing: Formatting can be automatically 
> enforced and applied.
>  - Simplify reviews: In general, black-formatted code produces small and 
> highly readable diffs.
>  - Reduce effort required to maintain patched forks: smaller diffs + 
> predictable formatting.
> Risks:
>  - Initial reformatting requires quite significant changes.
>  - Applying black will break blame in GitHub UI (for git in general see 
> [Avoiding ruining git 
> blame|https://black.readthedocs.io/en/stable/guides/introducing_black_to_your_project.html?highlight=blame#avoiding-ruining-git-blame]).
> Additional steps:
>  - To simplify backporting, black will have to be applied to all active 
> branches.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37022) Use black as a formatter for the whole PySpark codebase.

2021-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441612#comment-17441612
 ] 

Apache Spark commented on SPARK-37022:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/34544

> Use black as a formatter for the whole PySpark codebase.
> 
>
> Key: SPARK-37022
> URL: https://issues.apache.org/jira/browse/SPARK-37022
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Major
> Fix For: 3.3.0
>
> Attachments: black-diff-stats.txt, pyproject.toml
>
>
> [{{black}}|https://github.com/psf/black] is a popular Python code formatter. 
> It is used by a number of projects, both small and large, including prominent 
> ones, like pandas, scikit-learn, Django or SQLAlchemy. Black is already used 
> to format {{pyspark.pandas}} and (though not enforced) stub files.
> We should consider using black to enforce formatting of all PySpark files. 
> There are multiple reasons to do that:
>  - Consistency: black is already used across the existing codebase, and 
> black-formatted chunks of code have already been added to modules other than 
> pyspark.pandas as a result of type hint inlining (SPARK-36845).
>  - Lower cost of contributing and reviewing: Formatting can be automatically 
> enforced and applied.
>  - Simplify reviews: In general, black-formatted code produces small and 
> highly readable diffs.
>  - Reduce effort required to maintain patched forks: smaller diffs + 
> predictable formatting.
> Risks:
>  - Initial reformatting requires quite significant changes.
>  - Applying black will break blame in GitHub UI (for git in general see 
> [Avoiding ruining git 
> blame|https://black.readthedocs.io/en/stable/guides/introducing_black_to_your_project.html?highlight=blame#avoiding-ruining-git-blame]).
> Additional steps:
>  - To simplify backporting, black will have to be applied to all active 
> branches.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37268) Remove unused method call in FileScanRDD

2021-11-10 Thread Junfan Zhang (Jira)
Junfan Zhang created SPARK-37268:


 Summary: Remove unused method call in FileScanRDD
 Key: SPARK-37268
 URL: https://issues.apache.org/jira/browse/SPARK-37268
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: Junfan Zhang






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37268) Remove unused method call in FileScanRDD

2021-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37268:


Assignee: (was: Apache Spark)

> Remove unused method call in FileScanRDD
> 
>
> Key: SPARK-37268
> URL: https://issues.apache.org/jira/browse/SPARK-37268
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Junfan Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37268) Remove unused method call in FileScanRDD

2021-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37268:


Assignee: Apache Spark

> Remove unused method call in FileScanRDD
> 
>
> Key: SPARK-37268
> URL: https://issues.apache.org/jira/browse/SPARK-37268
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Junfan Zhang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37268) Remove unused method call in FileScanRDD

2021-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37268?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441669#comment-17441669
 ] 

Apache Spark commented on SPARK-37268:
--

User 'zuston' has created a pull request for this issue:
https://github.com/apache/spark/pull/34545

> Remove unused method call in FileScanRDD
> 
>
> Key: SPARK-37268
> URL: https://issues.apache.org/jira/browse/SPARK-37268
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Junfan Zhang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37269) The partitionOverwriteMode option is not respected when using insertInto

2021-11-10 Thread David Szakallas (Jira)
David Szakallas created SPARK-37269:
---

 Summary: The partitionOverwriteMode option is not respected when 
using insertInto
 Key: SPARK-37269
 URL: https://issues.apache.org/jira/browse/SPARK-37269
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.2.0
Reporter: David Szakallas


From the documentation of the {{spark.sql.sources.partitionOverwriteMode}} 
configuration option:
{quote}This can also be set as an output option for a data source using key 
partitionOverwriteMode (which takes precedence over this setting), e.g. 
dataframe.write.option("partitionOverwriteMode", "dynamic").save(path).
{quote}
This is true when using .save(); however, .insertInto() does not respect the 
output option.
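
A minimal sketch of the two write paths being compared; the table, column, and path names below are made up for illustration:
{code:scala}
// Sketch only; assumes a partitioned table "events" already exists.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().getOrCreate()
val df = spark.table("staging_events")

// Path-based write: the per-write option is honored, as the docs describe.
df.write
  .mode("overwrite")
  .option("partitionOverwriteMode", "dynamic")
  .partitionBy("dt")
  .parquet("/warehouse/events")

// Table insert: per this report, the same option is ignored here and only the
// session-level spark.sql.sources.partitionOverwriteMode setting applies.
df.write
  .mode("overwrite")
  .option("partitionOverwriteMode", "dynamic")
  .insertInto("events")
{code}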

 

 



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37270) Incorrect result of filter using isNull condition

2021-11-10 Thread Tomasz Kus (Jira)
Tomasz Kus created SPARK-37270:
--

 Summary: Incorrect result of filter using isNull condition
 Key: SPARK-37270
 URL: https://issues.apache.org/jira/browse/SPARK-37270
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.2.0
Reporter: Tomasz Kus


Simple code that reproduces this issue:
{code:java}
 val frame = Seq((false, 1)).toDF("bool", "number")
frame
  .checkpoint()
  .withColumn("conditions", when(col("bool"), "I am not null"))
  .filter(col("conditions").isNull)
  .show(false){code}
Although "conditions" column is null
{code:java}
 +-+--+--+
|bool |number|conditions|
+-+--+--+
|false|1     |null      |
+-+--+--+{code}
empty result is shown.

Execution plans:
{code:java}
== Parsed Logical Plan ==
'Filter isnull('conditions)
+- Project [bool#124, number#125, CASE WHEN bool#124 THEN I am not null END AS 
conditions#252]
   +- LogicalRDD [bool#124, number#125], false

== Analyzed Logical Plan ==
bool: boolean, number: int, conditions: string
Filter isnull(conditions#252)
+- Project [bool#124, number#125, CASE WHEN bool#124 THEN I am not null END AS 
conditions#252]
   +- LogicalRDD [bool#124, number#125], false

== Optimized Logical Plan ==
LocalRelation , [bool#124, number#125, conditions#252]

== Physical Plan ==
LocalTableScan , [bool#124, number#125, conditions#252]
 {code}
After removing the checkpoint, the correct result is returned and the execution 
plans are as follows:
{code:java}
== Parsed Logical Plan ==
'Filter isnull('conditions)
+- Project [bool#124, number#125, CASE WHEN bool#124 THEN I am not null END AS 
conditions#256]
   +- Project [_1#119 AS bool#124, _2#120 AS number#125]
      +- LocalRelation [_1#119, _2#120]

== Analyzed Logical Plan ==
bool: boolean, number: int, conditions: string
Filter isnull(conditions#256)
+- Project [bool#124, number#125, CASE WHEN bool#124 THEN I am not null END AS 
conditions#256]
   +- Project [_1#119 AS bool#124, _2#120 AS number#125]
      +- LocalRelation [_1#119, _2#120]

== Optimized Logical Plan ==
LocalRelation [bool#124, number#125, conditions#256]

== Physical Plan ==
LocalTableScan [bool#124, number#125, conditions#256]
 {code}
It seems that the most important difference is LogicalRDD -> LocalRelation.

The following workarounds retrieve the correct result:

1) remove the checkpoint

2) add an explicit .otherwise(null) to the when expression (see the sketch below)

3) add checkpoint() or cache() just before the filter

4) downgrade to Spark 3.1.2
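
A minimal sketch of workaround 2, reusing the DataFrame from the reproduction above (assumes a spark-shell session with the toDF implicits in scope and a checkpoint directory already set, as in the original reproduction):
{code:scala}
import org.apache.spark.sql.functions.{col, lit, when}

val frame = Seq((false, 1)).toDF("bool", "number")
frame
  .checkpoint()
  // Adding an explicit otherwise(null) branch works around the issue.
  .withColumn("conditions", when(col("bool"), "I am not null").otherwise(lit(null)))
  .filter(col("conditions").isNull)
  .show(false) // now returns the expected single row
{code}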



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37261) Check adding partitions with ANSI intervals

2021-11-10 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37261?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-37261.
--
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34537
[https://github.com/apache/spark/pull/34537]

> Check adding partitions with ANSI intervals
> ---
>
> Key: SPARK-37261
> URL: https://issues.apache.org/jira/browse/SPARK-37261
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
> Fix For: 3.3.0
>
>
> Add tests that should check adding partitions with ANSI intervals via the 
> ALTER TABLE command.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37270) Incorrect result of filter using isNull condition

2021-11-10 Thread Tomasz Kus (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tomasz Kus updated SPARK-37270:
---
Component/s: SQL

> Incorrect result of filter using isNull condition
> 
>
> Key: SPARK-37270
> URL: https://issues.apache.org/jira/browse/SPARK-37270
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Tomasz Kus
>Priority: Major
>
> Simple code that reproduces this issue:
> {code:java}
>  val frame = Seq((false, 1)).toDF("bool", "number")
> frame
>   .checkpoint()
>   .withColumn("conditions", when(col("bool"), "I am not null"))
>   .filter(col("conditions").isNull)
>   .show(false){code}
> Although the "conditions" column is null,
> {code:java}
> +-----+------+----------+
> |bool |number|conditions|
> +-----+------+----------+
> |false|1     |null      |
> +-----+------+----------+{code}
> an empty result is shown.
> Execution plans:
> {code:java}
> == Parsed Logical Plan ==
> 'Filter isnull('conditions)
> +- Project [bool#124, number#125, CASE WHEN bool#124 THEN I am not null END 
> AS conditions#252]
>    +- LogicalRDD [bool#124, number#125], false
> == Analyzed Logical Plan ==
> bool: boolean, number: int, conditions: string
> Filter isnull(conditions#252)
> +- Project [bool#124, number#125, CASE WHEN bool#124 THEN I am not null END 
> AS conditions#252]
>    +- LogicalRDD [bool#124, number#125], false
> == Optimized Logical Plan ==
> LocalRelation , [bool#124, number#125, conditions#252]
> == Physical Plan ==
> LocalTableScan , [bool#124, number#125, conditions#252]
>  {code}
> After removing the checkpoint, the correct result is returned and the execution 
> plans are as follows:
> {code:java}
> == Parsed Logical Plan ==
> 'Filter isnull('conditions)
> +- Project [bool#124, number#125, CASE WHEN bool#124 THEN I am not null END 
> AS conditions#256]
>    +- Project [_1#119 AS bool#124, _2#120 AS number#125]
>       +- LocalRelation [_1#119, _2#120]
> == Analyzed Logical Plan ==
> bool: boolean, number: int, conditions: string
> Filter isnull(conditions#256)
> +- Project [bool#124, number#125, CASE WHEN bool#124 THEN I am not null END 
> AS conditions#256]
>    +- Project [_1#119 AS bool#124, _2#120 AS number#125]
>       +- LocalRelation [_1#119, _2#120]
> == Optimized Logical Plan ==
> LocalRelation [bool#124, number#125, conditions#256]
> == Physical Plan ==
> LocalTableScan [bool#124, number#125, conditions#256]
>  {code}
> It seems that the most important difference is LogicalRDD -> LocalRelation.
> The following workarounds retrieve the correct result:
> 1) remove the checkpoint
> 2) add an explicit .otherwise(null) to the when expression
> 3) add checkpoint() or cache() just before the filter
> 4) downgrade to Spark 3.1.2



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37236) Inline type hints for KernelDensity.pyi, test.py in python/pyspark/mllib/stat/

2021-11-10 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz resolved SPARK-37236.

Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34510
[https://github.com/apache/spark/pull/34510]

> Inline type hints for KernelDensity.pyi, test.py in python/pyspark/mllib/stat/
> --
>
> Key: SPARK-37236
> URL: https://issues.apache.org/jira/browse/SPARK-37236
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37236) Inline type hints for KernelDensity.pyi, test.py in python/pyspark/mllib/stat/

2021-11-10 Thread Maciej Szymkiewicz (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37236?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Maciej Szymkiewicz reassigned SPARK-37236:
--

Assignee: dch nguyen

> Inline type hints for KernelDensity.pyi, test.py in python/pyspark/mllib/stat/
> --
>
> Key: SPARK-37236
> URL: https://issues.apache.org/jira/browse/SPARK-37236
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37045) Unify v1 and v2 ALTER TABLE .. ADD COLUMNS tests

2021-11-10 Thread Max Gekk (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37045?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441755#comment-17441755
 ] 

Max Gekk commented on SPARK-37045:
--

I am working on this.

> Unify v1 and v2 ALTER TABLE .. ADD COLUMNS tests
> 
>
> Key: SPARK-37045
> URL: https://issues.apache.org/jira/browse/SPARK-37045
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Priority: Major
>
> Extract the ALTER TABLE .. ADD COLUMNS tests to a common place so they run 
> against both V1 and V2 datasources. Some tests can be placed in V1- or 
> V2-specific test suites.
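
For reference, a minimal sketch of the command under test (the table name is hypothetical, and an active SparkSession `spark` is assumed, e.g. in spark-shell); the unified suite would run statements like this against both v1 and v2 catalog tables:
{code:scala}
// Sketch only; "t" is an illustrative table name.
spark.sql("CREATE TABLE t (id INT) USING parquet")
spark.sql("ALTER TABLE t ADD COLUMNS (value STRING COMMENT 'added column')")
spark.sql("DESCRIBE TABLE t").show()
{code}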



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37045) Unify v1 and v2 ALTER TABLE .. ADD COLUMNS tests

2021-11-10 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37045?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-37045:


Assignee: Max Gekk

> Unify v1 and v2 ALTER TABLE .. ADD COLUMNS tests
> 
>
> Key: SPARK-37045
> URL: https://issues.apache.org/jira/browse/SPARK-37045
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Terry Kim
>Assignee: Max Gekk
>Priority: Major
>
> Extract the ALTER TABLE .. ADD COLUMNS tests to a common place so they run 
> against both V1 and V2 datasources. Some tests can be placed in V1- or 
> V2-specific test suites.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-36575) Executor lost may cause spark stage to hang

2021-11-10 Thread wuyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-36575?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441796#comment-17441796
 ] 

wuyi commented on SPARK-36575:
--

FYI: the fix is reverted due to test issues.

> Executor lost may cause spark stage to hang
> ---
>
> Key: SPARK-36575
> URL: https://issues.apache.org/jira/browse/SPARK-36575
> Project: Spark
>  Issue Type: Improvement
>  Components: Scheduler
>Affects Versions: 2.3.3
>Reporter: hujiahua
>Assignee: hujiahua
>Priority: Major
> Fix For: 3.3.0
>
>
> When an executor finishes a task of some stage, the driver receives a 
> `StatusUpdate` event to handle it. At the same time the driver may find that 
> the executor's heartbeat has timed out, so the driver also has to handle an 
> ExecutorLost event simultaneously. There is a race condition here that can 
> leave the task never rescheduled and the stage hanging.
>  The problem is that `TaskResultGetter.enqueueSuccessfulTask` uses an 
> asynchronous thread to handle the successful task, which means the synchronized 
> lock on `TaskSchedulerImpl` is released prematurely midway through 
> [https://github.com/apache/spark/blob/branch-2.3/core/src/main/scala/org/apache/spark/scheduler/TaskResultGetter.scala#L61].
>  So `TaskSchedulerImpl` may handle executorLost first, and then the asynchronous 
> thread goes on to handle the successful task. This leaves 
> `TaskSetManager.successful` and `TaskSetManager.tasksSuccessful` with wrong values.
> Then `HeartbeatReceiver.expireDeadHosts` executed `killAndReplaceExecutor`, 
> which made `TaskSchedulerImpl.executorLost` execute twice. 
> `copiesRunning(index) -= 1` is processed in `executorLost`, so two executions of 
> `executorLost` drove `copiesRunning(index)` to -1, which left the stage hanging. 
> Related logs from when the issue occurred: 
>  21/08/05 02:58:14,784 INFO [dispatcher-event-loop-8] TaskSetManager: 
> Starting task 4004.0 in stage 1328625.0 (TID 347212402, 10.109.89.3, executor 
> 366724, partition 4004, ANY, 7994 bytes)
>  21/08/05 03:00:24,126 ERROR [dispatcher-event-loop-4] TaskSchedulerImpl: 
> Lost executor 366724 on 10.109.89.3: Executor heartbeat timed out after 
> 140830 ms
>  21/08/05 03:00:24,218 WARN [dispatcher-event-loop-4] TaskSetManager: Lost 
> task 4004.0 in stage 1328625.0 (TID 347212402, 10.109.89.3, executor 366724): 
> ExecutorLostFailure (executor 366724 exited caused by one of the running 
> tasks) Reason: Executor heartbeat timed out after 140830 ms
>  21/08/05 03:00:24,542 INFO [task-result-getter-2] TaskSetManager: Finished 
> task 4004.0 in stage 1328625.0 (TID 347212402) in 129758 ms on 10.109.89.3 
> (executor 366724) (3047/5400)
> 21/08/05 03:00:34,621 INFO [dispatcher-event-loop-8] TaskSchedulerImpl: 
> Executor 366724 on 10.109.89.3 killed by driver.
>  21/08/05 03:00:34,771 INFO [spark-listener-group-executorManagement] 
> ExecutorMonitor: Executor 366724 removed (new total is 793)
> 21/08/05 03:00:42,360 INFO [dag-scheduler-event-loop] DAGScheduler: Executor 
> lost: 366724 (epoch 417416)
>  21/08/05 03:00:42,360 INFO [dispatcher-event-loop-14] 
> BlockManagerMasterEndpoint: Trying to remove executor 366724 from 
> BlockManagerMaster.
>  21/08/05 03:00:42,360 INFO [dispatcher-event-loop-14] 
> BlockManagerMasterEndpoint: Removing block manager BlockManagerId(366724, 
> 10.109.89.3, 43402, None)
>  21/08/05 03:00:42,360 INFO [dag-scheduler-event-loop] BlockManagerMaster: 
> Removed 366724 successfully in removeExecutor
>  21/08/05 03:00:42,360 INFO [dag-scheduler-event-loop] DAGScheduler: Shuffle 
> files lost for executor: 366724 (epoch 417416)
>  21/08/05 03:00:44,584 INFO [dag-scheduler-event-loop] DAGScheduler: Executor 
> lost: 366724 (epoch 417473)
>  21/08/05 03:00:44,584 INFO [dispatcher-event-loop-15] 
> BlockManagerMasterEndpoint: Trying to remove executor 366724 from 
> BlockManagerMaster.
>  21/08/05 03:00:44,584 INFO [dag-scheduler-event-loop] BlockManagerMaster: 
> Removed 366724 successfully in removeExecutor
>  21/08/05 03:00:44,584 INFO [dag-scheduler-event-loop] DAGScheduler: Shuffle 
> files lost for executor: 366724 (epoch 417473)



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37271) Spark OOM issue

2021-11-10 Thread M Shadab (Jira)
M Shadab created SPARK-37271:


 Summary: Spark OOM issue
 Key: SPARK-37271
 URL: https://issues.apache.org/jira/browse/SPARK-37271
 Project: Spark
  Issue Type: Bug
  Components: Spark Submit
Affects Versions: 3.1.0
Reporter: M Shadab






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37271) Spark OOM issue

2021-11-10 Thread M Shadab (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

M Shadab updated SPARK-37271:
-
Shepherd: M Shadab

> Spark OOM issue
> ---
>
> Key: SPARK-37271
> URL: https://issues.apache.org/jira/browse/SPARK-37271
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 3.1.0
>Reporter: M Shadab
>Priority: Critical
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37271) Spark OOM issue

2021-11-10 Thread M Shadab (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37271?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441805#comment-17441805
 ] 

M Shadab commented on SPARK-37271:
--

Memory increased for the container

> Spark OOM issue
> ---
>
> Key: SPARK-37271
> URL: https://issues.apache.org/jira/browse/SPARK-37271
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 3.1.0
>Reporter: M Shadab
>Priority: Critical
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37271) Spark OOM issue

2021-11-10 Thread M Shadab (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

M Shadab resolved SPARK-37271.
--
Resolution: Fixed

done

> Spark OOM issue
> ---
>
> Key: SPARK-37271
> URL: https://issues.apache.org/jira/browse/SPARK-37271
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 3.1.0
>Reporter: M Shadab
>Priority: Critical
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37265) Support Java 17 in `dev/test-dependencies.sh`

2021-11-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37265?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-37265.
---
Resolution: Invalid

Let me close this as Invalid.

> Support Java 17 in `dev/test-dependencies.sh`
> -
>
> Key: SPARK-37265
> URL: https://issues.apache.org/jira/browse/SPARK-37265
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 3.3.0
>Reporter: Kousuke Saruta
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-35557) Adapt uses of JDK 17 Internal APIs

2021-11-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-35557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-35557.
---
Resolution: Duplicate

This is superseded by SPARK-36796, which adds the `--add-opens` options.

> Adapt uses of JDK 17 Internal APIs
> --
>
> Key: SPARK-35557
> URL: https://issues.apache.org/jira/browse/SPARK-35557
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.2.0
>Reporter: Ismaël Mejía
>Priority: Major
>
> I tried to run a Spark pipeline using the most recent 3.2.0-SNAPSHOT with 
> Scala 2.12.4 on Java 17 and I found this exception:
> {code:java}
> java.lang.ExceptionInInitializerError
>  at org.apache.spark.unsafe.array.ByteArrayMethods.<clinit> 
> (ByteArrayMethods.java:54)
>  at org.apache.spark.internal.config.package$.<clinit> (package.scala:1149)
>  at org.apache.spark.SparkConf$.<clinit> (SparkConf.scala:654)
>  at org.apache.spark.SparkConf.contains (SparkConf.scala:455)
> ...
> Caused by: java.lang.reflect.InaccessibleObjectException: Unable to make 
> private java.nio.DirectByteBuffer(long,int) accessible: module java.base does 
> not "opens java.nio" to unnamed module @110df513
>  at java.lang.reflect.AccessibleObject.checkCanSetAccessible 
> (AccessibleObject.java:357)
>  at java.lang.reflect.AccessibleObject.checkCanSetAccessible 
> (AccessibleObject.java:297)
>  at java.lang.reflect.Constructor.checkCanSetAccessible (Constructor.java:188)
>  at java.lang.reflect.Constructor.setAccessible (Constructor.java:181)
>  at org.apache.spark.unsafe.Platform.<clinit> (Platform.java:56)
>  at org.apache.spark.unsafe.array.ByteArrayMethods.<clinit> 
> (ByteArrayMethods.java:54)
>  at org.apache.spark.internal.config.package$.<clinit> (package.scala:1149)
>  at org.apache.spark.SparkConf$.<clinit> (SparkConf.scala:654)
>  at org.apache.spark.SparkConf.contains (SparkConf.scala:455)}}
> {code}
> It seems that Java 17 will be more strict about uses of JDK Internals 
> [https://openjdk.java.net/jeps/403]



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-33502) Large number of SELECT columns causes StackOverflowError

2021-11-10 Thread Arwin S Tio (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33502?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17236434#comment-17236434
 ] 

Arwin S Tio edited comment on SPARK-33502 at 11/10/21, 7:22 PM:


Note, running my program with "-Xss3072k" fixed it. Giving Spark a bigger stack 
lets you hold more columns in memory.


was (Author: cozos):
Note, running my program with "-Xss3072k" fixed it

> Large number of SELECT columns causes StackOverflowError
> 
>
> Key: SPARK-33502
> URL: https://issues.apache.org/jira/browse/SPARK-33502
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7
>Reporter: Arwin S Tio
>Priority: Minor
>
> On Spark 2.4.7 Standalone Mode on my laptop (Macbook Pro 2015), I ran the 
> following:
> {code:java}
> public class TestSparkStackOverflow {
>   public static void main(String [] args) {
> SparkSession spark = SparkSession
>   .builder()
>   .config("spark.master", "local[8]")
>   .appName(TestSparkStackOverflow.class.getSimpleName())
>   .getOrCreate();
> StructType inputSchema = new StructType();
> inputSchema = inputSchema.add("foo", DataTypes.StringType);
> 
> Dataset<Row> inputDf = spark.createDataFrame(
>   Arrays.asList(
> RowFactory.create("1"),
> RowFactory.create("2"),
> RowFactory.create("3")
>   ),
>   inputSchema
> );
>  
> List<Column> lotsOfColumns = new ArrayList<>();
> for (int i = 0; i < 3000; i++) {
>   lotsOfColumns.add(lit("").as("field" + i).cast(DataTypes.StringType));
> }
> lotsOfColumns.add(new Column("foo"));
> inputDf
>   
> .select(JavaConverters.collectionAsScalaIterableConverter(lotsOfColumns).asScala().toSeq())
>   .write()
>   .format("csv")
>   .mode(SaveMode.Append)
>   .save("file:///tmp/testoutput");
>   }
> }
>  {code}
>  
> And I get a StackOverflowError:
> {code:java}
> Exception in thread "main" org.apache.spark.SparkException: Job 
> aborted.Exception in thread "main" org.apache.spark.SparkException: Job 
> aborted. at 
> org.apache.spark.sql.execution.datasources.FileFormatWriter$.write(FileFormatWriter.scala:198)
>  at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.run(InsertIntoHadoopFsRelationCommand.scala:159)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult$lzycompute(commands.scala:104)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.sideEffectResult(commands.scala:102)
>  at 
> org.apache.spark.sql.execution.command.DataWritingCommandExec.doExecute(commands.scala:122)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  at 
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  at 
> org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152) at 
> org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127) at 
> org.apache.spark.sql.execution.QueryExecution.toRdd$lzycompute(QueryExecution.scala:83)
>  at 
> org.apache.spark.sql.execution.QueryExecution.toRdd(QueryExecution.scala:81) 
> at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
>  at 
> org.apache.spark.sql.DataFrameWriter$$anonfun$runCommand$1.apply(DataFrameWriter.scala:696)
>  at 
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:80)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:127)
>  at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:75)
>  at 
> org.apache.spark.sql.DataFrameWriter.runCommand(DataFrameWriter.scala:696) at 
> org.apache.spark.sql.DataFrameWriter.saveToV1Source(DataFrameWriter.scala:305)
>  at org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:291) at 
> org.apache.spark.sql.DataFrameWriter.save(DataFrameWriter.scala:249) at 
> udp.task.TestSparkStackOverflow.main(TestSparkStackOverflow.java:52)Caused 
> by: java.lang.StackOverflowError at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1522) 
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.java:1509) 
> at 
> java.io.ObjectOutputStream.writeOrdinaryObject(ObjectOutputStream.java:1432) 
> at java.io.ObjectOutputStream.writeObject0(ObjectOutputStream.java:1178) at 
> java.io.ObjectOutputStream.defaultWriteFields(ObjectOutputStream.java:1548) 
> at java.io.ObjectOutputStream.writeSerialData(ObjectOutputStream.j

[jira] [Created] (SPARK-37272) Add ExtendedRocksDBTest

2021-11-10 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-37272:
-

 Summary: Add ExtendedRocksDBTest
 Key: SPARK-37272
 URL: https://issues.apache.org/jira/browse/SPARK-37272
 Project: Spark
  Issue Type: Improvement
  Components: SQL, Tests
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37273) Hidden File Metadata Support for Spark SQL

2021-11-10 Thread Yaohua Zhao (Jira)
Yaohua Zhao created SPARK-37273:
---

 Summary: Hidden File Metadata Support for Spark SQL
 Key: SPARK-37273
 URL: https://issues.apache.org/jira/browse/SPARK-37273
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Yaohua Zhao


Provide a new interface in Spark SQL that allows users to query the metadata of 
the input files for all file formats, exposing it as *built-in hidden columns*, 
meaning *users can only see them when they explicitly reference them* (e.g. file 
path, file name).
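
A hypothetical usage sketch of the proposal; the hidden column names used below are illustrative assumptions, not an existing API:
{code:scala}
// Sketch only: "_metadata.file_path" / "_metadata.file_name" are assumed names
// for the proposed hidden columns, and the input path is made up.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().getOrCreate()
val df = spark.read.parquet("/data/events")

// Hidden metadata columns would not appear in the normal schema...
df.printSchema()

// ...but would be surfaced when explicitly referenced.
df.select(col("*"), col("_metadata.file_path"), col("_metadata.file_name")).show()
{code}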



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37272) Add ExtendedRocksDBTest

2021-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442020#comment-17442020
 ] 

Apache Spark commented on SPARK-37272:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/34547

> Add ExtendedRocksDBTest
> ---
>
> Key: SPARK-37272
> URL: https://issues.apache.org/jira/browse/SPARK-37272
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37272) Add ExtendedRocksDBTest

2021-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37272:


Assignee: (was: Apache Spark)

> Add ExtendedRocksDBTest
> ---
>
> Key: SPARK-37272
> URL: https://issues.apache.org/jira/browse/SPARK-37272
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37272) Add ExtendedRocksDBTest

2021-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37272:


Assignee: Apache Spark

> Add ExtendedRocksDBTest
> ---
>
> Key: SPARK-37272
> URL: https://issues.apache.org/jira/browse/SPARK-37272
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37260) PYSPARK Arrow 3.2.0 docs link invalid

2021-11-10 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442044#comment-17442044
 ] 

Hyukjin Kwon commented on SPARK-37260:
--

oh yeah. that's fixed via #34475. There are some more ongoing issues on the 
docs. I will fix them up and probably we could initiate spark 3.2.1.

> PYSPARK Arrow 3.2.0 docs link invalid
> -
>
> Key: SPARK-37260
> URL: https://issues.apache.org/jira/browse/SPARK-37260
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Thomas Graves
>Priority: Major
>
> [http://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html]
> links to:
> [https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html]
> which links to:
> [https://spark.apache.org/docs/latest/api/python/sql/arrow_pandas.rst]
> But that is an invalid link.
> I assume its supposed to point to:
> https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37260) PYSPARK Arrow 3.2.0 docs link invalid

2021-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37260.
--
Resolution: Fixed

> PYSPARK Arrow 3.2.0 docs link invalid
> -
>
> Key: SPARK-37260
> URL: https://issues.apache.org/jira/browse/SPARK-37260
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Thomas Graves
>Priority: Major
>
> [http://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html]
> links to:
> [https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html]
> which links to:
> [https://spark.apache.org/docs/latest/api/python/sql/arrow_pandas.rst]
> But that is an invalid link.
> I assume it's supposed to point to:
> https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37260) PYSPARK Arrow 3.2.0 docs link invalid

2021-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37260:
-
Fix Version/s: 3.2.1

> PYSPARK Arrow 3.2.0 docs link invalid
> -
>
> Key: SPARK-37260
> URL: https://issues.apache.org/jira/browse/SPARK-37260
> Project: Spark
>  Issue Type: Bug
>  Components: Documentation
>Affects Versions: 3.2.0
>Reporter: Thomas Graves
>Priority: Major
> Fix For: 3.2.1
>
>
> [http://spark.apache.org/docs/latest/sql-pyspark-pandas-with-arrow.html]
> links to:
> [https://spark.apache.org/docs/latest/api/python/user_guide/arrow_pandas.html]
> which links to:
> [https://spark.apache.org/docs/latest/api/python/sql/arrow_pandas.rst]
> But that is an invalid link.
> I assume it's supposed to point to:
> https://spark.apache.org/docs/latest/api/python/user_guide/sql/arrow_pandas.html



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37233) Inline type hints for files in python/pyspark/mllib

2021-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-37233:


Assignee: dch nguyen

> Inline type hints for files in python/pyspark/mllib
> ---
>
> Key: SPARK-37233
> URL: https://issues.apache.org/jira/browse/SPARK-37233
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: dch nguyen
>Assignee: dch nguyen
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37254) 100% CPU usage on Spark Thrift Server.

2021-11-10 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37254?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442046#comment-17442046
 ] 

Hyukjin Kwon commented on SPARK-37254:
--

it would be much easier to investigate the issue if there are reproducible steps.

> 100% CPU usage on Spark Thrift Server.
> --
>
> Key: SPARK-37254
> URL: https://issues.apache.org/jira/browse/SPARK-37254
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: ramakrishna chilaka
>Priority: Major
>
> We are trying to use the Spark Thrift Server as a distributed SQL query 
> engine. Queries work when the resident memory of the Spark Thrift Server (as 
> reported by htop) is comfortably below the driver memory. The same queries 
> drive CPU usage to 100% when the server's resident memory exceeds the 
> configured driver memory, and they stay stuck at 100% CPU usage. I am using 
> incremental collect set to false, as I need faster responses for exploratory 
> queries. I am trying to understand the following points:
>  * Why isn't the Spark Thrift Server releasing memory when there are no 
> queries?
>  * What causes the Spark Thrift Server to hit 100% CPU usage on all cores 
> when its memory is greater than the driver memory (usually by about 10%), and 
> why do the queries just get stuck?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37270) Incorrect result of filter using isNull condition

2021-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37270:
-
Labels: correctness  (was: )

> Incorrect result of filter using isNull condition
> 
>
> Key: SPARK-37270
> URL: https://issues.apache.org/jira/browse/SPARK-37270
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Tomasz Kus
>Priority: Major
>  Labels: correctness
>
> Simple code that reproduces this issue:
> {code:java}
>  val frame = Seq((false, 1)).toDF("bool", "number")
> frame
>   .checkpoint()
>   .withColumn("conditions", when(col("bool"), "I am not null"))
>   .filter(col("conditions").isNull)
>   .show(false){code}
> Although the "conditions" column is null
> {code:java}
> +-----+------+----------+
> |bool |number|conditions|
> +-----+------+----------+
> |false|1     |null      |
> +-----+------+----------+{code}
> an empty result is shown.
> Execution plans:
> {code:java}
> == Parsed Logical Plan ==
> 'Filter isnull('conditions)
> +- Project [bool#124, number#125, CASE WHEN bool#124 THEN I am not null END 
> AS conditions#252]
>    +- LogicalRDD [bool#124, number#125], false
> == Analyzed Logical Plan ==
> bool: boolean, number: int, conditions: string
> Filter isnull(conditions#252)
> +- Project [bool#124, number#125, CASE WHEN bool#124 THEN I am not null END 
> AS conditions#252]
>    +- LogicalRDD [bool#124, number#125], false
> == Optimized Logical Plan ==
> LocalRelation , [bool#124, number#125, conditions#252]
> == Physical Plan ==
> LocalTableScan , [bool#124, number#125, conditions#252]
>  {code}
> After removing the checkpoint, the proper result is returned and the execution 
> plans are as follows:
> {code:java}
> == Parsed Logical Plan ==
> 'Filter isnull('conditions)
> +- Project [bool#124, number#125, CASE WHEN bool#124 THEN I am not null END 
> AS conditions#256]
>    +- Project [_1#119 AS bool#124, _2#120 AS number#125]
>       +- LocalRelation [_1#119, _2#120]
> == Analyzed Logical Plan ==
> bool: boolean, number: int, conditions: string
> Filter isnull(conditions#256)
> +- Project [bool#124, number#125, CASE WHEN bool#124 THEN I am not null END 
> AS conditions#256]
>    +- Project [_1#119 AS bool#124, _2#120 AS number#125]
>       +- LocalRelation [_1#119, _2#120]
> == Optimized Logical Plan ==
> LocalRelation [bool#124, number#125, conditions#256]
> == Physical Plan ==
> LocalTableScan [bool#124, number#125, conditions#256]
>  {code}
> It seems that the most important difference is LogicalRDD -> LocalRelation.
> The following workarounds retrieve the correct result (see the sketch after 
> this list):
> 1) remove checkpoint
> 2) add explicit .otherwise(null) to when
> 3) add checkpoint() or cache() just before filter
> 4) downgrade to Spark 3.1.2
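
For illustration, a minimal sketch of workaround 2 (spelling out the otherwise-branch explicitly), reusing the toy data above; the checkpoint directory and the spark-shell session are assumptions, not part of the original report:
{code:java}
import org.apache.spark.sql.functions.{col, when}
import spark.implicits._  // assumes a spark-shell / SparkSession named `spark`

spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")  // illustrative path

val frame = Seq((false, 1)).toDF("bool", "number")
frame
  .checkpoint()
  .withColumn("conditions", when(col("bool"), "I am not null").otherwise(null))
  .filter(col("conditions").isNull)
  .show(false)
// Expected output: the single row (false, 1, null).
{code}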



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37272) Add ExtendedRocksDBTest

2021-11-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-37272:
-

Assignee: Dongjoon Hyun

> Add ExtendedRocksDBTest
> ---
>
> Key: SPARK-37272
> URL: https://issues.apache.org/jira/browse/SPARK-37272
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37272) Add `ExtendedRocksDBTest` and disable RocksDB tests on Apple Silicon

2021-11-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-37272:
--
Summary: Add `ExtendedRocksDBTest` and disable RocksDB tests on Apple 
Silicon  (was: Add ExtendedRocksDBTest)

> Add `ExtendedRocksDBTest` and disable RocksDB tests on Apple Silicon
> 
>
> Key: SPARK-37272
> URL: https://issues.apache.org/jira/browse/SPARK-37272
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37272) Add ExtendedRocksDBTest

2021-11-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-37272.
---
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34547
[https://github.com/apache/spark/pull/34547]

> Add ExtendedRocksDBTest
> ---
>
> Key: SPARK-37272
> URL: https://issues.apache.org/jira/browse/SPARK-37272
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37272) Add `ExtendedRocksDBTest` and disable RocksDB tests on Apple Silicon

2021-11-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-37272:
--
Description: 
Java 17 officially supports Apple Silicon

- JEP 391: macOS/AArch64 Port
- https://bugs.openjdk.java.net/browse/JDK-8251280

Oracle Java, Azul Zulu, and Eclipse Temurin Java 17 support Apple Silicon 
natively.
{code}
/Users/dongjoon/.jenv/versions/oracle17/bin/java: Mach-O 64-bit executable arm64
/Users/dongjoon/.jenv/versions/zulu17/bin/java: Mach-O 64-bit executable arm64
/Users/dongjoon/.jenv/versions/temurin17/bin/java: Mach-O 64-bit executable 
arm64
{code}

Since RocksDBJNI still doesn't support Apple Silicon natively, the following 
failures occur on M1.
{code}
$ build/sbt "sql/testOnly *RocksDB* *.StreamingSessionWindowSuite"
...
[info] Run completed in 23 seconds, 281 milliseconds.
[info] Total number of tests run: 32
[info] Suites: completed 2, aborted 2
[info] Tests: succeeded 22, failed 10, canceled 0, ignored 0, pending 0
[info] *** 2 SUITES ABORTED ***
[info] *** 10 TESTS FAILED ***
[error] Failed tests:
[error] org.apache.spark.sql.streaming.StreamingSessionWindowSuite
[error] 
org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreIntegrationSuite
[error] Error during tests:
[error] org.apache.spark.sql.execution.streaming.state.RocksDBSuite
[error] 
org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreSuite
[error] (sql / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 43 s, completed Nov 10, 2021 4:29:50 PM
{code}

This issue aims to add ExtendedRocksDBTest to disable the RocksDB tests 
selectively on Apple Silicon.
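
As context, a rough sketch of how such a tag would be applied, assuming ExtendedRocksDBTest is added as a ScalaTest tag annotation under org.apache.spark.tags like the existing Extended*Test annotations (the suite and test names below are illustrative):
{code:java}
import org.apache.spark.SparkFunSuite
import org.apache.spark.tags.ExtendedRocksDBTest

// Tagging the suite lets an Apple Silicon build exclude it with ScalaTest's
// tag filtering, e.g. `-l org.apache.spark.tags.ExtendedRocksDBTest`.
@ExtendedRocksDBTest
class RocksDBSmokeSuite extends SparkFunSuite {
  test("RocksDB-backed state store operations") {
    // RocksDB JNI calls would go here; skipped wherever the tag is excluded.
  }
}
{code}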

> Add `ExtendedRocksDBTest` and disable RocksDB tests on Apple Silicon
> 
>
> Key: SPARK-37272
> URL: https://issues.apache.org/jira/browse/SPARK-37272
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>
> Java 17 officially supports Apple Silicon
> - JEP 391: macOS/AArch64 Port
> - https://bugs.openjdk.java.net/browse/JDK-8251280
> Oracle Java, Azul Zulu, and Eclipse Temurin Java 17 support Apple Silicon 
> natively.
> {code}
> /Users/dongjoon/.jenv/versions/oracle17/bin/java: Mach-O 64-bit executable 
> arm64
> /Users/dongjoon/.jenv/versions/zulu17/bin/java: Mach-O 64-bit executable arm64
> /Users/dongjoon/.jenv/versions/temurin17/bin/java: Mach-O 64-bit executable 
> arm64
> {code}
> Since RocksDBJNI still doesn't support Apple Silicon natively, the following 
> failures occur on M1.
> {code}
> $ build/sbt "sql/testOnly *RocksDB* *.StreamingSessionWindowSuite"
> ...
> [info] Run completed in 23 seconds, 281 milliseconds.
> [info] Total number of tests run: 32
> [info] Suites: completed 2, aborted 2
> [info] Tests: succeeded 22, failed 10, canceled 0, ignored 0, pending 0
> [info] *** 2 SUITES ABORTED ***
> [info] *** 10 TESTS FAILED ***
> [error] Failed tests:
> [error]   org.apache.spark.sql.streaming.StreamingSessionWindowSuite
> [error]   
> org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreIntegrationSuite
> [error] Error during tests:
> [error]   org.apache.spark.sql.execution.streaming.state.RocksDBSuite
> [error]   
> org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreSuite
> [error] (sql / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
> [error] Total time: 43 s, completed Nov 10, 2021 4:29:50 PM
> {code}
> This issue aims to add ExtendedRocksDBTest to disable RocksDB selectively on 
> Apple Silicon.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37272) Add `ExtendedRocksDBTest` and disable RocksDB tests on Apple Silicon

2021-11-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-37272:
--
Parent: SPARK-33772
Issue Type: Sub-task  (was: Improvement)

> Add `ExtendedRocksDBTest` and disable RocksDB tests on Apple Silicon
> 
>
> Key: SPARK-37272
> URL: https://issues.apache.org/jira/browse/SPARK-37272
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>
> Java 17 officially supports Apple Silicon
> - JEP 391: macOS/AArch64 Port
> - https://bugs.openjdk.java.net/browse/JDK-8251280
> Oracle Java, Azul Zulu, and Eclipse Temurin Java 17 support Apple Silicon 
> natively.
> {code}
> /Users/dongjoon/.jenv/versions/oracle17/bin/java: Mach-O 64-bit executable 
> arm64
> /Users/dongjoon/.jenv/versions/zulu17/bin/java: Mach-O 64-bit executable arm64
> /Users/dongjoon/.jenv/versions/temurin17/bin/java: Mach-O 64-bit executable 
> arm64
> {code}
> Since RocksDBJNI still doesn't support Apple Silicon natively, the following 
> failures occur on M1.
> {code}
> $ build/sbt "sql/testOnly *RocksDB* *.StreamingSessionWindowSuite"
> ...
> [info] Run completed in 23 seconds, 281 milliseconds.
> [info] Total number of tests run: 32
> [info] Suites: completed 2, aborted 2
> [info] Tests: succeeded 22, failed 10, canceled 0, ignored 0, pending 0
> [info] *** 2 SUITES ABORTED ***
> [info] *** 10 TESTS FAILED ***
> [error] Failed tests:
> [error]   org.apache.spark.sql.streaming.StreamingSessionWindowSuite
> [error]   
> org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreIntegrationSuite
> [error] Error during tests:
> [error]   org.apache.spark.sql.execution.streaming.state.RocksDBSuite
> [error]   
> org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreSuite
> [error] (sql / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
> [error] Total time: 43 s, completed Nov 10, 2021 4:29:50 PM
> {code}
> This issue aims to add ExtendedRocksDBTest to disable RocksDB selectively on 
> Apple Silicon.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37272) Add `ExtendedRocksDBTest` and disable RocksDB tests on Apple Silicon

2021-11-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-37272:
--
Description: 
Java 17 officially supports Apple Silicon

- JEP 391: macOS/AArch64 Port
- https://bugs.openjdk.java.net/browse/JDK-8251280

Oracle Java, Azul Zulu, and Eclipse Temurin Java 17 support Apple Silicon 
natively.
{code}
/Users/dongjoon/.jenv/versions/oracle17/bin/java: Mach-O 64-bit executable arm64
/Users/dongjoon/.jenv/versions/zulu17/bin/java: Mach-O 64-bit executable arm64
/Users/dongjoon/.jenv/versions/temurin17/bin/java: Mach-O 64-bit executable 
arm64
{code}

Since RocksDBJNI still doesn't support Apple Silicon natively, the following 
failures occur on M1.
{code}
$ build/sbt "sql/testOnly *RocksDB* *.StreamingSessionWindowSuite"
...
[info] Run completed in 23 seconds, 281 milliseconds.
[info] Total number of tests run: 32
[info] Suites: completed 2, aborted 2
[info] Tests: succeeded 22, failed 10, canceled 0, ignored 0, pending 0
[info] *** 2 SUITES ABORTED ***
[info] *** 10 TESTS FAILED ***
[error] Failed tests:
[error] org.apache.spark.sql.streaming.StreamingSessionWindowSuite
[error] 
org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreIntegrationSuite
[error] Error during tests:
[error] org.apache.spark.sql.execution.streaming.state.RocksDBSuite
[error] 
org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreSuite
[error] (sql / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 43 s, completed Nov 10, 2021 4:29:50 PM
{code}

This issue aims to add ExtendedRocksDBTest to disable RocksDB selectively on 
Apple Silicon.

  was:
Javava 17 officially support Apple Silicon

- JEP 391: macOS/AArch64 Port
- https://bugs.openjdk.java.net/browse/JDK-8251280

Oracle Java, Azul Zulu, and Eclipse Temurin Java 17 supports Apple Silicon 
natively.
{code}
/Users/dongjoon/.jenv/versions/oracle17/bin/java: Mach-O 64-bit executable arm64
/Users/dongjoon/.jenv/versions/zulu17/bin/java: Mach-O 64-bit executable arm64
/Users/dongjoon/.jenv/versions/temurin17/bin/java: Mach-O 64-bit executable 
arm64
{code}

Since RocksDBJNI still doesn't support Apple Silicon natively, the following 
failures occur on M1.
{code}
$ build/sbt "sql/testOnly *RocksDB* *.StreamingSessionWindowSuite"
...
[info] Run completed in 23 seconds, 281 milliseconds.
[info] Total number of tests run: 32
[info] Suites: completed 2, aborted 2
[info] Tests: succeeded 22, failed 10, canceled 0, ignored 0, pending 0
[info] *** 2 SUITES ABORTED ***
[info] *** 10 TESTS FAILED ***
[error] Failed tests:
[error] org.apache.spark.sql.streaming.StreamingSessionWindowSuite
[error] 
org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreIntegrationSuite
[error] Error during tests:
[error] org.apache.spark.sql.execution.streaming.state.RocksDBSuite
[error] 
org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreSuite
[error] (sql / Test / testOnly) sbt.TestsFailedException: Tests unsuccessful
[error] Total time: 43 s, completed Nov 10, 2021 4:29:50 PM
{code}

This issue aims to add ExtendedRocksDBTest to disable RocksDB selectively on 
Apple Silicon.


> Add `ExtendedRocksDBTest` and disable RocksDB tests on Apple Silicon
> 
>
> Key: SPARK-37272
> URL: https://issues.apache.org/jira/browse/SPARK-37272
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.3.0
>
>
> Java 17 officially supports Apple Silicon
> - JEP 391: macOS/AArch64 Port
> - https://bugs.openjdk.java.net/browse/JDK-8251280
> Oracle Java, Azul Zulu, and Eclipse Temurin Java 17 support Apple Silicon 
> natively.
> {code}
> /Users/dongjoon/.jenv/versions/oracle17/bin/java: Mach-O 64-bit executable 
> arm64
> /Users/dongjoon/.jenv/versions/zulu17/bin/java: Mach-O 64-bit executable arm64
> /Users/dongjoon/.jenv/versions/temurin17/bin/java: Mach-O 64-bit executable 
> arm64
> {code}
> Since RocksDBJNI still doesn't support Apple Silicon natively, the following 
> failures occur on M1.
> {code}
> $ build/sbt "sql/testOnly *RocksDB* *.StreamingSessionWindowSuite"
> ...
> [info] Run completed in 23 seconds, 281 milliseconds.
> [info] Total number of tests run: 32
> [info] Suites: completed 2, aborted 2
> [info] Tests: succeeded 22, failed 10, canceled 0, ignored 0, pending 0
> [info] *** 2 SUITES ABORTED ***
> [info] *** 10 TESTS FAILED ***
> [error] Failed tests:
> [error]   org.apache.spark.sql.streaming.StreamingSessionWindowSuite
> [error]   
> org.apache.spark.sql.execution.streaming.state.RocksDBStateStoreIntegrationSuite
> [error] Error during tests:
> [err

[jira] [Closed] (SPARK-37109) Install Java 17 on all of the Jenkins workers

2021-11-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun closed SPARK-37109.
-

> Install Java 17 on all of the Jenkins workers
> -
>
> Key: SPARK-37109
> URL: https://issues.apache.org/jira/browse/SPARK-37109
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36900) "SPARK-36464: size returns correct positive number even with over 2GB data" will oom with JDK17

2021-11-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-36900:
-

Assignee: Yang Jie

> "SPARK-36464: size returns correct positive number even with over 2GB data" 
> will oom with JDK17 
> 
>
> Key: SPARK-36900
> URL: https://issues.apache.org/jira/browse/SPARK-36900
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
> Fix For: 3.3.0
>
>
> Execute
>  
> {code:java}
> build/mvn clean install  -pl core -am -Dtest=none 
> -DwildcardSuites=org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite
> {code}
> with JDK 17,
> {code:java}
> ChunkedByteBufferOutputStreamSuite:
> - empty output
> - write a single byte
> - write a single near boundary
> - write a single at boundary
> - single chunk output
> - single chunk output at boundary size
> - multiple chunk output
> - multiple chunk output at boundary size
> *** RUN ABORTED ***
>   java.lang.OutOfMemoryError: Java heap space
>   at java.base/java.lang.Integer.valueOf(Integer.java:1081)
>   at scala.runtime.BoxesRunTime.boxToInteger(BoxesRunTime.java:67)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.allocateNewChunkIfNeeded(ChunkedByteBufferOutputStream.scala:87)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStream.write(ChunkedByteBufferOutputStream.scala:75)
>   at java.base/java.io.OutputStream.write(OutputStream.java:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite.$anonfun$new$22(ChunkedByteBufferOutputStreamSuite.scala:127)
>   at 
> org.apache.spark.util.io.ChunkedByteBufferOutputStreamSuite$$Lambda$179/0x0008011a75d8.apply(Unknown
>  Source)
>   at org.scalatest.OutcomeOf.outcomeOf(OutcomeOf.scala:85)
>   at org.scalatest.OutcomeOf.outcomeOf$(OutcomeOf.scala:83)
>   at org.scalatest.OutcomeOf$.outcomeOf(OutcomeOf.scala:104)
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37109) Install Java 17 on all of the Jenkins workers

2021-11-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-37109:
--
Parent: (was: SPARK-33772)
Issue Type: Bug  (was: Sub-task)

> Install Java 17 on all of the Jenkins workers
> -
>
> Key: SPARK-37109
> URL: https://issues.apache.org/jira/browse/SPARK-37109
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 3.3.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37264) Exclude hadoop-client-api transitive dependency from orc-core

2021-11-10 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-37264:
--
Summary: Exclude hadoop-client-api transitive dependency from orc-core  
(was: [SPARK-37264][BUILD] Exclude hadoop-client-api transitive dependency from 
orc-core)

> Exclude hadoop-client-api transitive dependency from orc-core
> -
>
> Key: SPARK-37264
> URL: https://issues.apache.org/jira/browse/SPARK-37264
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Minor
> Fix For: 3.3.0
>
>
> Like hadoop-common and hadoop-hdfs, this PR proposes to exclude the 
> hadoop-client-api transitive dependency from orc-core.
> Why are the changes needed?
> Since Apache Hadoop 2.7 doesn't work on Java 17, Apache ORC has a dependency 
> on Hadoop 3.3.1.
> This causes a test-dependencies.sh failure on Java 17. As a result, 
> run-tests.py also fails.
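
Expressed in sbt terms (a sketch only; the real change is in the Maven poms, and the ORC version shown is illustrative), the exclusion looks like this:
{code:java}
// build.sbt sketch: keep orc-core but drop the hadoop-client-api it pulls in,
// so the Hadoop version stays under Spark's own dependency management.
libraryDependencies += ("org.apache.orc" % "orc-core" % "1.7.1")
  .exclude("org.apache.hadoop", "hadoop-client-api")
{code}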



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37273) Hidden File Metadata Support for Spark SQL

2021-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37273.
--
Resolution: Duplicate

> Hidden File Metadata Support for Spark SQL
> --
>
> Key: SPARK-37273
> URL: https://issues.apache.org/jira/browse/SPARK-37273
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yaohua Zhao
>Priority: Major
>
> Provide a new interface in Spark SQL that allows users to query the metadata 
> of the input files for all file formats, and expose it as *built-in hidden 
> columns*, meaning *users can only see them when they explicitly reference 
> them* (e.g. file path, file name).
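
For comparison, the closest facility available today is the input_file_name function, which exposes the source file path per row only when explicitly selected; a small sketch (the path is illustrative, and this is not the hidden-column interface proposed here):
{code:java}
import org.apache.spark.sql.functions.{col, input_file_name}

val df = spark.read.parquet("/path/to/table")  // illustrative path
df.select(col("*"), input_file_name().as("file_path")).show(false)
{code}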



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37273) Hidden File Metadata Support for Spark SQL

2021-11-10 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37273?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442057#comment-17442057
 ] 

Hyukjin Kwon commented on SPARK-37273:
--

Don't we already have this in DSv2? e.g.) SPARK-31255

> Hidden File Metadata Support for Spark SQL
> --
>
> Key: SPARK-37273
> URL: https://issues.apache.org/jira/browse/SPARK-37273
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Yaohua Zhao
>Priority: Major
>
> Provide a new interface in Spark SQL that allows users to query the metadata 
> of the input files for all file formats, and expose it as *built-in hidden 
> columns*, meaning *users can only see them when they explicitly reference 
> them* (e.g. file path, file name).



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-37255) When Used with PyHive (by dropbox) query timeout doesn't result in propagation to the UI

2021-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-37255.
--
Resolution: Invalid

> When Used with PyHive (by dropbox) query timeout doesn't result in 
> propagation to the UI
> 
>
> Key: SPARK-37255
> URL: https://issues.apache.org/jira/browse/SPARK-37255
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: ramakrishna chilaka
>Priority: Major
>
> When we run a large query and it is timed out and cancelled by the Spark 
> Thrift Server, PyHive doesn't show that the query was cancelled.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37255) When Used with PyHive (by dropbox) query timeout doesn't result in propagation to the UI

2021-11-10 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442058#comment-17442058
 ] 

Hyukjin Kwon commented on SPARK-37255:
--

That's very likely an issue in PyHive.

> When Used with PyHive (by dropbox) query timeout doesn't result in 
> propagation to the UI
> 
>
> Key: SPARK-37255
> URL: https://issues.apache.org/jira/browse/SPARK-37255
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.1.2
>Reporter: ramakrishna chilaka
>Priority: Major
>
> When we run a large query and it is timed out and cancelled by the Spark 
> Thrift Server, PyHive doesn't show that the query was cancelled.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37274) These parameters should be of type long, not int

2021-11-10 Thread hao (Jira)
hao created SPARK-37274:
---

 Summary: These parameters should be of type long, not int
 Key: SPARK-37274
 URL: https://issues.apache.org/jira/browse/SPARK-37274
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: hao


These parameters ([spark.sql.orc.columnarReaderBatchSize], 
[spark.sql.inMemoryColumnarStorage.batchSize], 
[spark.sql.parquet.columnarReaderBatchSize]) should be of type long, not of type 
int. When the user sets a value greater than the maximum value of type int, an 
error is thrown.
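
A small sketch of the behaviour described above (the values are illustrative):
{code:java}
// Within the int range this works as expected:
spark.conf.set("spark.sql.parquet.columnarReaderBatchSize", 8192L)

// A value above Int.MaxValue (2147483647) currently triggers the error the
// ticket describes, because the config is declared as an int; the proposal is
// to declare it (and the other two configs) as long instead.
spark.conf.set("spark.sql.parquet.columnarReaderBatchSize", 3000000000L)
{code}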



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37275) Support ANSI intervals in PySpark

2021-11-10 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-37275:


 Summary: Support ANSI intervals in PySpark
 Key: SPARK-37275
 URL: https://issues.apache.org/jira/browse/SPARK-37275
 Project: Spark
  Issue Type: Umbrella
  Components: PySpark, SQL
Affects Versions: 3.3.0
Reporter: Hyukjin Kwon


This JIRA aims to implement ANSI interval types in PySpark (for context, a 
short Scala-side sketch follows the links below):
- 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DayTimeIntervalType.scala
- 
https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/YearMonthIntervalType.scala
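
For context, the Scala API already round-trips these types: as of Spark 3.2, java.time.Duration and java.time.Period map to the day-time and year-month interval types. A sketch of that existing Scala-side behaviour, which the PySpark work tracked here aims to mirror:
{code:java}
import java.time.{Duration, Period}
import spark.implicits._  // assumes a spark-shell / SparkSession named `spark`

val df = Seq((Duration.ofHours(36), Period.ofMonths(14))).toDF("dt", "ym")
df.printSchema()
// dt: interval day to second
// ym: interval year to month
{code}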



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37276) Support YearMonthIntervalType in Arrow

2021-11-10 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-37276:


 Summary: Support YearMonthIntervalType in Arrow
 Key: SPARK-37276
 URL: https://issues.apache.org/jira/browse/SPARK-37276
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, SQL
Affects Versions: 3.3.0
Reporter: Hyukjin Kwon


Implements the support of YearMonthIntervalType in Arrow code path:
- pandas UDFs
- pandas functions APIs
- createDataFrame/toPandas when Arrow is enabled



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37277) Support DayTimeIntervalType in Arrow

2021-11-10 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-37277:


 Summary: Support DayTimeIntervalType in Arrow
 Key: SPARK-37277
 URL: https://issues.apache.org/jira/browse/SPARK-37277
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, SQL
Affects Versions: 3.3.0
Reporter: Hyukjin Kwon


Implements the support of DayTimeIntervalType in Arrow code path:
- pandas UDFs
- pandas functions APIs
- createDataFrame/toPandas when Arrow is enabled



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37278) Support YearMonthIntervalType in createDataFrame/toPandas and Python UDFs

2021-11-10 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-37278:


 Summary: Support YearMonthIntervalType in createDataFrame/toPandas 
and Python UDFs
 Key: SPARK-37278
 URL: https://issues.apache.org/jira/browse/SPARK-37278
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, SQL
Affects Versions: 3.3.0
Reporter: Hyukjin Kwon


Implements the support of YearMonthIntervalType in Arrow code path:
- Python UDFs
- createDataFrame/toPandas when Arrow is disabled



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37278) Support YearMonthIntervalType in createDataFrame/toPandas and Python UDFs

2021-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37278:
-
Description: 
Implements the support of YearMonthIntervalType in:
- Python UDFs
- createDataFrame/toPandas when Arrow is disabled

  was:
Implements the support of YearMonthIntervalType in Arrow code path:
- Python UDFs
- createDataFrame/toPandas when Arrow is disabled


> Support YearMonthIntervalType in createDataFrame/toPandas and Python UDFs
> -
>
> Key: SPARK-37278
> URL: https://issues.apache.org/jira/browse/SPARK-37278
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Implements the support of YearMonthIntervalType in:
> - Python UDFs
> - createDataFrame/toPandas when Arrow is disabled



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37279) Support DayTimeIntervalType in createDataFrame/toPandas and Python UDFs

2021-11-10 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-37279:


 Summary: Support DayTimeIntervalType in createDataFrame/toPandas 
and Python UDFs
 Key: SPARK-37279
 URL: https://issues.apache.org/jira/browse/SPARK-37279
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, SQL
Affects Versions: 3.3.0
Reporter: Hyukjin Kwon


Implements the support of DayTimeIntervalType in:
- Python UDFs
- createDataFrame/toPandas when Arrow is disabled



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37281) Support DayTimeIntervalType in Py4J

2021-11-10 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-37281:


 Summary: Support DayTimeIntervalType in Py4J
 Key: SPARK-37281
 URL: https://issues.apache.org/jira/browse/SPARK-37281
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, SQL
Affects Versions: 3.3.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37280) Support YearMonthIntervalType in Py4J

2021-11-10 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-37280:


 Summary: Support YearMonthIntervalType in Py4J
 Key: SPARK-37280
 URL: https://issues.apache.org/jira/browse/SPARK-37280
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, SQL
Affects Versions: 3.3.0
Reporter: Hyukjin Kwon


This PR adds support for YearMonthIntervalType in Py4J. For example, 
functions.lit with a YearMonthIntervalType value should work.
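
For reference, a sketch of the existing Scala-side behaviour that the Py4J conversion would expose to Python callers (this is not the PySpark API itself):
{code:java}
import java.time.Period
import org.apache.spark.sql.functions.lit

// On the Scala side, lit() already turns a java.time.Period into a year-month
// interval literal; this ticket makes the equivalent work from PySpark.
val ym = lit(Period.ofMonths(14))
spark.range(1).select(ym.as("ym")).printSchema()
// ym: interval year to month
{code}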



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37281) Support DayTimeIntervalType in Py4J

2021-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37281:
-
Description: This PR adds support for DayTimeIntervalType in Py4J. For 
example, functions.lit with a DayTimeIntervalType value should work.

> Support DayTimeIntervalType in Py4J
> ---
>
> Key: SPARK-37281
> URL: https://issues.apache.org/jira/browse/SPARK-37281
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> This PR adds support for DayTimeIntervalType in Py4J. For example, 
> functions.lit with a DayTimeIntervalType value should work.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37275) Support ANSI intervals in PySpark

2021-11-10 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37275?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442066#comment-17442066
 ] 

Hyukjin Kwon commented on SPARK-37275:
--

cc [~maxgekk] FYI

> Support ANSI intervals in PySpark
> -
>
> Key: SPARK-37275
> URL: https://issues.apache.org/jira/browse/SPARK-37275
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> This JIRA aims to implement ANSI interval types in PySpark:
> - 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/DayTimeIntervalType.scala
> - 
> https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/types/YearMonthIntervalType.scala



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37278) Support YearMonthIntervalType in createDataFrame/toPandas and Python UDFs

2021-11-10 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442068#comment-17442068
 ] 

Hyukjin Kwon commented on SPARK-37278:
--

I am working on this.

> Support YearMonthIntervalType in createDataFrame/toPandas and Python UDFs
> -
>
> Key: SPARK-37278
> URL: https://issues.apache.org/jira/browse/SPARK-37278
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Implements the support of YearMonthIntervalType in:
> - Python UDFs
> - createDataFrame/toPandas when Arrow is disabled



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37270) Incorrect result of filter using isNull condition

2021-11-10 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442074#comment-17442074
 ] 

Hyukjin Kwon commented on SPARK-37270:
--

Hm, I can't reproduce this locally. Are you able to reproduce this when running 
locally too? e.g.:

{code}
spark.sparkContext.setCheckpointDir("/tmp/checkpoints")
val frame = Seq((false, 1)).toDF("bool", "number")
frame
  .checkpoint()
  .withColumn("conditions", when(col("bool"), "I am not null"))
  .filter(col("conditions").isNull)
  .show(false)
{code}

> Incorrect result of filter using isNull condition
> 
>
> Key: SPARK-37270
> URL: https://issues.apache.org/jira/browse/SPARK-37270
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.2.0
>Reporter: Tomasz Kus
>Priority: Major
>  Labels: correctness
>
> Simple code that reproduces this issue:
> {code:java}
>  val frame = Seq((false, 1)).toDF("bool", "number")
> frame
>   .checkpoint()
>   .withColumn("conditions", when(col("bool"), "I am not null"))
>   .filter(col("conditions").isNull)
>   .show(false){code}
> Although the "conditions" column is null
> {code:java}
> +-----+------+----------+
> |bool |number|conditions|
> +-----+------+----------+
> |false|1     |null      |
> +-----+------+----------+{code}
> an empty result is shown.
> Execution plans:
> {code:java}
> == Parsed Logical Plan ==
> 'Filter isnull('conditions)
> +- Project [bool#124, number#125, CASE WHEN bool#124 THEN I am not null END 
> AS conditions#252]
>    +- LogicalRDD [bool#124, number#125], false
> == Analyzed Logical Plan ==
> bool: boolean, number: int, conditions: string
> Filter isnull(conditions#252)
> +- Project [bool#124, number#125, CASE WHEN bool#124 THEN I am not null END 
> AS conditions#252]
>    +- LogicalRDD [bool#124, number#125], false
> == Optimized Logical Plan ==
> LocalRelation , [bool#124, number#125, conditions#252]
> == Physical Plan ==
> LocalTableScan , [bool#124, number#125, conditions#252]
>  {code}
> After removing the checkpoint, the proper result is returned and the execution 
> plans are as follows:
> {code:java}
> == Parsed Logical Plan ==
> 'Filter isnull('conditions)
> +- Project [bool#124, number#125, CASE WHEN bool#124 THEN I am not null END 
> AS conditions#256]
>    +- Project [_1#119 AS bool#124, _2#120 AS number#125]
>       +- LocalRelation [_1#119, _2#120]
> == Analyzed Logical Plan ==
> bool: boolean, number: int, conditions: string
> Filter isnull(conditions#256)
> +- Project [bool#124, number#125, CASE WHEN bool#124 THEN I am not null END 
> AS conditions#256]
>    +- Project [_1#119 AS bool#124, _2#120 AS number#125]
>       +- LocalRelation [_1#119, _2#120]
> == Optimized Logical Plan ==
> LocalRelation [bool#124, number#125, conditions#256]
> == Physical Plan ==
> LocalTableScan [bool#124, number#125, conditions#256]
>  {code}
> It seems that the most important difference is LogicalRDD -> LocalRelation.
> The following workarounds retrieve the correct result:
> 1) remove checkpoint
> 2) add explicit .otherwise(null) to when
> 3) add checkpoint() or cache() just before filter
> 4) downgrade to Spark 3.1.2



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36799) Pass queryExecution name in CLI when only select query

2021-11-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36799.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 34041
[https://github.com/apache/spark/pull/34041]

> Pass queryExecution name in CLI when only select query
> --
>
> Key: SPARK-36799
> URL: https://issues.apache.org/jira/browse/SPARK-36799
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Trivial
> Fix For: 3.3.0
>
>
> Currently, in the spark-sql CLI, QueryExecutionListener receives commands but 
> not SELECT queries, because the queryExecution name is not passed.
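
For readers unfamiliar with the listener API, a minimal sketch of the kind of listener this affects (the class name and log output are illustrative):
{code:java}
import org.apache.spark.sql.execution.QueryExecution
import org.apache.spark.sql.util.QueryExecutionListener

// A listener registered like this should now be notified for plain SELECT
// queries run through the spark-sql CLI, not only for commands.
class AuditListener extends QueryExecutionListener {
  override def onSuccess(funcName: String, qe: QueryExecution, durationNs: Long): Unit =
    println(s"$funcName succeeded in ${durationNs / 1000000} ms")
  override def onFailure(funcName: String, qe: QueryExecution, exception: Exception): Unit =
    println(s"$funcName failed: ${exception.getMessage}")
}

// spark.listenerManager.register(new AuditListener)
{code}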



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36799) Pass queryExecution name in CLI when only select query

2021-11-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36799:
---

Assignee: dzcxzl

> Pass queryExecution name in CLI when only select query
> --
>
> Key: SPARK-36799
> URL: https://issues.apache.org/jira/browse/SPARK-36799
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Trivial
>
> Currently, in the spark-sql CLI, QueryExecutionListener receives commands but 
> not SELECT queries, because the queryExecution name is not passed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-36799) Pass queryExecution name in CLI

2021-11-10 Thread dzcxzl (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

dzcxzl updated SPARK-36799:
---
Summary: Pass queryExecution name in CLI  (was: Pass queryExecution name in 
CLI when only select query)

> Pass queryExecution name in CLI
> ---
>
> Key: SPARK-36799
> URL: https://issues.apache.org/jira/browse/SPARK-36799
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.2
>Reporter: dzcxzl
>Assignee: dzcxzl
>Priority: Trivial
> Fix For: 3.3.0
>
>
> Currently, in the spark-sql CLI, QueryExecutionListener receives commands but 
> not SELECT queries, because the queryExecution name is not passed.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36182) Support TimestampNTZ type in Parquet file source

2021-11-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36182?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36182.
-
Resolution: Fixed

Issue resolved by pull request 34495
[https://github.com/apache/spark/pull/34495]

> Support TimestampNTZ type in Parquet file source
> 
>
> Key: SPARK-36182
> URL: https://issues.apache.org/jira/browse/SPARK-36182
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.3.0
>
>
> As per 
> https://github.com/apache/parquet-format/blob/master/LogicalTypes.md#timestamp,
>  Parquet supports both TIMESTAMP_NTZ and TIMESTAMP_LTZ (Spark's current 
> default timestamp type):
> * A TIMESTAMP with isAdjustedToUTC=true => TIMESTAMP_LTZ
> * A TIMESTAMP with isAdjustedToUTC=false => TIMESTAMP_NTZ
> In Spark 3.1 or prior, the Parquet writer follows the definition and sets 
> the field `isAdjustedToUTC` to `true`, while the Parquet reader doesn’t 
> respect the `isAdjustedToUTC` flag and converts any Parquet timestamp type to 
> TIMESTAMP_LTZ.
> Since 3.2, with the support of timestamp without time zone type:
> * Parquet writer follows the definition and sets the field `isAdjustedToUTC` 
> as `false` on writing TIMESTAMP_NTZ. 
> * Parquet reader 
> ** For schema inference, Spark converts the Parquet timestamp type to the 
> corresponding catalyst timestamp type according to the timestamp annotation 
> flag `isAdjustedToUTC`.
> ** If schema merging is enabled during schema inference and some of the files 
> are inferred as TIMESTAMP_NTZ while the others are TIMESTAMP_LTZ, the result 
> type is TIMESTAMP_LTZ, which is considered the “wider” type.
> ** If a column of a user-provided schema is TIMESTAMP_LTZ and the column was 
> written as the TIMESTAMP_NTZ type, Spark allows the read operation.
> ** If a column of a user-provided schema is TIMESTAMP_NTZ and the column was 
> written as the TIMESTAMP_LTZ type, the read operation is not allowed, since 
> TIMESTAMP_NTZ is considered narrower than TIMESTAMP_LTZ.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-36073) EquivalentExpressions fixes and improvements

2021-11-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-36073:
---

Assignee: Peter Toth

> EquivalentExpressions fixes and improvements
> 
>
> Key: SPARK-36073
> URL: https://issues.apache.org/jira/browse/SPARK-36073
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Major
>
> Currently `EquivalentExpressions` has 2 issues:
> - identifying common expressions in conditional expressions is not correct in 
> all cases
> - transparently canonicalized expressions (like `PromotePrecision`) are 
> considered common subexpressions



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-36073) EquivalentExpressions fixes and improvements

2021-11-10 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-36073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-36073.
-
Fix Version/s: 3.3.0
   Resolution: Fixed

Issue resolved by pull request 33281
[https://github.com/apache/spark/pull/33281]

> EquivalentExpressions fixes and improvements
> 
>
> Key: SPARK-36073
> URL: https://issues.apache.org/jira/browse/SPARK-36073
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Peter Toth
>Assignee: Peter Toth
>Priority: Major
> Fix For: 3.3.0
>
>
> Currently `EquivalentExpressions` has 2 issues:
> - identifying common expressions in conditional expressions is not correct in 
> all cases
> - transparently canonicalized expressions (like `PromotePrecision`) are 
> considered common subexpressions



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37282) Add ExtendedLevelDBTest and disable LevelDB tests on Apple Silicon

2021-11-10 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-37282:
-

 Summary: Add ExtendedLevelDBTest and disable LevelDB tests on 
Apple Silicon
 Key: SPARK-37282
 URL: https://issues.apache.org/jira/browse/SPARK-37282
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, Tests
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun


Java 17 officially supports Apple Silicon.
- JEP 391: macOS/AArch64 Port
- https://bugs.openjdk.java.net/browse/JDK-8251280

Oracle Java, Azul Zulu, and Eclipse Temurin Java 17 support Apple Silicon 
natively.
{code}
/Users/dongjoon/.jenv/versions/oracle17/bin/java: Mach-O 64-bit executable arm64
/Users/dongjoon/.jenv/versions/zulu17/bin/java: Mach-O 64-bit executable arm64
/Users/dongjoon/.jenv/versions/temurin17/bin/java: Mach-O 64-bit executable 
arm64
{code}

Since LevelDBJNI still doesn't support Apple Silicon natively, the test cases 
fail on M1.
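
Until such a tag exists, a suite can also guard itself at runtime. The sketch below shows that alternative only; it is not what this ticket implements, and the suite name and platform check are illustrative:
{code:java}
import org.scalatest.funsuite.AnyFunSuite

class LevelDBSmokeSuite extends AnyFunSuite {
  // Rough platform check: macOS on an arm64 (Apple Silicon) JVM.
  private val onAppleSilicon =
    sys.props.getOrElse("os.name", "").contains("Mac") &&
      sys.props.getOrElse("os.arch", "") == "aarch64"

  test("open and close a LevelDB store") {
    assume(!onAppleSilicon, "LevelDBJNI has no native Apple Silicon binaries yet")
    // LevelDB-backed assertions would go here.
  }
}
{code}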



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37282) Add ExtendedLevelDBTest and disable LevelDB tests on Apple Silicon

2021-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442084#comment-17442084
 ] 

Apache Spark commented on SPARK-37282:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/34548

> Add ExtendedLevelDBTest and disable LevelDB tests on Apple Silicon
> --
>
> Key: SPARK-37282
> URL: https://issues.apache.org/jira/browse/SPARK-37282
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Java 17 officially supports Apple Silicon.
> - JEP 391: macOS/AArch64 Port
> - https://bugs.openjdk.java.net/browse/JDK-8251280
> Oracle Java, Azul Zulu, and Eclipse Temurin Java 17 support Apple Silicon 
> natively.
> {code}
> /Users/dongjoon/.jenv/versions/oracle17/bin/java: Mach-O 64-bit executable 
> arm64
> /Users/dongjoon/.jenv/versions/zulu17/bin/java: Mach-O 64-bit executable arm64
> /Users/dongjoon/.jenv/versions/temurin17/bin/java: Mach-O 64-bit executable 
> arm64
> {code}
> Since LevelDBJNI still doesn't support Apple Silicon natively, the test cases 
> fail on M1.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37282) Add ExtendedLevelDBTest and disable LevelDB tests on Apple Silicon

2021-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37282:


Assignee: Apache Spark

> Add ExtendedLevelDBTest and disable LevelDB tests on Apple Silicon
> --
>
> Key: SPARK-37282
> URL: https://issues.apache.org/jira/browse/SPARK-37282
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>
> Java 17 officially supports Apple Silicon.
> - JEP 391: macOS/AArch64 Port
> - https://bugs.openjdk.java.net/browse/JDK-8251280
> Oracle Java, Azul Zulu, and Eclipse Temurin Java 17 support Apple Silicon 
> natively.
> {code}
> /Users/dongjoon/.jenv/versions/oracle17/bin/java: Mach-O 64-bit executable 
> arm64
> /Users/dongjoon/.jenv/versions/zulu17/bin/java: Mach-O 64-bit executable arm64
> /Users/dongjoon/.jenv/versions/temurin17/bin/java: Mach-O 64-bit executable 
> arm64
> {code}
> Since LevelDBJNI still doesn't support Apple Silicon natively, the test cases 
> fail on M1.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37282) Add ExtendedLevelDBTest and disable LevelDB tests on Apple Silicon

2021-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37282:


Assignee: (was: Apache Spark)

> Add ExtendedLevelDBTest and disable LevelDB tests on Apple Silicon
> --
>
> Key: SPARK-37282
> URL: https://issues.apache.org/jira/browse/SPARK-37282
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 3.3.0
>Reporter: Dongjoon Hyun
>Priority: Major
>
> Java 17 officially supports Apple Silicon.
> - JEP 391: macOS/AArch64 Port
> - https://bugs.openjdk.java.net/browse/JDK-8251280
> Oracle Java, Azul Zulu, and Eclipse Temurin Java 17 support Apple Silicon 
> natively.
> {code}
> /Users/dongjoon/.jenv/versions/oracle17/bin/java: Mach-O 64-bit executable 
> arm64
> /Users/dongjoon/.jenv/versions/zulu17/bin/java: Mach-O 64-bit executable arm64
> /Users/dongjoon/.jenv/versions/temurin17/bin/java: Mach-O 64-bit executable 
> arm64
> {code}
> Since LevelDBJNI still doesn't support Apple Silicon natively, the test cases 
> fail on M1.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37278) Support YearMonthIntervalType in createDataFrame/toPandas and Python UDFs

2021-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37278:
-
Description: 
Implements the support of YearMonthIntervalType in:
- Python UDFs
- createDataFrame/toPandas

  was:
Implements the support of YearMonthIntervalType in:
- Python UDFs
- createDataFrame/toPandas when Arrow is disabled


> Support YearMonthIntervalType in createDataFrame/toPandas and Python UDFs
> -
>
> Key: SPARK-37278
> URL: https://issues.apache.org/jira/browse/SPARK-37278
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Implements the support of YearMonthIntervalType in:
> - Python UDFs
> - createDataFrame/toPandas



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37276) Support YearMonthIntervalType in Arrow

2021-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37276:
-
Description: 
Implements the support of YearMonthIntervalType in Arrow code path:
- pandas UDFs
- pandas functions APIs

  was:
Implements the support of YearMonthIntervalType in Arrow code path:
- pandas UDFs
- pandas functions APIs
- createDataFrame/toPandas when Arrow is enabled


> Support YearMonthIntervalType in Arrow
> --
>
> Key: SPARK-37276
> URL: https://issues.apache.org/jira/browse/SPARK-37276
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Implements the support of YearMonthIntervalType in Arrow code path:
> - pandas UDFs
> - pandas functions APIs



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37276) Support YearMonthIntervalType in Arrow

2021-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37276:
-
Description: 
Implements the support of YearMonthIntervalType in Arrow code path:
- pandas UDFs
- pandas functions APIs
- createDataFrame/toPandas w/ Arrow

  was:
Implements the support of YearMonthIntervalType in Arrow code path:
- pandas UDFs
- pandas functions APIs


> Support YearMonthIntervalType in Arrow
> --
>
> Key: SPARK-37276
> URL: https://issues.apache.org/jira/browse/SPARK-37276
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Implements the support of YearMonthIntervalType in Arrow code path:
> - pandas UDFs
> - pandas functions APIs
> - createDataFrame/toPandas w/ Arrow
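
A minimal PySpark sketch of the Arrow-backed round trip this targets. It assumes the 
Arrow conversion is enabled via spark.sql.execution.arrow.pyspark.enabled and that a 
YEAR TO MONTH interval column can simply be converted with toPandas(); the pandas dtype 
it produces is an assumption, not documented behavior.

{code:python}
# Hypothetical sketch: pushing a YearMonthIntervalType column through the
# Arrow code path (toPandas with Arrow enabled). The exact output dtype is assumed.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ym-interval-arrow-sketch")
    .config("spark.sql.execution.arrow.pyspark.enabled", "true")  # Arrow path
    .getOrCreate()
)

# INTERVAL YEAR TO MONTH column produced in SQL.
df = spark.sql("SELECT INTERVAL '1-2' YEAR TO MONTH AS ym")
df.printSchema()

# Before this change the Arrow conversion rejects the column; afterwards it
# should round-trip to pandas without falling back or raising.
pdf = df.toPandas()
print(pdf.dtypes)
{code}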



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37278) Support YearMonthIntervalType in createDataFrame/toPandas and Python UDFs

2021-11-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-37278:
-
Description: 
Implements the support of YearMonthIntervalType in:
- Python UDFs
- createDataFrame/toPandas without Arrow

  was:
Implements the support of YearMonthIntervalType in:
- Python UDFs
- createDataFrame/toPandas


> Support YearMonthIntervalType in createDataFrame/toPandas and Python UDFs
> -
>
> Key: SPARK-37278
> URL: https://issues.apache.org/jira/browse/SPARK-37278
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.3.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Implements the support of YearMonthIntervalType in:
> - Python UDFs
> - createDataFrame/toPandas without Arrow
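
A similar sketch for the non-Arrow path covered here, with Arrow explicitly disabled so 
toPandas() takes the plain conversion route. How the interval surfaces on the Python side 
(e.g. as an integer number of months or a pandas offset) is an assumption.

{code:python}
# Hypothetical sketch: the non-Arrow createDataFrame/toPandas path for
# YearMonthIntervalType. The Python-side representation is assumed.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("ym-interval-no-arrow-sketch")
    .config("spark.sql.execution.arrow.pyspark.enabled", "false")  # plain path
    .getOrCreate()
)

df = spark.sql("SELECT id, INTERVAL '0-3' YEAR TO MONTH AS ym FROM range(3)")

# Before this change the fallback conversion rejects the interval column;
# afterwards toPandas() should return it without raising.
pdf = df.toPandas()
print(pdf)
{code}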



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37263) Create an option to silence advice for pandas API on Spark.

2021-11-10 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-37263:

Summary: Create an option to silence advice for pandas API on Spark.  (was: 
Reduce pandas-on-Spark warning for internal usage.)

> Create an option to silence advice for pandas API on Spark.
> ---
>
> Key: SPARK-37263
> URL: https://issues.apache.org/jira/browse/SPARK-37263
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Raised from comment 
> https://github.com/apache/spark/pull/34389#discussion_r741733023.
> The advice warning for pandas API on Spark for expensive APIs 
> (https://github.com/apache/spark/pull/34389#discussion_r741733023) is now 
> issuing too many warning messages, since it also issues the warning when 
> the APIs are used internally.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37263) Add an option to silence advice for pandas API on Spark.

2021-11-10 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-37263:

Summary: Add an option to silence advice for pandas API on Spark.  (was: 
Create an option to silence advice for pandas API on Spark.)

> Add an option to silence advice for pandas API on Spark.
> 
>
> Key: SPARK-37263
> URL: https://issues.apache.org/jira/browse/SPARK-37263
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Raised from comment 
> https://github.com/apache/spark/pull/34389#discussion_r741733023.
> The advice warning for pandas API on Spark for expensive APIs 
> (https://github.com/apache/spark/pull/34389#discussion_r741733023) is now 
> issuing too many warning messages, since it also issues the warning when 
> the APIs are used internally.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37263) Add an option to silence advice for pandas API on Spark.

2021-11-10 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-37263:

Description: 
Raised from comment 
[https://github.com/apache/spark/pull/34389#discussion_r741733023].

The advice warning for pandas API on Spark for expensive APIs 
(https://github.com/apache/spark/pull/34389#discussion_r741733023) is now 
issuing too many warning messages, so it might be good to have an option to 
turn this message on/off.

  was:
Raised from comment 
https://github.com/apache/spark/pull/34389#discussion_r741733023.

The advice warning for pandas API on Spark for expensive APIs 
(https://github.com/apache/spark/pull/34389#discussion_r741733023) is now 
issuing too many warning messages, since it also issues the warning when 
the APIs are used internally.


> Add an option to silence advice for pandas API on Spark.
> 
>
> Key: SPARK-37263
> URL: https://issues.apache.org/jira/browse/SPARK-37263
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Raised from comment 
> [https://github.com/apache/spark/pull/34389#discussion_r741733023].
> The advice warning for pandas API on Spark for expensive APIs 
> (https://github.com/apache/spark/pull/34389#discussion_r741733023) is now 
> issuing too many warning messages, so it might be good to have an option to 
> turn this message on/off.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37274) These parameters should be of type long, not int

2021-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37274:


Assignee: (was: Apache Spark)

> These parameters should be of type long, not int
> 
>
> Key: SPARK-37274
> URL: https://issues.apache.org/jira/browse/SPARK-37274
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: hao
>Priority: Major
>
> These parameters [spark.sql.orc.columnarReaderBatchSize], 
> [spark.sql.inMemoryColumnarStorage.batchSize], 
> [spark.sql.parquet.columnarReaderBatchSize] should be of type long, not of 
> type int. When the user sets a value greater than the maximum value 
> of type int, an error is thrown.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37274) These parameters should be of type long, not int

2021-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37274?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37274:


Assignee: Apache Spark

> These parameters should be of type long, not int
> 
>
> Key: SPARK-37274
> URL: https://issues.apache.org/jira/browse/SPARK-37274
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: hao
>Assignee: Apache Spark
>Priority: Major
>
> These parameters [spark.sql.orc.columnarReaderBatchSize], 
> [spark.sql.inMemoryColumnarStorage.batchSize], 
> [spark.sql.parquet.columnarReaderBatchSize] should be of type long, not of 
> type int. When the user sets a value greater than the maximum value 
> of type int, an error is thrown.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37274) These parameters should be of type long, not int

2021-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442102#comment-17442102
 ] 

Apache Spark commented on SPARK-37274:
--

User 'dh20' has created a pull request for this issue:
https://github.com/apache/spark/pull/34549

> These parameters should be of type long, not int
> 
>
> Key: SPARK-37274
> URL: https://issues.apache.org/jira/browse/SPARK-37274
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: hao
>Priority: Major
>
> These parameters [spark.sql.orc.columnarReaderBatchSize], 
> [spark.sql.inMemoryColumnarStorage.batchSize], 
> [spark.sql.parquet.columnarReaderBatchSize] should be of type long, not of 
> type int. When the user sets a value greater than the maximum value 
> of type int, an error is thrown.
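
A small PySpark repro sketch of the limitation described above: these batch-size configs 
are parsed as 32-bit ints today, so any value above Int.MaxValue (2147483647) is rejected. 
The exact exception type and message shown here are assumptions.

{code:python}
# Hypothetical repro: setting a columnar batch-size conf beyond the 32-bit
# signed integer range fails because the conf is declared as an int.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("batch-size-overflow-sketch").getOrCreate()

try:
    # 2**31 is one past Int.MaxValue, so an int-typed conf cannot hold it.
    spark.conf.set("spark.sql.parquet.columnarReaderBatchSize", str(2 ** 31))
    spark.sql("SELECT 1").collect()  # force the conf to be read if set() passed
except Exception as e:
    print(type(e).__name__, e)
{code}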



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-37283) Don't try to store a V1 table which contains ANSI intervals in Hive compatible format

2021-11-10 Thread Kousuke Saruta (Jira)
Kousuke Saruta created SPARK-37283:
--

 Summary: Don't try to store a V1 table which contains ANSI 
intervals in Hive compatible format
 Key: SPARK-37283
 URL: https://issues.apache.org/jira/browse/SPARK-37283
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.2.0
Reporter: Kousuke Saruta
Assignee: Kousuke Saruta


If a table being created contains a column of ANSI interval types and the 
underlying file format has a corresponding Hive SerDe (e.g. Parquet),
`HiveExternalCatalog` tries to store the table in a Hive-compatible format.
But as ANSI interval types in Spark and interval types in Hive are not 
compatible (Hive only supports interval_year_month and interval_day_time), the 
following warning with a stack trace will be logged.

{code}
spark-sql> CREATE TABLE tbl1(a INTERVAL YEAR TO MONTH) USING Parquet;
21/11/11 14:39:29 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
since hive.security.authorization.manager is set to instance of 
HiveAuthorizerFactory.
21/11/11 14:39:29 WARN HiveExternalCatalog: Could not persist `default`.`tbl1` 
in a Hive compatible way. Persisting it into Hive metastore in Spark SQL 
specific format.
org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.IllegalArgumentException: Error: type expected at the position 0 of 
'interval year to month' but 'interval year to month' is found.
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:869)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:874)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createTable$1(HiveClientImpl.scala:553)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:551)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.saveTableIntoHive(HiveExternalCatalog.scala:499)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.createDataSourceTable(HiveExternalCatalog.scala:397)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$createTable$1(HiveExternalCatalog.scala:274)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245)
at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:94)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:376)
at 
org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:120)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97)
at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$

[jira] [Updated] (SPARK-37283) Don't try to store a V1 table which contains ANSI intervals in Hive compatible format

2021-11-10 Thread Kousuke Saruta (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kousuke Saruta updated SPARK-37283:
---
Description: 
If a table being created contains a column of ANSI interval types and the 
underlying file format has a corresponding Hive SerDe (e.g. Parquet),
`HiveExternalCatalog` tries to store the table in a Hive-compatible format.
But as ANSI interval types in Spark and interval types in Hive are not 
compatible (Hive only supports interval_year_month and interval_day_time), the 
following warning with a stack trace will be logged.

{code}
spark-sql> CREATE TABLE tbl1(a INTERVAL YEAR TO MONTH) USING Parquet;
21/11/11 14:39:29 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
since hive.security.authorization.manager is set to instance of 
HiveAuthorizerFactory.
21/11/11 14:39:29 WARN HiveExternalCatalog: Could not persist `default`.`tbl1` 
in a Hive compatible way. Persisting it into Hive metastore in Spark SQL 
specific format.
org.apache.hadoop.hive.ql.metadata.HiveException: 
java.lang.IllegalArgumentException: Error: type expected at the position 0 of 
'interval year to month' but 'interval year to month' is found.
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:869)
at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:874)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createTable$1(HiveClientImpl.scala:553)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:551)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.saveTableIntoHive(HiveExternalCatalog.scala:499)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.createDataSourceTable(HiveExternalCatalog.scala:397)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$createTable$1(HiveExternalCatalog.scala:274)
at 
scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245)
at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:94)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:376)
at 
org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:120)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
at 
org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
at 
org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
at 
org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
at 
org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97)
at 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWith

[jira] [Updated] (SPARK-37263) Add PandasAPIOnSparkAdviceWarning class

2021-11-10 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-37263:

Summary: Add PandasAPIOnSparkAdviceWarning class  (was: Add an option to 
silence advice for pandas API on Spark.)

> Add PandasAPIOnSparkAdviceWarning class
> ---
>
> Key: SPARK-37263
> URL: https://issues.apache.org/jira/browse/SPARK-37263
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Raised from comment 
> [https://github.com/apache/spark/pull/34389#discussion_r741733023].
> The advice warning for pandas API on Spark for expensive APIs 
> (https://github.com/apache/spark/pull/34389#discussion_r741733023) is now 
> issuing too many warning messages, so it might be good to have an option to 
> turn this message on/off.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-37263) Add PandasAPIOnSparkAdviceWarning class

2021-11-10 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-37263:

Description: 
Raised from comment 
[https://github.com/apache/spark/pull/34389#discussion_r741733023].

The advice warning for pandas API on Spark for expensive APIs 
(https://github.com/apache/spark/pull/34389#discussion_r741733023) is now 
issuing too many warning messages, so it might be good to have a 
pandas-on-Spark-specific warning class so that users can manually turn it off 
using warnings.simplefilter.

  was:
Raised from comment 
[https://github.com/apache/spark/pull/34389#discussion_r741733023].

The advice warning for pandas API on Spark for expensive APIs 
(https://github.com/apache/spark/pull/34389#discussion_r741733023) is now 
issuing too many warning messages, so it might be good to have an option to 
turn this message on/off.


> Add PandasAPIOnSparkAdviceWarning class
> ---
>
> Key: SPARK-37263
> URL: https://issues.apache.org/jira/browse/SPARK-37263
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Raised from comment 
> [https://github.com/apache/spark/pull/34389#discussion_r741733023].
> The advice warning for pandas API on Spark for expensive APIs 
> (https://github.com/apache/spark/pull/34389#discussion_r741733023) is now 
> issuing too many warning messages, so it might be good to have a 
> pandas-on-Spark-specific warning class so that users can manually turn it off 
> using warnings.simplefilter.
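
A small sketch of the idea in plain Python: a dedicated warning class lets users silence 
only the pandas-on-Spark advice with the standard warnings machinery. The class name 
PandasAPIOnSparkAdviceWarning is taken from this ticket's summary and may differ in the 
final implementation.

{code:python}
# Hypothetical sketch: a dedicated advice-warning class that users can filter.
import warnings


class PandasAPIOnSparkAdviceWarning(Warning):
    """Advice about potentially expensive pandas-on-Spark operations."""


def expensive_api():
    # Library code would emit advice through the dedicated class...
    warnings.warn(
        "This operation collects data to the driver and may be expensive.",
        PandasAPIOnSparkAdviceWarning,
    )


# ...so users can opt out of just this category of messages:
warnings.simplefilter("ignore", PandasAPIOnSparkAdviceWarning)
expensive_api()  # no advice warning is shown
{code}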



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-37283) Don't try to store a V1 table which contains ANSI intervals in Hive compatible format

2021-11-10 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37283?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17442111#comment-17442111
 ] 

Apache Spark commented on SPARK-37283:
--

User 'sarutak' has created a pull request for this issue:
https://github.com/apache/spark/pull/34551

> Don't try to store a V1 table which contains ANSI intervals in Hive 
> compatible format
> -
>
> Key: SPARK-37283
> URL: https://issues.apache.org/jira/browse/SPARK-37283
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> If a table being created contains a column of ANSI interval types and the 
> underlying file format has a corresponding Hive SerDe (e.g. Parquet),
> `HiveExternalCatalog` tries to store the table in a Hive-compatible format.
> But as ANSI interval types in Spark and interval types in Hive are not 
> compatible (Hive only supports interval_year_month and interval_day_time), 
> the following warning with a stack trace will be logged.
> {code}
> spark-sql> CREATE TABLE tbl1(a INTERVAL YEAR TO MONTH) USING Parquet;
> 21/11/11 14:39:29 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
> since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.
> 21/11/11 14:39:29 WARN HiveExternalCatalog: Could not persist 
> `default`.`tbl1` in a Hive compatible way. Persisting it into Hive metastore 
> in Spark SQL specific format.
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.IllegalArgumentException: Error: type expected at the position 0 of 
> 'interval year to month' but 'interval year to month' is found.
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:869)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:874)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createTable$1(HiveClientImpl.scala:553)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:551)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.saveTableIntoHive(HiveExternalCatalog.scala:499)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.createDataSourceTable(HiveExternalCatalog.scala:397)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$createTable$1(HiveExternalCatalog.scala:274)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:94)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:376)
>   at 
> org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:120)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands

[jira] [Assigned] (SPARK-37283) Don't try to store a V1 table which contains ANSI intervals in Hive compatible format

2021-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37283:


Assignee: Apache Spark  (was: Kousuke Saruta)

> Don't try to store a V1 table which contains ANSI intervals in Hive 
> compatible format
> -
>
> Key: SPARK-37283
> URL: https://issues.apache.org/jira/browse/SPARK-37283
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Apache Spark
>Priority: Major
>
> If a table being created contains a column of ANSI interval types and the 
> underlying file format has a corresponding Hive SerDe (e.g. Parquet),
> `HiveExternalCatalog` tries to store the table in a Hive-compatible format.
> But as ANSI interval types in Spark and interval types in Hive are not 
> compatible (Hive only supports interval_year_month and interval_day_time), 
> the following warning with a stack trace will be logged.
> {code}
> spark-sql> CREATE TABLE tbl1(a INTERVAL YEAR TO MONTH) USING Parquet;
> 21/11/11 14:39:29 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
> since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.
> 21/11/11 14:39:29 WARN HiveExternalCatalog: Could not persist 
> `default`.`tbl1` in a Hive compatible way. Persisting it into Hive metastore 
> in Spark SQL specific format.
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.IllegalArgumentException: Error: type expected at the position 0 of 
> 'interval year to month' but 'interval year to month' is found.
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:869)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:874)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createTable$1(HiveClientImpl.scala:553)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:551)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.saveTableIntoHive(HiveExternalCatalog.scala:499)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.createDataSourceTable(HiveExternalCatalog.scala:397)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$createTable$1(HiveExternalCatalog.scala:274)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:94)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:376)
>   at 
> org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:120)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$anon

[jira] [Assigned] (SPARK-37263) Add PandasAPIOnSparkAdviceWarning class

2021-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37263:


Assignee: (was: Apache Spark)

> Add PandasAPIOnSparkAdviceWarning class
> ---
>
> Key: SPARK-37263
> URL: https://issues.apache.org/jira/browse/SPARK-37263
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Raised from comment 
> [https://github.com/apache/spark/pull/34389#discussion_r741733023].
> The advice warning for pandas API on Spark for expensive APIs 
> (https://github.com/apache/spark/pull/34389#discussion_r741733023) is now 
> issuing too many warning messages, so it might be good to have a 
> pandas-on-Spark-specific warning class so that users can manually turn it off 
> using warnings.simplefilter.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-37283) Don't try to store a V1 table which contains ANSI intervals in Hive compatible format

2021-11-10 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-37283?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-37283:


Assignee: Kousuke Saruta  (was: Apache Spark)

> Don't try to store a V1 table which contains ANSI intervals in Hive 
> compatible format
> -
>
> Key: SPARK-37283
> URL: https://issues.apache.org/jira/browse/SPARK-37283
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Kousuke Saruta
>Assignee: Kousuke Saruta
>Priority: Major
>
> If a table being created contains a column of ANSI interval types and the 
> underlying file format has a corresponding Hive SerDe (e.g. Parquet),
> `HiveExternalCatalog` tries to store the table in a Hive-compatible format.
> But as ANSI interval types in Spark and interval types in Hive are not 
> compatible (Hive only supports interval_year_month and interval_day_time), 
> the following warning with a stack trace will be logged.
> {code}
> spark-sql> CREATE TABLE tbl1(a INTERVAL YEAR TO MONTH) USING Parquet;
> 21/11/11 14:39:29 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
> since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.
> 21/11/11 14:39:29 WARN HiveExternalCatalog: Could not persist 
> `default`.`tbl1` in a Hive compatible way. Persisting it into Hive metastore 
> in Spark SQL specific format.
> org.apache.hadoop.hive.ql.metadata.HiveException: 
> java.lang.IllegalArgumentException: Error: type expected at the position 0 of 
> 'interval year to month' but 'interval year to month' is found.
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:869)
>   at org.apache.hadoop.hive.ql.metadata.Hive.createTable(Hive.java:874)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$createTable$1(HiveClientImpl.scala:553)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:303)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:234)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:233)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:283)
>   at 
> org.apache.spark.sql.hive.client.HiveClientImpl.createTable(HiveClientImpl.scala:551)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.saveTableIntoHive(HiveExternalCatalog.scala:499)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.createDataSourceTable(HiveExternalCatalog.scala:397)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$createTable$1(HiveExternalCatalog.scala:274)
>   at 
> scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:102)
>   at 
> org.apache.spark.sql.hive.HiveExternalCatalog.createTable(HiveExternalCatalog.scala:245)
>   at 
> org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.createTable(ExternalCatalogWithListener.scala:94)
>   at 
> org.apache.spark.sql.catalyst.catalog.SessionCatalog.createTable(SessionCatalog.scala:376)
>   at 
> org.apache.spark.sql.execution.command.CreateDataSourceTableCommand.run(createDataSourceTables.scala:120)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult$lzycompute(commands.scala:75)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.sideEffectResult(commands.scala:73)
>   at 
> org.apache.spark.sql.execution.command.ExecutedCommandExec.executeCollect(commands.scala:84)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:97)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>   at 
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:97)
>   at 
> org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:93)
>   at 
> org.apache.spark.sql.catalyst.trees.TreeNode.$an
