[jira] [Resolved] (SPARK-37022) Use black as a formatter for the whole PySpark codebase.
     [ https://issues.apache.org/jira/browse/SPARK-37022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-37022.
----------------------------------
    Fix Version/s: 3.3.0
       Resolution: Fixed

Issue resolved by pull request 34297
[https://github.com/apache/spark/pull/34297]

> Use black as a formatter for the whole PySpark codebase.
> ---------------------------------------------------------
>
>                 Key: SPARK-37022
>                 URL: https://issues.apache.org/jira/browse/SPARK-37022
>             Project: Spark
>          Issue Type: Improvement
>          Components: PySpark
>    Affects Versions: 3.3.0
>            Reporter: Maciej Szymkiewicz
>            Assignee: Maciej Szymkiewicz
>            Priority: Major
>             Fix For: 3.3.0
>
>         Attachments: black-diff-stats.txt, pyproject.toml
>
>
> [{{black}}|https://github.com/psf/black] is a popular Python code formatter.
> It is used by a number of projects, both small and large, including prominent
> ones, like pandas, scikit-learn, Django or SQLAlchemy. Black is already used
> to format {{pyspark.pandas}} and (though not enforced) stub files.
> We should consider using black to enforce formatting of all PySpark files.
> There are multiple reasons to do that:
>  - Consistency: black is already used across the existing codebase, and
> black-formatted chunks of code are already being added to modules other than
> pyspark.pandas as a result of type hints inlining (SPARK-36845).
>  - Lower cost of contributing and reviewing: formatting can be automatically
> enforced and applied.
>  - Simplify reviews: in general, black-formatted code produces small and
> highly readable diffs.
>  - Reduce effort required to maintain patched forks: smaller diffs +
> predictable formatting.
> Risks:
>  - Initial reformatting requires quite significant changes.
>  - Applying black will break blame in GitHub UI (for git in general see
> [Avoiding ruining git
> blame|https://black.readthedocs.io/en/stable/guides/introducing_black_to_your_project.html?highlight=blame#avoiding-ruining-git-blame]).
> Additional steps:
>  - To simplify backporting, black will have to be applied to all active
> branches.

--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
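The git blame risk listed for SPARK-37022 is usually mitigated by recording the bulk-reformatting commit in an ignore-revs file, which is what the linked black guide describes. A minimal sketch of producing such a file follows. The helper name render_ignore_revs, the 40-character SHA-1 check, and the placeholder hash are assumptions made for illustration and are not taken from the Spark repository.

```python
import string

# Assumption: the repository uses SHA-1 object names (40 hex characters).
HEX_DIGITS = frozenset(string.hexdigits.lower())


def render_ignore_revs(entries):
    """Render the text of an ignore-revs file.

    `entries` is an iterable of (commit_hash, description) pairs. Each
    commit is written as a '#' comment line followed by the full hash on
    its own line, which is the layout git expects for such a file.
    """
    lines = []
    for sha, description in entries:
        sha = sha.strip().lower()
        # git requires unabbreviated hashes here, so reject short ones.
        if len(sha) != 40 or not set(sha) <= HEX_DIGITS:
            raise ValueError(f"expected a full 40-character commit hash, got {sha!r}")
        lines.append(f"# {description}")
        lines.append(sha)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Placeholder hash, not a real Spark commit.
    print(render_ignore_revs([("0" * 40, "Apply black to the PySpark codebase")]), end="")
```

The rendered text can be saved as, for example, .git-blame-ignore-revs and then used with `git blame --ignore-revs-file .git-blame-ignore-revs <file>`, or set once per clone with `git config blame.ignoreRevsFile .git-blame-ignore-revs`.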
[jira] [Assigned] (SPARK-37022) Use black as a formatter for the whole PySpark codebase.
     [ https://issues.apache.org/jira/browse/SPARK-37022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-37022:
------------------------------------
    Assignee: Maciej Szymkiewicz
[jira] [Assigned] (SPARK-37266) Optimize the analysis for view text of persistent view and fix security vulnerabilities caused by sql tampering
     [ https://issues.apache.org/jira/browse/SPARK-37266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37266:
------------------------------------
    Assignee: Apache Spark

> Optimize the analysis for view text of persistent view and fix security
> vulnerabilities caused by sql tampering
> ------------------------------------------------------------------------
>
>                 Key: SPARK-37266
>                 URL: https://issues.apache.org/jira/browse/SPARK-37266
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: jiaan.geng
>            Assignee: Apache Spark
>            Priority: Major
>
> The current implementation of a persistent view creates a Hive table with
> the view text.
> The view text is just a query string, so an attacker may tamper with it
> through various means.
> For example:
> {code:java}
> select * from tab1
> {code}
> could be tampered with to become:
> {code:java}
> drop table tab1
> {code}
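The tampering scenario described in SPARK-37266 can be illustrated with a toy check that accepts stored view text only when it looks like a single query. This is a sketch under stated assumptions, not the approach taken in the linked pull request: the function names and the allowed keyword set are hypothetical, and a real check would presumably parse the text with Spark's SQL parser and inspect the resulting plan rather than look at the first keyword.

```python
import re

# Assumption: in this toy check only these leading keywords count as a query.
ALLOWED_LEADING_KEYWORDS = frozenset({"select", "with"})


def strip_sql_comments(sql: str) -> str:
    """Remove /* ... */ and -- comments (naive: ignores string literals)."""
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.DOTALL)
    return re.sub(r"--[^\n]*", " ", sql)


def looks_like_query(view_text: str) -> bool:
    """Return True if the view text appears to be one SELECT/WITH statement."""
    text = strip_sql_comments(view_text).strip().rstrip(";").strip()
    # Reject empty text and anything that still contains a statement separator.
    # This is deliberately conservative: a ';' inside a string literal is
    # rejected as well.
    if not text or ";" in text:
        return False
    first_word = re.match(r"[A-Za-z]+", text)
    return first_word is not None and first_word.group(0).lower() in ALLOWED_LEADING_KEYWORDS


if __name__ == "__main__":
    for candidate in ("select * from tab1", "drop table tab1"):
        print(candidate, "->", looks_like_query(candidate))
```

With the two statements from the issue description, the first is accepted and the second is rejected, because "drop" is not in the allowed keyword set.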
[jira] [Assigned] (SPARK-37266) Optimize the analysis for view text of persistent view and fix security vulnerabilities caused by sql tampering
     [ https://issues.apache.org/jira/browse/SPARK-37266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37266:
------------------------------------
    Assignee:     (was: Apache Spark)
[jira] [Commented] (SPARK-37266) Optimize the analysis for view text of persistent view and fix security vulnerabilities caused by sql tampering
    [ https://issues.apache.org/jira/browse/SPARK-37266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441580#comment-17441580 ]

Apache Spark commented on SPARK-37266:
--------------------------------------
User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/34543
[jira] [Assigned] (SPARK-37267) OptimizeSkewInRebalancePartitions support optimize non-root node
     [ https://issues.apache.org/jira/browse/SPARK-37267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37267:
------------------------------------
    Assignee: Apache Spark

> OptimizeSkewInRebalancePartitions support optimize non-root node
> -----------------------------------------------------------------
>
>                 Key: SPARK-37267
>                 URL: https://issues.apache.org/jira/browse/SPARK-37267
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: XiDuo You
>            Assignee: Apache Spark
>            Priority: Major
>
> `OptimizeSkewInRebalancePartitions` is currently applied only if
> `RebalancePartitions` is the root node, but sometimes we want a local sort
> after `RebalancePartitions`, which can improve the compression ratio.
> After SPARK-36184, it is easy to validate whether the rule introduces an
> extra shuffle, and the output partitioning is ensured by
> `AQEShuffleReadExec.outputPartitioning`.
> So it is safe to make `OptimizeSkewInRebalancePartitions` support optimizing
> a non-root node.
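The difference between the root-only behaviour and the proposed non-root behaviour in SPARK-37267 can be sketched with a toy plan tree. Everything below is hypothetical: PlanNode and rule_applies are illustrative names, the plan is modelled as a simple chain of single-child nodes, and none of it mirrors Spark's actual classes or the rule's real matching logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PlanNode:
    """Toy single-child plan node; `name` stands in for the operator type."""
    name: str
    child: Optional["PlanNode"] = None


def rule_applies(plan: PlanNode, root_only: bool) -> bool:
    """Decide whether the toy rule would fire on `plan`.

    With root_only=True the rule fires only when the root itself is
    RebalancePartitions. With root_only=False it fires when the operator
    appears anywhere along the chain of children.
    """
    if root_only:
        return plan.name == "RebalancePartitions"
    node: Optional[PlanNode] = plan
    while node is not None:
        if node.name == "RebalancePartitions":
            return True
        node = node.child
    return False


if __name__ == "__main__":
    # A local sort placed on top of the rebalance, as in the issue description.
    plan = PlanNode("Sort", PlanNode("RebalancePartitions", PlanNode("Scan")))
    print("root only:", rule_applies(plan, root_only=True))
    print("any node: ", rule_applies(plan, root_only=False))
```

In this model, adding the local sort moves `RebalancePartitions` below the root, so the root-only check no longer matches while the any-node check still does.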
[jira] [Assigned] (SPARK-37267) OptimizeSkewInRebalancePartitions support optimize non-root node
     [ https://issues.apache.org/jira/browse/SPARK-37267?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-37267:
------------------------------------
    Assignee:     (was: Apache Spark)
[jira] [Commented] (SPARK-37267) OptimizeSkewInRebalancePartitions support optimize non-root node
    [ https://issues.apache.org/jira/browse/SPARK-37267?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17441578#comment-17441578 ]

Apache Spark commented on SPARK-37267:
--------------------------------------
User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/34542
[jira] [Created] (SPARK-37267) OptimizeSkewInRebalancePartitions support optimize non-root node
XiDuo You created SPARK-37267:
---------------------------------

             Summary: OptimizeSkewInRebalancePartitions support optimize non-root node
                 Key: SPARK-37267
                 URL: https://issues.apache.org/jira/browse/SPARK-37267
             Project: Spark
          Issue Type: Sub-task
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: XiDuo You


`OptimizeSkewInRebalancePartitions` is currently applied only if `RebalancePartitions` is the root node, but sometimes we want a local sort after `RebalancePartitions`, which can improve the compression ratio.

After SPARK-36184, it is easy to validate whether the rule introduces an extra shuffle, and the output partitioning is ensured by `AQEShuffleReadExec.outputPartitioning`.

So it is safe to make `OptimizeSkewInRebalancePartitions` support optimizing a non-root node.
[jira] [Updated] (SPARK-37266) Optimize the analysis for view text of persistent view and fix security vulnerabilities caused by sql tampering
     [ https://issues.apache.org/jira/browse/SPARK-37266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiaan.geng updated SPARK-37266:
-------------------------------
    Description: 
The current implementation of persistent view is create hive table with view text.
The view text is just a query string, so the hackers may tamper with it through various means.
Such as:
{code:java}
select * from tab1
{code}
 tampered with 
{code:java}
drop table tab1
{code}

  was:
The current implementation of persist view is create hive table with view text.
The view text is just a query string, so the hackers may tamper with it through various means.
Such as:
{code:java}
select * from tab1
{code}
 tampered with 
{code:java}
drop table tab1
{code}
[jira] [Updated] (SPARK-37266) Optimize the analysis for view text of persistent view and fix security vulnerabilities caused by sql tampering
     [ https://issues.apache.org/jira/browse/SPARK-37266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiaan.geng updated SPARK-37266:
-------------------------------
    Summary: Optimize the analysis for view text of persistent view and fix security vulnerabilities caused by sql tampering  (was: Optimize the analysis for view text of persist view and fix security vulnerabilities caused by sql tampering )
[jira] [Created] (SPARK-37266) Optimize the analysis for view text of persist view and fix security vulnerabilities caused by sql tampering
jiaan.geng created SPARK-37266:
----------------------------------

             Summary: Optimize the analysis for view text of persist view and fix security vulnerabilities caused by sql tampering
                 Key: SPARK-37266
                 URL: https://issues.apache.org/jira/browse/SPARK-37266
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: jiaan.geng


The current implementation of persist view is create hive table with view text.
The view text is just a query string, so the hackers may tamper with it through various means.
Such as:
{code:java}
select * from tab1
{code}
 tampered with 
{code:java}
drop table tab1
{code}