[jira] [Commented] (SPARK-40803) LZ4CompressionCodec looks up configuration on each stream creation
[ https://issues.apache.org/jira/browse/SPARK-40803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618682#comment-17618682 ] Apache Spark commented on SPARK-40803: -- User 'eejbyfeldt' has created a pull request for this issue: https://github.com/apache/spark/pull/38282 > LZ4CompressionCodec looks up configuration on each stream creation > -- > > Key: SPARK-40803 > URL: https://issues.apache.org/jira/browse/SPARK-40803 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Emil Ejbyfeldt >Priority: Major > > This lookup in SparkConf is quite expensive and shows up in profiling for > cases where lots of smaller streams are created.
[jira] [Assigned] (SPARK-40803) LZ4CompressionCodec looks up configuration on each stream creation
[ https://issues.apache.org/jira/browse/SPARK-40803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40803: Assignee: Apache Spark > LZ4CompressionCodec looks up configuration on each stream creation > -- > > Key: SPARK-40803 > URL: https://issues.apache.org/jira/browse/SPARK-40803 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Emil Ejbyfeldt >Assignee: Apache Spark >Priority: Major > > This lookup in SparkConf is quite expensive and shows up in profiling for > cases where lots of smaller streams are created.
[jira] [Assigned] (SPARK-40803) LZ4CompressionCodec looks up configuration on each stream creation
[ https://issues.apache.org/jira/browse/SPARK-40803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40803: Assignee: (was: Apache Spark) > LZ4CompressionCodec looks up configuration on each stream creation > -- > > Key: SPARK-40803 > URL: https://issues.apache.org/jira/browse/SPARK-40803 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Emil Ejbyfeldt >Priority: Major > > This lookup in SparkConf is quite expensive and shows up in profiling for > cases where lots of smaller streams are created.
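For context, the change being proposed amounts to hoisting the SparkConf lookup out of the per-stream path. Below is a minimal sketch of that pattern; it is illustrative only (the actual fix is in the pull request above), and the config key handling is a simplified assumption:

{code:scala}
import java.io.OutputStream

import net.jpountz.lz4.LZ4BlockOutputStream

import org.apache.spark.SparkConf

class LZ4CompressionCodec(conf: SparkConf) {
  // Resolve the block size once per codec instance instead of on every
  // compressedOutputStream() call, where the SparkConf lookup shows up in profiles.
  private lazy val blockSize: Int =
    conf.getSizeAsBytes("spark.io.compression.lz4.blockSize", "32k").toInt

  def compressedOutputStream(s: OutputStream): OutputStream =
    new LZ4BlockOutputStream(s, blockSize)
}
{code}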
[jira] [Assigned] (SPARK-40796) Check the generated python protos in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-40796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-40796: - Assignee: Ruifeng Zheng > Check the generated python protos in GitHub Actions > --- > > Key: SPARK-40796 > URL: https://issues.apache.org/jira/browse/SPARK-40796 > Project: Spark > Issue Type: Sub-task > Components: Build, Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major
[jira] [Resolved] (SPARK-40796) Check the generated python protos in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-40796?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-40796. --- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38253 [https://github.com/apache/spark/pull/38253] > Check the generated python protos in GitHub Actions > --- > > Key: SPARK-40796 > URL: https://issues.apache.org/jira/browse/SPARK-40796 > Project: Spark > Issue Type: Sub-task > Components: Build, Connect >Affects Versions: 3.4.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.4.0
[jira] [Commented] (SPARK-40737) Add basic support for DataFrameWriter
[ https://issues.apache.org/jira/browse/SPARK-40737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618659#comment-17618659 ] Apache Spark commented on SPARK-40737: -- User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/38281 > Add basic support for DataFrameWriter > - > > Key: SPARK-40737 > URL: https://issues.apache.org/jira/browse/SPARK-40737 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Martin Grund >Assignee: Martin Grund >Priority: Major > Fix For: 3.4.0 > > > A key element of using Spark Connect is going to be the ability to write data > from a logical plan.
[jira] [Assigned] (SPARK-40790) Check error classes in DDL parsing tests
[ https://issues.apache.org/jira/browse/SPARK-40790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40790: Assignee: (was: Apache Spark) > Check error classes in DDL parsing tests > > > Key: SPARK-40790 > URL: https://issues.apache.org/jira/browse/SPARK-40790 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > Labels: starter > Fix For: 3.4.0 > > > Check error classes in DDL command tests by using checkError(). For instance > - AlterNamespaceSetPropertiesParserSuite > - AlterTableDropPartitionParserSuite > - AlterTableRenameParserSuite > - AlterTableRecoverPartitionsParserSuite > - DescribeTableParserSuite > - TruncateTableParserSuite > - AlterTableSetSerdeParserSuite > - ShowPartitionsParserSuite > [https://github.com/apache/spark/blob/414771d4e8b52d0a76a7729d005794dc04f1e075/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterNamespaceSetPropertiesParserSuite.scala#L43]
[jira] [Assigned] (SPARK-40790) Check error classes in DDL parsing tests
[ https://issues.apache.org/jira/browse/SPARK-40790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40790: Assignee: Apache Spark > Check error classes in DDL parsing tests > > > Key: SPARK-40790 > URL: https://issues.apache.org/jira/browse/SPARK-40790 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > Labels: starter > Fix For: 3.4.0 > > > Check error classes in DDL command tests by using checkError(). For instance > - AlterNamespaceSetPropertiesParserSuite > - AlterTableDropPartitionParserSuite > - AlterTableRenameParserSuite > - AlterTableRecoverPartitionsParserSuite > - DescribeTableParserSuite > - TruncateTableParserSuite > - AlterTableSetSerdeParserSuite > - ShowPartitionsParserSuite > [https://github.com/apache/spark/blob/414771d4e8b52d0a76a7729d005794dc04f1e075/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterNamespaceSetPropertiesParserSuite.scala#L43]
[jira] [Commented] (SPARK-40790) Check error classes in DDL parsing tests
[ https://issues.apache.org/jira/browse/SPARK-40790?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618657#comment-17618657 ] Apache Spark commented on SPARK-40790: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/38280 > Check error classes in DDL parsing tests > > > Key: SPARK-40790 > URL: https://issues.apache.org/jira/browse/SPARK-40790 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > Labels: starter > Fix For: 3.4.0 > > > Check error classes in DDL command tests by using checkError(). For instance > - AlterNamespaceSetPropertiesParserSuite > - AlterTableDropPartitionParserSuite > - AlterTableRenameParserSuite > - AlterTableRecoverPartitionsParserSuite > - DescribeTableParserSuite > - TruncateTableParserSuite > - AlterTableSetSerdeParserSuite > - ShowPartitionsParserSuite > [https://github.com/apache/spark/blob/414771d4e8b52d0a76a7729d005794dc04f1e075/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterNamespaceSetPropertiesParserSuite.scala#L43]
[jira] [Updated] (SPARK-40790) Check error classes in DDL parsing tests
[ https://issues.apache.org/jira/browse/SPARK-40790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-40790: Description: Check error classes in DDL command tests by using checkError(). For instance - AlterNamespaceSetPropertiesParserSuite - AlterTableDropPartitionParserSuite - AlterTableRenameParserSuite - AlterTableRecoverPartitionsParserSuite - DescribeTableParserSuite - TruncateTableParserSuite - AlterTableSetSerdeParserSuite - ShowPartitionsParserSuite [https://github.com/apache/spark/blob/414771d4e8b52d0a76a7729d005794dc04f1e075/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterNamespaceSetPropertiesParserSuite.scala#L43] was: Check error classes in DDL command tests by using checkError(). For instance - AlterNamespaceSetPropertiesParserSuite - AlterTableDropPartitionParserSuite - AlterTableRenameParserSuite - CreateNamespaceParserSuite - AlterTableRecoverPartitionsParserSuite - DescribeTableParserSuite - TruncateTableParserSuite - AlterTableSetSerdeParserSuite - ShowPartitionsParserSuite [https://github.com/apache/spark/blob/414771d4e8b52d0a76a7729d005794dc04f1e075/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterNamespaceSetPropertiesParserSuite.scala#L43] > Check error classes in DDL parsing tests > > > Key: SPARK-40790 > URL: https://issues.apache.org/jira/browse/SPARK-40790 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > Labels: starter > Fix For: 3.4.0 > > > Check error classes in DDL command tests by using checkError(). For instance > - AlterNamespaceSetPropertiesParserSuite > - AlterTableDropPartitionParserSuite > - AlterTableRenameParserSuite > - AlterTableRecoverPartitionsParserSuite > - DescribeTableParserSuite > - TruncateTableParserSuite > - AlterTableSetSerdeParserSuite > - ShowPartitionsParserSuite > [https://github.com/apache/spark/blob/414771d4e8b52d0a76a7729d005794dc04f1e075/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterNamespaceSetPropertiesParserSuite.scala#L43]
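For readers picking up this starter task, a converted test typically has the shape sketched below. The error class, message parameters, and offsets here are illustrative, not taken from the actual suites, and parseException/ExpectedContext are assumed to be the helpers the parser suites already rely on:

{code:scala}
test("alter namespace set properties - key without value") {
  val sql = "ALTER NAMESPACE a.b.c SET PROPERTIES('key_without_value')"
  checkError(
    exception = parseException(sql),
    // Illustrative error class and parameters; use whatever the command actually raises.
    errorClass = "_LEGACY_ERROR_TEMP_0035",
    parameters = Map("message" -> "Values must be specified for key(s): [key_without_value]"),
    // The context pins the offending SQL fragment and its 0-based start/stop offsets.
    context = ExpectedContext(fragment = sql, start = 0, stop = sql.length - 1))
}
{code}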
[jira] [Assigned] (SPARK-40816) Python: rename LogicalPlan.collect to LogicalPlan.to_proto
[ https://issues.apache.org/jira/browse/SPARK-40816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40816: Assignee: Apache Spark > Python: rename LogicalPlan.collect to LogicalPlan.to_proto > -- > > Key: SPARK-40816 > URL: https://issues.apache.org/jira/browse/SPARK-40816 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major
[jira] [Commented] (SPARK-40816) Python: rename LogicalPlan.collect to LogicalPlan.to_proto
[ https://issues.apache.org/jira/browse/SPARK-40816?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618421#comment-17618421 ] Apache Spark commented on SPARK-40816: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38279 > Python: rename LogicalPlan.collect to LogicalPlan.to_proto > -- > > Key: SPARK-40816 > URL: https://issues.apache.org/jira/browse/SPARK-40816 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major
[jira] [Assigned] (SPARK-40816) Python: rename LogicalPlan.collect to LogicalPlan.to_proto
[ https://issues.apache.org/jira/browse/SPARK-40816?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40816: Assignee: (was: Apache Spark) > Python: rename LogicalPlan.collect to LogicalPlan.to_proto > -- > > Key: SPARK-40816 > URL: https://issues.apache.org/jira/browse/SPARK-40816 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major
[jira] [Created] (SPARK-40816) Python: rename LogicalPlan.collect to LogicalPlan.to_proto
Rui Wang created SPARK-40816: Summary: Python: rename LogicalPlan.collect to LogicalPlan.to_proto Key: SPARK-40816 URL: https://issues.apache.org/jira/browse/SPARK-40816 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Rui Wang
[jira] [Commented] (SPARK-40780) Add WHERE to Connect proto and DSL
[ https://issues.apache.org/jira/browse/SPARK-40780?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618410#comment-17618410 ] Apache Spark commented on SPARK-40780: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38278 > Add WHERE to Connect proto and DSL > -- > > Key: SPARK-40780 > URL: https://issues.apache.org/jira/browse/SPARK-40780 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0
[jira] [Commented] (SPARK-40809) Add as(alias: String) to connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618408#comment-17618408 ] Apache Spark commented on SPARK-40809: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38278 > Add as(alias: String) to connect DSL > > > Key: SPARK-40809 > URL: https://issues.apache.org/jira/browse/SPARK-40809 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0
[jira] [Commented] (SPARK-40815) SymlinkTextInputFormat returns incorrect result due to enabled spark.hadoopRDD.ignoreEmptySplits
[ https://issues.apache.org/jira/browse/SPARK-40815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618397#comment-17618397 ] Apache Spark commented on SPARK-40815: -- User 'sadikovi' has created a pull request for this issue: https://github.com/apache/spark/pull/38277 > SymlinkTextInputFormat returns incorrect result due to enabled > spark.hadoopRDD.ignoreEmptySplits > > > Key: SPARK-40815 > URL: https://issues.apache.org/jira/browse/SPARK-40815 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.2.2, 3.4.0 >Reporter: Ivan Sadikov >Priority: Major
[jira] [Assigned] (SPARK-40815) SymlinkTextInputFormat returns incorrect result due to enabled spark.hadoopRDD.ignoreEmptySplits
[ https://issues.apache.org/jira/browse/SPARK-40815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40815: Assignee: Apache Spark > SymlinkTextInputFormat returns incorrect result due to enabled > spark.hadoopRDD.ignoreEmptySplits > > > Key: SPARK-40815 > URL: https://issues.apache.org/jira/browse/SPARK-40815 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.2.2, 3.4.0 >Reporter: Ivan Sadikov >Assignee: Apache Spark >Priority: Major
[jira] [Assigned] (SPARK-40815) SymlinkTextInputFormat returns incorrect result due to enabled spark.hadoopRDD.ignoreEmptySplits
[ https://issues.apache.org/jira/browse/SPARK-40815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40815: Assignee: (was: Apache Spark) > SymlinkTextInputFormat returns incorrect result due to enabled > spark.hadoopRDD.ignoreEmptySplits > > > Key: SPARK-40815 > URL: https://issues.apache.org/jira/browse/SPARK-40815 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0, 3.2.2, 3.4.0 >Reporter: Ivan Sadikov >Priority: Major
[jira] [Created] (SPARK-40815) SymlinkTextInputFormat returns incorrect result due to enabled spark.hadoopRDD.ignoreEmptySplits
Ivan Sadikov created SPARK-40815: Summary: SymlinkTextInputFormat returns incorrect result due to enabled spark.hadoopRDD.ignoreEmptySplits Key: SPARK-40815 URL: https://issues.apache.org/jira/browse/SPARK-40815 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.2, 3.3.0, 3.4.0 Reporter: Ivan Sadikov
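The likely mechanism, as the summary suggests, is that SymlinkTextInputFormat reports zero-length splits for its symlink targets, so the spark.hadoopRDD.ignoreEmptySplits optimization (enabled by default since Spark 3.2) silently drops them. Assuming that diagnosis, a possible mitigation until the fix lands is to disable the flag for affected jobs:

{code:scala}
import org.apache.spark.sql.SparkSession

// Must be set before the SparkContext is created; here via the session builder.
val spark = SparkSession.builder()
  .appName("symlink-table-read")
  .config("spark.hadoopRDD.ignoreEmptySplits", "false")
  .getOrCreate()
{code}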
[jira] [Commented] (SPARK-40802) Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve schema instead of PreparedStatement.executeQuery()
[ https://issues.apache.org/jira/browse/SPARK-40802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618389#comment-17618389 ] Hyukjin Kwon commented on SPARK-40802: -- I guess the problem is that {{getMetaData}} isn't guaranteed to work in all cases or all DBMSes. We could probably introduce a dialect to optimize this further. > Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve > schema instead of PreparedStatement.executeQuery() > --- > > Key: SPARK-40802 > URL: https://issues.apache.org/jira/browse/SPARK-40802 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Mingli Rui >Priority: Major > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, Spark JDBC Connector uses *PreparedStatement.executeQuery()* to > resolve the JDBCRelation's schema. The schema query is like *s"SELECT * FROM > $table_or_query WHERE 1=0".* > But it is not necessary to execute the query. It's enough to *prepare* the > query. Preparing the statement parses and compiles the query without > executing it, which is more efficient. > So, it's better to use PreparedStatement.getMetaData() to resolve the schema.
[jira] [Updated] (SPARK-40802) Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve schema instead of PreparedStatement.executeQuery()
[ https://issues.apache.org/jira/browse/SPARK-40802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40802: - Summary: Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve schema instead of PreparedStatement.executeQuery() (was: [SQL] Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve schema instead of PreparedStatement.executeQuery()) > Enhance JDBC Connector to use PreparedStatement.getMetaData() to resolve > schema instead of PreparedStatement.executeQuery() > --- > > Key: SPARK-40802 > URL: https://issues.apache.org/jira/browse/SPARK-40802 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.1.0 >Reporter: Mingli Rui >Priority: Major > Original Estimate: 24h > Remaining Estimate: 24h > > Currently, Spark JDBC Connector uses *PreparedStatement.executeQuery()* to > resolve the JDBCRelation's schema. The schema query is like *s"SELECT * FROM > $table_or_query WHERE 1=0".* > But it is not necessary to execute the query. It's enough to *prepare* the > query. Preparing the statement parses and compiles the query without > executing it, which is more efficient. > So, it's better to use PreparedStatement.getMetaData() to resolve the schema.
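A sketch of the proposed approach, with the fallback the comment above implies (this is not Spark's actual JDBCRDD code, and statement lifecycle management is omitted for brevity):

{code:scala}
import java.sql.{Connection, ResultSetMetaData}

def resolveSchema(conn: Connection, tableOrQuery: String): ResultSetMetaData = {
  // Preparing parses and compiles the probe query without running it.
  val stmt = conn.prepareStatement(s"SELECT * FROM $tableOrQuery WHERE 1=0")
  val meta = stmt.getMetaData
  if (meta != null) {
    meta
  } else {
    // getMetaData() may return null on drivers that cannot describe an
    // unexecuted statement, so fall back to running the zero-row probe;
    // a per-dialect switch could gate which path is used.
    stmt.executeQuery().getMetaData
  }
}
{code}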
[jira] [Updated] (SPARK-40808) Infer schema for CSV files - wrong behavior using header + merge schema
[ https://issues.apache.org/jira/browse/SPARK-40808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40808: - Component/s: SQL (was: Spark Core) > Infer schema for CSV files - wrong behavior using header + merge schema > --- > > Key: SPARK-40808 > URL: https://issues.apache.org/jira/browse/SPARK-40808 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.2 >Reporter: ohad >Priority: Major > Labels: CSVReader, csv, csvparser > > Hello. > I am writing unit tests for some functionality in my application that reads > data from CSV files using Spark. > I am reading the data using: > {code:java} > header=True > mergeSchema=True > inferSchema=True{code} > When I am reading this single file: > {code:java} > File1: > "int_col","string_col","decimal_col","date_col" > 1,"hello",1.43,2022-02-23 > 2,"world",5.534,2021-05-05 > 3,"my name",86.455,2011-08-15 > 4,"is ohad",6.234,2002-03-22{code} > I am getting this schema: > {code:java} > int_col=int > string_col=string > decimal_col=double > date_col=string{code} > When I am duplicating this file, I am getting the same schema. > The strange part is when I add a new int column: it looks like Spark gets > confused and thinks that the columns already identified as int are now > string: > {code:java} > File1: > "int_col","string_col","decimal_col","date_col" > 1,"hello",1.43,2022-02-23 > 2,"world",5.534,2021-05-05 > 3,"my name",86.455,2011-08-15 > 4,"is ohad",6.234,2002-03-22 > File2: > "int_col","string_col","decimal_col","date_col","int2_col" > 1,"hello",1.43,2022-02-23,234 > 2,"world",5.534,2021-05-05,5 > 3,"my name",86.455,2011-08-15,32 > 4,"is ohad",6.234,2002-03-22,2 > {code} > result: > {code:java} > int_col=string > string_col=string > decimal_col=string > date_col=string > int2_col=int{code} > When I am reading only the second file, it looks fine: > {code:java} > File2: > "int_col","string_col","decimal_col","date_col","int2_col" > 1,"hello",1.43,2022-02-23,234 > 2,"world",5.534,2021-05-05,5 > 3,"my name",86.455,2011-08-15,32 > 4,"is ohad",6.234,2002-03-22,2{code} > result: > {code:java} > int_col=int > string_col=string > decimal_col=double > date_col=string > int2_col=int{code} > In conclusion, it looks like there is a bug in mixing the two features: header > recognition and merge schema.
[jira] [Commented] (SPARK-40808) Infer schema for CSV files - wrong behavior using header + merge schema
[ https://issues.apache.org/jira/browse/SPARK-40808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618388#comment-17618388 ] Hyukjin Kwon commented on SPARK-40808: -- Yeah, a reproducer would be helpful to assess this ticket further. > Infer schema for CSV files - wrong behavior using header + merge schema > --- > > Key: SPARK-40808 > URL: https://issues.apache.org/jira/browse/SPARK-40808 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.2 >Reporter: ohad >Priority: Major > Labels: CSVReader, csv, csvparser > > Hello. > I am writing unit tests for some functionality in my application that reads > data from CSV files using Spark. > I am reading the data using: > {code:java} > header=True > mergeSchema=True > inferSchema=True{code} > When I am reading this single file: > {code:java} > File1: > "int_col","string_col","decimal_col","date_col" > 1,"hello",1.43,2022-02-23 > 2,"world",5.534,2021-05-05 > 3,"my name",86.455,2011-08-15 > 4,"is ohad",6.234,2002-03-22{code} > I am getting this schema: > {code:java} > int_col=int > string_col=string > decimal_col=double > date_col=string{code} > When I am duplicating this file, I am getting the same schema. > The strange part is when I add a new int column: it looks like Spark gets > confused and thinks that the columns already identified as int are now > string: > {code:java} > File1: > "int_col","string_col","decimal_col","date_col" > 1,"hello",1.43,2022-02-23 > 2,"world",5.534,2021-05-05 > 3,"my name",86.455,2011-08-15 > 4,"is ohad",6.234,2002-03-22 > File2: > "int_col","string_col","decimal_col","date_col","int2_col" > 1,"hello",1.43,2022-02-23,234 > 2,"world",5.534,2021-05-05,5 > 3,"my name",86.455,2011-08-15,32 > 4,"is ohad",6.234,2002-03-22,2 > {code} > result: > {code:java} > int_col=string > string_col=string > decimal_col=string > date_col=string > int2_col=int{code} > When I am reading only the second file, it looks fine: > {code:java} > File2: > "int_col","string_col","decimal_col","date_col","int2_col" > 1,"hello",1.43,2022-02-23,234 > 2,"world",5.534,2021-05-05,5 > 3,"my name",86.455,2011-08-15,32 > 4,"is ohad",6.234,2002-03-22,2{code} > result: > {code:java} > int_col=int > string_col=string > decimal_col=double > date_col=string > int2_col=int{code} > In conclusion, it looks like there is a bug in mixing the two features: header > recognition and merge schema.
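In the spirit of that request, a minimal reproducer only needs the two files from the description plus a read that mirrors the reporter's options. The sketch below is illustrative (the path is made up, and mergeSchema is passed through exactly as the reporter does, even though it is not a documented CSV option):

{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("csv-header-merge-repro").getOrCreate()

// /tmp/csv_repro should contain file1.csv and file2.csv from the description.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .option("mergeSchema", "true")
  .csv("/tmp/csv_repro")

// Expected: int_col stays int and decimal_col double; the report says they flip to string.
df.printSchema()
{code}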
[jira] [Commented] (SPARK-40814) Exception in thread "main" java.lang.NoClassDefFoundError: io/fabric8/kubernetes/client/KubernetesClient
[ https://issues.apache.org/jira/browse/SPARK-40814?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618386#comment-17618386 ] Hyukjin Kwon commented on SPARK-40814: -- Spark 2.4.x is EOL. Mind trying if the same issue persists in Spark 3+? > Exception in thread "main" java.lang.NoClassDefFoundError: > io/fabric8/kubernetes/client/KubernetesClient > > > Key: SPARK-40814 > URL: https://issues.apache.org/jira/browse/SPARK-40814 > Project: Spark > Issue Type: Bug > Components: Deploy, Spark Submit >Affects Versions: 2.4.0 > Environment: k8s version: v1.18.9 > spark version: v2.4.0 > kubernetes-client:v6.1.1 >Reporter: jiangjian >Priority: Major > Attachments: Dockerfile, spark-error.log > > > After I change the user in the Spark image, the running program reports an > error. What is the problem? > ++ id -u > + myuid=2023 > ++ id -g > + mygid=2023 > + set +e > ++ getent passwd 2023 > + uidentry=zndw:x:2023:2023::/home/zndw:/bin/sh > + set -e > + '[' -z zndw:x:2023:2023::/home/zndw:/bin/sh ']' > + SPARK_K8S_CMD=driver > + case "$SPARK_K8S_CMD" in > + shift 1 > + SPARK_CLASSPATH=':/opt/spark/jars/*' > + env > + grep SPARK_JAVA_OPT_ > + sort -t_ -k4 -n > + sed 's/[^=]*=\(.*\)/\1/g' > + readarray -t SPARK_EXECUTOR_JAVA_OPTS > + '[' -n '' ']' > + '[' -n '' ']' > + PYSPARK_ARGS= > + '[' -n '' ']' > + R_ARGS= > + '[' -n '' ']' > + '[' '' == 2 ']' > + '[' '' == 3 ']' > + case "$SPARK_K8S_CMD" in > + CMD=("$SPARK_HOME/bin/spark-submit" --conf > "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client > "$@") > + exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf > spark.driver.bindAddress=10.1.1.11 --deploy-mode client --properties-file > /opt/spark/conf/spark.properties --class > com.frontier.pueedas.computer.batchTool.etl.EtlScheduler > 'http://26.47.128.120:18000/spark/spark/raw/master/computer-batch-etl-hadoop-basic.jar?inline=false' > configMode=HDFS metaMode=HDFS platformConfigMode=NACOS storeConfigMode=NACOS > startDate=2022-08-02 endDate=2022-08-03 > _file=/user/config/YC2/TEST/config/computer/business/opc/EMpHpReadCurveHourData.xml > runMode=TEST > 2022-10-14 06:52:21 WARN NativeCodeLoader:62 - Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 2022-10-14 06:52:29 INFO SparkContext:54 - Running Spark version 2.4.0 > 2022-10-14 06:52:29 INFO SparkContext:54 - Submitted application: > [TEST]ETL[2022-08-02 00:00:00,2022-08-03 > 00:00:00]\{/user/config/YC2/TEST/config/computer/business/opc/EMpHpReadCurveHourData.xml} > 2022-10-14 06:52:29 INFO SecurityManager:54 - Changing view acls to: > zndw,root > 2022-10-14 06:52:29 INFO SecurityManager:54 - Changing modify acls to: > zndw,root > 2022-10-14 06:52:29 INFO SecurityManager:54 - Changing view acls groups to: > 2022-10-14 06:52:29 INFO SecurityManager:54 - Changing modify acls groups > to: > 2022-10-14 06:52:29 INFO SecurityManager:54 - SecurityManager: > authentication disabled; ui acls disabled; users with view permissions: > Set(zndw, root); groups with view permissions: Set(); users with modify > permissions: Set(zndw, root); groups with modify permissions: Set() > 2022-10-14 06:52:29 INFO Utils:54 - Successfully started service > 'sparkDriver' on port 7078. > 2022-10-14 06:52:29 INFO SparkEnv:54 - Registering MapOutputTracker > 2022-10-14 06:52:29 INFO SparkEnv:54 - Registering BlockManagerMaster > 2022-10-14 06:52:29 INFO BlockManagerMasterEndpoint:54 - Using > org.apache.spark.storage.DefaultTopologyMapper for getting topology > information > 2022-10-14 06:52:29 INFO BlockManagerMasterEndpoint:54 - > BlockManagerMasterEndpoint up > 2022-10-14 06:52:29 INFO DiskBlockManager:54 - Created local directory at > /var/data/spark-9a270950-7527-4d08-a7bd-d6c1062e8522/blockmgr-79ab0f0d-6f9e-401e-aa90-91baa00a3ff3 > 2022-10-14 06:52:29 INFO MemoryStore:54 - MemoryStore started with capacity > 912.3 MB > 2022-10-14 06:52:29 INFO SparkEnv:54 - Registering OutputCommitCoordinator > 2022-10-14 06:52:30 INFO log:192 - Logging initialized @9926ms > 2022-10-14 06:52:30 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: > unknown, git hash: unknown > 2022-10-14 06:52:30 INFO Server:419 - Started @10035ms > 2022-10-14 06:52:30 INFO AbstractConnector:278 - Started > ServerConnector@66f0548d\{HTTP/1.1,[http/1.1]}{0.0.0.0:4040} > 2022-10-14 06:52:30 INFO Utils:54 - Successfully started service 'SparkUI' > on port 4040. > 2022-10-14 06:52:30 INFO ContextHandler:781 - Started > o.s.j.s.ServletContextHandler@59ed3e6c\{/jobs,null,AVAILABLE,@Spark} > 2022-10-14 06:52:30 INFO ContextHandler:781 - Started > o.s.j.s.ServletContextHandler@70c53dbe\{/jobs/j
[jira] [Updated] (SPARK-40790) Check error classes in DDL parsing tests
[ https://issues.apache.org/jira/browse/SPARK-40790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-40790: Description: Check error classes in DDL command tests by using checkError(). For instance - AlterNamespaceSetPropertiesParserSuite - AlterTableDropPartitionParserSuite - AlterTableRenameParserSuite - CreateNamespaceParserSuite - AlterTableRecoverPartitionsParserSuite - DescribeTableParserSuite - TruncateTableParserSuite - AlterTableSetSerdeParserSuite - ShowPartitionsParserSuite [https://github.com/apache/spark/blob/414771d4e8b52d0a76a7729d005794dc04f1e075/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterNamespaceSetPropertiesParserSuite.scala#L43] was: Check error classes in DDL command tests by using checkError(). For instance - AlterNamespaceSetPropertiesParserSuite - AlterTableDropPartitionParserSuite https://github.com/apache/spark/blob/414771d4e8b52d0a76a7729d005794dc04f1e075/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterNamespaceSetPropertiesParserSuite.scala#L43 > Check error classes in DDL parsing tests > > > Key: SPARK-40790 > URL: https://issues.apache.org/jira/browse/SPARK-40790 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > Labels: starter > Fix For: 3.4.0 > > > Check error classes in DDL command tests by using checkError(). For instance > - AlterNamespaceSetPropertiesParserSuite > - AlterTableDropPartitionParserSuite > - AlterTableRenameParserSuite > - CreateNamespaceParserSuite > - AlterTableRecoverPartitionsParserSuite > - DescribeTableParserSuite > - TruncateTableParserSuite > - AlterTableSetSerdeParserSuite > - ShowPartitionsParserSuite > [https://github.com/apache/spark/blob/414771d4e8b52d0a76a7729d005794dc04f1e075/sql/core/src/test/scala/org/apache/spark/sql/execution/command/AlterNamespaceSetPropertiesParserSuite.scala#L43]
[jira] [Updated] (SPARK-40814) Exception in thread "main" java.lang.NoClassDefFoundError: io/fabric8/kubernetes/client/KubernetesClient
[ https://issues.apache.org/jira/browse/SPARK-40814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-40814: - Priority: Major (was: Blocker) > Exception in thread "main" java.lang.NoClassDefFoundError: > io/fabric8/kubernetes/client/KubernetesClient > > > Key: SPARK-40814 > URL: https://issues.apache.org/jira/browse/SPARK-40814 > Project: Spark > Issue Type: Bug > Components: Deploy, Spark Submit >Affects Versions: 2.4.0 > Environment: k8s version: v1.18.9 > spark version: v2.4.0 > kubernetes-client:v6.1.1 >Reporter: jiangjian >Priority: Major > Attachments: Dockerfile, spark-error.log
[jira] [Updated] (SPARK-40814) Exception in thread "main" java.lang.NoClassDefFoundError: io/fabric8/kubernetes/client/KubernetesClient
[ https://issues.apache.org/jira/browse/SPARK-40814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiangjian updated SPARK-40814: -- Priority: Blocker (was: Major) > Exception in thread "main" java.lang.NoClassDefFoundError: > io/fabric8/kubernetes/client/KubernetesClient > > > Key: SPARK-40814 > URL: https://issues.apache.org/jira/browse/SPARK-40814 > Project: Spark > Issue Type: Bug > Components: Deploy, Spark Submit >Affects Versions: 2.4.0 > Environment: k8s version: v1.18.9 > spark version: v2.4.0 > kubernetes-client:v6.1.1 >Reporter: jiangjian >Priority: Blocker > Attachments: Dockerfile, spark-error.log
[jira] [Updated] (SPARK-40814) Exception in thread "main" java.lang.NoClassDefFoundError: io/fabric8/kubernetes/client/KubernetesClient
[ https://issues.apache.org/jira/browse/SPARK-40814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiangjian updated SPARK-40814: -- Component/s: Spark Submit > Exception in thread "main" java.lang.NoClassDefFoundError: > io/fabric8/kubernetes/client/KubernetesClient > > > Key: SPARK-40814 > URL: https://issues.apache.org/jira/browse/SPARK-40814 > Project: Spark > Issue Type: Bug > Components: Deploy, Spark Submit >Affects Versions: 2.4.0 > Environment: k8s version: v1.18.9 > spark version: v2.4.0 > kubernetes-client:v6.1.1 >Reporter: jiangjian >Priority: Major > Attachments: Dockerfile, spark-error.log
[jira] [Updated] (SPARK-40814) Exception in thread "main" java.lang.NoClassDefFoundError: io/fabric8/kubernetes/client/KubernetesClient
[ https://issues.apache.org/jira/browse/SPARK-40814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiangjian updated SPARK-40814: -- Attachment: Dockerfile > Exception in thread "main" java.lang.NoClassDefFoundError: > io/fabric8/kubernetes/client/KubernetesClient > > > Key: SPARK-40814 > URL: https://issues.apache.org/jira/browse/SPARK-40814 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.4.0 > Environment: k8s version: v1.18.9 > spark version: v2.4.0 > kubernetes-client:v6.1.1 >Reporter: jiangjian >Priority: Major > Attachments: Dockerfile, spark-error.log
[jira] [Updated] (SPARK-40814) Exception in thread "main" java.lang.NoClassDefFoundError: io/fabric8/kubernetes/client/KubernetesClient
[ https://issues.apache.org/jira/browse/SPARK-40814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiangjian updated SPARK-40814: -- Environment: k8s version: v1.18.9 spark version: v2.4.0 kubernetes-client:v6.1.1 was: k8s version: v1.18.9 spark version: v2.4.0 > Exception in thread "main" java.lang.NoClassDefFoundError: > io/fabric8/kubernetes/client/KubernetesClient > > > Key: SPARK-40814 > URL: https://issues.apache.org/jira/browse/SPARK-40814 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.4.0 > Environment: k8s version: v1.18.9 > spark version: v2.4.0 > kubernetes-client:v6.1.1 >Reporter: jiangjian >Priority: Major > Attachments: spark-error.log > > > After I change the user in the Spark image, the running program reports an > error. What is the problem > ++ id -u > + myuid=2023 > ++ id -g > + mygid=2023 > + set +e > ++ getent passwd 2023 > + uidentry=zndw:x:2023:2023::/home/zndw:/bin/sh > + set -e > + '[' -z zndw:x:2023:2023::/home/zndw:/bin/sh ']' > + SPARK_K8S_CMD=driver > + case "$SPARK_K8S_CMD" in > + shift 1 > + SPARK_CLASSPATH=':/opt/spark/jars/*' > + env > + grep SPARK_JAVA_OPT_ > + sort -t_ -k4 -n > + sed 's/[^=]*=\(.*\)/\1/g' > + readarray -t SPARK_EXECUTOR_JAVA_OPTS > + '[' -n '' ']' > + '[' -n '' ']' > + PYSPARK_ARGS= > + '[' -n '' ']' > + R_ARGS= > + '[' -n '' ']' > + '[' '' == 2 ']' > + '[' '' == 3 ']' > + case "$SPARK_K8S_CMD" in > + CMD=("$SPARK_HOME/bin/spark-submit" --conf > "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client > "$@") > + exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf > spark.driver.bindAddress=10.1.1.11 --deploy-mode client --properties-file > /opt/spark/conf/spark.properties --class > com.frontier.pueedas.computer.batchTool.etl.EtlScheduler > 'http://26.47.128.120:18000/spark/spark/raw/master/computer-batch-etl-hadoop-basic.jar?inline=false' > configMode=HDFS metaMode=HDFS platformConfigMode=NACOS storeConfigMode=NACOS > startDate=2022-08-02 endDate=2022-08-03 > _file=/user/config/YC2/TEST/config/computer/business/opc/EMpHpReadCurveHourData.xml > runMode=TEST > 2022-10-14 06:52:21 WARN NativeCodeLoader:62 - Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 2022-10-14 06:52:29 INFO SparkContext:54 - Running Spark version 2.4.0 > 2022-10-14 06:52:29 INFO SparkContext:54 - Submitted application: > [TEST]ETL[2022-08-02 00:00:00,2022-08-03 > 00:00:00]\{/user/config/YC2/TEST/config/computer/business/opc/EMpHpReadCurveHourData.xml} > 2022-10-14 06:52:29 INFO SecurityManager:54 - Changing view acls to: > zndw,root > 2022-10-14 06:52:29 INFO SecurityManager:54 - Changing modify acls to: > zndw,root > 2022-10-14 06:52:29 INFO SecurityManager:54 - Changing view acls groups to: > 2022-10-14 06:52:29 INFO SecurityManager:54 - Changing modify acls groups > to: > 2022-10-14 06:52:29 INFO SecurityManager:54 - SecurityManager: > authentication disabled; ui acls disabled; users with view permissions: > Set(zndw, root); groups with view permissions: Set(); users with modify > permissions: Set(zndw, root); groups with modify permissions: Set() > 2022-10-14 06:52:29 INFO Utils:54 - Successfully started service > 'sparkDriver' on port 7078. 
> 2022-10-14 06:52:29 INFO SparkEnv:54 - Registering MapOutputTracker > 2022-10-14 06:52:29 INFO SparkEnv:54 - Registering BlockManagerMaster > 2022-10-14 06:52:29 INFO BlockManagerMasterEndpoint:54 - Using > org.apache.spark.storage.DefaultTopologyMapper for getting topology > information > 2022-10-14 06:52:29 INFO BlockManagerMasterEndpoint:54 - > BlockManagerMasterEndpoint up > 2022-10-14 06:52:29 INFO DiskBlockManager:54 - Created local directory at > /var/data/spark-9a270950-7527-4d08-a7bd-d6c1062e8522/blockmgr-79ab0f0d-6f9e-401e-aa90-91baa00a3ff3 > 2022-10-14 06:52:29 INFO MemoryStore:54 - MemoryStore started with capacity > 912.3 MB > 2022-10-14 06:52:29 INFO SparkEnv:54 - Registering OutputCommitCoordinator > 2022-10-14 06:52:30 INFO log:192 - Logging initialized @9926ms > 2022-10-14 06:52:30 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: > unknown, git hash: unknown > 2022-10-14 06:52:30 INFO Server:419 - Started @10035ms > 2022-10-14 06:52:30 INFO AbstractConnector:278 - Started > ServerConnector@66f0548d\{HTTP/1.1,[http/1.1]}{0.0.0.0:4040} > 2022-10-14 06:52:30 INFO Utils:54 - Successfully started service 'SparkUI' > on port 4040. > 2022-10-14 06:52:30 INFO ContextHandler:781 - Started > o.s.j.s.ServletContextHandler@59ed3e6c\{/jobs,null,AVAILABLE,@Spark} > 2022-10-14 06:52:30 INFO ContextHandler:781 - Started > o.s.j.s.ServletContextHandler@70c53dbe\{/jobs/json,null,AVAILABLE,@S
[jira] [Updated] (SPARK-40814) Exception in thread "main" java.lang.NoClassDefFoundError: io/fabric8/kubernetes/client/KubernetesClient
[ https://issues.apache.org/jira/browse/SPARK-40814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] jiangjian updated SPARK-40814: -- Attachment: spark-error.log > Exception in thread "main" java.lang.NoClassDefFoundError: > io/fabric8/kubernetes/client/KubernetesClient > > > Key: SPARK-40814 > URL: https://issues.apache.org/jira/browse/SPARK-40814 > Project: Spark > Issue Type: Bug > Components: Deploy >Affects Versions: 2.4.0 > Environment: k8s version: v1.18.9 > spark version: v2.4.0 >Reporter: jiangjian >Priority: Major > Attachments: spark-error.log > > > After I change the user in the Spark image, the running program reports an > error. What is the problem > ++ id -u > + myuid=2023 > ++ id -g > + mygid=2023 > + set +e > ++ getent passwd 2023 > + uidentry=zndw:x:2023:2023::/home/zndw:/bin/sh > + set -e > + '[' -z zndw:x:2023:2023::/home/zndw:/bin/sh ']' > + SPARK_K8S_CMD=driver > + case "$SPARK_K8S_CMD" in > + shift 1 > + SPARK_CLASSPATH=':/opt/spark/jars/*' > + env > + grep SPARK_JAVA_OPT_ > + sort -t_ -k4 -n > + sed 's/[^=]*=\(.*\)/\1/g' > + readarray -t SPARK_EXECUTOR_JAVA_OPTS > + '[' -n '' ']' > + '[' -n '' ']' > + PYSPARK_ARGS= > + '[' -n '' ']' > + R_ARGS= > + '[' -n '' ']' > + '[' '' == 2 ']' > + '[' '' == 3 ']' > + case "$SPARK_K8S_CMD" in > + CMD=("$SPARK_HOME/bin/spark-submit" --conf > "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client > "$@") > + exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf > spark.driver.bindAddress=10.1.1.11 --deploy-mode client --properties-file > /opt/spark/conf/spark.properties --class > com.frontier.pueedas.computer.batchTool.etl.EtlScheduler > 'http://26.47.128.120:18000/spark/spark/raw/master/computer-batch-etl-hadoop-basic.jar?inline=false' > configMode=HDFS metaMode=HDFS platformConfigMode=NACOS storeConfigMode=NACOS > startDate=2022-08-02 endDate=2022-08-03 > _file=/user/config/YC2/TEST/config/computer/business/opc/EMpHpReadCurveHourData.xml > runMode=TEST > 2022-10-14 06:52:21 WARN NativeCodeLoader:62 - Unable to load native-hadoop > library for your platform... using builtin-java classes where applicable > 2022-10-14 06:52:29 INFO SparkContext:54 - Running Spark version 2.4.0 > 2022-10-14 06:52:29 INFO SparkContext:54 - Submitted application: > [TEST]ETL[2022-08-02 00:00:00,2022-08-03 > 00:00:00]\{/user/config/YC2/TEST/config/computer/business/opc/EMpHpReadCurveHourData.xml} > 2022-10-14 06:52:29 INFO SecurityManager:54 - Changing view acls to: > zndw,root > 2022-10-14 06:52:29 INFO SecurityManager:54 - Changing modify acls to: > zndw,root > 2022-10-14 06:52:29 INFO SecurityManager:54 - Changing view acls groups to: > 2022-10-14 06:52:29 INFO SecurityManager:54 - Changing modify acls groups > to: > 2022-10-14 06:52:29 INFO SecurityManager:54 - SecurityManager: > authentication disabled; ui acls disabled; users with view permissions: > Set(zndw, root); groups with view permissions: Set(); users with modify > permissions: Set(zndw, root); groups with modify permissions: Set() > 2022-10-14 06:52:29 INFO Utils:54 - Successfully started service > 'sparkDriver' on port 7078. 
> 2022-10-14 06:52:29 INFO SparkEnv:54 - Registering MapOutputTracker > 2022-10-14 06:52:29 INFO SparkEnv:54 - Registering BlockManagerMaster > 2022-10-14 06:52:29 INFO BlockManagerMasterEndpoint:54 - Using > org.apache.spark.storage.DefaultTopologyMapper for getting topology > information > 2022-10-14 06:52:29 INFO BlockManagerMasterEndpoint:54 - > BlockManagerMasterEndpoint up > 2022-10-14 06:52:29 INFO DiskBlockManager:54 - Created local directory at > /var/data/spark-9a270950-7527-4d08-a7bd-d6c1062e8522/blockmgr-79ab0f0d-6f9e-401e-aa90-91baa00a3ff3 > 2022-10-14 06:52:29 INFO MemoryStore:54 - MemoryStore started with capacity > 912.3 MB > 2022-10-14 06:52:29 INFO SparkEnv:54 - Registering OutputCommitCoordinator > 2022-10-14 06:52:30 INFO log:192 - Logging initialized @9926ms > 2022-10-14 06:52:30 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: > unknown, git hash: unknown > 2022-10-14 06:52:30 INFO Server:419 - Started @10035ms > 2022-10-14 06:52:30 INFO AbstractConnector:278 - Started > ServerConnector@66f0548d\{HTTP/1.1,[http/1.1]}{0.0.0.0:4040} > 2022-10-14 06:52:30 INFO Utils:54 - Successfully started service 'SparkUI' > on port 4040. > 2022-10-14 06:52:30 INFO ContextHandler:781 - Started > o.s.j.s.ServletContextHandler@59ed3e6c\{/jobs,null,AVAILABLE,@Spark} > 2022-10-14 06:52:30 INFO ContextHandler:781 - Started > o.s.j.s.ServletContextHandler@70c53dbe\{/jobs/json,null,AVAILABLE,@Spark} > 2022-10-14 06:52:30 INFO ContextHandler:781 - Started > o.s.j.s.ServletContextHandler@1894e40d\{/jobs/job,null,AVAILABLE,@Spar
[jira] [Created] (SPARK-40814) Exception in thread "main" java.lang.NoClassDefFoundError: io/fabric8/kubernetes/client/KubernetesClient
jiangjian created SPARK-40814: - Summary: Exception in thread "main" java.lang.NoClassDefFoundError: io/fabric8/kubernetes/client/KubernetesClient Key: SPARK-40814 URL: https://issues.apache.org/jira/browse/SPARK-40814 Project: Spark Issue Type: Bug Components: Deploy Affects Versions: 2.4.0 Environment: k8s version: v1.18.9 spark version: v2.4.0 Reporter: jiangjian After I change the user in the Spark image, the running program reports an error. What is the problem ++ id -u + myuid=2023 ++ id -g + mygid=2023 + set +e ++ getent passwd 2023 + uidentry=zndw:x:2023:2023::/home/zndw:/bin/sh + set -e + '[' -z zndw:x:2023:2023::/home/zndw:/bin/sh ']' + SPARK_K8S_CMD=driver + case "$SPARK_K8S_CMD" in + shift 1 + SPARK_CLASSPATH=':/opt/spark/jars/*' + env + grep SPARK_JAVA_OPT_ + sort -t_ -k4 -n + sed 's/[^=]*=\(.*\)/\1/g' + readarray -t SPARK_EXECUTOR_JAVA_OPTS + '[' -n '' ']' + '[' -n '' ']' + PYSPARK_ARGS= + '[' -n '' ']' + R_ARGS= + '[' -n '' ']' + '[' '' == 2 ']' + '[' '' == 3 ']' + case "$SPARK_K8S_CMD" in + CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@") + exec /sbin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.1.1.11 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class com.frontier.pueedas.computer.batchTool.etl.EtlScheduler 'http://26.47.128.120:18000/spark/spark/raw/master/computer-batch-etl-hadoop-basic.jar?inline=false' configMode=HDFS metaMode=HDFS platformConfigMode=NACOS storeConfigMode=NACOS startDate=2022-08-02 endDate=2022-08-03 _file=/user/config/YC2/TEST/config/computer/business/opc/EMpHpReadCurveHourData.xml runMode=TEST 2022-10-14 06:52:21 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable 2022-10-14 06:52:29 INFO SparkContext:54 - Running Spark version 2.4.0 2022-10-14 06:52:29 INFO SparkContext:54 - Submitted application: [TEST]ETL[2022-08-02 00:00:00,2022-08-03 00:00:00]\{/user/config/YC2/TEST/config/computer/business/opc/EMpHpReadCurveHourData.xml} 2022-10-14 06:52:29 INFO SecurityManager:54 - Changing view acls to: zndw,root 2022-10-14 06:52:29 INFO SecurityManager:54 - Changing modify acls to: zndw,root 2022-10-14 06:52:29 INFO SecurityManager:54 - Changing view acls groups to: 2022-10-14 06:52:29 INFO SecurityManager:54 - Changing modify acls groups to: 2022-10-14 06:52:29 INFO SecurityManager:54 - SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(zndw, root); groups with view permissions: Set(); users with modify permissions: Set(zndw, root); groups with modify permissions: Set() 2022-10-14 06:52:29 INFO Utils:54 - Successfully started service 'sparkDriver' on port 7078. 
2022-10-14 06:52:29 INFO SparkEnv:54 - Registering MapOutputTracker 2022-10-14 06:52:29 INFO SparkEnv:54 - Registering BlockManagerMaster 2022-10-14 06:52:29 INFO BlockManagerMasterEndpoint:54 - Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information 2022-10-14 06:52:29 INFO BlockManagerMasterEndpoint:54 - BlockManagerMasterEndpoint up 2022-10-14 06:52:29 INFO DiskBlockManager:54 - Created local directory at /var/data/spark-9a270950-7527-4d08-a7bd-d6c1062e8522/blockmgr-79ab0f0d-6f9e-401e-aa90-91baa00a3ff3 2022-10-14 06:52:29 INFO MemoryStore:54 - MemoryStore started with capacity 912.3 MB 2022-10-14 06:52:29 INFO SparkEnv:54 - Registering OutputCommitCoordinator 2022-10-14 06:52:30 INFO log:192 - Logging initialized @9926ms 2022-10-14 06:52:30 INFO Server:351 - jetty-9.3.z-SNAPSHOT, build timestamp: unknown, git hash: unknown 2022-10-14 06:52:30 INFO Server:419 - Started @10035ms 2022-10-14 06:52:30 INFO AbstractConnector:278 - Started ServerConnector@66f0548d\{HTTP/1.1,[http/1.1]}{0.0.0.0:4040} 2022-10-14 06:52:30 INFO Utils:54 - Successfully started service 'SparkUI' on port 4040. 2022-10-14 06:52:30 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@59ed3e6c\{/jobs,null,AVAILABLE,@Spark} 2022-10-14 06:52:30 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@70c53dbe\{/jobs/json,null,AVAILABLE,@Spark} 2022-10-14 06:52:30 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@1894e40d\{/jobs/job,null,AVAILABLE,@Spark} 2022-10-14 06:52:30 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@7342e05d\{/jobs/job/json,null,AVAILABLE,@Spark} 2022-10-14 06:52:30 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@2a331b46\{/stages,null,AVAILABLE,@Spark} 2022-10-14 06:52:30 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@15383681\{/stages/json,null,AVAILABLE,@Spark} 2022-10-14 06:52:30 INFO ContextHandler:781 - Started o.s.j.s.ServletContextHandler@743e66f7\{/stages/stag
[jira] [Commented] (SPARK-40812) Add Deduplicate to Connect proto
[ https://issues.apache.org/jira/browse/SPARK-40812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618344#comment-17618344 ] Apache Spark commented on SPARK-40812: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38276 > Add Deduplicate to Connect proto > > > Key: SPARK-40812 > URL: https://issues.apache.org/jira/browse/SPARK-40812 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40812) Add Deduplicate to Connect proto
[ https://issues.apache.org/jira/browse/SPARK-40812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40812: Assignee: Apache Spark > Add Deduplicate to Connect proto > > > Key: SPARK-40812 > URL: https://issues.apache.org/jira/browse/SPARK-40812 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40812) Add Deduplicate to Connect proto
[ https://issues.apache.org/jira/browse/SPARK-40812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618343#comment-17618343 ] Apache Spark commented on SPARK-40812: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38276 > Add Deduplicate to Connect proto > > > Key: SPARK-40812 > URL: https://issues.apache.org/jira/browse/SPARK-40812 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40812) Add Deduplicate to Connect proto
[ https://issues.apache.org/jira/browse/SPARK-40812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40812: Assignee: (was: Apache Spark) > Add Deduplicate to Connect proto > > > Key: SPARK-40812 > URL: https://issues.apache.org/jira/browse/SPARK-40812 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40813) Add limit and offset to Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40813: Assignee: (was: Apache Spark) > Add limit and offset to Connect DSL > --- > > Key: SPARK-40813 > URL: https://issues.apache.org/jira/browse/SPARK-40813 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40813) Add limit and offset to Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40813?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618342#comment-17618342 ] Apache Spark commented on SPARK-40813: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38275 > Add limit and offset to Connect DSL > --- > > Key: SPARK-40813 > URL: https://issues.apache.org/jira/browse/SPARK-40813 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40813) Add limit and offset to Connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40813: Assignee: Apache Spark > Add limit and offset to Connect DSL > --- > > Key: SPARK-40813 > URL: https://issues.apache.org/jira/browse/SPARK-40813 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40813) Add limit and offset to Connect DSL
Rui Wang created SPARK-40813: Summary: Add limit and offset to Connect DSL Key: SPARK-40813 URL: https://issues.apache.org/jira/browse/SPARK-40813 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
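To make the DSL change concrete, here is a minimal sketch of what limit and offset could look like as Connect DSL extensions, assuming a DSL that wraps proto.Relation with builder calls; the message and field names below (Limit, Offset, setInput, setLimit, setOffset) are assumptions for illustration, not the merged API.

{code:scala}
// Hypothetical sketch of Connect DSL extensions; proto message and field
// names are assumed for illustration.
import org.apache.spark.connect.proto

object ConnectDslSketch {
  implicit class DslRelation(val plan: proto.Relation) extends AnyVal {
    // Wrap the current relation in a Limit node that keeps the first n rows.
    def limit(n: Int): proto.Relation =
      proto.Relation.newBuilder()
        .setLimit(proto.Limit.newBuilder().setInput(plan).setLimit(n))
        .build()

    // Wrap the current relation in an Offset node that skips the first n rows.
    def offset(n: Int): proto.Relation =
      proto.Relation.newBuilder()
        .setOffset(proto.Offset.newBuilder().setInput(plan).setOffset(n))
        .build()
  }
}
{code}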
[jira] [Created] (SPARK-40812) Add Deduplicate to Connect proto
Rui Wang created SPARK-40812: Summary: Add Deduplicate to Connect proto Key: SPARK-40812 URL: https://issues.apache.org/jira/browse/SPARK-40812 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
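For context on what "Add Deduplicate to Connect proto" would enable, a hedged sketch of building a Deduplicate relation, mirroring Dataset.dropDuplicates over a subset of columns; the message and field names (Deduplicate, input, column_names) are assumptions for illustration, not the merged schema.

{code:scala}
// Hypothetical sketch; proto message and field names are assumed.
import org.apache.spark.connect.proto

object DeduplicateSketch {
  // Build a Deduplicate relation over the given key columns, the proto
  // analogue of df.dropDuplicates(cols).
  def dedup(input: proto.Relation, cols: Seq[String]): proto.Relation = {
    val d = proto.Deduplicate.newBuilder().setInput(input)
    cols.foreach(c => d.addColumnNames(c))
    proto.Relation.newBuilder().setDeduplicate(d).build()
  }
}
{code}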
[jira] [Assigned] (SPARK-40811) Use checkError() to intercept ParseException
[ https://issues.apache.org/jira/browse/SPARK-40811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40811: Assignee: (was: Apache Spark) > Use checkError() to intercept ParseException > > > Key: SPARK-40811 > URL: https://issues.apache.org/jira/browse/SPARK-40811 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Port the following test suites onto checkError(): > - SQLViewSuite > - JDBCTableCatalogSuite -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40811) Use checkError() to intercept ParseException
[ https://issues.apache.org/jira/browse/SPARK-40811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618300#comment-17618300 ] Apache Spark commented on SPARK-40811: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/38267 > Use checkError() to intercept ParseException > > > Key: SPARK-40811 > URL: https://issues.apache.org/jira/browse/SPARK-40811 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Port the following test suites onto checkError(): > - SQLViewSuite > - JDBCTableCatalogSuite -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40811) Use checkError() to intercept ParseException
[ https://issues.apache.org/jira/browse/SPARK-40811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40811: Assignee: Apache Spark > Use checkError() to intercept ParseException > > > Key: SPARK-40811 > URL: https://issues.apache.org/jira/browse/SPARK-40811 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Port the following test suites onto checkError(): > - SQLViewSuite > - JDBCTableCatalogSuite -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40811) Use checkError() to intercept ParseException
[ https://issues.apache.org/jira/browse/SPARK-40811?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618299#comment-17618299 ] Apache Spark commented on SPARK-40811: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/38267 > Use checkError() to intercept ParseException > > > Key: SPARK-40811 > URL: https://issues.apache.org/jira/browse/SPARK-40811 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Port the following test suites onto checkError(): > - SQLViewSuite > - JDBCTableCatalogSuite -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40811) Use checkError() to intercept ParseException
[ https://issues.apache.org/jira/browse/SPARK-40811?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk updated SPARK-40811: - Description: Port the following test suites onto checkError(): - SQLViewSuite - JDBCTableCatalogSuite was: Port the following test suites onto checkError(): - SQLViewSuite > Use checkError() to intercept ParseException > > > Key: SPARK-40811 > URL: https://issues.apache.org/jira/browse/SPARK-40811 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > > Port the following test suites onto checkError(): > - SQLViewSuite > - JDBCTableCatalogSuite -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40811) Use checkError() to intercept ParseException
Max Gekk created SPARK-40811: Summary: Use checkError() to intercept ParseException Key: SPARK-40811 URL: https://issues.apache.org/jira/browse/SPARK-40811 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: Max Gekk Port the following test suites onto checkError(): - SQLViewSuite -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
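For readers who have not seen the pattern the ticket asks for, a hedged sketch of porting an intercepted ParseException onto checkError() inside a SparkFunSuite-derived SQL test; the error class and parameter values below are hypothetical placeholders, not the ones used in SQLViewSuite or JDBCTableCatalogSuite.

{code:scala}
// Illustrative only: error class and parameter values are placeholders.
import org.apache.spark.sql.catalyst.parser.ParseException

test("port intercepted ParseException onto checkError") {
  checkError(
    exception = intercept[ParseException] {
      sql("CREATE VIEW v AS SELECT * FROM tbl WHERE") // deliberately malformed
    },
    errorClass = "PARSE_SYNTAX_ERROR", // placeholder error class
    parameters = Map("error" -> "end of input", "hint" -> ""))
}
{code}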
[jira] [Resolved] (SPARK-40786) Check error classes in PlanParserSuite
[ https://issues.apache.org/jira/browse/SPARK-40786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-40786. -- Resolution: Fixed Issue resolved by pull request 38271 [https://github.com/apache/spark/pull/38271] > Check error classes in PlanParserSuite > -- > > Key: SPARK-40786 > URL: https://issues.apache.org/jira/browse/SPARK-40786 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: BingKun Pan >Priority: Major > Labels: starter > Fix For: 3.4.0 > > > Check error classes in PlanParserSuite by using checkError(). For instance, > replace > {code:scala} > intercept("EXPLAIN logical SELECT 1", "Unsupported SQL statement") > {code} > by > {code:scala} > checkError( > exception = parseException("EXPLAIN logical SELECT 1"), > errorClass = "...", > parameters = Map.empty, > context = ...) > {code} > at > https://github.com/apache/spark/blob/35d00df9bba7238ad4f40617fae4d04ddbfd/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala#L225 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40786) Check error classes in PlanParserSuite
[ https://issues.apache.org/jira/browse/SPARK-40786?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-40786: Assignee: BingKun Pan > Check error classes in PlanParserSuite > -- > > Key: SPARK-40786 > URL: https://issues.apache.org/jira/browse/SPARK-40786 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: BingKun Pan >Priority: Major > Labels: starter > Fix For: 3.4.0 > > > Check error classes in PlanParserSuite by using checkError(). For instance, > replace > {code:scala} > intercept("EXPLAIN logical SELECT 1", "Unsupported SQL statement") > {code} > by > {code:scala} > checkError( > exception = parseException("EXPLAIN logical SELECT 1"), > errorClass = "...", > parameters = Map.empty, > context = ...) > {code} > at > https://github.com/apache/spark/blob/35d00df9bba7238ad4f40617fae4d04ddbfd/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/PlanParserSuite.scala#L225 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40728) Upgrade ASM to 9.4
[ https://issues.apache.org/jira/browse/SPARK-40728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-40728: - Priority: Minor (was: Major) > Upgrade ASM to 9.4 > -- > > Key: SPARK-40728 > URL: https://issues.apache.org/jira/browse/SPARK-40728 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-40728) Upgrade ASM to 9.4
[ https://issues.apache.org/jira/browse/SPARK-40728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-40728. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38189 [https://github.com/apache/spark/pull/38189] > Upgrade ASM to 9.4 > -- > > Key: SPARK-40728 > URL: https://issues.apache.org/jira/browse/SPARK-40728 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40728) Upgrade ASM to 9.4
[ https://issues.apache.org/jira/browse/SPARK-40728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-40728: Assignee: Yang Jie > Upgrade ASM to 9.4 > -- > > Key: SPARK-40728 > URL: https://issues.apache.org/jira/browse/SPARK-40728 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 3.4.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40808) Infer schema for CSV files - wrong behavior using header + merge schema
[ https://issues.apache.org/jira/browse/SPARK-40808?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618277#comment-17618277 ] ming95 commented on SPARK-40808: [~ohadm] Can you provide code to reproduce this issue? > Infer schema for CSV files - wrong behavior using header + merge schema > --- > > Key: SPARK-40808 > URL: https://issues.apache.org/jira/browse/SPARK-40808 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.2 >Reporter: ohad >Priority: Major > Labels: CSVReader, csv, csvparser > > Hello. > I am writing unit tests for some functionality in my application that reads > data from CSV files using Spark. > I am reading the data using: > {code:java} > header=True > mergeSchema=True > inferSchema=True{code} > When I read this single file: > {code:java} > File1: > "int_col","string_col","decimal_col","date_col" > 1,"hello",1.43,2022-02-23 > 2,"world",5.534,2021-05-05 > 3,"my name",86.455,2011-08-15 > 4,"is ohad",6.234,2002-03-22{code} > I get this schema: > {code:java} > int_col=int > string_col=string > decimal_col=double > date_col=string{code} > When I duplicate this file, I get the same schema. > The strange part is that when I add a new int column, it looks like Spark > gets confused and thinks that the columns already identified as int are > now string: > {code:java} > File1: > "int_col","string_col","decimal_col","date_col" > 1,"hello",1.43,2022-02-23 > 2,"world",5.534,2021-05-05 > 3,"my name",86.455,2011-08-15 > 4,"is ohad",6.234,2002-03-22 > File2: > "int_col","string_col","decimal_col","date_col","int2_col" > 1,"hello",1.43,2022-02-23,234 > 2,"world",5.534,2021-05-05,5 > 3,"my name",86.455,2011-08-15,32 > 4,"is ohad",6.234,2002-03-22,2 > {code} > result: > {code:java} > int_col=string > string_col=string > decimal_col=string > date_col=string > int2_col=int{code} > When I read only the second file, it looks fine: > {code:java} > File2: > "int_col","string_col","decimal_col","date_col","int2_col" > 1,"hello",1.43,2022-02-23,234 > 2,"world",5.534,2021-05-05,5 > 3,"my name",86.455,2011-08-15,32 > 4,"is ohad",6.234,2002-03-22,2{code} > result: > {code:java} > int_col=int > string_col=string > decimal_col=double > date_col=string > int2_col=int{code} > In conclusion, it looks like there is a bug in the interaction of the two features: header > recognition and schema merging. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
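Responding to the reproduction request above, a hedged sketch of a reader configured with the options named in the report; the directory path is a placeholder, and the files are expected to match File1/File2 from the description.

{code:scala}
// Hedged reproduction sketch for SPARK-40808; /tmp/csv_repro is a placeholder
// directory containing File1 and File2 exactly as shown in the description.
import org.apache.spark.sql.SparkSession

object Spark40808Repro extends App {
  val spark = SparkSession.builder()
    .master("local[*]").appName("SPARK-40808-repro").getOrCreate()

  val df = spark.read
    .option("header", "true")
    .option("mergeSchema", "true") // option named in the report
    .option("inferSchema", "true")
    .csv("/tmp/csv_repro")

  // Per the report, int_col flips from int to string once File2 is present.
  df.printSchema()
}
{code}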
[jira] [Resolved] (SPARK-40809) Add as(alias: String) to connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-40809. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 38272 [https://github.com/apache/spark/pull/38272] > Add as(alias: String) to connect DSL > > > Key: SPARK-40809 > URL: https://issues.apache.org/jira/browse/SPARK-40809 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40809) Add as(alias: String) to connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-40809: Assignee: Rui Wang > Add as(alias: String) to connect DSL > > > Key: SPARK-40809 > URL: https://issues.apache.org/jira/browse/SPARK-40809 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40588) Sorting issue with AQE turned on
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618265#comment-17618265 ] ming95 commented on SPARK-40588: After my testing, I think this is not a problem of AQE: with the reproduction code I used, the sort still does not take effect even after setting spark.sql.adaptive.enabled to false. !image-2022-10-16-22-05-47-159.png! It can be reproduced by modifying a few parameters and running in Spark local mode: ``` val partitions = 200 val minRand = 100 val maxRand = 300 ``` The real problem seems to be in the sort + partitionBy operation. > Sorting issue with AQE turned on > -- > > Key: SPARK-40588 > URL: https://issues.apache.org/jira/browse/SPARK-40588 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.3 > Environment: Spark v3.1.3 > Scala v2.12.13 >Reporter: Swetha Baskaran >Priority: Major > Attachments: image-2022-10-16-22-05-47-159.png > > > We are attempting to partition data by a few columns, sort by a particular > _sortCol_ and write out one file per partition. > {code:java} > df > .repartition(col("day"), col("month"), col("year")) > .withColumn("partitionId",spark_partition_id) > .withColumn("monotonicallyIncreasingIdUnsorted",monotonicallyIncreasingId) > .sortWithinPartitions("year", "month", "day", "sortCol") > .withColumn("monotonicallyIncreasingIdSorted",monotonicallyIncreasingId) > .write > .partitionBy("year", "month", "day") > .parquet(path){code} > When inspecting the results, we observe one file per partition, however we > see an _alternating_ pattern of unsorted rows in some files. > {code:java} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832121344,"monotonicallyIncreasingIdSorted":6287832121344} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287877022389,"monotonicallyIncreasingIdSorted":6287876860586} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287877567881,"monotonicallyIncreasingIdSorted":6287832121345} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287835105553,"monotonicallyIncreasingIdSorted":6287876860587} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832570127,"monotonicallyIncreasingIdSorted":6287832121346} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287879965760,"monotonicallyIncreasingIdSorted":6287876860588} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287878762347,"monotonicallyIncreasingIdSorted":6287832121347} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287837165012,"monotonicallyIncreasingIdSorted":6287876860589} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832910545,"monotonicallyIncreasingIdSorted":6287832121348} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287881244758,"monotonicallyIncreasingIdSorted":6287876860590} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287880041345,"monotonicallyIncreasingIdSorted":6287832121349}{code} > Here is a > [gist|https://gist.github.com/Swebask/543030748a768be92d3c0ae343d2ae89] to > reproduce the issue. > Turning off AQE with spark.conf.set("spark.sql.adaptive.enabled", false) > fixes the issue. > I'm working on identifying why AQE affects the sort order. Any leads or > thoughts would be appreciated! 
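Expanding the comment's parameters into a runnable sketch (hedged: the data generation below is an assumption standing in for the original gist, which is not reproduced here; it only mirrors the columns used in the description).

{code:scala}
// Hedged spark-shell sketch; assumes an existing `spark` session.
import org.apache.spark.sql.functions._

val partitions = 200
val minRand = 100
val maxRand = 300

val df = spark.range(0, 1000000)
  .withColumn("year", lit(2022))
  .withColumn("month", lit(10))
  .withColumn("day", (rand() * 28).cast("int"))
  .withColumn("sortCol", (lit(minRand) + rand() * (maxRand - minRand)).cast("int"))

df.repartition(partitions, col("day"), col("month"), col("year"))
  .sortWithinPartitions("year", "month", "day", "sortCol")
  .write
  .mode("overwrite")
  .partitionBy("year", "month", "day")
  .parquet("/tmp/spark40588_repro")
{code}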
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40588) Sorting issue with AQE turned on
[ https://issues.apache.org/jira/browse/SPARK-40588?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ming95 updated SPARK-40588: --- Attachment: image-2022-10-16-22-05-47-159.png > Sorting issue with AQE turned on > -- > > Key: SPARK-40588 > URL: https://issues.apache.org/jira/browse/SPARK-40588 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.3 > Environment: Spark v3.1.3 > Scala v2.12.13 >Reporter: Swetha Baskaran >Priority: Major > Attachments: image-2022-10-16-22-05-47-159.png > > > We are attempting to partition data by a few columns, sort by a particular > _sortCol_ and write out one file per partition. > {code:java} > df > .repartition(col("day"), col("month"), col("year")) > .withColumn("partitionId",spark_partition_id) > .withColumn("monotonicallyIncreasingIdUnsorted",monotonicallyIncreasingId) > .sortWithinPartitions("year", "month", "day", "sortCol") > .withColumn("monotonicallyIncreasingIdSorted",monotonicallyIncreasingId) > .write > .partitionBy("year", "month", "day") > .parquet(path){code} > When inspecting the results, we observe one file per partition, however we > see an _alternating_ pattern of unsorted rows in some files. > {code:java} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832121344,"monotonicallyIncreasingIdSorted":6287832121344} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287877022389,"monotonicallyIncreasingIdSorted":6287876860586} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287877567881,"monotonicallyIncreasingIdSorted":6287832121345} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287835105553,"monotonicallyIncreasingIdSorted":6287876860587} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832570127,"monotonicallyIncreasingIdSorted":6287832121346} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287879965760,"monotonicallyIncreasingIdSorted":6287876860588} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287878762347,"monotonicallyIncreasingIdSorted":6287832121347} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287837165012,"monotonicallyIncreasingIdSorted":6287876860589} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287832910545,"monotonicallyIncreasingIdSorted":6287832121348} > {"sortCol":1303413,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287881244758,"monotonicallyIncreasingIdSorted":6287876860590} > {"sortCol":10,"partitionId":732,"monotonicallyIncreasingIdUnsorted":6287880041345,"monotonicallyIncreasingIdSorted":6287832121349}{code} > Here is a > [gist|https://gist.github.com/Swebask/543030748a768be92d3c0ae343d2ae89] to > reproduce the issue. > Turning off AQE with spark.conf.set("spark.sql.adaptive.enabled", false) > fixes the issue. > I'm working on identifying why AQE affects the sort order. Any leads or > thoughts would be appreciated! -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40810) Use SparkIllegalArgumentException instead of IllegalArgumentException in CreateDatabaseCommand & AlterDatabaseSetLocationCommand
[ https://issues.apache.org/jira/browse/SPARK-40810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618263#comment-17618263 ] Apache Spark commented on SPARK-40810: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/38274 > Use SparkIllegalArgumentException instead of IllegalArgumentException in > CreateDatabaseCommand & AlterDatabaseSetLocationCommand > > > Key: SPARK-40810 > URL: https://issues.apache.org/jira/browse/SPARK-40810 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40810) Use SparkIllegalArgumentException instead of IllegalArgumentException in CreateDatabaseCommand & AlterDatabaseSetLocationCommand
[ https://issues.apache.org/jira/browse/SPARK-40810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40810: Assignee: Apache Spark > Use SparkIllegalArgumentException instead of IllegalArgumentException in > CreateDatabaseCommand & AlterDatabaseSetLocationCommand > > > Key: SPARK-40810 > URL: https://issues.apache.org/jira/browse/SPARK-40810 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40810) Use SparkIllegalArgumentException instead of IllegalArgumentException in CreateDatabaseCommand & AlterDatabaseSetLocationCommand
[ https://issues.apache.org/jira/browse/SPARK-40810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618262#comment-17618262 ] Apache Spark commented on SPARK-40810: -- User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/38274 > Use SparkIllegalArgumentException instead of IllegalArgumentException in > CreateDatabaseCommand & AlterDatabaseSetLocationCommand > > > Key: SPARK-40810 > URL: https://issues.apache.org/jira/browse/SPARK-40810 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40810) Use SparkIllegalArgumentException instead of IllegalArgumentException in CreateDatabaseCommand & AlterDatabaseSetLocationCommand
[ https://issues.apache.org/jira/browse/SPARK-40810?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40810: Assignee: (was: Apache Spark) > Use SparkIllegalArgumentException instead of IllegalArgumentException in > CreateDatabaseCommand & AlterDatabaseSetLocationCommand > > > Key: SPARK-40810 > URL: https://issues.apache.org/jira/browse/SPARK-40810 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: BingKun Pan >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40810) Use SparkIllegalArgumentException instead of IllegalArgumentException in CreateDatabaseCommand & AlterDatabaseSetLocationCommand
BingKun Pan created SPARK-40810: --- Summary: Use SparkIllegalArgumentException instead of IllegalArgumentException in CreateDatabaseCommand & AlterDatabaseSetLocationCommand Key: SPARK-40810 URL: https://issues.apache.org/jira/browse/SPARK-40810 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.4.0 Reporter: BingKun Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
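As a hedged illustration of the requested change, the sketch below replaces a bare IllegalArgumentException with SparkIllegalArgumentException carrying an error class; the error class name and the constructor shape are assumptions for illustration, not the merged code.

{code:scala}
// Hypothetical sketch; error class name and constructor shape are assumed.
import org.apache.spark.SparkIllegalArgumentException

object CreateDatabaseValidationSketch {
  // Validate a database location the way CreateDatabaseCommand might,
  // raising an error-class based exception instead of a bare one.
  def validateLocation(location: String): Unit = {
    if (location.isEmpty) {
      // Before: throw new IllegalArgumentException("Empty location")
      throw new SparkIllegalArgumentException(
        errorClass = "INVALID_EMPTY_LOCATION", // placeholder name
        messageParameters = Map("location" -> location))
    }
  }
}
{code}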
[jira] [Assigned] (SPARK-37945) Use error classes in the execution errors of arithmetic ops
[ https://issues.apache.org/jira/browse/SPARK-37945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37945: Assignee: Apache Spark > Use error classes in the execution errors of arithmetic ops > --- > > Key: SPARK-37945 > URL: https://issues.apache.org/jira/browse/SPARK-37945 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Major > > Migrate the following errors in QueryExecutionErrors: > * overflowInSumOfDecimalError > * overflowInIntegralDivideError > * arithmeticOverflowError > * unaryMinusCauseOverflowError > * binaryArithmeticCauseOverflowError > * unscaledValueTooLargeForPrecisionError > * decimalPrecisionExceedsMaxPrecisionError > * outOfDecimalTypeRangeError > * integerOverflowError > onto error classes. Throw an implementation of SparkThrowable. Also write > a test for every error in QueryExecutionErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37945) Use error classes in the execution errors of arithmetic ops
[ https://issues.apache.org/jira/browse/SPARK-37945?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618253#comment-17618253 ] Apache Spark commented on SPARK-37945: -- User 'khalidmammadov' has created a pull request for this issue: https://github.com/apache/spark/pull/38273 > Use error classes in the execution errors of arithmetic ops > --- > > Key: SPARK-37945 > URL: https://issues.apache.org/jira/browse/SPARK-37945 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > Migrate the following errors in QueryExecutionErrors: > * overflowInSumOfDecimalError > * overflowInIntegralDivideError > * arithmeticOverflowError > * unaryMinusCauseOverflowError > * binaryArithmeticCauseOverflowError > * unscaledValueTooLargeForPrecisionError > * decimalPrecisionExceedsMaxPrecisionError > * outOfDecimalTypeRangeError > * integerOverflowError > onto error classes. Throw an implementation of SparkThrowable. Also write > a test for every error in QueryExecutionErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37945) Use error classes in the execution errors of arithmetic ops
[ https://issues.apache.org/jira/browse/SPARK-37945?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37945: Assignee: (was: Apache Spark) > Use error classes in the execution errors of arithmetic ops > --- > > Key: SPARK-37945 > URL: https://issues.apache.org/jira/browse/SPARK-37945 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: Max Gekk >Priority: Major > > Migrate the following errors in QueryExecutionErrors: > * overflowInSumOfDecimalError > * overflowInIntegralDivideError > * arithmeticOverflowError > * unaryMinusCauseOverflowError > * binaryArithmeticCauseOverflowError > * unscaledValueTooLargeForPrecisionError > * decimalPrecisionExceedsMaxPrecisionError > * outOfDecimalTypeRangeError > * integerOverflowError > onto error classes. Throw an implementation of SparkThrowable. Also write > a test for every error in QueryExecutionErrorsSuite. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
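For one of the listed errors, a hedged sketch of the migration pattern in QueryExecutionErrors; the error class name, parameter names, and the SparkArithmeticException constructor shape are all assumptions for illustration, not the merged code.

{code:scala}
// Hedged sketch; error class, parameter names and constructor are assumed.
import org.apache.spark.SparkArithmeticException

object ArithmeticErrorsSketch {
  // arithmeticOverflowError rewritten to throw a SparkThrowable
  // implementation keyed by an error class instead of a plain message.
  def arithmeticOverflowError(
      message: String,
      hint: String = "",
      config: String = "spark.sql.ansi.enabled"): ArithmeticException = {
    new SparkArithmeticException(
      errorClass = "ARITHMETIC_OVERFLOW", // assumed class name
      messageParameters = Map(
        "message" -> message,
        "alternative" -> hint,
        "config" -> config))
  }
}
{code}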
[jira] [Commented] (SPARK-40809) Add as(alias: String) to connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618239#comment-17618239 ] Apache Spark commented on SPARK-40809: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38272 > Add as(alias: String) to connect DSL > > > Key: SPARK-40809 > URL: https://issues.apache.org/jira/browse/SPARK-40809 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40809) Add as(alias: String) to connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40809?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618238#comment-17618238 ] Apache Spark commented on SPARK-40809: -- User 'amaliujia' has created a pull request for this issue: https://github.com/apache/spark/pull/38272 > Add as(alias: String) to connect DSL > > > Key: SPARK-40809 > URL: https://issues.apache.org/jira/browse/SPARK-40809 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40809) Add as(alias: String) to connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40809: Assignee: Apache Spark > Add as(alias: String) to connect DSL > > > Key: SPARK-40809 > URL: https://issues.apache.org/jira/browse/SPARK-40809 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-40809) Add as(alias: String) to connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-40809: Assignee: (was: Apache Spark) > Add as(alias: String) to connect DSL > > > Key: SPARK-40809 > URL: https://issues.apache.org/jira/browse/SPARK-40809 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40809) Add as(alias: String) to connect DSL
[ https://issues.apache.org/jira/browse/SPARK-40809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Rui Wang updated SPARK-40809: - Summary: Add as(alias: String) to connect DSL (was: Add as(alias) to connect DSL) > Add as(alias: String) to connect DSL > > > Key: SPARK-40809 > URL: https://issues.apache.org/jira/browse/SPARK-40809 > Project: Spark > Issue Type: Sub-task > Components: Connect >Affects Versions: 3.4.0 >Reporter: Rui Wang >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-40809) Add as(alias) to connect DSL
Rui Wang created SPARK-40809: Summary: Add as(alias) to connect DSL Key: SPARK-40809 URL: https://issues.apache.org/jira/browse/SPARK-40809 Project: Spark Issue Type: Sub-task Components: Connect Affects Versions: 3.4.0 Reporter: Rui Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
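A hedged sketch of what as(alias: String) could look like in the Connect DSL, assuming aliasing maps to a SubqueryAlias-style message; the message and field names are assumptions for illustration, not the merged API.

{code:scala}
// Hypothetical sketch; proto message and field names are assumed.
import org.apache.spark.connect.proto

object AliasDslSketch {
  implicit class DslAlias(val plan: proto.Relation) extends AnyVal {
    // Wrap the relation in an alias node, mirroring Dataset.as(alias).
    def as(alias: String): proto.Relation =
      proto.Relation.newBuilder()
        .setSubqueryAlias(
          proto.SubqueryAlias.newBuilder().setInput(plan).setAlias(alias))
        .build()
  }
}
{code}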
[jira] [Assigned] (SPARK-39951) Support columnar batches with nested fields in Parquet V2
[ https://issues.apache.org/jira/browse/SPARK-39951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang reassigned SPARK-39951: --- Assignee: Adam Binford (was: Apache Spark) > Support columnar batches with nested fields in Parquet V2 > - > > Key: SPARK-39951 > URL: https://issues.apache.org/jira/browse/SPARK-39951 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Adam Binford >Assignee: Adam Binford >Priority: Major > Fix For: 3.4.0, 3.3.1 > > > Follow up to https://issues.apache.org/jira/browse/SPARK-34863 to update > `supportsColumnarReads` to account for nested fields -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40563) Error at where clause, when sql case executes by else branch
[ https://issues.apache.org/jira/browse/SPARK-40563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17618234#comment-17618234 ] Yuming Wang commented on SPARK-40563: - [~Zing] Does branch-3.3 also fix this issue? > Error at where clause, when sql case executes by else branch > > > Key: SPARK-40563 > URL: https://issues.apache.org/jira/browse/SPARK-40563 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Vadim >Priority: Major > Fix For: 3.3.1 > > Attachments: java-code-example.txt, sql.txt, stack-trace.txt > > > Hello! > The Spark SQL phase optimization failed with an internal error. Please, fill > a bug report in, and provide the full stack trace. > - Spark version 3.3.0 > - Scala version 2.12 > - DatasourceV2 > - Postgres > - Postgres JDBC Driver: 42+ > - Java8 > Case: > select > case > when (t_name = 'foo') then 'foo' > else 'default' > end as case_when > from > t > where > case > when (t_name = 'foo') then 'foo' > else 'default' > end *= 'foo'; -> works as expected* > *--* > select > case > when (t_name = 'foo') then 'foo' > else 'default' > end as case_when > from > t > where > case > when (t_name = 'foo') then 'foo' > else 'default' > end *= 'default'; -> query throws ex;* > In the where clause, when we try to find rows via the else branch, Spark throws an exception: > The Spark SQL phase optimization failed with an internal error. Please, fill > a bug report in, and provide the full stack trace. > Caused by: java.lang.AssertionError: assertion failed > at scala.Predef$.assert(Predef.scala:208) > > org.apache.spark.sql.execution.datasources.v2.PushablePredicate.$anonfun$unapply$1(DataSourceV2Strategy.scala:589) > In the debugger, at def unapply in PushablePredicate.class: > when the sql case returns 'foo', the unapply function accepts (t_name = 'foo') as an > instance of Predicate; > when the sql case returns 'default', the unapply function accepts COALESCE(t_name = > 'foo', FALSE) as an instance of GeneralScalarExpression, and the assertion fails > with an error > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
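The same predicate expressed through the DataFrame API, as a hedged sketch; the catalog and table names are placeholders for a Postgres V2 source, and the failing branch is the one the reporter describes reaching PushablePredicate.

{code:scala}
// Hedged sketch; catalog/table names are placeholders for a JDBC V2 catalog.
import org.apache.spark.sql.functions._

val t = spark.read.table("postgres.public.t") // placeholder V2 catalog table

val caseWhen = when(col("t_name") === "foo", lit("foo")).otherwise(lit("default"))

t.select(caseWhen.as("case_when"))
  .where(caseWhen === "default") // per the report, this branch trips the assertion
  .show()
{code}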
[jira] [Updated] (SPARK-39200) Stream is corrupted Exception while fetching the blocks from fallback storage system
[ https://issues.apache.org/jira/browse/SPARK-39200?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-39200: Fix Version/s: 3.3.1 (was: 3.3.2) > Stream is corrupted Exception while fetching the blocks from fallback storage > system > > > Key: SPARK-39200 > URL: https://issues.apache.org/jira/browse/SPARK-39200 > Project: Spark > Issue Type: Sub-task > Components: Shuffle >Affects Versions: 3.2.0 >Reporter: Rajendra Gujja >Assignee: Frank Yin >Priority: Major > Fix For: 3.4.0, 3.3.1, 3.2.3 > > > When executor decommissioning and fallback storage are enabled, shuffle > reads fail with `FetchFailedException: Stream is corrupted` > ref: https://issues.apache.org/jira/browse/SPARK-18105 (search for > decommission) > > This happens when the shuffle block is bigger than what `InputStream.read` > can return in one attempt. The code path does not read the block fully > (as `readFully` would), and the partial read causes the exception. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
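To make the failure mode concrete: `InputStream.read(buf, off, len)` may return fewer than `len` bytes, so a single call can leave the buffer partially filled. A minimal sketch of the read-fully loop that avoids this (not the actual Spark code path):
{code:scala}
import java.io.{EOFException, InputStream}

// A single read() may stop short; loop until the buffer is full or EOF.
// java.io.DataInputStream#readFully gives the same guarantee out of the box.
def readFully(in: InputStream, buf: Array[Byte]): Unit = {
  var off = 0
  while (off < buf.length) {
    val n = in.read(buf, off, buf.length - off)
    if (n < 0) throw new EOFException(s"Stream ended after $off of ${buf.length} bytes")
    off += n
  }
}
{code}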
[jira] [Updated] (SPARK-40535) NPE from observe of collect_list
[ https://issues.apache.org/jira/browse/SPARK-40535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-40535: Fix Version/s: 3.3.1 (was: 3.3.2) > NPE from observe of collect_list > > > Key: SPARK-40535 > URL: https://issues.apache.org/jira/browse/SPARK-40535 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Assignee: jiaan.geng >Priority: Major > Fix For: 3.4.0, 3.3.1 > > > The code below reproduces the issue: > {code:scala} > import org.apache.spark.sql.functions._ > val df = spark.range(1,10,1,11) > df.observe("collectedList", collect_list("id")).collect() > {code} > instead of > {code} > Array(1, 2, 3, 4, 5, 6, 7, 8, 9) > {code} > it fails with the NPE: > {code:java} > java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:641) > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.getBufferObject(interfaces.scala:602) > at > org.apache.spark.sql.catalyst.expressions.aggregate.TypedImperativeAggregate.serializeAggregateBufferInPlace(interfaces.scala:624) > at > org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:205) > at > org.apache.spark.sql.execution.AggregatingAccumulator.withBufferSerialized(AggregatingAccumulator.scala:33) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40547) Fix dead links in sparkr-vignettes.Rmd
[ https://issues.apache.org/jira/browse/SPARK-40547?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-40547: Fix Version/s: 3.3.1 (was: 3.3.2) > Fix dead links in sparkr-vignettes.Rmd > -- > > Key: SPARK-40547 > URL: https://issues.apache.org/jira/browse/SPARK-40547 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.3.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.4.0, 3.3.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40322) Fix all dead links
[ https://issues.apache.org/jira/browse/SPARK-40322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-40322: Fix Version/s: 3.4.0 3.3.1 (was: 3.3.2) > Fix all dead links > -- > > Key: SPARK-40322 > URL: https://issues.apache.org/jira/browse/SPARK-40322 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.4.0, 3.3.1 > > > > [https://www.deadlinkchecker.com/website-dead-link-checker.asp] > > > ||Status||URL||Source link text|| > |-1 Not found: The server name or address could not be > resolved|[http://engineering.ooyala.com/blog/using-parquet-and-scrooge-spark]|[Using > Parquet and Scrooge with Spark|https://spark.apache.org/documentation.html]| > |-1 Not found: The server name or address could not be > resolved|[http://blinkdb.org/]|[BlinkDB|https://spark.apache.org/third-party-projects.html]| > |404 Not > Found|[https://github.com/AyasdiOpenSource/df]|[DF|https://spark.apache.org/third-party-projects.html]| > |-1 Timeout|[https://atp.io/]|[atp|https://spark.apache.org/powered-by.html]| > |-1 Not found: The server name or address could not be > resolved|[http://www.sehir.edu.tr/en/]|[Istanbul Sehir > University|https://spark.apache.org/powered-by.html]| > |404 Not Found|[http://nsn.com/]|[Nokia Solutions and > Networks|https://spark.apache.org/powered-by.html]| > |-1 Not found: The server name or address could not be > resolved|[http://www.nubetech.co/]|[Nube > Technologies|https://spark.apache.org/powered-by.html]| > |-1 Timeout|[http://ooyala.com/]|[Ooyala, > Inc.|https://spark.apache.org/powered-by.html]| > |-1 Not found: The server name or address could not be > resolved|[http://engineering.ooyala.com/blog/fast-spark-queries-memory-datasets]|[Spark > for Fast Queries|https://spark.apache.org/powered-by.html]| > |-1 Not found: The server name or address could not be > resolved|[http://www.sisa.samsung.com/]|[Samsung Research > America|https://spark.apache.org/powered-by.html]| > |-1 > Timeout|[https://checker.apache.org/projs/spark.html]|[https://checker.apache.org/projs/spark.html|https://spark.apache.org/release-process.html]| > |404 Not Found|[https://ampcamp.berkeley.edu/amp-camp-two-strata-2013/]|[AMP > Camp 2 [302 from > http://ampcamp.berkeley.edu/amp-camp-two-strata-2013/]|https://spark.apache.org/documentation.html]| > |404 Not Found|[https://ampcamp.berkeley.edu/agenda-2012/]|[AMP Camp 1 [302 > from > http://ampcamp.berkeley.edu/agenda-2012/]|https://spark.apache.org/documentation.html]| > |404 Not Found|[https://ampcamp.berkeley.edu/4/]|[AMP Camp 4 [302 from > http://ampcamp.berkeley.edu/4/]|https://spark.apache.org/documentation.html]| > |404 Not Found|[https://ampcamp.berkeley.edu/3/]|[AMP Camp 3 [302 from > http://ampcamp.berkeley.edu/3/]|https://spark.apache.org/documentation.html]| > |-500 Internal Server > Error-|-[https://www.packtpub.com/product/spark-cookbook/9781783987061]-|-[Spark > Cookbook [301 from > https://www.packtpub.com/big-data-and-business-intelligence/spark-cookbook]|https://spark.apache.org/documentation.html]-| > |-500 Internal Server > Error-|-[https://www.packtpub.com/product/apache-spark-graph-processing/9781784391805]-|-[Apache > Spark Graph Processing [301 from > https://www.packtpub.com/big-data-and-business-intelligence/apache-spark-graph-processing]|https://spark.apache.org/documentation.html]-| > |500 Internal Server > 
Error|[https://prevalentdesignevents.com/sparksummit/eu17/]|[register|https://spark.apache.org/news/]| > |500 Internal Server > Error|[https://prevalentdesignevents.com/sparksummit/ss17/?_ga=1.211902866.780052874.1433437196]|[register|https://spark.apache.org/news/]| > |500 Internal Server > Error|[https://www.prevalentdesignevents.com/sparksummit2015/europe/registration.aspx?source=header]|[register|https://spark.apache.org/news/]| > |500 Internal Server > Error|[https://www.prevalentdesignevents.com/sparksummit2015/europe/speaker/]|[Spark > Summit Europe|https://spark.apache.org/news/]| > |-1 > Timeout|[http://strataconf.com/strata2013]|[Strata|https://spark.apache.org/news/]| > |-1 Not found: The server name or address could not be > resolved|[http://blog.quantifind.com/posts/spark-unit-test/]|[Unit testing > with Spark|https://spark.apache.org/news/]| > |-1 Not found: The server name or address could not be > resolved|[http://blog.quantifind.com/posts/logging-post/]|[Configuring > Spark's logs|https://spark.apache.org/news/]| > |-1 > Timeout|[http://strata.oreilly.com/2012/08/seven-reasons-why-i-like-spark.html]|[Spark|https://spark.apache.org/news/]| > |-1 > Timeout|[http://strata.oreilly.com/2012/11/shark-real-time-queries-and-a
[jira] [Updated] (SPARK-40562) Add spark.sql.legacy.groupingIdWithAppendedUserGroupBy
[ https://issues.apache.org/jira/browse/SPARK-40562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-40562: Fix Version/s: 3.3.1 (was: 3.3.2) > Add spark.sql.legacy.groupingIdWithAppendedUserGroupBy > -- > > Key: SPARK-40562 > URL: https://issues.apache.org/jira/browse/SPARK-40562 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0, 3.3.1, 3.2.3 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0, 3.3.1, 3.2.3 > > > {code:java} > scala> sql("SELECT count(*), grouping__id from (VALUES (1,1,1),(2,2,2)) AS > t(k1,k2,v) GROUP BY k1 GROUPING SETS (k2) ").show() > +--------+------------+ > |count(1)|grouping__id| > +--------+------------+ > | 1| 2| > | 1| 2| > +--------+------------+ > scala> sql("set spark.sql.legacy.groupingIdWithAppendedUserGroupBy=true") > res1: org.apache.spark.sql.DataFrame = [key: string, value: string] > scala> sql("SELECT count(*), grouping__id from (VALUES (1,1,1),(2,2,2)) AS > t(k1,k2,v) GROUP BY k1 GROUPING SETS (k2) ").show() > +--------+------------+ > |count(1)|grouping__id| > +--------+------------+ > | 1| 1| > | 1| 1| > +--------+------------+ {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-38717) Handle Hive's bucket spec case preserving behaviour
[ https://issues.apache.org/jira/browse/SPARK-38717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-38717: Fix Version/s: 3.3.1 (was: 3.3.2) > Handle Hive's bucket spec case preserving behaviour > --- > > Key: SPARK-38717 > URL: https://issues.apache.org/jira/browse/SPARK-38717 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.3.0 >Reporter: Peter Toth >Assignee: Peter Toth >Priority: Major > Fix For: 3.4.0, 3.3.1 > > > {code} > CREATE TABLE t( > c STRING, > B_C STRING > ) > PARTITIONED BY (p_c STRING) > CLUSTERED BY (B_C) INTO 4 BUCKETS > STORED AS PARQUET > {code} > then > {code} > SELECT * FROM t > {code} > fails with: > {code} > Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Bucket columns > B_C is not part of the table columns ([FieldSchema(name:c, type:string, > comment:null), FieldSchema(name:b_c, type:string, comment:null)] > at > org.apache.hadoop.hive.ql.metadata.Table.setBucketCols(Table.java:552) > at > org.apache.spark.sql.hive.client.HiveClientImpl$.toHiveTable(HiveClientImpl.scala:1098) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$getPartitionsByFilter$1(HiveClientImpl.scala:764) > at > org.apache.spark.sql.hive.client.HiveClientImpl.$anonfun$withHiveState$1(HiveClientImpl.scala:294) > at > org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:225) > at > org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:224) > at > org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:274) > at > org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:763) > at > org.apache.spark.sql.hive.HiveExternalCatalog.$anonfun$listPartitionsByFilter$1(HiveExternalCatalog.scala:1287) > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:101) > ... 110 more > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40583) Documentation error in "Integration with Cloud Infrastructures"
[ https://issues.apache.org/jira/browse/SPARK-40583?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-40583: Fix Version/s: 3.3.1 (was: 3.3.2) > Documentation error in "Integration with Cloud Infrastructures" > --- > > Key: SPARK-40583 > URL: https://issues.apache.org/jira/browse/SPARK-40583 > Project: Spark > Issue Type: Bug > Components: Documentation >Affects Versions: 3.3.0 >Reporter: Daniel Ranchal >Assignee: Daniel Ranchal >Priority: Minor > Fix For: 3.4.0, 3.3.1, 3.2.3 > > > The artifactId that implements the integration with several cloud > infrastructures is wrong. Instead of "hadoop-cloud-\{SCALA_VERSION}", it > should say "spark-hadoop-cloud-\{SCALA_VERSION}". -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
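For illustration, the corrected coordinate as it would appear in an sbt build; the version number here is an arbitrary example:
{code:scala}
// sbt: the cloud-integration module is published as spark-hadoop-cloud,
// not hadoop-cloud. %% appends the Scala binary version to the artifactId.
libraryDependencies += "org.apache.spark" %% "spark-hadoop-cloud" % "3.3.1"
{code}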
[jira] [Updated] (SPARK-40636) Fix wrong remained shuffles log in BlockManagerDecommissioner
[ https://issues.apache.org/jira/browse/SPARK-40636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-40636: Fix Version/s: 3.3.1 (was: 3.3.2) > Fix wrong remained shuffles log in BlockManagerDecommissioner > - > > Key: SPARK-40636 > URL: https://issues.apache.org/jira/browse/SPARK-40636 > Project: Spark > Issue Type: Bug > Components: Block Manager >Affects Versions: 3.3.0 >Reporter: Zhongwei Zhu >Assignee: Zhongwei Zhu >Priority: Minor > Fix For: 3.4.0, 3.3.1, 3.2.3 > > > BlockManagerDecommissioner should log the correct number of remaining shuffles; in the log below the "remained" count never decreases: > {code:java} > 4 of 24 local shuffles are added. In total, 24 shuffles are remained. > 2022-09-30 17:42:15.035 PDT > 0 of 24 local shuffles are added. In total, 24 shuffles are remained. > 2022-09-30 17:42:45.069 PDT > 0 of 24 local shuffles are added. In total, 24 shuffles are remained.{code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
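A hedged sketch of the accounting the log message needs; the names below are illustrative, not the actual BlockManagerDecommissioner fields. The point is that the remaining figure should shrink as shuffles are migrated instead of repeating the total:
{code:scala}
// Illustrative bookkeeping only; field and method names are hypothetical.
val totalShuffles = 24
var migrated = 0

def logProgress(addedNow: Int): Unit = {
  migrated += addedNow
  val remaining = totalShuffles - migrated
  println(s"$addedNow of $totalShuffles local shuffles are added. " +
    s"In total, $remaining shuffles are remained.")
}

logProgress(4) // logs "... 20 shuffles are remained." rather than a constant 24
{code}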
[jira] [Updated] (SPARK-40648) Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module
[ https://issues.apache.org/jira/browse/SPARK-40648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-40648: Fix Version/s: 3.3.1 (was: 3.3.2) > Add `@ExtendedLevelDBTest` to the leveldb relevant tests in the yarn module > -- > > Key: SPARK-40648 > URL: https://issues.apache.org/jira/browse/SPARK-40648 > Project: Spark > Issue Type: Improvement > Components: Tests, YARN >Affects Versions: 3.2.2, 3.4.0, 3.3.1 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 3.4.0, 3.3.1 > > > SPARK-40490 made the test cases related to `YarnShuffleIntegrationSuite` > verify the registeredExecFile reload scenario again, so we need > to add `@ExtendedLevelDBTest` to the test cases that use LevelDB so that > macOS/Apple Silicon users can skip the relevant tests through > `-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest`. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
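For reference, a hedged sketch of how such a tag is typically applied; the suite below is hypothetical, while the annotation class and the exclude flag are the ones named in the issue:
{code:scala}
import org.apache.spark.tags.ExtendedLevelDBTest
import org.scalatest.funsuite.AnyFunSuite

// Hypothetical suite: the class-level tag lets builds exclude it wholesale.
@ExtendedLevelDBTest
class LevelDBBackedSuite extends AnyFunSuite {
  test("reload registeredExecFile state") {
    // ... exercises a LevelDB-backed store ...
  }
}
{code}
Runs on macOS/Apple Silicon can then skip such suites with `-Dtest.exclude.tags=org.apache.spark.tags.ExtendedLevelDBTest`, as described above.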
[jira] [Updated] (SPARK-40574) Add PURGE to DROP TABLE doc
[ https://issues.apache.org/jira/browse/SPARK-40574?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-40574: Fix Version/s: 3.3.1 (was: 3.3.2) > Add PURGE to DROP TABLE doc > --- > > Key: SPARK-40574 > URL: https://issues.apache.org/jira/browse/SPARK-40574 > Project: Spark > Issue Type: Improvement > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Yuming Wang >Assignee: Yuming Wang >Priority: Major > Fix For: 3.4.0, 3.3.1, 3.2.3 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
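The statement the documentation should now cover, as a minimal illustration; the table name is arbitrary and the session setup assumes a Hive-enabled build:
{code:scala}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").enableHiveSupport().getOrCreate()
spark.sql("CREATE TABLE t (id INT) USING parquet")
// PURGE asks the catalog to delete the data immediately instead of moving it
// to the trash (honoured where the underlying Hive metastore supports it).
spark.sql("DROP TABLE t PURGE")
{code}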
[jira] [Updated] (SPARK-40612) On Kubernetes for long running app Spark using an invalid principal to renew the delegation token
[ https://issues.apache.org/jira/browse/SPARK-40612?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-40612: Fix Version/s: 3.3.1 (was: 3.3.2) > On Kubernetes for long running app Spark using an invalid principal to renew > the delegation token > - > > Key: SPARK-40612 > URL: https://issues.apache.org/jira/browse/SPARK-40612 > Project: Spark > Issue Type: Bug > Components: Security >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0, > 3.1.3, 3.2.1, 3.3.0, 3.2.2 >Reporter: Attila Zsolt Piros >Assignee: Attila Zsolt Piros >Priority: Major > Fix For: 3.4.0, 3.3.1, 3.2.3 > > > When the delegation token is fetched for the first time, the principal is the > current user, but subsequent token renewals use a MapReduce/YARN-specific > principal even on Kubernetes. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39725) Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622
[ https://issues.apache.org/jira/browse/SPARK-39725?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-39725: Fix Version/s: 3.3.1 (was: 3.3.2) > Upgrade jetty-http from 9.4.46.v20220331 to 9.4.48.v20220622 > > > Key: SPARK-39725 > URL: https://issues.apache.org/jira/browse/SPARK-39725 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.0 >Reporter: Bjørn Jørgensen >Assignee: Bjørn Jørgensen >Priority: Major > Fix For: 3.4.0, 3.3.1 > > Attachments: jetty-io-spark.png > > > [Release note |https://github.com/eclipse/jetty.project/releases] > [CVE-2022-2047|https://nvd.nist.gov/vuln/detail/CVE-2022-2047] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-40682) Set spark.driver.maxResultSize to 3g in SqlBasedBenchmark
[ https://issues.apache.org/jira/browse/SPARK-40682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-40682: Fix Version/s: 3.3.1 (was: 3.3.2) > Set spark.driver.maxResultSize to 3g in SqlBasedBenchmark > - > > Key: SPARK-40682 > URL: https://issues.apache.org/jira/browse/SPARK-40682 > Project: Spark > Issue Type: Test > Components: Tests >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Minor > Fix For: 3.4.0, 3.3.1 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
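What the change amounts to, sketched below; the builder call shows how the conf is set rather than the benchmark's exact code (the default limit is 1g):
{code:scala}
import org.apache.spark.sql.SparkSession

// Collecting large benchmark results to the driver can exceed the default
// 1g spark.driver.maxResultSize; the benchmarks raise it to 3g.
val spark = SparkSession.builder()
  .master("local[*]")
  .config("spark.driver.maxResultSize", "3g")
  .getOrCreate()
{code}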
[jira] [Updated] (SPARK-40808) Infer schema for CSV files - wrong behavior using header + merge schema
[ https://issues.apache.org/jira/browse/SPARK-40808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ohad updated SPARK-40808: - Description: Hello. I am writing unit-tests for some functionality in my application that reads data from CSV files using Spark. I am reading the data using: {code:java} header=True mergeSchema=True inferSchema=True{code} When I am reading this single file: {code:java} File1: "int_col","string_col","decimal_col","date_col" 1,"hello",1.43,2022-02-23 2,"world",5.534,2021-05-05 3,"my name",86.455,2011-08-15 4,"is ohad",6.234,2002-03-22{code} I am getting this schema: {code:java} int_col=int string_col=string decimal_col=double date_col=string{code} When I am duplicating this file, I am getting the same schema. The strange part is when I am adding a new int column: it looks like Spark is getting confused and thinks that the columns that were already identified as int are now string: {code:java} File1: "int_col","string_col","decimal_col","date_col" 1,"hello",1.43,2022-02-23 2,"world",5.534,2021-05-05 3,"my name",86.455,2011-08-15 4,"is ohad",6.234,2002-03-22 File2: "int_col","string_col","decimal_col","date_col","int2_col" 1,"hello",1.43,2022-02-23,234 2,"world",5.534,2021-05-05,5 3,"my name",86.455,2011-08-15,32 4,"is ohad",6.234,2002-03-22,2 {code} result: {code:java} int_col=string string_col=string decimal_col=string date_col=string int2_col=int{code} When I am reading only the second file, it looks fine: {code:java} File2: "int_col","string_col","decimal_col","date_col","int2_col" 1,"hello",1.43,2022-02-23,234 2,"world",5.534,2021-05-05,5 3,"my name",86.455,2011-08-15,32 4,"is ohad",6.234,2002-03-22,2{code} result: {code:java} int_col=int string_col=string decimal_col=double date_col=string int2_col=int{code} In conclusion, it looks like there is a bug when mixing the two features: header recognition and merge schema. was: Hello. I am writing unit-tests for some functionality in my application that reads data from CSV files using Spark. I am reading the data using: ``` header=True mergeSchema=True inferSchema=True ``` When I am reading this single file: ``` File1: "int_col","string_col","decimal_col","date_col" 1,"hello",1.43,2022-02-23 2,"world",5.534,2021-05-05 3,"my name",86.455,2011-08-15 4,"is ohad",6.234,2002-03-22 ``` I am getting this schema: ``` int_col=int string_col=string decimal_col=double date_col=string ``` When I am duplicating this file, I am getting the same schema. The strange part is when I am adding a new int column: it looks like Spark is getting confused and thinks that the columns that were already identified as int are now string: ``` File1: "int_col","string_col","decimal_col","date_col" 1,"hello",1.43,2022-02-23 2,"world",5.534,2021-05-05 3,"my name",86.455,2011-08-15 4,"is ohad",6.234,2002-03-22 File2: "int_col","string_col","decimal_col","date_col","int2_col" 1,"hello",1.43,2022-02-23,234 2,"world",5.534,2021-05-05,5 3,"my name",86.455,2011-08-15,32 4,"is ohad",6.234,2002-03-22,2 ``` result: ``` int_col=string string_col=string decimal_col=string date_col=string int2_col=int ``` When I am reading only the second file, it looks fine: ``` File2: "int_col","string_col","decimal_col","date_col","int2_col" 1,"hello",1.43,2022-02-23,234 2,"world",5.534,2021-05-05,5 3,"my name",86.455,2011-08-15,32 4,"is ohad",6.234,2002-03-22,2 ``` result: ``` int_col=int string_col=string decimal_col=double date_col=string int2_col=int ``` In conclusion, it looks like there is a bug when mixing the two features: header recognition and merge schema. 
> Infer schema for CSV files - wrong behavior using header + merge schema > --- > > Key: SPARK-40808 > URL: https://issues.apache.org/jira/browse/SPARK-40808 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.2 >Reporter: ohad >Priority: Major > Labels: CSVReader, csv, csvparser > > Hello. > I am writing unit-tests for some functionality in my application that reads > data from CSV files using Spark. > I am reading the data using: > {code:java} > header=True > mergeSchema=True > inferSchema=True{code} > When I am reading this single file: > {code:java} > File1: > "int_col","string_col","decimal_col","date_col" > 1,"hello",1.43,2022-02-23 > 2,"world",5.534,2021-05-05 > 3,"my name",86.455,2011-08-15 > 4,"is ohad",6.234,2002-03-22{code} > I am getting this schema: > {code:java} > int_col=int > string_col=string > decimal_col=double > date_col=string{code} > When I am duplicating this file, I am getting the same schema. > The strange part is when I am adding a new int column: it looks like Spark is > getting confused and thinks that the columns that were already identified as int are > now string: > {code:java} > File1: > "int_col","string_col","decimal_col","date_col"
[jira] [Updated] (SPARK-40808) Infer schema for CSV files - wrong behavior using header + merge schema
[ https://issues.apache.org/jira/browse/SPARK-40808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ohad updated SPARK-40808: - Description: Hello. I am writing unit-tests for some functionality in my application that reads data from CSV files using Spark. I am reading the data using: ``` header=True mergeSchema=True inferSchema=True ``` When I am reading this single file: ``` File1: "int_col","string_col","decimal_col","date_col" 1,"hello",1.43,2022-02-23 2,"world",5.534,2021-05-05 3,"my name",86.455,2011-08-15 4,"is ohad",6.234,2002-03-22 ``` I am getting this schema: ``` int_col=int string_col=string decimal_col=double date_col=string ``` When I am duplicating this file, I am getting the same schema. The strange part is when I am adding a new int column: it looks like Spark is getting confused and thinks that the columns that were already identified as int are now string: ``` File1: "int_col","string_col","decimal_col","date_col" 1,"hello",1.43,2022-02-23 2,"world",5.534,2021-05-05 3,"my name",86.455,2011-08-15 4,"is ohad",6.234,2002-03-22 File2: "int_col","string_col","decimal_col","date_col","int2_col" 1,"hello",1.43,2022-02-23,234 2,"world",5.534,2021-05-05,5 3,"my name",86.455,2011-08-15,32 4,"is ohad",6.234,2002-03-22,2 ``` result: ``` int_col=string string_col=string decimal_col=string date_col=string int2_col=int ``` When I am reading only the second file, it looks fine: ``` File2: "int_col","string_col","decimal_col","date_col","int2_col" 1,"hello",1.43,2022-02-23,234 2,"world",5.534,2021-05-05,5 3,"my name",86.455,2011-08-15,32 4,"is ohad",6.234,2002-03-22,2 ``` result: ``` int_col=int string_col=string decimal_col=double date_col=string int2_col=int ``` In conclusion, it looks like there is a bug when mixing the two features: header recognition and merge schema. was: Hello. I am writing some unit-tests for some functionality in my application that reads data from CSV files using Spark. I am reading the data using: ``` header=True mergeSchema=True inferSchema=True ``` When I am reading this single file: ``` File1: "int_col","string_col","decimal_col","date_col" 1,"hello",1.43,2022-02-23 2,"world",5.534,2021-05-05 3,"my name",86.455,2011-08-15 4,"is ohad",6.234,2002-03-22 ``` I am getting this schema: ``` int_col=int string_col=string decimal_col=double date_col=string ``` When I am duplicating this file, I am getting the same schema. The strange part is when I am adding a new int column: it looks like Spark is getting confused and thinks that the columns that were already identified as int are now string: ``` File1: "int_col","string_col","decimal_col","date_col" 1,"hello",1.43,2022-02-23 2,"world",5.534,2021-05-05 3,"my name",86.455,2011-08-15 4,"is ohad",6.234,2002-03-22 File2: "int_col","string_col","decimal_col","date_col","int2_col" 1,"hello",1.43,2022-02-23,234 2,"world",5.534,2021-05-05,5 3,"my name",86.455,2011-08-15,32 4,"is ohad",6.234,2002-03-22,2 ``` result: ``` int_col=string string_col=string decimal_col=string date_col=string int2_col=int ``` When I am reading only the second file, it looks fine: ``` File2: "int_col","string_col","decimal_col","date_col","int2_col" 1,"hello",1.43,2022-02-23,234 2,"world",5.534,2021-05-05,5 3,"my name",86.455,2011-08-15,32 4,"is ohad",6.234,2002-03-22,2 ``` result: ``` int_col=int string_col=string decimal_col=double date_col=string int2_col=int ``` In conclusion, it looks like there is a bug when mixing the two features: header recognition and merge schema. 
> Infer schema for CSV files - wrong behavior using header + merge schema > --- > > Key: SPARK-40808 > URL: https://issues.apache.org/jira/browse/SPARK-40808 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.2.2 >Reporter: ohad >Priority: Major > Labels: CSVReader, csv, csvparser > > Hello. > I am writing unit-tests for some functionality in my application that reads > data from CSV files using Spark. > I am reading the data using: > ``` > header=True > mergeSchema=True > inferSchema=True > ``` > When I am reading this single file: > ``` > File1: > "int_col","string_col","decimal_col","date_col" > 1,"hello",1.43,2022-02-23 > 2,"world",5.534,2021-05-05 > 3,"my name",86.455,2011-08-15 > 4,"is ohad",6.234,
[jira] [Created] (SPARK-40808) Infer schema for CSV files - wrong behavior using header + merge schema
ohad created SPARK-40808: Summary: Infer schema for CSV files - wrong behavior using header + merge schema Key: SPARK-40808 URL: https://issues.apache.org/jira/browse/SPARK-40808 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 3.2.2 Reporter: ohad Hello. I am writing some unit-tests for some functionality in my application that reads data from CSV files using Spark. I am reading the data using: ``` header=True mergeSchema=True inferSchema=True ``` When I am reading this single file: ``` File1: "int_col","string_col","decimal_col","date_col" 1,"hello",1.43,2022-02-23 2,"world",5.534,2021-05-05 3,"my name",86.455,2011-08-15 4,"is ohad",6.234,2002-03-22 ``` I am getting this schema: ``` int_col=int string_col=string decimal_col=double date_col=string ``` When I am duplicating this file, I am getting the same schema. The strange part is when I am adding a new int column: it looks like Spark is getting confused and thinks that the columns that were already identified as int are now string: ``` File1: "int_col","string_col","decimal_col","date_col" 1,"hello",1.43,2022-02-23 2,"world",5.534,2021-05-05 3,"my name",86.455,2011-08-15 4,"is ohad",6.234,2002-03-22 File2: "int_col","string_col","decimal_col","date_col","int2_col" 1,"hello",1.43,2022-02-23,234 2,"world",5.534,2021-05-05,5 3,"my name",86.455,2011-08-15,32 4,"is ohad",6.234,2002-03-22,2 ``` result: ``` int_col=string string_col=string decimal_col=string date_col=string int2_col=int ``` When I am reading only the second file, it looks fine: ``` File2: "int_col","string_col","decimal_col","date_col","int2_col" 1,"hello",1.43,2022-02-23,234 2,"world",5.534,2021-05-05,5 3,"my name",86.455,2011-08-15,32 4,"is ohad",6.234,2002-03-22,2 ``` result: ``` int_col=int string_col=string decimal_col=double date_col=string int2_col=int ``` In conclusion, it looks like there is a bug when mixing the two features: header recognition and merge schema. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org