[jira] [Resolved] (SPARK-47140) Upgrade codecov/codecov-action from v2 to v4 in GitHub Actions
[ https://issues.apache.org/jira/browse/SPARK-47140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47140. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45227 [https://github.com/apache/spark/pull/45227] > Upgrade codecov/codecov-action from v2 to v4 in GitHub Actions > -- > > Key: SPARK-47140 > URL: https://issues.apache.org/jira/browse/SPARK-47140 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
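For reference, this kind of change is typically a one-line workflow-step bump like the sketch below (illustrative only; the step name, `files` input, and secret name are assumptions, not taken from the actual PR):

```yaml
# Hypothetical GitHub Actions step after the upgrade; v4 generally
# expects an upload token, unlike v2.
- name: Upload coverage to Codecov
  uses: codecov/codecov-action@v4
  with:
    files: ./coverage.xml
    token: ${{ secrets.CODECOV_TOKEN }}
```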
[jira] [Resolved] (SPARK-47142) Use `spark.jars.ivy` instead `spark.driver.extraJavaOptions` in `DepsTestsSuite`
[ https://issues.apache.org/jira/browse/SPARK-47142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47142. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45229 [https://github.com/apache/spark/pull/45229] > Use `spark.jars.ivy` instead `spark.driver.extraJavaOptions` in > `DepsTestsSuite` > > > Key: SPARK-47142 > URL: https://issues.apache.org/jira/browse/SPARK-47142 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Updated] (SPARK-47143) Improve `ArtifactSuite` to use unique `MavenCoordinate`s
[ https://issues.apache.org/jira/browse/SPARK-47143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47143: -- Summary: Improve `ArtifactSuite` to use unique `MavenCoordinate`s (was: Fix `ArtifactSuite` to use unique `MavenCoordinate`s) > Improve `ArtifactSuite` to use unique `MavenCoordinate`s > > > Key: SPARK-47143 > URL: https://issues.apache.org/jira/browse/SPARK-47143 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-47143) Fix `ArtifactSuite` to use unique `MavenCoordinate`s
[ https://issues.apache.org/jira/browse/SPARK-47143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47143: -- Summary: Fix `ArtifactSuite` to use unique `MavenCoordinate`s (was: Fix `ArtifactSuite` to use aunique `MavenCoordinate`s) > Fix `ArtifactSuite` to use unique `MavenCoordinate`s > > > Key: SPARK-47143 > URL: https://issues.apache.org/jira/browse/SPARK-47143 > Project: Spark > Issue Type: Sub-task > Components: Connect, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major
[jira] [Assigned] (SPARK-47142) Use `spark.jars.ivy` instead `spark.driver.extraJavaOptions` in `DepsTestsSuite`
[ https://issues.apache.org/jira/browse/SPARK-47142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47142: - Assignee: Dongjoon Hyun > Use `spark.jars.ivy` instead `spark.driver.extraJavaOptions` in > `DepsTestsSuite` > > > Key: SPARK-47142 > URL: https://issues.apache.org/jira/browse/SPARK-47142 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-47142) Use `spark.jars.ivy` instead `spark.driver.extraJavaOptions` in `DepsTestsSuite`
[ https://issues.apache.org/jira/browse/SPARK-47142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47142: --- Labels: pull-request-available (was: ) > Use `spark.jars.ivy` instead `spark.driver.extraJavaOptions` in > `DepsTestsSuite` > > > Key: SPARK-47142 > URL: https://issues.apache.org/jira/browse/SPARK-47142 > Project: Spark > Issue Type: Sub-task > Components: Kubernetes, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available
[jira] [Created] (SPARK-47142) Use `spark.jars.ivy` instead `spark.driver.extraJavaOptions` in `DepsTestsSuite`
Dongjoon Hyun created SPARK-47142: - Summary: Use `spark.jars.ivy` instead `spark.driver.extraJavaOptions` in `DepsTestsSuite` Key: SPARK-47142 URL: https://issues.apache.org/jira/browse/SPARK-47142 Project: Spark Issue Type: Sub-task Components: Kubernetes, Tests Affects Versions: 4.0.0 Reporter: Dongjoon Hyun
[jira] [Resolved] (SPARK-47137) Add getAll to spark.conf for feature parity with Scala
[ https://issues.apache.org/jira/browse/SPARK-47137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47137. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45222 [https://github.com/apache/spark/pull/45222] > Add getAll to spark.conf for feature parity with Scala > -- > > Key: SPARK-47137 > URL: https://issues.apache.org/jira/browse/SPARK-47137 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-47137) Add getAll to spark.conf for feature parity with Scala
[ https://issues.apache.org/jira/browse/SPARK-47137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47137: - Assignee: Takuya Ueshin > Add getAll to spark.conf for feature parity with Scala > -- > > Key: SPARK-47137 > URL: https://issues.apache.org/jira/browse/SPARK-47137 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Assignee: Takuya Ueshin >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-47141) Support shuffle migration to external storage
[ https://issues.apache.org/jira/browse/SPARK-47141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47141: --- Labels: pull-request-available (was: ) > Support shuffle migration to external storage > - > > Key: SPARK-47141 > URL: https://issues.apache.org/jira/browse/SPARK-47141 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: mahesh kumar behera >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, Spark supports migrating shuffle data to peer nodes during node decommissioning. If peer nodes are not accessible, Spark falls back to external storage; the user must provide the storage location path. There are scenarios where a user may want to migrate to external storage instead of peer nodes, for example because of unstable nodes or the need for aggressive scale-down. The user should therefore be able to configure Spark to migrate shuffle data directly to external storage when the use case permits.
[jira] [Created] (SPARK-47141) Support shuffle migration to external storage
mahesh kumar behera created SPARK-47141: --- Summary: Support shuffle migration to external storage Key: SPARK-47141 URL: https://issues.apache.org/jira/browse/SPARK-47141 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: mahesh kumar behera Fix For: 4.0.0 Currently, Spark supports migrating shuffle data to peer nodes during node decommissioning. If peer nodes are not accessible, Spark falls back to external storage; the user must provide the storage location path. There are scenarios where a user may want to migrate to external storage instead of peer nodes, for example because of unstable nodes or the need for aggressive scale-down. The user should therefore be able to configure Spark to migrate shuffle data directly to external storage when the use case permits.
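For context, the existing decommission path can already be pointed at a fallback location; a minimal sketch of today's knobs (the bucket path is a placeholder, and the "migrate directly to external storage" switch proposed in this issue is deliberately not shown, since it does not exist yet):

```properties
# Existing behaviour: shuffle blocks migrate to peer executors first and
# spill to this fallback path only when no peer can accept them.
spark.decommission.enabled=true
spark.storage.decommission.enabled=true
spark.storage.decommission.shuffleBlocks.enabled=true
spark.storage.decommission.fallbackStorage.path=s3a://my-bucket/spark-fallback/
```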
[jira] [Assigned] (SPARK-47130) Use listStatus to bypass block location info when cleaning driver logs
[ https://issues.apache.org/jira/browse/SPARK-47130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao reassigned SPARK-47130: Assignee: Kent Yao > Use listStatus to bypass block location info when cleaning driver logs > -- > > Key: SPARK-47130 > URL: https://issues.apache.org/jira/browse/SPARK-47130 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-47130) Use listStatus to bypass block location info when cleaning driver logs
[ https://issues.apache.org/jira/browse/SPARK-47130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao resolved SPARK-47130. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45215 [https://github.com/apache/spark/pull/45215] > Use listStatus to bypass block location info when cleaning driver logs > -- > > Key: SPARK-47130 > URL: https://issues.apache.org/jira/browse/SPARK-47130 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Created] (SPARK-47140) Upgrade codecov/codecov-action from v2 to v4 in GitHub Actions
Hyukjin Kwon created SPARK-47140: Summary: Upgrade codecov/codecov-action from v2 to v4 in GitHub Actions Key: SPARK-47140 URL: https://issues.apache.org/jira/browse/SPARK-47140 Project: Spark Issue Type: Improvement Components: Project Infra, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon
[jira] [Created] (SPARK-47139) Upgrade Python version used in coverage report
Hyukjin Kwon created SPARK-47139: Summary: Upgrade Python version used in coverage report Key: SPARK-47139 URL: https://issues.apache.org/jira/browse/SPARK-47139 Project: Spark Issue Type: Improvement Components: Project Infra, PySpark Affects Versions: 4.0.0 Reporter: Hyukjin Kwon
[jira] [Resolved] (SPARK-47123) JDBCRDD does not correctly handle errors in getQueryOutputSchema
[ https://issues.apache.org/jira/browse/SPARK-47123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47123. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45209 [https://github.com/apache/spark/pull/45209] > JDBCRDD does not correctly handle errors in getQueryOutputSchema > > > Key: SPARK-47123 > URL: https://issues.apache.org/jira/browse/SPARK-47123 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0, 4.0.0 >Reporter: Pablo Langa Blanco >Assignee: Pablo Langa Blanco >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > If there is an error executing statement.executeQuery(), it's possible that > another error in one of the finally statements makes us not see the main > error. > {code:java} > def getQueryOutputSchema( > query: String, options: JDBCOptions, dialect: JdbcDialect): StructType > = { > val conn: Connection = dialect.createConnectionFactory(options)(-1) > try { > val statement = conn.prepareStatement(query) > try { > statement.setQueryTimeout(options.queryTimeout) > val rs = statement.executeQuery() > try { > JdbcUtils.getSchema(rs, dialect, alwaysNullable = true, > isTimestampNTZ = options.preferTimestampNTZ) > } finally { > rs.close() > } > } finally { > statement.close() > } > } finally { > conn.close() > } > } {code}
[jira] [Assigned] (SPARK-47123) JDBCRDD does not correctly handle errors in getQueryOutputSchema
[ https://issues.apache.org/jira/browse/SPARK-47123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-47123: Assignee: Pablo Langa Blanco > JDBCRDD does not correctly handle errors in getQueryOutputSchema > > > Key: SPARK-47123 > URL: https://issues.apache.org/jira/browse/SPARK-47123 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.5.0, 4.0.0 >Reporter: Pablo Langa Blanco >Assignee: Pablo Langa Blanco >Priority: Minor > Labels: pull-request-available > > If there is an error executing statement.executeQuery(), it's possible that > another error in one of the finally statements makes us not see the main > error. > {code:java} > def getQueryOutputSchema( > query: String, options: JDBCOptions, dialect: JdbcDialect): StructType > = { > val conn: Connection = dialect.createConnectionFactory(options)(-1) > try { > val statement = conn.prepareStatement(query) > try { > statement.setQueryTimeout(options.queryTimeout) > val rs = statement.executeQuery() > try { > JdbcUtils.getSchema(rs, dialect, alwaysNullable = true, > isTimestampNTZ = options.preferTimestampNTZ) > } finally { > rs.close() > } > } finally { > statement.close() > } > } finally { > conn.close() > } > } {code}
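The masking described in this issue is the classic finally-swallows-the-primary-exception pattern. Below is a self-contained sketch (not Spark's actual fix; the class and method names are made up for illustration) showing how try-with-resources preserves the primary error and records the close() failure as a suppressed exception instead:

```java
// Sketch of the failure mode: a resource whose close() throws, like a
// ResultSet failing to close after executeQuery() already failed.
class SuppressedDemo {
    static class Noisy implements AutoCloseable {
        @Override public void close() { throw new RuntimeException("close failed"); }
    }

    // Nested try/finally: the close() error replaces the primary error.
    static Throwable tryFinally() {
        try {
            Noisy rs = new Noisy();
            try {
                throw new IllegalStateException("primary error");
            } finally {
                rs.close();           // throws, masking the primary error
            }
        } catch (Throwable t) {
            return t;                 // RuntimeException("close failed")
        }
    }

    // try-with-resources: the primary error wins; the close() failure is
    // attached via addSuppressed and stays visible in the stack trace.
    static Throwable tryWithResources() {
        try (Noisy rs = new Noisy()) {
            throw new IllegalStateException("primary error");
        } catch (Throwable t) {
            return t;                 // IllegalStateException("primary error")
        }
    }
}
```

Here `tryFinally()` returns the close() error and loses the root cause, while `tryWithResources()` returns the original error with the close() failure still available via `getSuppressed()`.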
[jira] [Reopened] (SPARK-47115) Use larger memory for Maven builds
[ https://issues.apache.org/jira/browse/SPARK-47115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-47115: -- Assignee: (was: Hyukjin Kwon) > Use larger memory for Maven builds > -- > > Key: SPARK-47115 > URL: https://issues.apache.org/jira/browse/SPARK-47115 > Project: Spark > Issue Type: Test > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > {code} > *** RUN ABORTED *** > An exception or error caused a run to abort: unable to create native thread: > possibly out of memory or process/resource limits reached > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:1553) > at java.base/java.lang.System$2.start(System.java:2577) > at > java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:152) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:953) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364) > at > org.apache.spark.rpc.netty.SharedMessageLoop.$anonfun$threadpool$1(MessageLoop.scala:128) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190) > at > org.apache.spark.rpc.netty.SharedMessageLoop.<init>(MessageLoop.scala:127) > at > org.apache.spark.rpc.netty.Dispatcher.sharedLoop$lzycompute(Dispatcher.scala:46) > ... > Warning: The requested profile "volcano" could not be activated because it > does not exist. > Warning: The requested profile "hive" could not be activated because it does > not exist.
> Error: Failed to execute goal > org.scalatest:scalatest-maven-plugin:2.2.0:test (test) on project > spark-core_2.13: There are test failures -> [Help 1] > Error: > Error: To see the full stack trace of the errors, re-run Maven with the -e > switch. > Error: Re-run Maven using the -X switch to enable full debug logging. > Error: > Error: For more information about the errors and possible solutions, please > read the following articles: > Error: [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException > Error: > Error: After correcting the problems, you can resume the build with the > command > Error:mvn -rf :spark-core_2.13 > Error: Process completed with exit code 1. > {code} > https://github.com/HyukjinKwon/spark/actions/runs/7984135094/job/21800463337
[jira] [Updated] (SPARK-47115) Use larger memory for Maven builds
[ https://issues.apache.org/jira/browse/SPARK-47115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-47115: - Fix Version/s: (was: 4.0.0) > Use larger memory for Maven builds > -- > > Key: SPARK-47115 > URL: https://issues.apache.org/jira/browse/SPARK-47115 > Project: Spark > Issue Type: Test > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > *** RUN ABORTED *** > An exception or error caused a run to abort: unable to create native thread: > possibly out of memory or process/resource limits reached > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:1553) > at java.base/java.lang.System$2.start(System.java:2577) > at > java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:152) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:953) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364) > at > org.apache.spark.rpc.netty.SharedMessageLoop.$anonfun$threadpool$1(MessageLoop.scala:128) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190) > at > org.apache.spark.rpc.netty.SharedMessageLoop.<init>(MessageLoop.scala:127) > at > org.apache.spark.rpc.netty.Dispatcher.sharedLoop$lzycompute(Dispatcher.scala:46) > ... > Warning: The requested profile "volcano" could not be activated because it > does not exist. > Warning: The requested profile "hive" could not be activated because it does > not exist. > Error: Failed to execute goal > org.scalatest:scalatest-maven-plugin:2.2.0:test (test) on project > spark-core_2.13: There are test failures -> [Help 1] > Error: > Error: To see the full stack trace of the errors, re-run Maven with the -e > switch.
> Error: Re-run Maven using the -X switch to enable full debug logging. > Error: > Error: For more information about the errors and possible solutions, please > read the following articles: > Error: [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException > Error: > Error: After correcting the problems, you can resume the build with the > command > Error:mvn -rf :spark-core_2.13 > Error: Process completed with exit code 1. > {code} > https://github.com/HyukjinKwon/spark/actions/runs/7984135094/job/21800463337
[jira] [Resolved] (SPARK-47115) Use larger memory for Maven builds
[ https://issues.apache.org/jira/browse/SPARK-47115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-47115. -- Resolution: Invalid it doesn't help. reverted > Use larger memory for Maven builds > -- > > Key: SPARK-47115 > URL: https://issues.apache.org/jira/browse/SPARK-47115 > Project: Spark > Issue Type: Test > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > {code} > *** RUN ABORTED *** > An exception or error caused a run to abort: unable to create native thread: > possibly out of memory or process/resource limits reached > java.lang.OutOfMemoryError: unable to create native thread: possibly out of > memory or process/resource limits reached > at java.base/java.lang.Thread.start0(Native Method) > at java.base/java.lang.Thread.start(Thread.java:1553) > at java.base/java.lang.System$2.start(System.java:2577) > at > java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:152) > at > java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:953) > at > java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364) > at > org.apache.spark.rpc.netty.SharedMessageLoop.$anonfun$threadpool$1(MessageLoop.scala:128) > at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190) > at > org.apache.spark.rpc.netty.SharedMessageLoop.<init>(MessageLoop.scala:127) > at > org.apache.spark.rpc.netty.Dispatcher.sharedLoop$lzycompute(Dispatcher.scala:46) > ... > Warning: The requested profile "volcano" could not be activated because it > does not exist. > Warning: The requested profile "hive" could not be activated because it does > not exist.
> Error: Failed to execute goal > org.scalatest:scalatest-maven-plugin:2.2.0:test (test) on project > spark-core_2.13: There are test failures -> [Help 1] > Error: > Error: To see the full stack trace of the errors, re-run Maven with the -e > switch. > Error: Re-run Maven using the -X switch to enable full debug logging. > Error: > Error: For more information about the errors and possible solutions, please > read the following articles: > Error: [Help 1] > http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException > Error: > Error: After correcting the problems, you can resume the build with the > command > Error:mvn -rf :spark-core_2.13 > Error: Process completed with exit code 1. > {code} > https://github.com/HyukjinKwon/spark/actions/runs/7984135094/job/21800463337
[jira] [Resolved] (SPARK-47136) Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly
[ https://issues.apache.org/jira/browse/SPARK-47136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47136. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45220 [https://github.com/apache/spark/pull/45220] > Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly > -- > > Key: SPARK-47136 > URL: https://issues.apache.org/jira/browse/SPARK-47136 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-47136) Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly
[ https://issues.apache.org/jira/browse/SPARK-47136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47136: - Assignee: Dongjoon Hyun > Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly > -- > > Key: SPARK-47136 > URL: https://issues.apache.org/jira/browse/SPARK-47136 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available
[jira] [Comment Edited] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819834#comment-17819834 ] Dylan Walker edited comment on SPARK-47134 at 2/22/24 10:32 PM: [~bersprockets] Hmm, it's possible I may have made too many assumptions. I left out that this is on EMR, which does have its own fork of Spark. If this is referring to names that don't exist in the Apache Spark codebase, this may be an Amazon thing. I will reach out to AWS support to confirm, and apologies if this turns out to be the case. Unfortunately, they don't do a great job at documenting the differences. was (Author: JIRAUSER304364): [~bersprockets] Hmm, it's possible I may have made too many assumptions. I left out that this is on EMR, which does have its own fork of Spark. If this is referring to names that don't exist in the Apache Spark codebase, this may be an Amazon thing. I will reach out to AWS support to confirm, and apologies if this turns out to be the case. > Unexpected nulls when casting decimal values in specific cases > -- > > Key: SPARK-47134 > URL: https://issues.apache.org/jira/browse/SPARK-47134 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Dylan Walker >Priority: Major > Attachments: 321queryplan.txt, 341queryplan.txt > > > In specific cases, casting decimal values can result in `null` values where > no overflow exists. 
> The cases appear very specific, and I don't have the depth of knowledge to > generalize this issue, so here is a simple spark-shell reproduction: > *Setup:* > {code:scala} > scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", > x)).toDS > ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] > scala> ds.createOrReplaceTempView("t") > {code} > > *Spark 3.2.1 behaviour (correct):* > {code:scala} > scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct > FROM t GROUP BY `_1` ORDER BY ct ASC").show() > ++ > | ct| > ++ > | 9508.00| > |13879.00| > ++ > {code} > *Spark 3.4.1 / Spark 3.5.0 behaviour:* > {code:scala} > scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct > FROM t GROUP BY `_1` ORDER BY ct ASC").show() > +---+ > | ct| > +---+ > | null| > |9508.00| > +---+ > {code} > This is fairly delicate: > - removing the {{ORDER BY}} clause produces the correct result > - removing the {{CAST}} produces the correct result > - changing the number of 0s in the argument to {{SUM}} produces the correct > result > - setting {{spark.ansi.enabled}} to {{true}} produces the correct result > (and does not throw an error) > Also, removing the {{ORDER BY}}, but writing {{ds}} to a parquet will also > result in the unexpected nulls. > Please let me know if you need additional information. > We are also interested in understanding whether setting > {{spark.ansi.enabled}} can be considered a reliable workaround to this issue > prior to a fix being released, if possible. > Text files that include {{explain()}} output attached.
[jira] [Commented] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819834#comment-17819834 ] Dylan Walker commented on SPARK-47134: -- [~bersprockets] Hmm, it's possible I may have made too many assumptions. I left out that this is on EMR, which does have its own fork of Spark. If this is referring to names that don't exist in the Apache Spark codebase, this may be an Amazon thing. I will reach out to AWS support to confirm, and apologies if this turns out to be the case. > Unexpected nulls when casting decimal values in specific cases > -- > > Key: SPARK-47134 > URL: https://issues.apache.org/jira/browse/SPARK-47134 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Dylan Walker >Priority: Major > Attachments: 321queryplan.txt, 341queryplan.txt > > > In specific cases, casting decimal values can result in `null` values where > no overflow exists. > The cases appear very specific, and I don't have the depth of knowledge to > generalize this issue, so here is a simple spark-shell reproduction: > *Setup:* > {code:scala} > scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", > x)).toDS > ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] > scala> ds.createOrReplaceTempView("t") > {code} > > *Spark 3.2.1 behaviour (correct):* > {code:scala} > scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct > FROM t GROUP BY `_1` ORDER BY ct ASC").show() > ++ > | ct| > ++ > | 9508.00| > |13879.00| > ++ > {code} > *Spark 3.4.1 / Spark 3.5.0 behaviour:* > {code:scala} > scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct > FROM t GROUP BY `_1` ORDER BY ct ASC").show() > +---+ > | ct| > +---+ > | null| > |9508.00| > +---+ > {code} > This is fairly delicate: > - removing the {{ORDER BY}} clause produces the correct result > - removing the {{CAST}} produces the correct result > - changing the number of 0s in the argument 
to {{SUM}} produces the correct > result > - setting {{spark.ansi.enabled}} to {{true}} produces the correct result > (and does not throw an error) > Also, removing the {{ORDER BY}}, but writing {{ds}} to a parquet will also > result in the unexpected nulls. > Please let me know if you need additional information. > We are also interested in understanding whether setting > {{spark.ansi.enabled}} can be considered a reliable workaround to this issue > prior to a fix being released, if possible. > Text files that include {{explain()}} output attached.
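One detail worth pinning down for anyone trying the workaround discussed in this report: in Apache Spark the ANSI switch is the SQL config `spark.sql.ansi.enabled`, e.g.:

```properties
# Can also be toggled at runtime: spark.conf.set("spark.sql.ansi.enabled", "true")
spark.sql.ansi.enabled=true
```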
[jira] [Updated] (SPARK-47137) Add getAll to spark.conf for feature parity with Scala
[ https://issues.apache.org/jira/browse/SPARK-47137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47137: --- Labels: pull-request-available (was: ) > Add getAll to spark.conf for feature parity with Scala > -- > > Key: SPARK-47137 > URL: https://issues.apache.org/jira/browse/SPARK-47137 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47135) Implement error classes for Kafka data loss exceptions
[ https://issues.apache.org/jira/browse/SPARK-47135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47135: --- Labels: pull-request-available (was: ) > Implement error classes for Kafka data loss exceptions > --- > > Key: SPARK-47135 > URL: https://issues.apache.org/jira/browse/SPARK-47135 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: B. Micheal Okutubo >Priority: Major > Labels: pull-request-available > > In the Kafka connector code, we have several code paths that throw the Java > *IllegalStateException* to report data loss while reading from Kafka. We > want to properly classify those exceptions using the new error framework. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-46762) Spark Connect 3.5 Classloading issue with external jar
[ https://issues.apache.org/jira/browse/SPARK-46762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819830#comment-17819830 ] nirav patel edited comment on SPARK-46762 at 2/22/24 10:13 PM: --- I did some more digging into executor classloading and the heap dump. Here's what I found: with Spark 3.4 and iceberg-spark-runtime-3.4_2.12-1.3.1.jar (the case where the issue is not reproducible, i.e. everything works) I see only one instance of `org.apache.iceberg.Table` loaded; however, with Spark 3.5 and iceberg-spark-runtime-3.5_2.12-1.4.3.jar I see two instances of `org.apache.iceberg.Table` loaded. Here's stdout from the executor on which I applied `-verbose:class`: {code:java} [47.556s][info][class,load ] org.apache.iceberg.Table source: file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1708632053092_0004/container_1708632053092_0004_01_01/org.apache.iceberg_iceberg-spark-runtime-3.5_2.12-1.4.3.jar [45.415s][info][class,load] org.apache.iceberg.Table source: file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1708632053092_0004/container_1708632053092_0004_01_01/org.apache.iceberg_iceberg-spark-runtime-3.5_2.12-1.4.3.jar {code} I also confirmed the above via heap dump; see the attached screenshots. The same class `org.apache.iceberg.Table` is loaded twice: once by ChildFirstURLClassLoader and once by MutableURLClassLoader. was (Author: tenstriker): I did some more digging into executor classloading and heap dump. 
Here's what I found: with spark 3.4 and iceberg-spark-runtime-3.4_2.12-1.3.1.jar (Case where issue is not reproducible, ie everything works) I only see one instance of `org.apache.iceberg.Table` loaded however with spark 3.5 and iceberg-spark-runtime-3.5_2.12-1.4.3.jar I see two instances of `org.apache.iceberg.Table` loaded: here's stdout from executor on which I applied `verbose:class` : {code:java} [47.556s][info][class,load ] org.apache.iceberg.Table source: file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1708632053092_0004/container_1708632053092_0004_01_01/org.apache.iceberg_iceberg-spark-runtime-3.5_2.12-1.4.3.jar [45.415s][info][class,load] org.apache.iceberg.Table source: file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1708632053092_0004/container_1708632053092_0004_01_01/org.apache.iceberg_iceberg-spark-runtime-3.5_2.12-1.4.3.jar {code} I also confirmed above via heap dump. see attached screenshots. Same class is loaded twice> once by ChildFirstUrlClassLoader and once by MutableURLClassLoader > Spark Connect 3.5 Classloading issue with external jar > -- > > Key: SPARK-46762 > URL: https://issues.apache.org/jira/browse/SPARK-46762 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: nirav patel >Priority: Major > Attachments: Screenshot 2024-02-22 at 2.04.37 PM.png, Screenshot > 2024-02-22 at 2.04.49 PM.png > > > We are having following `java.lang.ClassCastException` error in spark > Executors when using spark-connect 3.5 with external spark sql catalog jar - > iceberg-spark-runtime-3.5_2.12-1.4.3.jar > We also set "spark.executor.userClassPathFirst=true" otherwise child class > gets loaded by MutableClassLoader and parent class gets loaded by > ChildFirstCLassLoader and that causes ClassCastException as well. 
> > {code:java} > pyspark.errors.exceptions.connect.SparkConnectGrpcException: > (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in > stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 3) (spark35-m.c.mycomp-dev-test.internal executor 2): > java.lang.ClassCastException: class > org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to > class org.apache.iceberg.Table > (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed > module of loader org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053; > org.apache.iceberg.Table is in unnamed module of loader > org.apache.spark.util.ChildFirstURLClassLoader @4b18b943) > at > org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88) > at > org.apache.iceberg.spark.source.RowDataReader.(RowDataReader.java:50) > at > org.apache.iceberg.spark.source.SparkRowReaderFactory.createReader(SparkRowReaderFactory.java:45) > at > org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84) > at > org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) > at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) >
[jira] [Updated] (SPARK-46762) Spark Connect 3.5 Classloading issue with external jar
[ https://issues.apache.org/jira/browse/SPARK-46762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] nirav patel updated SPARK-46762: Attachment: Screenshot 2024-02-22 at 2.04.49 PM.png Screenshot 2024-02-22 at 2.04.37 PM.png > Spark Connect 3.5 Classloading issue with external jar > -- > > Key: SPARK-46762 > URL: https://issues.apache.org/jira/browse/SPARK-46762 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: nirav patel >Priority: Major > Attachments: Screenshot 2024-02-22 at 2.04.37 PM.png, Screenshot > 2024-02-22 at 2.04.49 PM.png > > > We are having following `java.lang.ClassCastException` error in spark > Executors when using spark-connect 3.5 with external spark sql catalog jar - > iceberg-spark-runtime-3.5_2.12-1.4.3.jar > We also set "spark.executor.userClassPathFirst=true" otherwise child class > gets loaded by MutableClassLoader and parent class gets loaded by > ChildFirstCLassLoader and that causes ClassCastException as well. 
> > {code:java} > pyspark.errors.exceptions.connect.SparkConnectGrpcException: > (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in > stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 3) (spark35-m.c.mycomp-dev-test.internal executor 2): > java.lang.ClassCastException: class > org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to > class org.apache.iceberg.Table > (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed > module of loader org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053; > org.apache.iceberg.Table is in unnamed module of loader > org.apache.spark.util.ChildFirstURLClassLoader @4b18b943) > at > org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88) > at > org.apache.iceberg.spark.source.RowDataReader.(RowDataReader.java:50) > at > org.apache.iceberg.spark.source.SparkRowReaderFactory.createReader(SparkRowReaderFactory.java:45) > at > org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84) > at > org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) > at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) > at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) > at > 
org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) > at org.apache.spark.scheduler.Task.run(Task.scala:141) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620) > at org.apach...{code} > > `org.apache.iceberg.spark.source.SerializableTableWithSize` is a child of > `org.apache.iceberg.Table` and they are both in only one jar > `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` > We verified that there's only one jar of > `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` loaded when spark-connect server > is started. > Looking more into Error it seems classloader itself is instantiated multiple > times somewhere. I can see two instances: > org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053 and > org.apache.spark.util.ChildFirstURLClassLoader @4b18b943 > > *Affected version:* > spark 3.5 and spark-connect_2.12:3.5.0 works fine > > *Not affected version and variation:* > Spark 3.4 and spark-connect_2.12:3.4.0 works fine with external jar > Also works with just Spark 3.5 spark-submit script directly (ie without using > spark-connect 3.5 ) > > Issue has been open with Iceberg as well: > [https://github.com/apache/iceberg/issues/8978] > And been discussed in dev@org.apache.iceberg: > [https://lists.apache.org/thread/5q1pdqqrd1h06hgs8vx9ztt60z5yv8n1] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
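The failure mode described above (the same class defined independently by two class loaders, so a cast between the two definitions fails) has a close analogy in Python: loading one source file under two module names yields two distinct class objects that fail each other's isinstance checks. This is a sketch of the general mechanism only, not of Spark's actual ChildFirstURLClassLoader/MutableURLClassLoader interaction; all names below are hypothetical stand-ins:

```python
import importlib.util
import os
import tempfile

# Write a tiny "library" once, then load it through two independent
# loaders -- standing in for one jar visible to two JVM class loaders.
path = os.path.join(tempfile.mkdtemp(), "iceberg_stub.py")
with open(path, "w") as f:
    f.write("class Table:\n    pass\n")

def load(name):
    # Each call builds a fresh module from the same file on disk.
    spec = importlib.util.spec_from_file_location(name, path)
    mod = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(mod)
    return mod

loader_a = load("stub_a")  # plays the role of ChildFirstURLClassLoader
loader_b = load("stub_b")  # plays the role of MutableURLClassLoader

obj = loader_a.Table()
print(isinstance(obj, loader_a.Table))  # True
print(isinstance(obj, loader_b.Table))  # False: same source, distinct class objects
```

In the JVM the equivalent identity test is (class name, defining loader), which is why a `SerializableTableWithSize` defined by one ChildFirstURLClassLoader instance cannot be cast to a `Table` defined by another.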
[jira] [Commented] (SPARK-46762) Spark Connect 3.5 Classloading issue with external jar
[ https://issues.apache.org/jira/browse/SPARK-46762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819830#comment-17819830 ] nirav patel commented on SPARK-46762: - I did some more digging into executor classloading and the heap dump. Here's what I found: with Spark 3.4 and iceberg-spark-runtime-3.4_2.12-1.3.1.jar (the case where the issue is not reproducible, i.e. everything works) I see only one instance of `org.apache.iceberg.Table` loaded; however, with Spark 3.5 and iceberg-spark-runtime-3.5_2.12-1.4.3.jar I see two instances of `org.apache.iceberg.Table` loaded. Here's stdout from the executor on which I applied `-verbose:class`: {code:java} [47.556s][info][class,load ] org.apache.iceberg.Table source: file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1708632053092_0004/container_1708632053092_0004_01_01/org.apache.iceberg_iceberg-spark-runtime-3.5_2.12-1.4.3.jar [45.415s][info][class,load] org.apache.iceberg.Table source: file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1708632053092_0004/container_1708632053092_0004_01_01/org.apache.iceberg_iceberg-spark-runtime-3.5_2.12-1.4.3.jar {code} I also confirmed the above via heap dump; see the attached screenshots. The same class is loaded twice: once by ChildFirstURLClassLoader and once by MutableURLClassLoader !Screenshot 2024-02-22 at 2.04.49 PM.png! 
> Spark Connect 3.5 Classloading issue with external jar > -- > > Key: SPARK-46762 > URL: https://issues.apache.org/jira/browse/SPARK-46762 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: nirav patel >Priority: Major > > We are having following `java.lang.ClassCastException` error in spark > Executors when using spark-connect 3.5 with external spark sql catalog jar - > iceberg-spark-runtime-3.5_2.12-1.4.3.jar > We also set "spark.executor.userClassPathFirst=true" otherwise child class > gets loaded by MutableClassLoader and parent class gets loaded by > ChildFirstCLassLoader and that causes ClassCastException as well. > > {code:java} > pyspark.errors.exceptions.connect.SparkConnectGrpcException: > (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in > stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 > (TID 3) (spark35-m.c.mycomp-dev-test.internal executor 2): > java.lang.ClassCastException: class > org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to > class org.apache.iceberg.Table > (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed > module of loader org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053; > org.apache.iceberg.Table is in unnamed module of loader > org.apache.spark.util.ChildFirstURLClassLoader @4b18b943) > at > org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88) > at > org.apache.iceberg.spark.source.RowDataReader.(RowDataReader.java:50) > at > org.apache.iceberg.spark.source.SparkRowReaderFactory.createReader(SparkRowReaderFactory.java:45) > at > org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84) > at > org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63) > at > org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) > at 
scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) > at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460) > at > org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890) > at > org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890) > at > org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52) > at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364) > at org.apache.spark.rdd.RDD.iterator(RDD.scala:328) > at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93) > at > org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161) > at org.apache.spark.scheduler.Task.run(Task.scala:141) > at > org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620) > at org.apach...{code} > > `org.apache.iceberg.spark.source.SerializableTableWithSize` is a child of > `org.apache.iceberg.Table` and they are both in only one jar > `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` > We verified that there's only one jar of > `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` loaded when spark-connect server > is started. > Looking more into Error it seems classloader itself is instantiated multiple > times
[jira] [Updated] (SPARK-47137) Add getAll to spark.conf for feature parity with Scala
[ https://issues.apache.org/jira/browse/SPARK-47137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-47137: -- Summary: Add getAll to spark.conf for feature parity with Scala (was: Add getAll for spark.conf for feature parity with Scala) > Add getAll to spark.conf for feature parity with Scala > -- > > Key: SPARK-47137 > URL: https://issues.apache.org/jira/browse/SPARK-47137 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47137) Add getAll for spark.conf for feature parity with Scala
[ https://issues.apache.org/jira/browse/SPARK-47137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin updated SPARK-47137: -- Summary: Add getAll for spark.conf for feature parity with Scala (was: Add getAll for pyspark.sql.conf for feature parity with Scala) > Add getAll for spark.conf for feature parity with Scala > --- > > Key: SPARK-47137 > URL: https://issues.apache.org/jira/browse/SPARK-47137 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47137) Add getAll for pyspark.sql.conf for feature parity with Scala
Takuya Ueshin created SPARK-47137: - Summary: Add getAll for pyspark.sql.conf for feature parity with Scala Key: SPARK-47137 URL: https://issues.apache.org/jira/browse/SPARK-47137 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47136) Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly
[ https://issues.apache.org/jira/browse/SPARK-47136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47136: -- Summary: Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly (was: Use `ivyPath` param of `MavenUtils.loadIvySettings` in `MavenUtilsSuite`) > Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly > -- > > Key: SPARK-47136 > URL: https://issues.apache.org/jira/browse/SPARK-47136 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47136) Use `ivyPath` param of `MavenUtils.loadIvySettings` in `MavenUtilsSuite`
[ https://issues.apache.org/jira/browse/SPARK-47136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47136: -- Component/s: Tests > Use `ivyPath` param of `MavenUtils.loadIvySettings` in `MavenUtilsSuite` > > > Key: SPARK-47136 > URL: https://issues.apache.org/jira/browse/SPARK-47136 > Project: Spark > Issue Type: Sub-task > Components: Spark Core, Tests >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47136) Use `ivyPath` param of `MavenUtils.loadIvySettings` in `MavenUtilsSuite`
[ https://issues.apache.org/jira/browse/SPARK-47136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47136: --- Labels: pull-request-available (was: ) > Use `ivyPath` param of `MavenUtils.loadIvySettings` in `MavenUtilsSuite` > > > Key: SPARK-47136 > URL: https://issues.apache.org/jira/browse/SPARK-47136 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47136) Use `ivyPath` param of `MavenUtils.loadIvySettings` in `MavenUtilsSuite`
[ https://issues.apache.org/jira/browse/SPARK-47136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47136: -- Summary: Use `ivyPath` param of `MavenUtils.loadIvySettings` in `MavenUtilsSuite` (was: Use `ivyPath` parameter of `MavenUtils.loadIvySettings` in `MavenUtilsSuite`) > Use `ivyPath` param of `MavenUtils.loadIvySettings` in `MavenUtilsSuite` > > > Key: SPARK-47136 > URL: https://issues.apache.org/jira/browse/SPARK-47136 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47136) Use `ivyPath` parameter of `MavenUtils.loadIvySettings` in `MavenUtilsSuite`
Dongjoon Hyun created SPARK-47136: - Summary: Use `ivyPath` parameter of `MavenUtils.loadIvySettings` in `MavenUtilsSuite` Key: SPARK-47136 URL: https://issues.apache.org/jira/browse/SPARK-47136 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47069) Introduce `spark.profile.show/dump` for SparkSession-based profiling
[ https://issues.apache.org/jira/browse/SPARK-47069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-47069. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45129 [https://github.com/apache/spark/pull/45129] > Introduce `spark.profile.show/dump` for SparkSession-based profiling > > > Key: SPARK-47069 > URL: https://issues.apache.org/jira/browse/SPARK-47069 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Xinrong Meng >Assignee: Xinrong Meng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Introduce `spark.profile.show/dump` for SparkSession-based profiling -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819789#comment-17819789 ] Bruce Robbins commented on SPARK-47134: --- Oddly, I cannot reproduce on either 3.4.1 or 3.5.0. Also, my 3.4.1 plan doesn't look like your 3.4.1 plan: My plan uses {{sum}}, your plan uses {{decimalsum}}. I can't find where {{decimalsum}} comes from in the code base, but maybe I am not looking hard enough. {noformat} scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] scala> ds.createOrReplaceTempView("t") scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() ++ | ct| ++ | 9508.00| |13879.00| ++ scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").explain == Physical Plan == AdaptiveSparkPlan isFinalPlan=false +- Sort [ct#19 ASC NULLS FIRST], true, 0 +- Exchange rangepartitioning(ct#19 ASC NULLS FIRST, 200), ENSURE_REQUIREMENTS, [plan_id=68] +- HashAggregate(keys=[_1#2], functions=[sum(1.00)]) +- Exchange hashpartitioning(_1#2, 200), ENSURE_REQUIREMENTS, [plan_id=65] +- HashAggregate(keys=[_1#2], functions=[partial_sum(1.00)]) +- LocalTableScan [_1#2] scala> sql("select version()").show(false) +--+ |version() | +--+ |3.4.1 6b1ff22dde1ead51cbf370be6e48a802daae58b6| +--+ scala> {noformat} > Unexpected nulls when casting decimal values in specific cases > -- > > Key: SPARK-47134 > URL: https://issues.apache.org/jira/browse/SPARK-47134 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Dylan Walker >Priority: Major > Attachments: 321queryplan.txt, 341queryplan.txt > > > In specific cases, casting decimal values can result in `null` values where > no overflow exists. 
> The cases appear very specific, and I don't have the depth of knowledge to > generalize this issue, so here is a simple spark-shell reproduction: > *Setup:* > {code:scala} > scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", > x)).toDS > ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] > scala> ds.createOrReplaceTempView("t") > {code} > > *Spark 3.2.1 behaviour (correct):* > {code:scala} > scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct > FROM t GROUP BY `_1` ORDER BY ct ASC").show() > ++ > | ct| > ++ > | 9508.00| > |13879.00| > ++ > {code} > *Spark 3.4.1 / Spark 3.5.0 behaviour:* > {code:scala} > scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct > FROM t GROUP BY `_1` ORDER BY ct ASC").show() > +---+ > | ct| > +---+ > | null| > |9508.00| > +---+ > {code} > This is fairly delicate: > - removing the {{ORDER BY}} clause produces the correct result > - removing the {{CAST}} produces the correct result > - changing the number of 0s in the argument to {{SUM}} produces the correct > result > - setting {{spark.ansi.enabled}} to {{true}} produces the correct result > (and does not throw an error) > Also, removing the {{ORDER BY}}, but writing {{ds}} to a parquet will also > result in the unexpected nulls. > Please let me know if you need additional information. > We are also interested in understanding whether setting > {{spark.ansi.enabled}} can be considered a reliable workaround to this issue > prior to a fix being released, if possible. > Text files that include {{explain()}} output attached. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819779#comment-17819779 ] Xinrong Meng edited comment on SPARK-47132 at 2/22/24 7:21 PM: --- [~wunderalbert] would you double check if you set up your Jira account correctly? I somehow couldn't assign the ticket to you. !image-2024-02-22-11-21-30-460.png! was (Author: xinrongm): [~wunderalbert] would you double check if you set up your Jira account correctly? I somehow couldn't assign the ticket to you. > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Attachments: image-2024-02-22-11-18-02-429.png, > image-2024-02-22-11-21-30-460.png > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819779#comment-17819779 ] Xinrong Meng commented on SPARK-47132: -- [~wunderalbert] would you double check if you set up your Jira account correctly? I somehow couldn't assign the ticket to you. > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Attachments: image-2024-02-22-11-18-02-429.png > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819780#comment-17819780 ] Xinrong Meng commented on SPARK-47132: -- Resolved by https://github.com/apache/spark/pull/45197. > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Attachments: image-2024-02-22-11-18-02-429.png > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-47132: - Attachment: image-2024-02-22-11-18-02-429.png > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Attachments: image-2024-02-22-11-18-02-429.png > > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-47132: - Issue Type: Documentation (was: Bug) > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 3.5.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Xinrong Meng updated SPARK-47132: - Affects Version/s: 4.0.0 (was: 3.5.0) > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
[ https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819777#comment-17819777 ] Xinrong Meng commented on SPARK-47132: -- I modified the ticket to Documentation (from Bug) and 4.0.0 (from 3.5.0). > Mistake in Docstring for Pyspark's Dataframe.head() > --- > > Key: SPARK-47132 > URL: https://issues.apache.org/jira/browse/SPARK-47132 > Project: Spark > Issue Type: Documentation > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Albert Ziegler >Priority: Trivial > Labels: pull-request-available > Original Estimate: 0.5h > Remaining Estimate: 0.5h > > The docstring claims that {{head(n)}} would return a {{Row}} (rather than a > list of rows) iff n == 1, but that's incorrect. > Type hints, example, and implementation show that the difference between row > or list of rows lies in whether n is supplied at all -- if it isn't, > {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} > returns a list. > > A suggestion to fix is here: https://github.com/apache/spark/pull/45197 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
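The `head()` contract described in SPARK-47132 above (no argument returns a single `Row`; an explicit `n`, even `n == 1`, returns a list) can be sketched with a plain-Python stand-in. This is a hypothetical `ToyFrame` class illustrating the documented semantics, not PySpark's actual implementation:

```python
from typing import List, Optional, Union

Row = tuple  # stand-in for pyspark.sql.Row

class ToyFrame:
    """Toy stand-in mimicking the head() contract described in the ticket."""

    def __init__(self, rows: List[Row]) -> None:
        self._rows = rows

    def head(self, n: Optional[int] = None) -> Union[Row, List[Row], None]:
        # No argument: return a single Row (or None when empty),
        # mirroring the behavior the ticket says the docstring should state.
        if n is None:
            return self._rows[0] if self._rows else None
        # Any explicit n -- even n == 1 -- returns a list.
        return self._rows[:n]

df = ToyFrame([("a", 1), ("b", 2)])
print(df.head())   # ('a', 1)   -- a single Row
print(df.head(1))  # [('a', 1)] -- a list, even though n == 1
```

The distinction the corrected docstring draws is thus whether `n` is supplied at all, not whether it equals 1.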
[jira] [Commented] (SPARK-47135) Implement error classes for Kafka data loss exceptions
[ https://issues.apache.org/jira/browse/SPARK-47135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819771#comment-17819771 ] B. Micheal Okutubo commented on SPARK-47135: I'm working on this. Will send PR soon. > Implement error classes for Kafka data loss exceptions > --- > > Key: SPARK-47135 > URL: https://issues.apache.org/jira/browse/SPARK-47135 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: B. Micheal Okutubo >Priority: Major > > In the Kafka connector code, we have several code paths that throw the Java > *IllegalStateException* to report data loss while reading from Kafka. We > want to properly classify those exceptions using the new error framework. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47135) Implement error classes for Kafka data loss exceptions
B. Micheal Okutubo created SPARK-47135: -- Summary: Implement error classes for Kafka data loss exceptions Key: SPARK-47135 URL: https://issues.apache.org/jira/browse/SPARK-47135 Project: Spark Issue Type: Task Components: Structured Streaming Affects Versions: 4.0.0 Reporter: B. Micheal Okutubo In the Kafka connector code, we have several code paths that throw the Java *IllegalStateException* to report data loss while reading from Kafka. We want to properly classify those exceptions using the new error framework. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
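The refactor SPARK-47135 describes (replacing bare `IllegalStateException`s with exceptions carrying a structured error class and message parameters) can be sketched as follows. Both the exception type and the error-class name here are hypothetical illustrations of the pattern, not Spark's actual classes:

```python
class KafkaDataLossError(Exception):
    """Hypothetical classified exception in the spirit of Spark's
    error framework: carries an error class plus message parameters
    instead of a free-form message."""

    def __init__(self, error_class: str, message_params: dict) -> None:
        self.error_class = error_class
        self.message_params = message_params
        super().__init__(f"[{error_class}] {message_params}")

def check_offset(expected: int, actual: int) -> None:
    # Instead of `raise IllegalStateException("lost data...")`, raise a
    # classified error so callers and tests can match on the error class.
    if actual < expected:
        raise KafkaDataLossError(
            "KAFKA_DATA_LOSS.OFFSET_OUT_OF_RANGE",  # hypothetical class name
            {"expected": str(expected), "actual": str(actual)},
        )
```

The payoff of the pattern is that data-loss failures become machine-matchable by error class rather than by parsing exception messages.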
[jira] [Updated] (SPARK-47001) Pushdown Verification in Optimizer.scala should support changed data types
[ https://issues.apache.org/jira/browse/SPARK-47001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47001: --- Labels: pull-request-available (was: ) > Pushdown Verification in Optimizer.scala should support changed data types > -- > > Key: SPARK-47001 > URL: https://issues.apache.org/jira/browse/SPARK-47001 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Holden Karau >Assignee: Holden Karau >Priority: Major > Labels: pull-request-available > > When pushing a filter down in a union the data type may not match exactly if > the filter was constructed using the child dataframe reference. This is > because the unions output is updated with a structype merge of union which > can turn non-nullable to nullable. These are still the same column despite > the different nullability so the filter should be safe to push down. As it > currently stands we get an exception. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
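The nullability mismatch SPARK-47001 describes (a union's merged schema can widen `nullable = false` to `nullable = true`, so a filter built against a child's attribute no longer matches the union output exactly) suggests a type check that ignores nullability. A minimal sketch with hypothetical field and helper names, not Spark's `Optimizer.scala` code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StructField:
    name: str
    dtype: str
    nullable: bool

def same_type_ignoring_nullability(a: StructField, b: StructField) -> bool:
    # The union's merged schema may turn nullable=False into nullable=True,
    # but it is still the same column, so a pushdown-safety check should
    # compare only name and data type, not nullability.
    return a.name == b.name and a.dtype == b.dtype

child = StructField("id", "int", nullable=False)
merged = StructField("id", "int", nullable=True)  # after union schema merge
assert same_type_ignoring_nullability(child, merged)
```

Under this check the filter in the ticket would be pushed down instead of raising an exception, while a genuinely different type (say `int` vs `long`) would still be rejected.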
[jira] [Updated] (SPARK-47133) java.lang.NullPointerException: Missing SslContextFactory when accessing Worker WebUI from Master as reverse proxy with SSL enabled
[ https://issues.apache.org/jira/browse/SPARK-47133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Filippo Monari updated SPARK-47133: --- Environment: * We are running Spark in stand-alone mode, on Kubernetes. * The containers are based on Debian 11 (minideb) * The Spark version is 3.5 was: * We are running Spark in stand-alone mode, on Kubernetes. * The containers are based on Debian 11 (minideb) * The Spark version is 3.5 Please do not hesitate to ask further information if needed. > java.lang.NullPointerException: Missing SslContextFactory when accessing > Worker WebUI from Master as reverse proxy with SSL enabled > --- > > Key: SPARK-47133 > URL: https://issues.apache.org/jira/browse/SPARK-47133 > Project: Spark > Issue Type: Question > Components: Web UI >Affects Versions: 3.5.0 > Environment: * We are running Spark in stand-alone mode, on > Kubernetes. > * The containers are based on Debian 11 (minideb) > * The Spark version is 3.5 >Reporter: Filippo Monari >Priority: Major > > Hi, > > We are encountering the error described here below. > If SSL/TLS is enabled on both, Master and Worker, it is not possible to > access the WebUI of the latter from the former configured as reverse proxy. > The returned error is the the following. 
> {code:java} > HTTP ERROR 500 java.lang.NullPointerException: Missing SslContextFactory > URI:/proxy/worker-20240222171308-10.113.3.1-34959 > STATUS:500 > MESSAGE:java.lang.NullPointerException: Missing SslContextFactory > SERVLET:org.apache.spark.ui.JettyUtils$$anon$3-7d068d54 > CAUSED BY:java.lang.NullPointerException: Missing SslContextFactory > Caused by:java.lang.NullPointerException: Missing SslContextFactory > at java.base/java.util.Objects.requireNonNull(Objects.java:235) > at > org.sparkproject.jetty.io.ssl.SslClientConnectionFactory.(SslClientConnectionFactory.java:57) > at > org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1273) > at > org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1279) > at > org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:209) > at > org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:215) > at > org.sparkproject.jetty.client.HttpDestination.(HttpDestination.java:100) > at > org.sparkproject.jetty.client.PoolingHttpDestination.(PoolingHttpDestination.java:25) > at > org.sparkproject.jetty.client.http.HttpDestinationOverHTTP.(HttpDestinationOverHTTP.java:32) > at > org.sparkproject.jetty.client.http.HttpClientTransportOverHTTP.newHttpDestination(HttpClientTransportOverHTTP.java:54) > at > org.sparkproject.jetty.client.HttpClient.lambda$resolveDestination$0(HttpClient.java:597) > at > java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916) > at > org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:593) > at > org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:571) > at org.sparkproject.jetty.client.HttpClient.send(HttpClient.java:626) > at > org.sparkproject.jetty.client.HttpRequest.sendAsync(HttpRequest.java:780) > at org.sparkproject.jetty.client.HttpRequest.send(HttpRequest.java:767) > 
at > org.sparkproject.jetty.proxy.AbstractProxyServlet.sendProxyRequest(AbstractProxyServlet.java:618) > at > org.sparkproject.jetty.proxy.ProxyServlet.service(ProxyServlet.java:114) > at javax.servlet.http.HttpServlet.service(HttpServlet.java:590) > at > org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) > at > org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656) > at > org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) > at > org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) > at > org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) > at > org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552) > at > org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) > at > org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) > at > org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) > at > org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505) > at >
[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Walker updated SPARK-47134: - Attachment: 321queryplan.txt > Unexpected nulls when casting decimal values in specific cases > -- > > Key: SPARK-47134 > URL: https://issues.apache.org/jira/browse/SPARK-47134 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Dylan Walker >Priority: Major > Attachments: 321queryplan.txt, 341queryplan.txt > > > In specific cases, casting decimal values can result in `null` values where > no overflow exists. > The cases appear very specific, and I don't have the depth of knowledge to > generalize this issue, so here is a simple spark-shell reproduction: > *Setup:* > {code:scala} > scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", > x)).toDS > ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] > scala> ds.createOrReplaceTempView("t") > {code} > > *Spark 3.2.1 behaviour (correct):* > {code:scala} > scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct > FROM t GROUP BY `_1` ORDER BY ct ASC").show() > ++ > | ct| > ++ > | 9508.00| > |13879.00| > ++ > {code} > *Spark 3.4.1 / Spark 3.5.0 behaviour:* > {code:scala} > scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct > FROM t GROUP BY `_1` ORDER BY ct ASC").show() > +---+ > | ct| > +---+ > | null| > |9508.00| > +---+ > {code} > This is fairly delicate: > - removing the {{ORDER BY}} clause produces the correct result > - removing the {{CAST}} produces the correct result > - changing the number of 0s in the argument to {{SUM}} produces the correct > result > - setting {{spark.ansi.enabled}} to {{true}} produces the correct result > (and does not throw an error) > Also, removing the {{ORDER BY}}, but writing {{ds}} to a parquet will also > result in the unexpected nulls. > Please let me know if you need additional information. 
> We are also interested in understanding whether setting > {{spark.ansi.enabled}} can be considered a reliable workaround to this issue > prior to a fix being released, if possible. > Text files that include {{explain()}} output attached. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Walker updated SPARK-47134: - Attachment: 341queryplan.txt > Unexpected nulls when casting decimal values in specific cases > -- > > Key: SPARK-47134 > URL: https://issues.apache.org/jira/browse/SPARK-47134 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Dylan Walker >Priority: Major > Attachments: 321queryplan.txt, 341queryplan.txt > > > In specific cases, casting decimal values can result in `null` values where > no overflow exists. > The cases appear very specific, and I don't have the depth of knowledge to > generalize this issue, so here is a simple spark-shell reproduction: > *Setup:* > {code:scala} > scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", > x)).toDS > ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] > scala> ds.createOrReplaceTempView("t") > {code} > > *Spark 3.2.1 behaviour (correct):* > {code:scala} > scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct > FROM t GROUP BY `_1` ORDER BY ct ASC").show() > ++ > | ct| > ++ > | 9508.00| > |13879.00| > ++ > {code} > *Spark 3.4.1 / Spark 3.5.0 behaviour:* > {code:scala} > scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct > FROM t GROUP BY `_1` ORDER BY ct ASC").show() > +---+ > | ct| > +---+ > | null| > |9508.00| > +---+ > {code} > This is fairly delicate: > - removing the {{ORDER BY}} clause produces the correct result > - removing the {{CAST}} produces the correct result > - changing the number of 0s in the argument to {{SUM}} produces the correct > result > - setting {{spark.ansi.enabled}} to {{true}} produces the correct result > (and does not throw an error) > Also, removing the {{ORDER BY}}, but writing {{ds}} to a parquet will also > result in the unexpected nulls. > Please let me know if you need additional information. 
> We are also interested in understanding whether setting > {{spark.ansi.enabled}} can be considered a reliable workaround to this issue > prior to a fix being released, if possible. > Text files that include {{explain()}} output attached. > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Walker updated SPARK-47134: - Description: In specific cases, casting decimal values can result in `null` values where no overflow exists. The cases appear very specific, and I don't have the depth of knowledge to generalize this issue, so here is a simple spark-shell reproduction: *Setup:* {code:scala} scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] scala> ds.createOrReplaceTempView("t") {code} *Spark 3.2.1 behaviour (correct):* {code:scala} scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() ++ | ct| ++ | 9508.00| |13879.00| ++ {code} *Spark 3.4.1 / Spark 3.5.0 behaviour:* {code:scala} scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() +---+ | ct| +---+ | null| |9508.00| +---+ {code} This is fairly delicate: - removing the {{ORDER BY}} clause produces the correct result - removing the {{CAST}} produces the correct result - changing the number of 0s in the argument to {{SUM}} produces the correct result - setting {{spark.ansi.enabled}} to {{true}} produces the correct result (and does not throw an error) Also, removing the {{ORDER BY}}, but writing {{ds}} to a parquet will also result in the unexpected nulls. Please let me know if you need additional information. We are also interested in understanding whether setting {{spark.ansi.enabled}} can be considered a reliable workaround to this issue prior to a fix being released, if possible. was: In specific cases, casting decimal values can result in `null` values where no overflow exists. 
The cases appear very specific, and I don't have the depth of knowledge to generalize this issue, so here is a simple spark-shell reproduction: *Setup:* {code:scala} scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] scala> ds.createOrReplaceTempView("t") {code} *Spark 3.2.1 behaviour (correct):* {code:scala} scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() ++ | ct| ++ | 9508.00| |13879.00| ++ {code} *Spark 3.4.1 / Spark 3.5.0 behaviour:* {code:scala} scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() +---+ | ct| +---+ | null| |9508.00| +---+ {code} This is fairly delicate: - removing the `ORDER BY` clause produces the correct result - removing the `CAST` produces the correct result - changing the number of 0s in the argument to `SUM` produces the correct result - setting `spark.ansi.enabled` to `true` produces the correct result (and does not throw an error) Also, removing the `ORDER BY`, but writing `ds` to a parquet will also result in the unexpected nulls. Please let me know if you need additional information. We are also interested in understanding whether setting `spark.ansi.enabled` can be considered a reliable workaround to this issue prior to a fix being released, if possible. > Unexpected nulls when casting decimal values in specific cases > -- > > Key: SPARK-47134 > URL: https://issues.apache.org/jira/browse/SPARK-47134 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Dylan Walker >Priority: Major > > In specific cases, casting decimal values can result in `null` values where > no overflow exists. 
> The cases appear very specific, and I don't have the depth of knowledge to > generalize this issue, so here is a simple spark-shell reproduction: > *Setup:* > {code:scala} > scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", > x)).toDS > ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] > scala> ds.createOrReplaceTempView("t") > {code} > > *Spark 3.2.1 behaviour (correct):* > {code:scala} > scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct > FROM t GROUP BY `_1` ORDER BY ct ASC").show() > ++ > | ct| > ++ > | 9508.00| > |13879.00|
[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Walker updated SPARK-47134: - Description: In specific cases, casting decimal values can result in `null` values where no overflow exists. The cases appear very specific, and I don't have the depth of knowledge to generalize this issue, so here is a simple spark-shell reproduction: *Setup:* {code:scala} scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] scala> ds.createOrReplaceTempView("t") {code} *Spark 3.2.1 behaviour (correct):* {code:scala} scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() ++ | ct| ++ | 9508.00| |13879.00| ++ {code} *Spark 3.4.1 / Spark 3.5.0 behaviour:* {code:scala} scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() +---+ | ct| +---+ | null| |9508.00| +---+ {code} This is fairly delicate: - removing the `ORDER BY` clause produces the correct result - removing the `CAST` produces the correct result - changing the number of 0s in the argument to `SUM` produces the correct result - setting `spark.ansi.enabled` to `true` produces the correct result (and does not throw an error) Also, removing the `ORDER BY`, but writing `ds` to a parquet will also result in the unexpected nulls. Please let me know if you need additional information. We are also interested in understanding whether setting `spark.ansi.enabled` can be considered a reliable workaround to this issue prior to a fix being released, if possible. was: In specific cases, casting decimal values can result in `null` values where no overflow exists. 
The cases appear very specific, and I don't have the depth of knowledge to generalize this issue, so here is a simple spark-shell reproduction: *Setup:* {code:scala} scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] scala> ds.createOrReplaceTempView("t") {code} *Spark 3.2.1 behaviour (correct):* {code:scala} scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() ++ |ct| ++ |9508.00| |13879.00| ++ {code} *Spark 3.4.1 / Spark 3.5.0 behaviour:* {code:scala} scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() +---+ |ct| +---+ |null| |9508.00| +---+ {code} This is fairly delicate: - removing the `ORDER BY` clause produces the correct result - removing the `CAST` produces the correct result - changing the number of 0s in the argument to `SUM` produces the correct result - setting `spark.ansi.enabled` to `true` produces the correct result (and does not throw an error) Also, removing the `ORDER BY`, but writing `ds` to a parquet will also result in the unexpected nulls. Please let me know if you need additional information. We are also interested in understanding whether setting `spark.ansi.enabled` can be considered a reliable workaround to this issue prior to a fix being released, if possible. > Unexpected nulls when casting decimal values in specific cases > -- > > Key: SPARK-47134 > URL: https://issues.apache.org/jira/browse/SPARK-47134 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Dylan Walker >Priority: Major > > In specific cases, casting decimal values can result in `null` values where > no overflow exists. 
> The cases appear very specific, and I don't have the depth of knowledge to > generalize this issue, so here is a simple spark-shell reproduction: > *Setup:* > {code:scala} > scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", > x)).toDS > ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] > scala> ds.createOrReplaceTempView("t") > {code} > > *Spark 3.2.1 behaviour (correct):* > {code:scala} > scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct > FROM t GROUP BY `_1` ORDER BY ct ASC").show() > ++ > | ct| > ++ > | 9508.00| > |13879.00| > ++ > {code} > *Spark 3.4.1 / Spark 3.5.0
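What makes the null in SPARK-47134 unexpected is that the value involved fits comfortably in `DECIMAL(28,14)`, so no overflow-to-null should occur. A quick sanity check of the precision/scale arithmetic with Python's `decimal` module (a sketch of the DECIMAL(p, s) range rule, not Spark's decimal implementation):

```python
from decimal import Decimal

def fits_decimal(value: Decimal, precision: int, scale: int) -> bool:
    # A value fits DECIMAL(p, s) when it needs at most p - s digits
    # before the decimal point (ignoring sign).
    max_integral = Decimal(10) ** (precision - scale)
    return abs(value) < max_integral

# SUM(1.00) over the two groups in the reproduction yields 13879.00 and
# 9508.00; DECIMAL(28,14) allows up to 14 integral digits, so neither
# value overflows and neither should become null after the CAST.
assert fits_decimal(Decimal("13879.00"), 28, 14)
assert fits_decimal(Decimal("9508.00"), 28, 14)
```

Since both sums are far below the 14-integral-digit limit, the null produced by 3.4.1/3.5.0 cannot be a legitimate overflow result.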
[jira] [Updated] (SPARK-47133) java.lang.NullPointerException: Missing SslContextFactory when accessing Worker WebUI from Master as reverse proxy with SSL enabled
[ https://issues.apache.org/jira/browse/SPARK-47133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Filippo Monari updated SPARK-47133: --- Summary: java.lang.NullPointerException: Missing SslContextFactory when accessing Worker WebUI from Master as reverse proxy with SSL enabled (was: java.lang.NullPointerException: Missing SslContextFactory when accessing Worker WebUI from Master as reverse proxy) > java.lang.NullPointerException: Missing SslContextFactory when accessing > Worker WebUI from Master as reverse proxy with SSL enabled > --- > > Key: SPARK-47133 > URL: https://issues.apache.org/jira/browse/SPARK-47133 > Project: Spark > Issue Type: Question > Components: Web UI >Affects Versions: 3.5.0 > Environment: * We are running Spark in stand-alone mode, on > Kubernetes. > * The containers are based on Debian 11 (minideb) > * The Spark version is 3.5 > Please do not hesitate to ask further information if needed. >Reporter: Filippo Monari >Priority: Major > > Hi, > > We are encountering the error described here below. > If SSL/TLS is enabled on both, Master and Worker, it is not possible to > access the WebUI of the latter from the former configured as reverse proxy. > The returned error is the the following. 
[jira] [Updated] (SPARK-47133) java.lang.NullPointerException: Missing SslContextFactory when accessing Worker WebUI from Master as reverse proxy
[ https://issues.apache.org/jira/browse/SPARK-47133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Filippo Monari updated SPARK-47133: --- Description: Hi, We are encountering the error described below. If SSL/TLS is enabled on both Master and Worker, it is not possible to access the WebUI of the latter from the former configured as a reverse proxy. The returned error is the following. {code:java} HTTP ERROR 500 java.lang.NullPointerException: Missing SslContextFactory URI:/proxy/worker-20240222171308-10.113.3.1-34959 STATUS:500 MESSAGE:java.lang.NullPointerException: Missing SslContextFactory SERVLET:org.apache.spark.ui.JettyUtils$$anon$3-7d068d54 CAUSED BY:java.lang.NullPointerException: Missing SslContextFactory Caused by:java.lang.NullPointerException: Missing SslContextFactory at java.base/java.util.Objects.requireNonNull(Objects.java:235) at org.sparkproject.jetty.io.ssl.SslClientConnectionFactory.<init>(SslClientConnectionFactory.java:57) at org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1273) at org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1279) at org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:209) at org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:215) at org.sparkproject.jetty.client.HttpDestination.<init>(HttpDestination.java:100) at org.sparkproject.jetty.client.PoolingHttpDestination.<init>(PoolingHttpDestination.java:25) at org.sparkproject.jetty.client.http.HttpDestinationOverHTTP.<init>(HttpDestinationOverHTTP.java:32) at org.sparkproject.jetty.client.http.HttpClientTransportOverHTTP.newHttpDestination(HttpClientTransportOverHTTP.java:54) at org.sparkproject.jetty.client.HttpClient.lambda$resolveDestination$0(HttpClient.java:597) at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916) at
org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:593) at org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:571) at org.sparkproject.jetty.client.HttpClient.send(HttpClient.java:626) at org.sparkproject.jetty.client.HttpRequest.sendAsync(HttpRequest.java:780) at org.sparkproject.jetty.client.HttpRequest.send(HttpRequest.java:767) at org.sparkproject.jetty.proxy.AbstractProxyServlet.sendProxyRequest(AbstractProxyServlet.java:618) at org.sparkproject.jetty.proxy.ProxyServlet.service(ProxyServlet.java:114) at javax.servlet.http.HttpServlet.service(HttpServlet.java:590) at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) at org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656) at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) at org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) at org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552) at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772) at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234) at 
org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.sparkproject.jetty.server.Server.handle(Server.java:516) at org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) at org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:479) at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105) at
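For context, a Master/Worker deployment that hits this code path would be configured roughly as follows. This is an illustrative sketch only, not the reporter's actual configuration: the keys come from Spark's standard reverse-proxy and SSL namespaces, and the URL, paths, and passwords are placeholders.

```
# spark-defaults.conf -- illustrative sketch, values are placeholders
spark.ui.reverseProxy          true
spark.ui.reverseProxyUrl       https://spark-master.example.com
spark.ssl.enabled              true
spark.ssl.keyStore             /etc/spark/ssl/keystore.jks
spark.ssl.keyStorePassword     changeit
spark.ssl.trustStore           /etc/spark/ssl/truststore.jks
spark.ssl.trustStorePassword   changeit
```

With SSL enabled like this on both daemons, the Master's proxy servlet must make an HTTPS connection to the Worker UI; the stack trace above shows Jetty's proxy `HttpClient` being created without an `SslContextFactory` for that outbound connection.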
[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Walker updated SPARK-47134: - Description: In specific cases, casting decimal values can result in `null` values where no overflow exists. The cases appear very specific, and I don't have the depth of knowledge to generalize this issue, so here is a simple spark-shell reproduction: *Setup:* {code:scala} scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] scala> ds.createOrReplaceTempView("t") {code} *Spark 3.2.1 behaviour (correct):* {code:scala} scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() ++ |ct| ++ |9508.00| |13879.00| ++ {code} *Spark 3.4.1 / Spark 3.5.0 behaviour:* {code:scala} scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() +---+ |ct| +---+ |null| |9508.00| +---+ {code} This is fairly delicate: - removing the `ORDER BY` clause produces the correct result - removing the `CAST` produces the correct result - changing the number of 0s in the argument to `SUM` produces the correct result - setting `spark.sql.ansi.enabled` to `true` produces the correct result (and does not throw an error) Also, removing the `ORDER BY` but writing `ds` to parquet will also result in the unexpected nulls. Please let me know if you need additional information. We are also interested in understanding whether setting `spark.sql.ansi.enabled` can be considered a reliable workaround to this issue prior to a fix being released, if possible. was: In specific cases, casting decimal values can result in `null` values where no overflow exists. 
The cases appear very specific, and I don't have the depth of knowledge to generalize this issue, so here is a simple spark-shell reproduction: Setup: {code:scala} scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] scala> ds.createOrReplaceTempView("t") {code} Spark 3.2.1 behaviour (correct): {code:scala} scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() ++ |ct| ++ |9508.00| |13879.00| ++ {code} Spark 3.4.1 / Spark 3.5.0 behaviour: {code:scala} scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() +---+ |ct| +---+ |null| |9508.00| +---+ {code} This is fairly delicate: - removing the `ORDER BY` clause produces the correct result - removing the `CAST` produces the correct result - changing the number of 0s in the argument to `SUM` produces the correct result - setting `spark.ansi.enabled` to `true` produces the correct result (and does not throw an error) Also, removing the `ORDER BY`, but writing `ds` to a parquet will also result in the unexpected nulls. Please let me know if you need additional information. We are also interested in understanding whether setting `spark.ansi.enabled` can be considered a reliable workaround to this issue prior to a fix being released, if possible. > Unexpected nulls when casting decimal values in specific cases > -- > > Key: SPARK-47134 > URL: https://issues.apache.org/jira/browse/SPARK-47134 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Dylan Walker >Priority: Major > > In specific cases, casting decimal values can result in `null` values where > no overflow exists. 
> The cases appear very specific, and I don't have the depth of knowledge to > generalize this issue, so here is a simple spark-shell reproduction: > *Setup:* > {code:scala} > scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", > x)).toDS > ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] > scala> ds.createOrReplaceTempView("t") > {code} > > *Spark 3.2.1 behaviour (correct):* > {code:scala} > scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct > FROM t GROUP BY `_1` ORDER BY ct ASC").show() > ++ > |ct| > ++ > |9508.00| > |13879.00| > ++ > {code} > *Spark 3.4.1 / Spark 3.5.0 behaviour:* > {code:scala} > scala> spark.sql("select
[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
[ https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dylan Walker updated SPARK-47134: - Description: In specific cases, casting decimal values can result in `null` values where no overflow exists. The cases appear very specific, and I don't have the depth of knowledge to generalize this issue, so here is a simple spark-shell reproduction: Setup: {code:scala} scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] scala> ds.createOrReplaceTempView("t") {code} Spark 3.2.1 behaviour (correct): {code:scala} scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() ++ |ct| ++ |9508.00| |13879.00| ++ {code} Spark 3.4.1 / Spark 3.5.0 behaviour: {code:scala} scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() +---+ |ct| +---+ |null| |9508.00| +---+ {code} This is fairly delicate: - removing the `ORDER BY` clause produces the correct result - removing the `CAST` produces the correct result - changing the number of 0s in the argument to `SUM` produces the correct result - setting `spark.ansi.enabled` to `true` produces the correct result (and does not throw an error) Also, removing the `ORDER BY`, but writing `ds` to a parquet will also result in the unexpected nulls. Please let me know if you need additional information. We are also interested in understanding whether setting `spark.ansi.enabled` can be considered a reliable workaround to this issue prior to a fix being released, if possible. was: In specific cases, casting decimal values can result in `null` values where no overflow exists. 
The cases appear very specific, and I don't have the depth of knowledge to generalize this issue, so here is a simple spark-shell reproduction: Setup: ``` scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] scala> ds.createOrReplaceTempView("t") ``` Spark 3.2.1 behaviour (correct): ``` scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() ++ | ct| ++ | 9508.00| |13879.00| ++ ``` Spark 3.4.1 / Spark 3.5.0 behaviour: ``` scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() +---+ | ct| +---+ | null| |9508.00| +---+ ``` This is fairly delicate: - removing the `ORDER BY` clause produces the correct result - removing the `CAST` produces the correct result - changing the number of 0s in the argument to `SUM` produces the correct result - setting `spark.ansi.enabled` to `true` produces the correct result (and does not throw an error) Also, removing the `ORDER BY`, but writing `ds` to a parquet will also result in the unexpected nulls. Please let me know if you need additional information. We are also interested in understanding whether setting `spark.ansi.enabled` can be considered a reliable workaround to this issue prior to a fix being released, if possible. > Unexpected nulls when casting decimal values in specific cases > -- > > Key: SPARK-47134 > URL: https://issues.apache.org/jira/browse/SPARK-47134 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Dylan Walker >Priority: Major > > In specific cases, casting decimal values can result in `null` values where > no overflow exists. 
> > The cases appear very specific, and I don't have the depth of knowledge to > generalize this issue, so here is a simple spark-shell reproduction: > > Setup: > {code:scala} > scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", > x)).toDS > ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] > scala> ds.createOrReplaceTempView("t") > {code} > > Spark 3.2.1 behaviour (correct): > {code:scala} > scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct > FROM t GROUP BY `_1` ORDER BY ct ASC").show() > ++ > |ct| > ++ > |9508.00| > |13879.00| > ++ > {code} > Spark 3.4.1 / Spark 3.5.0 behaviour: > {code:scala} > scala> spark.sql("select
[jira] [Updated] (SPARK-47133) java.lang.NullPointerException: Missing SslContextFactory when accessing Worker WebUI from Master as reverse proxy
[ https://issues.apache.org/jira/browse/SPARK-47133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Filippo Monari updated SPARK-47133: --- Description: Hi, We are encountering the error described below. If SSL/TLS is enabled on both Master and Worker, it is not possible to access the WebUI of the latter from the former configured as a reverse proxy. The returned error is the following. {code:java} HTTP ERROR 500 java.lang.NullPointerException: Missing SslContextFactory URI:/proxy/worker-20240222171308-10.113.3.1-34959 STATUS:500 MESSAGE:java.lang.NullPointerException: Missing SslContextFactory SERVLET:org.apache.spark.ui.JettyUtils$$anon$3-7d068d54 CAUSED BY:java.lang.NullPointerException: Missing SslContextFactory Caused by:java.lang.NullPointerException: Missing SslContextFactory at java.base/java.util.Objects.requireNonNull(Objects.java:235) at org.sparkproject.jetty.io.ssl.SslClientConnectionFactory.<init>(SslClientConnectionFactory.java:57) at org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1273) at org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1279) at org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:209) at org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:215) at org.sparkproject.jetty.client.HttpDestination.<init>(HttpDestination.java:100) at org.sparkproject.jetty.client.PoolingHttpDestination.<init>(PoolingHttpDestination.java:25) at org.sparkproject.jetty.client.http.HttpDestinationOverHTTP.<init>(HttpDestinationOverHTTP.java:32) at org.sparkproject.jetty.client.http.HttpClientTransportOverHTTP.newHttpDestination(HttpClientTransportOverHTTP.java:54) at org.sparkproject.jetty.client.HttpClient.lambda$resolveDestination$0(HttpClient.java:597) at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916) at 
org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:593) at org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:571) at org.sparkproject.jetty.client.HttpClient.send(HttpClient.java:626) at org.sparkproject.jetty.client.HttpRequest.sendAsync(HttpRequest.java:780) at org.sparkproject.jetty.client.HttpRequest.send(HttpRequest.java:767) at org.sparkproject.jetty.proxy.AbstractProxyServlet.sendProxyRequest(AbstractProxyServlet.java:618) at org.sparkproject.jetty.proxy.ProxyServlet.service(ProxyServlet.java:114) at javax.servlet.http.HttpServlet.service(HttpServlet.java:590) at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) at org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656) at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) at org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) at org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552) at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772) at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234) at 
org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.sparkproject.jetty.server.Server.handle(Server.java:516) at org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) at org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:479) at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105) at
[jira] [Created] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases
Dylan Walker created SPARK-47134: Summary: Unexpected nulls when casting decimal values in specific cases Key: SPARK-47134 URL: https://issues.apache.org/jira/browse/SPARK-47134 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.5.0, 3.4.1 Reporter: Dylan Walker In specific cases, casting decimal values can result in `null` values where no overflow exists. The cases appear very specific, and I don't have the depth of knowledge to generalize this issue, so here is a simple spark-shell reproduction: Setup: ``` scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int] scala> ds.createOrReplaceTempView("t") ``` Spark 3.2.1 behaviour (correct): ``` scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() ++ | ct| ++ | 9508.00| |13879.00| ++ ``` Spark 3.4.1 / Spark 3.5.0 behaviour: ``` scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show() +---+ | ct| +---+ | null| |9508.00| +---+ ``` This is fairly delicate: - removing the `ORDER BY` clause produces the correct result - removing the `CAST` produces the correct result - changing the number of 0s in the argument to `SUM` produces the correct result - setting `spark.sql.ansi.enabled` to `true` produces the correct result (and does not throw an error) Also, removing the `ORDER BY` but writing `ds` to parquet will also result in the unexpected nulls. Please let me know if you need additional information. We are also interested in understanding whether setting `spark.sql.ansi.enabled` can be considered a reliable workaround to this issue prior to a fix being released, if possible. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
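The workaround the reporter asks about can be tried per session. The full configuration key in recent Spark releases is `spark.sql.ansi.enabled`. The following is an unverified sketch for a spark-shell session (which has the `spark` SparkSession and its implicits in scope), not a confirmed fix for the affected versions:

```scala
// spark-shell sketch; assumes the implicit SparkSession `spark`
// and spark.implicits._ provided by the shell.
spark.conf.set("spark.sql.ansi.enabled", "true")

val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds.createOrReplaceTempView("t")

// Per the report, with ANSI mode on the query returns the expected
// two non-null rows instead of a null.
spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
```

Note that ANSI mode also changes other behaviours (e.g. arithmetic overflow raises errors instead of returning null), so it is a broader switch than a targeted workaround for this one cast.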
[jira] [Created] (SPARK-47133) java.lang.NullPointerException: Missing SslContextFactory when accessing Worker WebUI from Master as reverse proxy
Filippo Monari created SPARK-47133: -- Summary: java.lang.NullPointerException: Missing SslContextFactory when accessing Worker WebUI from Master as reverse proxy Key: SPARK-47133 URL: https://issues.apache.org/jira/browse/SPARK-47133 Project: Spark Issue Type: Question Components: Web UI Affects Versions: 3.5.0 Environment: * We are running Spark in standalone mode on Kubernetes. * The containers are based on Debian 11 (minideb) * The Spark version is 3.5 Please do not hesitate to ask for further information if needed. Reporter: Filippo Monari Hi, we are encountering the error described below. If SSL/TLS is enabled on both Master and Worker, it is not possible to access the WebUI of the latter from the former configured as a reverse proxy. The returned error is the following. {code:java} HTTP ERROR 500 java.lang.NullPointerException: Missing SslContextFactory URI:/proxy/worker-20240222171308-10.113.3.1-34959 STATUS:500 MESSAGE:java.lang.NullPointerException: Missing SslContextFactory SERVLET:org.apache.spark.ui.JettyUtils$$anon$3-7d068d54 CAUSED BY:java.lang.NullPointerException: Missing SslContextFactory Caused by:java.lang.NullPointerException: Missing SslContextFactory at java.base/java.util.Objects.requireNonNull(Objects.java:235) at org.sparkproject.jetty.io.ssl.SslClientConnectionFactory.<init>(SslClientConnectionFactory.java:57) at org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1273) at org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1279) at org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:209) at org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:215) at org.sparkproject.jetty.client.HttpDestination.<init>(HttpDestination.java:100) at org.sparkproject.jetty.client.PoolingHttpDestination.<init>(PoolingHttpDestination.java:25) at 
org.sparkproject.jetty.client.http.HttpDestinationOverHTTP.(HttpDestinationOverHTTP.java:32) at org.sparkproject.jetty.client.http.HttpClientTransportOverHTTP.newHttpDestination(HttpClientTransportOverHTTP.java:54) at org.sparkproject.jetty.client.HttpClient.lambda$resolveDestination$0(HttpClient.java:597) at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916) at org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:593) at org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:571) at org.sparkproject.jetty.client.HttpClient.send(HttpClient.java:626) at org.sparkproject.jetty.client.HttpRequest.sendAsync(HttpRequest.java:780) at org.sparkproject.jetty.client.HttpRequest.send(HttpRequest.java:767) at org.sparkproject.jetty.proxy.AbstractProxyServlet.sendProxyRequest(AbstractProxyServlet.java:618) at org.sparkproject.jetty.proxy.ProxyServlet.service(ProxyServlet.java:114) at javax.servlet.http.HttpServlet.service(HttpServlet.java:590) at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) at org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656) at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) at org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) at org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552) at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) at 
org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772) at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234) at org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.sparkproject.jetty.server.Server.handle(Server.java:516) at org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
[jira] [Updated] (SPARK-47133) java.lang.NullPointerException: Missing SslContextFactory when accessing Worker WebUI from Master as reverse proxy
[ https://issues.apache.org/jira/browse/SPARK-47133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Filippo Monari updated SPARK-47133: --- Description: Hi, we are encountering the error described below. If SSL/TLS is enabled on both Master and Worker, it is not possible to access the WebUI of the latter from the former configured as a reverse proxy. The returned error is the following. {code:java} HTTP ERROR 500 java.lang.NullPointerException: Missing SslContextFactory URI:/proxy/worker-20240222171308-10.113.3.1-34959 STATUS:500 MESSAGE:java.lang.NullPointerException: Missing SslContextFactory SERVLET:org.apache.spark.ui.JettyUtils$$anon$3-7d068d54 CAUSED BY:java.lang.NullPointerException: Missing SslContextFactory Caused by:java.lang.NullPointerException: Missing SslContextFactory at java.base/java.util.Objects.requireNonNull(Objects.java:235) at org.sparkproject.jetty.io.ssl.SslClientConnectionFactory.<init>(SslClientConnectionFactory.java:57) at org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1273) at org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1279) at org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:209) at org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:215) at org.sparkproject.jetty.client.HttpDestination.<init>(HttpDestination.java:100) at org.sparkproject.jetty.client.PoolingHttpDestination.<init>(PoolingHttpDestination.java:25) at org.sparkproject.jetty.client.http.HttpDestinationOverHTTP.<init>(HttpDestinationOverHTTP.java:32) at org.sparkproject.jetty.client.http.HttpClientTransportOverHTTP.newHttpDestination(HttpClientTransportOverHTTP.java:54) at org.sparkproject.jetty.client.HttpClient.lambda$resolveDestination$0(HttpClient.java:597) at java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916) at 
org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:593) at org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:571) at org.sparkproject.jetty.client.HttpClient.send(HttpClient.java:626) at org.sparkproject.jetty.client.HttpRequest.sendAsync(HttpRequest.java:780) at org.sparkproject.jetty.client.HttpRequest.send(HttpRequest.java:767) at org.sparkproject.jetty.proxy.AbstractProxyServlet.sendProxyRequest(AbstractProxyServlet.java:618) at org.sparkproject.jetty.proxy.ProxyServlet.service(ProxyServlet.java:114) at javax.servlet.http.HttpServlet.service(HttpServlet.java:590) at org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799) at org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656) at org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95) at org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193) at org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626) at org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552) at org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233) at org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188) at org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505) at org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186) at org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355) at org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141) at org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772) at org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234) at 
org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127) at org.sparkproject.jetty.server.Server.handle(Server.java:516) at org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487) at org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:732) at org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:479) at org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:277) at org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311) at org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105) at
[jira] [Assigned] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024
[ https://issues.apache.org/jira/browse/SPARK-43259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-43259: Assignee: Mihailo Milosevic > Assign a name to the error class _LEGACY_ERROR_TEMP_2024 > > > Key: SPARK-43259 > URL: https://issues.apache.org/jira/browse/SPARK-43259 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Assignee: Mihailo Milosevic >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the examples in error-classes.json). > Add a test which triggers the error from user code if such a test doesn't > exist yet. Check the exception fields by using {*}checkError(){*}. That function > checks only the valuable error fields and avoids depending on the error's text > message. In this way, tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error to checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is not > clear, and propose to users a way to avoid and fix such errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
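The checkError() workflow described in the task boils down to a test of roughly this shape. This is a sketch only: the intercepted exception type, the final error-class name, the triggering query, and the parameters map are placeholders for whatever the assignee's renamed error actually requires.

```scala
// Sketch of the checkError() pattern used in Spark's SQL test suites
// (a SparkFunSuite-based test); names and values below are placeholders.
test("renamed _LEGACY_ERROR_TEMP_2024 is raised with the right fields") {
  checkError(
    exception = intercept[org.apache.spark.sql.AnalysisException] {
      sql("SELECT ...").collect()   // a user-space query that triggers the error
    },
    errorClass = "SOME_DESCRIPTIVE_NAME",
    parameters = Map("objectName" -> "`t`"))
}
```

The point of asserting on `errorClass` and `parameters` rather than the message string is exactly what the task states: the message template in error-classes.json can then be reworded without breaking the test.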
[jira] [Resolved] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024
[ https://issues.apache.org/jira/browse/SPARK-43259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-43259. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45095 [https://github.com/apache/spark/pull/45095] > Assign a name to the error class _LEGACY_ERROR_TEMP_2024 > > > Key: SPARK-43259 > URL: https://issues.apache.org/jira/browse/SPARK-43259 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Assignee: Mihailo Milosevic >Priority: Minor > Labels: pull-request-available, starter > Fix For: 4.0.0 > > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (see the examples in error-classes.json). > Add a test that triggers the error from user code, if such a test doesn't > exist yet. Check the exception fields by using {*}checkError(){*}. That function > checks only the relevant error fields and avoids depending on the error text > message, so tech editors can modify the error format in > error-classes.json without worrying about Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using a SQL query), replace > the error with an internal error; see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current one is not > clear, and suggest to users how to avoid and fix such errors. > Please look at the PRs below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490]
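The checkError() guidance above can be illustrated with a small sketch. This is plain Python, not Spark's actual Scala test helper; the `QueryError` and `check_error` names here are hypothetical stand-ins for the idea of asserting on structured error fields instead of message text:

```python
# Hypothetical sketch of the idea behind checkError(): assert on structured
# error fields (error class and message parameters) rather than on the full
# rendered message, so message wording can change without breaking tests.

class QueryError(Exception):
    """A structured error carrying an error class and message parameters."""
    def __init__(self, error_class, parameters):
        super().__init__(f"[{error_class}] {parameters}")
        self.error_class = error_class
        self.parameters = parameters

def check_error(exc, error_class, parameters):
    # Compare only the stable, machine-readable fields.
    assert exc.error_class == error_class, exc.error_class
    assert exc.parameters == parameters, exc.parameters

# Usage: this check stays valid even if the human-readable text is reworded.
try:
    raise QueryError("DIVIDE_BY_ZERO", {"config": "spark.sql.ansi.enabled"})
except QueryError as e:
    check_error(e, "DIVIDE_BY_ZERO", {"config": "spark.sql.ansi.enabled"})
```

This is why the ticket asks migrating tests onto checkError(): message-text assertions couple tests to wording that tech editors may change.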
[jira] [Updated] (SPARK-47131) contains, startswith, endswith
[ https://issues.apache.org/jira/browse/SPARK-47131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47131: --- Labels: pull-request-available (was: ) > contains, startswith, endswith > -- > > Key: SPARK-47131 > URL: https://issues.apache.org/jira/browse/SPARK-47131 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Uroš Bojanić >Priority: Major > Labels: pull-request-available > > Refactored built-in string functions to enable collation support for: > {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now > be able to use COLLATE within arguments for built-in string functions: > CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries.
[jira] [Updated] (SPARK-47102) Add COLLATION_ENABLED config flag
[ https://issues.apache.org/jira/browse/SPARK-47102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47102: -- Description: *What changes were proposed in this pull request?* This PR adds a COLLATION_ENABLED config to `SQLConf` and introduces a new error class `COLLATION_SUPPORT_DISABLED` to report an appropriate error on usage of this feature while it is under development. *Why are the changes needed?* We want to make collations configurable via a flag. These changes disable usage of the `collate` and `collation` functions, along with any `COLLATE` syntax, when the flag is set to false. By default, the flag is set to false. was: ### What changes were proposed in this pull request? This PR adds a COLLATION_ENABLED config to `SQLConf` and introduces a new error class `COLLATION_SUPPORT_DISABLED` to report an appropriate error on usage of this feature while it is under development. ### Why are the changes needed? We want to make collations configurable via a flag. These changes disable usage of the `collate` and `collation` functions, along with any `COLLATE` syntax, when the flag is set to false. By default, the flag is set to false. > Add COLLATION_ENABLED config flag > - > > Key: SPARK-47102 > URL: https://issues.apache.org/jira/browse/SPARK-47102 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > > *What changes were proposed in this pull request?* > This PR adds a COLLATION_ENABLED config to `SQLConf` and introduces a new error > class `COLLATION_SUPPORT_DISABLED` to report an appropriate error on usage of > this feature while it is under development. > *Why are the changes needed?* > We want to make collations configurable via a flag. These changes > disable usage of the `collate` and `collation` functions, along with any > `COLLATE` syntax, when the flag is set to false. By default, the flag is set > to false.
[jira] [Updated] (SPARK-47102) Add COLLATION_ENABLED config flag
[ https://issues.apache.org/jira/browse/SPARK-47102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Mihailo Milosevic updated SPARK-47102: -- Description: ### What changes were proposed in this pull request? This PR adds a COLLATION_ENABLED config to `SQLConf` and introduces a new error class `COLLATION_SUPPORT_DISABLED` to report an appropriate error on usage of this feature while it is under development. ### Why are the changes needed? We want to make collations configurable via a flag. These changes disable usage of the `collate` and `collation` functions, along with any `COLLATE` syntax, when the flag is set to false. By default, the flag is set to false. > Add COLLATION_ENABLED config flag > - > > Key: SPARK-47102 > URL: https://issues.apache.org/jira/browse/SPARK-47102 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available > > ### What changes were proposed in this pull request? > This PR adds a COLLATION_ENABLED config to `SQLConf` and introduces a new error > class `COLLATION_SUPPORT_DISABLED` to report an appropriate error on usage of > this feature while it is under development. > ### Why are the changes needed? > We want to make collations configurable via a flag. These changes > disable usage of the `collate` and `collation` functions, along with any > `COLLATE` syntax, when the flag is set to false. By default, the flag is set > to false.
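The gating pattern described in SPARK-47102 can be sketched in a few lines of plain Python. This is a toy illustration of hiding an in-development feature behind a config flag; the names below are hypothetical and this is not Spark's SQLConf mechanism:

```python
# Toy sketch: an in-development feature is disabled by default and raises a
# dedicated, named error class when used. Names here are hypothetical.
CONF = {"collation.enabled": False}  # flag is off by default

class CollationSupportDisabled(Exception):
    """Stand-in for an error class like COLLATION_SUPPORT_DISABLED."""

def collate(value, collation_name):
    # Every entry point of the feature checks the flag first.
    if not CONF["collation.enabled"]:
        raise CollationSupportDisabled(
            "COLLATION_SUPPORT_DISABLED: enable collation.enabled to use COLLATE")
    return (value, collation_name)
```

The design choice is that a single flag check at each entry point (the `collate`/`collation` functions and the `COLLATE` syntax) lets the feature ship incrementally without exposing half-finished behavior.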
[jira] [Updated] (SPARK-47102) Add COLLATION_ENABLED config flag
[ https://issues.apache.org/jira/browse/SPARK-47102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47102: --- Labels: pull-request-available (was: ) > Add COLLATION_ENABLED config flag > - > > Key: SPARK-47102 > URL: https://issues.apache.org/jira/browse/SPARK-47102 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()
Albert Ziegler created SPARK-47132: -- Summary: Mistake in Docstring for Pyspark's Dataframe.head() Key: SPARK-47132 URL: https://issues.apache.org/jira/browse/SPARK-47132 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.5.0 Reporter: Albert Ziegler The docstring claims that {{head(n)}} returns a {{Row}} (rather than a list of rows) iff n == 1, but that's incorrect. The type hints, example, and implementation show that the difference between a row and a list of rows lies in whether n is supplied at all: if it isn't, {{head()}} returns a {{Row}}; if it is, even if it is 1, {{head(n)}} returns a list. A suggested fix is here: https://github.com/apache/spark/pull/45197
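The corrected semantics described in this report can be sketched with a toy Python stand-in (this is not PySpark's implementation; the sentinel-based signature is only an illustration of "return type depends on whether n is supplied, not on its value"):

```python
# Toy model of the DataFrame.head() behavior described above (not PySpark
# code): omitting n yields a single row, while any explicit n (even 1)
# yields a list.
_NO_VALUE = object()  # sentinel distinguishing "not supplied" from any int

def head(rows, n=_NO_VALUE):
    if n is _NO_VALUE:
        return rows[0] if rows else None  # a single "row" (or None if empty)
    return rows[:n]                       # always a list, even for n == 1

rows = ["row1", "row2", "row3"]
print(head(rows))     # row1      (a single row)
print(head(rows, 1))  # ['row1']  (a list, despite n == 1)
```

So the docstring's "n == 1" condition was the wrong axis; the actual distinction is supplied vs. not supplied.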
[jira] [Created] (SPARK-47131) contains, startswith, endswith
Uroš Bojanić created SPARK-47131: Summary: contains, startswith, endswith Key: SPARK-47131 URL: https://issues.apache.org/jira/browse/SPARK-47131 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Uroš Bojanić Refactored built-in string functions to enable collation support for: {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now be able to use COLLATE within arguments for built-in string functions: CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries.
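As a rough illustration of what collation-aware string predicates mean, here is a plain-Python sketch. The collation names and the comparison-key approach are hypothetical, not Spark's internals; the point is only that the same predicate can answer differently under different collations:

```python
# Sketch: collation-aware contains/startswith/endswith. Each collation maps
# strings to a comparison key; predicates compare keys instead of raw text.
# Collation names here are made up for illustration.
COLLATIONS = {
    "BINARY": lambda s: s,             # compare characters as-is
    "CASE_INSENSITIVE": str.casefold,  # normalize case before comparing
}

def contains(s, sub, collation="BINARY"):
    key = COLLATIONS[collation]
    return key(sub) in key(s)

def startswith(s, prefix, collation="BINARY"):
    key = COLLATIONS[collation]
    return key(s).startswith(key(prefix))

def endswith(s, suffix, collation="BINARY"):
    key = COLLATIONS[collation]
    return key(s).endswith(key(suffix))

print(contains("Spark SQL", "sql"))                      # False
print(contains("Spark SQL", "sql", "CASE_INSENSITIVE"))  # True
```

Under this model, a query-level COLLATE clause would simply select which comparison key the built-in predicate uses.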
[jira] [Updated] (SPARK-46975) Support dedicated fallback methods
[ https://issues.apache.org/jira/browse/SPARK-46975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-46975: -- Summary: Support dedicated fallback methods (was: Move to_{hdf, feather, stata} to the fallback list) > Support dedicated fallback methods > -- > > Key: SPARK-46975 > URL: https://issues.apache.org/jira/browse/SPARK-46975 > Project: Spark > Issue Type: Sub-task > Components: PS >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Assigned] (SPARK-42328) Assign name to _LEGACY_ERROR_TEMP_1175
[ https://issues.apache.org/jira/browse/SPARK-42328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-42328: Assignee: Nikola Mandic > Assign name to _LEGACY_ERROR_TEMP_1175 > -- > > Key: SPARK-42328 > URL: https://issues.apache.org/jira/browse/SPARK-42328 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available >
[jira] [Resolved] (SPARK-42328) Assign name to _LEGACY_ERROR_TEMP_1175
[ https://issues.apache.org/jira/browse/SPARK-42328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-42328. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45183 [https://github.com/apache/spark/pull/45183] > Assign name to _LEGACY_ERROR_TEMP_1175 > -- > > Key: SPARK-42328 > URL: https://issues.apache.org/jira/browse/SPARK-42328 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Assignee: Nikola Mandic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > >
[jira] [Updated] (SPARK-47130) Use listStatus to bypass block location info when cleaning driver logs
[ https://issues.apache.org/jira/browse/SPARK-47130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47130: --- Labels: pull-request-available (was: ) > Use listStatus to bypass block location info when cleaning driver logs > -- > > Key: SPARK-47130 > URL: https://issues.apache.org/jira/browse/SPARK-47130 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-47129) Make ResolveRelations cache connect plan properly
[ https://issues.apache.org/jira/browse/SPARK-47129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng updated SPARK-47129: -- Summary: Make ResolveRelations cache connect plan properly (was: Make ResolveRelations handle planId properly) > Make ResolveRelations cache connect plan properly > - > > Key: SPARK-47129 > URL: https://issues.apache.org/jira/browse/SPARK-47129 > Project: Spark > Issue Type: Improvement > Components: Connect, SQL >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Created] (SPARK-47130) Use listStatus to bypass block location info when cleaning driver logs
Kent Yao created SPARK-47130: Summary: Use listStatus to bypass block location info when cleaning driver logs Key: SPARK-47130 URL: https://issues.apache.org/jira/browse/SPARK-47130 Project: Spark Issue Type: Improvement Components: Spark Core Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Updated] (SPARK-47129) Make ResolveRelations handle planId properly
[ https://issues.apache.org/jira/browse/SPARK-47129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47129: --- Labels: pull-request-available (was: ) > Make ResolveRelations handle planId properly > > > Key: SPARK-47129 > URL: https://issues.apache.org/jira/browse/SPARK-47129 > Project: Spark > Issue Type: Improvement > Components: Connect, SQL >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
[jira] [Updated] (SPARK-47128) Improve `spark.sql.hive.metastore.sharedPrefixes` default value
[ https://issues.apache.org/jira/browse/SPARK-47128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47128: -- Parent: SPARK-47046 Issue Type: Sub-task (was: Improvement) > Improve `spark.sql.hive.metastore.sharedPrefixes` default value > --- > > Key: SPARK-47128 > URL: https://issues.apache.org/jira/browse/SPARK-47128 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available >