[jira] [Resolved] (SPARK-47140) Upgrade codecov/codecov-action from v2 to v4 in GitHub Actions

2024-02-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47140?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47140.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45227
[https://github.com/apache/spark/pull/45227]

> Upgrade codecov/codecov-action from v2 to v4 in GitHub Actions
> --
>
> Key: SPARK-47140
> URL: https://issues.apache.org/jira/browse/SPARK-47140
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47142) Use `spark.jars.ivy` instead of `spark.driver.extraJavaOptions` in `DepsTestsSuite`

2024-02-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47142.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45229
[https://github.com/apache/spark/pull/45229]

> Use `spark.jars.ivy` instead of `spark.driver.extraJavaOptions` in 
> `DepsTestsSuite`
> 
>
> Key: SPARK-47142
> URL: https://issues.apache.org/jira/browse/SPARK-47142
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
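For context, a minimal configuration sketch of the two routes the title contrasts. The exact system property that `DepsTestsSuite` previously passed through `spark.driver.extraJavaOptions` is an assumption here (shown as `-Divy.home`); `spark.jars.ivy` is Spark's documented setting for the Ivy user directory.

{code:scala}
// Sketch only: contrasts the indirect JVM-option route with the direct
// `spark.jars.ivy` setting. The `-Divy.home` property is an assumed example,
// not necessarily what the suite used before this change.
import org.apache.spark.SparkConf

object IvyDirConfSketch {
  // Indirect: ship a JVM system property to the driver process.
  val viaExtraJavaOptions: SparkConf = new SparkConf()
    .set("spark.driver.extraJavaOptions", "-Divy.home=/tmp/ivy")

  // Direct: Spark's own configuration for the Ivy user directory.
  val viaSparkJarsIvy: SparkConf = new SparkConf()
    .set("spark.jars.ivy", "/tmp/ivy")
}
{code}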







[jira] [Updated] (SPARK-47143) Improve `ArtifactSuite` to use unique `MavenCoordinate`s

2024-02-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47143:
--
Summary: Improve `ArtifactSuite` to use unique `MavenCoordinate`s  (was: 
Fix `ArtifactSuite` to use unique `MavenCoordinate`s)

> Improve `ArtifactSuite` to use unique `MavenCoordinate`s
> 
>
> Key: SPARK-47143
> URL: https://issues.apache.org/jira/browse/SPARK-47143
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47143) Fix `ArtifactSuite` to use unique `MavenCoordinate`s

2024-02-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47143:
--
Summary: Fix `ArtifactSuite` to use unique `MavenCoordinate`s  (was: Fix 
`ArtifactSuite` to use aunique `MavenCoordinate`s)

> Fix `ArtifactSuite` to use unique `MavenCoordinate`s
> 
>
> Key: SPARK-47143
> URL: https://issues.apache.org/jira/browse/SPARK-47143
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-47142) Use `spark.jars.ivy` instead of `spark.driver.extraJavaOptions` in `DepsTestsSuite`

2024-02-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47142:
-

Assignee: Dongjoon Hyun

> Use `spark.jars.ivy` instead of `spark.driver.extraJavaOptions` in 
> `DepsTestsSuite`
> 
>
> Key: SPARK-47142
> URL: https://issues.apache.org/jira/browse/SPARK-47142
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47142) Use `spark.jars.ivy` instead of `spark.driver.extraJavaOptions` in `DepsTestsSuite`

2024-02-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47142?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47142:
---
Labels: pull-request-available  (was: )

> Use `spark.jars.ivy` instead of `spark.driver.extraJavaOptions` in 
> `DepsTestsSuite`
> 
>
> Key: SPARK-47142
> URL: https://issues.apache.org/jira/browse/SPARK-47142
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47142) Use `spark.jars.ivy` instead of `spark.driver.extraJavaOptions` in `DepsTestsSuite`

2024-02-22 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47142:
-

 Summary: Use `spark.jars.ivy` instead of 
`spark.driver.extraJavaOptions` in `DepsTestsSuite`
 Key: SPARK-47142
 URL: https://issues.apache.org/jira/browse/SPARK-47142
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes, Tests
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Resolved] (SPARK-47137) Add getAll to spark.conf for feature parity with Scala

2024-02-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47137.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45222
[https://github.com/apache/spark/pull/45222]

> Add getAll to spark.conf for feature parity with Scala
> --
>
> Key: SPARK-47137
> URL: https://issues.apache.org/jira/browse/SPARK-47137
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
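For reference, a minimal sketch of the Scala-side API whose behaviour this ticket brings to PySpark's `spark.conf`; the filtering below is only an illustration.

{code:scala}
// Sketch: RuntimeConfig.getAll on the Scala side returns every effective
// configuration entry as an immutable Map[String, String]; the PySpark change
// mirrors this on spark.conf.
import org.apache.spark.sql.SparkSession

object ConfGetAllSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[1]").appName("conf-getAll").getOrCreate()
    val all: Map[String, String] = spark.conf.getAll
    // Print a handful of SQL-related entries as a demonstration.
    all.filter { case (k, _) => k.startsWith("spark.sql.") }
      .take(5)
      .foreach { case (k, v) => println(s"$k=$v") }
    spark.stop()
  }
}
{code}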







[jira] [Assigned] (SPARK-47137) Add getAll to spark.conf for feature parity with Scala

2024-02-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47137:
-

Assignee: Takuya Ueshin

> Add getAll to spark.conf for feature parity with Scala
> --
>
> Key: SPARK-47137
> URL: https://issues.apache.org/jira/browse/SPARK-47137
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Assignee: Takuya Ueshin
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47141) Support shuffle migration to external storage

2024-02-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47141?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47141:
---
Labels: pull-request-available  (was: )

> Support shuffle migration to external storage
> -
>
> Key: SPARK-47141
> URL: https://issues.apache.org/jira/browse/SPARK-47141
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: mahesh kumar behera
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently, Spark supports migrating shuffle data to peer nodes during node 
> decommissioning. If no peer node is accessible, Spark falls back to external 
> storage; the user needs to provide the storage location path. There are 
> scenarios where users may want to migrate to external storage instead of peer 
> nodes, for example because of unstable nodes or the need for aggressive 
> scale-down. So users should be able to configure Spark to migrate shuffle 
> data directly to external storage when the use case permits.






[jira] [Created] (SPARK-47141) Support shuffle migration to external storage

2024-02-22 Thread mahesh kumar behera (Jira)
mahesh kumar behera created SPARK-47141:
---

 Summary: Support shuffle migration to external storage
 Key: SPARK-47141
 URL: https://issues.apache.org/jira/browse/SPARK-47141
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: mahesh kumar behera
 Fix For: 4.0.0


Currently, Spark supports migrating shuffle data to peer nodes during node 
decommissioning. If no peer node is accessible, Spark falls back to external 
storage; the user needs to provide the storage location path. There are 
scenarios where users may want to migrate to external storage instead of peer 
nodes, for example because of unstable nodes or the need for aggressive 
scale-down. So users should be able to configure Spark to migrate shuffle data 
directly to external storage when the use case permits.
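
A configuration sketch of the idea, not the proposed implementation: the first four settings are Spark's existing decommission knobs, while the last key is a hypothetical name invented here to illustrate the requested "external storage only" behaviour.

{code:scala}
// Sketch under stated assumptions: today shuffle blocks migrate to peers first
// and only fall back to the configured path; the last key is HYPOTHETICAL and
// illustrates the direct-to-external-storage switch this ticket asks for.
import org.apache.spark.SparkConf

object ShuffleMigrationConfSketch {
  def conf(): SparkConf = new SparkConf()
    .set("spark.decommission.enabled", "true")
    .set("spark.storage.decommission.enabled", "true")
    .set("spark.storage.decommission.shuffleBlocks.enabled", "true")
    // Existing fallback location, used when no peer can accept the blocks.
    .set("spark.storage.decommission.fallbackStorage.path", "s3a://my-bucket/spark-fallback/")
    // Hypothetical flag (name invented for illustration): skip peers and write
    // shuffle data straight to the fallback path above.
    .set("spark.storage.decommission.shuffleBlocks.externalStorageOnly", "true")
}
{code}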






[jira] [Assigned] (SPARK-47130) Use listStatus to bypass block location info when cleaning driver logs

2024-02-22 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-47130:


Assignee: Kent Yao

> Use listStatus to bypass block location info when cleaning driver logs
> --
>
> Key: SPARK-47130
> URL: https://issues.apache.org/jira/browse/SPARK-47130
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47130) Use listStatus to bypass block location info when cleaning driver logs

2024-02-22 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-47130.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45215
[https://github.com/apache/spark/pull/45215]

> Use listStatus to bypass block location info when cleaning driver logs
> --
>
> Key: SPARK-47130
> URL: https://issues.apache.org/jira/browse/SPARK-47130
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
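A context sketch of the motivation as suggested by the title (an assumption, not the actual patch): `FileSystem.listStatus` returns plain `FileStatus` entries, whereas `listLocatedStatus` also resolves block locations, which a cleanup pass that only needs paths and modification times does not require. `DriverLogListingSketch`, the directory path, and the age threshold are illustrative.

{code:scala}
// Sketch: list driver-log files older than a cutoff using listStatus, which
// avoids the extra block-location lookups done by listLocatedStatus.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

object DriverLogListingSketch {
  def expiredLogs(dir: String, maxAgeMs: Long): Array[Path] = {
    val fs = FileSystem.get(new Configuration())
    val cutoff = System.currentTimeMillis() - maxAgeMs
    fs.listStatus(new Path(dir))                      // no block-location info fetched
      .filter(s => s.isFile && s.getModificationTime < cutoff)
      .map(_.getPath)
  }
}
{code}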







[jira] [Created] (SPARK-47140) Upgrade codecov/codecov-action from v2 to v4 in GitHub Actions

2024-02-22 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47140:


 Summary: Upgrade codecov/codecov-action from v2 to v4 in GitHub 
Actions
 Key: SPARK-47140
 URL: https://issues.apache.org/jira/browse/SPARK-47140
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Created] (SPARK-47139) Upgrade Python version used in coverage report

2024-02-22 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47139:


 Summary: Upgrade Python version used in coverage report
 Key: SPARK-47139
 URL: https://issues.apache.org/jira/browse/SPARK-47139
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Resolved] (SPARK-47123) JDBCRDD does not correctly handle errors in getQueryOutputSchema

2024-02-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47123.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45209
[https://github.com/apache/spark/pull/45209]

> JDBCRDD does not correctly handle errors in getQueryOutputSchema
> 
>
> Key: SPARK-47123
> URL: https://issues.apache.org/jira/browse/SPARK-47123
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Pablo Langa Blanco
>Assignee: Pablo Langa Blanco
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> If there is an error executing statement.executeQuery(), it's possible that 
> another error thrown in one of the finally blocks hides the main error.
> {code:java}
> def getQueryOutputSchema(
>       query: String, options: JDBCOptions, dialect: JdbcDialect): StructType 
> = {
>     val conn: Connection = dialect.createConnectionFactory(options)(-1)
>     try {
>       val statement = conn.prepareStatement(query)
>       try {
>         statement.setQueryTimeout(options.queryTimeout)
>         val rs = statement.executeQuery()
>         try {
>           JdbcUtils.getSchema(rs, dialect, alwaysNullable = true,
>             isTimestampNTZ = options.preferTimestampNTZ)
>         } finally {
>           rs.close()
>         }
>       } finally {
>         statement.close()
>       }
>     } finally {
>       conn.close()
>     }
>   } {code}
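
A standalone sketch of the failure mode described above (not the actual fix in the pull request): a `finally` block that throws replaces the primary exception, while `scala.util.Using` keeps the primary exception and attaches the close failure as a suppressed one. `FailingResource` is an invented stand-in for the JDBC statement and result set.

{code:scala}
// Minimal demonstration of exception masking by finally, and one way to avoid it.
import scala.util.Using

final class FailingResource extends AutoCloseable {
  def query(): Nothing = throw new RuntimeException("primary failure in executeQuery")
  override def close(): Unit = throw new IllegalStateException("failure while closing")
}

object MaskingDemo {
  def main(args: Array[String]): Unit = {
    // Hand-rolled try/finally: the close() exception replaces the primary one.
    try {
      val r = new FailingResource
      try r.query() finally r.close()
    } catch {
      case e: Throwable => println(s"try/finally surfaced: ${e.getMessage}")
    }

    // Using.resource re-throws the primary exception and records close() as suppressed.
    try {
      Using.resource(new FailingResource)(_.query())
    } catch {
      case e: Throwable =>
        println(s"Using surfaced: ${e.getMessage}")
        e.getSuppressed.foreach(s => println(s"  suppressed: ${s.getMessage}"))
    }
  }
}
{code}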






[jira] [Assigned] (SPARK-47123) JDBCRDD does not correctly handle errors in getQueryOutputSchema

2024-02-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47123:


Assignee: Pablo Langa Blanco

> JDBCRDD does not correctly handle errors in getQueryOutputSchema
> 
>
> Key: SPARK-47123
> URL: https://issues.apache.org/jira/browse/SPARK-47123
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Pablo Langa Blanco
>Assignee: Pablo Langa Blanco
>Priority: Minor
>  Labels: pull-request-available
>
> If there is an error executing statement.executeQuery(), it's possible that 
> another error thrown in one of the finally blocks hides the main error.
> {code:java}
> def getQueryOutputSchema(
>       query: String, options: JDBCOptions, dialect: JdbcDialect): StructType 
> = {
>     val conn: Connection = dialect.createConnectionFactory(options)(-1)
>     try {
>       val statement = conn.prepareStatement(query)
>       try {
>         statement.setQueryTimeout(options.queryTimeout)
>         val rs = statement.executeQuery()
>         try {
>           JdbcUtils.getSchema(rs, dialect, alwaysNullable = true,
>             isTimestampNTZ = options.preferTimestampNTZ)
>         } finally {
>           rs.close()
>         }
>       } finally {
>         statement.close()
>       }
>     } finally {
>       conn.close()
>     }
>   } {code}






[jira] [Reopened] (SPARK-47115) Use larger memory for Maven builds

2024-02-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-47115:
--
  Assignee: (was: Hyukjin Kwon)

> Use larger memory for Maven builds
> --
>
> Key: SPARK-47115
> URL: https://issues.apache.org/jira/browse/SPARK-47115
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> *** RUN ABORTED ***
> An exception or error caused a run to abort: unable to create native thread: 
> possibly out of memory or process/resource limits reached 
>   java.lang.OutOfMemoryError: unable to create native thread: possibly out of 
> memory or process/resource limits reached
>   at java.base/java.lang.Thread.start0(Native Method)
>   at java.base/java.lang.Thread.start(Thread.java:1553)
>   at java.base/java.lang.System$2.start(System.java:2577)
>   at 
> java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:152)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:953)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
>   at 
> org.apache.spark.rpc.netty.SharedMessageLoop.$anonfun$threadpool$1(MessageLoop.scala:128)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190)
>   at 
> org.apache.spark.rpc.netty.SharedMessageLoop.<init>(MessageLoop.scala:127)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.sharedLoop$lzycompute(Dispatcher.scala:46)
>   ...
> Warning:  The requested profile "volcano" could not be activated because it 
> does not exist.
> Warning:  The requested profile "hive" could not be activated because it does 
> not exist.
> Error:  Failed to execute goal 
> org.scalatest:scalatest-maven-plugin:2.2.0:test (test) on project 
> spark-core_2.13: There are test failures -> [Help 1]
> Error:  
> Error:  To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> Error:  Re-run Maven using the -X switch to enable full debug logging.
> Error:  
> Error:  For more information about the errors and possible solutions, please 
> read the following articles:
> Error:  [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> Error:  
> Error:  After correcting the problems, you can resume the build with the 
> command
> Error:mvn  -rf :spark-core_2.13
> Error: Process completed with exit code 1.
> {code}
> https://github.com/HyukjinKwon/spark/actions/runs/7984135094/job/21800463337






[jira] [Updated] (SPARK-47115) Use larger memory for Maven builds

2024-02-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-47115:
-
Fix Version/s: (was: 4.0.0)

> Use larger memory for Maven builds
> --
>
> Key: SPARK-47115
> URL: https://issues.apache.org/jira/browse/SPARK-47115
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> *** RUN ABORTED ***
> An exception or error caused a run to abort: unable to create native thread: 
> possibly out of memory or process/resource limits reached 
>   java.lang.OutOfMemoryError: unable to create native thread: possibly out of 
> memory or process/resource limits reached
>   at java.base/java.lang.Thread.start0(Native Method)
>   at java.base/java.lang.Thread.start(Thread.java:1553)
>   at java.base/java.lang.System$2.start(System.java:2577)
>   at 
> java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:152)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:953)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
>   at 
> org.apache.spark.rpc.netty.SharedMessageLoop.$anonfun$threadpool$1(MessageLoop.scala:128)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190)
>   at 
> org.apache.spark.rpc.netty.SharedMessageLoop.<init>(MessageLoop.scala:127)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.sharedLoop$lzycompute(Dispatcher.scala:46)
>   ...
> Warning:  The requested profile "volcano" could not be activated because it 
> does not exist.
> Warning:  The requested profile "hive" could not be activated because it does 
> not exist.
> Error:  Failed to execute goal 
> org.scalatest:scalatest-maven-plugin:2.2.0:test (test) on project 
> spark-core_2.13: There are test failures -> [Help 1]
> Error:  
> Error:  To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> Error:  Re-run Maven using the -X switch to enable full debug logging.
> Error:  
> Error:  For more information about the errors and possible solutions, please 
> read the following articles:
> Error:  [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> Error:  
> Error:  After correcting the problems, you can resume the build with the 
> command
> Error:mvn  -rf :spark-core_2.13
> Error: Process completed with exit code 1.
> {code}
> https://github.com/HyukjinKwon/spark/actions/runs/7984135094/job/21800463337






[jira] [Resolved] (SPARK-47115) Use larger memory for Maven builds

2024-02-22 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47115.
--
Resolution: Invalid

It doesn't help; reverted.

> Use larger memory for Maven builds
> --
>
> Key: SPARK-47115
> URL: https://issues.apache.org/jira/browse/SPARK-47115
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> *** RUN ABORTED ***
> An exception or error caused a run to abort: unable to create native thread: 
> possibly out of memory or process/resource limits reached 
>   java.lang.OutOfMemoryError: unable to create native thread: possibly out of 
> memory or process/resource limits reached
>   at java.base/java.lang.Thread.start0(Native Method)
>   at java.base/java.lang.Thread.start(Thread.java:1553)
>   at java.base/java.lang.System$2.start(System.java:2577)
>   at 
> java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:152)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:953)
>   at 
> java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
>   at 
> org.apache.spark.rpc.netty.SharedMessageLoop.$anonfun$threadpool$1(MessageLoop.scala:128)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190)
>   at 
> org.apache.spark.rpc.netty.SharedMessageLoop.<init>(MessageLoop.scala:127)
>   at 
> org.apache.spark.rpc.netty.Dispatcher.sharedLoop$lzycompute(Dispatcher.scala:46)
>   ...
> Warning:  The requested profile "volcano" could not be activated because it 
> does not exist.
> Warning:  The requested profile "hive" could not be activated because it does 
> not exist.
> Error:  Failed to execute goal 
> org.scalatest:scalatest-maven-plugin:2.2.0:test (test) on project 
> spark-core_2.13: There are test failures -> [Help 1]
> Error:  
> Error:  To see the full stack trace of the errors, re-run Maven with the -e 
> switch.
> Error:  Re-run Maven using the -X switch to enable full debug logging.
> Error:  
> Error:  For more information about the errors and possible solutions, please 
> read the following articles:
> Error:  [Help 1] 
> http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> Error:  
> Error:  After correcting the problems, you can resume the build with the 
> command
> Error:mvn  -rf :spark-core_2.13
> Error: Process completed with exit code 1.
> {code}
> https://github.com/HyukjinKwon/spark/actions/runs/7984135094/job/21800463337






[jira] [Resolved] (SPARK-47136) Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly

2024-02-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47136.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45220
[https://github.com/apache/spark/pull/45220]

> Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly
> --
>
> Key: SPARK-47136
> URL: https://issues.apache.org/jira/browse/SPARK-47136
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47136) Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly

2024-02-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47136:
-

Assignee: Dongjoon Hyun

> Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly
> --
>
> Key: SPARK-47136
> URL: https://issues.apache.org/jira/browse/SPARK-47136
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Comment Edited] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819834#comment-17819834
 ] 

Dylan Walker edited comment on SPARK-47134 at 2/22/24 10:32 PM:


[~bersprockets]

Hmm, it's possible I may have made too many assumptions.  I left out that this 
is on EMR, which does have its own fork of Spark.

If this is referring to names that don't exist in the Apache Spark codebase, 
this may be an Amazon thing.  I will reach out to AWS support to confirm, and 
apologies if this turns out to be the case.  Unfortunately, they don't do a 
great job at documenting the differences.


was (Author: JIRAUSER304364):
[~bersprockets]

Hmm, it's possible I may have made too many assumptions.  I left out that this 
is on EMR, which does have its own fork of Spark.

If this is referring to names that don't exist in the Apache Spark codebase, 
this may be an Amazon thing.  I will reach out to AWS support to confirm, and 
apologies if this turns out to be the case.

> Unexpected nulls when casting decimal values in specific cases
> --
>
> Key: SPARK-47134
> URL: https://issues.apache.org/jira/browse/SPARK-47134
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Dylan Walker
>Priority: Major
> Attachments: 321queryplan.txt, 341queryplan.txt
>
>
> In specific cases, casting decimal values can result in `null` values where 
> no overflow exists.
> The cases appear very specific, and I don't have the depth of knowledge to 
> generalize this issue, so here is a simple spark-shell reproduction:
> *Setup:*
> {code:scala}
> scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", 
> x)).toDS
> ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]
> scala> ds.createOrReplaceTempView("t")
> {code}
>  
> *Spark 3.2.1 behaviour (correct):*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
> FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> ++
> |  ct|
> ++
> | 9508.00|
> |13879.00|
> ++
> {code}
> *Spark 3.4.1 / Spark 3.5.0 behaviour:*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
> FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> +---+
> | ct|
> +---+
> |   null|
> |9508.00|
> +---+
> {code}
> This is fairly delicate:
>  - removing the {{ORDER BY}} clause produces the correct result
>  - removing the {{CAST}} produces the correct result
>  - changing the number of 0s in the argument to {{SUM}} produces the correct 
> result
>  - setting {{spark.ansi.enabled}} to {{true}} produces the correct result 
> (and does not throw an error)
> Also, removing the {{ORDER BY}}, but writing {{ds}} to a parquet will also 
> result in the unexpected nulls.
> Please let me know if you need additional information.
> We are also interested in understanding whether setting 
> {{spark.ansi.enabled}} can be considered a reliable workaround to this issue 
> prior to a fix being released, if possible.
> Text files that include {{explain()}} output attached.
>  
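
For completeness, the workaround mentioned above as a spark-shell snippet, assuming the flag the reporter refers to is the standard {{spark.sql.ansi.enabled}} configuration (the report writes {{spark.ansi.enabled}}); it reuses the view {{t}} and the query exactly as quoted above.

{code:scala}
// Workaround sketch, run after the setup above; per the report, enabling ANSI
// mode makes the query return the same rows as Spark 3.2.1.
scala> spark.conf.set("spark.sql.ansi.enabled", "true")

scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct FROM t GROUP BY `_1` ORDER BY ct ASC").show()
{code}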






[jira] [Commented] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819834#comment-17819834
 ] 

Dylan Walker commented on SPARK-47134:
--

[~bersprockets]

Hmm, it's possible I may have made too many assumptions.  I left out that this 
is on EMR, which does have its own fork of Spark.

If this is referring to names that don't exist in the Apache Spark codebase, 
this may be an Amazon thing.  I will reach out to AWS support to confirm, and 
apologies if this turns out to be the case.

> Unexpected nulls when casting decimal values in specific cases
> --
>
> Key: SPARK-47134
> URL: https://issues.apache.org/jira/browse/SPARK-47134
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Dylan Walker
>Priority: Major
> Attachments: 321queryplan.txt, 341queryplan.txt
>
>
> In specific cases, casting decimal values can result in `null` values where 
> no overflow exists.
> The cases appear very specific, and I don't have the depth of knowledge to 
> generalize this issue, so here is a simple spark-shell reproduction:
> *Setup:*
> {code:scala}
> scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", 
> x)).toDS
> ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]
> scala> ds.createOrReplaceTempView("t")
> {code}
>  
> *Spark 3.2.1 behaviour (correct):*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
> FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> ++
> |  ct|
> ++
> | 9508.00|
> |13879.00|
> ++
> {code}
> *Spark 3.4.1 / Spark 3.5.0 behaviour:*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
> FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> +---+
> | ct|
> +---+
> |   null|
> |9508.00|
> +---+
> {code}
> This is fairly delicate:
>  - removing the {{ORDER BY}} clause produces the correct result
>  - removing the {{CAST}} produces the correct result
>  - changing the number of 0s in the argument to {{SUM}} produces the correct 
> result
>  - setting {{spark.ansi.enabled}} to {{true}} produces the correct result 
> (and does not throw an error)
> Also, removing the {{ORDER BY}}, but writing {{ds}} to a parquet will also 
> result in the unexpected nulls.
> Please let me know if you need additional information.
> We are also interested in understanding whether setting 
> {{spark.ansi.enabled}} can be considered a reliable workaround to this issue 
> prior to a fix being released, if possible.
> Text files that include {{explain()}} output attached.
>  






[jira] [Updated] (SPARK-47137) Add getAll to spark.conf for feature parity with Scala

2024-02-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47137:
---
Labels: pull-request-available  (was: )

> Add getAll to spark.conf for feature parity with Scala
> --
>
> Key: SPARK-47137
> URL: https://issues.apache.org/jira/browse/SPARK-47137
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47135) Implement error classes for Kafka data loss exceptions

2024-02-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47135:
---
Labels: pull-request-available  (was: )

> Implement error classes for Kafka data loss exceptions 
> ---
>
> Key: SPARK-47135
> URL: https://issues.apache.org/jira/browse/SPARK-47135
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: B. Micheal Okutubo
>Priority: Major
>  Labels: pull-request-available
>
> In the Kafka connector code, there are several places that throw the Java 
> *IllegalStateException* to report data loss while reading from Kafka. We 
> want to properly classify those exceptions using the new error framework.
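
A schematic sketch of the direction (not Spark's actual error-framework API, whose classes and signatures are not reproduced here): data-loss failures carry a stable error class and message parameters instead of free-form `IllegalStateException` text. All names below are illustrative.

{code:scala}
// Illustrative only: an error-class-style exception for Kafka data loss.
final case class KafkaDataLossException(
    errorClass: String,
    messageParameters: Map[String, String])
  extends IllegalStateException(
    s"[$errorClass] " + messageParameters.map { case (k, v) => s"$k=$v" }.mkString(", "))

object KafkaDataLossException {
  // Example factory for one classified condition; the error class name is invented.
  def couldNotReadOffsetRange(topicPartition: String, from: Long, until: Long): KafkaDataLossException =
    KafkaDataLossException(
      "KAFKA_DATA_LOSS.COULD_NOT_READ_OFFSET_RANGE",
      Map(
        "topicPartition" -> topicPartition,
        "fromOffset" -> from.toString,
        "untilOffset" -> until.toString))
}
{code}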






[jira] [Comment Edited] (SPARK-46762) Spark Connect 3.5 Classloading issue with external jar

2024-02-22 Thread nirav patel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819830#comment-17819830
 ] 

nirav patel edited comment on SPARK-46762 at 2/22/24 10:13 PM:
---

I did some more digging into executor classloading and a heap dump. Here's what
I found:

With Spark 3.4 and iceberg-spark-runtime-3.4_2.12-1.3.1.jar (the case where the
issue is not reproducible, i.e. everything works), I only see one instance of
`org.apache.iceberg.Table` loaded.

However, with Spark 3.5 and iceberg-spark-runtime-3.5_2.12-1.4.3.jar, I see two
instances of `org.apache.iceberg.Table` loaded.

Here's the stdout from the executor on which I applied `-verbose:class`:

{code:java}
[47.556s][info][class,load  ] org.apache.iceberg.Table source: 
file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1708632053092_0004/container_1708632053092_0004_01_01/org.apache.iceberg_iceberg-spark-runtime-3.5_2.12-1.4.3.jar
 
[45.415s][info][class,load] org.apache.iceberg.Table source: 
file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1708632053092_0004/container_1708632053092_0004_01_01/org.apache.iceberg_iceberg-spark-runtime-3.5_2.12-1.4.3.jar
{code}
I also confirmed the above via a heap dump; see the attached screenshots.

The same class, `org.apache.iceberg.Table`, is loaded twice: once by
ChildFirstURLClassLoader and once by MutableURLClassLoader.


was (Author: tenstriker):
I did some more digging into executor classloading and heap dump. Here's what I 
found:

with spark 3.4 and iceberg-spark-runtime-3.4_2.12-1.3.1.jar (Case where issue 
is not reproducible, ie everything works) I only see one instance of 
`org.apache.iceberg.Table` loaded

 

however with spark 3.5 and iceberg-spark-runtime-3.5_2.12-1.4.3.jar I see two 
instances of `org.apache.iceberg.Table` loaded:

here's stdout from executor on which I applied `verbose:class` :

 

 
{code:java}
[47.556s][info][class,load  ] org.apache.iceberg.Table source: 
file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1708632053092_0004/container_1708632053092_0004_01_01/org.apache.iceberg_iceberg-spark-runtime-3.5_2.12-1.4.3.jar
 
[45.415s][info][class,load] org.apache.iceberg.Table source: 
file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1708632053092_0004/container_1708632053092_0004_01_01/org.apache.iceberg_iceberg-spark-runtime-3.5_2.12-1.4.3.jar
{code}
I also confirmed above via heap dump. see attached screenshots.

Same class is loaded twice> once by ChildFirstUrlClassLoader and once by 
MutableURLClassLoader 

> Spark Connect 3.5 Classloading issue with external jar
> --
>
> Key: SPARK-46762
> URL: https://issues.apache.org/jira/browse/SPARK-46762
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: nirav patel
>Priority: Major
> Attachments: Screenshot 2024-02-22 at 2.04.37 PM.png, Screenshot 
> 2024-02-22 at 2.04.49 PM.png
>
>
> We are seeing the following `java.lang.ClassCastException` error in Spark 
> executors when using Spark Connect 3.5 with an external Spark SQL catalog jar, 
> iceberg-spark-runtime-3.5_2.12-1.4.3.jar.
> We also set "spark.executor.userClassPathFirst=true"; otherwise the child class 
> gets loaded by MutableURLClassLoader and the parent class by 
> ChildFirstURLClassLoader, which causes a ClassCastException as well.
>  
> {code:java}
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 3) (spark35-m.c.mycomp-dev-test.internal executor 2): 
> java.lang.ClassCastException: class 
> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to 
> class org.apache.iceberg.Table 
> (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed 
> module of loader org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053; 
> org.apache.iceberg.Table is in unnamed module of loader 
> org.apache.spark.util.ChildFirstURLClassLoader @4b18b943)
>     at 
> org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>     at 
> org.apache.iceberg.spark.source.RowDataReader.<init>(RowDataReader.java:50)
>     at 
> org.apache.iceberg.spark.source.SparkRowReaderFactory.createReader(SparkRowReaderFactory.java:45)
>     at 
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84)
>     at 
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>     at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     

[jira] [Comment Edited] (SPARK-46762) Spark Connect 3.5 Classloading issue with external jar

2024-02-22 Thread nirav patel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819830#comment-17819830
 ] 

nirav patel edited comment on SPARK-46762 at 2/22/24 10:11 PM:
---

I did some more digging into executor classloading and heap dump. Here's what I 
found:

with spark 3.4 and iceberg-spark-runtime-3.4_2.12-1.3.1.jar (Case where issue 
is not reproducible, ie everything works) I only see one instance of 
`org.apache.iceberg.Table` loaded

 

however with spark 3.5 and iceberg-spark-runtime-3.5_2.12-1.4.3.jar I see two 
instances of `org.apache.iceberg.Table` loaded:

here's stdout from executor on which I applied `verbose:class` :

 

 
{code:java}
[47.556s][info][class,load  ] org.apache.iceberg.Table source: 
file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1708632053092_0004/container_1708632053092_0004_01_01/org.apache.iceberg_iceberg-spark-runtime-3.5_2.12-1.4.3.jar
 
[45.415s][info][class,load] org.apache.iceberg.Table source: 
file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1708632053092_0004/container_1708632053092_0004_01_01/org.apache.iceberg_iceberg-spark-runtime-3.5_2.12-1.4.3.jar
{code}
I also confirmed above via heap dump. see attached screenshots.

Same class is loaded twice> once by ChildFirstUrlClassLoader and once by 
MutableURLClassLoader 


was (Author: tenstriker):
I did some more digging into executor classloading and heap dump. Here's what I 
found:



with spark 3.4 and iceberg-spark-runtime-3.4_2.12-1.3.1.jar (Case where issue 
is not reproducible, ie everything works) I only see one instance of 
`org.apache.iceberg.Table` loaded

 

however with spark 3.5 and iceberg-spark-runtime-3.5_2.12-1.4.3.jar I see two 
instances of `org.apache.iceberg.Table` loaded:

here's stdout from executor on which I applied `verbose:class` :

 

 
{code:java}
[47.556s][info][class,load  ] org.apache.iceberg.Table source: 
file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1708632053092_0004/container_1708632053092_0004_01_01/org.apache.iceberg_iceberg-spark-runtime-3.5_2.12-1.4.3.jar
 
[45.415s][info][class,load] org.apache.iceberg.Table source: 
file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1708632053092_0004/container_1708632053092_0004_01_01/org.apache.iceberg_iceberg-spark-runtime-3.5_2.12-1.4.3.jar
{code}
I also confirmed above via heap dump. see attached screenshots.

Same class is loaded twice> once by ChildFirstUrlClassLoader and once by 
MutableURLClassLoader !Screenshot 2024-02-22 at 2.04.49 PM.png!

 

> Spark Connect 3.5 Classloading issue with external jar
> --
>
> Key: SPARK-46762
> URL: https://issues.apache.org/jira/browse/SPARK-46762
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: nirav patel
>Priority: Major
> Attachments: Screenshot 2024-02-22 at 2.04.37 PM.png, Screenshot 
> 2024-02-22 at 2.04.49 PM.png
>
>
> We are seeing the following `java.lang.ClassCastException` error in Spark 
> executors when using Spark Connect 3.5 with an external Spark SQL catalog jar, 
> iceberg-spark-runtime-3.5_2.12-1.4.3.jar.
> We also set "spark.executor.userClassPathFirst=true"; otherwise the child class 
> gets loaded by MutableURLClassLoader and the parent class by 
> ChildFirstURLClassLoader, which causes a ClassCastException as well.
>  
> {code:java}
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 3) (spark35-m.c.mycomp-dev-test.internal executor 2): 
> java.lang.ClassCastException: class 
> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to 
> class org.apache.iceberg.Table 
> (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed 
> module of loader org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053; 
> org.apache.iceberg.Table is in unnamed module of loader 
> org.apache.spark.util.ChildFirstURLClassLoader @4b18b943)
>     at 
> org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>     at 
> org.apache.iceberg.spark.source.RowDataReader.<init>(RowDataReader.java:50)
>     at 
> org.apache.iceberg.spark.source.SparkRowReaderFactory.createReader(SparkRowReaderFactory.java:45)
>     at 
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84)
>     at 
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>     at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>     at 

[jira] [Updated] (SPARK-46762) Spark Connect 3.5 Classloading issue with external jar

2024-02-22 Thread nirav patel (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46762?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

nirav patel updated SPARK-46762:

Attachment: Screenshot 2024-02-22 at 2.04.49 PM.png
Screenshot 2024-02-22 at 2.04.37 PM.png

> Spark Connect 3.5 Classloading issue with external jar
> --
>
> Key: SPARK-46762
> URL: https://issues.apache.org/jira/browse/SPARK-46762
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: nirav patel
>Priority: Major
> Attachments: Screenshot 2024-02-22 at 2.04.37 PM.png, Screenshot 
> 2024-02-22 at 2.04.49 PM.png
>
>
> We are seeing the following `java.lang.ClassCastException` error in Spark 
> executors when using Spark Connect 3.5 with an external Spark SQL catalog jar, 
> iceberg-spark-runtime-3.5_2.12-1.4.3.jar.
> We also set "spark.executor.userClassPathFirst=true"; otherwise the child class 
> gets loaded by MutableURLClassLoader and the parent class by 
> ChildFirstURLClassLoader, which causes a ClassCastException as well.
>  
> {code:java}
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 3) (spark35-m.c.mycomp-dev-test.internal executor 2): 
> java.lang.ClassCastException: class 
> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to 
> class org.apache.iceberg.Table 
> (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed 
> module of loader org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053; 
> org.apache.iceberg.Table is in unnamed module of loader 
> org.apache.spark.util.ChildFirstURLClassLoader @4b18b943)
>     at 
> org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>     at 
> org.apache.iceberg.spark.source.RowDataReader.<init>(RowDataReader.java:50)
>     at 
> org.apache.iceberg.spark.source.SparkRowReaderFactory.createReader(SparkRowReaderFactory.java:45)
>     at 
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84)
>     at 
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>     at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
>     at 
> org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
>     at org.apache.spark.scheduler.Task.run(Task.scala:141)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>     at org.apach...{code}
>  
> `org.apache.iceberg.spark.source.SerializableTableWithSize` is a child of 
> `org.apache.iceberg.Table` and they are both in only one jar  
> `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` 
> We verified that there's only one jar of 
> `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` loaded when spark-connect server 
> is started. 
> Looking more into the error, it seems the classloader itself is instantiated 
> multiple times somewhere. I can see two instances: 
> org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053 and 
> org.apache.spark.util.ChildFirstURLClassLoader @4b18b943 
>  
> *Affected version:*
> Spark 3.5 and spark-connect_2.12:3.5.0
>  
> *Not affected versions and variations:*
> Spark 3.4 and spark-connect_2.12:3.4.0 works fine with the external jar
> Also works with plain Spark 3.5 spark-submit directly (i.e. without using 
> Spark Connect 3.5)
>  
> Issue has been open with Iceberg as well: 
> [https://github.com/apache/iceberg/issues/8978]
> And been discussed in dev@org.apache.iceberg: 
> [https://lists.apache.org/thread/5q1pdqqrd1h06hgs8vx9ztt60z5yv8n1]
>  
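
A standalone sketch (not Spark code) of why two classloaders loading the same class produce this ClassCastException: each loader defines its own `Class` object, so a cast across them fails, mirroring the ChildFirstURLClassLoader/MutableURLClassLoader situation above. The jar path and class name are placeholders.

{code:scala}
// Demonstration with plain URLClassLoaders; requires a placeholder jar at
// /tmp/table.jar containing com.example.Table with a no-arg constructor.
import java.net.{URL, URLClassLoader}

object DuplicateLoaderDemo {
  def main(args: Array[String]): Unit = {
    val jar = new URL("file:/tmp/table.jar")            // placeholder jar
    val loaderA = new URLClassLoader(Array(jar), null)  // parent = null: no delegation
    val loaderB = new URLClassLoader(Array(jar), null)

    val classA = Class.forName("com.example.Table", true, loaderA)
    val classB = Class.forName("com.example.Table", true, loaderB)

    println(classA == classB)  // false: two distinct Class objects for the same bytes
    val instance = classA.getDeclaredConstructor().newInstance()
    classB.cast(instance)      // throws ClassCastException, as in the stack trace above
  }
}
{code}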






[jira] [Commented] (SPARK-46762) Spark Connect 3.5 Classloading issue with external jar

2024-02-22 Thread nirav patel (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-46762?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819830#comment-17819830
 ] 

nirav patel commented on SPARK-46762:
-

I did some more digging into executor classloading and a heap dump. Here's what
I found:

With Spark 3.4 and iceberg-spark-runtime-3.4_2.12-1.3.1.jar (the case where the
issue is not reproducible, i.e. everything works), I only see one instance of
`org.apache.iceberg.Table` loaded.

However, with Spark 3.5 and iceberg-spark-runtime-3.5_2.12-1.4.3.jar, I see two
instances of `org.apache.iceberg.Table` loaded.

Here's the stdout from the executor on which I applied `-verbose:class`:

{code:java}
[47.556s][info][class,load  ] org.apache.iceberg.Table source: 
file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1708632053092_0004/container_1708632053092_0004_01_01/org.apache.iceberg_iceberg-spark-runtime-3.5_2.12-1.4.3.jar
 
[45.415s][info][class,load] org.apache.iceberg.Table source: 
file:/hadoop/yarn/nm-local-dir/usercache/root/appcache/application_1708632053092_0004/container_1708632053092_0004_01_01/org.apache.iceberg_iceberg-spark-runtime-3.5_2.12-1.4.3.jar
{code}
I also confirmed the above via a heap dump; see the attached screenshots.

The same class is loaded twice: once by ChildFirstURLClassLoader and once by
MutableURLClassLoader. !Screenshot 2024-02-22 at 2.04.49 PM.png!

 

> Spark Connect 3.5 Classloading issue with external jar
> --
>
> Key: SPARK-46762
> URL: https://issues.apache.org/jira/browse/SPARK-46762
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: nirav patel
>Priority: Major
>
> We are seeing the following `java.lang.ClassCastException` error in Spark 
> executors when using Spark Connect 3.5 with an external Spark SQL catalog jar, 
> iceberg-spark-runtime-3.5_2.12-1.4.3.jar.
> We also set "spark.executor.userClassPathFirst=true"; otherwise the child class 
> gets loaded by MutableURLClassLoader and the parent class by 
> ChildFirstURLClassLoader, which causes a ClassCastException as well.
>  
> {code:java}
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkException) Job aborted due to stage failure: Task 0 in 
> stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 
> (TID 3) (spark35-m.c.mycomp-dev-test.internal executor 2): 
> java.lang.ClassCastException: class 
> org.apache.iceberg.spark.source.SerializableTableWithSize cannot be cast to 
> class org.apache.iceberg.Table 
> (org.apache.iceberg.spark.source.SerializableTableWithSize is in unnamed 
> module of loader org.apache.spark.util.ChildFirstURLClassLoader @5e7ae053; 
> org.apache.iceberg.Table is in unnamed module of loader 
> org.apache.spark.util.ChildFirstURLClassLoader @4b18b943)
>     at 
> org.apache.iceberg.spark.source.SparkInputPartition.table(SparkInputPartition.java:88)
>     at 
> org.apache.iceberg.spark.source.RowDataReader.<init>(RowDataReader.java:50)
>     at 
> org.apache.iceberg.spark.source.SparkRowReaderFactory.createReader(SparkRowReaderFactory.java:45)
>     at 
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.advanceToNextIter(DataSourceRDD.scala:84)
>     at 
> org.apache.spark.sql.execution.datasources.v2.DataSourceRDD$$anon$1.hasNext(DataSourceRDD.scala:63)
>     at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at scala.collection.Iterator$$anon$10.hasNext(Iterator.scala:460)
>     at 
> org.apache.spark.sql.execution.SparkPlan.$anonfun$getByteArrayRdd$1(SparkPlan.scala:388)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2(RDD.scala:890)
>     at 
> org.apache.spark.rdd.RDD.$anonfun$mapPartitionsInternal$2$adapted(RDD.scala:890)
>     at 
> org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:52)
>     at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:364)
>     at org.apache.spark.rdd.RDD.iterator(RDD.scala:328)
>     at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:93)
>     at 
> org.apache.spark.TaskContext.runTaskWithListeners(TaskContext.scala:161)
>     at org.apache.spark.scheduler.Task.run(Task.scala:141)
>     at 
> org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$4(Executor.scala:620)
>     at org.apach...{code}
>  
> `org.apache.iceberg.spark.source.SerializableTableWithSize` is a child of 
> `org.apache.iceberg.Table` and they are both in only one jar  
> `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` 
> We verified that there's only one jar of 
> `iceberg-spark-runtime-3.5_2.12-1.4.3.jar` loaded when spark-connect server 
> is started. 
> Looking more into Error it seems classloader itself is instantiated multiple 
> times 

[jira] [Updated] (SPARK-47137) Add getAll to spark.conf for feature parity with Scala

2024-02-22 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-47137:
--
Summary: Add getAll to spark.conf for feature parity with Scala  (was: Add 
getAll for spark.conf for feature parity with Scala)

> Add getAll to spark.conf for feature parity with Scala
> --
>
> Key: SPARK-47137
> URL: https://issues.apache.org/jira/browse/SPARK-47137
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>







[jira] [Updated] (SPARK-47137) Add getAll for spark.conf for feature parity with Scala

2024-02-22 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin updated SPARK-47137:
--
Summary: Add getAll for spark.conf for feature parity with Scala  (was: Add 
getAll for pyspark.sql.conf for feature parity with Scala)

> Add getAll for spark.conf for feature parity with Scala
> ---
>
> Key: SPARK-47137
> URL: https://issues.apache.org/jira/browse/SPARK-47137
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Takuya Ueshin
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47137) Add getAll for pyspark.sql.conf for feature parity with Scala

2024-02-22 Thread Takuya Ueshin (Jira)
Takuya Ueshin created SPARK-47137:
-

 Summary: Add getAll for pyspark.sql.conf for feature parity with 
Scala
 Key: SPARK-47137
 URL: https://issues.apache.org/jira/browse/SPARK-47137
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Takuya Ueshin






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47136) Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly

2024-02-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47136:
--
Summary: Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` 
properly  (was: Use `ivyPath` param of `MavenUtils.loadIvySettings` in 
`MavenUtilsSuite`)

> Fix `MavenUtilsSuite` to use `MavenUtils.resolveMavenCoordinates` properly
> --
>
> Key: SPARK-47136
> URL: https://issues.apache.org/jira/browse/SPARK-47136
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47136) Use `ivyPath` param of `MavenUtils.loadIvySettings` in `MavenUtilsSuite`

2024-02-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47136:
--
Component/s: Tests

> Use `ivyPath` param of `MavenUtils.loadIvySettings` in `MavenUtilsSuite`
> 
>
> Key: SPARK-47136
> URL: https://issues.apache.org/jira/browse/SPARK-47136
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Tests
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47136) Use `ivyPath` param of `MavenUtils.loadIvySettings` in `MavenUtilsSuite`

2024-02-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47136:
---
Labels: pull-request-available  (was: )

> Use `ivyPath` param of `MavenUtils.loadIvySettings` in `MavenUtilsSuite`
> 
>
> Key: SPARK-47136
> URL: https://issues.apache.org/jira/browse/SPARK-47136
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47136) Use `ivyPath` param of `MavenUtils.loadIvySettings` in `MavenUtilsSuite`

2024-02-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47136?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47136:
--
Summary: Use `ivyPath` param of `MavenUtils.loadIvySettings` in 
`MavenUtilsSuite`  (was: Use `ivyPath` parameter of 
`MavenUtils.loadIvySettings` in `MavenUtilsSuite`)

> Use `ivyPath` param of `MavenUtils.loadIvySettings` in `MavenUtilsSuite`
> 
>
> Key: SPARK-47136
> URL: https://issues.apache.org/jira/browse/SPARK-47136
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47136) Use `ivyPath` parameter of `MavenUtils.loadIvySettings` in `MavenUtilsSuite`

2024-02-22 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47136:
-

 Summary: Use `ivyPath` parameter of `MavenUtils.loadIvySettings` 
in `MavenUtilsSuite`
 Key: SPARK-47136
 URL: https://issues.apache.org/jira/browse/SPARK-47136
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47069) Introduce `spark.profile.show/dump` for SparkSession-based profiling

2024-02-22 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-47069.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45129
[https://github.com/apache/spark/pull/45129]

> Introduce `spark.profile.show/dump` for SparkSession-based profiling
> 
>
> Key: SPARK-47069
> URL: https://issues.apache.org/jira/browse/SPARK-47069
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Introduce `spark.profile.show/dump` for SparkSession-based profiling



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Bruce Robbins (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819789#comment-17819789
 ] 

Bruce Robbins commented on SPARK-47134:
---

Oddly, I cannot reproduce on either 3.4.1 or 3.5.0.

Also, my 3.4.1 plan doesn't look like your 3.4.1 plan: My plan uses {{sum}}, 
your plan uses {{decimalsum}}. I can't find where {{decimalsum}} comes from in 
the code base, but maybe I am not looking hard enough.
{noformat}
scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> ds.createOrReplaceTempView("t")

scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
++
|  ct|
++
| 9508.00|
|13879.00|
++

scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").explain
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- Sort [ct#19 ASC NULLS FIRST], true, 0
   +- Exchange rangepartitioning(ct#19 ASC NULLS FIRST, 200), 
ENSURE_REQUIREMENTS, [plan_id=68]
  +- HashAggregate(keys=[_1#2], functions=[sum(1.00)])
 +- Exchange hashpartitioning(_1#2, 200), ENSURE_REQUIREMENTS, 
[plan_id=65]
+- HashAggregate(keys=[_1#2], 
functions=[partial_sum(1.00)])
   +- LocalTableScan [_1#2]

scala> sql("select version()").show(false)
+--+
|version() |
+--+
|3.4.1 6b1ff22dde1ead51cbf370be6e48a802daae58b6|
+--+

scala> 
{noformat}

> Unexpected nulls when casting decimal values in specific cases
> --
>
> Key: SPARK-47134
> URL: https://issues.apache.org/jira/browse/SPARK-47134
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Dylan Walker
>Priority: Major
> Attachments: 321queryplan.txt, 341queryplan.txt
>
>
> In specific cases, casting decimal values can result in `null` values where 
> no overflow exists.
> The cases appear very specific, and I don't have the depth of knowledge to 
> generalize this issue, so here is a simple spark-shell reproduction:
> *Setup:*
> {code:scala}
> scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", 
> x)).toDS
> ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]
> scala> ds.createOrReplaceTempView("t")
> {code}
>  
> *Spark 3.2.1 behaviour (correct):*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
> FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> ++
> |  ct|
> ++
> | 9508.00|
> |13879.00|
> ++
> {code}
> *Spark 3.4.1 / Spark 3.5.0 behaviour:*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
> FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> +---+
> | ct|
> +---+
> |   null|
> |9508.00|
> +---+
> {code}
> This is fairly delicate:
>  - removing the {{ORDER BY}} clause produces the correct result
>  - removing the {{CAST}} produces the correct result
>  - changing the number of 0s in the argument to {{SUM}} produces the correct 
> result
>  - setting {{spark.ansi.enabled}} to {{true}} produces the correct result 
> (and does not throw an error)
> Also, removing the {{ORDER BY}}, but writing {{ds}} to a parquet will also 
> result in the unexpected nulls.
> Please let me know if you need additional information.
> We are also interested in understanding whether setting 
> {{spark.ansi.enabled}} can be considered a reliable workaround to this issue 
> prior to a fix being released, if possible.
> Text files that include {{explain()}} output attached.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819779#comment-17819779
 ] 

Xinrong Meng edited comment on SPARK-47132 at 2/22/24 7:21 PM:
---

[~wunderalbert] would you double check if you set up your Jira account 
correctly? I somehow couldn't assign the ticket to you.

 

!image-2024-02-22-11-21-30-460.png!


was (Author: xinrongm):
[~wunderalbert] would you double check if you set up your Jira account 
correctly? I somehow couldn't assign the ticket to you.

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: image-2024-02-22-11-18-02-429.png, 
> image-2024-02-22-11-21-30-460.png
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197
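For comparison, the Scala Dataset API draws the same line the corrected
docstring describes: the no-argument head() returns a single row, while
head(n) always returns a collection, even when n is 1. A minimal spark-shell
sketch (assuming an active spark session):

{code:scala}
import spark.implicits._

val df = Seq(1, 2, 3).toDF("value")

val single: org.apache.spark.sql.Row         = df.head()   // one Row
val asArray: Array[org.apache.spark.sql.Row] = df.head(1)  // Array of length 1
{code}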



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819779#comment-17819779
 ] 

Xinrong Meng commented on SPARK-47132:
--

[~wunderalbert] would you double check if you set up your Jira account 
correctly? I somehow couldn't assign the ticket to you.

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: image-2024-02-22-11-18-02-429.png
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819780#comment-17819780
 ] 

Xinrong Meng commented on SPARK-47132:
--

Resolved by https://github.com/apache/spark/pull/45197.

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: image-2024-02-22-11-18-02-429.png
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-47132:
-
Attachment: image-2024-02-22-11-18-02-429.png

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
> Attachments: image-2024-02-22-11-18-02-429.png
>
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-47132:
-
Issue Type: Documentation  (was: Bug)

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-47132:
-
Affects Version/s: 4.0.0
   (was: 3.5.0)

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Xinrong Meng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47132?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819777#comment-17819777
 ] 

Xinrong Meng commented on SPARK-47132:
--

I modified the ticket to Documentation (from Bug) and 4.0.0 (from 3.5.0).

> Mistake in Docstring for Pyspark's Dataframe.head()
> ---
>
> Key: SPARK-47132
> URL: https://issues.apache.org/jira/browse/SPARK-47132
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Albert Ziegler
>Priority: Trivial
>  Labels: pull-request-available
>   Original Estimate: 0.5h
>  Remaining Estimate: 0.5h
>
> The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
> list of rows) iff n == 1, but that's incorrect.
> Type hints, example, and implementation show that the difference between row 
> or list of rows lies in whether n is supplied at all -- if it isn't, 
> {{head()}} returns a {{{}Row{}}}, if it is, even if it is 1, {{head(n)}} 
> returns a list.
>  
> A suggestion to fix is here: https://github.com/apache/spark/pull/45197



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-47135) Implement error classes for Kafka data loss exceptions

2024-02-22 Thread B. Micheal Okutubo (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17819771#comment-17819771
 ] 

B. Micheal Okutubo commented on SPARK-47135:


I'm working on this. Will send PR soon.

> Implement error classes for Kafka data loss exceptions 
> ---
>
> Key: SPARK-47135
> URL: https://issues.apache.org/jira/browse/SPARK-47135
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: B. Micheal Okutubo
>Priority: Major
>
> In the Kafka connector code, several code paths throw the Java 
> *IllegalStateException* to report data loss while reading from Kafka. We 
> want to properly classify those exceptions using the new error framework. 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47135) Implement error classes for Kafka data loss exceptions

2024-02-22 Thread B. Micheal Okutubo (Jira)
B. Micheal Okutubo created SPARK-47135:
--

 Summary: Implement error classes for Kafka data loss exceptions 
 Key: SPARK-47135
 URL: https://issues.apache.org/jira/browse/SPARK-47135
 Project: Spark
  Issue Type: Task
  Components: Structured Streaming
Affects Versions: 4.0.0
Reporter: B. Micheal Okutubo


In the Kafka connector code, several code paths throw the Java 
*IllegalStateException* to report data loss while reading from Kafka. We want 
to properly classify those exceptions using the new error framework. 
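As a rough illustration of the direction (a hypothetical sketch: the error
class name "KAFKA_DATA_LOSS" and its message parameters below are placeholders,
not the names that will ship), a data-loss call site would move from a bare
IllegalStateException to an error-class-based exception along these lines:

{code:scala}
import org.apache.spark.SparkException

// Hypothetical replacement for `throw new IllegalStateException(...)` at a
// Kafka data-loss call site. Error class and parameters are placeholders.
def reportDataLoss(topicPartition: String, startOffset: Long, endOffset: Long): Nothing = {
  throw new SparkException(
    errorClass = "KAFKA_DATA_LOSS",
    messageParameters = Map(
      "topicPartition" -> topicPartition,
      "startOffset"    -> startOffset.toString,
      "endOffset"      -> endOffset.toString),
    cause = null)
}
{code}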



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47001) Pushdown Verification in Optimizer.scala should support changed data types

2024-02-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47001?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47001:
---
Labels: pull-request-available  (was: )

> Pushdown Verification in Optimizer.scala should support changed data types
> --
>
> Key: SPARK-47001
> URL: https://issues.apache.org/jira/browse/SPARK-47001
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>  Labels: pull-request-available
>
> When pushing a filter down into a union, the data type may not match exactly 
> if the filter was constructed using the child DataFrame's column reference. 
> This is because the union's output is updated with a StructType merge of its 
> children, which can turn a non-nullable column nullable. It is still the same 
> column despite the different nullability, so the filter should be safe to 
> push down. As it currently stands, we get an exception.
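To make the scenario concrete, here is a minimal sketch of the query shape
being described (hypothetical column name, assuming an active spark session):
a filter built from a child DataFrame's column reference, applied on top of a
union whose merged schema has widened that column to nullable.

{code:scala}
import spark.implicits._

// Child 1: "id" is a non-nullable int; Child 2: same column, but nullable.
val left  = Seq(1, 2, 3).toDF("id")
val right = Seq(Some(4), Option.empty[Int]).toDF("id")

// The union's output schema merges the children, turning "id" nullable.
val unioned = left.union(right)

// Filter built from the child DataFrame's column reference. It refers to the
// same column despite the nullability difference, so it should be safe to
// push below the union; per this ticket the pushdown verification currently
// rejects it instead.
val filtered = unioned.filter(left("id") > 1)
// filtered.explain(true)  // inspect where the Filter ends up
{code}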



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47133) java.lang.NullPointerException: Missing SslContextFactory when accessing Worker WebUI from Master as reverse proxy with SSL enabled

2024-02-22 Thread Filippo Monari (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Filippo Monari updated SPARK-47133:
---
Environment: 
* We are running Spark in stand-alone mode, on Kubernetes.
 * The containers are based on Debian 11 (minideb)
 * The Spark version is 3.5

  was:
* We are running Spark in stand-alone mode, on Kubernetes.
 * The containers are based on Debian 11 (minideb)
 * The Spark version is 3.5

Please do not hesitate to ask further information if needed.


> java.lang.NullPointerException: Missing SslContextFactory when accessing 
> Worker WebUI from Master as reverse proxy with SSL enabled
> ---
>
> Key: SPARK-47133
> URL: https://issues.apache.org/jira/browse/SPARK-47133
> Project: Spark
>  Issue Type: Question
>  Components: Web UI
>Affects Versions: 3.5.0
> Environment: * We are running Spark in stand-alone mode, on 
> Kubernetes.
>  * The containers are based on Debian 11 (minideb)
>  * The Spark version is 3.5
>Reporter: Filippo Monari
>Priority: Major
>
> Hi,
>  
> We are encountering the error described below.
> If SSL/TLS is enabled on both the Master and the Worker, it is not possible 
> to access the Worker's WebUI through the Master configured as a reverse 
> proxy. The returned error is the following.
> {code:java}
> HTTP ERROR 500 java.lang.NullPointerException: Missing SslContextFactory
> URI:/proxy/worker-20240222171308-10.113.3.1-34959
> STATUS:500
> MESSAGE:java.lang.NullPointerException: Missing SslContextFactory
> SERVLET:org.apache.spark.ui.JettyUtils$$anon$3-7d068d54
> CAUSED BY:java.lang.NullPointerException: Missing SslContextFactory
> Caused by:java.lang.NullPointerException: Missing SslContextFactory
>   at java.base/java.util.Objects.requireNonNull(Objects.java:235)
>   at 
> org.sparkproject.jetty.io.ssl.SslClientConnectionFactory.<init>(SslClientConnectionFactory.java:57)
>   at 
> org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1273)
>   at 
> org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1279)
>   at 
> org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:209)
>   at 
> org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:215)
>   at 
> org.sparkproject.jetty.client.HttpDestination.<init>(HttpDestination.java:100)
>   at 
> org.sparkproject.jetty.client.PoolingHttpDestination.<init>(PoolingHttpDestination.java:25)
>   at 
> org.sparkproject.jetty.client.http.HttpDestinationOverHTTP.<init>(HttpDestinationOverHTTP.java:32)
>   at 
> org.sparkproject.jetty.client.http.HttpClientTransportOverHTTP.newHttpDestination(HttpClientTransportOverHTTP.java:54)
>   at 
> org.sparkproject.jetty.client.HttpClient.lambda$resolveDestination$0(HttpClient.java:597)
>   at 
> java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916)
>   at 
> org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:593)
>   at 
> org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:571)
>   at org.sparkproject.jetty.client.HttpClient.send(HttpClient.java:626)
>   at 
> org.sparkproject.jetty.client.HttpRequest.sendAsync(HttpRequest.java:780)
>   at org.sparkproject.jetty.client.HttpRequest.send(HttpRequest.java:767)
>   at 
> org.sparkproject.jetty.proxy.AbstractProxyServlet.sendProxyRequest(AbstractProxyServlet.java:618)
>   at 
> org.sparkproject.jetty.proxy.ProxyServlet.service(ProxyServlet.java:114)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
>   at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
>   at 
> org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
>   at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
>   at 
> org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
>   at 
> org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
>   at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
>   at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>   at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
>   at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>   at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
>   at 
> 

[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dylan Walker updated SPARK-47134:
-
Attachment: 321queryplan.txt

> Unexpected nulls when casting decimal values in specific cases
> --
>
> Key: SPARK-47134
> URL: https://issues.apache.org/jira/browse/SPARK-47134
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Dylan Walker
>Priority: Major
> Attachments: 321queryplan.txt, 341queryplan.txt
>
>
> In specific cases, casting decimal values can result in `null` values where 
> no overflow exists.
> The cases appear very specific, and I don't have the depth of knowledge to 
> generalize this issue, so here is a simple spark-shell reproduction:
> *Setup:*
> {code:scala}
> scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", 
> x)).toDS
> ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]
> scala> ds.createOrReplaceTempView("t")
> {code}
>  
> *Spark 3.2.1 behaviour (correct):*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
> FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> ++
> |  ct|
> ++
> | 9508.00|
> |13879.00|
> ++
> {code}
> *Spark 3.4.1 / Spark 3.5.0 behaviour:*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
> FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> +---+
> | ct|
> +---+
> |   null|
> |9508.00|
> +---+
> {code}
> This is fairly delicate:
>  - removing the {{ORDER BY}} clause produces the correct result
>  - removing the {{CAST}} produces the correct result
>  - changing the number of 0s in the argument to {{SUM}} produces the correct 
> result
>  - setting {{spark.ansi.enabled}} to {{true}} produces the correct result 
> (and does not throw an error)
> Also, removing the {{ORDER BY}}, but writing {{ds}} to a parquet will also 
> result in the unexpected nulls.
> Please let me know if you need additional information.
> We are also interested in understanding whether setting 
> {{spark.ansi.enabled}} can be considered a reliable workaround to this issue 
> prior to a fix being released, if possible.
> Text files that include {{explain()}} output attached.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dylan Walker updated SPARK-47134:
-
Attachment: 341queryplan.txt

> Unexpected nulls when casting decimal values in specific cases
> --
>
> Key: SPARK-47134
> URL: https://issues.apache.org/jira/browse/SPARK-47134
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Dylan Walker
>Priority: Major
> Attachments: 321queryplan.txt, 341queryplan.txt
>
>
> In specific cases, casting decimal values can result in `null` values where 
> no overflow exists.
> The cases appear very specific, and I don't have the depth of knowledge to 
> generalize this issue, so here is a simple spark-shell reproduction:
> *Setup:*
> {code:scala}
> scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", 
> x)).toDS
> ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]
> scala> ds.createOrReplaceTempView("t")
> {code}
>  
> *Spark 3.2.1 behaviour (correct):*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
> FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> ++
> |  ct|
> ++
> | 9508.00|
> |13879.00|
> ++
> {code}
> *Spark 3.4.1 / Spark 3.5.0 behaviour:*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
> FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> +---+
> | ct|
> +---+
> |   null|
> |9508.00|
> +---+
> {code}
> This is fairly delicate:
>  - removing the {{ORDER BY}} clause produces the correct result
>  - removing the {{CAST}} produces the correct result
>  - changing the number of 0s in the argument to {{SUM}} produces the correct 
> result
>  - setting {{spark.ansi.enabled}} to {{true}} produces the correct result 
> (and does not throw an error)
> Also, removing the {{ORDER BY}}, but writing {{ds}} to a parquet will also 
> result in the unexpected nulls.
> Please let me know if you need additional information.
> We are also interested in understanding whether setting 
> {{spark.ansi.enabled}} can be considered a reliable workaround to this issue 
> prior to a fix being released, if possible.
> Text files that include {{explain()}} output attached.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dylan Walker updated SPARK-47134:
-
Description: 
In specific cases, casting decimal values can result in `null` values where no 
overflow exists.

The cases appear very specific, and I don't have the depth of knowledge to 
generalize this issue, so here is a simple spark-shell reproduction:

*Setup:*

{code:scala}
scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> ds.createOrReplaceTempView("t")
{code}
 
*Spark 3.2.1 behaviour (correct):*

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
++
|  ct|
++
| 9508.00|
|13879.00|
++
{code}

*Spark 3.4.1 / Spark 3.5.0 behaviour:*

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+---+
| ct|
+---+
|   null|
|9508.00|
+---+
{code}

This is fairly delicate:
 - removing the {{ORDER BY}} clause produces the correct result
 - removing the {{CAST}} produces the correct result
 - changing the number of 0s in the argument to {{SUM}} produces the correct 
result
 - setting {{spark.ansi.enabled}} to {{true}} produces the correct result (and 
does not throw an error)

Also, removing the {{ORDER BY}}, but writing {{ds}} to a parquet will also 
result in the unexpected nulls.

Please let me know if you need additional information.

We are also interested in understanding whether setting {{spark.ansi.enabled}} 
can be considered a reliable workaround to this issue prior to a fix being 
released, if possible.
 

  was:
In specific cases, casting decimal values can result in `null` values where no 
overflow exists.

The cases appear very specific, and I don't have the depth of knowledge to 
generalize this issue, so here is a simple spark-shell reproduction:

*Setup:*

{code:scala}
scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> ds.createOrReplaceTempView("t")
{code}
 
*Spark 3.2.1 behaviour (correct):*

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
++
|  ct|
++
| 9508.00|
|13879.00|
++
{code}

*Spark 3.4.1 / Spark 3.5.0 behaviour:*

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+---+
| ct|
+---+
|   null|
|9508.00|
+---+
{code}

This is fairly delicate:
 - removing the `ORDER BY` clause produces the correct result
 - removing the `CAST` produces the correct result
 - changing the number of 0s in the argument to `SUM` produces the correct 
result
 - setting `spark.ansi.enabled` to `true` produces the correct result (and does 
not throw an error)

Also, removing the `ORDER BY`, but writing `ds` to a parquet will also result 
in the unexpected nulls.

Please let me know if you need additional information.

We are also interested in understanding whether setting `spark.ansi.enabled` 
can be considered a reliable workaround to this issue prior to a fix being 
released, if possible.
 


> Unexpected nulls when casting decimal values in specific cases
> --
>
> Key: SPARK-47134
> URL: https://issues.apache.org/jira/browse/SPARK-47134
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Dylan Walker
>Priority: Major
>
> In specific cases, casting decimal values can result in `null` values where 
> no overflow exists.
> The cases appear very specific, and I don't have the depth of knowledge to 
> generalize this issue, so here is a simple spark-shell reproduction:
> *Setup:*
> {code:scala}
> scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", 
> x)).toDS
> ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]
> scala> ds.createOrReplaceTempView("t")
> {code}
>  
> *Spark 3.2.1 behaviour (correct):*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
> FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> ++
> |  ct|
> ++
> | 9508.00|
> |13879.00|

[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dylan Walker updated SPARK-47134:
-
Description: 
In specific cases, casting decimal values can result in `null` values where no 
overflow exists.

The cases appear very specific, and I don't have the depth of knowledge to 
generalize this issue, so here is a simple spark-shell reproduction:

*Setup:*

{code:scala}
scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> ds.createOrReplaceTempView("t")
{code}
 
*Spark 3.2.1 behaviour (correct):*

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
++
|  ct|
++
| 9508.00|
|13879.00|
++
{code}

*Spark 3.4.1 / Spark 3.5.0 behaviour:*

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+---+
| ct|
+---+
|   null|
|9508.00|
+---+
{code}

This is fairly delicate:
 - removing the `ORDER BY` clause produces the correct result
 - removing the `CAST` produces the correct result
 - changing the number of 0s in the argument to `SUM` produces the correct 
result
 - setting `spark.ansi.enabled` to `true` produces the correct result (and does 
not throw an error)

Also, removing the `ORDER BY`, but writing `ds` to a parquet will also result 
in the unexpected nulls.

Please let me know if you need additional information.

We are also interested in understanding whether setting `spark.ansi.enabled` 
can be considered a reliable workaround to this issue prior to a fix being 
released, if possible.
 

  was:
In specific cases, casting decimal values can result in `null` values where no 
overflow exists.

The cases appear very specific, and I don't have the depth of knowledge to 
generalize this issue, so here is a simple spark-shell reproduction:

*Setup:*

{code:scala}
scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> ds.createOrReplaceTempView("t")
{code}
 
*Spark 3.2.1 behaviour (correct):*

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
++
|ct|

++
|9508.00|
|13879.00|

++
{code}

*Spark 3.4.1 / Spark 3.5.0 behaviour:*

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+---+
|ct|

+---+
|null|
|9508.00|

+---+
{code}

This is fairly delicate:
 - removing the `ORDER BY` clause produces the correct result
 - removing the `CAST` produces the correct result
 - changing the number of 0s in the argument to `SUM` produces the correct 
result
 - setting `spark.ansi.enabled` to `true` produces the correct result (and does 
not throw an error)

Also, removing the `ORDER BY`, but writing `ds` to a parquet will also result 
in the unexpected nulls.

Please let me know if you need additional information.

We are also interested in understanding whether setting `spark.ansi.enabled` 
can be considered a reliable workaround to this issue prior to a fix being 
released, if possible.
 


> Unexpected nulls when casting decimal values in specific cases
> --
>
> Key: SPARK-47134
> URL: https://issues.apache.org/jira/browse/SPARK-47134
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Dylan Walker
>Priority: Major
>
> In specific cases, casting decimal values can result in `null` values where 
> no overflow exists.
> The cases appear very specific, and I don't have the depth of knowledge to 
> generalize this issue, so here is a simple spark-shell reproduction:
> *Setup:*
> {code:scala}
> scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", 
> x)).toDS
> ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]
> scala> ds.createOrReplaceTempView("t")
> {code}
>  
> *Spark 3.2.1 behaviour (correct):*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
> FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> ++
> |  ct|
> ++
> | 9508.00|
> |13879.00|
> ++
> {code}
> *Spark 3.4.1 / Spark 3.5.0 

[jira] [Updated] (SPARK-47133) java.lang.NullPointerException: Missing SslContextFactory when accessing Worker WebUI from Master as reverse proxy with SSL enabled

2024-02-22 Thread Filippo Monari (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Filippo Monari updated SPARK-47133:
---
Summary: java.lang.NullPointerException: Missing SslContextFactory when 
accessing Worker WebUI from Master as reverse proxy with SSL enabled  (was: 
java.lang.NullPointerException: Missing SslContextFactory when accessing Worker 
WebUI from Master as reverse proxy)

> java.lang.NullPointerException: Missing SslContextFactory when accessing 
> Worker WebUI from Master as reverse proxy with SSL enabled
> ---
>
> Key: SPARK-47133
> URL: https://issues.apache.org/jira/browse/SPARK-47133
> Project: Spark
>  Issue Type: Question
>  Components: Web UI
>Affects Versions: 3.5.0
> Environment: * We are running Spark in stand-alone mode, on 
> Kubernetes.
>  * The containers are based on Debian 11 (minideb)
>  * The Spark version is 3.5
> Please do not hesitate to ask further information if needed.
>Reporter: Filippo Monari
>Priority: Major
>
> Hi,
>  
> We are encountering the error described below.
> If SSL/TLS is enabled on both the Master and the Worker, it is not possible 
> to access the Worker's WebUI through the Master configured as a reverse 
> proxy. The returned error is the following.
> {code:java}
> HTTP ERROR 500 java.lang.NullPointerException: Missing SslContextFactory
> URI:/proxy/worker-20240222171308-10.113.3.1-34959
> STATUS:500
> MESSAGE:java.lang.NullPointerException: Missing SslContextFactory
> SERVLET:org.apache.spark.ui.JettyUtils$$anon$3-7d068d54
> CAUSED BY:java.lang.NullPointerException: Missing SslContextFactory
> Caused by:java.lang.NullPointerException: Missing SslContextFactory
>   at java.base/java.util.Objects.requireNonNull(Objects.java:235)
>   at 
> org.sparkproject.jetty.io.ssl.SslClientConnectionFactory.<init>(SslClientConnectionFactory.java:57)
>   at 
> org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1273)
>   at 
> org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1279)
>   at 
> org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:209)
>   at 
> org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:215)
>   at 
> org.sparkproject.jetty.client.HttpDestination.<init>(HttpDestination.java:100)
>   at 
> org.sparkproject.jetty.client.PoolingHttpDestination.<init>(PoolingHttpDestination.java:25)
>   at 
> org.sparkproject.jetty.client.http.HttpDestinationOverHTTP.<init>(HttpDestinationOverHTTP.java:32)
>   at 
> org.sparkproject.jetty.client.http.HttpClientTransportOverHTTP.newHttpDestination(HttpClientTransportOverHTTP.java:54)
>   at 
> org.sparkproject.jetty.client.HttpClient.lambda$resolveDestination$0(HttpClient.java:597)
>   at 
> java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916)
>   at 
> org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:593)
>   at 
> org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:571)
>   at org.sparkproject.jetty.client.HttpClient.send(HttpClient.java:626)
>   at 
> org.sparkproject.jetty.client.HttpRequest.sendAsync(HttpRequest.java:780)
>   at org.sparkproject.jetty.client.HttpRequest.send(HttpRequest.java:767)
>   at 
> org.sparkproject.jetty.proxy.AbstractProxyServlet.sendProxyRequest(AbstractProxyServlet.java:618)
>   at 
> org.sparkproject.jetty.proxy.ProxyServlet.service(ProxyServlet.java:114)
>   at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
>   at 
> org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
>   at 
> org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
>   at 
> org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
>   at 
> org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
>   at 
> org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
>   at 
> org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
>   at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
>   at 
> org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
>   at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
>   at 
> org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
>   at 
> org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
>   at 

[jira] [Updated] (SPARK-47133) java.lang.NullPointerException: Missing SslContextFactory when accessing Worker WebUI from Master as reverse proxy

2024-02-22 Thread Filippo Monari (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Filippo Monari updated SPARK-47133:
---
Description: 
Hi,

 

We are encountering the error described below.

If SSL/TLS is enabled on both the Master and the Worker, it is not possible to 
access the Worker's WebUI through the Master configured as a reverse proxy. The 
returned error is the following.
{code:java}
HTTP ERROR 500 java.lang.NullPointerException: Missing SslContextFactory

URI:/proxy/worker-20240222171308-10.113.3.1-34959
STATUS:500
MESSAGE:java.lang.NullPointerException: Missing SslContextFactory
SERVLET:org.apache.spark.ui.JettyUtils$$anon$3-7d068d54
CAUSED BY:java.lang.NullPointerException: Missing SslContextFactory

Caused by:java.lang.NullPointerException: Missing SslContextFactory
at java.base/java.util.Objects.requireNonNull(Objects.java:235)
at 
org.sparkproject.jetty.io.ssl.SslClientConnectionFactory.<init>(SslClientConnectionFactory.java:57)
at 
org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1273)
at 
org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1279)
at 
org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:209)
at 
org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:215)
at 
org.sparkproject.jetty.client.HttpDestination.<init>(HttpDestination.java:100)
at 
org.sparkproject.jetty.client.PoolingHttpDestination.<init>(PoolingHttpDestination.java:25)
at 
org.sparkproject.jetty.client.http.HttpDestinationOverHTTP.<init>(HttpDestinationOverHTTP.java:32)
at 
org.sparkproject.jetty.client.http.HttpClientTransportOverHTTP.newHttpDestination(HttpClientTransportOverHTTP.java:54)
at 
org.sparkproject.jetty.client.HttpClient.lambda$resolveDestination$0(HttpClient.java:597)
at 
java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916)
at 
org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:593)
at 
org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:571)
at org.sparkproject.jetty.client.HttpClient.send(HttpClient.java:626)
at 
org.sparkproject.jetty.client.HttpRequest.sendAsync(HttpRequest.java:780)
at org.sparkproject.jetty.client.HttpRequest.send(HttpRequest.java:767)
at 
org.sparkproject.jetty.proxy.AbstractProxyServlet.sendProxyRequest(AbstractProxyServlet.java:618)
at 
org.sparkproject.jetty.proxy.ProxyServlet.service(ProxyServlet.java:114)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
at 
org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
at 
org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
at 
org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
at 
org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at 
org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
at 
org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at 
org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
at 
org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
at 
org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772)
at 
org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
at 
org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.sparkproject.jetty.server.Server.handle(Server.java:516)
at 
org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
at 
org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
at 
org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:479)
at 
org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at 
org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at 
org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105)
at 

[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dylan Walker updated SPARK-47134:
-
Description: 
In specific cases, casting decimal values can result in `null` values where no 
overflow exists.

The cases appear very specific, and I don't have the depth of knowledge to 
generalize this issue, so here is a simple spark-shell reproduction:

*Setup:*

{code:scala}
scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> ds.createOrReplaceTempView("t")
{code}
 
*Spark 3.2.1 behaviour (correct):*

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
++
|ct|

++
|9508.00|
|13879.00|

++
{code}

*Spark 3.4.1 / Spark 3.5.0 behaviour:*

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+---+
|ct|

+---+
|null|
|9508.00|

+---+
{code}

This is fairly delicate:
 - removing the `ORDER BY` clause produces the correct result
 - removing the `CAST` produces the correct result
 - changing the number of 0s in the argument to `SUM` produces the correct 
result
 - setting `spark.ansi.enabled` to `true` produces the correct result (and does 
not throw an error)

Also, removing the `ORDER BY`, but writing `ds` to a parquet will also result 
in the unexpected nulls.

Please let me know if you need additional information.

We are also interested in understanding whether setting `spark.ansi.enabled` 
can be considered a reliable workaround to this issue prior to a fix being 
released, if possible.
 

  was:
In specific cases, casting decimal values can result in `null` values where no 
overflow exists.

 

The cases appear very specific, and I don't have the depth of knowledge to 
generalize this issue, so here is a simple spark-shell reproduction:

 

Setup:

{code:scala}
scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> ds.createOrReplaceTempView("t")
{code}
 

Spark 3.2.1 behaviour (correct):

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
++
|ct|

++
|9508.00|
|13879.00|

++
{code}

Spark 3.4.1 / Spark 3.5.0 behaviour:

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+---+
|ct|

+---+
|null|
|9508.00|

+---+
{code}

This is fairly delicate:
 - removing the `ORDER BY` clause produces the correct result
 - removing the `CAST` produces the correct result
 - changing the number of 0s in the argument to `SUM` produces the correct 
result
 - setting `spark.ansi.enabled` to `true` produces the correct result (and does 
not throw an error)

Also, removing the `ORDER BY`, but writing `ds` to a parquet will also result 
in the unexpected nulls.

Please let me know if you need additional information.

We are also interested in understanding whether setting `spark.ansi.enabled` 
can be considered a reliable workaround to this issue prior to a fix being 
released, if possible.
 


> Unexpected nulls when casting decimal values in specific cases
> --
>
> Key: SPARK-47134
> URL: https://issues.apache.org/jira/browse/SPARK-47134
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Dylan Walker
>Priority: Major
>
> In specific cases, casting decimal values can result in `null` values where 
> no overflow exists.
> The cases appear very specific, and I don't have the depth of knowledge to 
> generalize this issue, so here is a simple spark-shell reproduction:
> *Setup:*
> {code:scala}
> scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", 
> x)).toDS
> ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]
> scala> ds.createOrReplaceTempView("t")
> {code}
>  
> *Spark 3.2.1 behaviour (correct):*
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
> FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> ++
> |ct|
> ++
> |9508.00|
> |13879.00|
> ++
> {code}
> *Spark 3.4.1 / Spark 3.5.0 behaviour:*
> {code:scala}
> scala> spark.sql("select 

[jira] [Updated] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dylan Walker updated SPARK-47134:
-
Description: 
In specific cases, casting decimal values can result in `null` values where no 
overflow exists.

 

The cases appear very specific, and I don't have the depth of knowledge to 
generalize this issue, so here is a simple spark-shell reproduction:

 

Setup:

{code:scala}
scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> ds.createOrReplaceTempView("t")
{code}
 

Spark 3.2.1 behaviour (correct):

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
++
|ct|

++
|9508.00|
|13879.00|

++
{code}

Spark 3.4.1 / Spark 3.5.0 behaviour:

{code:scala}
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+---+
|ct|

+---+
|null|
|9508.00|

+---+
{code}

This is fairly delicate:
 - removing the `ORDER BY` clause produces the correct result
 - removing the `CAST` produces the correct result
 - changing the number of 0s in the argument to `SUM` produces the correct 
result
 - setting `spark.ansi.enabled` to `true` produces the correct result (and does 
not throw an error)

Removing the `ORDER BY` but writing `ds` to a parquet file also results in the 
unexpected nulls.

Please let me know if you need additional information.

We are also interested in understanding whether setting `spark.ansi.enabled` 
can be considered a reliable workaround to this issue prior to a fix being 
released, if possible.
 

  was:
In specific cases, casting decimal values can result in `null` values where no 
overflow exists.

 

The cases appear very specific, and I don't have the depth of knowledge to 
generalize this issue, so here is a simple spark-shell reproduction:

 

Setup:

```
scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> ds.createOrReplaceTempView("t")
```

Spark 3.2.1 behaviour (correct):

```
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
++
|  ct|
++
| 9508.00|
|13879.00|
++
```

Spark 3.4.1 / Spark 3.5.0 behaviour:

```
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+---+
| ct|
+---+
|   null|
|9508.00|
+---+
```

This is fairly delicate:

- removing the `ORDER BY` clause produces the correct result
- removing the `CAST` produces the correct result
- changing the number of 0s in the argument to `SUM` produces the correct result
- setting `spark.ansi.enabled` to `true` produces the correct result (and does 
not throw an error)

Also, removing the `ORDER BY`, but writing `ds` to a parquet will also result 
in the unexpected nulls.

Please let me know if you need additional information.

We are also interested in understanding whether setting `spark.ansi.enabled` 
can be considered a reliable workaround to this issue prior to a fix being 
released, if possible.
 


> Unexpected nulls when casting decimal values in specific cases
> --
>
> Key: SPARK-47134
> URL: https://issues.apache.org/jira/browse/SPARK-47134
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
>Reporter: Dylan Walker
>Priority: Major
>
> In specific cases, casting decimal values can result in `null` values where 
> no overflow exists.
>  
> The cases appear very specific, and I don't have the depth of knowledge to 
> generalize this issue, so here is a simple spark-shell reproduction:
>  
> Setup:
> {code:scala}
> scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
> ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]
> scala> ds.createOrReplaceTempView("t")
> {code}
>  
> Spark 3.2.1 behaviour (correct):
> {code:scala}
> scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
> FROM t GROUP BY `_1` ORDER BY ct ASC").show()
> ++
> |ct|
> ++
> |9508.00|
> |13879.00|
> ++
> {code}
> Spark 3.4.1 / Spark 3.5.0 behaviour:
> {code:scala}
> scala> spark.sql("select 

[jira] [Updated] (SPARK-47133) java.lang.NullPointerException: Missing SslContextFactory when accessing Worker WebUI from Master as reverse proxy

2024-02-22 Thread Filippo Monari (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Filippo Monari updated SPARK-47133:
---
Description: 
Hi,

 

We are encountering the error described below.

If SSL/TLS is enabled on both the Master and the Worker, it is not possible to access 
the WebUI of the latter from the former when it acts as a reverse proxy. The 
returned error is the following.
{code:java}
HTTP ERROR 500 java.lang.NullPointerException: Missing SslContextFactory

URI:/proxy/worker-20240222171308-10.113.3.1-34959
STATUS:500
MESSAGE:java.lang.NullPointerException: Missing SslContextFactory
SERVLET:org.apache.spark.ui.JettyUtils$$anon$3-7d068d54
CAUSED BY:java.lang.NullPointerException: Missing SslContextFactory

Caused by:java.lang.NullPointerException: Missing SslContextFactory
at java.base/java.util.Objects.requireNonNull(Objects.java:235)
at 
org.sparkproject.jetty.io.ssl.SslClientConnectionFactory.<init>(SslClientConnectionFactory.java:57)
at 
org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1273)
at 
org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1279)
at 
org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:209)
at 
org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:215)
at 
org.sparkproject.jetty.client.HttpDestination.<init>(HttpDestination.java:100)
at 
org.sparkproject.jetty.client.PoolingHttpDestination.<init>(PoolingHttpDestination.java:25)
at 
org.sparkproject.jetty.client.http.HttpDestinationOverHTTP.<init>(HttpDestinationOverHTTP.java:32)
at 
org.sparkproject.jetty.client.http.HttpClientTransportOverHTTP.newHttpDestination(HttpClientTransportOverHTTP.java:54)
at 
org.sparkproject.jetty.client.HttpClient.lambda$resolveDestination$0(HttpClient.java:597)
at 
java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916)
at 
org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:593)
at 
org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:571)
at org.sparkproject.jetty.client.HttpClient.send(HttpClient.java:626)
at 
org.sparkproject.jetty.client.HttpRequest.sendAsync(HttpRequest.java:780)
at org.sparkproject.jetty.client.HttpRequest.send(HttpRequest.java:767)
at 
org.sparkproject.jetty.proxy.AbstractProxyServlet.sendProxyRequest(AbstractProxyServlet.java:618)
at 
org.sparkproject.jetty.proxy.ProxyServlet.service(ProxyServlet.java:114)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
at 
org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
at 
org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
at 
org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
at 
org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at 
org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
at 
org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at 
org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
at 
org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
at 
org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772)
at 
org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
at 
org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.sparkproject.jetty.server.Server.handle(Server.java:516)
at 
org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
at 
org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
at 
org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:479)
at 
org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at 
org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at 
org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105)
at 

[jira] [Created] (SPARK-47134) Unexpected nulls when casting decimal values in specific cases

2024-02-22 Thread Dylan Walker (Jira)
Dylan Walker created SPARK-47134:


 Summary: Unexpected nulls when casting decimal values in specific 
cases
 Key: SPARK-47134
 URL: https://issues.apache.org/jira/browse/SPARK-47134
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0, 3.4.1
Reporter: Dylan Walker


In specific cases, casting decimal values can result in `null` values where no 
overflow exists.

 

The cases appear very specific, and I don't have the depth of knowledge to 
generalize this issue, so here is a simple spark-shell reproduction:

 

Setup:

```
scala> val ds = 0.to(23386).map(x => if (x > 13878) ("A", x) else ("B", x)).toDS
ds: org.apache.spark.sql.Dataset[(String, Int)] = [_1: string, _2: int]

scala> ds.createOrReplaceTempView("t")
```

Spark 3.2.1 behaviour (correct):

```
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
++
|  ct|
++
| 9508.00|
|13879.00|
++
```

Spark 3.4.1 / Spark 3.5.0 behaviour:

```
scala> spark.sql("select CAST(SUM(1.00) AS DECIMAL(28,14)) as ct 
FROM t GROUP BY `_1` ORDER BY ct ASC").show()
+---+
| ct|
+---+
|   null|
|9508.00|
+---+
```

This is fairly delicate:

- removing the `ORDER BY` clause produces the correct result
- removing the `CAST` produces the correct result
- changing the number of 0s in the argument to `SUM` produces the correct result
- setting `spark.ansi.enabled` to `true` produces the correct result (and does 
not throw an error)

Removing the `ORDER BY` but writing `ds` to a parquet file also results in the 
unexpected nulls.

Please let me know if you need additional information.

We are also interested in understanding whether setting `spark.ansi.enabled` 
can be considered a reliable workaround to this issue prior to a fix being 
released, if possible.
 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47133) java.lang.NullPointerException: Missing SslContextFactory when accessing Worker WebUI from Master as reverse proxy

2024-02-22 Thread Filippo Monari (Jira)
Filippo Monari created SPARK-47133:
--

 Summary: java.lang.NullPointerException: Missing SslContextFactory 
when accessing Worker WebUI from Master as reverse proxy
 Key: SPARK-47133
 URL: https://issues.apache.org/jira/browse/SPARK-47133
 Project: Spark
  Issue Type: Question
  Components: Web UI
Affects Versions: 3.5.0
 Environment: * We are running Spark in stand-alone mode, on Kubernetes.
 * The containers are based on Debian 11 (minideb)
 * The Spark version is 3.5

Please do not hesitate to ask for further information if needed.
Reporter: Filippo Monari


Hi,

we are encountering the error described below.

If SSL/TLS is enabled on both the Master and the Worker, it is not possible to access 
the WebUI of the latter from the former when it acts as a reverse proxy. The 
returned error is the following.

 
{code:java}
HTTP ERROR 500 java.lang.NullPointerException: Missing SslContextFactory

URI:/proxy/worker-20240222171308-10.113.3.1-34959
STATUS:500
MESSAGE:java.lang.NullPointerException: Missing SslContextFactory
SERVLET:org.apache.spark.ui.JettyUtils$$anon$3-7d068d54
CAUSED BY:java.lang.NullPointerException: Missing SslContextFactory

Caused by:java.lang.NullPointerException: Missing SslContextFactory
at java.base/java.util.Objects.requireNonNull(Objects.java:235)
at 
org.sparkproject.jetty.io.ssl.SslClientConnectionFactory.<init>(SslClientConnectionFactory.java:57)
at 
org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1273)
at 
org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1279)
at 
org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:209)
at 
org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:215)
at 
org.sparkproject.jetty.client.HttpDestination.<init>(HttpDestination.java:100)
at 
org.sparkproject.jetty.client.PoolingHttpDestination.<init>(PoolingHttpDestination.java:25)
at 
org.sparkproject.jetty.client.http.HttpDestinationOverHTTP.<init>(HttpDestinationOverHTTP.java:32)
at 
org.sparkproject.jetty.client.http.HttpClientTransportOverHTTP.newHttpDestination(HttpClientTransportOverHTTP.java:54)
at 
org.sparkproject.jetty.client.HttpClient.lambda$resolveDestination$0(HttpClient.java:597)
at 
java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916)
at 
org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:593)
at 
org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:571)
at org.sparkproject.jetty.client.HttpClient.send(HttpClient.java:626)
at 
org.sparkproject.jetty.client.HttpRequest.sendAsync(HttpRequest.java:780)
at org.sparkproject.jetty.client.HttpRequest.send(HttpRequest.java:767)
at 
org.sparkproject.jetty.proxy.AbstractProxyServlet.sendProxyRequest(AbstractProxyServlet.java:618)
at 
org.sparkproject.jetty.proxy.ProxyServlet.service(ProxyServlet.java:114)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
at 
org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
at 
org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
at 
org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
at 
org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at 
org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
at 
org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at 
org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
at 
org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
at 
org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772)
at 
org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
at 
org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.sparkproject.jetty.server.Server.handle(Server.java:516)
at 
org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)

[jira] [Updated] (SPARK-47133) java.lang.NullPointerException: Missing SslContextFactory when accessing Worker WebUI from Master as reverse proxy

2024-02-22 Thread Filippo Monari (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Filippo Monari updated SPARK-47133:
---
Description: 
Hi,

we are encountering the error described below.

If SSL/TLS is enabled on both the Master and the Worker, it is not possible to access 
the WebUI of the latter from the former when it acts as a reverse proxy. The 
returned error is the following.
{code:java}
HTTP ERROR 500 java.lang.NullPointerException: Missing SslContextFactory

URI:/proxy/worker-20240222171308-10.113.3.1-34959
STATUS:500
MESSAGE:java.lang.NullPointerException: Missing SslContextFactory
SERVLET:org.apache.spark.ui.JettyUtils$$anon$3-7d068d54
CAUSED BY:java.lang.NullPointerException: Missing SslContextFactory

Caused by:java.lang.NullPointerException: Missing SslContextFactory
at java.base/java.util.Objects.requireNonNull(Objects.java:235)
at 
org.sparkproject.jetty.io.ssl.SslClientConnectionFactory.<init>(SslClientConnectionFactory.java:57)
at 
org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1273)
at 
org.sparkproject.jetty.client.HttpClient.newSslClientConnectionFactory(HttpClient.java:1279)
at 
org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:209)
at 
org.sparkproject.jetty.client.HttpDestination.newSslClientConnectionFactory(HttpDestination.java:215)
at 
org.sparkproject.jetty.client.HttpDestination.<init>(HttpDestination.java:100)
at 
org.sparkproject.jetty.client.PoolingHttpDestination.<init>(PoolingHttpDestination.java:25)
at 
org.sparkproject.jetty.client.http.HttpDestinationOverHTTP.<init>(HttpDestinationOverHTTP.java:32)
at 
org.sparkproject.jetty.client.http.HttpClientTransportOverHTTP.newHttpDestination(HttpClientTransportOverHTTP.java:54)
at 
org.sparkproject.jetty.client.HttpClient.lambda$resolveDestination$0(HttpClient.java:597)
at 
java.base/java.util.concurrent.ConcurrentHashMap.compute(ConcurrentHashMap.java:1916)
at 
org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:593)
at 
org.sparkproject.jetty.client.HttpClient.resolveDestination(HttpClient.java:571)
at org.sparkproject.jetty.client.HttpClient.send(HttpClient.java:626)
at 
org.sparkproject.jetty.client.HttpRequest.sendAsync(HttpRequest.java:780)
at org.sparkproject.jetty.client.HttpRequest.send(HttpRequest.java:767)
at 
org.sparkproject.jetty.proxy.AbstractProxyServlet.sendProxyRequest(AbstractProxyServlet.java:618)
at 
org.sparkproject.jetty.proxy.ProxyServlet.service(ProxyServlet.java:114)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:590)
at 
org.sparkproject.jetty.servlet.ServletHolder.handle(ServletHolder.java:799)
at 
org.sparkproject.jetty.servlet.ServletHandler$ChainEnd.doFilter(ServletHandler.java:1656)
at 
org.apache.spark.ui.HttpSecurityFilter.doFilter(HttpSecurityFilter.scala:95)
at 
org.sparkproject.jetty.servlet.FilterHolder.doFilter(FilterHolder.java:193)
at 
org.sparkproject.jetty.servlet.ServletHandler$Chain.doFilter(ServletHandler.java:1626)
at 
org.sparkproject.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:552)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextHandle(ScopedHandler.java:233)
at 
org.sparkproject.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1440)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:188)
at 
org.sparkproject.jetty.servlet.ServletHandler.doScope(ServletHandler.java:505)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.nextScope(ScopedHandler.java:186)
at 
org.sparkproject.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1355)
at 
org.sparkproject.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.sparkproject.jetty.server.handler.gzip.GzipHandler.handle(GzipHandler.java:772)
at 
org.sparkproject.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:234)
at 
org.sparkproject.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:127)
at org.sparkproject.jetty.server.Server.handle(Server.java:516)
at 
org.sparkproject.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:487)
at 
org.sparkproject.jetty.server.HttpChannel.dispatch(HttpChannel.java:732)
at 
org.sparkproject.jetty.server.HttpChannel.handle(HttpChannel.java:479)
at 
org.sparkproject.jetty.server.HttpConnection.onFillable(HttpConnection.java:277)
at 
org.sparkproject.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:311)
at 
org.sparkproject.jetty.io.FillInterest.fillable(FillInterest.java:105)
at 

[jira] [Assigned] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024

2024-02-22 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-43259:


Assignee: Mihailo Milosevic

> Assign a name to the error class _LEGACY_ERROR_TEMP_2024
> 
>
> Key: SPARK-43259
> URL: https://issues.apache.org/jira/browse/SPARK-43259
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Assignee: Mihailo Milosevic
>Priority: Minor
>  Labels: pull-request-available, starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't exist 
> yet. Check the exception fields by using {*}checkError(){*}. This function checks 
> only the valuable error fields and avoids depending on the error message text, so 
> tech editors can modify the error format in error-classes.json without worrying 
> about Spark's internal tests. Migrate other tests that might trigger the error 
> onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace it 
> with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is not 
> clear, and propose to users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]
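
A minimal sketch of the kind of test described above (this is not the test added by the linked PR; the error class TABLE_OR_VIEW_NOT_FOUND and its relationName parameter are only a stand-in example, not the name chosen for _LEGACY_ERROR_TEMP_2024):

{code:scala}
import org.apache.spark.sql.{AnalysisException, QueryTest}
import org.apache.spark.sql.test.SharedSparkSession

class RenamedErrorClassSuite extends QueryTest with SharedSparkSession {
  test("error is raised from user code and carries the expected fields") {
    checkError(
      exception = intercept[AnalysisException] {
        sql("SELECT * FROM nonexistent_table")
      },
      // Stand-in error class and parameters; use the name chosen for this ticket instead.
      errorClass = "TABLE_OR_VIEW_NOT_FOUND",
      parameters = Map("relationName" -> "`nonexistent_table`"))
  }
}
{code}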



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024

2024-02-22 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-43259.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45095
[https://github.com/apache/spark/pull/45095]

> Assign a name to the error class _LEGACY_ERROR_TEMP_2024
> 
>
> Key: SPARK-43259
> URL: https://issues.apache.org/jira/browse/SPARK-43259
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Assignee: Mihailo Milosevic
>Priority: Minor
>  Labels: pull-request-available, starter
> Fix For: 4.0.0
>
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the example in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't exist 
> yet. Check the exception fields by using {*}checkError(){*}. This function checks 
> only the valuable error fields and avoids depending on the error message text, so 
> tech editors can modify the error format in error-classes.json without worrying 
> about Spark's internal tests. Migrate other tests that might trigger the error 
> onto checkError().
> If you cannot reproduce the error from user space (using a SQL query), replace it 
> with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is not 
> clear, and propose to users how to avoid and fix such errors.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47131) contains, startswith, endswith

2024-02-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47131:
---
Labels: pull-request-available  (was: )

> contains, startswith, endswith
> --
>
> Key: SPARK-47131
> URL: https://issues.apache.org/jira/browse/SPARK-47131
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Refactored built-in string functions to enable collation support for: 
> {_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now 
> be able to use COLLATE within arguments for built-in string functions: 
> CONTAINS, STARTSWITH, ENDSWITH in Spark SQL queries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47102) Add COLLATION_ENABLED config flag

2024-02-22 Thread Mihailo Milosevic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihailo Milosevic updated SPARK-47102:
--
Description: 
*What changes were proposed in this pull request?*
This PR adds the COLLATION_ENABLED config to `SQLConf` and introduces the new error 
class `COLLATION_SUPPORT_DISABLED` to appropriately report an error on usage of a 
feature that is still under development.

*Why are the changes needed?*
We want to make collations configurable via a flag. These changes disable usage of 
the `collate` and `collation` functions, along with any `COLLATE` syntax, when the 
flag is set to false. By default, the flag is set to false.

  was:
### What changes were proposed in this pull request?
This PR adds the COLLATION_ENABLED config to `SQLConf` and introduces the new error 
class `COLLATION_SUPPORT_DISABLED` to appropriately report an error on usage of a 
feature that is still under development.

### Why are the changes needed?
We want to make collations configurable via a flag. These changes disable usage of 
the `collate` and `collation` functions, along with any `COLLATE` syntax, when the 
flag is set to false. By default, the flag is set to false.


> Add COLLATION_ENABLED config flag
> -
>
> Key: SPARK-47102
> URL: https://issues.apache.org/jira/browse/SPARK-47102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
>
> *What changes were proposed in this pull request?*
> This PR adds the COLLATION_ENABLED config to `SQLConf` and introduces the new error 
> class `COLLATION_SUPPORT_DISABLED` to appropriately report an error on usage of a 
> feature that is still under development.
> *Why are the changes needed?*
> We want to make collations configurable via a flag. These changes disable usage of 
> the `collate` and `collation` functions, along with any `COLLATE` syntax, when the 
> flag is set to false. By default, the flag is set to false.
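
For illustration only, the intended behaviour could be exercised from spark-shell roughly as sketched below. The configuration key is an assumption (the ticket only names the SQLConf entry COLLATION_ENABLED), and the failing call is expected to surface the new COLLATION_SUPPORT_DISABLED error class:

{code:scala}
// Assumption: the SQLConf entry COLLATION_ENABLED is exposed under a key such as
// "spark.sql.collation.enabled"; the real key may differ.
spark.conf.set("spark.sql.collation.enabled", "false")

// With the flag off, the collate/collation functions and COLLATE syntax are expected
// to fail with COLLATION_SUPPORT_DISABLED:
// spark.sql("SELECT collation('abc')").show()

spark.conf.set("spark.sql.collation.enabled", "true")
spark.sql("SELECT collation('abc')").show()   // allowed once the flag is enabled
{code}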



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47102) Add COLLATION_ENABLED config flag

2024-02-22 Thread Mihailo Milosevic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihailo Milosevic updated SPARK-47102:
--
Description: 
### What changes were proposed in this pull request?
This PR adds the COLLATION_ENABLED config to `SQLConf` and introduces the new error 
class `COLLATION_SUPPORT_DISABLED` to appropriately report an error on usage of a 
feature that is still under development.

### Why are the changes needed?
We want to make collations configurable via a flag. These changes disable usage of 
the `collate` and `collation` functions, along with any `COLLATE` syntax, when the 
flag is set to false. By default, the flag is set to false.

> Add COLLATION_ENABLED config flag
> -
>
> Key: SPARK-47102
> URL: https://issues.apache.org/jira/browse/SPARK-47102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
>
> ### What changes were proposed in this pull request?
> This PR adds the COLLATION_ENABLED config to `SQLConf` and introduces the new error 
> class `COLLATION_SUPPORT_DISABLED` to appropriately report an error on usage of a 
> feature that is still under development.
> ### Why are the changes needed?
> We want to make collations configurable via a flag. These changes disable usage of 
> the `collate` and `collation` functions, along with any `COLLATE` syntax, when the 
> flag is set to false. By default, the flag is set to false.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47102) Add COLLATION_ENABLED config flag

2024-02-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47102:
---
Labels: pull-request-available  (was: )

> Add COLLATION_ENABLED config flag
> -
>
> Key: SPARK-47102
> URL: https://issues.apache.org/jira/browse/SPARK-47102
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47132) Mistake in Docstring for Pyspark's Dataframe.head()

2024-02-22 Thread Albert Ziegler (Jira)
Albert Ziegler created SPARK-47132:
--

 Summary: Mistake in Docstring for Pyspark's Dataframe.head()
 Key: SPARK-47132
 URL: https://issues.apache.org/jira/browse/SPARK-47132
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Albert Ziegler


The docstring claims that {{head(n)}} would return a {{Row}} (rather than a 
list of rows) iff n == 1, but that's incorrect.

The type hints, the example, and the implementation all show that the difference 
between returning a row and returning a list of rows lies in whether n is supplied 
at all: if it isn't, {{head()}} returns a {{Row}}; if it is, even if it is 1, 
{{head(n)}} returns a list.

 

A suggestion to fix is here: https://github.com/apache/spark/pull/45197
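
For comparison, the Scala Dataset API behaves the same way; a small spark-shell illustration (not part of the PySpark docstring fix itself, with arbitrary sample data):

{code:scala}
// Arbitrary sample data; the column name "x" is not significant.
val df = Seq(1, 2, 3).toDF("x")

df.head()    // no argument: a single Row, e.g. [1]
df.head(1)   // n supplied, even if n == 1: an Array[Row] of length 1
df.head(2)   // Array[Row] containing the first two rows
{code}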



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47131) contains, startswith, endswith

2024-02-22 Thread Jira
Uroš Bojanić created SPARK-47131:


 Summary: contains, startswith, endswith
 Key: SPARK-47131
 URL: https://issues.apache.org/jira/browse/SPARK-47131
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Uroš Bojanić


Refactored built-in string functions to enable collation support for: 
{_}contains{_}, {_}startsWith{_}, {_}endsWith{_}. Spark SQL users should now be 
able to use COLLATE within arguments for built-in string functions: CONTAINS, 
STARTSWITH, ENDSWITH in Spark SQL queries.
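
Once this lands, usage is expected to look roughly like the sketch below. This is only an illustration; the collation name UTF8_BINARY_LCASE is an assumption about the built-in case-insensitive collation available in current development builds:

{code:scala}
// All three predicates compare case-insensitively under the assumed collation.
spark.sql("""
  SELECT
    contains(collate('Apache Spark', 'UTF8_BINARY_LCASE'), 'SPARK')    AS contains_ci,
    startswith(collate('Apache Spark', 'UTF8_BINARY_LCASE'), 'APACHE') AS startswith_ci,
    endswith(collate('Apache Spark', 'UTF8_BINARY_LCASE'), 'spark')    AS endswith_ci
""").show()
// Expected under the assumed collation: true, true, true
{code}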



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-46975) Support dedicated fallback methods

2024-02-22 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46975?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-46975:
--
Summary: Support dedicated fallback methods  (was: Move to_{hdf, feather, 
stata} to the fallback list)

> Support dedicated fallback methods
> --
>
> Key: SPARK-46975
> URL: https://issues.apache.org/jira/browse/SPARK-46975
> Project: Spark
>  Issue Type: Sub-task
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42328) Assign name to _LEGACY_ERROR_TEMP_1175

2024-02-22 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk reassigned SPARK-42328:


Assignee: Nikola Mandic

> Assign name to _LEGACY_ERROR_TEMP_1175
> --
>
> Key: SPARK-42328
> URL: https://issues.apache.org/jira/browse/SPARK-42328
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Nikola Mandic
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42328) Assign name to _LEGACY_ERROR_TEMP_1175

2024-02-22 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-42328.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45183
[https://github.com/apache/spark/pull/45183]

> Assign name to _LEGACY_ERROR_TEMP_1175
> --
>
> Key: SPARK-42328
> URL: https://issues.apache.org/jira/browse/SPARK-42328
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Nikola Mandic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47130) Use listStatus to bypass block location info when cleaning driver logs

2024-02-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47130?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47130:
---
Labels: pull-request-available  (was: )

> Use listStatus to bypass block location info when cleaning driver logs
> --
>
> Key: SPARK-47130
> URL: https://issues.apache.org/jira/browse/SPARK-47130
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47129) Make ResolveRelations cache connect plan properly

2024-02-22 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-47129:
--
Summary: Make ResolveRelations cache connect plan properly  (was: Make 
ResolveRelations handle planId properly)

> Make ResolveRelations cache connect plan properly
> -
>
> Key: SPARK-47129
> URL: https://issues.apache.org/jira/browse/SPARK-47129
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47130) Use listStatus to bypass block location info when cleaning driver logs

2024-02-22 Thread Kent Yao (Jira)
Kent Yao created SPARK-47130:


 Summary: Use listStatus to bypass block location info when 
cleaning driver logs
 Key: SPARK-47130
 URL: https://issues.apache.org/jira/browse/SPARK-47130
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47129) Make ResolveRelations handle planId properly

2024-02-22 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47129:
---
Labels: pull-request-available  (was: )

> Make ResolveRelations handle planId properly
> 
>
> Key: SPARK-47129
> URL: https://issues.apache.org/jira/browse/SPARK-47129
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47128) Improve `spark.sql.hive.metastore.sharedPrefixes` default value

2024-02-22 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47128?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47128:
--
Parent: SPARK-47046
Issue Type: Sub-task  (was: Improvement)

> Improve `spark.sql.hive.metastore.sharedPrefixes` default value
> ---
>
> Key: SPARK-47128
> URL: https://issues.apache.org/jira/browse/SPARK-47128
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org