[jira] [Updated] (SPARK-45351) Change RocksDB as default shuffle service db backend

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45351:
---
Labels: pull-request-available  (was: )

> Change RocksDB as default shuffle service db backend
> 
>
> Key: SPARK-45351
> URL: https://issues.apache.org/jira/browse/SPARK-45351
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Jia Fan
>Priority: Major
>  Labels: pull-request-available
>
> Make RocksDB the default shuffle service DB backend, because we will remove 
> LevelDB support in the future.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45351) Change RocksDB as default shuffle service db backend

2023-09-26 Thread Jia Fan (Jira)
Jia Fan created SPARK-45351:
---

 Summary: Change RocksDB as default shuffle service db backend
 Key: SPARK-45351
 URL: https://issues.apache.org/jira/browse/SPARK-45351
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Jia Fan


Make RocksDB the default shuffle service DB backend, because we will remove 
LevelDB support in the future.






[jira] [Resolved] (SPARK-45340) Remove the SQL config spark.sql.hive.verifyPartitionPath

2023-09-26 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-45340.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43130
[https://github.com/apache/spark/pull/43130]

> Remove the SQL config spark.sql.hive.verifyPartitionPath
> 
>
> Key: SPARK-45340
> URL: https://issues.apache.org/jira/browse/SPARK-45340
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The SQL config spark.sql.hive.verifyPartitionPath was deprecated quite a 
> while ago, in version 3.0. It can be removed in version 4.0.






[jira] [Created] (SPARK-45350) Rename the imported Java Boolean to JBoolean

2023-09-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45350:


 Summary: Rename the imported Java Boolean to JBoolean
 Key: SPARK-45350
 URL: https://issues.apache.org/jira/browse/SPARK-45350
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Yang Jie


Some places use `import java.lang.Boolean` to import the Java Boolean type, 
which can easily cause ambiguity. The import should be renamed to JBoolean.
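
A minimal sketch of the rename described above, using Scala's import-renaming syntax so that the unqualified name `Boolean` keeps referring to `scala.Boolean`. The names `JBooleanExample`, `toJava`, and `fromJava` are hypothetical and are not taken from the Spark code base.

```scala
// Rename on import: `Boolean` stays scala.Boolean, `JBoolean` is java.lang.Boolean.
import java.lang.{Boolean => JBoolean}

object JBooleanExample {
  // Box a Scala Boolean into a java.lang.Boolean.
  def toJava(b: Boolean): JBoolean = JBoolean.valueOf(b)

  // Unbox a possibly-null java.lang.Boolean, treating null as false.
  def fromJava(b: JBoolean): Boolean = (b ne null) && b.booleanValue()
}
```

With the rename in place, a signature such as `toJava(b: Boolean): JBoolean` shows at a glance which side is the Scala type and which is the Java type.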
 
 
 
 
 






[jira] [Resolved] (SPARK-44681) Solve issue referencing github.com/apache/spark-connect-go as Go library

2023-09-26 Thread BoYang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BoYang resolved SPARK-44681.

   Fix Version/s: 3.4.0
Target Version/s: 3.5.0
  Resolution: Fixed

> Solve issue referencing github.com/apache/spark-connect-go as Go library
> 
>
> Key: SPARK-44681
> URL: https://issues.apache.org/jira/browse/SPARK-44681
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect Contrib
>Affects Versions: 3.5.0
>Reporter: BoYang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Resolved] (SPARK-44780) Document SQL Session variables

2023-09-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-44780.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42467
[https://github.com/apache/spark/pull/42467]

> Document SQL Session variables
> --
>
> Key: SPARK-44780
> URL: https://issues.apache.org/jira/browse/SPARK-44780
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 3.4.2
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: Screenshot 2023-08-11 at 10.22.55 PM.png, Screenshot 
> 2023-08-11 at 10.24.33 PM.png, Screenshot 2023-08-11 at 10.26.54 PM.png
>
>
> SQL session variables were added in SPARK-42849.
> This issue adds the documentation for them.






[jira] [Assigned] (SPARK-44780) Document SQL Session variables

2023-09-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-44780:
---

Assignee: Serge Rielau

> Document SQL Session variables
> --
>
> Key: SPARK-44780
> URL: https://issues.apache.org/jira/browse/SPARK-44780
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 3.4.2
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2023-08-11 at 10.22.55 PM.png, Screenshot 
> 2023-08-11 at 10.24.33 PM.png, Screenshot 2023-08-11 at 10.26.54 PM.png
>
>
> SQL session variables were added in SPARK-42849.
> This issue adds the documentation for them.






[jira] [Updated] (SPARK-44780) Document SQL Session variables

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44780?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44780:
---
Labels: pull-request-available  (was: )

> Document SQL Session variables
> --
>
> Key: SPARK-44780
> URL: https://issues.apache.org/jira/browse/SPARK-44780
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 3.4.2
>Reporter: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screenshot 2023-08-11 at 10.22.55 PM.png, Screenshot 
> 2023-08-11 at 10.24.33 PM.png, Screenshot 2023-08-11 at 10.26.54 PM.png
>
>
> SQL session variables were added in SPARK-42849.
> This issue adds the documentation for them.






[jira] [Updated] (SPARK-45338) Remove scala.collection.JavaConverters

2023-09-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-45338:
-
Parent Issue: SPARK-45314  (was: SPARK-44111)

> Remove scala.collection.JavaConverters
> --
>
> Key: SPARK-45338
> URL: https://issues.apache.org/jira/browse/SPARK-45338
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Jia Fan
>Priority: Major
>  Labels: pull-request-available
>
> Remove deprecated scala.collection.JavaConverters, replaced by 
> scala.jdk.CollectionConverters
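
A minimal sketch of what the replacement looks like at a call site, assuming Scala 2.13 or later, where `scala.jdk.CollectionConverters` is available. The object and method names (`ConvertersExample`, `toScala`, `toJava`) are hypothetical and only illustrate the import swap.

```scala
import java.util.{List => JList}
// Before: import scala.collection.JavaConverters._  (deprecated since Scala 2.13)
import scala.jdk.CollectionConverters._

object ConvertersExample {
  // Java -> Scala: asScala wraps the Java list; toList copies it into an immutable List.
  def toScala(xs: JList[String]): List[String] = xs.asScala.toList

  // Scala -> Java: asJava returns a java.util.List view of the Scala Seq.
  def toJava(xs: Seq[String]): JList[String] = xs.asJava
}
```

The `asScala` and `asJava` extension methods keep the same names, so in most files only the import line should need to change.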






[jira] [Resolved] (SPARK-43850) Remove the import for scala.language.higherKinds and delete the corresponding suppression rule

2023-09-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-43850.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43128
[https://github.com/apache/spark/pull/43128]

> Remove the import for scala.language.higherKinds and delete the corresponding 
> suppression rule
> --
>
> Key: SPARK-43850
> URL: https://issues.apache.org/jira/browse/SPARK-43850
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-43850) Remove the import for scala.language.higherKinds and delete the corresponding suppression rule

2023-09-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-43850:


Assignee: Yang Jie

> Remove the import for scala.language.higherKinds and delete the corresponding 
> suppression rule
> --
>
> Key: SPARK-43850
> URL: https://issues.apache.org/jira/browse/SPARK-43850
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-45349) Backport SPARK-44034 and SPARK-44074 to branch-3.4/branch-3.3

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45349?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45349:
---
Labels: pull-request-available  (was: )

> Backport SPARK-44034 and SPARK-44074 to branch-3.4/branch-3.3
> 
>
> Key: SPARK-45349
> URL: https://issues.apache.org/jira/browse/SPARK-45349
> Project: Spark
>  Issue Type: Task
>  Components: Tests
>Affects Versions: 3.4.2, 3.3.4
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> Improve the success rate of CI 






[jira] [Created] (SPARK-45349) Backport SPARK-44034 and SPARK-44074 to branch-3.4/branch-3.3

2023-09-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45349:


 Summary: Backport SPARK-44034 and SPARK-44074 to 
branch-3.4/branch-3.3
 Key: SPARK-45349
 URL: https://issues.apache.org/jira/browse/SPARK-45349
 Project: Spark
  Issue Type: Task
  Components: Tests
Affects Versions: 3.4.2, 3.3.4
Reporter: Yang Jie


Improve the success rate of CI 






[jira] [Updated] (SPARK-45348) Make the Maven build in GitHub Action check "javadoc:javadoc".

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45348?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45348:
---
Labels: pull-request-available  (was: )

> Make the Maven build in GitHub Action check "javadoc:javadoc".
> --
>
> Key: SPARK-45348
> URL: https://issues.apache.org/jira/browse/SPARK-45348
> Project: Spark
>  Issue Type: Task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-44223) Drop leveldb support

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44223:
---
Labels: pull-request-available  (was: )

> Drop leveldb support
> 
>
> Key: SPARK-44223
> URL: https://issues.apache.org/jira/browse/SPARK-44223
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> The leveldb project no longer seems to be maintained, and we can always 
> replace it with rocksdb. I think we can remove support for, and dependencies 
> on, leveldb in Spark 4.0.






[jira] [Assigned] (SPARK-45334) Remove misleading comment in parquetSchemaConverter

2023-09-26 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-45334:


Assignee: Mengran Lan

> Remove misleading comment in parquetSchemaConverter
> ---
>
> Key: SPARK-45334
> URL: https://issues.apache.org/jira/browse/SPARK-45334
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Mengran Lan
>Assignee: Mengran Lan
>Priority: Trivial
>  Labels: pull-request-available
>
> While debugging a Parquet issue, I was reading the Spark code for reference 
> and found a misleading comment that is still present in the latest version.
> {code:java}
> Types
>   .buildGroup(repetition).as(LogicalTypeAnnotation.listType())
>   .addField(Types
>     .buildGroup(REPEATED)
>     // "array" is the name chosen by parquet-hive (1.7.0 and prior version)
>     .addField(convertField(StructField("array", elementType, nullable)))
>     .named("bag"))
>   .named(field.name) {code}
> The comment above is misleading, since Hive always uses "array_element" as 
> the name.
> It was introduced by this PR [https://github.com/apache/spark/pull/14399] and 
> relates to this issue: https://issues.apache.org/jira/browse/SPARK-16777
> Furthermore, the parquet-hive module has been removed from the parquet-mr 
> project: https://issues.apache.org/jira/browse/PARQUET-1676
> I suggest removing this comment and will submit a PR later.






[jira] [Resolved] (SPARK-45334) Remove misleading comment in parquetSchemaConverter

2023-09-26 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-45334.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43119
[https://github.com/apache/spark/pull/43119]

> Remove misleading comment in parquetSchemaConverter
> ---
>
> Key: SPARK-45334
> URL: https://issues.apache.org/jira/browse/SPARK-45334
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Mengran Lan
>Assignee: Mengran Lan
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> While debugging a Parquet issue, I was reading the Spark code for reference 
> and found a misleading comment that is still present in the latest version.
> {code:java}
> Types
>   .buildGroup(repetition).as(LogicalTypeAnnotation.listType())
>   .addField(Types
>     .buildGroup(REPEATED)
>     // "array" is the name chosen by parquet-hive (1.7.0 and prior version)
>     .addField(convertField(StructField("array", elementType, nullable)))
>     .named("bag"))
>   .named(field.name) {code}
> The comment above is misleading, since Hive always uses "array_element" as 
> the name.
> It was introduced by this PR [https://github.com/apache/spark/pull/14399] and 
> relates to this issue: https://issues.apache.org/jira/browse/SPARK-16777
> Furthermore, the parquet-hive module has been removed from the parquet-mr 
> project: https://issues.apache.org/jira/browse/PARQUET-1676
> I suggest removing this comment and will submit a PR later.






[jira] [Resolved] (SPARK-45302) Remove PID communication between Python workers when no daemon is used

2023-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45302.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43087
[https://github.com/apache/spark/pull/43087]

> Remove PID communication between Python workers when no daemon is used
> -
>
> Key: SPARK-45302
> URL: https://issues.apache.org/jira/browse/SPARK-45302
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We don't need to send the PID around when JDK 9+ is used, because we can get 
> it directly through the JDK API.
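
As a rough illustration of the point above, the sketch below reads process IDs through the JDK 9+ `ProcessHandle` and `Process.pid()` APIs. It is not the change made in the pull request; the names `PidExample`, `currentPid`, and `pidOf` are hypothetical.

```scala
object PidExample {
  // PID of the current JVM, via the ProcessHandle API added in JDK 9.
  def currentPid: Long = ProcessHandle.current().pid()

  // PID of a child process, read from the Process object itself (Process.pid(), JDK 9+),
  // so the child does not have to report its own PID back to the parent.
  def pidOf(child: Process): Long = child.pid()
}
```

Because the parent can call `pid()` on the `Process` it launched, a separate channel for the worker to announce its PID becomes unnecessary on JDK 9 and later.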






[jira] [Assigned] (SPARK-45302) Remove PID communication between Python workers when no daemon is used

2023-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45302:


Assignee: Hyukjin Kwon

> Remove PID communication between Python workers when no daemon is used
> -
>
> Key: SPARK-45302
> URL: https://issues.apache.org/jira/browse/SPARK-45302
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> We don't need to send the PID around when JDK 9+ is used, because we can get 
> it directly through the JDK API.






[jira] [Commented] (SPARK-45282) Join loses records for cached datasets

2023-09-26 Thread XiDuo You (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769383#comment-17769383
 ] 

XiDuo You commented on SPARK-45282:
---

I cannot reproduce this issue on the master branch (4.0.0). [~koert], have you 
tried the master branch?

> Join loses records for cached datasets
> --
>
> Key: SPARK-45282
> URL: https://issues.apache.org/jira/browse/SPARK-45282
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.1, 3.5.0
> Environment: spark 3.4.1 on apache hadoop 3.3.6 or kubernetes 1.26 or 
> databricks 13.3
>Reporter: koert kuipers
>Priority: Major
>  Labels: CorrectnessBug, correctness
>
> we observed this issue on spark 3.4.1 but it is also present on 3.5.0. it is 
> not present on spark 3.3.1.
> it only shows up in a distributed environment. i cannot replicate it in a 
> unit test. however i did get it to show up on a hadoop cluster, kubernetes, 
> and databricks 13.3
> the issue is that records are dropped when two cached dataframes are joined. 
> it seems that in spark 3.4.1 some Exchanges are dropped from the query plan as 
> an optimization, while in spark 3.3.1 these Exchanges are still present. it 
> seems to be an issue with AQE when canChangeCachedPlanOutputPartitioning=true.
> to reproduce on a distributed cluster, these settings are needed:
> {code:java}
> spark.sql.adaptive.advisoryPartitionSizeInBytes 33554432
> spark.sql.adaptive.coalescePartitions.parallelismFirst false
> spark.sql.adaptive.enabled true
> spark.sql.optimizer.canChangeCachedPlanOutputPartitioning true {code}
> code using scala to reproduce is:
> {code:java}
> import java.util.UUID
> import org.apache.spark.sql.functions.col
> import spark.implicits._
> val data = (1 to 100).toDS().map(i => UUID.randomUUID().toString).persist()
> val left = data.map(k => (k, 1))
> val right = data.map(k => (k, k)) // if i change this to k => (k, 1) it works!
> println("number of left " + left.count())
> println("number of right " + right.count())
> println("number of (left join right) " +
>   left.toDF("key", "value1").join(right.toDF("key", "value2"), "key").count()
> )
> val left1 = left
>   .toDF("key", "value1")
>   .repartition(col("key")) // comment out this line to make it work
>   .persist()
> println("number of left1 " + left1.count())
> val right1 = right
>   .toDF("key", "value2")
>   .repartition(col("key")) // comment out this line to make it work
>   .persist()
> println("number of right1 " + right1.count())
> println("number of (left1 join right1) " + left1.join(right1, "key").count()) // this gives incorrect result {code}
> this produces the following output:
> {code:java}
> number of left 100
> number of right 100
> number of (left join right) 100
> number of left1 100
> number of right1 100
> number of (left1 join right1) 859531 {code}
> note that the last number (the incorrect one) actually varies depending on 
> settings and cluster size etc.
>  






[jira] [Created] (SPARK-45348) Make the Maven build in GitHub Action check "javadoc:javadoc".

2023-09-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45348:


 Summary: Make the Maven build in GitHub Action check 
"javadoc:javadoc".
 Key: SPARK-45348
 URL: https://issues.apache.org/jira/browse/SPARK-45348
 Project: Spark
  Issue Type: Task
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Yang Jie









[jira] [Resolved] (SPARK-45339) Pyspark should log errors it retries

2023-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45339.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43127
[https://github.com/apache/spark/pull/43127]

> Pyspark should log errors it retries
> 
>
> Key: SPARK-45339
> URL: https://issues.apache.org/jira/browse/SPARK-45339
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Alice Sayutina
>Assignee: Alice Sayutina
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-45339) Pyspark should log errors it retries

2023-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45339:


Assignee: Alice Sayutina

> Pyspark should log errors it retries
> 
>
> Key: SPARK-45339
> URL: https://issues.apache.org/jira/browse/SPARK-45339
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Alice Sayutina
>Assignee: Alice Sayutina
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-43662) Enable ReshapeParityTests.test_merge_asof

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43662:
---
Labels: pull-request-available  (was: )

> Enable ReshapeParityTests.test_merge_asof
> -
>
> Key: SPARK-43662
> URL: https://issues.apache.org/jira/browse/SPARK-43662
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Enable ReshapeParityTests.test_merge_asof






[jira] [Assigned] (SPARK-45328) Remove Hive support prior to 2.0.0

2023-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45328:


Assignee: Hyukjin Kwon

> Remove Hive support prior to 2.0.0
> --
>
> Key: SPARK-45328
> URL: https://issues.apache.org/jira/browse/SPARK-45328
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> These Hive versions don't support JDK 17, and we can't add that support.






[jira] [Resolved] (SPARK-45328) Remove Hive support prior to 2.0.0

2023-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45328.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43116
[https://github.com/apache/spark/pull/43116]

> Remove Hive support prior to 2.0.0
> --
>
> Key: SPARK-45328
> URL: https://issues.apache.org/jira/browse/SPARK-45328
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> These Hive versions don't support JDK 17, and we can't add that support.






[jira] [Updated] (SPARK-45347) Include SparkThrowable in FetchErrorDetailsResponse

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45347?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45347:
---
Labels: pull-request-available  (was: )

> Include SparkThrowable in FetchErrorDetailsResponse
> ---
>
> Key: SPARK-45347
> URL: https://issues.apache.org/jira/browse/SPARK-45347
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Yihong He
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45347) Include SparkThrowable in FetchErrorDetailsResponse

2023-09-26 Thread Yihong He (Jira)
Yihong He created SPARK-45347:
-

 Summary: Include SparkThrowable in FetchErrorDetailsResponse
 Key: SPARK-45347
 URL: https://issues.apache.org/jira/browse/SPARK-45347
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 4.0.0
Reporter: Yihong He









[jira] [Updated] (SPARK-44940) Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-09-26 Thread Thomas Graves (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44940?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thomas Graves updated SPARK-44940:
--
Fix Version/s: 3.5.0
   (was: 3.5.1)

> Improve performance of JSON parsing when 
> "spark.sql.json.enablePartialResults" is enabled
> -
>
> Key: SPARK-44940
> URL: https://issues.apache.org/jira/browse/SPARK-44940
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.5.0, 4.0.0
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
>  Labels: correctness, pull-request-available
> Fix For: 3.4.2, 3.5.0
>
>
> Follow-up on https://issues.apache.org/jira/browse/SPARK-40646.
> I found that JSON parsing is significantly slower due to exception creation 
> in control flow. Also, some fields are not parsed correctly and an exception 
> is thrown in certain cases:
> {code:java}
> Caused by: java.lang.ClassCastException: org.apache.spark.sql.catalyst.util.GenericArrayData cannot be cast to org.apache.spark.sql.catalyst.InternalRow
>   at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getStruct(rows.scala:51)
>   at org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getStruct$(rows.scala:51)
>   at org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getStruct(rows.scala:195)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>   at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source)
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
>   at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:590)
>   ... 39 more {code}
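
The cost of "exception creation in control flow" mentioned above can be illustrated with a small sketch that contrasts an exception-driven parser with a value-driven one. This is not Spark's JSON parser and not the fix from the pull request; `ControlFlowExample` and its methods are hypothetical, and `toIntOption` assumes Scala 2.13 or later.

```scala
import scala.util.control.NonFatal

object ControlFlowExample {
  // Exception-driven: every malformed value allocates an exception and fills in its stack trace.
  def parseWithException(s: String): Option[Int] =
    try Some(s.trim.toInt)
    catch { case NonFatal(_) => None }

  // Value-driven: toIntOption reports failure as None without throwing.
  def parseWithOption(s: String): Option[Int] = s.trim.toIntOption

  // Parse a batch of fields, keeping the parsed values and counting the malformed ones.
  def parseAll(fields: Seq[String]): (Seq[Int], Int) = {
    val parsed = fields.map(parseWithOption)
    (parsed.flatten, parsed.count(_.isEmpty))
  }
}
```

Both variants return the same results; the difference is that the second one avoids constructing an exception for each bad record, which matters when malformed input is common.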






[jira] [Commented] (SPARK-44940) Improve performance of JSON parsing when "spark.sql.json.enablePartialResults" is enabled

2023-09-26 Thread Thomas Graves (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44940?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769338#comment-17769338
 ] 

Thomas Graves commented on SPARK-44940:
---

 I noticed this went into 3.5.0 
([https://github.com/apache/spark/commits/v3.5.0]), so I am updating the fix 
versions.

> Improve performance of JSON parsing when 
> "spark.sql.json.enablePartialResults" is enabled
> -
>
> Key: SPARK-44940
> URL: https://issues.apache.org/jira/browse/SPARK-44940
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.5.0, 4.0.0
>Reporter: Ivan Sadikov
>Assignee: Ivan Sadikov
>Priority: Major
>  Labels: correctness, pull-request-available
> Fix For: 3.4.2, 3.5.1
>
>
> Follow-up on https://issues.apache.org/jira/browse/SPARK-40646.
> I found that JSON parsing is significantly slower due to exception creation 
> in control flow. Also, some fields are not parsed correctly and the exception 
> is thrown in certain cases: 
> {code:java}
> Caused by: java.lang.ClassCastException: 
> org.apache.spark.sql.catalyst.util.GenericArrayData cannot be cast to 
> org.apache.spark.sql.catalyst.InternalRow
>   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getStruct(rows.scala:51)
>   at 
> org.apache.spark.sql.catalyst.expressions.BaseGenericInternalRow.getStruct$(rows.scala:51)
>   at 
> org.apache.spark.sql.catalyst.expressions.GenericInternalRow.getStruct(rows.scala:195)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
>   at 
> org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown
>  Source)
>   at scala.collection.Iterator$$anon$10.next(Iterator.scala:461)
>   at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1$$anon$2.getNext(FileScanRDD.scala:590)
>   ... 39 more {code}
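To illustrate why exception creation in control flow tends to be slow, here is a minimal Java sketch. It is not the Spark JSON parser: the class and method names are hypothetical, and the validation is deliberately narrow (non-negative decimal integers of at most 9 digits). Constructing an exception normally captures a stack trace, so a parser that reports every malformed field by throwing pays that cost per bad field, while one that returns an empty result does not.

```java
import java.util.OptionalInt;

final class PartialParseSketch {
    // Reports a malformed field by throwing; each failure constructs a
    // NumberFormatException, including its stack trace.
    static int parseOrThrow(String s) {
        return Integer.parseInt(s);
    }

    // Reports a malformed field with an empty result; no exception is created.
    // Assumption for this sketch: only non-negative decimal integers of at
    // most 9 digits are accepted, so the final parseInt call cannot fail.
    static OptionalInt parseOrEmpty(String s) {
        if (s == null || s.isEmpty() || s.length() > 9) {
            return OptionalInt.empty();
        }
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c < '0' || c > '9') {
                return OptionalInt.empty();
            }
        }
        return OptionalInt.of(Integer.parseInt(s));
    }

    // Counts parseable fields using exceptions as control flow.
    static int countValidWithExceptions(String[] fields) {
        int valid = 0;
        for (String f : fields) {
            try {
                parseOrThrow(f);
                valid++;
            } catch (NumberFormatException e) {
                // Malformed field: skip it and keep the partial result.
            }
        }
        return valid;
    }

    // Counts parseable fields without creating any exception.
    static int countValidWithoutExceptions(String[] fields) {
        int valid = 0;
        for (String f : fields) {
            if (parseOrEmpty(f).isPresent()) {
                valid++;
            }
        }
        return valid;
    }
}
```

Both counting methods agree on inputs within the narrow range the sketch accepts; only the second avoids allocating an exception for each malformed field.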






[jira] [Updated] (SPARK-44442) Drop mesos support

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44442:
---
Labels: pull-request-available  (was: )

> Drop mesos support
> --
>
> Key: SPARK-44442
> URL: https://issues.apache.org/jira/browse/SPARK-44442
> Project: Spark
>  Issue Type: Sub-task
>  Components: Mesos
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> [https://spark.apache.org/docs/latest/running-on-mesos.html]
>  
> {_}Note{_}: Apache Mesos support is deprecated as of Apache Spark 3.2.0. It 
> will be removed in a future version.






[jira] [Updated] (SPARK-44034) Add a new test group for sql module

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44034:
---
Labels: pull-request-available  (was: )

> Add a new test group for sql module
> ---
>
> Key: SPARK-44034
> URL: https://issues.apache.org/jira/browse/SPARK-44034
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.0
>
>







[jira] [Updated] (SPARK-45346) Parquet schema inference should respect case sensitive flag when merging schema

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45346:
---
Labels: pull-request-available  (was: )

> Parquet schema inference should respect case sensitive flag when merging 
> schema
> ---
>
> Key: SPARK-45346
> URL: https://issues.apache.org/jira/browse/SPARK-45346
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45346) Parquet schema inference should respect case sensitive flag when merging schema

2023-09-26 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-45346:
---

 Summary: Parquet schema inference should respect case sensitive 
flag when merging schema
 Key: SPARK-45346
 URL: https://issues.apache.org/jira/browse/SPARK-45346
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0, 3.4.0
Reporter: Wenchen Fan









[jira] [Commented] (SPARK-45345) Refactor release-build.sh

2023-09-26 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45345?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769278#comment-17769278
 ] 

Yang Jie commented on SPARK-45345:
--

Currently, I'm not familiar enough with this, so I'm not sure if it's necessary 
to refactor `release-build.sh` for Spark 4.0.

> Refactor release-build.sh
> -
>
> Key: SPARK-45345
> URL: https://issues.apache.org/jira/browse/SPARK-45345
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Resolved] (SPARK-44366) Migrate antlr4 from 4.9 to 4.10+

2023-09-26 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-44366.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43075
[https://github.com/apache/spark/pull/43075]

> Migrate antlr4 from 4.9 to 4.10+
> 
>
> Key: SPARK-44366
> URL: https://issues.apache.org/jira/browse/SPARK-44366
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-44366) Migrate antlr4 from 4.9 to 4.10+

2023-09-26 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-44366:


Assignee: Yang Jie

> Migrate antlr4 from 4.9 to 4.10+
> 
>
> Key: SPARK-44366
> URL: https://issues.apache.org/jira/browse/SPARK-44366
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-44756) Executor hangs when RetryingBlockTransferor fails to initiate retry

2023-09-26 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan resolved SPARK-44756.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42426
[https://github.com/apache/spark/pull/42426]

> Executor hangs when RetryingBlockTransferor fails to initiate retry
> ---
>
> Key: SPARK-44756
> URL: https://issues.apache.org/jira/browse/SPARK-44756
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 3.3.1
>Reporter: Harunobu Daikoku
>Assignee: Harunobu Daikoku
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We have observed this issue several times in production: some executors
> get stuck at BlockTransferService#fetchBlockSync().
> After some investigation, the issue seems to be caused by an unhandled edge
> case in RetryingBlockTransferor.
> 1. Shuffle transfer fails for whatever reason
> {noformat}
> java.io.IOException: Cannot allocate memory
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:51)
>   at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
>   at 
> org.apache.spark.network.shuffle.SimpleDownloadFile$SimpleDownloadWritableChannel.write(SimpleDownloadFile.java:78)
>   at 
> org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onData(OneForOneBlockFetcher.java:340)
>   at 
> org.apache.spark.network.client.StreamInterceptor.handle(StreamInterceptor.java:79)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.feedInterceptor(TransportFrameDecoder.java:263)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:87)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> {noformat}
> 2. The above exception is caught by
> [AbstractChannelHandlerContext#invokeChannelRead()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L381],
>  and propagated to the exception handler
> 3. Exception reaches 
> [RetryingBlockTransferor#initiateRetry()|https://github.com/apache/spark/blob/v3.3.1/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java#L178-L180],
>  and it tries to initiate retry
> {noformat}
> 23/08/09 16:58:37 shuffle-client-4-2 INFO RetryingBlockTransferor: Retrying 
> fetch (1/3) for 1 outstanding blocks after 5000 ms
> {noformat}
> 4. Retry initiation fails (in our case, it fails to create a new thread)
> 5. The exception is caught by
> [AbstractChannelHandlerContext#invokeExceptionCaught()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L305-L309],
>  and is not processed further
> {noformat}
> 23/08/09 16:58:53 shuffle-client-4-2 DEBUG AbstractChannelHandlerContext: An 
> exception java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:719)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>   at 
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockTransferor.initiateRetry(RetryingBlockTransferor.java:182)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockTransferor.access$500(RetryingBlockTransferor.java:43)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.handleBlockTransferFailure(RetryingBlockTransferor.java:230)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.onBlockFetchFailure(RetryingBlockTransferor.java:260)
>   at 
> org.apache.spark.network.shuffle.OneForOneBlockFetcher.failRemainingBlocks(OneForOneBlockFetcher.java:318)
>   at 
> org.apache.spark.network.shuffle.OneForOneBlockFetcher.access$300(OneForOneBlockFetcher.java:55)
>   at 
> org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onFailure(OneForOneBlockFetcher.java:357)
>   at 
> org.apache.spark.network.client.StreamInterceptor.exceptionCaught(StreamInterceptor.java:56)
>   at 
> 
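The edge case in steps 3 to 5 above can be sketched in plain Java. This is only an illustration under stated assumptions, not necessarily what pull request 42426 does: `RetryInitSketch` and `FailureListener` are hypothetical names, not Spark classes. The idea is that when scheduling the retry itself fails, the failure is handed to the listener instead of escaping to the channel handler, so a caller blocked on the fetch is notified rather than left waiting.

```java
import java.util.concurrent.ExecutorService;

final class RetryInitSketch {
    /** Hypothetical stand-in for a block-fetch listener; not Spark's interface. */
    interface FailureListener {
        void onFailure(Throwable cause);
    }

    /**
     * Tries to schedule a retry on the given executor. If scheduling itself
     * fails (the executor rejects the task, or a new thread cannot be
     * created), the cause is reported to the listener so that whoever waits
     * on the fetch can stop waiting. Returns true if the retry was scheduled.
     */
    static boolean initiateRetry(ExecutorService executor,
                                 Runnable retryTask,
                                 FailureListener listener) {
        try {
            executor.submit(retryTask);
            return true;
        } catch (Throwable t) {
            // Catching Throwable is a deliberate choice for this sketch: the
            // report above shows an OutOfMemoryError, which is an Error and
            // would not be caught by a handler for Exception.
            listener.onFailure(t);
            return false;
        }
    }
}
```

In the sketch, a rejected submission on a shut-down executor plays the role of the "unable to create new native thread" error from the report: in both cases `submit` throws before the retry task ever runs.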

[jira] [Assigned] (SPARK-44756) Executor hangs when RetryingBlockTransferor fails to initiate retry

2023-09-26 Thread Mridul Muralidharan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44756?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mridul Muralidharan reassigned SPARK-44756:
---

Assignee: Harunobu Daikoku

> Executor hangs when RetryingBlockTransferor fails to initiate retry
> ---
>
> Key: SPARK-44756
> URL: https://issues.apache.org/jira/browse/SPARK-44756
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle, Spark Core
>Affects Versions: 3.3.1
>Reporter: Harunobu Daikoku
>Assignee: Harunobu Daikoku
>Priority: Minor
>  Labels: pull-request-available
>
> We have observed this issue several times in production: some executors
> get stuck at BlockTransferService#fetchBlockSync().
> After some investigation, the issue seems to be caused by an unhandled edge
> case in RetryingBlockTransferor.
> 1. Shuffle transfer fails for whatever reason
> {noformat}
> java.io.IOException: Cannot allocate memory
>   at sun.nio.ch.FileDispatcherImpl.write0(Native Method)
>   at sun.nio.ch.FileDispatcherImpl.write(FileDispatcherImpl.java:60)
>   at sun.nio.ch.IOUtil.writeFromNativeBuffer(IOUtil.java:93)
>   at sun.nio.ch.IOUtil.write(IOUtil.java:51)
>   at sun.nio.ch.FileChannelImpl.write(FileChannelImpl.java:211)
>   at 
> org.apache.spark.network.shuffle.SimpleDownloadFile$SimpleDownloadWritableChannel.write(SimpleDownloadFile.java:78)
>   at 
> org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onData(OneForOneBlockFetcher.java:340)
>   at 
> org.apache.spark.network.client.StreamInterceptor.handle(StreamInterceptor.java:79)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.feedInterceptor(TransportFrameDecoder.java:263)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.channelRead(TransportFrameDecoder.java:87)
>   at 
> io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:379)
> {noformat}
> 2. The above exception is caught by
> [AbstractChannelHandlerContext#invokeChannelRead()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L381],
>  and propagated to the exception handler
> 3. Exception reaches 
> [RetryingBlockTransferor#initiateRetry()|https://github.com/apache/spark/blob/v3.3.1/common/network-shuffle/src/main/java/org/apache/spark/network/shuffle/RetryingBlockTransferor.java#L178-L180],
>  and it tries to initiate retry
> {noformat}
> 23/08/09 16:58:37 shuffle-client-4-2 INFO RetryingBlockTransferor: Retrying 
> fetch (1/3) for 1 outstanding blocks after 5000 ms
> {noformat}
> 4. Retry initiation fails (in our case, it fails to create a new thread)
> 5. The exception is caught by
> [AbstractChannelHandlerContext#invokeExceptionCaught()|https://github.com/netty/netty/blob/netty-4.1.74.Final/transport/src/main/java/io/netty/channel/AbstractChannelHandlerContext.java#L305-L309],
>  and is not processed further
> {noformat}
> 23/08/09 16:58:53 shuffle-client-4-2 DEBUG AbstractChannelHandlerContext: An 
> exception java.lang.OutOfMemoryError: unable to create new native thread
>   at java.lang.Thread.start0(Native Method)
>   at java.lang.Thread.start(Thread.java:719)
>   at 
> java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:957)
>   at 
> java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1378)
>   at 
> java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockTransferor.initiateRetry(RetryingBlockTransferor.java:182)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockTransferor.access$500(RetryingBlockTransferor.java:43)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.handleBlockTransferFailure(RetryingBlockTransferor.java:230)
>   at 
> org.apache.spark.network.shuffle.RetryingBlockTransferor$RetryingBlockTransferListener.onBlockFetchFailure(RetryingBlockTransferor.java:260)
>   at 
> org.apache.spark.network.shuffle.OneForOneBlockFetcher.failRemainingBlocks(OneForOneBlockFetcher.java:318)
>   at 
> org.apache.spark.network.shuffle.OneForOneBlockFetcher.access$300(OneForOneBlockFetcher.java:55)
>   at 
> org.apache.spark.network.shuffle.OneForOneBlockFetcher$DownloadCallback.onFailure(OneForOneBlockFetcher.java:357)
>   at 
> org.apache.spark.network.client.StreamInterceptor.exceptionCaught(StreamInterceptor.java:56)
>   at 
> org.apache.spark.network.util.TransportFrameDecoder.exceptionCaught(TransportFrameDecoder.java:231)
>   at 
> 

[jira] [Updated] (SPARK-45344) Remove all scala version string check

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45344:
---
Labels: pull-request-available  (was: )

> Remove all scala version string check
> -
>
> Key: SPARK-45344
> URL: https://issues.apache.org/jira/browse/SPARK-45344
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-45343) CSV multiLine documentation is confusing

2023-09-26 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-45343:
-
Priority: Trivial  (was: Major)

> CSV multiLine documentation is confusing
> 
>
> Key: SPARK-45343
> URL: https://issues.apache.org/jira/browse/SPARK-45343
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Bill Schneider
>Priority: Trivial
>  Labels: pull-request-available
>
> This is confusing, maybe copy-paste from JSON:
> |Parse one record, which may span multiple lines, per file. CSV built-in 
> functions ignore this option.|
>  






[jira] [Updated] (SPARK-45343) CSV multiLine documentation is confusing

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45343?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45343:
---
Labels: pull-request-available  (was: )

> CSV multiLine documentation is confusing
> 
>
> Key: SPARK-45343
> URL: https://issues.apache.org/jira/browse/SPARK-45343
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Bill Schneider
>Priority: Major
>  Labels: pull-request-available
>
> This is confusing, maybe copy-paste from JSON:
> |Parse one record, which may span multiple lines, per file. CSV built-in 
> functions ignore this option.|
>  






[jira] [Created] (SPARK-45345) Refactor release-build.sh

2023-09-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45345:


 Summary: Refactor release-build.sh
 Key: SPARK-45345
 URL: https://issues.apache.org/jira/browse/SPARK-45345
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Yang Jie









[jira] [Created] (SPARK-45344) Remove all scala version string check

2023-09-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45344:


 Summary: Remove all scala version string check
 Key: SPARK-45344
 URL: https://issues.apache.org/jira/browse/SPARK-45344
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Yang Jie









[jira] [Updated] (SPARK-45217) Support change log level of specific package or class

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45217?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45217:
---
Labels: pull-request-available  (was: )

> Support change log level of specific package or class
> -
>
> Key: SPARK-45217
> URL: https://issues.apache.org/jira/browse/SPARK-45217
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Zhongwei Zhu
>Priority: Minor
>  Labels: pull-request-available
>
> Add SparkContext.setLogLevel(loggerName: String, logLevel: String) to support
> changing the log level of a specific package or class.
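What a per-logger level means can be sketched with the JDK's own `java.util.logging`, used here purely as a stand-in. The proposed `SparkContext.setLogLevel(loggerName, logLevel)` is not shown, and Spark's logging backend and level names (such as DEBUG or WARN) differ from the JDK's (such as FINE or WARNING). `LoggerLevelSketch` is a hypothetical helper name.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

final class LoggerLevelSketch {
    // Sets the level of one named logger (a package or class name) and
    // returns it. Callers should keep the returned reference, because
    // java.util.logging holds loggers only weakly.
    static Logger setLogLevel(String loggerName, String logLevel) {
        Logger logger = Logger.getLogger(loggerName);
        // Level.parse accepts JDK level names such as "FINE" or "WARNING"
        // and throws IllegalArgumentException for unknown names.
        logger.setLevel(Level.parse(logLevel));
        return logger;
    }
}
```

Setting the level on a package-like name affects loggers beneath it in the name hierarchy, as long as they have no level of their own, and leaves unrelated loggers at the root default.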






[jira] [Commented] (SPARK-45343) CSV multiLine documentation is confusing

2023-09-26 Thread Bill Schneider (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45343?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17769236#comment-17769236
 ] 

Bill Schneider commented on SPARK-45343:


PR: https://github.com/apache/spark/pull/43132

> CSV multiLine documentation is confusing
> 
>
> Key: SPARK-45343
> URL: https://issues.apache.org/jira/browse/SPARK-45343
> Project: Spark
>  Issue Type: Documentation
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: Bill Schneider
>Priority: Major
>
> This is confusing, maybe copy-paste from JSON:
> |Parse one record, which may span multiple lines, per file. CSV built-in 
> functions ignore this option.|
>  






[jira] [Resolved] (SPARK-45316) Respect `spark.sql.files.ignoreMissingFiles` in HadoopRDD

2023-09-26 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-45316.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43097
[https://github.com/apache/spark/pull/43097]

> Respect `spark.sql.files.ignoreMissingFiles` in HadoopRDD
> -
>
> Key: SPARK-45316
> URL: https://issues.apache.org/jira/browse/SPARK-45316
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently, the SQL config spark.sql.files.ignoreMissingFiles affects RDDs
> created in Spark SQL, such as FileScanRDD, but has no effect on HadoopRDD
> and NewHadoopRDD. Those RDDs are controlled by a separate core config,
> spark.files.ignoreMissingFiles. That inconsistency might confuse users who
> don't know the implementation details. This ticket aims to eliminate the
> inconsistency.






[jira] [Created] (SPARK-45343) CSV multiLine documentation is confusing

2023-09-26 Thread Bill Schneider (Jira)
Bill Schneider created SPARK-45343:
--

 Summary: CSV multiLine documentation is confusing
 Key: SPARK-45343
 URL: https://issues.apache.org/jira/browse/SPARK-45343
 Project: Spark
  Issue Type: Documentation
  Components: Spark Core
Affects Versions: 3.5.0
Reporter: Bill Schneider


This is confusing, maybe copy-paste from JSON:
|Parse one record, which may span multiple lines, per file. CSV built-in 
functions ignore this option.|

 






[jira] [Created] (SPARK-45342) Remove the scala doc compilation option specific to Scala 2.12.

2023-09-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45342:


 Summary: Remove the scala doc compilation option specific to Scala 
2.12.
 Key: SPARK-45342
 URL: https://issues.apache.org/jira/browse/SPARK-45342
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 4.0.0
Reporter: Yang Jie









[jira] [Updated] (SPARK-45341) Make the sbt doc command execute successfully with Java 17

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45341:
---
Labels: pull-request-available  (was: )

> Make the sbt doc command execute successfully with Java 17
> --
>
> Key: SPARK-45341
> URL: https://issues.apache.org/jira/browse/SPARK-45341
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> [error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/Picked up 
> JAVA_TOOL_OPTIONS:-Duser.language=en
> [error] Loading source file 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBTypeInfo.java...
> [error] Loading source file 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/ArrayWrappers.java...
> [error] Loading source file 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVIndex.java...
> [error] Loading source file 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java...
> [error] Loading source file 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java...
> [error] Loading source file 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java...
> [error] Loading source file 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBTypeInfo.java...
> [error] Loading source file 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/UnsupportedStoreVersionException.java...
> [error] Loading source file 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java...
> [error] Loading source file 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStoreIterator.java...
> [error] Loading source file 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStore.java...
> [error] Loading source file 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStoreView.java...
> [error] Loading source file 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVTypeInfo.java...
> [error] Loading source file 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java...
> [error] Loading source file 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStoreSerializer.java...
> [error] Constructing Javadoc information...
> [error] Building index for all the packages and classes...
> [error] Standard Doclet version 17.0.8+7-LTS
> [error] Building tree for all the packages and classes...
> [error] 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStore.java:32:1:
>   error: heading used out of sequence: , compared to implicit preceding 
> heading: 
> [error]  * Serialization
> [error]    ^Generating 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/InMemoryStore.html...
> [error] Generating 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVIndex.html...
> [error] Generating 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVStore.html...
> [error] Generating 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVStoreIterator.html...
> [error] Generating 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVStoreSerializer.html...
> [error] Generating 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVStoreView.html...
> [error] Generating 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVTypeInfo.html...
> [error] Generating 
> /Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/LevelDB.html...
> [error] 

[jira] [Updated] (SPARK-45341) Make the sbt doc command execute successfully with Java 17

2023-09-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45341?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-45341:
-
Description: 
{code:java}
[error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/Picked up 
JAVA_TOOL_OPTIONS:-Duser.language=en
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBTypeInfo.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/ArrayWrappers.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVIndex.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBTypeInfo.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/UnsupportedStoreVersionException.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStoreIterator.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStore.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStoreView.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVTypeInfo.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStoreSerializer.java...
[error] Constructing Javadoc information...
[error] Building index for all the packages and classes...
[error] Standard Doclet version 17.0.8+7-LTS
[error] Building tree for all the packages and classes...
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStore.java:32:1:
  error: heading used out of sequence: , compared to implicit preceding 
heading: 
[error]  * Serialization
[error]    ^Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/InMemoryStore.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVIndex.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVStore.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVStoreIterator.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVStoreSerializer.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVStoreView.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVTypeInfo.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/LevelDB.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/LevelDB.TypeAliases.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/RocksDB.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/RocksDB.TypeAliases.html...
[error] Generating 

[jira] [Created] (SPARK-45341) Make the sbt doc command execute successfully with Java 17

2023-09-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45341:


 Summary: Make the sbt doc command execute successfully with Java 17
 Key: SPARK-45341
 URL: https://issues.apache.org/jira/browse/SPARK-45341
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 4.0.0
Reporter: Yang Jie


{code:java}
[error] /Users/yangjie01/SourceCode/git/spark-mine-sbt/Picked up 
JAVA_TOOL_OPTIONS:-Duser.language=en
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBTypeInfo.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/ArrayWrappers.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVIndex.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/InMemoryStore.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDBIterator.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDB.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBTypeInfo.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/UnsupportedStoreVersionException.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/LevelDB.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStoreIterator.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStore.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStoreView.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVTypeInfo.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/RocksDBIterator.java...
[error] Loading source file 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStoreSerializer.java...
[error] Constructing Javadoc information...
[error] Building index for all the packages and classes...
[error] Standard Doclet version 17.0.8+7-LTS
[error] Building tree for all the packages and classes...
[error] 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/src/main/java/org/apache/spark/util/kvstore/KVStore.java:32:1:
  error: heading used out of sequence: , compared to implicit preceding 
heading: 
[error]  * Serialization
[error]    ^Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/InMemoryStore.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVIndex.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVStore.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVStoreIterator.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVStoreSerializer.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVStoreView.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/KVTypeInfo.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/LevelDB.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/LevelDB.TypeAliases.html...
[error] Generating 
/Users/yangjie01/SourceCode/git/spark-mine-sbt/common/kvstore/target/scala-2.13/api/org/apache/spark/util/kvstore/RocksDB.html...
[error] Generating 

[jira] [Updated] (SPARK-45340) Remove the SQL config spark.sql.hive.verifyPartitionPath

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45340?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45340:
---
Labels: pull-request-available  (was: )

> Remove the SQL config spark.sql.hive.verifyPartitionPath
> 
>
> Key: SPARK-45340
> URL: https://issues.apache.org/jira/browse/SPARK-45340
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
>
> The SQL config spark.sql.hive.verifyPartitionPath was deprecated quite a 
> while ago, in version 3.0. It can be removed in version 4.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45340) Remove the SQL config spark.sql.hive.verifyPartitionPath

2023-09-26 Thread Max Gekk (Jira)
Max Gekk created SPARK-45340:


 Summary: Remove the SQL config spark.sql.hive.verifyPartitionPath
 Key: SPARK-45340
 URL: https://issues.apache.org/jira/browse/SPARK-45340
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Max Gekk
Assignee: Max Gekk


The SQL config spark.sql.hive.verifyPartitionPath was deprecated quite a while 
ago, in version 3.0. It can be removed in version 4.0.






[jira] [Updated] (SPARK-43850) Cleanup unused imports related suppression rules for Scala 2.13

2023-09-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-43850:
-
Parent: SPARK-45314
Issue Type: Sub-task  (was: Improvement)

> Cleanup unused imports related suppression rules for Scala 2.13
> ---
>
> Key: SPARK-43850
> URL: https://issues.apache.org/jira/browse/SPARK-43850
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-43850) Remove the import for scala.language.higherKinds and delete the corresponding suppression rule

2023-09-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-43850:
-
Summary: Remove the import for scala.language.higherKinds and delete the 
corresponding suppression rule  (was: Cleanup unused imports related 
suppression rules for Scala 2.13)

> Remove the import for scala.language.higherKinds and delete the corresponding 
> suppression rule
> --
>
> Key: SPARK-43850
> URL: https://issues.apache.org/jira/browse/SPARK-43850
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-43850) Cleanup unused imports related suppression rules for Scala 2.13

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43850:
---
Labels: pull-request-available  (was: )

> Cleanup unused imports related suppression rules for Scala 2.13
> ---
>
> Key: SPARK-43850
> URL: https://issues.apache.org/jira/browse/SPARK-43850
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-43850) Cleanup unused imports related suppression rules for Scala 2.13

2023-09-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43850?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-43850:
-
Affects Version/s: 4.0.0
   (was: 3.5.0)

> Cleanup unused imports related suppression rules for Scala 2.13
> ---
>
> Key: SPARK-43850
> URL: https://issues.apache.org/jira/browse/SPARK-43850
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Assigned] (SPARK-45271) Merge _LEGACY_ERROR_TEMP_1113 into TABLE_OPERATION & delete some unused method in QueryCompilationErrors

2023-09-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-45271:
---

Assignee: BingKun Pan

> Merge _LEGACY_ERROR_TEMP_1113 into TABLE_OPERATION & delete some unused 
> method in QueryCompilationErrors
> 
>
> Key: SPARK-45271
> URL: https://issues.apache.org/jira/browse/SPARK-45271
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-45271) Merge _LEGACY_ERROR_TEMP_1113 into TABLE_OPERATION & delete some unused method in QueryCompilationErrors

2023-09-26 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-45271.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43044
[https://github.com/apache/spark/pull/43044]

> Merge _LEGACY_ERROR_TEMP_1113 into TABLE_OPERATION & delete some unused 
> method in QueryCompilationErrors
> 
>
> Key: SPARK-45271
> URL: https://issues.apache.org/jira/browse/SPARK-45271
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-45339) Pyspark should log errors it retries

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45339?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45339:
---
Labels: pull-request-available  (was: )

> Pyspark should log errors it retries
> 
>
> Key: SPARK-45339
> URL: https://issues.apache.org/jira/browse/SPARK-45339
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Alice Sayutina
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45339) Pyspark should log errors it retries

2023-09-26 Thread Alice Sayutina (Jira)
Alice Sayutina created SPARK-45339:
--

 Summary: Pyspark should log errors it retries
 Key: SPARK-45339
 URL: https://issues.apache.org/jira/browse/SPARK-45339
 Project: Spark
  Issue Type: Improvement
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Alice Sayutina









[jira] [Assigned] (SPARK-45309) Remove all SystemUtils.isJavaVersionAtLeast with JDK 9

2023-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45309:


Assignee: Hyukjin Kwon

> Remove all SystemUtils.isJavaVersionAtLeast with JDK 9
> --
>
> Key: SPARK-45309
> URL: https://issues.apache.org/jira/browse/SPARK-45309
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> We use JDK 11+ so we can remove all Java 9+ conditions
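
To illustrate the kind of cleanup described above, here is a minimal sketch. It 
is an assumption-laden example, not the code changed by the pull request: the 
class and method names are hypothetical, and the standard-library call 
Runtime.version().feature() stands in for the commons-lang3 SystemUtils check 
so that the snippet has no dependencies. On any JDK 11+ runtime the version 
guard is always true, so the guarded branch can simply become the only branch.

```java
public class JavaVersionCheck {

    // Before: a guard of the kind the issue wants removed. On JDK 11+ the
    // condition always holds, so the else branch is dead code.
    // (Runtime.version().feature() is used here in place of SystemUtils;
    // it needs JDK 10 or newer.)
    static String describeLegacy() {
        if (Runtime.version().feature() >= 9) {
            return "modern path";
        } else {
            return "legacy path";
        }
    }

    // After: with JDK 11+ as the baseline, only the guarded branch remains.
    static String describeSimplified() {
        return "modern path";
    }

    public static void main(String[] args) {
        System.out.println(describeLegacy());
        System.out.println(describeSimplified());
    }
}
```

Both methods return the same value on a JDK 11+ runtime, which is why the guard 
can go without changing behavior.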






[jira] [Resolved] (SPARK-45309) Remove all SystemUtils.isJavaVersionAtLeast with JDK 9

2023-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45309.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43098
[https://github.com/apache/spark/pull/43098]

> Remove all SystemUtils.isJavaVersionAtLeast with JDK 9
> --
>
> Key: SPARK-45309
> URL: https://issues.apache.org/jira/browse/SPARK-45309
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We use JDK 11+ so we can remove all Java 9+ conditions






[jira] [Resolved] (SPARK-45323) Upgrade snappy to 1.1.10.4

2023-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45323.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43109
[https://github.com/apache/spark/pull/43109]

> Upgrade snappy to 1.1.10.4
> --
>
> Key: SPARK-45323
> URL: https://issues.apache.org/jira/browse/SPARK-45323
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Security Fix
> Fixed SnappyInputStream so as not to allocate too large memory when 
> decompressing data with an extremely large chunk size by @tunnelshade (code 
> change)
> This does not affect users only using Snappy.compress/uncompress methods
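
The class of problem named in the security note can be sketched as follows. 
This is a hypothetical illustration only, not snappy-java's actual fix: the 
class name, the 4-byte length header, and the MAX_CHUNK_SIZE limit are all 
assumptions made for the example. The idea is to validate a length read from 
untrusted input before allocating a buffer of that size.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.IOException;

public class BoundedChunkReader {

    // Hypothetical upper bound chosen for this sketch; not a snappy-java constant.
    static final int MAX_CHUNK_SIZE = 64 * 1024;

    // Returns true only for sizes that are safe to allocate.
    static boolean isAcceptable(int declaredSize) {
        return declaredSize >= 0 && declaredSize <= MAX_CHUNK_SIZE;
    }

    // Reads a 4-byte big-endian length header, checks it, then reads the body.
    static byte[] readChunk(DataInputStream in) throws IOException {
        int declared = in.readInt();
        if (!isAcceptable(declared)) {
            // Reject before allocating, so a hostile header cannot force
            // a huge allocation.
            throw new IOException("Rejected chunk size: " + declared);
        }
        byte[] chunk = new byte[declared];
        in.readFully(chunk);
        return chunk;
    }

    public static void main(String[] args) throws IOException {
        byte[] wellFormed = {0, 0, 0, 3, 'a', 'b', 'c'};
        System.out.println(
            readChunk(new DataInputStream(new ByteArrayInputStream(wellFormed))).length);

        byte[] hostile = {0x7f, (byte) 0xff, (byte) 0xff, (byte) 0xff};
        try {
            readChunk(new DataInputStream(new ByteArrayInputStream(hostile)));
        } catch (IOException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Without the check, the second input would request an array of roughly 2 GB 
from a 4-byte header alone.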






[jira] [Assigned] (SPARK-45323) Upgrade snappy to 1.1.10.4

2023-09-26 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45323:


Assignee: Bjørn Jørgensen

> Upgrade snappy to 1.1.10.4
> --
>
> Key: SPARK-45323
> URL: https://issues.apache.org/jira/browse/SPARK-45323
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Build
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Bjørn Jørgensen
>Assignee: Bjørn Jørgensen
>Priority: Major
>  Labels: pull-request-available
>
> Security Fix
> Fixed SnappyInputStream so as not to allocate too large memory when 
> decompressing data with an extremely large chunk size by @tunnelshade (code 
> change)
> This does not affect users only using Snappy.compress/uncompress methods






[jira] [Updated] (SPARK-45338) Remove scala.collection.JavaConverters

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45338?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45338:
---
Labels: pull-request-available  (was: )

> Remove scala.collection.JavaConverters
> --
>
> Key: SPARK-45338
> URL: https://issues.apache.org/jira/browse/SPARK-45338
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, SQL
>Affects Versions: 4.0.0
>Reporter: Jia Fan
>Priority: Major
>  Labels: pull-request-available
>
> Remove deprecated scala.collection.JavaConverters, replaced by 
> scala.jdk.CollectionConverters






[jira] [Updated] (SPARK-45337) Refactor `AbstractCommandBuilder#getScalaVersion` to remove the check for Scala 2.12.

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45337?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45337:
---
Labels: pull-request-available  (was: )

> Refactor `AbstractCommandBuilder#getScalaVersion`  to remove the check for 
> Scala 2.12.
> --
>
> Key: SPARK-45337
> URL: https://issues.apache.org/jira/browse/SPARK-45337
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45338) Remove scala.collection.JavaConverters

2023-09-26 Thread Jia Fan (Jira)
Jia Fan created SPARK-45338:
---

 Summary: Remove scala.collection.JavaConverters
 Key: SPARK-45338
 URL: https://issues.apache.org/jira/browse/SPARK-45338
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, SQL
Affects Versions: 4.0.0
Reporter: Jia Fan


Remove deprecated scala.collection.JavaConverters, replaced by 
scala.jdk.CollectionConverters






[jira] [Updated] (SPARK-45313) Inline `Iterators#size` and remove `Iterators.scala`

2023-09-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45313?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-45313:
-
Parent: SPARK-45314
Issue Type: Sub-task  (was: Improvement)

> Inline `Iterators#size` and remove `Iterators.scala`
> 
>
> Key: SPARK-45313
> URL: https://issues.apache.org/jira/browse/SPARK-45313
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-45337) Refactor `AbstractCommandBuilder#getScalaVersion` to remove the check for Scala 2.12.

2023-09-26 Thread Yang Jie (Jira)
Yang Jie created SPARK-45337:


 Summary: Refactor `AbstractCommandBuilder#getScalaVersion`  to 
remove the check for Scala 2.12.
 Key: SPARK-45337
 URL: https://issues.apache.org/jira/browse/SPARK-45337
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Yang Jie









[jira] [Resolved] (SPARK-45321) Clean up the unnecessary Scala 2.12 related binary files.

2023-09-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-45321.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43106
[https://github.com/apache/spark/pull/43106]

> Clean up the unnecessary Scala 2.12 related binary files.
> -
>
> Key: SPARK-45321
> URL: https://issues.apache.org/jira/browse/SPARK-45321
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-45321) Clean up the unnecessary Scala 2.12 related binary files.

2023-09-26 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-45321:


Assignee: Yang Jie

> Clean up the unnecessary Scala 2.12 related binary files.
> -
>
> Key: SPARK-45321
> URL: https://issues.apache.org/jira/browse/SPARK-45321
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-45336) Update the Oracle docker image version used for test and integration to use Oracle Database 23c Free

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45336:
---
Labels: pull-request-available  (was: )

> Update the Oracle docker image version used for test and integration to use 
> Oracle Database 23c Free
> 
>
> Key: SPARK-45336
> URL: https://issues.apache.org/jira/browse/SPARK-45336
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.5.0
>Reporter: Luca Canali
>Priority: Minor
>  Labels: pull-request-available
>
> This proposes to update the Docker image used for integration tests and 
> builds to Oracle Database 23c Free.
> The Docker image used for integration tests and builds currently uses Oracle 
> XE version 21.3.0. Oracle 21 support ends in April 2024. The latest Oracle 
> release is 23c, it is a long-term release supported till 2032. With Oracle 
> 23c, Oracle has changed the name of the free version of its database, from 
> Oracle XE (Express Edition) to Oracle Database Free.
>  






[jira] [Assigned] (SPARK-45310) Mapstatus location type changed from external shuffle service to executor after decommission migration

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45310:
--

Assignee: (was: Apache Spark)

> Mapstatus location type changed from external shuffle service to executor 
> after decommission migration
> --
>
> Key: SPARK-45310
> URL: https://issues.apache.org/jira/browse/SPARK-45310
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: wuyi
>Priority: Major
>  Labels: pull-request-available
>
> When migrating shuffle blocks during decommission, the updated mapstatus 
> location doesn't respect the external shuffle service location when external 
> shuffle service is enabled.






[jira] [Assigned] (SPARK-45310) Mapstatus location type changed from external shuffle service to executor after decommission migration

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45310:
--

Assignee: Apache Spark

> Mapstatus location type changed from external shuffle service to executor 
> after decommission migration
> --
>
> Key: SPARK-45310
> URL: https://issues.apache.org/jira/browse/SPARK-45310
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.3, 3.1.3, 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: wuyi
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> When migrating shuffle blocks during decommission, the updated mapstatus 
> location doesn't respect the external shuffle service location when external 
> shuffle service is enabled.






[jira] [Updated] (SPARK-45336) Update the Oracle docker image version used for test and integration to use Oracle Database 23c Free

2023-09-26 Thread Luca Canali (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45336?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Luca Canali updated SPARK-45336:

Description: 
This proposes to update the Docker image used for integration tests and builds 
to Oracle Database 23c Free.

The Docker image used for integration tests and builds currently uses Oracle XE 
version 21.3.0. Oracle 21 support ends in April 2024. The latest Oracle release 
is 23c, it is a long-term release supported till 2032. With Oracle 23c, Oracle 
has changed the name of the free version of its database, from Oracle XE 
(Express Edition) to Oracle Database Free.

 

  was:
This proposes to update the Docker image used for integration tests and builds 
to Oracle Database 23c Free.

The Docker image used for integration tests and builds currently uses Oracle XE 
version 21.3.0. Oracle 21 support ends in April 2024. The latest Oracle release 
is 23c, it is a long-term release support till 2032. With Oracle 23c, Oracle 
has changed the name of the free version of its database, from Oracle XE 
(Express Edition) to Oracle Database Free.

 


> Update the Oracle docker image version used for test and integration to use 
> Oracle Database 23c Free
> 
>
> Key: SPARK-45336
> URL: https://issues.apache.org/jira/browse/SPARK-45336
> Project: Spark
>  Issue Type: Improvement
>  Components: Tests
>Affects Versions: 3.5.0
>Reporter: Luca Canali
>Priority: Minor
>
> This proposes to update the Docker image used for integration tests and 
> builds to Oracle Database 23c Free.
> The Docker image used for integration tests and builds currently uses Oracle 
> XE version 21.3.0. Oracle 21 support ends in April 2024. The latest Oracle 
> release is 23c, it is a long-term release supported till 2032. With Oracle 
> 23c, Oracle has changed the name of the free version of its database, from 
> Oracle XE (Express Edition) to Oracle Database Free.
>  






[jira] [Created] (SPARK-45336) Update the Oracle docker image version used for test and integration to use Oracle Database 23c Free

2023-09-26 Thread Luca Canali (Jira)
Luca Canali created SPARK-45336:
---

 Summary: Update the Oracle docker image version used for test and 
integration to use Oracle Database 23c Free
 Key: SPARK-45336
 URL: https://issues.apache.org/jira/browse/SPARK-45336
 Project: Spark
  Issue Type: Improvement
  Components: Tests
Affects Versions: 3.5.0
Reporter: Luca Canali


This proposes to update the Docker image used for integration tests and builds 
to Oracle Database 23c Free.

The Docker image used for integration tests and builds currently uses Oracle XE 
version 21.3.0. Oracle 21 support ends in April 2024. The latest Oracle release 
is 23c, a long-term release supported until 2032. With Oracle 23c, Oracle 
has changed the name of the free version of its database, from Oracle XE 
(Express Edition) to Oracle Database Free.

 






[jira] [Updated] (SPARK-45335) Correct the group of `ElementAt` and `TryElementAt`

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45335:
---
Labels: pull-request-available  (was: )

> Correct the group of `ElementAt` and `TryElementAt`
> ---
>
> Key: SPARK-45335
> URL: https://issues.apache.org/jira/browse/SPARK-45335
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45335) Correct the group of `ElementAt` and `TryElementAt`

2023-09-26 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-45335:
-

 Summary: Correct the group of `ElementAt` and `TryElementAt`
 Key: SPARK-45335
 URL: https://issues.apache.org/jira/browse/SPARK-45335
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, SQL
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-43620) Support `Column` for SparkConnectColumn.__getitem__

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43620:
---
Labels: pull-request-available  (was: )

> Support `Column` for SparkConnectColumn.__getitem__
> ---
>
> Key: SPARK-43620
> URL: https://issues.apache.org/jira/browse/SPARK-43620
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Repro:
> {code:python}
> import pandas as pd
> import pyspark.pandas as ps
>
> pser = pd.Series(["a", "b", "c"])
> psser = ps.from_pandas(pser)
> psser.astype("category")  # internally calls `map_scol[self.spark.column]`
> {code}






[jira] [Updated] (SPARK-45334) Remove misleading comment in parquetSchemaConverter

2023-09-26 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45334?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45334:
---
Labels: pull-request-available  (was: )

> Remove misleading comment in parquetSchemaConverter
> ---
>
> Key: SPARK-45334
> URL: https://issues.apache.org/jira/browse/SPARK-45334
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Mengran Lan
>Priority: Trivial
>  Labels: pull-request-available
>
> While debugging a Parquet issue and reading the Spark code for reference, I 
> found a misleading comment that is still present in the latest version.
> {code:scala}
> Types
>   .buildGroup(repetition).as(LogicalTypeAnnotation.listType())
>   .addField(Types
>     .buildGroup(REPEATED)
>     // "array" is the name chosen by parquet-hive (1.7.0 and prior version)
>     .addField(convertField(StructField("array", elementType, nullable)))
>     .named("bag"))
>   .named(field.name)
> {code}
> The comment above is misleading, since Hive always uses "array_element" as 
> the name.
> It was introduced by PR [https://github.com/apache/spark/pull/14399], which 
> relates to https://issues.apache.org/jira/browse/SPARK-16777
> Furthermore, the parquet-hive module has been removed from the parquet-mr 
> project: https://issues.apache.org/jira/browse/PARQUET-1676 
> I suggest removing this comment and will submit a PR later.






[jira] [Created] (SPARK-45334) Remove misleading comment in parquetSchemaConverter

2023-09-26 Thread Mengran Lan (Jira)
Mengran Lan created SPARK-45334:
---

 Summary: Remove misleading comment in parquetSchemaConverter
 Key: SPARK-45334
 URL: https://issues.apache.org/jira/browse/SPARK-45334
 Project: Spark
  Issue Type: Documentation
  Components: SQL
Affects Versions: 3.5.0
Reporter: Mengran Lan


While debugging a Parquet issue and reading the Spark code for reference, I 
found a misleading comment that is still present in the latest version.
{code:scala}
Types
  .buildGroup(repetition).as(LogicalTypeAnnotation.listType())
  .addField(Types
    .buildGroup(REPEATED)
    // "array" is the name chosen by parquet-hive (1.7.0 and prior version)
    .addField(convertField(StructField("array", elementType, nullable)))
    .named("bag"))
  .named(field.name)
{code}
The comment above is misleading, since Hive always uses "array_element" as the 
name.

It was introduced by PR [https://github.com/apache/spark/pull/14399], which 
relates to https://issues.apache.org/jira/browse/SPARK-16777

Furthermore, the parquet-hive module has been removed from the parquet-mr 
project: https://issues.apache.org/jira/browse/PARQUET-1676 

I suggest removing this comment and will submit a PR later.






[jira] [Assigned] (SPARK-45232) Add missing function groups to SQL references

2023-09-26 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-45232:
-

Assignee: Ruifeng Zheng

> Add missing function groups to SQL references
> -
>
> Key: SPARK-45232
> URL: https://issues.apache.org/jira/browse/SPARK-45232
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-45232) Add missing function groups to SQL references

2023-09-26 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-45232.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43011
[https://github.com/apache/spark/pull/43011]

> Add missing function groups to SQL references
> -
>
> Key: SPARK-45232
> URL: https://issues.apache.org/jira/browse/SPARK-45232
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>



