[jira] [Updated] (SPARK-45421) Catch AnalysisException over InlineCTE

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45421:
---
Labels: pull-request-available  (was: )

> Catch AnalysisException over InlineCTE
> --
>
> Key: SPARK-45421
> URL: https://issues.apache.org/jira/browse/SPARK-45421
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45421) Catch AnalysisException over InlineCTE

2023-10-04 Thread Rui Wang (Jira)
Rui Wang created SPARK-45421:


 Summary: Catch AnalysisException over InlineCTE
 Key: SPARK-45421
 URL: https://issues.apache.org/jira/browse/SPARK-45421
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Rui Wang
Assignee: Rui Wang









[jira] [Created] (SPARK-45420) Add DataType.fromDDL into PySpark

2023-10-04 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-45420:


 Summary: Add DataType.fromDDL into PySpark
 Key: SPARK-45420
 URL: https://issues.apache.org/jira/browse/SPARK-45420
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Same feature as DataType.fromDDL in Scala. It is used quite often.
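
As a rough illustration, the PySpark usage might look like the following. This is a sketch only: the exact entry point and whether it needs an active SparkSession are assumptions, made to mirror Scala's {{DataType.fromDDL}}.

{code:python}
# Hypothetical sketch of the proposed API (name/location assumed to mirror Scala).
from pyspark.sql import SparkSession
from pyspark.sql.types import DataType

spark = SparkSession.builder.getOrCreate()  # parsing may require an active session
schema = DataType.fromDDL("id INT, name STRING")  # parse a DDL string into a DataType
print(schema)  # expected: a StructType with an int field 'id' and a string field 'name'
{code}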






[jira] [Updated] (SPARK-45417) Make InheritableThread inherit the active session

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45417:
---
Labels: pull-request-available  (was: )

> Make InheritableThread inherit the active session
> -
>
> Key: SPARK-45417
> URL: https://issues.apache.org/jira/browse/SPARK-45417
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Chungmin Lee
>Priority: Major
>  Labels: pull-request-available
>
> Repro:
> {code:java}
> # repro.py
> from multiprocessing.pool import ThreadPool
> from pyspark import inheritable_thread_target
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.appName("Test").getOrCreate()
> spark.sparkContext.setLogLevel("ERROR")
> def f(i, spark):
>     print(f"{i} spark = {spark}")
>     print(f"{i} active session = {SparkSession.getActiveSession()}")
>     print(f"{i} local property foo = 
> {spark.sparkContext.getLocalProperty('foo')}")
>     spark = SparkSession.builder.appName("Test").getOrCreate()
>     print(f"{i} spark = {spark}")
>     print(f"{i} active session = {SparkSession.getActiveSession()}")
> pool = ThreadPool(4)
> spark.sparkContext.setLocalProperty("foo", "bar")
> pool.starmap(inheritable_thread_target(f), [(i, spark) for i in 
> range(4)]){code}
> Run as: {{./bin/spark-submit repro.py}}
> {{getOrCreate()}} doesn't set the active session either. The only way is 
> calling the Java function directly: 
> {{spark._jsparkSession.setActiveSession(spark._jsparkSession)}}.
>  






[jira] [Updated] (SPARK-45417) Make InheritableThread inherit the active session

2023-10-04 Thread Chungmin Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chungmin Lee updated SPARK-45417:
-
Description: 
Repro:

{code:java}
# repro.py
from multiprocessing.pool import ThreadPool
from pyspark import inheritable_thread_target
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Test").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

def f(i, spark):
    print(f"{i} spark = {spark}")
    print(f"{i} active session = {SparkSession.getActiveSession()}")
    print(f"{i} local property foo = 
{spark.sparkContext.getLocalProperty('foo')}")
    spark = SparkSession.builder.appName("Test").getOrCreate()
    print(f"{i} spark = {spark}")
    print(f"{i} active session = {SparkSession.getActiveSession()}")

pool = ThreadPool(4)
spark.sparkContext.setLocalProperty("foo", "bar")
pool.starmap(inheritable_thread_target(f), [(i, spark) for i in range(4)]){code}

Run as: {{./bin/spark-submit repro.py}}

{{getOrCreate()}} doesn't set the active session either. The only way is 
calling the Java function directly: 
{{spark._jsparkSession.setActiveSession(spark._jsparkSession)}}.

 

  was:
Repro:

{code:java}
from multiprocessing.pool import ThreadPool
from pyspark import inheritable_thread_target
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Test").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

def f(i, spark):
    print(f"{i} spark = {spark}")
    print(f"{i} active session = {SparkSession.getActiveSession()}")
    print(f"{i} local property foo = 
{spark.sparkContext.getLocalProperty('foo')}")
    spark = SparkSession.builder.appName("Test").getOrCreate()
    print(f"{i} spark = {spark}")
    print(f"{i} active session = {SparkSession.getActiveSession()}")

pool = ThreadPool(4)
spark.sparkContext.setLocalProperty("foo", "bar")
pool.starmap(inheritable_thread_target(f), [(i, spark) for i in range(4)]){code}

{{getOrCreate()}} doesn't set the active session either. The only way is 
calling the Java function directly: 
{{spark._jsparkSession.setActiveSession(spark._jsparkSession)}}.

 


> Make InheritableThread inherit the active session
> -
>
> Key: SPARK-45417
> URL: https://issues.apache.org/jira/browse/SPARK-45417
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Chungmin Lee
>Priority: Major
>
> Repro:
> {code:java}
> # repro.py
> from multiprocessing.pool import ThreadPool
> from pyspark import inheritable_thread_target
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.appName("Test").getOrCreate()
> spark.sparkContext.setLogLevel("ERROR")
> def f(i, spark):
>     print(f"{i} spark = {spark}")
>     print(f"{i} active session = {SparkSession.getActiveSession()}")
>     print(f"{i} local property foo = 
> {spark.sparkContext.getLocalProperty('foo')}")
>     spark = SparkSession.builder.appName("Test").getOrCreate()
>     print(f"{i} spark = {spark}")
>     print(f"{i} active session = {SparkSession.getActiveSession()}")
> pool = ThreadPool(4)
> spark.sparkContext.setLocalProperty("foo", "bar")
> pool.starmap(inheritable_thread_target(f), [(i, spark) for i in 
> range(4)]){code}
> Run as: {{./bin/spark-submit repro.py}}
> {{getOrCreate()}} doesn't set the active session either. The only way is 
> calling the Java function directly: 
> {{spark._jsparkSession.setActiveSession(spark._jsparkSession)}}.
>  






[jira] [Updated] (SPARK-45417) Make InheritableThread inherit the active session

2023-10-04 Thread Chungmin Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chungmin Lee updated SPARK-45417:
-
Summary: Make InheritableThread inherit the active session  (was: Make 
InheritableThread inherit active session)

> Make InheritableThread inherit the active session
> -
>
> Key: SPARK-45417
> URL: https://issues.apache.org/jira/browse/SPARK-45417
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Chungmin Lee
>Priority: Major
>
> Repro:
> {code:java}
> from multiprocessing.pool import ThreadPool
> from pyspark import inheritable_thread_target
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.appName("Test").getOrCreate()
> spark.sparkContext.setLogLevel("ERROR")
> def f(i, spark):
>     print(f"{i} spark = {spark}")
>     print(f"{i} active session = {SparkSession.getActiveSession()}")
>     print(f"{i} local property foo = 
> {spark.sparkContext.getLocalProperty('foo')}")
>     spark = SparkSession.builder.appName("Test").getOrCreate()
>     print(f"{i} spark = {spark}")
>     print(f"{i} active session = {SparkSession.getActiveSession()}")
> pool = ThreadPool(4)
> spark.sparkContext.setLocalProperty("foo", "bar")
> pool.starmap(inheritable_thread_target(f), [(i, spark) for i in 
> range(4)]){code}
> {{getOrCreate()}} doesn't set the active session either. The only way is 
> calling the Java function directly: 
> {{spark._jsparkSession.setActiveSession(spark._jsparkSession)}}.
>  






[jira] [Updated] (SPARK-45417) Make InheritableThread inherit active session

2023-10-04 Thread Chungmin Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chungmin Lee updated SPARK-45417:
-
Summary: Make InheritableThread inherit active session  (was: 
InheritableThread doesn't inherit active session)

> Make InheritableThread inherit active session
> -
>
> Key: SPARK-45417
> URL: https://issues.apache.org/jira/browse/SPARK-45417
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Chungmin Lee
>Priority: Major
>
> Repro:
> {code:java}
> from multiprocessing.pool import ThreadPool
> from pyspark import inheritable_thread_target
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.appName("Test").getOrCreate()
> spark.sparkContext.setLogLevel("ERROR")
> def f(i, spark):
>     print(f"{i} spark = {spark}")
>     print(f"{i} active session = {SparkSession.getActiveSession()}")
>     print(f"{i} local property foo = 
> {spark.sparkContext.getLocalProperty('foo')}")
>     spark = SparkSession.builder.appName("Test").getOrCreate()
>     print(f"{i} spark = {spark}")
>     print(f"{i} active session = {SparkSession.getActiveSession()}")
> pool = ThreadPool(4)
> spark.sparkContext.setLocalProperty("foo", "bar")
> pool.starmap(inheritable_thread_target(f), [(i, spark) for i in 
> range(4)]){code}
> {{getOrCreate()}} doesn't set the active session either. The only way is 
> calling the Java function directly: 
> {{spark._jsparkSession.setActiveSession(spark._jsparkSession)}}.
>  






[jira] [Resolved] (SPARK-45386) Correctness issue when persisting using StorageLevel.NONE

2023-10-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45386.
--
Fix Version/s: 3.5.1
   Resolution: Fixed

Issue resolved by pull request 43213
[https://github.com/apache/spark/pull/43213]

> Correctness issue when persisting using StorageLevel.NONE
> -
>
> Key: SPARK-45386
> URL: https://issues.apache.org/jira/browse/SPARK-45386
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Emil Ejbyfeldt
>Assignee: Emil Ejbyfeldt
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.1
>
>
> When using Spark 3.5.0, this code
> {code:java}
> import org.apache.spark.storage.StorageLevel
> spark.createDataset(Seq(1,2,3)).persist(StorageLevel.NONE).count() {code}
> incorrectly returns 0.
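> A rough PySpark equivalent of the repro above, as a sketch (assuming the Python DataFrame API
> shows the same behavior; the storage level is constructed explicitly because it means
> "store nothing"):
> {code:python}
> from pyspark.sql import SparkSession
> from pyspark.storagelevel import StorageLevel
> spark = SparkSession.builder.getOrCreate()
> df = spark.createDataFrame([(1,), (2,), (3,)], ["value"])
> # StorageLevel(False, False, False, False) is the "no storage" level
> print(df.persist(StorageLevel(False, False, False, False)).count())  # should print 3
> {code}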






[jira] [Assigned] (SPARK-45386) Correctness issue when persisting using StorageLevel.NONE

2023-10-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45386:


Assignee: Emil Ejbyfeldt

> Correctness issue when persisting using StorageLevel.NONE
> -
>
> Key: SPARK-45386
> URL: https://issues.apache.org/jira/browse/SPARK-45386
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Emil Ejbyfeldt
>Assignee: Emil Ejbyfeldt
>Priority: Major
>  Labels: pull-request-available
>
> When using Spark 3.5.0, this code
> {code:java}
> import org.apache.spark.storage.StorageLevel
> spark.createDataset(Seq(1,2,3)).persist(StorageLevel.NONE).count() {code}
> incorrectly returns 0.






[jira] [Resolved] (SPARK-45412) Validate the plan and session in DataFrame.__init__

2023-10-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45412.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43215
[https://github.com/apache/spark/pull/43215]

> Validate the plan and session in DataFrame.__init__
> ---
>
> Key: SPARK-45412
> URL: https://issues.apache.org/jira/browse/SPARK-45412
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-45412) Validate the plan and session in DataFrame.__init__

2023-10-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45412:


Assignee: Ruifeng Zheng

> Validate the plan and session in DataFrame.__init__
> ---
>
> Key: SPARK-45412
> URL: https://issues.apache.org/jira/browse/SPARK-45412
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-43657) reuse SPARK_CONF_DIR config maps between driver and executor

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43657?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43657:
---
Labels: pull-request-available  (was: )

> reuse SPARK_CONF_DIR config maps between driver and executor
> 
>
> Key: SPARK-43657
> URL: https://issues.apache.org/jira/browse/SPARK-43657
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.2.4, 3.3.2, 3.4.0
>Reporter: YE
>Priority: Major
>  Labels: pull-request-available
>
> Currently, Spark on Kubernetes in cluster mode creates two config maps per application: one 
> for the driver and another for the executor. However, the executor config map is almost 
> identical to the driver config map, so there is no need to create two duplicate config 
> maps. Since ConfigMaps are objects in Kubernetes, having more of them comes with some 
> limitations:
>  # more config maps mean more objects in etcd and add overhead to the API 
> server
>  # the Spark driver pod might run under limited permissions, which means it might 
> not be able to create resources (only exec into pods); therefore the driver might 
> not be allowed to create config maps.
> I will submit a PR to reuse the SPARK_CONF_DIR config map when running Spark in 
> k8s cluster mode.






[jira] [Updated] (SPARK-42629) Update the description of default data source in the document

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-42629:
---
Labels: pull-request-available  (was: )

> Update the description of default data source in the document
> -
>
> Key: SPARK-42629
> URL: https://issues.apache.org/jira/browse/SPARK-42629
> Project: Spark
>  Issue Type: Task
>  Components: Documentation
>Affects Versions: 3.5.0
>Reporter: xiaoping.huang
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-42727) Support executing spark commands in the root directory when local mode is specified

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-42727:
---
Labels: pull-request-available  (was: )

> Support executing spark commands in the root directory when local mode is 
> specified
> ---
>
> Key: SPARK-42727
> URL: https://issues.apache.org/jira/browse/SPARK-42727
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.5.0
>Reporter: xiaoping.huang
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-44025) CSV Table Read Error with CharType(length) column

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44025:
---
Labels: pull-request-available  (was: )

> CSV Table Read Error with CharType(length) column
> -
>
> Key: SPARK-44025
> URL: https://issues.apache.org/jira/browse/SPARK-44025
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
> Environment: {{apache/spark:v3.4.0 image}}
>Reporter: Fengyu Cao
>Priority: Major
>  Labels: pull-request-available
>
> Problem:
>  # read a CSV-format table
>  # the table has a `CharType(length)` column
>  # reading the table fails with the exception: `org.apache.spark.SparkException: Job 
> aborted due to stage failure: Task 0 in stage 36.0 failed 4 times, most 
> recent failure: Lost task 0.3 in stage 36.0 (TID 72) (10.113.9.208 executor 
> 11): java.lang.IllegalArgumentException: requirement failed: requiredSchema 
> (struct) should be the subset of dataSchema 
> (struct).`
>  
> Reproduce with the official image:
>  # {{docker run -it apache/spark:v3.4.0 /opt/spark/bin/spark-sql}}
>  # {{CREATE TABLE csv_bug (name STRING, age INT, job CHAR(4)) USING CSV 
> OPTIONS ('header' = 'true', 'sep' = ';') LOCATION 
> "/opt/spark/examples/src/main/resources/people.csv";}}
>  # SELECT * FROM csv_bug;
>  # ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
> java.lang.IllegalArgumentException: requirement failed: requiredSchema 
> (struct) should be the subset of dataSchema 
> (struct).






[jira] [Updated] (SPARK-43255) Assign a name to the error class _LEGACY_ERROR_TEMP_2020

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43255?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43255:
---
Labels: pull-request-available starter  (was: starter)

> Assign a name to the error class _LEGACY_ERROR_TEMP_2020
> 
>
> Key: SPARK-43255
> URL: https://issues.apache.org/jira/browse/SPARK-43255
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Max Gekk
>Priority: Minor
>  Labels: pull-request-available, starter
>
> Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2020* defined in 
> {*}core/src/main/resources/error/error-classes.json{*}. The name should be 
> short but complete (look at the examples in error-classes.json).
> Add a test which triggers the error from user code if such a test doesn't 
> exist yet. Check the exception fields by using {*}checkError(){*}. That function 
> checks only the valuable error fields and avoids depending on the error text 
> message, so tech editors can modify the error format in 
> error-classes.json without worrying about Spark's internal tests. Migrate other 
> tests that might trigger the error onto checkError().
> If you cannot reproduce the error from user space (via a SQL query), replace 
> the error with an internal error; see {*}SparkException.internalError(){*}.
> Improve the error message format in error-classes.json if the current one is not 
> clear, and propose a solution so users can avoid and fix this kind of error.
> Please look at the PRs below as examples:
>  * [https://github.com/apache/spark/pull/38685]
>  * [https://github.com/apache/spark/pull/38656]
>  * [https://github.com/apache/spark/pull/38490]






[jira] [Updated] (SPARK-44082) Generate operator does not update reference set properly

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44082?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44082:
---
Labels: pull-request-available  (was: )

> Generate operator does not update reference set properly
> 
>
> Key: SPARK-44082
> URL: https://issues.apache.org/jira/browse/SPARK-44082
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Labels: pull-request-available
>
> Before the fix:
> ```
> == Optimized Logical Plan ==
> Project [col1#2, col2#19]
> +- Generate replicaterows(sum#17L, col1#2, col2#3), [2], false, [col1#2, 
> col2#3]
>+- Filter (isnotnull(sum#17L) AND (sum#17L > 0))
>   +- Aggregate [col1#2, col2#19], [col1#2, col2#19, sum(vcol#14L) AS 
> sum#17L]
>  +- Union false, false
> :- Aggregate [col1#2], [1 AS vcol#14L, col1#2, first(col2#3, 
> false) AS col2#19]
> :  +- LogicalRDD [col1#2, col2#3], false
> +- Project [-1 AS vcol#15L, col1#8, col2#9]
>+- LogicalRDD [col1#8, col2#9], false
> ```
> This fails with: Couldn't find col2#3 in [col1#2,col2#19,sum#17L]
> After the fix:
> ```
> == Optimized Logical Plan ==
> Project [col1#2, col2#19]
> +- Generate replicaterows(sum#17L, col1#2, col2#19), [2], false, [col1#2, 
> col2#19]
>+- Filter (isnotnull(sum#17L) AND (sum#17L > 0))
>   +- Aggregate [col1#2, col2#19], [col1#2, col2#19, sum(vcol#14L) AS 
> sum#17L]
>  +- Union false, false
> :- Aggregate [col1#2], [1 AS vcol#14L, col1#2, first(col2#3, 
> false) AS col2#19]
> :  +- LogicalRDD [col1#2, col2#3], false
> +- Project [-1 AS vcol#15L, col1#8, col2#9]
>+- LogicalRDD [col1#8, col2#9], false
> ```






[jira] [Updated] (SPARK-44137) Change handling of iterable objects for on field in joins

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44137:
---
Labels: pull-request-available  (was: )

> Change handling of iterable objects for on field in joins
> -
>
> Key: SPARK-44137
> URL: https://issues.apache.org/jira/browse/SPARK-44137
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: John Haberstroh
>Priority: Minor
>  Labels: pull-request-available
>
> The {{on}} parameter complained when I passed it a tuple. That's because the code 
> checks for {{list}} exactly, and so it wrapped the tuple into a list like 
> {{{}[on]{}}}, leading to an immediate failure. This was surprising: typically, 
> tuple and list are interchangeable, and tuple is often the more 
> readily accepted type. I have proposed a change that moves towards the 
> principle of least surprise for this situation.
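> A minimal sketch of the surprise (the DataFrames and column name here are made up for
> illustration):
> {code:python}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.getOrCreate()
> left = spark.createDataFrame([(1, "a")], ["id", "v1"])
> right = spark.createDataFrame([(1, "b")], ["id", "v2"])
> left.join(right, on=["id"]).show()   # a list is accepted
> left.join(right, on=("id",)).show()  # a tuple is wrapped as [("id",)] and fails immediately
> {code}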






[jira] [Resolved] (SPARK-45345) Refactor release-build.sh

2023-10-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45345.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43212
[https://github.com/apache/spark/pull/43212]

> Refactor release-build.sh
> -
>
> Key: SPARK-45345
> URL: https://issues.apache.org/jira/browse/SPARK-45345
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-45345) Refactor release-build.sh

2023-10-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45345:


Assignee: Hyukjin Kwon

> Refactor release-build.sh
> -
>
> Key: SPARK-45345
> URL: https://issues.apache.org/jira/browse/SPARK-45345
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-45009) Correlated EXISTS subqueries in join ON condition unsupported and fail with internal error

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45009:
---
Labels: pull-request-available  (was: )

> Correlated EXISTS subqueries in join ON condition unsupported and fail with 
> internal error
> --
>
> Key: SPARK-45009
> URL: https://issues.apache.org/jira/browse/SPARK-45009
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jack Chen
>Priority: Major
>  Labels: pull-request-available
>
> They are not handled in decorrelation, and we also don’t have any checks to 
> block them, so these queries have outer references in the query plan leading 
> to internal errors:
> {code:java}
> CREATE TEMP VIEW x(x1, x2) AS VALUES (0, 1), (1, 2);
> CREATE TEMP VIEW y(y1, y2) AS VALUES (0, 2), (0, 3);
> CREATE TEMP VIEW z(z1, z2) AS VALUES (0, 2), (0, 3);
> select * from x left join y on x1 = y1 and exists (select * from z where z1 = 
> x1)
> Error occurred during query planning: 
> org.apache.spark.sql.catalyst.plans.logical.Filter cannot be cast to 
> org.apache.spark.sql.execution.SparkPlan {code}
> PullupCorrelatedPredicates#apply and RewritePredicateSubquery only handle 
> subqueries under UnaryNode; they seem to assume that subqueries cannot occur elsewhere, 
> such as in a join ON condition.
> We will need to decide whether to block them properly in analysis (i.e. give 
> a proper error for them), or see if we can add support for them.
> Also note that scalar subqueries in the ON condition are unsupported too, but they 
> return a proper error.






[jira] [Resolved] (SPARK-45396) Add doc entry for `pyspark.ml.connect` module

2023-10-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45396.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43210
[https://github.com/apache/spark/pull/43210]

> Add doc entry for `pyspark.ml.connect` module
> -
>
> Key: SPARK-45396
> URL: https://issues.apache.org/jira/browse/SPARK-45396
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML, PySpark
>Affects Versions: 3.5.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add doc entry for `pyspark.ml.connect` module






[jira] [Assigned] (SPARK-45396) Add doc entry for `pyspark.ml.connect` module

2023-10-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45396?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45396:


Assignee: Weichen Xu

> Add doc entry for `pyspark.ml.connect` module
> -
>
> Key: SPARK-45396
> URL: https://issues.apache.org/jira/browse/SPARK-45396
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML, PySpark
>Affects Versions: 3.5.0
>Reporter: Weichen Xu
>Assignee: Weichen Xu
>Priority: Major
>  Labels: pull-request-available
>
> Add doc entry for `pyspark.ml.connect` module






[jira] [Resolved] (SPARK-45410) Add Python GitHub Action Daily Job

2023-10-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45410.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43209
[https://github.com/apache/spark/pull/43209]

> Add Python GitHub Action Daily Job
> --
>
> Key: SPARK-45410
> URL: https://issues.apache.org/jira/browse/SPARK-45410
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-28973) Add TimeType to Catalyst

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-28973?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-28973:
---
Labels: pull-request-available  (was: )

> Add TimeType to Catalyst
> 
>
> Key: SPARK-28973
> URL: https://issues.apache.org/jira/browse/SPARK-28973
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Max Gekk
>Priority: Major
>  Labels: pull-request-available
>
> The time type should represent local time with microsecond precision and a valid 
> range of values of [00:00:00.000000, 23:59:59.999999]. Internally, a time can be 
> stored as the number of microseconds since midnight (00:00:00.000000).
> Support `java.time.LocalTime` as the external type for Catalyst's TimeType.
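> A small plain-Python illustration of that microseconds-since-midnight encoding (just the
> arithmetic, not Spark API):
> {code:python}
> import datetime
> t = datetime.time(23, 59, 59, 999999)  # upper bound of the proposed valid range
> micros = ((t.hour * 60 + t.minute) * 60 + t.second) * 1_000_000 + t.microsecond
> print(micros)  # 86399999999, the largest representable value
> {code}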






[jira] [Commented] (SPARK-45418) Change CURRENT_SCHEMA() column alias to match function name

2023-10-04 Thread Michael Zhang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771992#comment-17771992
 ] 

Michael Zhang commented on SPARK-45418:
---

I am currently working on this

> Change CURRENT_SCHEMA() column alias to match function name
> ---
>
> Key: SPARK-45418
> URL: https://issues.apache.org/jira/browse/SPARK-45418
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Michael Zhang
>Priority: Minor
>







[jira] [Created] (SPARK-45418) Change CURRENT_SCHEMA() column alias to match function name

2023-10-04 Thread Michael Zhang (Jira)
Michael Zhang created SPARK-45418:
-

 Summary: Change CURRENT_SCHEMA() column alias to match function 
name
 Key: SPARK-45418
 URL: https://issues.apache.org/jira/browse/SPARK-45418
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.5.0
Reporter: Michael Zhang









[jira] [Updated] (SPARK-45408) [CORE] Add RPC SSL settings to TransportConf

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45408:
---
Labels: pull-request-available  (was: )

> [CORE] Add RPC SSL settings to TransportConf
> 
>
> Key: SPARK-45408
> URL: https://issues.apache.org/jira/browse/SPARK-45408
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Hasnain Lakhani
>Priority: Major
>  Labels: pull-request-available
>
> Add the settings for SSL-based RPC support to TransportConf, along with some 
> associated tests and sample configs used by other tests.






[jira] [Updated] (SPARK-45416) Sanity check that Spark Connect returns arrow batches in order

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45416:
---
Labels: pull-request-available  (was: )

> Sanity check that Spark Connect returns arrow batches in order
> --
>
> Key: SPARK-45416
> URL: https://issues.apache.org/jira/browse/SPARK-45416
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Juliusz Sompolski
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-45417) InheritableThread doesn't inherit active session

2023-10-04 Thread Chungmin Lee (Jira)
Chungmin Lee created SPARK-45417:


 Summary: InheritableThread doesn't inherit active session
 Key: SPARK-45417
 URL: https://issues.apache.org/jira/browse/SPARK-45417
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 3.5.0
Reporter: Chungmin Lee


Repro:

{code:java}
from multiprocessing.pool import ThreadPool
from pyspark import inheritable_thread_target
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("Test").getOrCreate()
spark.sparkContext.setLogLevel("ERROR")

def f(i, spark):
    print(f"{i} spark = {spark}")
    print(f"{i} active session = {SparkSession.getActiveSession()}")
    print(f"{i} local property foo = 
{spark.sparkContext.getLocalProperty('foo')}")
    spark = SparkSession.builder.appName("Test").getOrCreate()
    print(f"{i} spark = {spark}")
    print(f"{i} active session = {SparkSession.getActiveSession()}")

pool = ThreadPool(4)
spark.sparkContext.setLocalProperty("foo", "bar")
pool.starmap(inheritable_thread_target(f), [(i, spark) for i in range(4)]){code}

{{getOrCreate()}} doesn't set the active session either. The only way is 
calling the Java function directly: 
{{spark._jsparkSession.setActiveSession(spark._jsparkSession)}}.
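
A minimal sketch of that workaround applied inside the thread function above (it relies on an internal attribute, so it may change between versions):

{code:python}
from pyspark.sql import SparkSession

def f_with_workaround(i, spark):
    # Manually propagate the active session via the JVM-side API, since
    # InheritableThread / getOrCreate() do not set it in the child thread.
    spark._jsparkSession.setActiveSession(spark._jsparkSession)
    print(f"{i} active session = {SparkSession.getActiveSession()}")
{code}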

 






[jira] [Created] (SPARK-45416) Sanity check that Spark Connect returns arrow batches in order

2023-10-04 Thread Juliusz Sompolski (Jira)
Juliusz Sompolski created SPARK-45416:
-

 Summary: Sanity check that Spark Connect returns arrow batches in 
order
 Key: SPARK-45416
 URL: https://issues.apache.org/jira/browse/SPARK-45416
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Juliusz Sompolski









[jira] [Commented] (SPARK-45391) spark-connect-repl is not working on macOS

2023-10-04 Thread Michael Baker (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771928#comment-17771928
 ] 

Michael Baker commented on SPARK-45391:
---

I'm getting this same error after installing spark-connect-repl in an arm64 
ubuntu:focal Docker container.

> spark-connect-repl is not working on macOS
> --
>
> Key: SPARK-45391
> URL: https://issues.apache.org/jira/browse/SPARK-45391
> Project: Spark
>  Issue Type: Bug
>  Components: Connect Contrib
>Affects Versions: 3.5.0
> Environment: MacBook M2
> cs version
> 2.1.7
> scala -version
> Scala code runner version 2.12.18 -- Copyright 2002-2023, LAMP/EPFL and 
> Lightbend, Inc.
>Reporter: Vu Tan
>Priority: Major
>
> I followed 
> [https://spark.apache.org/docs/latest/spark-connect-overview.html#use-spark-connect-for-interactive-analysis]
>  to try spark-connect-repl on my local PC but got the following error:
>  
> ---
> spark-connect-repl
> Exception in thread "main" java.lang.NoClassDefFoundError: 
> org/sparkproject/connect/client/com/google/common/io/BaseEncoding
>     at 
> org.sparkproject.connect.client.io.grpc.Metadata.<clinit>(Metadata.java:114)
>     at 
> org.apache.spark.sql.connect.client.SparkConnectClient$.<clinit>(SparkConnectClient.scala:329)
>     at 
> org.apache.spark.sql.connect.client.SparkConnectClient$.<clinit>(SparkConnectClient.scala)
>     at 
> org.apache.spark.sql.application.ConnectRepl$.doMain(ConnectRepl.scala:61)
>     at 
> org.apache.spark.sql.application.ConnectRepl$.main(ConnectRepl.scala:50)
>     at org.apache.spark.sql.application.ConnectRepl.main(ConnectRepl.scala)
>     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>     at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>     at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>     at java.lang.reflect.Method.invoke(Method.java:498)
>     at coursier.bootstrap.launcher.a.a(Unknown Source)
>     at coursier.bootstrap.launcher.Launcher.main(Unknown Source)
> Caused by: java.lang.ClassNotFoundException: 
> org.sparkproject.connect.client.com.google.common.io.BaseEncoding
>     at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
>     at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
>     ... 12 more
> ---
>  
> Do you have any idea why this is happening and how to solve it? 
> Thank you.
>  






[jira] [Commented] (SPARK-38200) [SQL] Spark JDBC Savemode Supports Upsert

2023-10-04 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-38200?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771921#comment-17771921
 ] 

Hudson commented on SPARK-38200:


User 'EnricoMi' has created a pull request for this issue:
https://github.com/apache/spark/pull/41518

> [SQL] Spark JDBC Savemode Supports Upsert
> -
>
> Key: SPARK-38200
> URL: https://issues.apache.org/jira/browse/SPARK-38200
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: melin
>Priority: Major
>
> Upsert SQL differs across databases; most databases support MERGE SQL:
> sqlserver merge into sql : 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/sqlserver/SqlServerDialect.java]
> mysql: 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/mysql/MysqlDialect.java]
> oracle merge into sql : 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/oracle/OracleDialect.java]
> postgres: 
> [https://github.com/apache/incubator-seatunnel/blob/dev/seatunnel-connectors-v2/connector-jdbc/src/main/java/org/apache/seatunnel/connectors/seatunnel/jdbc/internal/dialect/psql/PostgresDialect.java]
> postgres merge into sql: 
> [https://www.postgresql.org/docs/current/sql-merge.html]
> db2 merge into sql : 
> [https://www.ibm.com/docs/en/db2-for-zos/12?topic=statements-merge]
> derby merge into sql: 
> [https://db.apache.org/derby/docs/10.14/ref/rrefsqljmerge.html]
> h2 merge into sql: 
> [https://www.tutorialspoint.com/h2_database/h2_database_merge.htm]
>  
> [~yao] 
>  
> https://github.com/melin/datatunnel/tree/master/plugins/jdbc/src/main/scala/com/superior/datatunnel/plugin/jdbc/support/dialect
>  
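> As a generic illustration of the kind of statement such a dialect would emit for an upsert
> (a sketch only; table and column names are made up, and the exact syntax varies per database):
> {code:python}
> # Hypothetical MERGE statement a JDBC dialect might generate for an upsert save mode
> merge_sql = """
> MERGE INTO target t
> USING (SELECT ? AS id, ? AS value) s
> ON t.id = s.id
> WHEN MATCHED THEN UPDATE SET t.value = s.value
> WHEN NOT MATCHED THEN INSERT (id, value) VALUES (s.id, s.value)
> """
> {code}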






[jira] [Commented] (SPARK-43991) Use the value of spark.eventLog.compression.codec set by user when write compact file

2023-10-04 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771920#comment-17771920
 ] 

Hudson commented on SPARK-43991:


User 'shuyouZZ' has created a pull request for this issue:
https://github.com/apache/spark/pull/41491

> Use the value of spark.eventLog.compression.codec set by user when write 
> compact file
> -
>
> Key: SPARK-43991
> URL: https://issues.apache.org/jira/browse/SPARK-43991
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.4.0
>Reporter: shuyouZZ
>Priority: Major
>
> Currently, if rolling event logs are enabled in the SHS, only {{originalFilePath}} is used to 
> determine the path of the compact file.
> {code:java}
> override val logPath: String = originalFilePath.toUri.toString + 
> EventLogFileWriter.COMPACTED
> {code}
> If the user sets {{spark.eventLog.compression.codec}} in the SparkConf and it is 
> different from the Spark conf default, then when the log compaction logic is 
> triggered, the old event log file will be compacted using the default codec 
> from the Spark conf.
> {code:java}
> protected val compressionCodec =
> if (shouldCompress) {
>   Some(CompressionCodec.createCodec(sparkConf, 
> sparkConf.get(EVENT_LOG_COMPRESSION_CODEC)))
> } else {
>   None
> }
> private[history] val compressionCodecName = compressionCodec.map { c =>
> CompressionCodec.getShortName(c.getClass.getName)
>   }
> {code}
> However, the compression codec used by EventLogFileReader to read a log is 
> derived by splitting the log path, so EventLogFileReader cannot read 
> the compacted log file normally.
> {code:java}
> def codecName(log: Path): Option[String] = {
> // Compression codec is encoded as an extension, e.g. app_123.lzf
> // Since we sanitize the app ID to not include periods, it is safe to 
> split on it
> val logName = log.getName.stripSuffix(COMPACTED).stripSuffix(IN_PROGRESS)
> logName.split("\\.").tail.lastOption
>   }
> {code}
> So we should override the {{shouldCompress}} and {{compressionCodec}} 
> variables in class {{{}CompactedEventLogFileWriter{}}} to use the compression 
> codec set by the user.
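> A rough Python rendering of that file-name lookup (a sketch; the suffix values are
> assumptions), showing why a mismatch between the name-derived codec and the codec actually
> used to write the compact file breaks reading:
> {code:python}
> def codec_name(log_name: str):
>     # Mirrors codecName() above: the codec is whatever extension remains after
>     # stripping the compacted / in-progress suffixes (suffix strings assumed).
>     name = log_name.removesuffix(".compact").removesuffix(".inprogress")
>     parts = name.split(".")
>     return parts[-1] if len(parts) > 1 else None
> print(codec_name("app_123.lzf.compact"))  # 'lzf', even if the compact file was
> # actually written with the default codec from the Spark conf
> {code}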






[jira] [Resolved] (SPARK-45404) Support AWS_ENDPOINT_URL env variable

2023-10-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45404.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43205
[https://github.com/apache/spark/pull/43205]

> Support AWS_ENDPOINT_URL env variable
> -
>
> Key: SPARK-45404
> URL: https://issues.apache.org/jira/browse/SPARK-45404
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-45404) Support AWS_ENDPOINT_URL env variable

2023-10-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45404:
-

Assignee: Dongjoon Hyun

> Support AWS_ENDPOINT_URL env variable
> -
>
> Key: SPARK-45404
> URL: https://issues.apache.org/jira/browse/SPARK-45404
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-44735) Log a warning when inserting columns with the same name by row that don't match up

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-44735:
---
Labels: pull-request-available  (was: )

> Log a warning when inserting columns with the same name by row that don't 
> match up
> --
>
> Key: SPARK-44735
> URL: https://issues.apache.org/jira/browse/SPARK-44735
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.2, 3.5.0, 4.0.0
>Reporter: Holden Karau
>Priority: Minor
>  Labels: pull-request-available
>
> With SPARK-42750 people can now insert by name, but sometimes people forget to 
> use it. We should log a warning when it *looks like* someone forgot it (e.g. an insert 
> by column position where the column names are all the same *but* do not line up by row).
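> A hypothetical example of the situation that should trigger the warning (table and column
> names are made up):
> {code:python}
> from pyspark.sql import SparkSession
> spark = SparkSession.builder.getOrCreate()
> spark.sql("CREATE TABLE t (a INT, b INT) USING parquet")
> df = spark.createDataFrame([(1, 2)], ["b", "a"])  # same column names as t, different order
> df.write.insertInto("t")  # positional insert: the 'b' values land in column 'a' and vice versa
> {code}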






[jira] [Updated] (SPARK-45413) Add warning for prepare drop LevelDB support

2023-10-04 Thread Jia Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jia Fan updated SPARK-45413:

Summary: Add warning for prepare drop LevelDB support  (was: Drop leveldb 
support for `spark.history.store.hybridStore.diskBackend`)

> Add warning for prepare drop LevelDB support
> 
>
> Key: SPARK-45413
> URL: https://issues.apache.org/jira/browse/SPARK-45413
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Jia Fan
>Priority: Major
>  Labels: pull-request-available
>
> Remove leveldb support for `spark.history.store.hybridStore.diskBackend`






[jira] [Updated] (SPARK-45414) spark-xml misplaces string tag content

2023-10-04 Thread Giuseppe Ceravolo (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giuseppe Ceravolo updated SPARK-45414:
--
Description: 
h1. Intro

Hi all! Please expect some degree of incompleteness in this issue as this is 
the very first one I post, and feel free to edit it as you like - I welcome 
your feedback.

My goal is to provide you with as many details and indications as I can on this 
issue that I am currently facing with a Client of mine on its Production 
environment (we use Azure Databricks DBR 11.3 LTS).

I was told by Sean Owen ([https://github.com/srowen]), who maintains the spark-xml 
repository on GitHub ([https://github.com/srowen/spark-xml]), to post an issue 
here because "This code has been ported to Apache Spark now anyway so won't be 
updated here" (refer to his comment [here|#issuecomment-1744792958]).
h1. Issue

When I write a DataFrame to XML format via the spark-xml library, either (1) I 
get an error if empty string columns sit in between non-string nested ones, or 
(2) if I put all string columns at the end, I get a wrong XML where the 
content of the string tags is misplaced into the following tags.
h1. Code to reproduce the issue

Please find below the end-to-end code snippet that results in the error.
h2. CASE (1): ERROR

When empty strings are in between non-string nested ones, the write fails with 
the following error.

_Caused by: java.lang.IllegalArgumentException: Failed to convert value 
MyDescription (class of class java.lang.String) in type 
ArrayType(StructType(StructField(_ID,StringType,true),StructField(_Level,StringType,true)),true)
 to XML._

Please find attached the full trace of the error.
{code:python}
fake_file_df = spark \
    .sql(
        """SELECT
            CAST(STRUCT('ItemId' AS `_Type`, '123' AS `_VALUE`) AS 
STRUCT<_Type: STRING, _VALUE: STRING>) AS ItemID,
            CAST(STRUCT('UPC' AS `_Type`, '123' AS `_VALUE`) AS STRUCT<_Type: 
STRING, _VALUE: STRING>) AS UPC,
            CAST('' AS STRING) AS _SerialNumberFlag,
            CAST('MyDescription' AS STRING) AS Description,
            CAST(ARRAY(STRUCT(NULL AS `_ID`, NULL AS `_Level`)) AS 
ARRAY>) AS MerchandiseHierarchy,
            CAST(ARRAY(STRUCT(NULL AS `_ValueTypeCode`, NULL AS `_VALUE`)) AS 
ARRAY>) AS ItemPrice,
            CAST('' AS STRING) AS Color,
            CAST('' AS STRING) AS IntendedIndustry,
            CAST(STRUCT(NULL AS `Name`) AS STRUCT) AS 
Manufacturer,
            CAST(STRUCT(NULL AS `Season`) AS STRUCT) AS 
Marketing,
            CAST(STRUCT(NULL AS `_Name`) AS STRUCT<_Name: STRING>) AS 
BrandOwner,
            CAST(ARRAY(STRUCT('Attribute1' AS `_Name`, 'Value1' AS `_VALUE`)) 
AS ARRAY>) AS 
ItemAttribute_culinary,
            CAST(ARRAY(STRUCT(NULL AS `_Name`, ARRAY(ARRAY(STRUCT(NULL AS 
`AttributeCode`, NULL AS `AttributeValue`))) AS `_VALUE`)) AS 
ARRAY) AS ItemAttribute_noculinary,
            CAST(STRUCT(STRUCT(NULL AS `_UnitOfMeasure`, NULL AS `_VALUE`) AS 
`Depth`, STRUCT(NULL AS `_UnitOfMeasure`, NULL AS `_VALUE`) AS `Height`, 
STRUCT(NULL AS `_UnitOfMeasure`, NULL AS `_VALUE`) AS `Width`, STRUCT(NULL AS 
`_UnitOfMeasure`, NULL AS `_VALUE`) AS `Diameter`) AS STRUCT, Height: STRUCT<_UnitOfMeasure: 
STRING, _VALUE: STRING>, Width: STRUCT<_UnitOfMeasure: STRING, _VALUE: STRING>, 
Diameter: STRUCT<_UnitOfMeasure: STRING, _VALUE: STRING>>) AS ItemMeasurements,
            CAST(STRUCT('GroupA' AS `TaxGroupID`, 'CodeA' AS `TaxExemptCode`, 
'1' AS `TaxAmount`) AS STRUCT) AS TaxInformation,
            CAST('' AS STRING) AS ItemImageUrl,
            CAST(ARRAY(ARRAY(STRUCT(NULL AS `_action`, NULL AS `_franchiseeId`, 
NULL AS `_franchiseeName`))) AS ARRAY>>) AS ItemFranchisees,
            CAST('Add' AS STRING) AS _Action
        ;"""
    )

# fake_file_df.display()
fake_file_df \
    .coalesce(1) \
    .write \
    .format('com.databricks.spark.xml') \
    .option('declaration', 'version="1.0" encoding="UTF-8"') \
    .option("nullValue", "") \
    .option('rootTag', "root_tag") \
    .option('rowTag', "row_tag") \
    .mode('overwrite') \
    .save(xml_folder_path) {code}
I noticed that it works if I try to write all columns up to "Color" (excluded), 
namely:
{code:python}
fake_file_df \
    .select(
        "ItemID",
        "UPC",
        "_SerialNumberFlag",
        "Description",
        "MerchandiseHierarchy",
        "ItemPrice"
    ) \
    .coalesce(1) \
    .write \
    .format('com.databricks.spark.xml') \
    .option('declaration', 'version="1.0" encoding="UTF-8"') \
    .option("nullValue", "") \
    .option('rootTag', "root_tag") \
    .option('rowTag', "row_tag") \
    .mode('overwrite') \
    .save(xml_folder_path){code}
h2. CASE (2): MISPLACED XML

When I put all string columns at the end of the 1-row DataFrame it mistakenly 
writes the content of one column into the tag right after it.
{code:python}
fake_file_df = 

[jira] [Reopened] (SPARK-45088) Make `getitem` work with duplicated columns

2023-10-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45088?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reopened SPARK-45088:
---
  Assignee: (was: Ruifeng Zheng)

> Make `getitem` work with duplicated columns
> ---
>
> Key: SPARK-45088
> URL: https://issues.apache.org/jira/browse/SPARK-45088
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-45414) spark-xml misplaces string tag content

2023-10-04 Thread Giuseppe Ceravolo (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giuseppe Ceravolo updated SPARK-45414:
--
Description: 
h1. Intro

Hi all! Please expect some degree of incompleteness in this issue as this is 
the very first one I post, and feel free to edit it as you like - I welcome 
your feedback.

My goal is to provide you with as many details and indications as I can on this 
issue that I am currently facing with a Client of mine on its Production 
environment (we use Azure Databricks DBR 11.3 LTS).

I was told by Sean Owen ([https://github.com/srowen]), who maintains the spark-xml 
repository on GitHub ([https://github.com/srowen/spark-xml]), to post an issue 
here because "This code has been ported to Apache Spark now anyway so won't be 
updated here" (refer to his comment [here|#issuecomment-1744792958]).
h1. Issue

When I write a DataFrame to XML via the spark-xml library, either (1) the write 
fails with an error if empty string columns sit between non-string nested 
columns, or (2) if I move all string columns to the end, the write succeeds but 
produces wrong XML in which the content of each string tag is misplaced into the 
tag that follows it.
h1. Code to reproduce the issue

Please find below the end-to-end code snippet that results in the error.
h2. CASE (1): ERROR

When empty string columns sit between non-string nested columns, the write fails 
with the following error.

_Caused by: java.lang.IllegalArgumentException: Failed to convert value 
MyDescription (class of class java.lang.String) in type 
ArrayType(StructType(StructField(_ID,StringType,true),StructField(_Level,StringType,true)),true)
 to XML._

Please find attached the full trace of the error.
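For readability, the ArrayType quoted in the exception is the schema of the MerchandiseHierarchy column; written out as a PySpark type it would look like the sketch below (a transcription of the error text, not code from the report).
{code:python}
from pyspark.sql.types import ArrayType, StringType, StructField, StructType

# The type named in the IllegalArgumentException, spelled out explicitly:
merchandise_hierarchy_type = ArrayType(
    StructType([
        StructField("_ID", StringType(), True),
        StructField("_Level", StringType(), True),
    ]),
    True,  # containsNull
)
{code}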
{code:python}
fake_file_df = spark \
    .sql(
        """SELECT
            CAST(STRUCT('ItemId' AS `_Type`, '123' AS `_VALUE`) AS 
STRUCT<_Type: STRING, _VALUE: STRING>) AS ItemID,
            CAST(STRUCT('UPC' AS `_Type`, '123' AS `_VALUE`) AS STRUCT<_Type: 
STRING, _VALUE: STRING>) AS UPC,
            CAST('' AS STRING) AS _SerialNumberFlag,
            CAST('MyDescription' AS STRING) AS Description,
            CAST(ARRAY(STRUCT(NULL AS `_ID`, NULL AS `_Level`)) AS 
ARRAY>) AS MerchandiseHierarchy,
            CAST(ARRAY(STRUCT(NULL AS `_ValueTypeCode`, NULL AS `_VALUE`)) AS 
ARRAY>) AS ItemPrice,
            CAST('' AS STRING) AS Color,
            CAST('' AS STRING) AS IntendedIndustry,
            CAST(STRUCT(NULL AS `Name`) AS STRUCT) AS 
Manufacturer,
            CAST(STRUCT(NULL AS `Season`) AS STRUCT) AS 
Marketing,
            CAST(STRUCT(NULL AS `_Name`) AS STRUCT<_Name: STRING>) AS 
BrandOwner,
            CAST(ARRAY(STRUCT('Attribute1' AS `_Name`, 'Value1' AS `_VALUE`)) 
AS ARRAY>) AS 
ItemAttribute_culinary,
            CAST(ARRAY(STRUCT(NULL AS `_Name`, ARRAY(ARRAY(STRUCT(NULL AS 
`AttributeCode`, NULL AS `AttributeValue`))) AS `_VALUE`)) AS 
ARRAY) AS ItemAttribute_noculinary,
            CAST(STRUCT(STRUCT(NULL AS `_UnitOfMeasure`, NULL AS `_VALUE`) AS 
`Depth`, STRUCT(NULL AS `_UnitOfMeasure`, NULL AS `_VALUE`) AS `Height`, 
STRUCT(NULL AS `_UnitOfMeasure`, NULL AS `_VALUE`) AS `Width`, STRUCT(NULL AS 
`_UnitOfMeasure`, NULL AS `_VALUE`) AS `Diameter`) AS STRUCT, Height: STRUCT<_UnitOfMeasure: 
STRING, _VALUE: STRING>, Width: STRUCT<_UnitOfMeasure: STRING, _VALUE: STRING>, 
Diameter: STRUCT<_UnitOfMeasure: STRING, _VALUE: STRING>>) AS ItemMeasurements,
            CAST(STRUCT('GroupA' AS `TaxGroupID`, 'CodeA' AS `TaxExemptCode`, 
'1' AS `TaxAmount`) AS STRUCT) AS TaxInformation,
            CAST('' AS STRING) AS ItemImageUrl,
            CAST(ARRAY(ARRAY(STRUCT(NULL AS `_action`, NULL AS `_franchiseeId`, 
NULL AS `_franchiseeName`))) AS ARRAY>>) AS ItemFranchisees,
            CAST('Add' AS STRING) AS _Action
        ;"""
    )

# fake_file_df.display()
fake_file_df \
    .coalesce(1) \
    .write \
    .format('com.databricks.spark.xml') \
    .option('declaration', 'version="1.0" encoding="UTF-8"') \
    .option("nullValue", "") \
    .option('rootTag', "root_tag") \
    .option('rowTag', "row_tag") \
    .mode('overwrite') \
    .save(xml_folder_path) {code}
I noticed that the write works if I select only the columns up to (but excluding) "Color", namely:
{code:python}
fake_file_df \
    .select(
        "ItemID",
        "UPC",
        "_SerialNumberFlag",
        "Description",
        "MerchandiseHierarchy",
        "ItemPrice"
    ) \
    .coalesce(1) \
    .write \
    .format('com.databricks.spark.xml') \
    .option('declaration', 'version="1.0" encoding="UTF-8"') \
    .option("nullValue", "") \
    .option('rootTag', "root_tag") \
    .option('rowTag', "row_tag") \
    .mode('overwrite') \
    .save(xml_folder_path){code}
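A possible workaround to try, sketched below, is to turn the empty strings into NULLs before writing, so that the writer's nullValue handling drops them instead of serializing empty string content next to nested columns. This is my own untested sketch against the schema above, not something from the original report.
{code:python}
from pyspark.sql import functions as F

# Sketch (untested against this issue): convert '' to NULL in every plain
# string column, so empty values are handled by the nullValue option rather
# than written as string content between nested tags.
cleaned_df = fake_file_df
for name, dtype in fake_file_df.dtypes:
    if dtype == 'string':
        cleaned_df = cleaned_df.withColumn(
            name, F.when(F.col(name) == '', F.lit(None)).otherwise(F.col(name))
        )

cleaned_df \
    .coalesce(1) \
    .write \
    .format('com.databricks.spark.xml') \
    .option('declaration', 'version="1.0" encoding="UTF-8"') \
    .option("nullValue", "") \
    .option('rootTag', "root_tag") \
    .option('rowTag', "row_tag") \
    .mode('overwrite') \
    .save(xml_folder_path)
{code}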
h2. CASE (2): MISPLACED XML

When I instead put all string columns at the end of the 1-row DataFrame, the write succeeds but the XML is wrong: the content of each string column is written into the tag that comes right after it.
{code:python}

[jira] [Updated] (SPARK-45414) spark-xml misplaces string tag content

2023-10-04 Thread Giuseppe Ceravolo (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Giuseppe Ceravolo updated SPARK-45414:
--
Attachment: IllegalArgumentException.txt

> spark-xml misplaces string tag content
> --
>
> Key: SPARK-45414
> URL: https://issues.apache.org/jira/browse/SPARK-45414
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, Spark Core
>Affects Versions: 3.3.0
>Reporter: Giuseppe Ceravolo
>Priority: Critical
> Attachments: IllegalArgumentException.txt
>
>

[jira] [Created] (SPARK-45414) spark-xml misplaces string tag content

2023-10-04 Thread Giuseppe Ceravolo (Jira)
Giuseppe Ceravolo created SPARK-45414:
-

 Summary: spark-xml misplaces string tag content
 Key: SPARK-45414
 URL: https://issues.apache.org/jira/browse/SPARK-45414
 Project: Spark
  Issue Type: Bug
  Components: PySpark, Spark Core
Affects Versions: 3.3.0
Reporter: Giuseppe Ceravolo


h1. Intro

Hi all! Please expect some degree of incompleteness in this issue as this is 
the very first I post, and feel free to edit it as you like - I welcome your 
feedback.

My goal is to provide you with as many details and indications as I can on this 
issue that I am currently facing with a Client of mine on its Production 
environment (we use Azure Databricks DBR 11.3 LTS).

I was told by [Sean Owen|[srowen (Sean Owen) 
(github.com)|https://github.com/srowen]], who maintains the spark-xml maven 
repository on GitHub [here|[https://github.com/srowen/spark-xml],] to post an 
issue here because "This code has been ported to Apache Spark now anyway so 
won't be updated here" (refer to his comment 
[here|[https://github.com/databricks/spark-xml/issues/431#issuecomment-1744792958]).]
h1. Issue

When I write a DataFrame into xml format via the spark-xml library either (1) I 
get an error if empty string columns are in between non-string nested ones or 
(2) if I put all string columns at the end then I get a wrong xml where the 
content of string tags are misplaced into the following ones.
h1. Code to reproduce the issue

Please find below the end-to-end code snippet that results into the error
h2. CASE (1): ERROR

When empty strings are in between non-string nested ones, the write fails. 
Please find attached the full trace of the error.
fake_file_df = spark \
.sql("""SELECTCAST(STRUCT('ItemId' AS `_Type`, '123' AS 
`_VALUE`) AS STRUCT<_Type: STRING, _VALUE: STRING>) AS ItemID,
CAST(STRUCT('UPC' AS `_Type`, '123' AS `_VALUE`) AS STRUCT<_Type: STRING, 
_VALUE: STRING>) AS UPC,CAST('' AS STRING) AS _SerialNumberFlag,
CAST('MyDescription' AS STRING) AS Description,
CAST(ARRAY(STRUCT(NULL AS `_ID`, NULL AS `_Level`)) AS ARRAY>) AS MerchandiseHierarchy,
CAST(ARRAY(STRUCT(NULL AS `_ValueTypeCode`, NULL AS `_VALUE`)) AS 
ARRAY>) AS ItemPrice,
CAST('' AS STRING) AS Color,CAST('' AS STRING) AS IntendedIndustry, 
   CAST(STRUCT(NULL AS `Name`) AS STRUCT) AS 
Manufacturer,CAST(STRUCT(NULL AS `Season`) AS STRUCT) AS Marketing,CAST(STRUCT(NULL AS `_Name`) AS STRUCT<_Name: 
STRING>) AS BrandOwner,CAST(ARRAY(STRUCT('Attribute1' AS `_Name`, 
'Value1' AS `_VALUE`)) AS ARRAY>) 
AS ItemAttribute_culinary,CAST(ARRAY(STRUCT(NULL AS `_Name`, 
ARRAY(ARRAY(STRUCT(NULL AS `AttributeCode`, NULL AS `AttributeValue`))) AS 
`_VALUE`)) AS ARRAY) AS 
ItemAttribute_noculinary,CAST(STRUCT(STRUCT(NULL AS 
`_UnitOfMeasure`, NULL AS `_VALUE`) AS `Depth`, STRUCT(NULL AS 
`_UnitOfMeasure`, NULL AS `_VALUE`) AS `Height`, STRUCT(NULL AS 
`_UnitOfMeasure`, NULL AS `_VALUE`) AS `Width`, STRUCT(NULL AS 
`_UnitOfMeasure`, NULL AS `_VALUE`) AS `Diameter`) AS STRUCT, Height: STRUCT<_UnitOfMeasure: 
STRING, _VALUE: STRING>, Width: STRUCT<_UnitOfMeasure: STRING, _VALUE: STRING>, 
Diameter: STRUCT<_UnitOfMeasure: STRING, _VALUE: STRING>>) AS ItemMeasurements, 
   CAST(STRUCT('GroupA' AS `TaxGroupID`, 'CodeA' AS `TaxExemptCode`, 
'1' AS `TaxAmount`) AS STRUCT) AS TaxInformation,CAST('' AS STRING) AS 
ItemImageUrl,CAST(ARRAY(ARRAY(STRUCT(NULL AS `_action`, NULL AS 
`_franchiseeId`, NULL AS `_franchiseeName`))) AS ARRAY>>) AS ItemFranchisees,  
  CAST('Add' AS STRING) AS _Action;""")
# fake_file_df.display()
fake_file_df \
.coalesce(1) \
.write \
.format('com.databricks.spark.xml') \
.option('declaration', 'version="1.0" encoding="UTF-8"') \
.option("nullValue", "") \
.option('rootTag', "root_tag") \
.option('rowTag', "row_tag") \
.mode('overwrite') \
.save(xml_folder_path)
I noticed that it works if I try to write all columns up to "Color" (excluded), 
namely:
fake_file_df \
.select("ItemID","UPC","_SerialNumberFlag",
"Description","MerchandiseHierarchy","ItemPrice") \
.coalesce(1) \
.write \
.format('com.databricks.spark.xml') \
.option('declaration', 'version="1.0" encoding="UTF-8"') \
.option("nullValue", "") \
.option('rootTag', "root_tag") \
.option('rowTag', "row_tag") \
.mode('overwrite') \
.save(xml_folder_path)
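For reference, below is a much smaller sketch of the same failing pattern - an 
empty string column sitting between two nested array-of-struct columns - using 
the same writer options; the column names and the /tmp output path are made up 
for illustration and are not part of the original job:
minimal_df = spark.sql("""SELECT
    CAST(ARRAY(STRUCT('1' AS `_ID`, '2' AS `_Level`)) AS ARRAY<STRUCT<_ID: STRING, _Level: STRING>>) AS MerchandiseHierarchy,
    CAST('' AS STRING) AS Color,
    CAST(ARRAY(STRUCT('Attribute1' AS `_Name`, 'Value1' AS `_VALUE`)) AS ARRAY<STRUCT<_Name: STRING, _VALUE: STRING>>) AS ItemAttribute_culinary""")
minimal_df \
.coalesce(1) \
.write \
.format('com.databricks.spark.xml') \
.option('declaration', 'version="1.0" encoding="UTF-8"') \
.option("nullValue", "") \
.option('rootTag', "root_tag") \
.option('rowTag', "row_tag") \
.mode('overwrite') \
.save('/tmp/spark_xml_repro')  # illustrative output path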
h2. CASE (2): MISPLACED XML

When I put all string columns at the end of the 1-row DataFrame, it mistakenly 
writes the content of one column into the tag right after it.
fake_file_df = spark \
.sql("""SELECTCAST(STRUCT('ItemId' AS `_Type`, '123' AS 

[jira] [Commented] (SPARK-45093) AddArtifacts should give proper error messages if it fails

2023-10-04 Thread Nikita Awasthi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-45093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771836#comment-17771836
 ] 

Nikita Awasthi commented on SPARK-45093:


User 'cdkrot' has created a pull request for this issue:
https://github.com/apache/spark/pull/43216

> AddArtifacts should give proper error messages if it fails
> --
>
> Key: SPARK-45093
> URL: https://issues.apache.org/jira/browse/SPARK-45093
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.5.0
>Reporter: Alice Sayutina
>Assignee: Alice Sayutina
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> I've been trying to do some testing of UDFs using code in another module, so 
> AddArtifacts is necessary.
>  
> I got the following error:
>  
>  
> {code:java}
> Traceback (most recent call last):
>   File "/Users/alice.sayutina/db-connect-playground/udf2.py", line 5, in 
> 
>     spark.addArtifacts("udf2_support.py", pyfile=True)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/session.py",
>  line 744, in addArtifacts
>     self._client.add_artifacts(*path, pyfile=pyfile, archive=archive, 
> file=file)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/core.py",
>  line 1582, in add_artifacts
>     self._artifact_manager.add_artifacts(*path, pyfile=pyfile, 
> archive=archive, file=file)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py",
>  line 283, in add_artifacts
>     self._request_add_artifacts(requests)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py",
>  line 259, in _request_add_artifacts
>     response: proto.AddArtifactsResponse = self._retrieve_responses(requests)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/pyspark/sql/connect/client/artifact.py",
>  line 256, in _retrieve_responses
>     return self._stub.AddArtifacts(requests, metadata=self._metadata)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/grpc/_channel.py",
>  line 1246, in __call__
>     return _end_unary_response_blocking(state, call, False, None)
>   File 
> "/Users/alice.sayutina/db-connect-venv/lib/python3.10/site-packages/grpc/_channel.py",
>  line 910, in _end_unary_response_blocking
>     raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
> grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated 
> with:
>         status = StatusCode.UNKNOWN
>         details = "Exception iterating requests!"
>         debug_error_string = "None"
> {code}
>  
> This doesn't give any clue about what happened.
> Only after considerable investigation did I find the problem: I was specifying 
> the wrong path and the artifact failed to upload. Specifically, ArtifactManager 
> doesn't read the file immediately, but rather creates an iterator object that 
> incrementally generates the requests to send. This iterator is passed to grpc's 
> stream_unary to consume and actually send, and while grpc catches the error 
> (see above), it suppresses the underlying exception.
> I think we should improve the PySpark user experience. One possible way to fix 
> this is to wrap ArtifactManager._create_requests with an iterator wrapper that 
> logs the throwable to the Spark Connect logger, so that the user would see 
> something like the following, at least when debug mode is on (a rough sketch of 
> such a wrapper follows the example below).
>  
> {code:java}
> FileNotFoundError: [Errno 2] No such file or directory: 
> '/Users/alice.sayutina/udf2_support.py' {code}
>  
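> A minimal sketch of one such wrapper, purely illustrative (the helper name,
> logger name, and call site are assumptions, not the actual patch):
> {code:python}
> import logging
>
> logger = logging.getLogger("pyspark.sql.connect.client.artifact")
>
> def _logged_requests(requests):
>     # Yield AddArtifacts requests, logging any exception raised while the
>     # requests are being generated (e.g. FileNotFoundError for a bad local
>     # path) before grpc swallows it.
>     try:
>         for request in requests:
>             yield request
>     except Exception:
>         logger.exception("Failed while generating AddArtifacts requests")
>         raise
> {code}
> The stub call would then look something like
> self._stub.AddArtifacts(_logged_requests(requests), metadata=self._metadata),
> so the underlying error is at least visible in the client logs.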



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45413) Drop leveldb support for `spark.history.store.hybridStore.diskBackend`

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45413:
---
Labels: pull-request-available  (was: )

> Drop leveldb support for `spark.history.store.hybridStore.diskBackend`
> --
>
> Key: SPARK-45413
> URL: https://issues.apache.org/jira/browse/SPARK-45413
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Jia Fan
>Priority: Major
>  Labels: pull-request-available
>
> Remove leveldb support for `spark.history.store.hybridStore.diskBackend`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45394) Retry handling for add_artifact

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45394?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45394:
---
Labels: pull-request-available  (was: )

> Retry handling for add_artifact
> ---
>
> Key: SPARK-45394
> URL: https://issues.apache.org/jira/browse/SPARK-45394
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Alice Sayutina
>Priority: Major
>  Labels: pull-request-available
>
> There is no retry handling within add_artifact



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44223) Drop leveldb support

2023-10-04 Thread Jia Fan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771824#comment-17771824
 ] 

Jia Fan commented on SPARK-44223:
-

A ticket to drop leveldb support for `spark.shuffle.service.db.backend` will be 
created after 4.0.0 is released.

> Drop leveldb support
> 
>
> Key: SPARK-44223
> URL: https://issues.apache.org/jira/browse/SPARK-44223
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> The leveldb project seems to be no longer maintained, and we can always 
> replace it with rocksdb. I think we can remove support and dependencies on 
> leveldb in Spark 4.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45413) Drop leveldb support for `spark.history.store.hybridStore.diskBackend`

2023-10-04 Thread Jia Fan (Jira)
Jia Fan created SPARK-45413:
---

 Summary: Drop leveldb support for 
`spark.history.store.hybridStore.diskBackend`
 Key: SPARK-45413
 URL: https://issues.apache.org/jira/browse/SPARK-45413
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Jia Fan


Remove leveldb support for `spark.history.store.hybridStore.diskBackend`
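For context, a History Server that already uses the remaining backend would carry 
spark-defaults.conf entries along these lines (paths and values shown only for 
illustration):

spark.history.store.path                      /var/spark/history-store
spark.history.store.hybridStore.enabled       true
spark.history.store.hybridStore.diskBackend   ROCKSDB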



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45412) Validate the plan and session in DataFrame.__init__

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45412?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45412:
---
Labels: pull-request-available  (was: )

> Validate the plan and session in DataFrame.__init__
> ---
>
> Key: SPARK-45412
> URL: https://issues.apache.org/jira/browse/SPARK-45412
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45412) Validate the plan and session in DataFrame.__init__

2023-10-04 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-45412:
-

 Summary: Validate the plan and session in DataFrame.__init__
 Key: SPARK-45412
 URL: https://issues.apache.org/jira/browse/SPARK-45412
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45355) Fix function groups in Scala Doc

2023-10-04 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-45355:
-

Assignee: Ruifeng Zheng

> Fix function groups in Scala Doc
> 
>
> Key: SPARK-45355
> URL: https://issues.apache.org/jira/browse/SPARK-45355
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45355) Fix function groups in Scala Doc

2023-10-04 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-45355.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43148
[https://github.com/apache/spark/pull/43148]

> Fix function groups in Scala Doc
> 
>
> Key: SPARK-45355
> URL: https://issues.apache.org/jira/browse/SPARK-45355
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-43656) Fix pyspark.sql.column._to_java_column to accept Connect Column

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-43656:
---
Labels: pull-request-available  (was: )

> Fix pyspark.sql.column._to_java_column to accept Connect Column
> ---
>
> Key: SPARK-43656
> URL: https://issues.apache.org/jira/browse/SPARK-43656
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Run `NumPyCompatParityTests.test_np_spark_compat_frame` to repro.
> `



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45386) Correctness issue when persisting using StorageLevel.NONE

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45386:
--

Assignee: (was: Apache Spark)

> Correctness issue when persisting using StorageLevel.NONE
> -
>
> Key: SPARK-45386
> URL: https://issues.apache.org/jira/browse/SPARK-45386
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Emil Ejbyfeldt
>Priority: Major
>  Labels: pull-request-available
>
> When using spark 3.5.0 this code
> {code:java}
> import org.apache.spark.storage.StorageLevel
> spark.createDataset(Seq(1,2,3)).persist(StorageLevel.NONE).count() {code}
> incorrectly returns 0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45386) Correctness issue when persisting using StorageLevel.NONE

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45386:
--

Assignee: Apache Spark

> Correctness issue when persisting using StorageLevel.NONE
> -
>
> Key: SPARK-45386
> URL: https://issues.apache.org/jira/browse/SPARK-45386
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Emil Ejbyfeldt
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> When using spark 3.5.0 this code
> {code:java}
> import org.apache.spark.storage.StorageLevel
> spark.createDataset(Seq(1,2,3)).persist(StorageLevel.NONE).count() {code}
> incorrectly returns 0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45386) Correctness issue when persisting using StorageLevel.NONE

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45386:
--

Assignee: Apache Spark

> Correctness issue when persisting using StorageLevel.NONE
> -
>
> Key: SPARK-45386
> URL: https://issues.apache.org/jira/browse/SPARK-45386
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Emil Ejbyfeldt
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> When using spark 3.5.0 this code
> {code:java}
> import org.apache.spark.storage.StorageLevel
> spark.createDataset(Seq(1,2,3)).persist(StorageLevel.NONE).count() {code}
> incorrectly returns 0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45386) Correctness issue when persisting using StorageLevel.NONE

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45386?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45386:
--

Assignee: (was: Apache Spark)

> Correctness issue when persisting using StorageLevel.NONE
> -
>
> Key: SPARK-45386
> URL: https://issues.apache.org/jira/browse/SPARK-45386
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Emil Ejbyfeldt
>Priority: Major
>  Labels: pull-request-available
>
> When using spark 3.5.0 this code
> {code:java}
> import org.apache.spark.storage.StorageLevel
> spark.createDataset(Seq(1,2,3)).persist(StorageLevel.NONE).count() {code}
> incorrectly returns 0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45355) Fix function groups in Scala Doc

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45355:
--

Assignee: (was: Apache Spark)

> Fix function groups in Scala Doc
> 
>
> Key: SPARK-45355
> URL: https://issues.apache.org/jira/browse/SPARK-45355
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45355) Fix function groups in Scala Doc

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45355:
--

Assignee: Apache Spark

> Fix function groups in Scala Doc
> 
>
> Key: SPARK-45355
> URL: https://issues.apache.org/jira/browse/SPARK-45355
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45345) Refactor release-build.sh

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45345:
--

Assignee: Apache Spark

> Refactor release-build.sh
> -
>
> Key: SPARK-45345
> URL: https://issues.apache.org/jira/browse/SPARK-45345
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45345) Refactor release-build.sh

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45345:
--

Assignee: (was: Apache Spark)

> Refactor release-build.sh
> -
>
> Key: SPARK-45345
> URL: https://issues.apache.org/jira/browse/SPARK-45345
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Deleted] (SPARK-45411) Fix snapshot build in publish_snapshot.yml

2023-10-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45411?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon deleted SPARK-45411:
-


> Fix snapshot build in publish_snapshot.yml
> --
>
> Key: SPARK-45411
> URL: https://issues.apache.org/jira/browse/SPARK-45411
> Project: Spark
>  Issue Type: Sub-task
>Reporter: Hyukjin Kwon
>Priority: Major
>
> https://github.com/apache/spark/actions/workflows/publish_snapshot.yml 
> snapshot builds are failing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-45411) Fix snapshot build in publish_snapshot.yml

2023-10-04 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-45411:


 Summary: Fix snapshot build in publish_snapshot.yml
 Key: SPARK-45411
 URL: https://issues.apache.org/jira/browse/SPARK-45411
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


https://github.com/apache/spark/actions/workflows/publish_snapshot.yml 
snapshot builds are failing.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45345) Refactor release-build.sh

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45345?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45345:
---
Labels: pull-request-available  (was: )

> Refactor release-build.sh
> -
>
> Key: SPARK-45345
> URL: https://issues.apache.org/jira/browse/SPARK-45345
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45398) Include `ESCAPE` to `sql()` of `Like`

2023-10-04 Thread Max Gekk (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Max Gekk resolved SPARK-45398.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43196
[https://github.com/apache/spark/pull/43196]

> Include `ESCAPE` to `sql()` of `Like`
> -
>
> Key: SPARK-45398
> URL: https://issues.apache.org/jira/browse/SPARK-45398
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Fix the `sql()` method of the `Like` expression and append the `ESCAPE` 
> clause. That should become consistent with `toString` and fix the issue:
> {code:sql}
> spark-sql (default)> create temp view tbl as (SELECT 'a|_' like 'a||_' escape 
> '|', 'a|_' like 'a||_' escape 'a');
> [COLUMN_ALREADY_EXISTS] The column `a|_ like a||_` already exists. Consider 
> to choose another name or rename the existing column.
> {code}
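> A sketch of the behaviour the fix targets: once sql() appends the ESCAPE
> clause, the two projections get distinct generated names and the view can be
> created (the generated column names shown are illustrative, not taken from the
> patch):
> {code:sql}
> -- expected to succeed after the fix, producing columns such as
> -- `a|_ LIKE a||_ ESCAPE '|'` and `a|_ LIKE a||_ ESCAPE 'a'`
> create temp view tbl as (SELECT 'a|_' like 'a||_' escape '|', 'a|_' like 'a||_' escape 'a');
> {code}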



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45398) Include `ESCAPE` to `sql()` of `Like`

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45398?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45398:
---
Labels: pull-request-available  (was: )

> Include `ESCAPE` to `sql()` of `Like`
> -
>
> Key: SPARK-45398
> URL: https://issues.apache.org/jira/browse/SPARK-45398
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>  Labels: pull-request-available
>
> Fix the `sql()` method of the `Like` expression and append the `ESCAPE` 
> clause. That should become consistent with `toString` and fix the issue:
> {code:sql}
> spark-sql (default)> create temp view tbl as (SELECT 'a|_' like 'a||_' escape 
> '|', 'a|_' like 'a||_' escape 'a');
> [COLUMN_ALREADY_EXISTS] The column `a|_ like a||_` already exists. Consider 
> to choose another name or rename the existing column.
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-45400) Refer to the unescaping rules from expression descriptions

2023-10-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45400?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-45400:
---
Labels: pull-request-available  (was: )

> Refer to the unescaping rules from expression descriptions
> --
>
> Key: SPARK-45400
> URL: https://issues.apache.org/jira/browse/SPARK-45400
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Minor
>  Labels: pull-request-available
>
> Update the expression/function description and refer to the unescaping rules 
> in the items where regexp parameters are described. This should make things 
> less confusing for users.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44223) Drop leveldb support

2023-10-04 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44223?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17771736#comment-17771736
 ] 

Dongjoon Hyun commented on SPARK-44223:
---

This will happen at Apache Spark 4.1.0 because SPARK-45351 is targeting Apache 
Spark 4.0.0.

> Drop leveldb support
> 
>
> Key: SPARK-44223
> URL: https://issues.apache.org/jira/browse/SPARK-44223
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> The leveldb project seems to be no longer maintained, and we can always 
> replace it with rocksdb. I think we can remove support and dependencies on 
> leveldb in Spark 4.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44223) Drop leveldb support

2023-10-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44223?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44223:
--
Parent: (was: SPARK-44111)
Issue Type: Task  (was: Sub-task)

> Drop leveldb support
> 
>
> Key: SPARK-44223
> URL: https://issues.apache.org/jira/browse/SPARK-44223
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> The leveldb project seems to be no longer maintained, and we can always 
> replace it with rocksdb. I think we can remove support and dependencies on 
> leveldb in Spark 4.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45409) Pin `torch<=2.0.1`

2023-10-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45409.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43207
[https://github.com/apache/spark/pull/43207]

> Pin `torch<=2.0.1`
> --
>
> Key: SPARK-45409
> URL: https://issues.apache.org/jira/browse/SPARK-45409
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45409) Pin `torch<=2.0.1`

2023-10-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45409:
-

Assignee: Dongjoon Hyun

> Pin `torch<=2.0.1`
> --
>
> Key: SPARK-45409
> URL: https://issues.apache.org/jira/browse/SPARK-45409
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45407) Skip Unidoc in SparkR GitHub Action Job

2023-10-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-45407:
-

Assignee: Dongjoon Hyun

> Skip Unidoc in SparkR GitHub Action Job
> ---
>
> Key: SPARK-45407
> URL: https://issues.apache.org/jira/browse/SPARK-45407
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-45407) Skip Unidoc in SparkR GitHub Action Job

2023-10-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-45407.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 43208
[https://github.com/apache/spark/pull/43208]

> Skip Unidoc in SparkR GitHub Action Job
> ---
>
> Key: SPARK-45407
> URL: https://issues.apache.org/jira/browse/SPARK-45407
> Project: Spark
>  Issue Type: Test
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org