[jira] [Updated] (SPARK-47114) In the Spark driver pod, failed to access the krb5 file
[ https://issues.apache.org/jira/browse/SPARK-47114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-47114:
--
    Description: 
Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos). The pod error log shows:
{code:java}
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed{code}
This error generally occurs when the krb5 file cannot be found. [~yao] [~Qin Yao]
{code:java}
./bin/spark-submit \
 --master k8s://https://172.18.5.44:6443 \
 --deploy-mode cluster \
 --name spark-pi \
 --class org.apache.spark.examples.SparkPi \
 --conf spark.executor.instances=1 \
 --conf spark.kubernetes.submission.waitAppCompletion=true \
 --conf spark.kubernetes.driver.pod.name=spark-xxx \
 --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
 --conf spark.kubernetes.driver.label.profile=production \
 --conf spark.kubernetes.executor.label.profile=production \
 --conf spark.kubernetes.namespace=superior \
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
 --conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0 \
 --conf spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
 --conf spark.kubernetes.container.image.pullPolicy=Always \
 --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
 --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
 --conf spark.kerberos.principal=superior/ad...@datacyber.com \
 --conf spark.kerberos.keytab=/root/superior.keytab \
 file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar 5{code}
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
  at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
  at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
  at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
  at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
  at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
  at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
  at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
  at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
  at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
  at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(Unknown Source)
  at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
  at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
  ... 13 more

(base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
Name:            spark-xxx
Namespace:       superior
Priority:        0
Service Account: spark
Node:            cdh2/172.18.5.45
Start Time:      Wed, 21 Feb 2024 15:48:08 +0800
Labels:          profile=production
                 spark-app-name=spark-pi
                 spark-app-selector=spark-728e24e49f9040fa86b04c521463020b
                 spark-role=driver
                 spark-version=3.4.2
Annotations:
Status:          Failed
IP:              10.244.1.4
IPs:
  IP: 10.244.1.4
Containers:
  spark-kubernetes-driver:
    Container ID: containerd://cceaf13b70cc5f21a639e71cb8663989ec73e122380844624d4bfac3946bae15
    Image:        spark:3.4.1
    Image ID:     docker.io/library/spark@sha256:69fb485a0bcad88f9a2bf066e1b5d555f818126dc9df5a0b7e6a3b6d364bc694
    Ports:        7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:   0/TCP, 0/TCP, 0/TCP
    Args:         driver --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi spark-internal 5
    State:        Terminated
      Reason:     Error
      Exit Code:  1
      Started:    Wed, 21 Feb 2024 15:49:54 +0800
      Finished:   Wed, 21 Feb 2024 15:49:56
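[Editor's note] If the krb5 file genuinely is not visible inside the driver container, one commonly suggested alternative (a sketch, not necessarily the reporter's eventual fix) is to distribute krb5.conf through a pre-created Kubernetes ConfigMap via spark.kubernetes.kerberos.krb5.configMapName, instead of relying on spark.kubernetes.kerberos.krb5.path. The ConfigMap name "krb5-conf" below is an assumption for illustration:
{code:bash}
# Create a ConfigMap holding krb5.conf in the job namespace
# (the name "krb5-conf" is illustrative, not from the report).
kubectl create configmap krb5-conf \
  --from-file=krb5.conf=/etc/krb5.conf \
  -n superior

# Reference the pre-created ConfigMap at submission time so Spark itself
# mounts /etc/krb5.conf into the driver and executor pods. The other
# --conf flags from the report would stay as they are.
./bin/spark-submit \
  --master k8s://https://172.18.5.44:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.namespace=superior \
  --conf spark.kubernetes.kerberos.krb5.configMapName=krb5-conf \
  --conf spark.kerberos.principal=superior/ad...@datacyber.com \
  --conf spark.kerberos.keytab=/root/superior.keytab \
  file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar 5
{code}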
[jira] [Updated] (SPARK-47114) In the Spark driver pod, failed to access the krb5 file
[ https://issues.apache.org/jira/browse/SPARK-47114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-47114:
--
    Description: 
Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos). The pod error log shows:
{code:java}
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed{code}
This error generally occurs when the krb5 file cannot be found. [~yao] [~Qin Yao]
{code:java}
./bin/spark-submit \
 --master k8s://https://172.18.5.44:6443 \
 --deploy-mode cluster \
 --name spark-pi \
 --class org.apache.spark.examples.SparkPi \
 --conf spark.executor.instances=1 \
 --conf spark.kubernetes.submission.waitAppCompletion=true \
 --conf spark.kubernetes.driver.pod.name=spark-xxx \
 --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
 --conf spark.kubernetes.driver.label.profile=production \
 --conf spark.kubernetes.executor.label.profile=production \
 --conf spark.kubernetes.namespace=superior \
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
 --conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0 \
 --conf spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
 --conf spark.kubernetes.container.image.pullPolicy=Always \
 --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
 --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
 --conf spark.kerberos.principal=superior/ad...@datacyber.com \
 --conf spark.kerberos.keytab=/root/superior.keytab \
 --conf spark.kubernetes.driver.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/driver.yaml \
 --conf spark.kubernetes.executor.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/executor.yaml \
 file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar 5{code}
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
  at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
  at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
  at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
  at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
  at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
  at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
  at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
  at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
  at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
  at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(Unknown Source)
  at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
  at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
  ... 13 more

(base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
Name:            spark-xxx
Namespace:       superior
Priority:        0
Service Account: spark
Node:            cdh2/172.18.5.45
Start Time:      Wed, 21 Feb 2024 15:48:08 +0800
Labels:          profile=production
                 spark-app-name=spark-pi
                 spark-app-selector=spark-728e24e49f9040fa86b04c521463020b
                 spark-role=driver
                 spark-version=3.4.2
Annotations:
Status:          Failed
IP:              10.244.1.4
IPs:
  IP: 10.244.1.4
Containers:
  spark-kubernetes-driver:
    Container ID: containerd://cceaf13b70cc5f21a639e71cb8663989ec73e122380844624d4bfac3946bae15
    Image:        spark:3.4.1
    Image ID:     docker.io/library/spark@sha256:69fb485a0bcad88f9a2bf066e1b5d555f818126dc9df5a0b7e6a3b6d364bc694
    Ports:        7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:   0/TCP, 0/TCP, 0/TCP
    Args:         driver --properties-file /opt/spark/conf/spark.properties --class
[jira] [Assigned] (SPARK-47112) Write logs into a file in SparkR Windows build
[ https://issues.apache.org/jira/browse/SPARK-47112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-47112:

    Assignee: Hyukjin Kwon

> Write logs into a file in SparkR Windows build
> --
>
> Key: SPARK-47112
> URL: https://issues.apache.org/jira/browse/SPARK-47112
> Project: Spark
> Issue Type: Test
> Components: Project Infra, SparkR
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> https://github.com/apache/spark/actions/runs/7977185456/job/21779508822
> This writes too many logs, making it difficult to see the real test cases.
[jira] [Resolved] (SPARK-47112) Write logs into a file in SparkR Windows build
[ https://issues.apache.org/jira/browse/SPARK-47112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-47112.
--
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45192
[https://github.com/apache/spark/pull/45192]

> Write logs into a file in SparkR Windows build
> --
>
> Key: SPARK-47112
> URL: https://issues.apache.org/jira/browse/SPARK-47112
> Project: Spark
> Issue Type: Test
> Components: Project Infra, SparkR
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> https://github.com/apache/spark/actions/runs/7977185456/job/21779508822
> This writes too many logs, making it difficult to see the real test cases.
[jira] [Updated] (SPARK-47116) Install proper Python version in SparkR Windows build to avoid warnings
[ https://issues.apache.org/jira/browse/SPARK-47116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47116:
---
    Labels: pull-request-available (was: )

> Install proper Python version in SparkR Windows build to avoid warnings
> ---
>
> Key: SPARK-47116
> URL: https://issues.apache.org/jira/browse/SPARK-47116
> Project: Spark
> Issue Type: Test
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> https://github.com/HyukjinKwon/spark/actions/runs/7985005685/job/21802732830
> {code}
> Traceback (most recent call last):
>   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\runpy.py", line 183, in _run_module_as_main
>     mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
>   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\runpy.py", line 109, in _get_module_details
>     __import__(pkg_name)
>   File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\__init__.py", line [53](https://github.com/HyukjinKwon/spark/actions/runs/7985005685/job/21802732830#step:10:54), in <module>
>   File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\rdd.py", line [54](https://github.com/HyukjinKwon/spark/actions/runs/7985005685/job/21802732830#step:10:55), in <module>
>   File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\java_gateway.py", line 33, in <module>
>   File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 69, in <module>
>   File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\cloudpickle\__init__.py", line 1, in <module>
>   File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\cloudpickle\cloudpickle.py", line 80, in <module>
> ImportError: cannot import name 'CellType' from 'types' (C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\types.py)
> {code}
[jira] [Created] (SPARK-47116) Install proper Python version in SparkR Windows build to avoid warnings
Hyukjin Kwon created SPARK-47116:

             Summary: Install proper Python version in SparkR Windows build to avoid warnings
                 Key: SPARK-47116
                 URL: https://issues.apache.org/jira/browse/SPARK-47116
             Project: Spark
          Issue Type: Test
          Components: Project Infra
    Affects Versions: 4.0.0
            Reporter: Hyukjin Kwon

https://github.com/HyukjinKwon/spark/actions/runs/7985005685/job/21802732830
{code}
Traceback (most recent call last):
  File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\__init__.py", line [53](https://github.com/HyukjinKwon/spark/actions/runs/7985005685/job/21802732830#step:10:54), in <module>
  File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\rdd.py", line [54](https://github.com/HyukjinKwon/spark/actions/runs/7985005685/job/21802732830#step:10:55), in <module>
  File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\java_gateway.py", line 33, in <module>
  File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 69, in <module>
  File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\cloudpickle\__init__.py", line 1, in <module>
  File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\cloudpickle\cloudpickle.py", line 80, in <module>
ImportError: cannot import name 'CellType' from 'types' (C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\types.py)
{code}
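[Editor's note] For context on the traceback above: types.CellType, which PySpark's bundled cloudpickle imports, only exists from Python 3.8 onward, so the Python 3.7.9 interpreter picked up by the Windows job cannot import it. A quick sanity check of the interpreter on PATH could look like this (a sketch, not part of the actual build change):
{code:bash}
# Fails on Python <= 3.7, where types.CellType does not exist yet.
python -c "import sys, types; assert hasattr(types, 'CellType'), sys.version"
python --version   # expect 3.8+ for pyspark's cloudpickle to import cleanly
{code}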
[jira] [Updated] (SPARK-47113) Revert S3A endpoint fixup logic of SPARK-35878
[ https://issues.apache.org/jira/browse/SPARK-47113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-47113:
--
        Parent: SPARK-47046
    Issue Type: Sub-task (was: Task)

> Revert S3A endpoint fixup logic of SPARK-35878
> --
>
> Key: SPARK-47113
> URL: https://issues.apache.org/jira/browse/SPARK-47113
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-47113) Revert S3A endpoint fixup logic of SPARK-35878
[ https://issues.apache.org/jira/browse/SPARK-47113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-47113:
-
    Assignee: Steve Loughran

> Revert S3A endpoint fixup logic of SPARK-35878
> --
>
> Key: SPARK-47113
> URL: https://issues.apache.org/jira/browse/SPARK-47113
> Project: Spark
> Issue Type: Task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Resolved] (SPARK-47113) Revert S3A endpoint fixup logic of SPARK-35878
[ https://issues.apache.org/jira/browse/SPARK-47113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47113.
---
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45193
[https://github.com/apache/spark/pull/45193]

> Revert S3A endpoint fixup logic of SPARK-35878
> --
>
> Key: SPARK-47113
> URL: https://issues.apache.org/jira/browse/SPARK-47113
> Project: Spark
> Issue Type: Task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-47115) Use larger memory for Maven builds
[ https://issues.apache.org/jira/browse/SPARK-47115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-47115:

    Assignee: Hyukjin Kwon

> Use larger memory for Maven builds
> --
>
> Key: SPARK-47115
> URL: https://issues.apache.org/jira/browse/SPARK-47115
> Project: Spark
> Issue Type: Test
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> {code}
> *** RUN ABORTED ***
> An exception or error caused a run to abort: unable to create native thread: possibly out of memory or process/resource limits reached
> java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
>   at java.base/java.lang.Thread.start0(Native Method)
>   at java.base/java.lang.Thread.start(Thread.java:1553)
>   at java.base/java.lang.System$2.start(System.java:2577)
>   at java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:152)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:953)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
>   at org.apache.spark.rpc.netty.SharedMessageLoop.$anonfun$threadpool$1(MessageLoop.scala:128)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190)
>   at org.apache.spark.rpc.netty.SharedMessageLoop.<init>(MessageLoop.scala:127)
>   at org.apache.spark.rpc.netty.Dispatcher.sharedLoop$lzycompute(Dispatcher.scala:46)
>   ...
> Warning: The requested profile "volcano" could not be activated because it does not exist.
> Warning: The requested profile "hive" could not be activated because it does not exist.
> Error: Failed to execute goal org.scalatest:scalatest-maven-plugin:2.2.0:test (test) on project spark-core_2.13: There are test failures -> [Help 1]
> Error:
> Error: To see the full stack trace of the errors, re-run Maven with the -e switch.
> Error: Re-run Maven using the -X switch to enable full debug logging.
> Error:
> Error: For more information about the errors and possible solutions, please read the following articles:
> Error: [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> Error:
> Error: After correcting the problems, you can resume the build with the command
> Error:   mvn -rf :spark-core_2.13
> Error: Process completed with exit code 1.
> {code}
> https://github.com/HyukjinKwon/spark/actions/runs/7984135094/job/21800463337
[jira] [Resolved] (SPARK-47115) Use larger memory for Maven builds
[ https://issues.apache.org/jira/browse/SPARK-47115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-47115.
--
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45195
[https://github.com/apache/spark/pull/45195]

> Use larger memory for Maven builds
> --
>
> Key: SPARK-47115
> URL: https://issues.apache.org/jira/browse/SPARK-47115
> Project: Spark
> Issue Type: Test
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> {code}
> *** RUN ABORTED ***
> An exception or error caused a run to abort: unable to create native thread: possibly out of memory or process/resource limits reached
> java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
>   at java.base/java.lang.Thread.start0(Native Method)
>   at java.base/java.lang.Thread.start(Thread.java:1553)
>   at java.base/java.lang.System$2.start(System.java:2577)
>   at java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:152)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:953)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
>   at org.apache.spark.rpc.netty.SharedMessageLoop.$anonfun$threadpool$1(MessageLoop.scala:128)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190)
>   at org.apache.spark.rpc.netty.SharedMessageLoop.<init>(MessageLoop.scala:127)
>   at org.apache.spark.rpc.netty.Dispatcher.sharedLoop$lzycompute(Dispatcher.scala:46)
>   ...
> Warning: The requested profile "volcano" could not be activated because it does not exist.
> Warning: The requested profile "hive" could not be activated because it does not exist.
> Error: Failed to execute goal org.scalatest:scalatest-maven-plugin:2.2.0:test (test) on project spark-core_2.13: There are test failures -> [Help 1]
> Error:
> Error: To see the full stack trace of the errors, re-run Maven with the -e switch.
> Error: Re-run Maven using the -X switch to enable full debug logging.
> Error:
> Error: For more information about the errors and possible solutions, please read the following articles:
> Error: [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> Error:
> Error: After correcting the problems, you can resume the build with the command
> Error:   mvn -rf :spark-core_2.13
> Error: Process completed with exit code 1.
> {code}
> https://github.com/HyukjinKwon/spark/actions/runs/7984135094/job/21800463337
[jira] [Updated] (SPARK-47115) Use larger memory for Maven builds
[ https://issues.apache.org/jira/browse/SPARK-47115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47115:
---
    Labels: pull-request-available (was: )

> Use larger memory for Maven builds
> --
>
> Key: SPARK-47115
> URL: https://issues.apache.org/jira/browse/SPARK-47115
> Project: Spark
> Issue Type: Test
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> {code}
> *** RUN ABORTED ***
> An exception or error caused a run to abort: unable to create native thread: possibly out of memory or process/resource limits reached
> java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
>   at java.base/java.lang.Thread.start0(Native Method)
>   at java.base/java.lang.Thread.start(Thread.java:1553)
>   at java.base/java.lang.System$2.start(System.java:2577)
>   at java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:152)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:953)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
>   at org.apache.spark.rpc.netty.SharedMessageLoop.$anonfun$threadpool$1(MessageLoop.scala:128)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190)
>   at org.apache.spark.rpc.netty.SharedMessageLoop.<init>(MessageLoop.scala:127)
>   at org.apache.spark.rpc.netty.Dispatcher.sharedLoop$lzycompute(Dispatcher.scala:46)
>   ...
> Warning: The requested profile "volcano" could not be activated because it does not exist.
> Warning: The requested profile "hive" could not be activated because it does not exist.
> Error: Failed to execute goal org.scalatest:scalatest-maven-plugin:2.2.0:test (test) on project spark-core_2.13: There are test failures -> [Help 1]
> Error:
> Error: To see the full stack trace of the errors, re-run Maven with the -e switch.
> Error: Re-run Maven using the -X switch to enable full debug logging.
> Error:
> Error: For more information about the errors and possible solutions, please read the following articles:
> Error: [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> Error:
> Error: After correcting the problems, you can resume the build with the command
> Error:   mvn -rf :spark-core_2.13
> Error: Process completed with exit code 1.
> {code}
> https://github.com/HyukjinKwon/spark/actions/runs/7984135094/job/21800463337
[jira] [Created] (SPARK-47115) Use larger memory for Maven builds
Hyukjin Kwon created SPARK-47115:

             Summary: Use larger memory for Maven builds
                 Key: SPARK-47115
                 URL: https://issues.apache.org/jira/browse/SPARK-47115
             Project: Spark
          Issue Type: Test
          Components: Project Infra
    Affects Versions: 4.0.0
            Reporter: Hyukjin Kwon

{code}
*** RUN ABORTED ***
An exception or error caused a run to abort: unable to create native thread: possibly out of memory or process/resource limits reached
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
  at java.base/java.lang.Thread.start0(Native Method)
  at java.base/java.lang.Thread.start(Thread.java:1553)
  at java.base/java.lang.System$2.start(System.java:2577)
  at java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:152)
  at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:953)
  at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
  at org.apache.spark.rpc.netty.SharedMessageLoop.$anonfun$threadpool$1(MessageLoop.scala:128)
  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190)
  at org.apache.spark.rpc.netty.SharedMessageLoop.<init>(MessageLoop.scala:127)
  at org.apache.spark.rpc.netty.Dispatcher.sharedLoop$lzycompute(Dispatcher.scala:46)
  ...
Warning: The requested profile "volcano" could not be activated because it does not exist.
Warning: The requested profile "hive" could not be activated because it does not exist.
Error: Failed to execute goal org.scalatest:scalatest-maven-plugin:2.2.0:test (test) on project spark-core_2.13: There are test failures -> [Help 1]
Error:
Error: To see the full stack trace of the errors, re-run Maven with the -e switch.
Error: Re-run Maven using the -X switch to enable full debug logging.
Error:
Error: For more information about the errors and possible solutions, please read the following articles:
Error: [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Error:
Error: After correcting the problems, you can resume the build with the command
Error:   mvn -rf :spark-core_2.13
Error: Process completed with exit code 1.
{code}
https://github.com/HyukjinKwon/spark/actions/runs/7984135094/job/21800463337
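[Editor's note] As a rough illustration of what "larger memory" can mean for a local reproduction (the values below are assumptions; the actual CI fix in the linked PR may set memory elsewhere, and "unable to create native thread" can also indicate a process/ulimit cap rather than heap), the Maven JVM can be given more headroom through MAVEN_OPTS before resuming the failing module:
{code:bash}
# Illustrative heap/code-cache values only.
export MAVEN_OPTS="-Xmx4g -XX:ReservedCodeCacheSize=128m"
# Resume from the failing module, as Maven itself suggests in the log above.
./build/mvn -rf :spark-core_2.13 test
{code}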
[jira] [Updated] (SPARK-47114) In the Spark driver pod, failed to access the krb5 file
[ https://issues.apache.org/jira/browse/SPARK-47114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-47114:
--
    Description: 
Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos). The pod error log shows:
{code:java}
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed{code}
This error generally occurs when the krb5 file cannot be found. [~yao] [~Qin Yao]
{code:java}
./bin/spark-submit \
 --master k8s://https://172.18.5.44:6443 \
 --deploy-mode cluster \
 --name spark-pi \
 --class org.apache.spark.examples.SparkPi \
 --conf spark.executor.instances=1 \
 --conf spark.kubernetes.submission.waitAppCompletion=true \
 --conf spark.kubernetes.driver.pod.name=spark-xxx \
 --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
 --conf spark.kubernetes.driver.label.profile=production \
 --conf spark.kubernetes.executor.label.profile=production \
 --conf spark.kubernetes.namespace=superior \
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
 --conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0 \
 --conf spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
 --conf spark.kubernetes.container.image.pullPolicy=Always \
 --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
 --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
 --conf spark.kerberos.principal=superior/ad...@datacyber.com \
 --conf spark.kerberos.keytab=/root/superior.keytab \
 --conf spark.kubernetes.driver.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/driver.yaml \
 --conf spark.kubernetes.executor.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/executor.yaml \
 file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar 5{code}
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ '[' -z /opt/java/openjdk ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
++ command -v readarray
+ '[' readarray ']'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.244.2.56 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi spark-internal 5
Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
  at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
  at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
  at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
  at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
  at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
  at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
  at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
  at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
  at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
  at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(Unknown Source)
  at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
  at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
  ... 13 more

(base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
Name:
[jira] [Created] (SPARK-47114) In the Spark driver pod, failed to access the krb5 file
melin created SPARK-47114:
-
             Summary: In the Spark driver pod, failed to access the krb5 file
                 Key: SPARK-47114
                 URL: https://issues.apache.org/jira/browse/SPARK-47114
             Project: Spark
          Issue Type: New Feature
          Components: Kubernetes
    Affects Versions: 3.4.1
            Reporter: melin

Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos).
{code:java}
./bin/spark-submit \
 --master k8s://https://172.18.5.44:6443 \
 --deploy-mode cluster \
 --name spark-pi \
 --class org.apache.spark.examples.SparkPi \
 --conf spark.executor.instances=1 \
 --conf spark.kubernetes.submission.waitAppCompletion=true \
 --conf spark.kubernetes.driver.pod.name=spark-xxx \
 --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
 --conf spark.kubernetes.driver.label.profile=production \
 --conf spark.kubernetes.executor.label.profile=production \
 --conf spark.kubernetes.namespace=superior \
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
 --conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0 \
 --conf spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
 --conf spark.kubernetes.container.image.pullPolicy=Always \
 --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
 --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
 --conf spark.kerberos.principal=superior/ad...@datacyber.com \
 --conf spark.kerberos.keytab=/root/superior.keytab \
 --conf spark.kubernetes.driver.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/driver.yaml \
 --conf spark.kubernetes.executor.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/executor.yaml \
 file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar 5{code}
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ '[' -z /opt/java/openjdk ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
++ command -v readarray
+ '[' readarray ']'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.244.2.56 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi spark-internal 5
Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
  at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
  at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
  at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
  at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
  at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
  at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
  at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
  at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
  at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
  at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(Unknown Source)
  at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
  at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
  ... 13 more

(base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
Name: spark-xxx
[jira] [Updated] (SPARK-47113) Revert S3A endpoint fixup logic of SPARK-35878
[ https://issues.apache.org/jira/browse/SPARK-47113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47113:
---
    Labels: pull-request-available (was: )

> Revert S3A endpoint fixup logic of SPARK-35878
> --
>
> Key: SPARK-47113
> URL: https://issues.apache.org/jira/browse/SPARK-47113
> Project: Spark
> Issue Type: Task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-47113) Revert S3A endpoint fixup logic of SPARK-35878
Dongjoon Hyun created SPARK-47113:
-
             Summary: Revert S3A endpoint fixup logic of SPARK-35878
                 Key: SPARK-47113
                 URL: https://issues.apache.org/jira/browse/SPARK-47113
             Project: Spark
          Issue Type: Task
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: Dongjoon Hyun
[jira] [Resolved] (SPARK-46928) Support ListState in Arbitrary State API v2
[ https://issues.apache.org/jira/browse/SPARK-46928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim resolved SPARK-46928.
--
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44961
[https://github.com/apache/spark/pull/44961]

> Support ListState in Arbitrary State API v2
> ---
>
> Key: SPARK-46928
> URL: https://issues.apache.org/jira/browse/SPARK-46928
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Bhuwan Sahni
> Assignee: Bhuwan Sahni
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> As part of Arbitrary State API v2
> (https://docs.google.com/document/d/1QtC5qd4WQEia9kl1Qv74WE0TiXYy3x6zeTykygwPWig),
> we need to support ListState. This task covers adding support for ListState in Scala.
[jira] [Assigned] (SPARK-46928) Support ListState in Arbitrary State API v2
[ https://issues.apache.org/jira/browse/SPARK-46928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim reassigned SPARK-46928:

    Assignee: Bhuwan Sahni

> Support ListState in Arbitrary State API v2
> ---
>
> Key: SPARK-46928
> URL: https://issues.apache.org/jira/browse/SPARK-46928
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Bhuwan Sahni
> Assignee: Bhuwan Sahni
> Priority: Major
> Labels: pull-request-available
>
> As part of Arbitrary State API v2
> (https://docs.google.com/document/d/1QtC5qd4WQEia9kl1Qv74WE0TiXYy3x6zeTykygwPWig),
> we need to support ListState. This task covers adding support for ListState in Scala.
[jira] [Commented] (SPARK-46934) Unable to create Hive View from certain Spark Dataframe StructType
[ https://issues.apache.org/jira/browse/SPARK-46934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819062#comment-17819062 ]

Yu-Ting LIN commented on SPARK-46934:
-

[~dongjoon] As I have mentioned before, we are currently mainly using Spark 3.3.2, and we also plan to migrate to Spark 3.5. I have not figured out which Spark versions support my use case. Based on my understanding, neither Spark 3.3 nor 3.5 supports this feature.

> Unable to create Hive View from certain Spark Dataframe StructType
> --
>
> Key: SPARK-46934
> URL: https://issues.apache.org/jira/browse/SPARK-46934
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.0, 3.3.2, 3.3.4
> Environment: Tested in Spark 3.3.0, 3.3.2.
> Reporter: Yu-Ting LIN
> Assignee: Kent Yao
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 4.0.0
>
> We are trying to create a Hive View using the following SQL command: "CREATE OR REPLACE VIEW yuting AS SELECT INFO_ANN FROM table_2611810".
> Our table_2611810 has certain columns that contain special characters such as "/".
> Here is the schema of this table.
> {code:java}
> contigName string
> start bigint
> end bigint
> names array
> referenceAllele string
> alternateAlleles array
> qual double
> filters array
> splitFromMultiAllelic boolean
> INFO_NCAMP int
> INFO_ODDRATIO double
> INFO_NM double
> INFO_DBSNP_CAF array
> INFO_SPANPAIR int
> INFO_TLAMP int
> INFO_PSTD double
> INFO_QSTD double
> INFO_SBF double
> INFO_AF array
> INFO_QUAL double
> INFO_SHIFT3 int
> INFO_VARBIAS string
> INFO_HICOV int
> INFO_PMEAN double
> INFO_MSI double
> INFO_VD int
> INFO_DP int
> INFO_HICNT int
> INFO_ADJAF double
> INFO_SVLEN int
> INFO_RSEQ string
> INFO_MSigDb array
> INFO_NMD array
> INFO_ANN array,Annotation_Impact:string,Gene_Name:string,Gene_ID:string,Feature_Type:string,Feature_ID:string,Transcript_BioType:string,Rank:struct,HGVS_c:string,HGVS_p:string,cDNA_pos/cDNA_length:struct,CDS_pos/CDS_length:struct,AA_pos/AA_length:struct,Distance:int,ERRORS/WARNINGS/INFO:string>>
> INFO_BIAS string
> INFO_MQ double
> INFO_HIAF double
> INFO_END int
> INFO_SPLITREAD int
> INFO_GDAMP int
> INFO_LSEQ string
> INFO_LOF array
> INFO_SAMPLE string
> INFO_AMPFLAG int
> INFO_SN double
> INFO_SVTYPE string
> INFO_TYPE string
> INFO_MSILEN double
> INFO_DUPRATE double
> INFO_DBSNP_COMMON int
> INFO_REFBIAS string
> genotypes array,ALD:array,AF:array,phased:boolean,calls:array,VD:int,depth:int,RD:array>>
> {code}
> You can see that column INFO_ANN is an array of structs, and it contains fields with "/" in their names, such as "cDNA_pos/cDNA_length".
> We believe this is the root cause of the following SparkException:
> {code:java}
> scala> val schema = spark.sql("CREATE OR REPLACE VIEW yuting AS SELECT INFO_ANN FROM table_2611810")
> 24/01/31 07:50:02.658 [main] WARN o.a.spark.sql.catalyst.util.package - Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
> org.apache.spark.SparkException: Cannot recognize hive type string: array,Annotation_Impact:string,Gene_Name:string,Gene_ID:string,Feature_Type:string,Feature_ID:string,Transcript_BioType:string,Rank:struct,HGVS_c:string,HGVS_p:string,cDNA_pos/cDNA_length:struct,CDS_pos/CDS_length:struct,AA_pos/AA_length:struct,Distance:int,ERRORS/WARNINGS/INFO:string>>, column: INFO_ANN
>   at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotRecognizeHiveTypeError(QueryExecutionErrors.scala:1455)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$.getSparkSQLDataType(HiveClientImpl.scala:1022)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$.$anonfun$verifyColumnDataType$1(HiveClientImpl.scala:1037)
>   at scala.collection.Iterator.foreach(Iterator.scala:943)
>   at scala.collection.Iterator.foreach$(Iterator.scala:943)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>   at
[jira] [Assigned] (SPARK-47052) Separate state tracking variables from MicroBatchExecution/StreamExecution
[ https://issues.apache.org/jira/browse/SPARK-47052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim reassigned SPARK-47052:

    Assignee: Boyang Jerry Peng

> Separate state tracking variables from MicroBatchExecution/StreamExecution
> --
>
> Key: SPARK-47052
> URL: https://issues.apache.org/jira/browse/SPARK-47052
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Boyang Jerry Peng
> Assignee: Boyang Jerry Peng
> Priority: Major
> Labels: pull-request-available
>
> To improve code clarity and maintainability, I propose that we move all the variables that track mutable state and metrics for a streaming query into a separate class. With this refactor, it would be easy to track and find all the mutable state a microbatch can have.
[jira] [Resolved] (SPARK-47052) Separate state tracking variables from MicroBatchExecution/StreamExecution
[ https://issues.apache.org/jira/browse/SPARK-47052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim resolved SPARK-47052.
--
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45109
[https://github.com/apache/spark/pull/45109]

> Separate state tracking variables from MicroBatchExecution/StreamExecution
> --
>
> Key: SPARK-47052
> URL: https://issues.apache.org/jira/browse/SPARK-47052
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Boyang Jerry Peng
> Assignee: Boyang Jerry Peng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> To improve code clarity and maintainability, I propose that we move all the variables that track mutable state and metrics for a streaming query into a separate class. With this refactor, it would be easy to track and find all the mutable state a microbatch can have.
[jira] [Resolved] (SPARK-47111) Upgrade `PostgreSQL` JDBC driver to 42.7.2 and docker image to 16.2
[ https://issues.apache.org/jira/browse/SPARK-47111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47111.
---
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45191
[https://github.com/apache/spark/pull/45191]

> Upgrade `PostgreSQL` JDBC driver to 42.7.2 and docker image to 16.2
> ---
>
> Key: SPARK-47111
> URL: https://issues.apache.org/jira/browse/SPARK-47111
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Tests
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Updated] (SPARK-47112) Write logs into a file in SparkR Windows build
[ https://issues.apache.org/jira/browse/SPARK-47112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47112:
---
    Labels: pull-request-available (was: )

> Write logs into a file in SparkR Windows build
> --
>
> Key: SPARK-47112
> URL: https://issues.apache.org/jira/browse/SPARK-47112
> Project: Spark
> Issue Type: Test
> Components: Project Infra, SparkR
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> https://github.com/apache/spark/actions/runs/7977185456/job/21779508822
> This writes too many logs, making it difficult to see the real test cases.
[jira] [Created] (SPARK-47112) Write logs into a file in SparkR Windows build
Hyukjin Kwon created SPARK-47112:

             Summary: Write logs into a file in SparkR Windows build
                 Key: SPARK-47112
                 URL: https://issues.apache.org/jira/browse/SPARK-47112
             Project: Spark
          Issue Type: Test
          Components: Project Infra, SparkR
    Affects Versions: 4.0.0
            Reporter: Hyukjin Kwon

https://github.com/apache/spark/actions/runs/7977185456/job/21779508822

This writes too many logs, making it difficult to see the real test cases.
[jira] [Updated] (SPARK-47110) Reenable AmmoniteTest tests in Maven builds
[ https://issues.apache.org/jira/browse/SPARK-47110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-47110:
-
    Description: 
Disabled in https://github.com/apache/spark/pull/45186 because of TTY issues.

See also https://github.com/apache/spark/pull/43909 and https://github.com/apache/spark/pull/40675

  was:
Disabled in https://github.com/apache/spark/pull/45186 because of TTY issues.

See also https://github.com/apache/spark/pull/43909 and https://github.com/apache/spark/pull/45186

> Reenable AmmoniteTest tests in Maven builds
> --
>
> Key: SPARK-47110
> URL: https://issues.apache.org/jira/browse/SPARK-47110
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Tests
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> Disabled in https://github.com/apache/spark/pull/45186 because of TTY issues.
> See also https://github.com/apache/spark/pull/43909 and https://github.com/apache/spark/pull/40675
[jira] [Updated] (SPARK-47110) Reenable AmmoniteTest tests in Maven builds
[ https://issues.apache.org/jira/browse/SPARK-47110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-47110:
-
    Summary: Reenable AmmoniteTest tests in Maven builds  (was: Enable AmmoniteTest tests in Maven builds)

> Reenable AmmoniteTest tests in Maven builds
> --
>
> Key: SPARK-47110
> URL: https://issues.apache.org/jira/browse/SPARK-47110
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Tests
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> Disabled in https://github.com/apache/spark/pull/45186 because of TTY issues.
> See also https://github.com/apache/spark/pull/43909 and https://github.com/apache/spark/pull/45186
[jira] [Created] (SPARK-47110) Enable AmmoniteTest tests in Maven builds
Hyukjin Kwon created SPARK-47110:

             Summary: Enable AmmoniteTest tests in Maven builds
                 Key: SPARK-47110
                 URL: https://issues.apache.org/jira/browse/SPARK-47110
             Project: Spark
          Issue Type: Improvement
          Components: Connect, Tests
    Affects Versions: 4.0.0
            Reporter: Hyukjin Kwon

Disabled in https://github.com/apache/spark/pull/45186 because of TTY issues.

See also https://github.com/apache/spark/pull/43909 and https://github.com/apache/spark/pull/45186
[jira] [Resolved] (SPARK-47109) Upgrade `commons-compress` to 1.26.0
[ https://issues.apache.org/jira/browse/SPARK-47109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47109.
---
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45189
[https://github.com/apache/spark/pull/45189]

> Upgrade `commons-compress` to 1.26.0
> 
>
> Key: SPARK-47109
> URL: https://issues.apache.org/jira/browse/SPARK-47109
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-47109) Upgrade `commons-compress` to 1.26.0
[ https://issues.apache.org/jira/browse/SPARK-47109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-47109:
-
    Assignee: Dongjoon Hyun

> Upgrade `commons-compress` to 1.26.0
> 
>
> Key: SPARK-47109
> URL: https://issues.apache.org/jira/browse/SPARK-47109
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-47101) HiveExternalCatalog.verifyDataSchema does not fully comply with hive column name rules
[ https://issues.apache.org/jira/browse/SPARK-47101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-47101:
--
    Affects Version/s: (was: 3.5.0)
                       (was: 3.4.2)

> HiveExternalCatalog.verifyDataSchema does not fully comply with hive column name rules
> --
>
> Key: SPARK-47101
> URL: https://issues.apache.org/jira/browse/SPARK-47101
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Kent Yao
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-47101) HiveExternalCatalog.verifyDataSchema does not fully comply with hive column name rules
[ https://issues.apache.org/jira/browse/SPARK-47101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-47101:
-
    Issue Type: Improvement (was: Test)

> HiveExternalCatalog.verifyDataSchema does not fully comply with hive column name rules
> --
>
> Key: SPARK-47101
> URL: https://issues.apache.org/jira/browse/SPARK-47101
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.2, 3.5.0, 4.0.0
> Reporter: Kent Yao
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-47109) Upgrade `commons-compress` to 1.26.0
[ https://issues.apache.org/jira/browse/SPARK-47109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47109:
---
    Labels: pull-request-available (was: )

> Upgrade `commons-compress` to 1.26.0
> 
>
> Key: SPARK-47109
> URL: https://issues.apache.org/jira/browse/SPARK-47109
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-47109) Upgrade `commons-compress` to 1.26.0
Dongjoon Hyun created SPARK-47109: - Summary: Upgrade `commons-compress` to 1.26.0 Key: SPARK-47109 URL: https://issues.apache.org/jira/browse/SPARK-47109 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44814) Test to trigger protobuf 4.23.3 crash
[ https://issues.apache.org/jira/browse/SPARK-44814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44814: --- Labels: pull-request-available (was: ) > Test to trigger protobuf 4.23.3 crash > - > > Key: SPARK-44814 > URL: https://issues.apache.org/jira/browse/SPARK-44814 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Martin Grund >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46906) Add a check for stateful operator change for streaming
[ https://issues.apache.org/jira/browse/SPARK-46906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-46906. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44927 [https://github.com/apache/spark/pull/44927] > Add a check for stateful operator change for streaming > -- > > Key: SPARK-46906 > URL: https://issues.apache.org/jira/browse/SPARK-46906 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Jing Zhan >Assignee: Jing Zhan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, users get a misleading error, > org.apache.spark.sql.execution.streaming.state.StateSchemaNotCompatible, if they > restart a query in the same checkpoint location after changing a stateful > operator. We need to catch such errors and throw a new error with an > informative message. > After physical planning and before the execution phase, we read the state > metadata for the current operator id to fetch the operator name committed for > that id in a previous batch. If the operator names do not match, we throw the > error. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
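In code terms, the check described above reduces to a name comparison keyed by operator id. Below is a minimal, hypothetical Scala sketch of that logic; `CommittedOperator` and the error message are illustrative stand-ins, not the types or error class that the pull request actually adds.

{code:scala}
import org.apache.spark.SparkException

// Hypothetical stand-in for an entry read back from the state metadata.
case class CommittedOperator(operatorId: Long, operatorName: String)

// Compare the operator planned for this run against the operator name
// committed in the checkpoint under the same id; a mismatch means the user
// changed a stateful operator between restarts.
def validateStatefulOperator(
    operatorId: Long,
    currentName: String,
    committed: Map[Long, CommittedOperator]): Unit = {
  committed.get(operatorId).foreach { prev =>
    if (prev.operatorName != currentName) {
      throw new SparkException(
        s"Stateful operator '$currentName' (id=$operatorId) does not match " +
          s"operator '${prev.operatorName}' committed in this checkpoint; " +
          "changing a stateful operator on restart is not supported.")
    }
  }
}
{code}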
[jira] [Resolved] (SPARK-47108) Set `derby.connection.requireAuthentication` to false explicitly in CLIs
[ https://issues.apache.org/jira/browse/SPARK-47108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47108. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45185 [https://github.com/apache/spark/pull/45185] > Set `derby.connection.requireAuthentication` to false explicitly in CLIs > > > Key: SPARK-47108 > URL: https://issues.apache.org/jira/browse/SPARK-47108 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47108) Set `derby.connection.requireAuthentication` to false explicitly in CLIs
[ https://issues.apache.org/jira/browse/SPARK-47108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47108: - Assignee: Dongjoon Hyun > Set `derby.connection.requireAuthentication` to false explicitly in CLIs > > > Key: SPARK-47108 > URL: https://issues.apache.org/jira/browse/SPARK-47108 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47108) Set `derby.connection.requireAuthentication` to false explicitly in CLIs
[ https://issues.apache.org/jira/browse/SPARK-47108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47108: --- Labels: pull-request-available (was: ) > Set `derby.connection.requireAuthentication` to false explicitly in CLIs > > > Key: SPARK-47108 > URL: https://issues.apache.org/jira/browse/SPARK-47108 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47108) Set `derby.connection.requireAuthentication` to false explicitly in CLIs
Dongjoon Hyun created SPARK-47108: - Summary: Set `derby.connection.requireAuthentication` to false explicitly in CLIs Key: SPARK-47108 URL: https://issues.apache.org/jira/browse/SPARK-47108 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
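None of the SPARK-47108 messages above carry a description, so the sketch below is only a guess at the shape of the change: `derby.connection.requireAuthentication` is a standard Derby system property, and "set explicitly" most plausibly means pinning it to false before anything opens the embedded Derby metastore. Where Spark actually places such a call is not shown in this thread.

{code:scala}
// Illustrative only: pin the Derby default before the embedded metastore
// starts, so CLI sessions never demand Derby credentials.
object DerbyDefaults {
  def ensure(): Unit = {
    if (System.getProperty("derby.connection.requireAuthentication") == null) {
      // Derby reads this as a JVM system property; "false" means embedded
      // connections do not require a username/password.
      System.setProperty("derby.connection.requireAuthentication", "false")
    }
  }
}
{code}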
[jira] [Updated] (SPARK-42328) Assign name to _LEGACY_ERROR_TEMP_1175
[ https://issues.apache.org/jira/browse/SPARK-42328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42328: --- Labels: pull-request-available (was: ) > Assign name to _LEGACY_ERROR_TEMP_1175 > -- > > Key: SPARK-42328 > URL: https://issues.apache.org/jira/browse/SPARK-42328 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46257) Upgrade Derby to 10.16.1.1
[ https://issues.apache.org/jira/browse/SPARK-46257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46257: -- Description: [https://db.apache.org/derby/releases/release-10_16_1_1.cgi] 1. Drop Java Security Manager. {quote}Derby no longer supports the Java SecurityManager. This is because the Open JDK team deprecated the SecurityManager and marked it for removal. {quote} 2. Compile on Java 17 {quote}Compile 10.16 into Java 17 byte code {quote} was: [https://db.apache.org/derby/releases/release-10_16_1_1.cgi] {quote}Derby no longer supports the Java SecurityManager. This is because the Open JDK team deprecated the SecurityManager and marked it for removal. {quote} > Upgrade Derby to 10.16.1.1 > -- > > Key: SPARK-46257 > URL: https://issues.apache.org/jira/browse/SPARK-46257 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > [https://db.apache.org/derby/releases/release-10_16_1_1.cgi] > 1. Drop Java Security Manager. > {quote}Derby no longer supports the Java SecurityManager. This is because the > Open JDK team deprecated the SecurityManager and marked it for removal. > {quote} > 2. Compile on Java 17 > {quote}Compile 10.16 into Java 17 byte code > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46257) Upgrade Derby to 10.16.1.1
[ https://issues.apache.org/jira/browse/SPARK-46257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46257: -- Description: [https://db.apache.org/derby/releases/release-10_16_1_1.cgi] {quote}Derby no longer supports the Java SecurityManager. This is because the Open JDK team deprecated the SecurityManager and marked it for removal. {quote} was:https://db.apache.org/derby/releases/release-10_16_1_1.cgi > Upgrade Derby to 10.16.1.1 > -- > > Key: SPARK-46257 > URL: https://issues.apache.org/jira/browse/SPARK-46257 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > [https://db.apache.org/derby/releases/release-10_16_1_1.cgi] > {quote}Derby no longer supports the Java SecurityManager. This is because the > Open JDK team deprecated the SecurityManager and marked it for removal. > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47085) Performance issue on thrift API
[ https://issues.apache.org/jira/browse/SPARK-47085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47085: -- Fix Version/s: 3.5.2 > Performance issue on thrift API > --- > > Key: SPARK-47085 > URL: https://issues.apache.org/jira/browse/SPARK-47085 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Izek Greenfield >Assignee: Izek Greenfield >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > > This new complexity was introduced in SPARK-39041. > Before this PR, the code was: > {code:java} > while (curRow < maxRows && iter.hasNext) { > val sparkRow = iter.next() > val row = ArrayBuffer[Any]() > var curCol = 0 > while (curCol < sparkRow.length) { > if (sparkRow.isNullAt(curCol)) { > row += null > } else { > addNonNullColumnValue(sparkRow, row, curCol, timeFormatters) > } > curCol += 1 > } > resultRowSet.addRow(row.toArray.asInstanceOf[Array[Object]]) > curRow += 1 > }{code} > i.e. a single pass without the _*O(n^2)*_ complexity, so this change just returns the > state to what it was before. > > In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: > {code:scala} > ... > while (i < rowSize) { > val row = rows(i) > ... > {code} > It can be easily converted back into _*O(n)*_ complexity. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
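The quadratic behavior comes from positional access on a Scala `Seq`: when `rows` is a linear sequence such as `List`, `rows(i)` walks from the head, so the loop does 1 + 2 + ... + n work overall. The following self-contained sketch contrasts the two shapes; the `emit` helper is a hypothetical stand-in for the real per-row conversion in `RowSetUtils`.

{code:scala}
object RowLoopSketch {
  // Hypothetical stand-in for the per-row conversion in RowSetUtils.
  def emit(row: Seq[Any]): Unit = ()

  // Quadratic when `rows` is a linear Seq such as List: rows(i) costs O(i).
  def convertIndexed(rows: Seq[Seq[Any]]): Unit = {
    var i = 0
    val n = rows.length
    while (i < n) {
      emit(rows(i)) // O(i) lookup, O(n^2) in total
      i += 1
    }
  }

  // Linear: a single traversal regardless of the Seq implementation.
  def convertIterated(rows: Seq[Seq[Any]]): Unit = rows.foreach(emit)
}
{code}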
[jira] [Commented] (SPARK-47085) Performance issue on thrift API
[ https://issues.apache.org/jira/browse/SPARK-47085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818989#comment-17818989 ] Dongjoon Hyun commented on SPARK-47085: --- Thank you. I added SPARK-39041 as an `is caused by` link. > Performance issue on thrift API > --- > > Key: SPARK-47085 > URL: https://issues.apache.org/jira/browse/SPARK-47085 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Izek Greenfield >Assignee: Izek Greenfield >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This new complexity was introduced in SPARK-39041. > Before this PR, the code was: > {code:java} > while (curRow < maxRows && iter.hasNext) { > val sparkRow = iter.next() > val row = ArrayBuffer[Any]() > var curCol = 0 > while (curCol < sparkRow.length) { > if (sparkRow.isNullAt(curCol)) { > row += null > } else { > addNonNullColumnValue(sparkRow, row, curCol, timeFormatters) > } > curCol += 1 > } > resultRowSet.addRow(row.toArray.asInstanceOf[Array[Object]]) > curRow += 1 > }{code} > i.e. a single pass without the _*O(n^2)*_ complexity, so this change just returns the > state to what it was before. > > In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: > {code:scala} > ... > while (i < rowSize) { > val row = rows(i) > ... > {code} > It can be easily converted back into _*O(n)*_ complexity. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47105) Spark Container doesn't have spark group or spark user created
[ https://issues.apache.org/jira/browse/SPARK-47105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818974#comment-17818974 ] Albert Wong commented on SPARK-47105: - Related: https://issues.apache.org/jira/browse/SPARK-45557 > Spark Container doesn't have spark group or spark user created > -- > > Key: SPARK-47105 > URL: https://issues.apache.org/jira/browse/SPARK-47105 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Docker >Affects Versions: 3.4.1 > Environment: Using container apache/spark-py:latest >Reporter: Albert Wong >Priority: Critical > > I see that > [https://github.com/apache/spark-docker/blob/431aa516ba58985c902bf2d2a07bf0eaa1df6740/3.4.1/scala2.12-java11-ubuntu/Dockerfile#L19] > is supposed to have a spark user and spark group created but checking the > container, it doesn't have those uid and gid created. Both should have 185 > uid and 185 gid. > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/group > root:x:0: > daemon:x:1: > bin:x:2: > sys:x:3: > adm:x:4: > tty:x:5: > disk:x:6: > lp:x:7: > mail:x:8: > news:x:9: > uucp:x:10: > man:x:12: > proxy:x:13: > kmem:x:15: > dialout:x:20: > fax:x:21: > voice:x:22: > cdrom:x:24: > floppy:x:25: > tape:x:26: > sudo:x:27: > audio:x:29: > dip:x:30: > www-data:x:33: > backup:x:34: > operator:x:37: > list:x:38: > irc:x:39: > src:x:40: > gnats:x:41: > shadow:x:42: > utmp:x:43: > video:x:44: > sasl:x:45: > plugdev:x:46: > staff:x:50: > games:x:60: > users:x:100: > nogroup:x:65534: > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/passwd > root:x:0:0:root:/root:/bin/bash > daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin > bin:x:2:2:bin:/bin:/usr/sbin/nologin > sys:x:3:3:sys:/dev:/usr/sbin/nologin > sync:x:4:65534:sync:/bin:/bin/sync > games:x:5:60:games:/usr/games:/usr/sbin/nologin > man:x:6:12:man:/var/cache/man:/usr/sbin/nologin > lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin > mail:x:8:8:mail:/var/mail:/usr/sbin/nologin > news:x:9:9:news:/var/spool/news:/usr/sbin/nologin > uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin > proxy:x:13:13:proxy:/bin:/usr/sbin/nologin > www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin > backup:x:34:34:backup:/var/backups:/usr/sbin/nologin > list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin > irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin > gnats:x:41:41:Gnats Bug-Reporting System > (admin):/var/lib/gnats:/usr/sbin/nologin > nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin > _apt:x:100:65534::/nonexistent:/usr/sbin/nologin > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/passwd > root:x:0:0:root:/root:/bin/bash > daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin > bin:x:2:2:bin:/bin:/usr/sbin/nologin > sys:x:3:3:sys:/dev:/usr/sbin/nologin > sync:x:4:65534:sync:/bin:/bin/sync > games:x:5:60:games:/usr/games:/usr/sbin/nologin > man:x:6:12:man:/var/cache/man:/usr/sbin/nologin > lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin > mail:x:8:8:mail:/var/mail:/usr/sbin/nologin > news:x:9:9:news:/var/spool/news:/usr/sbin/nologin > uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin > proxy:x:13:13:proxy:/bin:/usr/sbin/nologin > www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin > backup:x:34:34:backup:/var/backups:/usr/sbin/nologin > list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin > irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin > gnats:x:41:41:Gnats Bug-Reporting System > (admin):/var/lib/gnats:/usr/sbin/nologin > nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin > _apt:x:100:65534::/nonexistent:/usr/sbin/nologin -- This 
message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47105) Spark Container doesn't have spark group or spark user created
[ https://issues.apache.org/jira/browse/SPARK-47105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Albert Wong updated SPARK-47105: Shepherd: Hyukjin Kwon > Spark Container doesn't have spark group or spark user created > -- > > Key: SPARK-47105 > URL: https://issues.apache.org/jira/browse/SPARK-47105 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Docker >Affects Versions: 3.4.1 > Environment: Using container apache/spark-py:latest >Reporter: Albert Wong >Priority: Critical > > I see that > [https://github.com/apache/spark-docker/blob/431aa516ba58985c902bf2d2a07bf0eaa1df6740/3.4.1/scala2.12-java11-ubuntu/Dockerfile#L19] > is supposed to have a spark user and spark group created but checking the > container, it doesn't have those uid and gid created. Both should have 185 > uid and 185 gid. > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/group > root:x:0: > daemon:x:1: > bin:x:2: > sys:x:3: > adm:x:4: > tty:x:5: > disk:x:6: > lp:x:7: > mail:x:8: > news:x:9: > uucp:x:10: > man:x:12: > proxy:x:13: > kmem:x:15: > dialout:x:20: > fax:x:21: > voice:x:22: > cdrom:x:24: > floppy:x:25: > tape:x:26: > sudo:x:27: > audio:x:29: > dip:x:30: > www-data:x:33: > backup:x:34: > operator:x:37: > list:x:38: > irc:x:39: > src:x:40: > gnats:x:41: > shadow:x:42: > utmp:x:43: > video:x:44: > sasl:x:45: > plugdev:x:46: > staff:x:50: > games:x:60: > users:x:100: > nogroup:x:65534: > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/passwd > root:x:0:0:root:/root:/bin/bash > daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin > bin:x:2:2:bin:/bin:/usr/sbin/nologin > sys:x:3:3:sys:/dev:/usr/sbin/nologin > sync:x:4:65534:sync:/bin:/bin/sync > games:x:5:60:games:/usr/games:/usr/sbin/nologin > man:x:6:12:man:/var/cache/man:/usr/sbin/nologin > lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin > mail:x:8:8:mail:/var/mail:/usr/sbin/nologin > news:x:9:9:news:/var/spool/news:/usr/sbin/nologin > uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin > proxy:x:13:13:proxy:/bin:/usr/sbin/nologin > www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin > backup:x:34:34:backup:/var/backups:/usr/sbin/nologin > list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin > irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin > gnats:x:41:41:Gnats Bug-Reporting System > (admin):/var/lib/gnats:/usr/sbin/nologin > nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin > _apt:x:100:65534::/nonexistent:/usr/sbin/nologin > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/passwd > root:x:0:0:root:/root:/bin/bash > daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin > bin:x:2:2:bin:/bin:/usr/sbin/nologin > sys:x:3:3:sys:/dev:/usr/sbin/nologin > sync:x:4:65534:sync:/bin:/bin/sync > games:x:5:60:games:/usr/games:/usr/sbin/nologin > man:x:6:12:man:/var/cache/man:/usr/sbin/nologin > lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin > mail:x:8:8:mail:/var/mail:/usr/sbin/nologin > news:x:9:9:news:/var/spool/news:/usr/sbin/nologin > uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin > proxy:x:13:13:proxy:/bin:/usr/sbin/nologin > www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin > backup:x:34:34:backup:/var/backups:/usr/sbin/nologin > list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin > irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin > gnats:x:41:41:Gnats Bug-Reporting System > (admin):/var/lib/gnats:/usr/sbin/nologin > nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin > _apt:x:100:65534::/nonexistent:/usr/sbin/nologin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, 
e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47107) Implement partition reader for python streaming data source
Chaoqin Li created SPARK-47107: -- Summary: Implement partition reader for python streaming data source Key: SPARK-47107 URL: https://issues.apache.org/jira/browse/SPARK-47107 Project: Spark Issue Type: Improvement Components: PySpark, SS Affects Versions: 4.0.0 Reporter: Chaoqin Li Piggy-back on the PythonPartitionReaderFactory to implement reading a data partition for the Python streaming data source. Add a test case to verify that the Python streaming data source can read and process data end to end. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
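Whatever the Python side looks like, the factory named above ultimately has to satisfy the JVM DataSource V2 reader contract. The sketch below shows only that contract; `EchoPartition` and `EchoReaderFactory` are invented names, and the real `PythonPartitionReaderFactory` would forward `next()`/`get()` to a Python worker process rather than hold rows in memory.

{code:scala}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.read.{InputPartition, PartitionReader, PartitionReaderFactory}

// Invented partition type carrying its rows inline, purely for illustration.
class EchoPartition(val rows: Array[InternalRow]) extends InputPartition

class EchoReaderFactory extends PartitionReaderFactory {
  override def createReader(partition: InputPartition): PartitionReader[InternalRow] = {
    val rows = partition.asInstanceOf[EchoPartition].rows
    new PartitionReader[InternalRow] {
      private var i = -1
      override def next(): Boolean = { i += 1; i < rows.length }
      override def get(): InternalRow = rows(i)
      override def close(): Unit = ()
    }
  }
}
{code}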
[jira] [Updated] (SPARK-47106) Plan canonicalization test serializes/deserializes class that is not serializable
[ https://issues.apache.org/jira/browse/SPARK-47106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47106: -- Affects Version/s: 4.0.0 (was: 3.4.0) (was: 3.4.1) > Plan canonicalization test serializes/deserializes class that is not > serializable > - > > Key: SPARK-47106 > URL: https://issues.apache.org/jira/browse/SPARK-47106 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 4.0.0 >Reporter: Parth Chandra >Priority: Minor > > The test > {code:java} > test("SPARK-23731 plans should be canonicalizable after being > (de)serialized"){code} > serializes and deserializes > {code:java} > FileSourceScanExec{code} > which is not actually serializable. In particular, > FileSourceScanExec.relation is not serializable. > The test still passes though. > The test below derived from the above shows the issue - > {code:java} > test("verify FileSourceScanExec (de)serialize") { > withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> "parquet") { > withTempPath { path => > spark.range(1).write.parquet(path.getAbsolutePath) > val df = spark.read.parquet(path.getAbsolutePath) > val fileSourceScanExec = > df.queryExecution.sparkPlan.collectFirst { case p: > FileSourceScanExec => p }.get > val serializer = SparkEnv.get.serializer.newInstance() > val relation = serializer.serialize(fileSourceScanExec.relation) > assert(relation != null) > val deserialized = > > serializer.deserialize[FileSourceScanExec](serializer.serialize(fileSourceScanExec)) > assert(deserialized.relation != null) > } > } > }{code} > > The test fails with - > {code:java} > (file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1mgn/T/spark-d534d738-64f1-4eaa-9d9e-8c33374b60f1)) > - field (class: > org.apache.spark.sql.execution.datasources.HadoopFsRelation, name: location, > type: interface org.apache.spark.sql.execution.datasources.FileIndex) > - object (class > org.apache.spark.sql.execution.datasources.HadoopFsRelation, parquet) > java.io.NotSerializableException: > org.apache.spark.sql.execution.datasources.InMemoryFileIndex > Serialization stack: > - object not serializable (class: > org.apache.spark.sql.execution.datasources.InMemoryFileIndex, value: > org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1mgn/T/spark-d534d738-64f1-4eaa-9d9e-8c33374b60f1)) > - field (class: > org.apache.spark.sql.execution.datasources.HadoopFsRelation, name: location, > type: interface org.apache.spark.sql.execution.datasources.FileIndex) > - object (class > org.apache.spark.sql.execution.datasources.HadoopFsRelation, parquet) > at > org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41) > at > org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:49) > at > org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:115) > at > org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$11(SparkPlanSuite.scala:54) > at > org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$11$adapted(SparkPlanSuite.scala:48) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath(SQLHelper.scala:69) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath$(SQLHelper.scala:66) > at org.apache.spark.sql.QueryTest.withTempPath(QueryTest.scala:33) > at > org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$10(SparkPlanSuite.scala:48) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) > at > org.apache.spark.sql.execution.SparkPlanSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(SparkPlanSuite.scala:32) > at > org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:266) > at > org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:264) > at > org.apache.spark.sql.execution.SparkPlanSuite.withSQLConf(SparkPlanSuite.scala:32) > at > org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$9(SparkPlanSuite.scala:48) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47106) Plan canonicalization test serializes/deserializes class that is not serializable
[ https://issues.apache.org/jira/browse/SPARK-47106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47106: -- Issue Type: Improvement (was: Test) > Plan canonicalization test serializes/deserializes class that is not > serializable > - > > Key: SPARK-47106 > URL: https://issues.apache.org/jira/browse/SPARK-47106 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Parth Chandra >Priority: Minor > > The test > {code:java} > test("SPARK-23731 plans should be canonicalizable after being > (de)serialized"){code} > serializes and deserializes > {code:java} > FileSourceScanExec{code} > which is not actually serializable. In particular, > FileSourceScanExec.relation is not serializable. > The test still passes though. > The test below derived from the above shows the issue - > {code:java} > test("verify FileSourceScanExec (de)serialize") { > withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> "parquet") { > withTempPath { path => > spark.range(1).write.parquet(path.getAbsolutePath) > val df = spark.read.parquet(path.getAbsolutePath) > val fileSourceScanExec = > df.queryExecution.sparkPlan.collectFirst { case p: > FileSourceScanExec => p }.get > val serializer = SparkEnv.get.serializer.newInstance() > val relation = serializer.serialize(fileSourceScanExec.relation) > assert(relation != null) > val deserialized = > > serializer.deserialize[FileSourceScanExec](serializer.serialize(fileSourceScanExec)) > assert(deserialized.relation != null) > } > } > }{code} > > The test fails with - > {code:java} > (file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1mgn/T/spark-d534d738-64f1-4eaa-9d9e-8c33374b60f1)) > - field (class: > org.apache.spark.sql.execution.datasources.HadoopFsRelation, name: location, > type: interface org.apache.spark.sql.execution.datasources.FileIndex) > - object (class > org.apache.spark.sql.execution.datasources.HadoopFsRelation, parquet) > java.io.NotSerializableException: > org.apache.spark.sql.execution.datasources.InMemoryFileIndex > Serialization stack: > - object not serializable (class: > org.apache.spark.sql.execution.datasources.InMemoryFileIndex, value: > org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1mgn/T/spark-d534d738-64f1-4eaa-9d9e-8c33374b60f1)) > - field (class: > org.apache.spark.sql.execution.datasources.HadoopFsRelation, name: location, > type: interface org.apache.spark.sql.execution.datasources.FileIndex) > - object (class > org.apache.spark.sql.execution.datasources.HadoopFsRelation, parquet) > at > org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41) > at > org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:49) > at > org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:115) > at > org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$11(SparkPlanSuite.scala:54) > at > org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$11$adapted(SparkPlanSuite.scala:48) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath(SQLHelper.scala:69) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath$(SQLHelper.scala:66) > at org.apache.spark.sql.QueryTest.withTempPath(QueryTest.scala:33) > at > org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$10(SparkPlanSuite.scala:48) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) > at > org.apache.spark.sql.execution.SparkPlanSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(SparkPlanSuite.scala:32) > at > org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:266) > at > org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:264) > at > org.apache.spark.sql.execution.SparkPlanSuite.withSQLConf(SparkPlanSuite.scala:32) > at > org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$9(SparkPlanSuite.scala:48) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45557) Spark Connect can not be started because of missing user home dir in Docker container
[ https://issues.apache.org/jira/browse/SPARK-45557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818942#comment-17818942 ] Albert Wong commented on SPARK-45557: - Related https://issues.apache.org/jira/browse/SPARK-47105 > Spark Connect can not be started because of missing user home dir in Docker > container > - > > Key: SPARK-45557 > URL: https://issues.apache.org/jira/browse/SPARK-45557 > Project: Spark > Issue Type: Bug > Components: Spark Docker >Affects Versions: 3.4.0, 3.4.1, 3.5.0 >Reporter: Niels Pardon >Priority: Minor > > I was trying to start Spark Connect within a container using the Spark Docker > container images and ran into an issue where Ivy could not pull the Spark > Connect JAR since the user home /home/spark does not exist. > Steps to reproduce: > 1. Start the Spark container with `/bin/bash` as the command: > {code:java} > docker run -it --rm apache/spark:3.5.0 /bin/bash {code} > 2. Try to start Spark Connect within the container: > > {code:java} > /opt/spark/sbin/start-connect-server.sh --packages > org.apache.spark:spark-connect_2.12:3.5.0 {code} > which lead to this output: > > > {code:java} > starting org.apache.spark.sql.connect.service.SparkConnectServer, logging to > /opt/spark/logs/spark--org.apache.spark.sql.connect.service.SparkConnectServer-1-d8470a71dbd7.out > failed to launch: nice -n 0 bash /opt/spark/bin/spark-submit --class > org.apache.spark.sql.connect.service.SparkConnectServer --name Spark Connect > server --packages org.apache.spark:spark-connect_2.12:3.5.0 > at > org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1535) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:185) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:964) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > full log in > /opt/spark/logs/spark--org.apache.spark.sql.connect.service.SparkConnectServer-1-d8470a71dbd7.out > {code} > where then the full log file looks like this: > {code:java} > Spark Command: /opt/java/openjdk/bin/java -cp > /opt/spark/conf:/opt/spark/jars/* -Xmx1g -XX:+IgnoreUnrecognizedVMOptions > --add-opens=java.base/java.lang=ALL-UNNAMED > --add-opens=java.base/java.lang.invoke=ALL-UNNAMED > --add-opens=java.base/java.lang.reflect=ALL-UNNAMED > --add-opens=java.base/java.io=ALL-UNNAMED > --add-opens=java.base/java.net=ALL-UNNAMED > --add-opens=java.base/java.nio=ALL-UNNAMED > --add-opens=java.base/java.util=ALL-UNNAMED > --add-opens=java.base/java.util.concurrent=ALL-UNNAMED > --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED > --add-opens=java.base/sun.nio.ch=ALL-UNNAMED > --add-opens=java.base/sun.nio.cs=ALL-UNNAMED > --add-opens=java.base/sun.security.action=ALL-UNNAMED > --add-opens=java.base/sun.util.calendar=ALL-UNNAMED > --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED > -Djdk.reflect.useDirectMethodHandle=false org.apache.spark.deploy.SparkSubmit > --class 
org.apache.spark.sql.connect.service.SparkConnectServer --name Spark > Connect server --packages org.apache.spark:spark-connect_2.12:3.5.0 > spark-internal > > :: loading settings :: url = > jar:file:/opt/spark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml > Ivy Default Cache set to: /home/spark/.ivy2/cache > The jars for the packages stored in: /home/spark/.ivy2/jars > org.apache.spark#spark-connect_2.12 added as a dependency > :: resolving dependencies :: > org.apache.spark#spark-submit-parent-f8a04936-e8af-4f37-bdb0-e4026a8a3be5;1.0 > confs: [default] > Exception in thread "main" java.io.FileNotFoundException: > /home/spark/.ivy2/cache/resolved-org.apache.spark-spark-submit-parent-f8a04936-e8af-4f37-bdb0-e4026a8a3be5-1.0.xml > (No such file or directory) > at java.base/java.io.FileOutputStream.open0(Native Method) > at java.base/java.io.FileOutputStream.open(Unknown Source) > at java.base/java.io.FileOutputStream.(Unknown Source) > at java.base/java.io.FileOutputStream.(Unknown Source) > at >
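The log above pinpoints the root cause: Ivy derives its default cache location from `user.home` (here `/home/spark/.ivy2/cache`), and that directory does not exist for the container user. A small sketch of the dependency follows, with the workaround hinted in comments; it assumes the standard `spark.jars.ivy` setting as the override knob.

{code:scala}
import java.io.File

object IvyHomeCheck {
  def main(args: Array[String]): Unit = {
    val home = new File(System.getProperty("user.home"))
    // Ivy's default cache resolves under user.home; when the container user
    // has no home directory, creating this path fails as in the log above.
    val defaultCache = new File(home, ".ivy2/cache")
    println(s"Ivy default cache: $defaultCache, home exists: ${home.isDirectory}")
    // Workaround sketch: redirect Spark's Ivy directory to a writable path,
    // e.g. pass --conf spark.jars.ivy=/tmp/.ivy2 to start-connect-server.sh.
  }
}
{code}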
[jira] [Updated] (SPARK-47105) Spark Container doesn't have spark group or spark user created
[ https://issues.apache.org/jira/browse/SPARK-47105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Albert Wong updated SPARK-47105: Component/s: Spark Docker > Spark Container doesn't have spark group or spark user created > -- > > Key: SPARK-47105 > URL: https://issues.apache.org/jira/browse/SPARK-47105 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Docker >Affects Versions: 3.4.1 > Environment: Using container apache/spark-py:latest >Reporter: Albert Wong >Priority: Critical > > I see that > [https://github.com/apache/spark-docker/blob/431aa516ba58985c902bf2d2a07bf0eaa1df6740/3.4.1/scala2.12-java11-ubuntu/Dockerfile#L19] > is supposed to have a spark user and spark group created but checking the > container, it doesn't have those uid and gid created. Both should have 185 > uid and 185 gid. > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/group > root:x:0: > daemon:x:1: > bin:x:2: > sys:x:3: > adm:x:4: > tty:x:5: > disk:x:6: > lp:x:7: > mail:x:8: > news:x:9: > uucp:x:10: > man:x:12: > proxy:x:13: > kmem:x:15: > dialout:x:20: > fax:x:21: > voice:x:22: > cdrom:x:24: > floppy:x:25: > tape:x:26: > sudo:x:27: > audio:x:29: > dip:x:30: > www-data:x:33: > backup:x:34: > operator:x:37: > list:x:38: > irc:x:39: > src:x:40: > gnats:x:41: > shadow:x:42: > utmp:x:43: > video:x:44: > sasl:x:45: > plugdev:x:46: > staff:x:50: > games:x:60: > users:x:100: > nogroup:x:65534: > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/passwd > root:x:0:0:root:/root:/bin/bash > daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin > bin:x:2:2:bin:/bin:/usr/sbin/nologin > sys:x:3:3:sys:/dev:/usr/sbin/nologin > sync:x:4:65534:sync:/bin:/bin/sync > games:x:5:60:games:/usr/games:/usr/sbin/nologin > man:x:6:12:man:/var/cache/man:/usr/sbin/nologin > lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin > mail:x:8:8:mail:/var/mail:/usr/sbin/nologin > news:x:9:9:news:/var/spool/news:/usr/sbin/nologin > uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin > proxy:x:13:13:proxy:/bin:/usr/sbin/nologin > www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin > backup:x:34:34:backup:/var/backups:/usr/sbin/nologin > list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin > irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin > gnats:x:41:41:Gnats Bug-Reporting System > (admin):/var/lib/gnats:/usr/sbin/nologin > nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin > _apt:x:100:65534::/nonexistent:/usr/sbin/nologin > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/passwd > root:x:0:0:root:/root:/bin/bash > daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin > bin:x:2:2:bin:/bin:/usr/sbin/nologin > sys:x:3:3:sys:/dev:/usr/sbin/nologin > sync:x:4:65534:sync:/bin:/bin/sync > games:x:5:60:games:/usr/games:/usr/sbin/nologin > man:x:6:12:man:/var/cache/man:/usr/sbin/nologin > lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin > mail:x:8:8:mail:/var/mail:/usr/sbin/nologin > news:x:9:9:news:/var/spool/news:/usr/sbin/nologin > uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin > proxy:x:13:13:proxy:/bin:/usr/sbin/nologin > www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin > backup:x:34:34:backup:/var/backups:/usr/sbin/nologin > list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin > irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin > gnats:x:41:41:Gnats Bug-Reporting System > (admin):/var/lib/gnats:/usr/sbin/nologin > nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin > _apt:x:100:65534::/nonexistent:/usr/sbin/nologin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, 
e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40513) SPIP: Support Docker Official Image for Spark
[ https://issues.apache.org/jira/browse/SPARK-40513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818935#comment-17818935 ] Albert Wong commented on SPARK-40513: - Related issue. https://issues.apache.org/jira/browse/SPARK-47105 > SPIP: Support Docker Official Image for Spark > - > > Key: SPARK-40513 > URL: https://issues.apache.org/jira/browse/SPARK-40513 > Project: Spark > Issue Type: New Feature > Components: Kubernetes, Spark Docker >Affects Versions: 3.5.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Labels: SPIP, pull-request-available > Fix For: 3.5.0 > > > This SPIP is proposed to add [Docker Official > Image(DOI)|https://github.com/docker-library/official-images] to ensure the > Spark Docker images meet the quality standards for Docker images, to provide > these Docker images for users who want to use Apache Spark via Docker image. > There are also several [Apache projects that release the Docker Official > Images|https://hub.docker.com/search?q=apache_filter=official], such > as: [flink|https://hub.docker.com/_/flink], > [storm|https://hub.docker.com/_/storm], [solr|https://hub.docker.com/_/solr], > [zookeeper|https://hub.docker.com/_/zookeeper], > [httpd|https://hub.docker.com/_/httpd] (with 50M+ to 1B+ download for each). > From the huge download statistics, we can see the real demands of users, and > from the support of other apache projects, we should also be able to do it. > After support: > * The Dockerfile will still be maintained by the Apache Spark community and > reviewed by Docker. > * The images will be maintained by the Docker community to ensure the > quality standards for Docker images of the Docker community. > It will also reduce the extra docker images maintenance effort (such as > frequently rebuilding, image security update) of the Apache Spark community. > > SPIP DOC: > [https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o] > DISCUSS: [https://lists.apache.org/thread/l1793y5224n8bqkp3s6ltgkykso4htb3] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47104) Spark SQL query fails with NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-47104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818934#comment-17818934 ] Bruce Robbins commented on SPARK-47104: --- It's not a CSV specific issue. You can reproduce with a cached view. The following fails on the master branch, when using {{spark-sql}}: {noformat} create or replace temp view v1(id, name) as values (1, "fred"), (2, "bob"); cache table v1; select name, uuid() as _iid from ( select s.name from v1 s join v1 t on s.name = t.name order by name ) limit 20; {noformat} The exception is: {noformat} java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.catalyst.util.RandomUUIDGenerator.getNextUUIDUTF8String()" because "this.randomGen_0" is null at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at org.apache.spark.sql.execution.TakeOrderedAndProjectExec.$anonfun$executeCollect$6(limit.scala:297) at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:934) at org.apache.spark.sql.execution.TakeOrderedAndProjectExec.$anonfun$executeCollect$1(limit.scala:297) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243) at org.apache.spark.sql.execution.TakeOrderedAndProjectExec.executeCollect(limit.scala:286) at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$executeCollect$1(AdaptiveSparkPlanExec.scala:390) at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:418) at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:390) {noformat} It seems that non-deterministic expressions are not getting initialized before being used in the unsafe projection. I can take a look. > Spark SQL query fails with NullPointerException > --- > > Key: SPARK-47104 > URL: https://issues.apache.org/jira/browse/SPARK-47104 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Chhavi Bansal >Priority: Major > > I am trying to run a very simple SQL query involving join and orderby clause > and then using UUID() function in the outermost select stmt. 
The query fails > {code:java} > val df = spark.read.format("csv").option("header", > "true").load("src/main/resources/titanic.csv") > df.createOrReplaceTempView("titanic") > val query = spark.sql(" select name, uuid() as _iid from (select s.name from > titanic s join titanic t on s.name = t.name order by name) ;") > query.show() // FAILS{code} > Dataset is a normal csv file with the following columns > {code:java} > PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked > {code} > Below is the error > {code:java} > Exception in thread "main" java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.execution.TakeOrderedAndProjectExec.$anonfun$executeCollect$2(limit.scala:207) > at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198) > at > org.apache.spark.sql.execution.TakeOrderedAndProjectExec.executeCollect(limit.scala:207) > at > org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$executeCollect$1(AdaptiveSparkPlanExec.scala:338) > at > org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:366) > at > org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:338) > at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3715) > at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2728) > at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3706) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at >
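For context on the diagnosis above: in Catalyst, non-deterministic expressions such as `uuid()` must have `initialize(partitionIndex)` called before they are evaluated, and projections expose the same hook. The sketch below shows that contract only, not the actual fix; `exprs` and `childOutput` stand in for the project list and child output of `TakeOrderedAndProjectExec`.

{code:scala}
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression, UnsafeProjection}

// Sketch only: build a projection and honor the initialization contract.
def makeProjection(exprs: Seq[Expression], childOutput: Seq[Attribute]): UnsafeProjection = {
  val proj = UnsafeProjection.create(exprs, childOutput)
  // Without this call, Nondeterministic expressions such as Uuid see a null
  // RandomUUIDGenerator, matching the NPE in the stack trace above.
  proj.initialize(0) // partition index; executeCollect projects on the driver
  proj
}
{code}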
[jira] [Updated] (SPARK-47085) Performance issue on thrift API
[ https://issues.apache.org/jira/browse/SPARK-47085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Izek Greenfield updated SPARK-47085: Description: This new complexity was introduced in SPARK-39041. Before this PR, the code was: {code:java} while (curRow < maxRows && iter.hasNext) { val sparkRow = iter.next() val row = ArrayBuffer[Any]() var curCol = 0 while (curCol < sparkRow.length) { if (sparkRow.isNullAt(curCol)) { row += null } else { addNonNullColumnValue(sparkRow, row, curCol, timeFormatters) } curCol += 1 } resultRowSet.addRow(row.toArray.asInstanceOf[Array[Object]]) curRow += 1 }{code} i.e. a single pass without the _*O(n^2)*_ complexity, so this change just returns the state to what it was before. In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: {code:scala} ... while (i < rowSize) { val row = rows(i) ... {code} It can be easily converted back into _*O(n)*_ complexity. was: This new complexity was introduced in SPARK-39041. Before this PR, the code was: {code:java} def toTTableSchema(schema: StructType): TTableSchema = { val tTableSchema = new TTableSchema() schema.zipWithIndex.foreach { case (f, i) => tTableSchema.addToColumns(toTColumnDesc(f, i)) } tTableSchema } {code} foreach without the _*O(n^2)*_ complexity, so this change just returns the state to what it was before. In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: {code:scala} ... while (i < rowSize) { val row = rows(i) ... {code} It can be easily converted back into _*O(n)*_ complexity. > Performance issue on thrift API > --- > > Key: SPARK-47085 > URL: https://issues.apache.org/jira/browse/SPARK-47085 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Izek Greenfield >Assignee: Izek Greenfield >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This new complexity was introduced in SPARK-39041. > Before this PR, the code was: > {code:java} > while (curRow < maxRows && iter.hasNext) { > val sparkRow = iter.next() > val row = ArrayBuffer[Any]() > var curCol = 0 > while (curCol < sparkRow.length) { > if (sparkRow.isNullAt(curCol)) { > row += null > } else { > addNonNullColumnValue(sparkRow, row, curCol, timeFormatters) > } > curCol += 1 > } > resultRowSet.addRow(row.toArray.asInstanceOf[Array[Object]]) > curRow += 1 > }{code} > i.e. a single pass without the _*O(n^2)*_ complexity, so this change just returns the > state to what it was before. > > In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: > {code:scala} > ... > while (i < rowSize) { > val row = rows(i) > ... > {code} > It can be easily converted back into _*O(n)*_ complexity. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47085) Performance issue on thrift API
[ https://issues.apache.org/jira/browse/SPARK-47085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818932#comment-17818932 ] Izek Greenfield commented on SPARK-47085: - [~dongjoon] I updated the details > Performance issue on thrift API > --- > > Key: SPARK-47085 > URL: https://issues.apache.org/jira/browse/SPARK-47085 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Izek Greenfield >Assignee: Izek Greenfield >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This new complexity was introduced in SPARK-39041. > Before this PR, the code was: > {code:java} > def toTTableSchema(schema: StructType): TTableSchema = { > val tTableSchema = new TTableSchema() > schema.zipWithIndex.foreach { case (f, i) => > tTableSchema.addToColumns(toTColumnDesc(f, i)) > } > tTableSchema > } {code} > foreach without the _*O(n^2)*_ complexity, so this change just returns the > state to what it was before. > > In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: > {code:scala} > ... > while (i < rowSize) { > val row = rows(i) > ... > {code} > It can be easily converted back into _*O(n)*_ complexity. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47085) Performance issue on thrift API
[ https://issues.apache.org/jira/browse/SPARK-47085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Izek Greenfield updated SPARK-47085: Description: This new complexity was introduced in SPARK-39041. Before this PR, the code was: {code:java} def toTTableSchema(schema: StructType): TTableSchema = { val tTableSchema = new TTableSchema() schema.zipWithIndex.foreach { case (f, i) => tTableSchema.addToColumns(toTColumnDesc(f, i)) } tTableSchema } {code} foreach without the _*O(n^2)*_ complexity, so this change just returns the state to what it was before. In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: {code:scala} ... while (i < rowSize) { val row = rows(i) ... {code} It can be easily converted back into _*O(n)*_ complexity. was: This new complexity was introduced in SPARK-39041. In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: {code:scala} ... while (i < rowSize) { val row = rows(i) ... {code} It can be easily converted back into _*O(n)*_ complexity. > Performance issue on thrift API > --- > > Key: SPARK-47085 > URL: https://issues.apache.org/jira/browse/SPARK-47085 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Izek Greenfield >Assignee: Izek Greenfield >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This new complexity was introduced in SPARK-39041. > Before this PR, the code was: > {code:java} > def toTTableSchema(schema: StructType): TTableSchema = { > val tTableSchema = new TTableSchema() > schema.zipWithIndex.foreach { case (f, i) => > tTableSchema.addToColumns(toTColumnDesc(f, i)) > } > tTableSchema > } {code} > foreach without the _*O(n^2)*_ complexity, so this change just returns the > state to what it was before. > > In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: > {code:scala} > ... > while (i < rowSize) { > val row = rows(i) > ... > {code} > It can be easily converted back into _*O(n)*_ complexity. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47106) Plan canonicalization test serializes/deserializes class that is not serializable
Parth Chandra created SPARK-47106: - Summary: Plan canonicalization test serializes/deserializes class that is not serializable Key: SPARK-47106 URL: https://issues.apache.org/jira/browse/SPARK-47106 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.4.1, 3.4.0 Reporter: Parth Chandra The test {code:java} test("SPARK-23731 plans should be canonicalizable after being (de)serialized"){code} serializes and deserializes {code:java} FileSourceScanExec{code} which is not actually serializable. In particular, FileSourceScanExec.relation is not serializable. The test still passes though. The test below derived from the above shows the issue - {code:java} test("verify FileSourceScanExec (de)serialize") { withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> "parquet") { withTempPath { path => spark.range(1).write.parquet(path.getAbsolutePath) val df = spark.read.parquet(path.getAbsolutePath) val fileSourceScanExec = df.queryExecution.sparkPlan.collectFirst { case p: FileSourceScanExec => p }.get val serializer = SparkEnv.get.serializer.newInstance() val relation = serializer.serialize(fileSourceScanExec.relation) assert(relation != null) val deserialized = serializer.deserialize[FileSourceScanExec](serializer.serialize(fileSourceScanExec)) assert(deserialized.relation != null) } } }{code} The test fails with - {code:java} (file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1mgn/T/spark-d534d738-64f1-4eaa-9d9e-8c33374b60f1)) - field (class: org.apache.spark.sql.execution.datasources.HadoopFsRelation, name: location, type: interface org.apache.spark.sql.execution.datasources.FileIndex) - object (class org.apache.spark.sql.execution.datasources.HadoopFsRelation, parquet) java.io.NotSerializableException: org.apache.spark.sql.execution.datasources.InMemoryFileIndex Serialization stack: - object not serializable (class: org.apache.spark.sql.execution.datasources.InMemoryFileIndex, value: org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1mgn/T/spark-d534d738-64f1-4eaa-9d9e-8c33374b60f1)) - field (class: org.apache.spark.sql.execution.datasources.HadoopFsRelation, name: location, type: interface org.apache.spark.sql.execution.datasources.FileIndex) - object (class org.apache.spark.sql.execution.datasources.HadoopFsRelation, parquet) at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41) at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:49) at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:115) at org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$11(SparkPlanSuite.scala:54) at org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$11$adapted(SparkPlanSuite.scala:48) at org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath(SQLHelper.scala:69) at org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath$(SQLHelper.scala:66) at org.apache.spark.sql.QueryTest.withTempPath(QueryTest.scala:33) at org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$10(SparkPlanSuite.scala:48) at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) at org.apache.spark.sql.execution.SparkPlanSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(SparkPlanSuite.scala:32) at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:266) at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:264) at org.apache.spark.sql.execution.SparkPlanSuite.withSQLConf(SparkPlanSuite.scala:32) at org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$9(SparkPlanSuite.scala:48) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47105) Spark Container doesn't have spark group or spark user created
Albert Wong created SPARK-47105: --- Summary: Spark Container doesn't have spark group or spark user created Key: SPARK-47105 URL: https://issues.apache.org/jira/browse/SPARK-47105 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 3.4.1 Environment: Using container apache/spark-py:latest Reporter: Albert Wong I see that [https://github.com/apache/spark-docker/blob/431aa516ba58985c902bf2d2a07bf0eaa1df6740/3.4.1/scala2.12-java11-ubuntu/Dockerfile#L19] is supposed to have a spark user and spark group created but checking the container, it doesn't have those uid and gid created. Both should have 185 uid and 185 gid. I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/group root:x:0: daemon:x:1: bin:x:2: sys:x:3: adm:x:4: tty:x:5: disk:x:6: lp:x:7: mail:x:8: news:x:9: uucp:x:10: man:x:12: proxy:x:13: kmem:x:15: dialout:x:20: fax:x:21: voice:x:22: cdrom:x:24: floppy:x:25: tape:x:26: sudo:x:27: audio:x:29: dip:x:30: www-data:x:33: backup:x:34: operator:x:37: list:x:38: irc:x:39: src:x:40: gnats:x:41: shadow:x:42: utmp:x:43: video:x:44: sasl:x:45: plugdev:x:46: staff:x:50: games:x:60: users:x:100: nogroup:x:65534: I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/passwd root:x:0:0:root:/root:/bin/bash daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin bin:x:2:2:bin:/bin:/usr/sbin/nologin sys:x:3:3:sys:/dev:/usr/sbin/nologin sync:x:4:65534:sync:/bin:/bin/sync games:x:5:60:games:/usr/games:/usr/sbin/nologin man:x:6:12:man:/var/cache/man:/usr/sbin/nologin lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin mail:x:8:8:mail:/var/mail:/usr/sbin/nologin news:x:9:9:news:/var/spool/news:/usr/sbin/nologin uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin proxy:x:13:13:proxy:/bin:/usr/sbin/nologin www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin backup:x:34:34:backup:/var/backups:/usr/sbin/nologin list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin _apt:x:100:65534::/nonexistent:/usr/sbin/nologin I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/passwd root:x:0:0:root:/root:/bin/bash daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin bin:x:2:2:bin:/bin:/usr/sbin/nologin sys:x:3:3:sys:/dev:/usr/sbin/nologin sync:x:4:65534:sync:/bin:/bin/sync games:x:5:60:games:/usr/games:/usr/sbin/nologin man:x:6:12:man:/var/cache/man:/usr/sbin/nologin lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin mail:x:8:8:mail:/var/mail:/usr/sbin/nologin news:x:9:9:news:/var/spool/news:/usr/sbin/nologin uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin proxy:x:13:13:proxy:/bin:/usr/sbin/nologin www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin backup:x:34:34:backup:/var/backups:/usr/sbin/nologin list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin _apt:x:100:65534::/nonexistent:/usr/sbin/nologin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45615) Remove redundant "Auto-application to `()` is deprecated" compile suppression rules.
[ https://issues.apache.org/jira/browse/SPARK-45615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45615. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45179 [https://github.com/apache/spark/pull/45179] > Remove redundant "Auto-application to `()` is deprecated" compile suppression > rules. > --- > > Key: SPARK-45615 > URL: https://issues.apache.org/jira/browse/SPARK-45615 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > Due to the issue https://github.com/scalatest/scalatest/issues/2297, we need > to wait until we upgrade to a newer ScalaTest version before removing these > suppression rules. > Maybe 3.2.18 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45615) Remove redundant "Auto-application to `()` is deprecated" compile suppression rules.
[ https://issues.apache.org/jira/browse/SPARK-45615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45615: - Assignee: Yang Jie > Remove redundant "Auto-application to `()` is deprecated" compile suppression > rules. > --- > > Key: SPARK-45615 > URL: https://issues.apache.org/jira/browse/SPARK-45615 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > > Due to the issue https://github.com/scalatest/scalatest/issues/2297, we need > to wait until we upgrade to a newer ScalaTest version before removing these > suppression rules. > Maybe 3.2.18 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47098) Migrate from AppVeyor to GitHub Actions for SparkR tests on Windows
[ https://issues.apache.org/jira/browse/SPARK-47098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47098. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45175 [https://github.com/apache/spark/pull/45175] > Migrate from AppVeyor to GitHub Actions for SparkR tests on Windows > --- > > Key: SPARK-47098 > URL: https://issues.apache.org/jira/browse/SPARK-47098 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Reduce the tools we use for better maintenance -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47085) Performance issue on Thrift API
[ https://issues.apache.org/jira/browse/SPARK-47085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818878#comment-17818878 ] Dongjoon Hyun commented on SPARK-47085: --- Hi, [~igreenfi] and [~yao]. Could you provide some background on why this is a regression in 3.4.1 and 3.5.0? If this is not a regression in those versions, we should change `Affected Versions` to `4.0.0` because this is an improvement. > Performance issue on Thrift API > --- > > Key: SPARK-47085 > URL: https://issues.apache.org/jira/browse/SPARK-47085 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Izek Greenfield >Assignee: Izek Greenfield >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This new complexity was introduced in SPARK-39041. > In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: > {code:scala} > ... > while (i < rowSize) { > val row = rows(i) > ... > {code} > It can be easily converted back into _*O(n)*_ complexity. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
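The quadratic cost described above comes from positional indexing into a Scala Seq inside the loop. The following is a minimal sketch, not the actual RowSetUtils code: process is a hypothetical stand-in for the real per-row work, and it assumes rows is backed by a linked sequence such as List, where rows(i) costs O(i).
{code:scala}
object RowLoopSketch {
  // Hypothetical stand-in for the real per-row work in RowSetUtils.
  def process(row: String): Unit = ()

  // Quadratic pattern: for a List, rows(i) walks from the head on every
  // iteration, so the total work is 1 + 2 + ... + n = O(n^2).
  def quadratic(rows: Seq[String]): Unit = {
    val rowSize = rows.size
    var i = 0
    while (i < rowSize) {
      val row = rows(i) // O(i) per access on a List
      process(row)
      i += 1
    }
  }

  // Linear alternative: a single iterator pass touches each element
  // exactly once, restoring O(n) overall.
  def linear(rows: Seq[String]): Unit = {
    val it = rows.iterator
    while (it.hasNext) {
      process(it.next())
    }
  }
}
{code}
Whether the actual fix in the pull request uses an iterator or another constant-time access structure is decided there; the sketch only illustrates why the complexity class changes.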
[jira] [Resolved] (SPARK-46858) Upgrade Pandas to 2.2.0
[ https://issues.apache.org/jira/browse/SPARK-46858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46858. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44881 [https://github.com/apache/spark/pull/44881] > Upgrade Pandas to 2.2.0 > --- > > Key: SPARK-46858 > URL: https://issues.apache.org/jira/browse/SPARK-46858 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42328) Assign name to _LEGACY_ERROR_TEMP_1175
[ https://issues.apache.org/jira/browse/SPARK-42328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818849#comment-17818849 ] Nikola Mandic commented on SPARK-42328: --- [~maxgekk] Yes, thank you. > Assign name to _LEGACY_ERROR_TEMP_1175 > -- > > Key: SPARK-42328 > URL: https://issues.apache.org/jira/browse/SPARK-42328 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42328) Assign name to _LEGACY_ERROR_TEMP_1175
[ https://issues.apache.org/jira/browse/SPARK-42328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818844#comment-17818844 ] Max Gekk commented on SPARK-42328: -- @nikolamand-db Would you like to work on this? > Assign name to _LEGACY_ERROR_TEMP_1175 > -- > > Key: SPARK-42328 > URL: https://issues.apache.org/jira/browse/SPARK-42328 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-42328) Assign name to _LEGACY_ERROR_TEMP_1175
[ https://issues.apache.org/jira/browse/SPARK-42328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818844#comment-17818844 ] Max Gekk edited comment on SPARK-42328 at 2/20/24 2:53 PM: --- [~nikolamand-db] Would you like to work on this? was (Author: maxgekk): @nikolamand-db Would you like to work on this? > Assign name to _LEGACY_ERROR_TEMP_1175 > -- > > Key: SPARK-42328 > URL: https://issues.apache.org/jira/browse/SPARK-42328 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47104) Spark SQL query fails with NullPointerException
Chhavi Bansal created SPARK-47104: - Summary: Spark SQL query fails with NullPointerException Key: SPARK-47104 URL: https://issues.apache.org/jira/browse/SPARK-47104 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1 Reporter: Chhavi Bansal I am trying to run a very simple SQL query involving a join and an ORDER BY clause, using the UUID() function in the outermost SELECT statement. The query fails: {code:java} val df = spark.read.format("csv").option("header", "true").load("src/main/resources/titanic.csv") df.createOrReplaceTempView("titanic") val query = spark.sql(" select name, uuid() as _iid from (select s.name from titanic s join titanic t on s.name = t.name order by name) ;") query.show() // FAILS{code} The dataset is a normal CSV file with the following columns: {code:java} PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked {code} Below is the error: {code:java} Exception in thread "main" java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at org.apache.spark.sql.execution.TakeOrderedAndProjectExec.$anonfun$executeCollect$2(limit.scala:207) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198) at scala.collection.TraversableLike.map(TraversableLike.scala:237) at scala.collection.TraversableLike.map$(TraversableLike.scala:230) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198) at org.apache.spark.sql.execution.TakeOrderedAndProjectExec.executeCollect(limit.scala:207) at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$executeCollect$1(AdaptiveSparkPlanExec.scala:338) at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:366) at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:338) at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3715) at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2728) at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3706) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3704) at org.apache.spark.sql.Dataset.head(Dataset.scala:2728) at org.apache.spark.sql.Dataset.take(Dataset.scala:2935) at org.apache.spark.sql.Dataset.getRows(Dataset.scala:287) at org.apache.spark.sql.Dataset.showString(Dataset.scala:326) at org.apache.spark.sql.Dataset.show(Dataset.scala:808) at org.apache.spark.sql.Dataset.show(Dataset.scala:785) at hyperspace2.sparkPlan$.delayedEndpoint$hyperspace2$sparkPlan$1(sparkPlan.scala:14) at hyperspace2.sparkPlan$delayedInit$body.apply(sparkPlan.scala:6) at scala.Function0.apply$mcV$sp(Function0.scala:39) at scala.Function0.apply$mcV$sp$(Function0.scala:39) at 
scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17) at scala.App.$anonfun$main$1$adapted(App.scala:80) at scala.collection.immutable.List.foreach(List.scala:392) at scala.App.main(App.scala:80) at scala.App.main$(App.scala:78) at hyperspace2.sparkPlan$.main(sparkPlan.scala:6) at hyperspace2.sparkPlan.main(sparkPlan.scala) {code} Note: # If I remove the ORDER BY clause, the query produces the correct output. # This happens when I read the dataset from a CSV file; it works fine if I build the DataFrame using Seq().toDF. # The query fails when I use spark.sql("query").show() but succeeds when I simply write the result to a CSV file. [https://stackoverflow.com/questions/78020267/spark-sql-query-fails-with-nullpointerexception] Could someone please look into why this happens only when using `show()`, since this is failing queries in production for me. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
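A possible user-side workaround, offered as a sketch rather than a confirmed fix: the trace points at the non-deterministic uuid() projection being evaluated inside TakeOrderedAndProjectExec, so materializing the ordered join result before adding the uuid() column may sidestep the failing operator. The titanic view and column names follow the report; using cache() as the materialization point is an assumption.
{code:scala}
import org.apache.spark.sql.functions.expr

// Build the ordered join result first, without the uuid() projection.
val base = spark.sql(
  """select s.name from titanic s
    |join titanic t on s.name = t.name
    |order by name""".stripMargin)

// Caching forces the ordered result to materialize before the
// non-deterministic uuid() column is added as a separate projection.
val withId = base.cache().withColumn("_iid", expr("uuid()"))
withId.show()
{code}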
[jira] [Assigned] (SPARK-47044) Add JDBC query to explain formatted command
[ https://issues.apache.org/jira/browse/SPARK-47044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47044: --- Assignee: Uros Stankovic > Add JDBC query to explain formatted command > --- > > Key: SPARK-47044 > URL: https://issues.apache.org/jira/browse/SPARK-47044 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Uros Stankovic >Assignee: Uros Stankovic >Priority: Major > Labels: pull-request-available > > Add the generated JDBC query to the EXPLAIN FORMATTED output when the physical Scan node > accesses a JDBC source to create the RDD. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47044) Add JDBC query to explain formatted command
[ https://issues.apache.org/jira/browse/SPARK-47044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47044. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45102 [https://github.com/apache/spark/pull/45102] > Add JDBC query to explain formatted command > --- > > Key: SPARK-47044 > URL: https://issues.apache.org/jira/browse/SPARK-47044 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Uros Stankovic >Assignee: Uros Stankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add the generated JDBC query to the EXPLAIN FORMATTED output when the physical Scan node > accesses a JDBC source to create the RDD. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
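To see where the change surfaces, here is a hedged sketch of inspecting such a plan from spark-shell; the connection options are placeholders, and the exact rendering of the generated query in the formatted plan is defined by the pull request, not shown here.
{code:scala}
// Placeholder JDBC connection options; any JDBC source behaves alike.
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb")
  .option("dbtable", "public.events")
  .option("user", "spark")
  .option("password", "secret")
  .load()

// With this change, the formatted plan of a JDBC-backed scan should
// also show the generated external query for the Scan node.
jdbcDF.where("event_type = 'click'")
  .select("event_id")
  .explain("formatted")
{code}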
[jira] [Updated] (SPARK-47103) Make the default storage level of intermediate datasets for MLlib configurable
[ https://issues.apache.org/jira/browse/SPARK-47103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47103: --- Labels: pull-request-available (was: ) > Make the default storage level of intermediate datasets for MLlib configurable > -- > > Key: SPARK-47103 > URL: https://issues.apache.org/jira/browse/SPARK-47103 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47103) Make the default storage level of intermediate datasets for MLlib configurable
Cheng Pan created SPARK-47103: - Summary: Make the default storage level of intermediate datasets for MLlib configurable Key: SPARK-47103 URL: https://issues.apache.org/jira/browse/SPARK-47103 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 4.0.0 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
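For context, a sketch of the status quo this ticket generalizes: a few estimators, such as ALS, already expose per-instance storage-level setters, while many other MLlib algorithms hard-code the storage level of their intermediate datasets. The global configuration key the ticket proposes is not named here, so only the existing per-algorithm knobs are shown.
{code:scala}
import org.apache.spark.ml.recommendation.ALS

// Existing per-algorithm knobs on ALS; most other MLlib estimators
// offer no equivalent, which is what this ticket aims to change.
val als = new ALS()
  .setUserCol("user")
  .setItemCol("item")
  .setRatingCol("rating")
  .setIntermediateStorageLevel("MEMORY_AND_DISK")
  .setFinalStorageLevel("MEMORY_AND_DISK")
{code}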
[jira] [Updated] (SPARK-46992) Inconsistent results with 'sort', 'cache', and AQE.
[ https://issues.apache.org/jira/browse/SPARK-46992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46992: --- Labels: correctness pull-request-available (was: correctness) > Inconsistent results with 'sort', 'cache', and AQE. > --- > > Key: SPARK-46992 > URL: https://issues.apache.org/jira/browse/SPARK-46992 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.5.0 >Reporter: Denis Tarima >Priority: Critical > Labels: correctness, pull-request-available > > > With AQE enabled, having {color:#4c9aff}sort{color} in the plan changes > {color:#4c9aff}sample{color} results after caching. > Moreover, when cached, {color:#4c9aff}collect{color} returns records as if > it's not cached, which is inconsistent with {color:#4c9aff}count{color} and > {color:#4c9aff}show{color}. > A script to reproduce: > {code:scala} > import spark.implicits._ > val df = (1 to 4).toDF("id").sort("id").sample(0.4, 123) > println("NON CACHED:") > println(" count: " + df.count()) > println(" collect: " + df.collect().mkString(" ")) > df.show() > println("CACHED:") > df.cache().count() > println(" count: " + df.count()) > println(" collect: " + df.collect().mkString(" ")) > df.show() > df.unpersist() > {code} > output: > {code:java} > NON CACHED: > count: 2 > collect: [1] [4] > +---+ > | id| > +---+ > | 1| > | 4| > +---+ > CACHED: > count: 3 > collect: [1] [4] > +---+ > | id| > +---+ > | 1| > | 2| > | 3| > +---+ > {code} > BTW, disabling AQE > [{color:#4c9aff}spark.conf.set("spark.databricks.optimizer.adaptive.enabled", > "false"){color}] helps on Databricks clusters, but locally it has no effect, > at least on Spark 3.3.2. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46992) Inconsistent results with 'sort', 'cache', and AQE.
[ https://issues.apache.org/jira/browse/SPARK-46992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818784#comment-17818784 ] Jie Han commented on SPARK-46992: - This is because the second collect() reuses qe.executedPlan, a lazy val that was already initialized by the first collect() call. Consider this code: {code:java} df.collect() // qe.executedPlan is first initialized here df.cache() df.collect() // reuses the already-initialized qe.executedPlan{code} > Inconsistent results with 'sort', 'cache', and AQE. > --- > > Key: SPARK-46992 > URL: https://issues.apache.org/jira/browse/SPARK-46992 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.5.0 >Reporter: Denis Tarima >Priority: Critical > Labels: correctness > > > With AQE enabled, having {color:#4c9aff}sort{color} in the plan changes > {color:#4c9aff}sample{color} results after caching. > Moreover, when cached, {color:#4c9aff}collect{color} returns records as if > it's not cached, which is inconsistent with {color:#4c9aff}count{color} and > {color:#4c9aff}show{color}. > A script to reproduce: > {code:scala} > import spark.implicits._ > val df = (1 to 4).toDF("id").sort("id").sample(0.4, 123) > println("NON CACHED:") > println(" count: " + df.count()) > println(" collect: " + df.collect().mkString(" ")) > df.show() > println("CACHED:") > df.cache().count() > println(" count: " + df.count()) > println(" collect: " + df.collect().mkString(" ")) > df.show() > df.unpersist() > {code} > output: > {code:java} > NON CACHED: > count: 2 > collect: [1] [4] > +---+ > | id| > +---+ > | 1| > | 4| > +---+ > CACHED: > count: 3 > collect: [1] [4] > +---+ > | id| > +---+ > | 1| > | 2| > | 3| > +---+ > {code} > BTW, disabling AQE > [{color:#4c9aff}spark.conf.set("spark.databricks.optimizer.adaptive.enabled", > "false"){color}] helps on Databricks clusters, but locally it has no effect, > at least on Spark 3.3.2. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
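The mechanism in that comment is easy to demonstrate outside Spark. The class below is a made-up stand-in for QueryExecution, not Spark code; pasted into a Scala REPL it shows that a lazy val freezes whatever it computed on first access and is never re-evaluated.
{code:scala}
// Made-up stand-in for QueryExecution; not Spark code.
class QueryExecutionLike(var cached: Boolean) {
  lazy val executedPlan: String =
    if (cached) "InMemoryTableScan" else "FullScan"
}

val qe = new QueryExecutionLike(cached = false)
println(qe.executedPlan) // "FullScan" -- the lazy val initializes here
qe.cached = true         // analogous to calling df.cache() afterwards
println(qe.executedPlan) // still "FullScan": lazy vals never recompute
{code}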
[jira] [Updated] (SPARK-47009) Create table with collation
[ https://issues.apache.org/jira/browse/SPARK-47009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Kandic updated SPARK-47009: -- Epic Link: SPARK-46830 > Create table with collation > --- > > Key: SPARK-47009 > URL: https://issues.apache.org/jira/browse/SPARK-47009 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > > Add support for creating table with columns containing non-default collated > data -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47102) Add COLLATION_ENABLED config flag
[ https://issues.apache.org/jira/browse/SPARK-47102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Kandic updated SPARK-47102: -- Epic Link: SPARK-46830 > Add COLLATION_ENABLED config flag > - > > Key: SPARK-47102 > URL: https://issues.apache.org/jira/browse/SPARK-47102 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47015) Disable partitioning on collated columns
[ https://issues.apache.org/jira/browse/SPARK-47015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Kandic updated SPARK-47015: -- Epic Link: SPARK-46830 > Disable partitioning on collated columns > > > Key: SPARK-47015 > URL: https://issues.apache.org/jira/browse/SPARK-47015 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47102) Add COLLATION_ENABLED config flag
Mihailo Milosevic created SPARK-47102: - Summary: Add COLLATION_ENABLED config flag Key: SPARK-47102 URL: https://issues.apache.org/jira/browse/SPARK-47102 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47079) Unable to create PySpark dataframe containing Variant columns
[ https://issues.apache.org/jira/browse/SPARK-47079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47079: --- Labels: pull-request-available (was: ) > Unable to create PySpark dataframe containing Variant columns > - > > Key: SPARK-47079 > URL: https://issues.apache.org/jira/browse/SPARK-47079 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Desmond Cheong >Priority: Major > Labels: pull-request-available > > Trying to create a dataframe containing a variant type results in: > AssertionError: Undefined error message parameter for error class: > CANNOT_PARSE_DATATYPE. Parameters: {'error': "Undefined error message > parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: > {'error': 'variant'} > "} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44149) Support DataFrame Merge API
[ https://issues.apache.org/jira/browse/SPARK-44149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818746#comment-17818746 ] Hussein Awala commented on SPARK-44149: --- Is this a duplicate of SPARK-46207, which was fixed by [#44119|https://github.com/apache/spark/pull/44119], or is it a different Merge API? > Support DataFrame Merge API > --- > > Key: SPARK-44149 > URL: https://issues.apache.org/jira/browse/SPARK-44149 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Huaxin Gao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024
[ https://issues.apache.org/jira/browse/SPARK-43259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818730#comment-17818730 ] Mihailo Milosevic commented on SPARK-43259: --- I want to work on this issue and have raised a PR for it: https://github.com/apache/spark/pull/45095 > Assign a name to the error class _LEGACY_ERROR_TEMP_2024 > > > Key: SPARK-43259 > URL: https://issues.apache.org/jira/browse/SPARK-43259 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
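For contributors picking up these starter tickets, such a test usually takes the shape below. This is a sketch only, assumed to live inside a Spark SQL test suite (e.g. one extending QueryTest with SharedSparkSession, which provides checkError, intercept, and sql); the error class name, the triggering query, and the parameter map are placeholders, since the actual rename of _LEGACY_ERROR_TEMP_2024 is decided in the PR.
{code:scala}
// Placeholder names throughout; NEW_ERROR_CLASS_NAME is not the real
// name chosen for _LEGACY_ERROR_TEMP_2024.
checkError(
  exception = intercept[org.apache.spark.SparkException] {
    sql("SELECT some_query_that_triggers_the_error").collect()
  },
  errorClass = "NEW_ERROR_CLASS_NAME",
  parameters = Map("someParam" -> "someValue"))
{code}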
[jira] [Updated] (SPARK-47101) HiveExternalCatalog.verifyDataSchema does not fully comply with hive column name rules
[ https://issues.apache.org/jira/browse/SPARK-47101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47101: --- Labels: pull-request-available (was: ) > HiveExternalCatalog.verifyDataSchema does not fully comply with hive column > name rules > -- > > Key: SPARK-47101 > URL: https://issues.apache.org/jira/browse/SPARK-47101 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.4.2, 3.5.0, 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47101) HiveExternalCatalog.verifyDataSchema does not fully comply with hive column name rules
[ https://issues.apache.org/jira/browse/SPARK-47101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47101: - Summary: HiveExternalCatalog.verifyDataSchema does not fully comply with hive column name rules (was: HiveExternalCatalog.verifyDataSchema does not fully comply hive column name rules) > HiveExternalCatalog.verifyDataSchema does not fully comply with hive column > name rules > -- > > Key: SPARK-47101 > URL: https://issues.apache.org/jira/browse/SPARK-47101 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.4.2, 3.5.0, 4.0.0 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47101) HiveExternalCatalog.verifyDataSchema does not fully comply hive column name rules
Kent Yao created SPARK-47101: Summary: HiveExternalCatalog.verifyDataSchema does not fully comply hive column name rules Key: SPARK-47101 URL: https://issues.apache.org/jira/browse/SPARK-47101 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.5.0, 3.4.2, 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024
[ https://issues.apache.org/jira/browse/SPARK-43259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-43259: -- Assignee: (was: Apache Spark) > Assign a name to the error class _LEGACY_ERROR_TEMP_2024 > > > Key: SPARK-43259 > URL: https://issues.apache.org/jira/browse/SPARK-43259 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46743) Count bug introduced for scalar subquery when using TEMPORARY VIEW, as compared to using table
[ https://issues.apache.org/jira/browse/SPARK-46743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46743: -- Assignee: (was: Apache Spark) > Count bug introduced for scalar subquery when using TEMPORARY VIEW, as > compared to using table > -- > > Key: SPARK-46743 > URL: https://issues.apache.org/jira/browse/SPARK-46743 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.5.0 >Reporter: Andy Lam >Priority: Major > Labels: pull-request-available > > Using the temp view reproduces the COUNT bug: it returns nulls instead of 0. > With a table: > {code:java} > scala> spark.sql("""CREATE TABLE outer_table USING parquet AS SELECT * FROM > VALUES > | (1, 1), > | (2, 1), > | (3, 3), > | (6, 6), > | (7, 7), > | (9, 9) AS inner_table(a, b)""") > val res6: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("CREATE TABLE null_table USING parquet AS SELECT CAST(null > AS int) AS a, CAST(null as int) AS b ;") > val res7: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("""SELECT ( SELECT COUNT(null_table.a) AS aggAlias FROM > null_table WHERE null_table.a = outer_table.a) FROM outer_table""").collect() > val res8: Array[org.apache.spark.sql.Row] = Array([0], [0], [0], [0], [0], > [0]) {code} > With a view: > > {code:java} > spark.sql("CREATE TEMPORARY VIEW outer_view(a, b) AS VALUES (1, 1), (2, > 1),(3, 3), (6, 6), (7, 7), (9, 9);") > spark.sql("CREATE TEMPORARY VIEW null_view(a, b) AS SELECT CAST(null AS int), > CAST(null as int);") > spark.sql("""SELECT ( SELECT COUNT(null_view.a) AS aggAlias FROM null_view > WHERE null_view.a = outer_view.a) FROM outer_view""").collect() > val res2: Array[org.apache.spark.sql.Row] = Array([null], [null], [null], > [null], [null], [null]){code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46743) Count bug introduced for scalar subquery when using TEMPORARY VIEW, as compared to using table
[ https://issues.apache.org/jira/browse/SPARK-46743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46743: -- Assignee: Apache Spark > Count bug introduced for scalar subquery when using TEMPORARY VIEW, as > compared to using table > -- > > Key: SPARK-46743 > URL: https://issues.apache.org/jira/browse/SPARK-46743 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.5.0 >Reporter: Andy Lam >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Using the temp view reproduces the COUNT bug: it returns nulls instead of 0. > With a table: > {code:java} > scala> spark.sql("""CREATE TABLE outer_table USING parquet AS SELECT * FROM > VALUES > | (1, 1), > | (2, 1), > | (3, 3), > | (6, 6), > | (7, 7), > | (9, 9) AS inner_table(a, b)""") > val res6: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("CREATE TABLE null_table USING parquet AS SELECT CAST(null > AS int) AS a, CAST(null as int) AS b ;") > val res7: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("""SELECT ( SELECT COUNT(null_table.a) AS aggAlias FROM > null_table WHERE null_table.a = outer_table.a) FROM outer_table""").collect() > val res8: Array[org.apache.spark.sql.Row] = Array([0], [0], [0], [0], [0], > [0]) {code} > With a view: > > {code:java} > spark.sql("CREATE TEMPORARY VIEW outer_view(a, b) AS VALUES (1, 1), (2, > 1),(3, 3), (6, 6), (7, 7), (9, 9);") > spark.sql("CREATE TEMPORARY VIEW null_view(a, b) AS SELECT CAST(null AS int), > CAST(null as int);") > spark.sql("""SELECT ( SELECT COUNT(null_view.a) AS aggAlias FROM null_view > WHERE null_view.a = outer_view.a) FROM outer_view""").collect() > val res2: Array[org.apache.spark.sql.Row] = Array([null], [null], [null], > [null], [null], [null]){code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
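Until the underlying decorrelation is fixed, one user-side mitigation, offered as a sketch under the assumption that only the NULL-vs-0 result for empty groups is at issue, is to coalesce the scalar subquery explicitly so both the table and the view path return 0:
{code:scala}
// Workaround sketch, not the fix tracked by this ticket: COALESCE maps
// the incorrectly produced NULL back to 0 for empty groups.
spark.sql(
  """SELECT COALESCE(
    |  (SELECT COUNT(null_view.a) FROM null_view
    |   WHERE null_view.a = outer_view.a), 0) AS aggAlias
    |FROM outer_view""".stripMargin).collect()
{code}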
[jira] [Assigned] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024
[ https://issues.apache.org/jira/browse/SPARK-43259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-43259: -- Assignee: Apache Spark > Assign a name to the error class _LEGACY_ERROR_TEMP_2024 > > > Key: SPARK-43259 > URL: https://issues.apache.org/jira/browse/SPARK-43259 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47100) Upgrade netty to 4.1.107.Final and netty-tcnative to 2.0.62.Final
[ https://issues.apache.org/jira/browse/SPARK-47100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-47100. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45178 [https://github.com/apache/spark/pull/45178] > Upgrade netty to 4.1.107.Final and netty-tcnative to 2.0.62.Final > - > > Key: SPARK-47100 > URL: https://issues.apache.org/jira/browse/SPARK-47100 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org