[jira] [Updated] (SPARK-47114) In the Spark driver pod, failed to access the krb5 file
[ https://issues.apache.org/jira/browse/SPARK-47114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-47114:
--
    Description: 
Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos). The pod error log shows:
{code:java}
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed{code}
This error generally occurs when the krb5 file cannot be found. [~yao] [~Qin Yao]
{code:java}
./bin/spark-submit \
 --master k8s://https://172.18.5.44:6443 \
 --deploy-mode cluster \
 --name spark-pi \
 --class org.apache.spark.examples.SparkPi \
 --conf spark.executor.instances=1 \
 --conf spark.kubernetes.submission.waitAppCompletion=true \
 --conf spark.kubernetes.driver.pod.name=spark-xxx \
 --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
 --conf spark.kubernetes.driver.label.profile=production \
 --conf spark.kubernetes.executor.label.profile=production \
 --conf spark.kubernetes.namespace=superior \
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
 --conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0 \
 --conf spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
 --conf spark.kubernetes.container.image.pullPolicy=Always \
 --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
 --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
 --conf spark.kerberos.principal=superior/ad...@datacyber.com \
 --conf spark.kerberos.keytab=/root/superior.keytab \
 file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar 5{code}
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
  at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
  at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
  at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
  at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
  at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
  at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
  at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
  at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
  at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
  at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(Unknown Source)
  at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
  at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
  ... 13 more

(base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
Name:            spark-xxx
Namespace:       superior
Priority:        0
Service Account: spark
Node:            cdh2/172.18.5.45
Start Time:      Wed, 21 Feb 2024 15:48:08 +0800
Labels:          profile=production
                 spark-app-name=spark-pi
                 spark-app-selector=spark-728e24e49f9040fa86b04c521463020b
                 spark-role=driver
                 spark-version=3.4.2
Annotations:
Status:          Failed
IP:              10.244.1.4
IPs:
  IP: 10.244.1.4
Containers:
  spark-kubernetes-driver:
    Container ID: containerd://cceaf13b70cc5f21a639e71cb8663989ec73e122380844624d4bfac3946bae15
    Image:        spark:3.4.1
    Image ID:     docker.io/library/spark@sha256:69fb485a0bcad88f9a2bf066e1b5d555f818126dc9df5a0b7e6a3b6d364bc694
    Ports:        7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:   0/TCP, 0/TCP, 0/TCP
    Args:         driver --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi spark-internal 5
    State:        Terminated
      Reason:     Error
      Exit Code:  1
      Started:    Wed, 21 Feb 2024 15:49:54 +0800
      Finished:   Wed, 21 Feb 2024 15:49:56
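[Editor's note] If the krb5 file genuinely is not visible inside the driver container, one commonly suggested alternative (a sketch, not necessarily the reporter's eventual fix) is to distribute krb5.conf through a pre-created Kubernetes ConfigMap via spark.kubernetes.kerberos.krb5.configMapName, instead of relying on spark.kubernetes.kerberos.krb5.path. The ConfigMap name "krb5-conf" below is an assumption for illustration:
{code:bash}
# Create a ConfigMap holding krb5.conf in the job namespace
# (the name "krb5-conf" is illustrative, not from the report).
kubectl create configmap krb5-conf \
  --from-file=krb5.conf=/etc/krb5.conf \
  -n superior

# Reference the pre-created ConfigMap at submission time so Spark itself
# mounts /etc/krb5.conf into the driver and executor pods. The other
# --conf flags from the report would stay as they are.
./bin/spark-submit \
  --master k8s://https://172.18.5.44:6443 \
  --deploy-mode cluster \
  --conf spark.kubernetes.namespace=superior \
  --conf spark.kubernetes.kerberos.krb5.configMapName=krb5-conf \
  --conf spark.kerberos.principal=superior/ad...@datacyber.com \
  --conf spark.kerberos.keytab=/root/superior.keytab \
  file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar 5
{code}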
[jira] [Updated] (SPARK-47114) In the Spark driver pod, failed to access the krb5 file
[ https://issues.apache.org/jira/browse/SPARK-47114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-47114:
--
    Description: 
Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos). The pod error log shows:
{code:java}
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed{code}
This error generally occurs when the krb5 file cannot be found. [~yao] [~Qin Yao]
{code:java}
./bin/spark-submit \
 --master k8s://https://172.18.5.44:6443 \
 --deploy-mode cluster \
 --name spark-pi \
 --class org.apache.spark.examples.SparkPi \
 --conf spark.executor.instances=1 \
 --conf spark.kubernetes.submission.waitAppCompletion=true \
 --conf spark.kubernetes.driver.pod.name=spark-xxx \
 --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
 --conf spark.kubernetes.driver.label.profile=production \
 --conf spark.kubernetes.executor.label.profile=production \
 --conf spark.kubernetes.namespace=superior \
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
 --conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0 \
 --conf spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
 --conf spark.kubernetes.container.image.pullPolicy=Always \
 --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
 --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
 --conf spark.kerberos.principal=superior/ad...@datacyber.com \
 --conf spark.kerberos.keytab=/root/superior.keytab \
 --conf spark.kubernetes.driver.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/driver.yaml \
 --conf spark.kubernetes.executor.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/executor.yaml \
 file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar 5{code}
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
  at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
  at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
  at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
  at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
  at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
  at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
  at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
  at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
  at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
  at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(Unknown Source)
  at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
  at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
  ... 13 more

(base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
Name:            spark-xxx
Namespace:       superior
Priority:        0
Service Account: spark
Node:            cdh2/172.18.5.45
Start Time:      Wed, 21 Feb 2024 15:48:08 +0800
Labels:          profile=production
                 spark-app-name=spark-pi
                 spark-app-selector=spark-728e24e49f9040fa86b04c521463020b
                 spark-role=driver
                 spark-version=3.4.2
Annotations:
Status:          Failed
IP:              10.244.1.4
IPs:
  IP: 10.244.1.4
Containers:
  spark-kubernetes-driver:
    Container ID: containerd://cceaf13b70cc5f21a639e71cb8663989ec73e122380844624d4bfac3946bae15
    Image:        spark:3.4.1
    Image ID:     docker.io/library/spark@sha256:69fb485a0bcad88f9a2bf066e1b5d555f818126dc9df5a0b7e6a3b6d364bc694
    Ports:        7078/TCP, 7079/TCP, 4040/TCP
    Host Ports:   0/TCP, 0/TCP, 0/TCP
    Args:         driver --properties-file /opt/spark/conf/spark.properties --class
[jira] [Assigned] (SPARK-47112) Write logs into a file in SparkR Windows build
[ https://issues.apache.org/jira/browse/SPARK-47112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-47112:

    Assignee: Hyukjin Kwon

> Write logs into a file in SparkR Windows build
> --
>
> Key: SPARK-47112
> URL: https://issues.apache.org/jira/browse/SPARK-47112
> Project: Spark
> Issue Type: Test
> Components: Project Infra, SparkR
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> https://github.com/apache/spark/actions/runs/7977185456/job/21779508822
> This writes too many logs, making it difficult to see the real test cases.
[jira] [Resolved] (SPARK-47112) Write logs into a file in SparkR Windows build
[ https://issues.apache.org/jira/browse/SPARK-47112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-47112.
--
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45192
[https://github.com/apache/spark/pull/45192]

> Write logs into a file in SparkR Windows build
> --
>
> Key: SPARK-47112
> URL: https://issues.apache.org/jira/browse/SPARK-47112
> Project: Spark
> Issue Type: Test
> Components: Project Infra, SparkR
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> https://github.com/apache/spark/actions/runs/7977185456/job/21779508822
> This writes too many logs, making it difficult to see the real test cases.
[jira] [Updated] (SPARK-47116) Install proper Python version in SparkR Windows build to avoid warnings
[ https://issues.apache.org/jira/browse/SPARK-47116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47116:
---
    Labels: pull-request-available (was: )

> Install proper Python version in SparkR Windows build to avoid warnings
> ---
>
> Key: SPARK-47116
> URL: https://issues.apache.org/jira/browse/SPARK-47116
> Project: Spark
> Issue Type: Test
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> https://github.com/HyukjinKwon/spark/actions/runs/7985005685/job/21802732830
> {code}
> Traceback (most recent call last):
>   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\runpy.py", line 183, in _run_module_as_main
>     mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
>   File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\runpy.py", line 109, in _get_module_details
>     __import__(pkg_name)
>   File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\__init__.py", line [53](https://github.com/HyukjinKwon/spark/actions/runs/7985005685/job/21802732830#step:10:54), in <module>
>   File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\rdd.py", line [54](https://github.com/HyukjinKwon/spark/actions/runs/7985005685/job/21802732830#step:10:55), in <module>
>   File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\java_gateway.py", line 33, in <module>
>   File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 69, in <module>
>   File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\cloudpickle\__init__.py", line 1, in <module>
>   File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\cloudpickle\cloudpickle.py", line 80, in <module>
> ImportError: cannot import name 'CellType' from 'types' (C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\types.py)
> {code}
[jira] [Created] (SPARK-47116) Install proper Python version in SparkR Windows build to avoid warnings
Hyukjin Kwon created SPARK-47116:

             Summary: Install proper Python version in SparkR Windows build to avoid warnings
                 Key: SPARK-47116
                 URL: https://issues.apache.org/jira/browse/SPARK-47116
             Project: Spark
          Issue Type: Test
          Components: Project Infra
    Affects Versions: 4.0.0
            Reporter: Hyukjin Kwon

https://github.com/HyukjinKwon/spark/actions/runs/7985005685/job/21802732830
{code}
Traceback (most recent call last):
  File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\runpy.py", line 183, in _run_module_as_main
    mod_name, mod_spec, code = _get_module_details(mod_name, _Error)
  File "C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\runpy.py", line 109, in _get_module_details
    __import__(pkg_name)
  File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\__init__.py", line [53](https://github.com/HyukjinKwon/spark/actions/runs/7985005685/job/21802732830#step:10:54), in <module>
  File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\rdd.py", line [54](https://github.com/HyukjinKwon/spark/actions/runs/7985005685/job/21802732830#step:10:55), in <module>
  File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\java_gateway.py", line 33, in <module>
  File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\serializers.py", line 69, in <module>
  File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\cloudpickle\__init__.py", line 1, in <module>
  File "D:\a\spark\spark\python\lib\pyspark.zip\pyspark\cloudpickle\cloudpickle.py", line 80, in <module>
ImportError: cannot import name 'CellType' from 'types' (C:\hostedtoolcache\windows\Python\3.7.9\x64\lib\types.py)
{code}
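[Editor's note] For context on the traceback above: types.CellType, which PySpark's bundled cloudpickle imports, only exists from Python 3.8 onward, so the Python 3.7.9 interpreter picked up by the Windows job cannot import it. A quick sanity check of the interpreter on PATH could look like this (a sketch, not part of the actual build change):
{code:bash}
# Fails on Python <= 3.7, where types.CellType does not exist yet.
python -c "import sys, types; assert hasattr(types, 'CellType'), sys.version"
python --version   # expect 3.8+ for pyspark's cloudpickle to import cleanly
{code}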
[jira] [Updated] (SPARK-47113) Revert S3A endpoint fixup logic of SPARK-35878
[ https://issues.apache.org/jira/browse/SPARK-47113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-47113:
--
        Parent: SPARK-47046
    Issue Type: Sub-task (was: Task)

> Revert S3A endpoint fixup logic of SPARK-35878
> --
>
> Key: SPARK-47113
> URL: https://issues.apache.org/jira/browse/SPARK-47113
> Project: Spark
> Issue Type: Sub-task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-47113) Revert S3A endpoint fixup logic of SPARK-35878
[ https://issues.apache.org/jira/browse/SPARK-47113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-47113:
-
    Assignee: Steve Loughran

> Revert S3A endpoint fixup logic of SPARK-35878
> --
>
> Key: SPARK-47113
> URL: https://issues.apache.org/jira/browse/SPARK-47113
> Project: Spark
> Issue Type: Task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Steve Loughran
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Resolved] (SPARK-47113) Revert S3A endpoint fixup logic of SPARK-35878
[ https://issues.apache.org/jira/browse/SPARK-47113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47113.
---
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45193
[https://github.com/apache/spark/pull/45193]

> Revert S3A endpoint fixup logic of SPARK-35878
> --
>
> Key: SPARK-47113
> URL: https://issues.apache.org/jira/browse/SPARK-47113
> Project: Spark
> Issue Type: Task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-47115) Use larger memory for Maven builds
[ https://issues.apache.org/jira/browse/SPARK-47115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reassigned SPARK-47115:

    Assignee: Hyukjin Kwon

> Use larger memory for Maven builds
> --
>
> Key: SPARK-47115
> URL: https://issues.apache.org/jira/browse/SPARK-47115
> Project: Spark
> Issue Type: Test
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> {code}
> *** RUN ABORTED ***
> An exception or error caused a run to abort: unable to create native thread: possibly out of memory or process/resource limits reached
> java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
>   at java.base/java.lang.Thread.start0(Native Method)
>   at java.base/java.lang.Thread.start(Thread.java:1553)
>   at java.base/java.lang.System$2.start(System.java:2577)
>   at java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:152)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:953)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
>   at org.apache.spark.rpc.netty.SharedMessageLoop.$anonfun$threadpool$1(MessageLoop.scala:128)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190)
>   at org.apache.spark.rpc.netty.SharedMessageLoop.<init>(MessageLoop.scala:127)
>   at org.apache.spark.rpc.netty.Dispatcher.sharedLoop$lzycompute(Dispatcher.scala:46)
>   ...
> Warning: The requested profile "volcano" could not be activated because it does not exist.
> Warning: The requested profile "hive" could not be activated because it does not exist.
> Error: Failed to execute goal org.scalatest:scalatest-maven-plugin:2.2.0:test (test) on project spark-core_2.13: There are test failures -> [Help 1]
> Error:
> Error: To see the full stack trace of the errors, re-run Maven with the -e switch.
> Error: Re-run Maven using the -X switch to enable full debug logging.
> Error:
> Error: For more information about the errors and possible solutions, please read the following articles:
> Error: [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> Error:
> Error: After correcting the problems, you can resume the build with the command
> Error:   mvn -rf :spark-core_2.13
> Error: Process completed with exit code 1.
> {code}
> https://github.com/HyukjinKwon/spark/actions/runs/7984135094/job/21800463337
[jira] [Resolved] (SPARK-47115) Use larger memory for Maven builds
[ https://issues.apache.org/jira/browse/SPARK-47115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon resolved SPARK-47115.
--
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45195
[https://github.com/apache/spark/pull/45195]

> Use larger memory for Maven builds
> --
>
> Key: SPARK-47115
> URL: https://issues.apache.org/jira/browse/SPARK-47115
> Project: Spark
> Issue Type: Test
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Assignee: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> {code}
> *** RUN ABORTED ***
> An exception or error caused a run to abort: unable to create native thread: possibly out of memory or process/resource limits reached
> java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
>   at java.base/java.lang.Thread.start0(Native Method)
>   at java.base/java.lang.Thread.start(Thread.java:1553)
>   at java.base/java.lang.System$2.start(System.java:2577)
>   at java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:152)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:953)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
>   at org.apache.spark.rpc.netty.SharedMessageLoop.$anonfun$threadpool$1(MessageLoop.scala:128)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190)
>   at org.apache.spark.rpc.netty.SharedMessageLoop.<init>(MessageLoop.scala:127)
>   at org.apache.spark.rpc.netty.Dispatcher.sharedLoop$lzycompute(Dispatcher.scala:46)
>   ...
> Warning: The requested profile "volcano" could not be activated because it does not exist.
> Warning: The requested profile "hive" could not be activated because it does not exist.
> Error: Failed to execute goal org.scalatest:scalatest-maven-plugin:2.2.0:test (test) on project spark-core_2.13: There are test failures -> [Help 1]
> Error:
> Error: To see the full stack trace of the errors, re-run Maven with the -e switch.
> Error: Re-run Maven using the -X switch to enable full debug logging.
> Error:
> Error: For more information about the errors and possible solutions, please read the following articles:
> Error: [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> Error:
> Error: After correcting the problems, you can resume the build with the command
> Error:   mvn -rf :spark-core_2.13
> Error: Process completed with exit code 1.
> {code}
> https://github.com/HyukjinKwon/spark/actions/runs/7984135094/job/21800463337
[jira] [Updated] (SPARK-47115) Use larger memory for Maven builds
[ https://issues.apache.org/jira/browse/SPARK-47115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47115:
---
    Labels: pull-request-available (was: )

> Use larger memory for Maven builds
> --
>
> Key: SPARK-47115
> URL: https://issues.apache.org/jira/browse/SPARK-47115
> Project: Spark
> Issue Type: Test
> Components: Project Infra
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> {code}
> *** RUN ABORTED ***
> An exception or error caused a run to abort: unable to create native thread: possibly out of memory or process/resource limits reached
> java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
>   at java.base/java.lang.Thread.start0(Native Method)
>   at java.base/java.lang.Thread.start(Thread.java:1553)
>   at java.base/java.lang.System$2.start(System.java:2577)
>   at java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:152)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:953)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
>   at org.apache.spark.rpc.netty.SharedMessageLoop.$anonfun$threadpool$1(MessageLoop.scala:128)
>   at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190)
>   at org.apache.spark.rpc.netty.SharedMessageLoop.<init>(MessageLoop.scala:127)
>   at org.apache.spark.rpc.netty.Dispatcher.sharedLoop$lzycompute(Dispatcher.scala:46)
>   ...
> Warning: The requested profile "volcano" could not be activated because it does not exist.
> Warning: The requested profile "hive" could not be activated because it does not exist.
> Error: Failed to execute goal org.scalatest:scalatest-maven-plugin:2.2.0:test (test) on project spark-core_2.13: There are test failures -> [Help 1]
> Error:
> Error: To see the full stack trace of the errors, re-run Maven with the -e switch.
> Error: Re-run Maven using the -X switch to enable full debug logging.
> Error:
> Error: For more information about the errors and possible solutions, please read the following articles:
> Error: [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
> Error:
> Error: After correcting the problems, you can resume the build with the command
> Error:   mvn -rf :spark-core_2.13
> Error: Process completed with exit code 1.
> {code}
> https://github.com/HyukjinKwon/spark/actions/runs/7984135094/job/21800463337
[jira] [Created] (SPARK-47115) Use larger memory for Maven builds
Hyukjin Kwon created SPARK-47115:

             Summary: Use larger memory for Maven builds
                 Key: SPARK-47115
                 URL: https://issues.apache.org/jira/browse/SPARK-47115
             Project: Spark
          Issue Type: Test
          Components: Project Infra
    Affects Versions: 4.0.0
            Reporter: Hyukjin Kwon

{code}
*** RUN ABORTED ***
An exception or error caused a run to abort: unable to create native thread: possibly out of memory or process/resource limits reached
java.lang.OutOfMemoryError: unable to create native thread: possibly out of memory or process/resource limits reached
  at java.base/java.lang.Thread.start0(Native Method)
  at java.base/java.lang.Thread.start(Thread.java:1553)
  at java.base/java.lang.System$2.start(System.java:2577)
  at java.base/jdk.internal.vm.SharedThreadContainer.start(SharedThreadContainer.java:152)
  at java.base/java.util.concurrent.ThreadPoolExecutor.addWorker(ThreadPoolExecutor.java:953)
  at java.base/java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1364)
  at org.apache.spark.rpc.netty.SharedMessageLoop.$anonfun$threadpool$1(MessageLoop.scala:128)
  at scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:190)
  at org.apache.spark.rpc.netty.SharedMessageLoop.<init>(MessageLoop.scala:127)
  at org.apache.spark.rpc.netty.Dispatcher.sharedLoop$lzycompute(Dispatcher.scala:46)
  ...
Warning: The requested profile "volcano" could not be activated because it does not exist.
Warning: The requested profile "hive" could not be activated because it does not exist.
Error: Failed to execute goal org.scalatest:scalatest-maven-plugin:2.2.0:test (test) on project spark-core_2.13: There are test failures -> [Help 1]
Error:
Error: To see the full stack trace of the errors, re-run Maven with the -e switch.
Error: Re-run Maven using the -X switch to enable full debug logging.
Error:
Error: For more information about the errors and possible solutions, please read the following articles:
Error: [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoFailureException
Error:
Error: After correcting the problems, you can resume the build with the command
Error:   mvn -rf :spark-core_2.13
Error: Process completed with exit code 1.
{code}
https://github.com/HyukjinKwon/spark/actions/runs/7984135094/job/21800463337
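[Editor's note] As a rough illustration of what "larger memory" can mean for a local reproduction (the values below are assumptions; the actual CI fix in the linked PR may set memory elsewhere, and "unable to create native thread" can also indicate a process/ulimit cap rather than heap), the Maven JVM can be given more headroom through MAVEN_OPTS before resuming the failing module:
{code:bash}
# Illustrative heap/code-cache values only.
export MAVEN_OPTS="-Xmx4g -XX:ReservedCodeCacheSize=128m"
# Resume from the failing module, as Maven itself suggests in the log above.
./build/mvn -rf :spark-core_2.13 test
{code}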
[jira] [Updated] (SPARK-47114) In the Spark driver pod, failed to access the krb5 file
[ https://issues.apache.org/jira/browse/SPARK-47114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

melin updated SPARK-47114:
--
    Description: 
Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos). The pod error log shows:
{code:java}
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed{code}
This error generally occurs when the krb5 file cannot be found. [~yao] [~Qin Yao]
{code:java}
./bin/spark-submit \
 --master k8s://https://172.18.5.44:6443 \
 --deploy-mode cluster \
 --name spark-pi \
 --class org.apache.spark.examples.SparkPi \
 --conf spark.executor.instances=1 \
 --conf spark.kubernetes.submission.waitAppCompletion=true \
 --conf spark.kubernetes.driver.pod.name=spark-xxx \
 --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
 --conf spark.kubernetes.driver.label.profile=production \
 --conf spark.kubernetes.executor.label.profile=production \
 --conf spark.kubernetes.namespace=superior \
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
 --conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0 \
 --conf spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
 --conf spark.kubernetes.container.image.pullPolicy=Always \
 --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
 --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
 --conf spark.kerberos.principal=superior/ad...@datacyber.com \
 --conf spark.kerberos.keytab=/root/superior.keytab \
 --conf spark.kubernetes.driver.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/driver.yaml \
 --conf spark.kubernetes.executor.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/executor.yaml \
 file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar 5{code}
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ '[' -z /opt/java/openjdk ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
++ command -v readarray
+ '[' readarray ']'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.244.2.56 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi spark-internal 5
Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
  at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
  at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
  at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
  at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
  at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
  at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
  at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
  at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
  at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
  at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(Unknown Source)
  at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
  at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
  ... 13 more

(base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
Name:
[jira] [Created] (SPARK-47114) In the Spark driver pod, failed to access the krb5 file
melin created SPARK-47114:
-
             Summary: In the Spark driver pod, failed to access the krb5 file
                 Key: SPARK-47114
                 URL: https://issues.apache.org/jira/browse/SPARK-47114
             Project: Spark
          Issue Type: New Feature
          Components: Kubernetes
    Affects Versions: 3.4.1
            Reporter: melin

Spark runs in Kubernetes and accesses an external HDFS cluster (Kerberos).
{code:java}
./bin/spark-submit \
 --master k8s://https://172.18.5.44:6443 \
 --deploy-mode cluster \
 --name spark-pi \
 --class org.apache.spark.examples.SparkPi \
 --conf spark.executor.instances=1 \
 --conf spark.kubernetes.submission.waitAppCompletion=true \
 --conf spark.kubernetes.driver.pod.name=spark-xxx \
 --conf spark.kubernetes.executor.podNamePrefix=spark-executor-xxx \
 --conf spark.kubernetes.driver.label.profile=production \
 --conf spark.kubernetes.executor.label.profile=production \
 --conf spark.kubernetes.namespace=superior \
 --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
 --conf spark.kubernetes.container.image=registry.cn-hangzhou.aliyuncs.com/melin1204/spark-jobserver:3.4.0 \
 --conf spark.kubernetes.file.upload.path=hdfs://cdh1:8020/user/superior/kubernetes/ \
 --conf spark.kubernetes.container.image.pullPolicy=Always \
 --conf spark.kubernetes.container.image.pullSecrets=docker-reg-demos \
 --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
 --conf spark.kerberos.principal=superior/ad...@datacyber.com \
 --conf spark.kerberos.keytab=/root/superior.keytab \
 --conf spark.kubernetes.driver.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/driver.yaml \
 --conf spark.kubernetes.executor.podTemplateFile=file:///root/spark-3.4.2-bin-hadoop3/executor.yaml \
 file:///root/spark-3.4.2-bin-hadoop3/examples/jars/spark-examples_2.12-3.4.2.jar 5{code}
{code:java}
(base) [root@cdh1 ~]# kubectl logs spark-xxx -n superior
++ id -u
+ myuid=0
++ id -g
+ mygid=0
+ set +e
++ getent passwd 0
+ uidentry=root:x:0:0:root:/root:/bin/bash
+ set -e
+ '[' -z root:x:0:0:root:/root:/bin/bash ']'
+ '[' -z /opt/java/openjdk ']'
+ SPARK_CLASSPATH=':/opt/spark/jars/*'
+ env
+ grep SPARK_JAVA_OPT_
+ sort -t_ -k4 -n
+ sed 's/[^=]*=\(.*\)/\1/g'
++ command -v readarray
+ '[' readarray ']'
+ readarray -t SPARK_EXECUTOR_JAVA_OPTS
+ '[' -n '' ']'
+ '[' -z ']'
+ '[' -z ']'
+ '[' -n '' ']'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
+ '[' -z x ']'
+ SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
+ case "$1" in
+ shift 1
+ CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
+ exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress=10.244.2.56 --deploy-mode client --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi spark-internal 5
Exception in thread "main" java.lang.IllegalArgumentException: Can't get Kerberos realm
  at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:71)
  at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:315)
  at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:300)
  at org.apache.hadoop.security.UserGroupInformation.isAuthenticationMethodEnabled(UserGroupInformation.java:395)
  at org.apache.hadoop.security.UserGroupInformation.isSecurityEnabled(UserGroupInformation.java:389)
  at org.apache.hadoop.security.UserGroupInformation.loginUserFromKeytab(UserGroupInformation.java:1119)
  at org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:385)
  at org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:955)
  at org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:192)
  at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:215)
  at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91)
  at org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:)
  at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1120)
  at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Caused by: java.lang.IllegalArgumentException: KrbException: krb5.conf loading failed
  at java.security.jgss/javax.security.auth.kerberos.KerberosPrincipal.<init>(Unknown Source)
  at org.apache.hadoop.security.authentication.util.KerberosUtil.getDefaultRealm(KerberosUtil.java:120)
  at org.apache.hadoop.security.HadoopKerberosName.setConfiguration(HadoopKerberosName.java:69)
  ... 13 more

(base) [root@cdh1 ~]# kubectl describe pod spark-xxx -n superior
Name: spark-xxx
[jira] [Updated] (SPARK-47113) Revert S3A endpoint fixup logic of SPARK-35878
[ https://issues.apache.org/jira/browse/SPARK-47113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47113:
---
    Labels: pull-request-available (was: )

> Revert S3A endpoint fixup logic of SPARK-35878
> --
>
> Key: SPARK-47113
> URL: https://issues.apache.org/jira/browse/SPARK-47113
> Project: Spark
> Issue Type: Task
> Components: Spark Core
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-47113) Revert S3A endpoint fixup logic of SPARK-35878
Dongjoon Hyun created SPARK-47113:
-
             Summary: Revert S3A endpoint fixup logic of SPARK-35878
                 Key: SPARK-47113
                 URL: https://issues.apache.org/jira/browse/SPARK-47113
             Project: Spark
          Issue Type: Task
          Components: Spark Core
    Affects Versions: 4.0.0
            Reporter: Dongjoon Hyun
[jira] [Resolved] (SPARK-46928) Support ListState in Arbitrary State API v2
[ https://issues.apache.org/jira/browse/SPARK-46928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim resolved SPARK-46928.
--
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 44961
[https://github.com/apache/spark/pull/44961]

> Support ListState in Arbitrary State API v2
> ---
>
> Key: SPARK-46928
> URL: https://issues.apache.org/jira/browse/SPARK-46928
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Bhuwan Sahni
> Assignee: Bhuwan Sahni
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> As part of Arbitrary State API v2
> (https://docs.google.com/document/d/1QtC5qd4WQEia9kl1Qv74WE0TiXYy3x6zeTykygwPWig),
> we need to support ListState. This task covers adding support for ListState in Scala.
[jira] [Assigned] (SPARK-46928) Support ListState in Arbitrary State API v2
[ https://issues.apache.org/jira/browse/SPARK-46928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim reassigned SPARK-46928:

    Assignee: Bhuwan Sahni

> Support ListState in Arbitrary State API v2
> ---
>
> Key: SPARK-46928
> URL: https://issues.apache.org/jira/browse/SPARK-46928
> Project: Spark
> Issue Type: New Feature
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Bhuwan Sahni
> Assignee: Bhuwan Sahni
> Priority: Major
> Labels: pull-request-available
>
> As part of Arbitrary State API v2
> (https://docs.google.com/document/d/1QtC5qd4WQEia9kl1Qv74WE0TiXYy3x6zeTykygwPWig),
> we need to support ListState. This task covers adding support for ListState in Scala.
[jira] [Commented] (SPARK-46934) Unable to create Hive View from certain Spark Dataframe StructType
[ https://issues.apache.org/jira/browse/SPARK-46934?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17819062#comment-17819062 ]

Yu-Ting LIN commented on SPARK-46934:
-

[~dongjoon] As I have mentioned before, we are currently mainly using Spark 3.3.2, and we also plan to migrate to Spark 3.5. I have not figured out which Spark versions support my use case. Based on my understanding, neither Spark 3.3 nor 3.5 supports this feature.

> Unable to create Hive View from certain Spark Dataframe StructType
> --
>
> Key: SPARK-46934
> URL: https://issues.apache.org/jira/browse/SPARK-46934
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.3.0, 3.3.2, 3.3.4
> Environment: Tested in Spark 3.3.0, 3.3.2.
> Reporter: Yu-Ting LIN
> Assignee: Kent Yao
> Priority: Blocker
> Labels: pull-request-available
> Fix For: 4.0.0
>
> We are trying to create a Hive View using the following SQL command: "CREATE OR REPLACE VIEW yuting AS SELECT INFO_ANN FROM table_2611810".
> Our table_2611810 has certain columns that contain special characters such as "/".
> Here is the schema of this table.
> {code:java}
> contigName string
> start bigint
> end bigint
> names array
> referenceAllele string
> alternateAlleles array
> qual double
> filters array
> splitFromMultiAllelic boolean
> INFO_NCAMP int
> INFO_ODDRATIO double
> INFO_NM double
> INFO_DBSNP_CAF array
> INFO_SPANPAIR int
> INFO_TLAMP int
> INFO_PSTD double
> INFO_QSTD double
> INFO_SBF double
> INFO_AF array
> INFO_QUAL double
> INFO_SHIFT3 int
> INFO_VARBIAS string
> INFO_HICOV int
> INFO_PMEAN double
> INFO_MSI double
> INFO_VD int
> INFO_DP int
> INFO_HICNT int
> INFO_ADJAF double
> INFO_SVLEN int
> INFO_RSEQ string
> INFO_MSigDb array
> INFO_NMD array
> INFO_ANN array,Annotation_Impact:string,Gene_Name:string,Gene_ID:string,Feature_Type:string,Feature_ID:string,Transcript_BioType:string,Rank:struct,HGVS_c:string,HGVS_p:string,cDNA_pos/cDNA_length:struct,CDS_pos/CDS_length:struct,AA_pos/AA_length:struct,Distance:int,ERRORS/WARNINGS/INFO:string>>
> INFO_BIAS string
> INFO_MQ double
> INFO_HIAF double
> INFO_END int
> INFO_SPLITREAD int
> INFO_GDAMP int
> INFO_LSEQ string
> INFO_LOF array
> INFO_SAMPLE string
> INFO_AMPFLAG int
> INFO_SN double
> INFO_SVTYPE string
> INFO_TYPE string
> INFO_MSILEN double
> INFO_DUPRATE double
> INFO_DBSNP_COMMON int
> INFO_REFBIAS string
> genotypes array,ALD:array,AF:array,phased:boolean,calls:array,VD:int,depth:int,RD:array>>
> {code}
> You can see that column INFO_ANN is an array of structs, and it contains fields with "/" in their names, such as "cDNA_pos/cDNA_length".
> We believe this is the root cause of the following SparkException:
> {code:java}
> scala> val schema = spark.sql("CREATE OR REPLACE VIEW yuting AS SELECT INFO_ANN FROM table_2611810")
> 24/01/31 07:50:02.658 [main] WARN o.a.spark.sql.catalyst.util.package - Truncated the string representation of a plan since it was too large. This behavior can be adjusted by setting 'spark.sql.debug.maxToStringFields'.
> org.apache.spark.SparkException: Cannot recognize hive type string: array,Annotation_Impact:string,Gene_Name:string,Gene_ID:string,Feature_Type:string,Feature_ID:string,Transcript_BioType:string,Rank:struct,HGVS_c:string,HGVS_p:string,cDNA_pos/cDNA_length:struct,CDS_pos/CDS_length:struct,AA_pos/AA_length:struct,Distance:int,ERRORS/WARNINGS/INFO:string>>, column: INFO_ANN
>   at org.apache.spark.sql.errors.QueryExecutionErrors$.cannotRecognizeHiveTypeError(QueryExecutionErrors.scala:1455)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$.getSparkSQLDataType(HiveClientImpl.scala:1022)
>   at org.apache.spark.sql.hive.client.HiveClientImpl$.$anonfun$verifyColumnDataType$1(HiveClientImpl.scala:1037)
>   at scala.collection.Iterator.foreach(Iterator.scala:943)
>   at scala.collection.Iterator.foreach$(Iterator.scala:943)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1431)
>   at scala.collection.IterableLike.foreach(IterableLike.scala:74)
>   at
[jira] [Assigned] (SPARK-47052) Separate state tracking variables from MicroBatchExecution/StreamExecution
[ https://issues.apache.org/jira/browse/SPARK-47052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim reassigned SPARK-47052:

    Assignee: Boyang Jerry Peng

> Separate state tracking variables from MicroBatchExecution/StreamExecution
> --
>
> Key: SPARK-47052
> URL: https://issues.apache.org/jira/browse/SPARK-47052
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Boyang Jerry Peng
> Assignee: Boyang Jerry Peng
> Priority: Major
> Labels: pull-request-available
>
> To improve code clarity and maintainability, I propose that we move all the variables that track mutable state and metrics for a streaming query into a separate class. With this refactor, it would be easy to track and find all the mutable state a microbatch can have.
[jira] [Resolved] (SPARK-47052) Separate state tracking variables from MicroBatchExecution/StreamExecution
[ https://issues.apache.org/jira/browse/SPARK-47052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jungtaek Lim resolved SPARK-47052.
--
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45109
[https://github.com/apache/spark/pull/45109]

> Separate state tracking variables from MicroBatchExecution/StreamExecution
> --
>
> Key: SPARK-47052
> URL: https://issues.apache.org/jira/browse/SPARK-47052
> Project: Spark
> Issue Type: Improvement
> Components: Structured Streaming
> Affects Versions: 4.0.0
> Reporter: Boyang Jerry Peng
> Assignee: Boyang Jerry Peng
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> To improve code clarity and maintainability, I propose that we move all the variables that track mutable state and metrics for a streaming query into a separate class. With this refactor, it would be easy to track and find all the mutable state a microbatch can have.
[jira] [Resolved] (SPARK-47111) Upgrade `PostgreSQL` JDBC driver to 42.7.2 and docker image to 16.2
[ https://issues.apache.org/jira/browse/SPARK-47111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47111.
---
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45191
[https://github.com/apache/spark/pull/45191]

> Upgrade `PostgreSQL` JDBC driver to 42.7.2 and docker image to 16.2
> ---
>
> Key: SPARK-47111
> URL: https://issues.apache.org/jira/browse/SPARK-47111
> Project: Spark
> Issue Type: Sub-task
> Components: SQL, Tests
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Updated] (SPARK-47112) Write logs into a file in SparkR Windows build
[ https://issues.apache.org/jira/browse/SPARK-47112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47112:
---
    Labels: pull-request-available (was: )

> Write logs into a file in SparkR Windows build
> --
>
> Key: SPARK-47112
> URL: https://issues.apache.org/jira/browse/SPARK-47112
> Project: Spark
> Issue Type: Test
> Components: Project Infra, SparkR
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
> Labels: pull-request-available
>
> https://github.com/apache/spark/actions/runs/7977185456/job/21779508822
> This writes too many logs, making it difficult to see the real test cases.
[jira] [Created] (SPARK-47112) Write logs into a file in SparkR Windows build
Hyukjin Kwon created SPARK-47112:

             Summary: Write logs into a file in SparkR Windows build
                 Key: SPARK-47112
                 URL: https://issues.apache.org/jira/browse/SPARK-47112
             Project: Spark
          Issue Type: Test
          Components: Project Infra, SparkR
    Affects Versions: 4.0.0
            Reporter: Hyukjin Kwon

https://github.com/apache/spark/actions/runs/7977185456/job/21779508822

This writes too many logs, making it difficult to see the real test cases.
[jira] [Updated] (SPARK-47110) Reenable AmmoniteTest tests in Maven builds
[ https://issues.apache.org/jira/browse/SPARK-47110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-47110:
-
    Description: 
Disabled in https://github.com/apache/spark/pull/45186 because of TTY issues.

See also https://github.com/apache/spark/pull/43909 and https://github.com/apache/spark/pull/40675

  was:
Disabled in https://github.com/apache/spark/pull/45186 because of TTY issues.

See also https://github.com/apache/spark/pull/43909 and https://github.com/apache/spark/pull/45186

> Reenable AmmoniteTest tests in Maven builds
> --
>
> Key: SPARK-47110
> URL: https://issues.apache.org/jira/browse/SPARK-47110
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Tests
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> Disabled in https://github.com/apache/spark/pull/45186 because of TTY issues.
> See also https://github.com/apache/spark/pull/43909 and https://github.com/apache/spark/pull/40675
[jira] [Updated] (SPARK-47110) Reenable AmmoniteTest tests in Maven builds
[ https://issues.apache.org/jira/browse/SPARK-47110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-47110:
-
    Summary: Reenable AmmoniteTest tests in Maven builds  (was: Enable AmmoniteTest tests in Maven builds)

> Reenable AmmoniteTest tests in Maven builds
> --
>
> Key: SPARK-47110
> URL: https://issues.apache.org/jira/browse/SPARK-47110
> Project: Spark
> Issue Type: Improvement
> Components: Connect, Tests
> Affects Versions: 4.0.0
> Reporter: Hyukjin Kwon
> Priority: Major
>
> Disabled in https://github.com/apache/spark/pull/45186 because of TTY issues.
> See also https://github.com/apache/spark/pull/43909 and https://github.com/apache/spark/pull/45186
[jira] [Created] (SPARK-47110) Enable AmmoniteTest tests in Maven builds
Hyukjin Kwon created SPARK-47110:

             Summary: Enable AmmoniteTest tests in Maven builds
                 Key: SPARK-47110
                 URL: https://issues.apache.org/jira/browse/SPARK-47110
             Project: Spark
          Issue Type: Improvement
          Components: Connect, Tests
    Affects Versions: 4.0.0
            Reporter: Hyukjin Kwon

Disabled in https://github.com/apache/spark/pull/45186 because of TTY issues.

See also https://github.com/apache/spark/pull/43909 and https://github.com/apache/spark/pull/45186
[jira] [Resolved] (SPARK-47109) Upgrade `commons-compress` to 1.26.0
[ https://issues.apache.org/jira/browse/SPARK-47109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47109.
---
    Fix Version/s: 4.0.0
       Resolution: Fixed

Issue resolved by pull request 45189
[https://github.com/apache/spark/pull/45189]

> Upgrade `commons-compress` to 1.26.0
> 
>
> Key: SPARK-47109
> URL: https://issues.apache.org/jira/browse/SPARK-47109
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
[jira] [Assigned] (SPARK-47109) Upgrade `commons-compress` to 1.26.0
[ https://issues.apache.org/jira/browse/SPARK-47109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-47109:
-
    Assignee: Dongjoon Hyun

> Upgrade `commons-compress` to 1.26.0
> 
>
> Key: SPARK-47109
> URL: https://issues.apache.org/jira/browse/SPARK-47109
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-47101) HiveExternalCatalog.verifyDataSchema does not fully comply with hive column name rules
[ https://issues.apache.org/jira/browse/SPARK-47101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-47101:
--
    Affects Version/s: (was: 3.5.0)
                       (was: 3.4.2)

> HiveExternalCatalog.verifyDataSchema does not fully comply with hive column name rules
> --
>
> Key: SPARK-47101
> URL: https://issues.apache.org/jira/browse/SPARK-47101
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Kent Yao
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-47101) HiveExternalCatalog.verifyDataSchema does not fully comply with hive column name rules
[ https://issues.apache.org/jira/browse/SPARK-47101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Kent Yao updated SPARK-47101:
-
    Issue Type: Improvement (was: Test)

> HiveExternalCatalog.verifyDataSchema does not fully comply with hive column name rules
> --
>
> Key: SPARK-47101
> URL: https://issues.apache.org/jira/browse/SPARK-47101
> Project: Spark
> Issue Type: Improvement
> Components: SQL
> Affects Versions: 3.4.2, 3.5.0, 4.0.0
> Reporter: Kent Yao
> Priority: Major
> Labels: pull-request-available
[jira] [Updated] (SPARK-47109) Upgrade `commons-compress` to 1.26.0
[ https://issues.apache.org/jira/browse/SPARK-47109?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47109:
---
    Labels: pull-request-available (was: )

> Upgrade `commons-compress` to 1.26.0
> 
>
> Key: SPARK-47109
> URL: https://issues.apache.org/jira/browse/SPARK-47109
> Project: Spark
> Issue Type: Sub-task
> Components: Build
> Affects Versions: 4.0.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
[jira] [Created] (SPARK-47109) Upgrade `commons-compress` to 1.26.0
Dongjoon Hyun created SPARK-47109: - Summary: Upgrade `commons-compress` to 1.26.0 Key: SPARK-47109 URL: https://issues.apache.org/jira/browse/SPARK-47109 Project: Spark Issue Type: Sub-task Components: Build Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44814) Test to trigger protobuf 4.23.3 crash
[ https://issues.apache.org/jira/browse/SPARK-44814?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44814: --- Labels: pull-request-available (was: ) > Test to trigger protobuf 4.23.3 crash > - > > Key: SPARK-44814 > URL: https://issues.apache.org/jira/browse/SPARK-44814 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.5.0 >Reporter: Martin Grund >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-46906) Add a check for stateful operator change for streaming
[ https://issues.apache.org/jira/browse/SPARK-46906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jungtaek Lim resolved SPARK-46906. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44927 [https://github.com/apache/spark/pull/44927] > Add a check for stateful operator change for streaming > -- > > Key: SPARK-46906 > URL: https://issues.apache.org/jira/browse/SPARK-46906 > Project: Spark > Issue Type: Task > Components: Structured Streaming >Affects Versions: 4.0.0 >Reporter: Jing Zhan >Assignee: Jing Zhan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Currently, users get a misleading error, > org.apache.spark.sql.execution.streaming.state.StateSchemaNotCompatible, if they > restart a query in the same checkpoint location after changing a stateful > operator. We need to catch such errors and throw a new error with an > informative message. > After physical planning and before the execution phase, we read the state > metadata for the current operator id to fetch the operator name committed for > that id in a previous batch. If the operator names do not match, we throw the > error. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
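In code terms, the check described above reduces to a name comparison keyed by operator id. Below is a minimal, hypothetical Scala sketch of that logic; `CommittedOperator` and the error message are illustrative stand-ins, not the types or error class that the pull request actually adds.

{code:scala}
import org.apache.spark.SparkException

// Hypothetical stand-in for an entry read back from the state metadata.
case class CommittedOperator(operatorId: Long, operatorName: String)

// Compare the operator planned for this run against the operator name
// committed in the checkpoint under the same id; a mismatch means the user
// changed a stateful operator between restarts.
def validateStatefulOperator(
    operatorId: Long,
    currentName: String,
    committed: Map[Long, CommittedOperator]): Unit = {
  committed.get(operatorId).foreach { prev =>
    if (prev.operatorName != currentName) {
      throw new SparkException(
        s"Stateful operator '$currentName' (id=$operatorId) does not match " +
          s"operator '${prev.operatorName}' committed in this checkpoint; " +
          "changing a stateful operator on restart is not supported.")
    }
  }
}
{code}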
[jira] [Resolved] (SPARK-47108) Set `derby.connection.requireAuthentication` to false explicitly in CLIs
[ https://issues.apache.org/jira/browse/SPARK-47108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47108. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45185 [https://github.com/apache/spark/pull/45185] > Set `derby.connection.requireAuthentication` to false explicitly in CLIs > > > Key: SPARK-47108 > URL: https://issues.apache.org/jira/browse/SPARK-47108 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47108) Set `derby.connection.requireAuthentication` to false explicitly in CLIs
[ https://issues.apache.org/jira/browse/SPARK-47108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-47108: - Assignee: Dongjoon Hyun > Set `derby.connection.requireAuthentication` to false explicitly in CLIs > > > Key: SPARK-47108 > URL: https://issues.apache.org/jira/browse/SPARK-47108 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47108) Set `derby.connection.requireAuthentication` to false explicitly in CLIs
[ https://issues.apache.org/jira/browse/SPARK-47108?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47108: --- Labels: pull-request-available (was: ) > Set `derby.connection.requireAuthentication` to false explicitly in CLIs > > > Key: SPARK-47108 > URL: https://issues.apache.org/jira/browse/SPARK-47108 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47108) Set `derby.connection.requireAuthentication` to false explicitly in CLIs
Dongjoon Hyun created SPARK-47108: - Summary: Set `derby.connection.requireAuthentication` to false explicitly in CLIs Key: SPARK-47108 URL: https://issues.apache.org/jira/browse/SPARK-47108 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
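None of the SPARK-47108 messages above carry a description, so the sketch below is only a guess at the shape of the change: `derby.connection.requireAuthentication` is a standard Derby system property, and "set explicitly" most plausibly means pinning it to false before anything opens the embedded Derby metastore. Where Spark actually places such a call is not shown in this thread.

{code:scala}
// Illustrative only: pin the Derby default before the embedded metastore
// starts, so CLI sessions never demand Derby credentials.
object DerbyDefaults {
  def ensure(): Unit = {
    if (System.getProperty("derby.connection.requireAuthentication") == null) {
      // Derby reads this as a JVM system property; "false" means embedded
      // connections do not require a username/password.
      System.setProperty("derby.connection.requireAuthentication", "false")
    }
  }
}
{code}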
[jira] [Updated] (SPARK-42328) Assign name to _LEGACY_ERROR_TEMP_1175
[ https://issues.apache.org/jira/browse/SPARK-42328?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42328: --- Labels: pull-request-available (was: ) > Assign name to _LEGACY_ERROR_TEMP_1175 > -- > > Key: SPARK-42328 > URL: https://issues.apache.org/jira/browse/SPARK-42328 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46257) Upgrade Derby to 10.16.1.1
[ https://issues.apache.org/jira/browse/SPARK-46257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46257: -- Description: [https://db.apache.org/derby/releases/release-10_16_1_1.cgi] 1. Drop Java Security Manager. {quote}Derby no longer supports the Java SecurityManager. This is because the Open JDK team deprecated the SecurityManager and marked it for removal. {quote} 2. Compile on Java 17 {quote}Compile 10.16 into Java 17 byte code {quote} was: [https://db.apache.org/derby/releases/release-10_16_1_1.cgi] {quote}Derby no longer supports the Java SecurityManager. This is because the Open JDK team deprecated the SecurityManager and marked it for removal. {quote} > Upgrade Derby to 10.16.1.1 > -- > > Key: SPARK-46257 > URL: https://issues.apache.org/jira/browse/SPARK-46257 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > [https://db.apache.org/derby/releases/release-10_16_1_1.cgi] > 1. Drop Java Security Manager. > {quote}Derby no longer supports the Java SecurityManager. This is because the > Open JDK team deprecated the SecurityManager and marked it for removal. > {quote} > 2. Compile on Java 17 > {quote}Compile 10.16 into Java 17 byte code > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-46257) Upgrade Derby to 10.16.1.1
[ https://issues.apache.org/jira/browse/SPARK-46257?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-46257: -- Description: [https://db.apache.org/derby/releases/release-10_16_1_1.cgi] {quote}Derby no longer supports the Java SecurityManager. This is because the Open JDK team deprecated the SecurityManager and marked it for removal. {quote} was:https://db.apache.org/derby/releases/release-10_16_1_1.cgi > Upgrade Derby to 10.16.1.1 > -- > > Key: SPARK-46257 > URL: https://issues.apache.org/jira/browse/SPARK-46257 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > [https://db.apache.org/derby/releases/release-10_16_1_1.cgi] > {quote}Derby no longer supports the Java SecurityManager. This is because the > Open JDK team deprecated the SecurityManager and marked it for removal. > {quote} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47085) Performance issue on thrift API
[ https://issues.apache.org/jira/browse/SPARK-47085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47085: -- Fix Version/s: 3.5.2 > Performance issue on thrift API > --- > > Key: SPARK-47085 > URL: https://issues.apache.org/jira/browse/SPARK-47085 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Izek Greenfield >Assignee: Izek Greenfield >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.2 > > > This new complexity was introduced in SPARK-39041. > Before this PR, the code was: > {code:java} > while (curRow < maxRows && iter.hasNext) { > val sparkRow = iter.next() > val row = ArrayBuffer[Any]() > var curCol = 0 > while (curCol < sparkRow.length) { > if (sparkRow.isNullAt(curCol)) { > row += null > } else { > addNonNullColumnValue(sparkRow, row, curCol, timeFormatters) > } > curCol += 1 > } > resultRowSet.addRow(row.toArray.asInstanceOf[Array[Object]]) > curRow += 1 > }{code} > i.e. a single pass without the _*O(n^2)*_ complexity, so this change just returns the > state to what it was before. > > In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: > {code:scala} > ... > while (i < rowSize) { > val row = rows(i) > ... > {code} > It can be easily converted back into _*O(n)*_ complexity. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
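The quadratic behavior comes from positional access on a Scala `Seq`: when `rows` is a linear sequence such as `List`, `rows(i)` walks from the head, so the loop does 1 + 2 + ... + n work overall. The following self-contained sketch contrasts the two shapes; the `emit` helper is a hypothetical stand-in for the real per-row conversion in `RowSetUtils`.

{code:scala}
object RowLoopSketch {
  // Hypothetical stand-in for the per-row conversion in RowSetUtils.
  def emit(row: Seq[Any]): Unit = ()

  // Quadratic when `rows` is a linear Seq such as List: rows(i) costs O(i).
  def convertIndexed(rows: Seq[Seq[Any]]): Unit = {
    var i = 0
    val n = rows.length
    while (i < n) {
      emit(rows(i)) // O(i) lookup, O(n^2) in total
      i += 1
    }
  }

  // Linear: a single traversal regardless of the Seq implementation.
  def convertIterated(rows: Seq[Seq[Any]]): Unit = rows.foreach(emit)
}
{code}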
[jira] [Commented] (SPARK-47085) Performance issue on thrift API
[ https://issues.apache.org/jira/browse/SPARK-47085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818989#comment-17818989 ] Dongjoon Hyun commented on SPARK-47085: --- Thank you. I added SPARK-39041 as an `is caused by` link. > Performance issue on thrift API > --- > > Key: SPARK-47085 > URL: https://issues.apache.org/jira/browse/SPARK-47085 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Izek Greenfield >Assignee: Izek Greenfield >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This new complexity was introduced in SPARK-39041. > Before this PR, the code was: > {code:java} > while (curRow < maxRows && iter.hasNext) { > val sparkRow = iter.next() > val row = ArrayBuffer[Any]() > var curCol = 0 > while (curCol < sparkRow.length) { > if (sparkRow.isNullAt(curCol)) { > row += null > } else { > addNonNullColumnValue(sparkRow, row, curCol, timeFormatters) > } > curCol += 1 > } > resultRowSet.addRow(row.toArray.asInstanceOf[Array[Object]]) > curRow += 1 > }{code} > i.e. a single pass without the _*O(n^2)*_ complexity, so this change just returns the > state to what it was before. > > In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: > {code:scala} > ... > while (i < rowSize) { > val row = rows(i) > ... > {code} > It can be easily converted back into _*O(n)*_ complexity. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47105) Spark Container doesn't have spark group or spark user created
[ https://issues.apache.org/jira/browse/SPARK-47105?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818974#comment-17818974 ] Albert Wong commented on SPARK-47105: - Related: https://issues.apache.org/jira/browse/SPARK-45557 > Spark Container doesn't have spark group or spark user created > -- > > Key: SPARK-47105 > URL: https://issues.apache.org/jira/browse/SPARK-47105 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Docker >Affects Versions: 3.4.1 > Environment: Using container apache/spark-py:latest >Reporter: Albert Wong >Priority: Critical > > I see that > [https://github.com/apache/spark-docker/blob/431aa516ba58985c902bf2d2a07bf0eaa1df6740/3.4.1/scala2.12-java11-ubuntu/Dockerfile#L19] > is supposed to have a spark user and spark group created but checking the > container, it doesn't have those uid and gid created. Both should have 185 > uid and 185 gid. > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/group > root:x:0: > daemon:x:1: > bin:x:2: > sys:x:3: > adm:x:4: > tty:x:5: > disk:x:6: > lp:x:7: > mail:x:8: > news:x:9: > uucp:x:10: > man:x:12: > proxy:x:13: > kmem:x:15: > dialout:x:20: > fax:x:21: > voice:x:22: > cdrom:x:24: > floppy:x:25: > tape:x:26: > sudo:x:27: > audio:x:29: > dip:x:30: > www-data:x:33: > backup:x:34: > operator:x:37: > list:x:38: > irc:x:39: > src:x:40: > gnats:x:41: > shadow:x:42: > utmp:x:43: > video:x:44: > sasl:x:45: > plugdev:x:46: > staff:x:50: > games:x:60: > users:x:100: > nogroup:x:65534: > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/passwd > root:x:0:0:root:/root:/bin/bash > daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin > bin:x:2:2:bin:/bin:/usr/sbin/nologin > sys:x:3:3:sys:/dev:/usr/sbin/nologin > sync:x:4:65534:sync:/bin:/bin/sync > games:x:5:60:games:/usr/games:/usr/sbin/nologin > man:x:6:12:man:/var/cache/man:/usr/sbin/nologin > lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin > mail:x:8:8:mail:/var/mail:/usr/sbin/nologin > news:x:9:9:news:/var/spool/news:/usr/sbin/nologin > uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin > proxy:x:13:13:proxy:/bin:/usr/sbin/nologin > www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin > backup:x:34:34:backup:/var/backups:/usr/sbin/nologin > list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin > irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin > gnats:x:41:41:Gnats Bug-Reporting System > (admin):/var/lib/gnats:/usr/sbin/nologin > nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin > _apt:x:100:65534::/nonexistent:/usr/sbin/nologin > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/passwd > root:x:0:0:root:/root:/bin/bash > daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin > bin:x:2:2:bin:/bin:/usr/sbin/nologin > sys:x:3:3:sys:/dev:/usr/sbin/nologin > sync:x:4:65534:sync:/bin:/bin/sync > games:x:5:60:games:/usr/games:/usr/sbin/nologin > man:x:6:12:man:/var/cache/man:/usr/sbin/nologin > lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin > mail:x:8:8:mail:/var/mail:/usr/sbin/nologin > news:x:9:9:news:/var/spool/news:/usr/sbin/nologin > uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin > proxy:x:13:13:proxy:/bin:/usr/sbin/nologin > www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin > backup:x:34:34:backup:/var/backups:/usr/sbin/nologin > list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin > irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin > gnats:x:41:41:Gnats Bug-Reporting System > (admin):/var/lib/gnats:/usr/sbin/nologin > nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin > _apt:x:100:65534::/nonexistent:/usr/sbin/nologin -- This 
message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47105) Spark Container doesn't have spark group or spark user created
[ https://issues.apache.org/jira/browse/SPARK-47105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Albert Wong updated SPARK-47105: Shepherd: Hyukjin Kwon > Spark Container doesn't have spark group or spark user created > -- > > Key: SPARK-47105 > URL: https://issues.apache.org/jira/browse/SPARK-47105 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Docker >Affects Versions: 3.4.1 > Environment: Using container apache/spark-py:latest >Reporter: Albert Wong >Priority: Critical > > I see that > [https://github.com/apache/spark-docker/blob/431aa516ba58985c902bf2d2a07bf0eaa1df6740/3.4.1/scala2.12-java11-ubuntu/Dockerfile#L19] > is supposed to have a spark user and spark group created but checking the > container, it doesn't have those uid and gid created. Both should have 185 > uid and 185 gid. > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/group > root:x:0: > daemon:x:1: > bin:x:2: > sys:x:3: > adm:x:4: > tty:x:5: > disk:x:6: > lp:x:7: > mail:x:8: > news:x:9: > uucp:x:10: > man:x:12: > proxy:x:13: > kmem:x:15: > dialout:x:20: > fax:x:21: > voice:x:22: > cdrom:x:24: > floppy:x:25: > tape:x:26: > sudo:x:27: > audio:x:29: > dip:x:30: > www-data:x:33: > backup:x:34: > operator:x:37: > list:x:38: > irc:x:39: > src:x:40: > gnats:x:41: > shadow:x:42: > utmp:x:43: > video:x:44: > sasl:x:45: > plugdev:x:46: > staff:x:50: > games:x:60: > users:x:100: > nogroup:x:65534: > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/passwd > root:x:0:0:root:/root:/bin/bash > daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin > bin:x:2:2:bin:/bin:/usr/sbin/nologin > sys:x:3:3:sys:/dev:/usr/sbin/nologin > sync:x:4:65534:sync:/bin:/bin/sync > games:x:5:60:games:/usr/games:/usr/sbin/nologin > man:x:6:12:man:/var/cache/man:/usr/sbin/nologin > lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin > mail:x:8:8:mail:/var/mail:/usr/sbin/nologin > news:x:9:9:news:/var/spool/news:/usr/sbin/nologin > uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin > proxy:x:13:13:proxy:/bin:/usr/sbin/nologin > www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin > backup:x:34:34:backup:/var/backups:/usr/sbin/nologin > list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin > irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin > gnats:x:41:41:Gnats Bug-Reporting System > (admin):/var/lib/gnats:/usr/sbin/nologin > nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin > _apt:x:100:65534::/nonexistent:/usr/sbin/nologin > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/passwd > root:x:0:0:root:/root:/bin/bash > daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin > bin:x:2:2:bin:/bin:/usr/sbin/nologin > sys:x:3:3:sys:/dev:/usr/sbin/nologin > sync:x:4:65534:sync:/bin:/bin/sync > games:x:5:60:games:/usr/games:/usr/sbin/nologin > man:x:6:12:man:/var/cache/man:/usr/sbin/nologin > lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin > mail:x:8:8:mail:/var/mail:/usr/sbin/nologin > news:x:9:9:news:/var/spool/news:/usr/sbin/nologin > uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin > proxy:x:13:13:proxy:/bin:/usr/sbin/nologin > www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin > backup:x:34:34:backup:/var/backups:/usr/sbin/nologin > list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin > irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin > gnats:x:41:41:Gnats Bug-Reporting System > (admin):/var/lib/gnats:/usr/sbin/nologin > nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin > _apt:x:100:65534::/nonexistent:/usr/sbin/nologin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, 
e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47107) Implement partition reader for python streaming data source
Chaoqin Li created SPARK-47107: -- Summary: Implement partition reader for python streaming data source Key: SPARK-47107 URL: https://issues.apache.org/jira/browse/SPARK-47107 Project: Spark Issue Type: Improvement Components: PySpark, SS Affects Versions: 4.0.0 Reporter: Chaoqin Li Piggy-back on the PythonPartitionReaderFactory to implement reading a data partition for the Python streaming data source. Add a test case to verify that the Python streaming data source can read and process data end to end. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
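Whatever the Python side looks like, the factory named above ultimately has to satisfy the JVM DataSource V2 reader contract. The sketch below shows only that contract; `EchoPartition` and `EchoReaderFactory` are invented names, and the real `PythonPartitionReaderFactory` would forward `next()`/`get()` to a Python worker process rather than hold rows in memory.

{code:scala}
import org.apache.spark.sql.catalyst.InternalRow
import org.apache.spark.sql.connector.read.{InputPartition, PartitionReader, PartitionReaderFactory}

// Invented partition type carrying its rows inline, purely for illustration.
class EchoPartition(val rows: Array[InternalRow]) extends InputPartition

class EchoReaderFactory extends PartitionReaderFactory {
  override def createReader(partition: InputPartition): PartitionReader[InternalRow] = {
    val rows = partition.asInstanceOf[EchoPartition].rows
    new PartitionReader[InternalRow] {
      private var i = -1
      override def next(): Boolean = { i += 1; i < rows.length }
      override def get(): InternalRow = rows(i)
      override def close(): Unit = ()
    }
  }
}
{code}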
[jira] [Updated] (SPARK-47106) Plan canonicalization test serializes/deserializes class that is not serializable
[ https://issues.apache.org/jira/browse/SPARK-47106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47106: -- Affects Version/s: 4.0.0 (was: 3.4.0) (was: 3.4.1) > Plan canonicalization test serializes/deserializes class that is not > serializable > - > > Key: SPARK-47106 > URL: https://issues.apache.org/jira/browse/SPARK-47106 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 4.0.0 >Reporter: Parth Chandra >Priority: Minor > > The test > {code:java} > test("SPARK-23731 plans should be canonicalizable after being > (de)serialized"){code} > serializes and deserializes > {code:java} > FileSourceScanExec{code} > which is not actually serializable. In particular, > FileSourceScanExec.relation is not serializable. > The test still passes though. > The test below derived from the above shows the issue - > {code:java} > test("verify FileSourceScanExec (de)serialize") { > withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> "parquet") { > withTempPath { path => > spark.range(1).write.parquet(path.getAbsolutePath) > val df = spark.read.parquet(path.getAbsolutePath) > val fileSourceScanExec = > df.queryExecution.sparkPlan.collectFirst { case p: > FileSourceScanExec => p }.get > val serializer = SparkEnv.get.serializer.newInstance() > val relation = serializer.serialize(fileSourceScanExec.relation) > assert(relation != null) > val deserialized = > > serializer.deserialize[FileSourceScanExec](serializer.serialize(fileSourceScanExec)) > assert(deserialized.relation != null) > } > } > }{code} > > The test fails with - > {code:java} > (file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1mgn/T/spark-d534d738-64f1-4eaa-9d9e-8c33374b60f1)) > - field (class: > org.apache.spark.sql.execution.datasources.HadoopFsRelation, name: location, > type: interface org.apache.spark.sql.execution.datasources.FileIndex) > - object (class > org.apache.spark.sql.execution.datasources.HadoopFsRelation, parquet) > java.io.NotSerializableException: > org.apache.spark.sql.execution.datasources.InMemoryFileIndex > Serialization stack: > - object not serializable (class: > org.apache.spark.sql.execution.datasources.InMemoryFileIndex, value: > org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1mgn/T/spark-d534d738-64f1-4eaa-9d9e-8c33374b60f1)) > - field (class: > org.apache.spark.sql.execution.datasources.HadoopFsRelation, name: location, > type: interface org.apache.spark.sql.execution.datasources.FileIndex) > - object (class > org.apache.spark.sql.execution.datasources.HadoopFsRelation, parquet) > at > org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41) > at > org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:49) > at > org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:115) > at > org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$11(SparkPlanSuite.scala:54) > at > org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$11$adapted(SparkPlanSuite.scala:48) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath(SQLHelper.scala:69) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath$(SQLHelper.scala:66) > at org.apache.spark.sql.QueryTest.withTempPath(QueryTest.scala:33) > at > org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$10(SparkPlanSuite.scala:48) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) > at > org.apache.spark.sql.execution.SparkPlanSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(SparkPlanSuite.scala:32) > at > org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:266) > at > org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:264) > at > org.apache.spark.sql.execution.SparkPlanSuite.withSQLConf(SparkPlanSuite.scala:32) > at > org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$9(SparkPlanSuite.scala:48) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47106) Plan canonicalization test serializes/deserializes class that is not serializable
[ https://issues.apache.org/jira/browse/SPARK-47106?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-47106: -- Issue Type: Improvement (was: Test) > Plan canonicalization test serializes/deserializes class that is not > serializable > - > > Key: SPARK-47106 > URL: https://issues.apache.org/jira/browse/SPARK-47106 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Parth Chandra >Priority: Minor > > The test > {code:java} > test("SPARK-23731 plans should be canonicalizable after being > (de)serialized"){code} > serializes and deserializes > {code:java} > FileSourceScanExec{code} > which is not actually serializable. In particular, > FileSourceScanExec.relation is not serializable. > The test still passes though. > The test below derived from the above shows the issue - > {code:java} > test("verify FileSourceScanExec (de)serialize") { > withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> "parquet") { > withTempPath { path => > spark.range(1).write.parquet(path.getAbsolutePath) > val df = spark.read.parquet(path.getAbsolutePath) > val fileSourceScanExec = > df.queryExecution.sparkPlan.collectFirst { case p: > FileSourceScanExec => p }.get > val serializer = SparkEnv.get.serializer.newInstance() > val relation = serializer.serialize(fileSourceScanExec.relation) > assert(relation != null) > val deserialized = > > serializer.deserialize[FileSourceScanExec](serializer.serialize(fileSourceScanExec)) > assert(deserialized.relation != null) > } > } > }{code} > > The test fails with - > {code:java} > (file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1mgn/T/spark-d534d738-64f1-4eaa-9d9e-8c33374b60f1)) > - field (class: > org.apache.spark.sql.execution.datasources.HadoopFsRelation, name: location, > type: interface org.apache.spark.sql.execution.datasources.FileIndex) > - object (class > org.apache.spark.sql.execution.datasources.HadoopFsRelation, parquet) > java.io.NotSerializableException: > org.apache.spark.sql.execution.datasources.InMemoryFileIndex > Serialization stack: > - object not serializable (class: > org.apache.spark.sql.execution.datasources.InMemoryFileIndex, value: > org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1mgn/T/spark-d534d738-64f1-4eaa-9d9e-8c33374b60f1)) > - field (class: > org.apache.spark.sql.execution.datasources.HadoopFsRelation, name: location, > type: interface org.apache.spark.sql.execution.datasources.FileIndex) > - object (class > org.apache.spark.sql.execution.datasources.HadoopFsRelation, parquet) > at > org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41) > at > org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:49) > at > org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:115) > at > org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$11(SparkPlanSuite.scala:54) > at > org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$11$adapted(SparkPlanSuite.scala:48) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath(SQLHelper.scala:69) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath$(SQLHelper.scala:66) > at org.apache.spark.sql.QueryTest.withTempPath(QueryTest.scala:33) > at > org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$10(SparkPlanSuite.scala:48) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) > at > org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) > at > org.apache.spark.sql.execution.SparkPlanSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(SparkPlanSuite.scala:32) > at > org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:266) > at > org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:264) > at > org.apache.spark.sql.execution.SparkPlanSuite.withSQLConf(SparkPlanSuite.scala:32) > at > org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$9(SparkPlanSuite.scala:48) > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-45557) Spark Connect can not be started because of missing user home dir in Docker container
[ https://issues.apache.org/jira/browse/SPARK-45557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818942#comment-17818942 ] Albert Wong commented on SPARK-45557: - Related https://issues.apache.org/jira/browse/SPARK-47105 > Spark Connect can not be started because of missing user home dir in Docker > container > - > > Key: SPARK-45557 > URL: https://issues.apache.org/jira/browse/SPARK-45557 > Project: Spark > Issue Type: Bug > Components: Spark Docker >Affects Versions: 3.4.0, 3.4.1, 3.5.0 >Reporter: Niels Pardon >Priority: Minor > > I was trying to start Spark Connect within a container using the Spark Docker > container images and ran into an issue where Ivy could not pull the Spark > Connect JAR since the user home /home/spark does not exist. > Steps to reproduce: > 1. Start the Spark container with `/bin/bash` as the command: > {code:java} > docker run -it --rm apache/spark:3.5.0 /bin/bash {code} > 2. Try to start Spark Connect within the container: > > {code:java} > /opt/spark/sbin/start-connect-server.sh --packages > org.apache.spark:spark-connect_2.12:3.5.0 {code} > which lead to this output: > > > {code:java} > starting org.apache.spark.sql.connect.service.SparkConnectServer, logging to > /opt/spark/logs/spark--org.apache.spark.sql.connect.service.SparkConnectServer-1-d8470a71dbd7.out > failed to launch: nice -n 0 bash /opt/spark/bin/spark-submit --class > org.apache.spark.sql.connect.service.SparkConnectServer --name Spark Connect > server --packages org.apache.spark:spark-connect_2.12:3.5.0 > at > org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1535) > at > org.apache.spark.util.DependencyUtils$.resolveMavenDependencies(DependencyUtils.scala:185) > at > org.apache.spark.deploy.SparkSubmit.prepareSubmitEnvironment(SparkSubmit.scala:334) > at > org.apache.spark.deploy.SparkSubmit.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:964) > at > org.apache.spark.deploy.SparkSubmit.doRunMain$1(SparkSubmit.scala:194) > at org.apache.spark.deploy.SparkSubmit.submit(SparkSubmit.scala:217) > at org.apache.spark.deploy.SparkSubmit.doSubmit(SparkSubmit.scala:91) > at > org.apache.spark.deploy.SparkSubmit$$anon$2.doSubmit(SparkSubmit.scala:1120) > at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:1129) > at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala) > full log in > /opt/spark/logs/spark--org.apache.spark.sql.connect.service.SparkConnectServer-1-d8470a71dbd7.out > {code} > where then the full log file looks like this: > {code:java} > Spark Command: /opt/java/openjdk/bin/java -cp > /opt/spark/conf:/opt/spark/jars/* -Xmx1g -XX:+IgnoreUnrecognizedVMOptions > --add-opens=java.base/java.lang=ALL-UNNAMED > --add-opens=java.base/java.lang.invoke=ALL-UNNAMED > --add-opens=java.base/java.lang.reflect=ALL-UNNAMED > --add-opens=java.base/java.io=ALL-UNNAMED > --add-opens=java.base/java.net=ALL-UNNAMED > --add-opens=java.base/java.nio=ALL-UNNAMED > --add-opens=java.base/java.util=ALL-UNNAMED > --add-opens=java.base/java.util.concurrent=ALL-UNNAMED > --add-opens=java.base/java.util.concurrent.atomic=ALL-UNNAMED > --add-opens=java.base/sun.nio.ch=ALL-UNNAMED > --add-opens=java.base/sun.nio.cs=ALL-UNNAMED > --add-opens=java.base/sun.security.action=ALL-UNNAMED > --add-opens=java.base/sun.util.calendar=ALL-UNNAMED > --add-opens=java.security.jgss/sun.security.krb5=ALL-UNNAMED > -Djdk.reflect.useDirectMethodHandle=false org.apache.spark.deploy.SparkSubmit > --class 
org.apache.spark.sql.connect.service.SparkConnectServer --name Spark > Connect server --packages org.apache.spark:spark-connect_2.12:3.5.0 > spark-internal > > :: loading settings :: url = > jar:file:/opt/spark/jars/ivy-2.5.1.jar!/org/apache/ivy/core/settings/ivysettings.xml > Ivy Default Cache set to: /home/spark/.ivy2/cache > The jars for the packages stored in: /home/spark/.ivy2/jars > org.apache.spark#spark-connect_2.12 added as a dependency > :: resolving dependencies :: > org.apache.spark#spark-submit-parent-f8a04936-e8af-4f37-bdb0-e4026a8a3be5;1.0 > confs: [default] > Exception in thread "main" java.io.FileNotFoundException: > /home/spark/.ivy2/cache/resolved-org.apache.spark-spark-submit-parent-f8a04936-e8af-4f37-bdb0-e4026a8a3be5-1.0.xml > (No such file or directory) > at java.base/java.io.FileOutputStream.open0(Native Method) > at java.base/java.io.FileOutputStream.open(Unknown Source) > at java.base/java.io.FileOutputStream.(Unknown Source) > at java.base/java.io.FileOutputStream.(Unknown Source) > at >
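The log above pinpoints the root cause: Ivy derives its default cache location from `user.home` (here `/home/spark/.ivy2/cache`), and that directory does not exist for the container user. A small sketch of the dependency follows, with the workaround hinted in comments; it assumes the standard `spark.jars.ivy` setting as the override knob.

{code:scala}
import java.io.File

object IvyHomeCheck {
  def main(args: Array[String]): Unit = {
    val home = new File(System.getProperty("user.home"))
    // Ivy's default cache resolves under user.home; when the container user
    // has no home directory, creating this path fails as in the log above.
    val defaultCache = new File(home, ".ivy2/cache")
    println(s"Ivy default cache: $defaultCache, home exists: ${home.isDirectory}")
    // Workaround sketch: redirect Spark's Ivy directory to a writable path,
    // e.g. pass --conf spark.jars.ivy=/tmp/.ivy2 to start-connect-server.sh.
  }
}
{code}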
[jira] [Updated] (SPARK-47105) Spark Container doesn't have spark group or spark user created
[ https://issues.apache.org/jira/browse/SPARK-47105?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Albert Wong updated SPARK-47105: Component/s: Spark Docker > Spark Container doesn't have spark group or spark user created > -- > > Key: SPARK-47105 > URL: https://issues.apache.org/jira/browse/SPARK-47105 > Project: Spark > Issue Type: Bug > Components: Kubernetes, Spark Docker >Affects Versions: 3.4.1 > Environment: Using container apache/spark-py:latest >Reporter: Albert Wong >Priority: Critical > > I see that > [https://github.com/apache/spark-docker/blob/431aa516ba58985c902bf2d2a07bf0eaa1df6740/3.4.1/scala2.12-java11-ubuntu/Dockerfile#L19] > is supposed to have a spark user and spark group created but checking the > container, it doesn't have those uid and gid created. Both should have 185 > uid and 185 gid. > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/group > root:x:0: > daemon:x:1: > bin:x:2: > sys:x:3: > adm:x:4: > tty:x:5: > disk:x:6: > lp:x:7: > mail:x:8: > news:x:9: > uucp:x:10: > man:x:12: > proxy:x:13: > kmem:x:15: > dialout:x:20: > fax:x:21: > voice:x:22: > cdrom:x:24: > floppy:x:25: > tape:x:26: > sudo:x:27: > audio:x:29: > dip:x:30: > www-data:x:33: > backup:x:34: > operator:x:37: > list:x:38: > irc:x:39: > src:x:40: > gnats:x:41: > shadow:x:42: > utmp:x:43: > video:x:44: > sasl:x:45: > plugdev:x:46: > staff:x:50: > games:x:60: > users:x:100: > nogroup:x:65534: > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/passwd > root:x:0:0:root:/root:/bin/bash > daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin > bin:x:2:2:bin:/bin:/usr/sbin/nologin > sys:x:3:3:sys:/dev:/usr/sbin/nologin > sync:x:4:65534:sync:/bin:/bin/sync > games:x:5:60:games:/usr/games:/usr/sbin/nologin > man:x:6:12:man:/var/cache/man:/usr/sbin/nologin > lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin > mail:x:8:8:mail:/var/mail:/usr/sbin/nologin > news:x:9:9:news:/var/spool/news:/usr/sbin/nologin > uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin > proxy:x:13:13:proxy:/bin:/usr/sbin/nologin > www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin > backup:x:34:34:backup:/var/backups:/usr/sbin/nologin > list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin > irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin > gnats:x:41:41:Gnats Bug-Reporting System > (admin):/var/lib/gnats:/usr/sbin/nologin > nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin > _apt:x:100:65534::/nonexistent:/usr/sbin/nologin > I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/passwd > root:x:0:0:root:/root:/bin/bash > daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin > bin:x:2:2:bin:/bin:/usr/sbin/nologin > sys:x:3:3:sys:/dev:/usr/sbin/nologin > sync:x:4:65534:sync:/bin:/bin/sync > games:x:5:60:games:/usr/games:/usr/sbin/nologin > man:x:6:12:man:/var/cache/man:/usr/sbin/nologin > lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin > mail:x:8:8:mail:/var/mail:/usr/sbin/nologin > news:x:9:9:news:/var/spool/news:/usr/sbin/nologin > uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin > proxy:x:13:13:proxy:/bin:/usr/sbin/nologin > www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin > backup:x:34:34:backup:/var/backups:/usr/sbin/nologin > list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin > irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin > gnats:x:41:41:Gnats Bug-Reporting System > (admin):/var/lib/gnats:/usr/sbin/nologin > nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin > _apt:x:100:65534::/nonexistent:/usr/sbin/nologin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, 
e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-40513) SPIP: Support Docker Official Image for Spark
[ https://issues.apache.org/jira/browse/SPARK-40513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818935#comment-17818935 ] Albert Wong commented on SPARK-40513: - Related issue. https://issues.apache.org/jira/browse/SPARK-47105 > SPIP: Support Docker Official Image for Spark > - > > Key: SPARK-40513 > URL: https://issues.apache.org/jira/browse/SPARK-40513 > Project: Spark > Issue Type: New Feature > Components: Kubernetes, Spark Docker >Affects Versions: 3.5.0 >Reporter: Yikun Jiang >Assignee: Yikun Jiang >Priority: Major > Labels: SPIP, pull-request-available > Fix For: 3.5.0 > > > This SPIP is proposed to add [Docker Official > Image(DOI)|https://github.com/docker-library/official-images] to ensure the > Spark Docker images meet the quality standards for Docker images, to provide > these Docker images for users who want to use Apache Spark via Docker image. > There are also several [Apache projects that release the Docker Official > Images|https://hub.docker.com/search?q=apache_filter=official], such > as: [flink|https://hub.docker.com/_/flink], > [storm|https://hub.docker.com/_/storm], [solr|https://hub.docker.com/_/solr], > [zookeeper|https://hub.docker.com/_/zookeeper], > [httpd|https://hub.docker.com/_/httpd] (with 50M+ to 1B+ download for each). > From the huge download statistics, we can see the real demands of users, and > from the support of other apache projects, we should also be able to do it. > After support: > * The Dockerfile will still be maintained by the Apache Spark community and > reviewed by Docker. > * The images will be maintained by the Docker community to ensure the > quality standards for Docker images of the Docker community. > It will also reduce the extra docker images maintenance effort (such as > frequently rebuilding, image security update) of the Apache Spark community. > > SPIP DOC: > [https://docs.google.com/document/d/1nN-pKuvt-amUcrkTvYAQ-bJBgtsWb9nAkNoVNRM2S2o] > DISCUSS: [https://lists.apache.org/thread/l1793y5224n8bqkp3s6ltgkykso4htb3] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47104) Spark SQL query fails with NullPointerException
[ https://issues.apache.org/jira/browse/SPARK-47104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818934#comment-17818934 ] Bruce Robbins commented on SPARK-47104: --- It's not a CSV specific issue. You can reproduce with a cached view. The following fails on the master branch, when using {{spark-sql}}: {noformat} create or replace temp view v1(id, name) as values (1, "fred"), (2, "bob"); cache table v1; select name, uuid() as _iid from ( select s.name from v1 s join v1 t on s.name = t.name order by name ) limit 20; {noformat} The exception is: {noformat} java.lang.NullPointerException: Cannot invoke "org.apache.spark.sql.catalyst.util.RandomUUIDGenerator.getNextUUIDUTF8String()" because "this.randomGen_0" is null at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at org.apache.spark.sql.execution.TakeOrderedAndProjectExec.$anonfun$executeCollect$6(limit.scala:297) at scala.collection.ArrayOps$.map$extension(ArrayOps.scala:934) at org.apache.spark.sql.execution.TakeOrderedAndProjectExec.$anonfun$executeCollect$1(limit.scala:297) at org.apache.spark.sql.execution.SparkPlan.$anonfun$executeQuery$1(SparkPlan.scala:246) at org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151) at org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:243) at org.apache.spark.sql.execution.TakeOrderedAndProjectExec.executeCollect(limit.scala:286) at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$executeCollect$1(AdaptiveSparkPlanExec.scala:390) at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:418) at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:390) {noformat} It seems that non-deterministic expressions are not getting initialized before being used in the unsafe projection. I can take a look. > Spark SQL query fails with NullPointerException > --- > > Key: SPARK-47104 > URL: https://issues.apache.org/jira/browse/SPARK-47104 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: Chhavi Bansal >Priority: Major > > I am trying to run a very simple SQL query involving join and orderby clause > and then using UUID() function in the outermost select stmt. 
The query fails > {code:java} > val df = spark.read.format("csv").option("header", > "true").load("src/main/resources/titanic.csv") > df.createOrReplaceTempView("titanic") > val query = spark.sql(" select name, uuid() as _iid from (select s.name from > titanic s join titanic t on s.name = t.name order by name) ;") > query.show() // FAILS{code} > Dataset is a normal csv file with the following columns > {code:java} > PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked > {code} > Below is the error > {code:java} > Exception in thread "main" java.lang.NullPointerException > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown > Source) > at > org.apache.spark.sql.execution.TakeOrderedAndProjectExec.$anonfun$executeCollect$2(limit.scala:207) > at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) > at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198) > at scala.collection.TraversableLike.map(TraversableLike.scala:237) > at scala.collection.TraversableLike.map$(TraversableLike.scala:230) > at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198) > at > org.apache.spark.sql.execution.TakeOrderedAndProjectExec.executeCollect(limit.scala:207) > at > org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$executeCollect$1(AdaptiveSparkPlanExec.scala:338) > at > org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:366) > at > org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:338) > at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3715) > at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2728) > at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3706) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at >
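For context on the diagnosis above: in Catalyst, non-deterministic expressions such as `uuid()` must have `initialize(partitionIndex)` called before they are evaluated, and projections expose the same hook. The sketch below shows that contract only, not the actual fix; `exprs` and `childOutput` stand in for the project list and child output of `TakeOrderedAndProjectExec`.

{code:scala}
import org.apache.spark.sql.catalyst.expressions.{Attribute, Expression, UnsafeProjection}

// Sketch only: build a projection and honor the initialization contract.
def makeProjection(exprs: Seq[Expression], childOutput: Seq[Attribute]): UnsafeProjection = {
  val proj = UnsafeProjection.create(exprs, childOutput)
  // Without this call, Nondeterministic expressions such as Uuid see a null
  // RandomUUIDGenerator, matching the NPE in the stack trace above.
  proj.initialize(0) // partition index; executeCollect projects on the driver
  proj
}
{code}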
[jira] [Updated] (SPARK-47085) Performance issue on thrift API
[ https://issues.apache.org/jira/browse/SPARK-47085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Izek Greenfield updated SPARK-47085: Description: This new complexity was introduced in SPARK-39041. Before this PR, the code was: {code:java} while (curRow < maxRows && iter.hasNext) { val sparkRow = iter.next() val row = ArrayBuffer[Any]() var curCol = 0 while (curCol < sparkRow.length) { if (sparkRow.isNullAt(curCol)) { row += null } else { addNonNullColumnValue(sparkRow, row, curCol, timeFormatters) } curCol += 1 } resultRowSet.addRow(row.toArray.asInstanceOf[Array[Object]]) curRow += 1 }{code} i.e. a single pass without the _*O(n^2)*_ complexity, so this change just returns the state to what it was before. In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: {code:scala} ... while (i < rowSize) { val row = rows(i) ... {code} It can be easily converted back into _*O(n)*_ complexity. was: This new complexity was introduced in SPARK-39041. Before this PR, the code was: {code:java} def toTTableSchema(schema: StructType): TTableSchema = { val tTableSchema = new TTableSchema() schema.zipWithIndex.foreach { case (f, i) => tTableSchema.addToColumns(toTColumnDesc(f, i)) } tTableSchema } {code} foreach without the _*O(n^2)*_ complexity, so this change just returns the state to what it was before. In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: {code:scala} ... while (i < rowSize) { val row = rows(i) ... {code} It can be easily converted back into _*O(n)*_ complexity. > Performance issue on thrift API > --- > > Key: SPARK-47085 > URL: https://issues.apache.org/jira/browse/SPARK-47085 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Izek Greenfield >Assignee: Izek Greenfield >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This new complexity was introduced in SPARK-39041. > Before this PR, the code was: > {code:java} > while (curRow < maxRows && iter.hasNext) { > val sparkRow = iter.next() > val row = ArrayBuffer[Any]() > var curCol = 0 > while (curCol < sparkRow.length) { > if (sparkRow.isNullAt(curCol)) { > row += null > } else { > addNonNullColumnValue(sparkRow, row, curCol, timeFormatters) > } > curCol += 1 > } > resultRowSet.addRow(row.toArray.asInstanceOf[Array[Object]]) > curRow += 1 > }{code} > i.e. a single pass without the _*O(n^2)*_ complexity, so this change just returns the > state to what it was before. > > In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: > {code:scala} > ... > while (i < rowSize) { > val row = rows(i) > ... > {code} > It can be easily converted back into _*O(n)*_ complexity. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47085) Performance issue on thrift API
[ https://issues.apache.org/jira/browse/SPARK-47085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818932#comment-17818932 ] Izek Greenfield commented on SPARK-47085: - [~dongjoon] I updated the details > Performance issue on thrift API > --- > > Key: SPARK-47085 > URL: https://issues.apache.org/jira/browse/SPARK-47085 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Izek Greenfield >Assignee: Izek Greenfield >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This new complexity was introduced in SPARK-39041. > Before this PR, the code was: > {code:java} > def toTTableSchema(schema: StructType): TTableSchema = { > val tTableSchema = new TTableSchema() > schema.zipWithIndex.foreach { case (f, i) => > tTableSchema.addToColumns(toTColumnDesc(f, i)) > } > tTableSchema > } {code} > foreach without the _*O(n^2)*_ complexity, so this change just returns the > state to what it was before. > > In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: > {code:scala} > ... > while (i < rowSize) { > val row = rows(i) > ... > {code} > It can be easily converted back into _*O(n)*_ complexity. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47085) Performance issue on thrift API
[ https://issues.apache.org/jira/browse/SPARK-47085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Izek Greenfield updated SPARK-47085: Description: This new complexity was introduced in SPARK-39041. Before this PR, the code was: {code:java} def toTTableSchema(schema: StructType): TTableSchema = { val tTableSchema = new TTableSchema() schema.zipWithIndex.foreach { case (f, i) => tTableSchema.addToColumns(toTColumnDesc(f, i)) } tTableSchema } {code} foreach without the _*O(n^2)*_ complexity, so this change just returns the state to what it was before. In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: {code:scala} ... while (i < rowSize) { val row = rows(i) ... {code} It can be easily converted back into _*O(n)*_ complexity. was: This new complexity was introduced in SPARK-39041. In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: {code:scala} ... while (i < rowSize) { val row = rows(i) ... {code} It can be easily converted back into _*O(n)*_ complexity. > Performance issue on thrift API > --- > > Key: SPARK-47085 > URL: https://issues.apache.org/jira/browse/SPARK-47085 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Izek Greenfield >Assignee: Izek Greenfield >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This new complexity was introduced in SPARK-39041. > Before this PR, the code was: > {code:java} > def toTTableSchema(schema: StructType): TTableSchema = { > val tTableSchema = new TTableSchema() > schema.zipWithIndex.foreach { case (f, i) => > tTableSchema.addToColumns(toTColumnDesc(f, i)) > } > tTableSchema > } {code} > foreach without the _*O(n^2)*_ complexity, so this change just returns the > state to what it was before. > > In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: > {code:scala} > ... > while (i < rowSize) { > val row = rows(i) > ... > {code} > It can be easily converted back into _*O(n)*_ complexity. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47106) Plan canonicalization test serializes/deserializes class that is not serializable
Parth Chandra created SPARK-47106: - Summary: Plan canonicalization test serializes/deserializes class that is not serializable Key: SPARK-47106 URL: https://issues.apache.org/jira/browse/SPARK-47106 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.4.1, 3.4.0 Reporter: Parth Chandra The test {code:java} test("SPARK-23731 plans should be canonicalizable after being (de)serialized"){code} serializes and deserializes {code:java} FileSourceScanExec{code} which is not actually serializable. In particular, FileSourceScanExec.relation is not serializable. The test still passes though. The test below derived from the above shows the issue - {code:java} test("verify FileSourceScanExec (de)serialize") { withSQLConf(SQLConf.USE_V1_SOURCE_LIST.key -> "parquet") { withTempPath { path => spark.range(1).write.parquet(path.getAbsolutePath) val df = spark.read.parquet(path.getAbsolutePath) val fileSourceScanExec = df.queryExecution.sparkPlan.collectFirst { case p: FileSourceScanExec => p }.get val serializer = SparkEnv.get.serializer.newInstance() val relation = serializer.serialize(fileSourceScanExec.relation) assert(relation != null) val deserialized = serializer.deserialize[FileSourceScanExec](serializer.serialize(fileSourceScanExec)) assert(deserialized.relation != null) } } }{code} The test fails with - {code:java} (file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1mgn/T/spark-d534d738-64f1-4eaa-9d9e-8c33374b60f1)) - field (class: org.apache.spark.sql.execution.datasources.HadoopFsRelation, name: location, type: interface org.apache.spark.sql.execution.datasources.FileIndex) - object (class org.apache.spark.sql.execution.datasources.HadoopFsRelation, parquet) java.io.NotSerializableException: org.apache.spark.sql.execution.datasources.InMemoryFileIndex Serialization stack: - object not serializable (class: org.apache.spark.sql.execution.datasources.InMemoryFileIndex, value: org.apache.spark.sql.execution.datasources.InMemoryFileIndex(file:/private/var/folders/bz/gg_fqnmj4c17j2c7mdn8ps1mgn/T/spark-d534d738-64f1-4eaa-9d9e-8c33374b60f1)) - field (class: org.apache.spark.sql.execution.datasources.HadoopFsRelation, name: location, type: interface org.apache.spark.sql.execution.datasources.FileIndex) - object (class org.apache.spark.sql.execution.datasources.HadoopFsRelation, parquet) at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41) at org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:49) at org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:115) at org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$11(SparkPlanSuite.scala:54) at org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$11$adapted(SparkPlanSuite.scala:48) at org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath(SQLHelper.scala:69) at org.apache.spark.sql.catalyst.plans.SQLHelper.withTempPath$(SQLHelper.scala:66) at org.apache.spark.sql.QueryTest.withTempPath(QueryTest.scala:33) at org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$10(SparkPlanSuite.scala:48) at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf(SQLHelper.scala:54) at org.apache.spark.sql.catalyst.plans.SQLHelper.withSQLConf$(SQLHelper.scala:38) at org.apache.spark.sql.execution.SparkPlanSuite.org$apache$spark$sql$test$SQLTestUtilsBase$$super$withSQLConf(SparkPlanSuite.scala:32) at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf(SQLTestUtils.scala:266) at org.apache.spark.sql.test.SQLTestUtilsBase.withSQLConf$(SQLTestUtils.scala:264) at org.apache.spark.sql.execution.SparkPlanSuite.withSQLConf(SparkPlanSuite.scala:32) at org.apache.spark.sql.execution.SparkPlanSuite.$anonfun$new$9(SparkPlanSuite.scala:48) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47105) Spark Container doesn't have spark group or spark user created
Albert Wong created SPARK-47105: --- Summary: Spark Container doesn't have spark group or spark user created Key: SPARK-47105 URL: https://issues.apache.org/jira/browse/SPARK-47105 Project: Spark Issue Type: Bug Components: Kubernetes Affects Versions: 3.4.1 Environment: Using container apache/spark-py:latest Reporter: Albert Wong I see that [https://github.com/apache/spark-docker/blob/431aa516ba58985c902bf2d2a07bf0eaa1df6740/3.4.1/scala2.12-java11-ubuntu/Dockerfile#L19] is supposed to have a spark user and spark group created but checking the container, it doesn't have those uid and gid created. Both should have 185 uid and 185 gid. I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/group root:x:0: daemon:x:1: bin:x:2: sys:x:3: adm:x:4: tty:x:5: disk:x:6: lp:x:7: mail:x:8: news:x:9: uucp:x:10: man:x:12: proxy:x:13: kmem:x:15: dialout:x:20: fax:x:21: voice:x:22: cdrom:x:24: floppy:x:25: tape:x:26: sudo:x:27: audio:x:29: dip:x:30: www-data:x:33: backup:x:34: operator:x:37: list:x:38: irc:x:39: src:x:40: gnats:x:41: shadow:x:42: utmp:x:43: video:x:44: sasl:x:45: plugdev:x:46: staff:x:50: games:x:60: users:x:100: nogroup:x:65534: I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/passwd root:x:0:0:root:/root:/bin/bash daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin bin:x:2:2:bin:/bin:/usr/sbin/nologin sys:x:3:3:sys:/dev:/usr/sbin/nologin sync:x:4:65534:sync:/bin:/bin/sync games:x:5:60:games:/usr/games:/usr/sbin/nologin man:x:6:12:man:/var/cache/man:/usr/sbin/nologin lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin mail:x:8:8:mail:/var/mail:/usr/sbin/nologin news:x:9:9:news:/var/spool/news:/usr/sbin/nologin uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin proxy:x:13:13:proxy:/bin:/usr/sbin/nologin www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin backup:x:34:34:backup:/var/backups:/usr/sbin/nologin list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin _apt:x:100:65534::/nonexistent:/usr/sbin/nologin I have no name!@spark-hudi:/opt/spark/bin$ cat /etc/passwd root:x:0:0:root:/root:/bin/bash daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin bin:x:2:2:bin:/bin:/usr/sbin/nologin sys:x:3:3:sys:/dev:/usr/sbin/nologin sync:x:4:65534:sync:/bin:/bin/sync games:x:5:60:games:/usr/games:/usr/sbin/nologin man:x:6:12:man:/var/cache/man:/usr/sbin/nologin lp:x:7:7:lp:/var/spool/lpd:/usr/sbin/nologin mail:x:8:8:mail:/var/mail:/usr/sbin/nologin news:x:9:9:news:/var/spool/news:/usr/sbin/nologin uucp:x:10:10:uucp:/var/spool/uucp:/usr/sbin/nologin proxy:x:13:13:proxy:/bin:/usr/sbin/nologin www-data:x:33:33:www-data:/var/www:/usr/sbin/nologin backup:x:34:34:backup:/var/backups:/usr/sbin/nologin list:x:38:38:Mailing List Manager:/var/list:/usr/sbin/nologin irc:x:39:39:ircd:/run/ircd:/usr/sbin/nologin gnats:x:41:41:Gnats Bug-Reporting System (admin):/var/lib/gnats:/usr/sbin/nologin nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin _apt:x:100:65534::/nonexistent:/usr/sbin/nologin -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45615) Remove redundant "Auto-application to `()` is deprecated" compile suppression rules.
[ https://issues.apache.org/jira/browse/SPARK-45615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-45615. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45179 [https://github.com/apache/spark/pull/45179] > Remove redundant "Auto-application to `()` is deprecated" compile suppression > rules. > --- > > Key: SPARK-45615 > URL: https://issues.apache.org/jira/browse/SPARK-45615 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0 > > > Due to the issue https://github.com/scalatest/scalatest/issues/2297, we need > to wait until we upgrade to a newer ScalaTest version before removing these > suppression rules. > Maybe 3.2.18 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45615) Remove redundant "Auto-application to `()` is deprecated" compile suppression rules.
[ https://issues.apache.org/jira/browse/SPARK-45615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-45615: - Assignee: Yang Jie > Remove redundant "Auto-application to `()` is deprecated" compile suppression > rules. > --- > > Key: SPARK-45615 > URL: https://issues.apache.org/jira/browse/SPARK-45615 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > > Due to the issue https://github.com/scalatest/scalatest/issues/2297, we need > to wait until we upgrade to a newer ScalaTest version before removing these > suppression rules. > Maybe 3.2.18 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47098) Migrate from AppVeyor to GitHub Actions for SparkR tests on Windows
[ https://issues.apache.org/jira/browse/SPARK-47098?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-47098. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45175 [https://github.com/apache/spark/pull/45175] > Migrate from AppVeyor to GitHub Actions for SparkR tests on Windows > --- > > Key: SPARK-47098 > URL: https://issues.apache.org/jira/browse/SPARK-47098 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Reduce the tools we use for better maintenance -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-47085) Performance issue on Thrift API
[ https://issues.apache.org/jira/browse/SPARK-47085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818878#comment-17818878 ] Dongjoon Hyun commented on SPARK-47085: --- Hi, [~igreenfi] and [~yao]. Could you provide some background on why this is a regression in 3.4.1 and 3.5.0? If this is not a regression in those versions, we should change `Affected Versions` to `4.0.0` because this is an improvement. > Performance issue on Thrift API > --- > > Key: SPARK-47085 > URL: https://issues.apache.org/jira/browse/SPARK-47085 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Izek Greenfield >Assignee: Izek Greenfield >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > This new complexity was introduced in SPARK-39041. > In class `RowSetUtils` there is a loop that has _*O(n^2)*_ complexity: > {code:scala} > ... > while (i < rowSize) { > val row = rows(i) > ... > {code} > It can be easily converted back into _*O(n)*_ complexity. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
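The quadratic cost described above comes from positional indexing into a Scala Seq inside the loop. The following is a minimal sketch, not the actual RowSetUtils code: process is a hypothetical stand-in for the real per-row work, and it assumes rows is backed by a linked sequence such as List, where rows(i) costs O(i).
{code:scala}
object RowLoopSketch {
  // Hypothetical stand-in for the real per-row work in RowSetUtils.
  def process(row: String): Unit = ()

  // Quadratic pattern: for a List, rows(i) walks from the head on every
  // iteration, so the total work is 1 + 2 + ... + n = O(n^2).
  def quadratic(rows: Seq[String]): Unit = {
    val rowSize = rows.size
    var i = 0
    while (i < rowSize) {
      val row = rows(i) // O(i) per access on a List
      process(row)
      i += 1
    }
  }

  // Linear alternative: a single iterator pass touches each element
  // exactly once, restoring O(n) overall.
  def linear(rows: Seq[String]): Unit = {
    val it = rows.iterator
    while (it.hasNext) {
      process(it.next())
    }
  }
}
{code}
Whether the actual fix in the pull request uses an iterator or another constant-time access structure is decided there; the sketch only illustrates why the complexity class changes.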
[jira] [Resolved] (SPARK-46858) Upgrade Pandas to 2.2.0
[ https://issues.apache.org/jira/browse/SPARK-46858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-46858. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 44881 [https://github.com/apache/spark/pull/44881] > Upgrade Pandas to 2.2.0 > --- > > Key: SPARK-46858 > URL: https://issues.apache.org/jira/browse/SPARK-46858 > Project: Spark > Issue Type: Bug > Components: Pandas API on Spark, PySpark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42328) Assign name to _LEGACY_ERROR_TEMP_1175
[ https://issues.apache.org/jira/browse/SPARK-42328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818849#comment-17818849 ] Nikola Mandic commented on SPARK-42328: --- [~maxgekk] Yes, thank you. > Assign name to _LEGACY_ERROR_TEMP_1175 > -- > > Key: SPARK-42328 > URL: https://issues.apache.org/jira/browse/SPARK-42328 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-42328) Assign name to _LEGACY_ERROR_TEMP_1175
[ https://issues.apache.org/jira/browse/SPARK-42328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818844#comment-17818844 ] Max Gekk commented on SPARK-42328: -- @nikolamand-db Would you like to work on this? > Assign name to _LEGACY_ERROR_TEMP_1175 > -- > > Key: SPARK-42328 > URL: https://issues.apache.org/jira/browse/SPARK-42328 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-42328) Assign name to _LEGACY_ERROR_TEMP_1175
[ https://issues.apache.org/jira/browse/SPARK-42328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818844#comment-17818844 ] Max Gekk edited comment on SPARK-42328 at 2/20/24 2:53 PM: --- [~nikolamand-db] Would you like to work on this? was (Author: maxgekk): @nikolamand-db Would you like to work on this? > Assign name to _LEGACY_ERROR_TEMP_1175 > -- > > Key: SPARK-42328 > URL: https://issues.apache.org/jira/browse/SPARK-42328 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Haejoon Lee >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47104) Spark SQL query fails with NullPointerException
Chhavi Bansal created SPARK-47104: - Summary: Spark SQL query fails with NullPointerException Key: SPARK-47104 URL: https://issues.apache.org/jira/browse/SPARK-47104 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1 Reporter: Chhavi Bansal I am trying to run a very simple SQL query involving a join and an ORDER BY clause, using the UUID() function in the outermost SELECT statement. The query fails: {code:java} val df = spark.read.format("csv").option("header", "true").load("src/main/resources/titanic.csv") df.createOrReplaceTempView("titanic") val query = spark.sql(" select name, uuid() as _iid from (select s.name from titanic s join titanic t on s.name = t.name order by name) ;") query.show() // FAILS{code} The dataset is a normal CSV file with the following columns: {code:java} PassengerId,Survived,Pclass,Name,Gender,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked {code} Below is the error: {code:java} Exception in thread "main" java.lang.NullPointerException at org.apache.spark.sql.catalyst.expressions.GeneratedClass$SpecificUnsafeProjection.apply(Unknown Source) at org.apache.spark.sql.execution.TakeOrderedAndProjectExec.$anonfun$executeCollect$2(limit.scala:207) at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237) at scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) at scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:198) at scala.collection.TraversableLike.map(TraversableLike.scala:237) at scala.collection.TraversableLike.map$(TraversableLike.scala:230) at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:198) at org.apache.spark.sql.execution.TakeOrderedAndProjectExec.executeCollect(limit.scala:207) at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.$anonfun$executeCollect$1(AdaptiveSparkPlanExec.scala:338) at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.withFinalPlanUpdate(AdaptiveSparkPlanExec.scala:366) at org.apache.spark.sql.execution.adaptive.AdaptiveSparkPlanExec.executeCollect(AdaptiveSparkPlanExec.scala:338) at org.apache.spark.sql.Dataset.collectFromPlan(Dataset.scala:3715) at org.apache.spark.sql.Dataset.$anonfun$head$1(Dataset.scala:2728) at org.apache.spark.sql.Dataset.$anonfun$withAction$1(Dataset.scala:3706) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.Dataset.withAction(Dataset.scala:3704) at org.apache.spark.sql.Dataset.head(Dataset.scala:2728) at org.apache.spark.sql.Dataset.take(Dataset.scala:2935) at org.apache.spark.sql.Dataset.getRows(Dataset.scala:287) at org.apache.spark.sql.Dataset.showString(Dataset.scala:326) at org.apache.spark.sql.Dataset.show(Dataset.scala:808) at org.apache.spark.sql.Dataset.show(Dataset.scala:785) at hyperspace2.sparkPlan$.delayedEndpoint$hyperspace2$sparkPlan$1(sparkPlan.scala:14) at hyperspace2.sparkPlan$delayedInit$body.apply(sparkPlan.scala:6) at scala.Function0.apply$mcV$sp(Function0.scala:39) at scala.Function0.apply$mcV$sp$(Function0.scala:39) at 
scala.runtime.AbstractFunction0.apply$mcV$sp(AbstractFunction0.scala:17) at scala.App.$anonfun$main$1$adapted(App.scala:80) at scala.collection.immutable.List.foreach(List.scala:392) at scala.App.main(App.scala:80) at scala.App.main$(App.scala:78) at hyperspace2.sparkPlan$.main(sparkPlan.scala:6) at hyperspace2.sparkPlan.main(sparkPlan.scala) {code} Note: # If I remove the ORDER BY clause, the query produces the correct output. # This happens when I read the dataset from a CSV file; it works fine if I build the DataFrame using Seq().toDF. # The query fails when I use spark.sql("query").show() but succeeds when I simply write the result to a CSV file. [https://stackoverflow.com/questions/78020267/spark-sql-query-fails-with-nullpointerexception] Could someone please look into why this happens only when using `show()`, since this is failing queries in production for me. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
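A possible user-side workaround, offered as a sketch rather than a confirmed fix: the trace points at the non-deterministic uuid() projection being evaluated inside TakeOrderedAndProjectExec, so materializing the ordered join result before adding the uuid() column may sidestep the failing operator. The titanic view and column names follow the report; using cache() as the materialization point is an assumption.
{code:scala}
import org.apache.spark.sql.functions.expr

// Build the ordered join result first, without the uuid() projection.
val base = spark.sql(
  """select s.name from titanic s
    |join titanic t on s.name = t.name
    |order by name""".stripMargin)

// Caching forces the ordered result to materialize before the
// non-deterministic uuid() column is added as a separate projection.
val withId = base.cache().withColumn("_iid", expr("uuid()"))
withId.show()
{code}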
[jira] [Assigned] (SPARK-47044) Add JDBC query to explain formatted command
[ https://issues.apache.org/jira/browse/SPARK-47044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-47044: --- Assignee: Uros Stankovic > Add JDBC query to explain formatted command > --- > > Key: SPARK-47044 > URL: https://issues.apache.org/jira/browse/SPARK-47044 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Uros Stankovic >Assignee: Uros Stankovic >Priority: Major > Labels: pull-request-available > > Add the generated JDBC query to the EXPLAIN FORMATTED output when the physical Scan node > accesses a JDBC source to create the RDD. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47044) Add JDBC query to explain formatted command
[ https://issues.apache.org/jira/browse/SPARK-47044?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-47044. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45102 [https://github.com/apache/spark/pull/45102] > Add JDBC query to explain formatted command > --- > > Key: SPARK-47044 > URL: https://issues.apache.org/jira/browse/SPARK-47044 > Project: Spark > Issue Type: Task > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Uros Stankovic >Assignee: Uros Stankovic >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > > Add the generated JDBC query to the EXPLAIN FORMATTED output when the physical Scan node > accesses a JDBC source to create the RDD. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
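To see where the change surfaces, here is a hedged sketch of inspecting such a plan from spark-shell; the connection options are placeholders, and the exact rendering of the generated query in the formatted plan is defined by the pull request, not shown here.
{code:scala}
// Placeholder JDBC connection options; any JDBC source behaves alike.
val jdbcDF = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://db-host:5432/mydb")
  .option("dbtable", "public.events")
  .option("user", "spark")
  .option("password", "secret")
  .load()

// With this change, the formatted plan of a JDBC-backed scan should
// also show the generated external query for the Scan node.
jdbcDF.where("event_type = 'click'")
  .select("event_id")
  .explain("formatted")
{code}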
[jira] [Updated] (SPARK-47103) Make the default storage level of intermediate datasets for MLlib configurable
[ https://issues.apache.org/jira/browse/SPARK-47103?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47103: --- Labels: pull-request-available (was: ) > Make the default storage level of intermediate datasets for MLlib configurable > -- > > Key: SPARK-47103 > URL: https://issues.apache.org/jira/browse/SPARK-47103 > Project: Spark > Issue Type: Improvement > Components: MLlib >Affects Versions: 4.0.0 >Reporter: Cheng Pan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47103) Make the default storage level of intermediate datasets for MLlib configurable
Cheng Pan created SPARK-47103: - Summary: Make the default storage level of intermediate datasets for MLlib configurable Key: SPARK-47103 URL: https://issues.apache.org/jira/browse/SPARK-47103 Project: Spark Issue Type: Improvement Components: MLlib Affects Versions: 4.0.0 Reporter: Cheng Pan -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
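For context, a sketch of the status quo this ticket generalizes: a few estimators, such as ALS, already expose per-instance storage-level setters, while many other MLlib algorithms hard-code the storage level of their intermediate datasets. The global configuration key the ticket proposes is not named here, so only the existing per-algorithm knobs are shown.
{code:scala}
import org.apache.spark.ml.recommendation.ALS

// Existing per-algorithm knobs on ALS; most other MLlib estimators
// offer no equivalent, which is what this ticket aims to change.
val als = new ALS()
  .setUserCol("user")
  .setItemCol("item")
  .setRatingCol("rating")
  .setIntermediateStorageLevel("MEMORY_AND_DISK")
  .setFinalStorageLevel("MEMORY_AND_DISK")
{code}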
[jira] [Updated] (SPARK-46992) Inconsistent results with 'sort', 'cache', and AQE.
[ https://issues.apache.org/jira/browse/SPARK-46992?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-46992: --- Labels: correctness pull-request-available (was: correctness) > Inconsistent results with 'sort', 'cache', and AQE. > --- > > Key: SPARK-46992 > URL: https://issues.apache.org/jira/browse/SPARK-46992 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.5.0 >Reporter: Denis Tarima >Priority: Critical > Labels: correctness, pull-request-available > > > With AQE enabled, having {color:#4c9aff}sort{color} in the plan changes > {color:#4c9aff}sample{color} results after caching. > Moreover, when cached, {color:#4c9aff}collect{color} returns records as if > it's not cached, which is inconsistent with {color:#4c9aff}count{color} and > {color:#4c9aff}show{color}. > A script to reproduce: > {code:scala} > import spark.implicits._ > val df = (1 to 4).toDF("id").sort("id").sample(0.4, 123) > println("NON CACHED:") > println(" count: " + df.count()) > println(" collect: " + df.collect().mkString(" ")) > df.show() > println("CACHED:") > df.cache().count() > println(" count: " + df.count()) > println(" collect: " + df.collect().mkString(" ")) > df.show() > df.unpersist() > {code} > output: > {code:java} > NON CACHED: > count: 2 > collect: [1] [4] > +---+ > | id| > +---+ > | 1| > | 4| > +---+ > CACHED: > count: 3 > collect: [1] [4] > +---+ > | id| > +---+ > | 1| > | 2| > | 3| > +---+ > {code} > BTW, disabling AQE > [{color:#4c9aff}spark.conf.set("spark.databricks.optimizer.adaptive.enabled", > "false"){color}] helps on Databricks clusters, but locally it has no effect, > at least on Spark 3.3.2. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-46992) Inconsistent results with 'sort', 'cache', and AQE.
[ https://issues.apache.org/jira/browse/SPARK-46992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818784#comment-17818784 ] Jie Han commented on SPARK-46992: - This is because the second collect() reuses qe.executedPlan, a lazy val that was already initialized by the first collect() call. Consider this code: {code:java} df.collect() // qe.executedPlan is first initialized here df.cache() df.collect() // reuses the already-initialized qe.executedPlan{code} > Inconsistent results with 'sort', 'cache', and AQE. > --- > > Key: SPARK-46992 > URL: https://issues.apache.org/jira/browse/SPARK-46992 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.2, 3.5.0 >Reporter: Denis Tarima >Priority: Critical > Labels: correctness > > > With AQE enabled, having {color:#4c9aff}sort{color} in the plan changes > {color:#4c9aff}sample{color} results after caching. > Moreover, when cached, {color:#4c9aff}collect{color} returns records as if > it's not cached, which is inconsistent with {color:#4c9aff}count{color} and > {color:#4c9aff}show{color}. > A script to reproduce: > {code:scala} > import spark.implicits._ > val df = (1 to 4).toDF("id").sort("id").sample(0.4, 123) > println("NON CACHED:") > println(" count: " + df.count()) > println(" collect: " + df.collect().mkString(" ")) > df.show() > println("CACHED:") > df.cache().count() > println(" count: " + df.count()) > println(" collect: " + df.collect().mkString(" ")) > df.show() > df.unpersist() > {code} > output: > {code:java} > NON CACHED: > count: 2 > collect: [1] [4] > +---+ > | id| > +---+ > | 1| > | 4| > +---+ > CACHED: > count: 3 > collect: [1] [4] > +---+ > | id| > +---+ > | 1| > | 2| > | 3| > +---+ > {code} > BTW, disabling AQE > [{color:#4c9aff}spark.conf.set("spark.databricks.optimizer.adaptive.enabled", > "false"){color}] helps on Databricks clusters, but locally it has no effect, > at least on Spark 3.3.2. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
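The mechanism in that comment is easy to demonstrate outside Spark. The class below is a made-up stand-in for QueryExecution, not Spark code; pasted into a Scala REPL it shows that a lazy val freezes whatever it computed on first access and is never re-evaluated.
{code:scala}
// Made-up stand-in for QueryExecution; not Spark code.
class QueryExecutionLike(var cached: Boolean) {
  lazy val executedPlan: String =
    if (cached) "InMemoryTableScan" else "FullScan"
}

val qe = new QueryExecutionLike(cached = false)
println(qe.executedPlan) // "FullScan" -- the lazy val initializes here
qe.cached = true         // analogous to calling df.cache() afterwards
println(qe.executedPlan) // still "FullScan": lazy vals never recompute
{code}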
[jira] [Updated] (SPARK-47009) Create table with collation
[ https://issues.apache.org/jira/browse/SPARK-47009?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Kandic updated SPARK-47009: -- Epic Link: SPARK-46830 > Create table with collation > --- > > Key: SPARK-47009 > URL: https://issues.apache.org/jira/browse/SPARK-47009 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > > Add support for creating table with columns containing non-default collated > data -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47102) Add COLLATION_ENABLED config flag
[ https://issues.apache.org/jira/browse/SPARK-47102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Kandic updated SPARK-47102: -- Epic Link: SPARK-46830 > Add COLLATION_ENABLED config flag > - > > Key: SPARK-47102 > URL: https://issues.apache.org/jira/browse/SPARK-47102 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Mihailo Milosevic >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47015) Disable partitioning on collated columns
[ https://issues.apache.org/jira/browse/SPARK-47015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stefan Kandic updated SPARK-47015: -- Epic Link: SPARK-46830 > Disable partitioning on collated columns > > > Key: SPARK-47015 > URL: https://issues.apache.org/jira/browse/SPARK-47015 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Stefan Kandic >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47102) Add COLLATION_ENABLED config flag
Mihailo Milosevic created SPARK-47102: - Summary: Add COLLATION_ENABLED config flag Key: SPARK-47102 URL: https://issues.apache.org/jira/browse/SPARK-47102 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 4.0.0 Reporter: Mihailo Milosevic -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47079) Unable to create PySpark dataframe containing Variant columns
[ https://issues.apache.org/jira/browse/SPARK-47079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47079: --- Labels: pull-request-available (was: ) > Unable to create PySpark dataframe containing Variant columns > - > > Key: SPARK-47079 > URL: https://issues.apache.org/jira/browse/SPARK-47079 > Project: Spark > Issue Type: Bug > Components: Connect, PySpark, SQL >Affects Versions: 3.5.0 >Reporter: Desmond Cheong >Priority: Major > Labels: pull-request-available > > Trying to create a dataframe containing a variant type results in: > AssertionError: Undefined error message parameter for error class: > CANNOT_PARSE_DATATYPE. Parameters: {'error': "Undefined error message > parameter for error class: CANNOT_PARSE_DATATYPE. Parameters: > {'error': 'variant'} > "} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44149) Support DataFrame Merge API
[ https://issues.apache.org/jira/browse/SPARK-44149?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818746#comment-17818746 ] Hussein Awala commented on SPARK-44149: --- Is this a duplicate of SPARK-46207, which was fixed by [#44119|https://github.com/apache/spark/pull/44119], or is it a different Merge API? > Support DataFrame Merge API > --- > > Key: SPARK-44149 > URL: https://issues.apache.org/jira/browse/SPARK-44149 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Huaxin Gao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024
[ https://issues.apache.org/jira/browse/SPARK-43259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17818730#comment-17818730 ] Mihailo Milosevic commented on SPARK-43259: --- I want to work on this issue and have raised a PR for it: https://github.com/apache/spark/pull/45095 > Assign a name to the error class _LEGACY_ERROR_TEMP_2024 > > > Key: SPARK-43259 > URL: https://issues.apache.org/jira/browse/SPARK-43259 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
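For contributors picking up these starter tickets, such a test usually takes the shape below. This is a sketch only, assumed to live inside a Spark SQL test suite (e.g. one extending QueryTest with SharedSparkSession, which provides checkError, intercept, and sql); the error class name, the triggering query, and the parameter map are placeholders, since the actual rename of _LEGACY_ERROR_TEMP_2024 is decided in the PR.
{code:scala}
// Placeholder names throughout; NEW_ERROR_CLASS_NAME is not the real
// name chosen for _LEGACY_ERROR_TEMP_2024.
checkError(
  exception = intercept[org.apache.spark.SparkException] {
    sql("SELECT some_query_that_triggers_the_error").collect()
  },
  errorClass = "NEW_ERROR_CLASS_NAME",
  parameters = Map("someParam" -> "someValue"))
{code}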
[jira] [Updated] (SPARK-47101) HiveExternalCatalog.verifyDataSchema does not fully comply with hive column name rules
[ https://issues.apache.org/jira/browse/SPARK-47101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-47101: --- Labels: pull-request-available (was: ) > HiveExternalCatalog.verifyDataSchema does not fully comply with hive column > name rules > -- > > Key: SPARK-47101 > URL: https://issues.apache.org/jira/browse/SPARK-47101 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.4.2, 3.5.0, 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47101) HiveExternalCatalog.verifyDataSchema does not fully comply with hive column name rules
[ https://issues.apache.org/jira/browse/SPARK-47101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-47101: - Summary: HiveExternalCatalog.verifyDataSchema does not fully comply with hive column name rules (was: HiveExternalCatalog.verifyDataSchema does not fully comply hive column name rules) > HiveExternalCatalog.verifyDataSchema does not fully comply with hive column > name rules > -- > > Key: SPARK-47101 > URL: https://issues.apache.org/jira/browse/SPARK-47101 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.4.2, 3.5.0, 4.0.0 >Reporter: Kent Yao >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47101) HiveExternalCatalog.verifyDataSchema does not fully comply hive column name rules
Kent Yao created SPARK-47101: Summary: HiveExternalCatalog.verifyDataSchema does not fully comply hive column name rules Key: SPARK-47101 URL: https://issues.apache.org/jira/browse/SPARK-47101 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.5.0, 3.4.2, 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024
[ https://issues.apache.org/jira/browse/SPARK-43259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-43259: -- Assignee: (was: Apache Spark) > Assign a name to the error class _LEGACY_ERROR_TEMP_2024 > > > Key: SPARK-43259 > URL: https://issues.apache.org/jira/browse/SPARK-43259 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46743) Count bug introduced for scalar subquery when using TEMPORARY VIEW, as compared to using table
[ https://issues.apache.org/jira/browse/SPARK-46743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46743: -- Assignee: (was: Apache Spark) > Count bug introduced for scalar subquery when using TEMPORARY VIEW, as > compared to using table > -- > > Key: SPARK-46743 > URL: https://issues.apache.org/jira/browse/SPARK-46743 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.5.0 >Reporter: Andy Lam >Priority: Major > Labels: pull-request-available > > Using the temp view reproduces the COUNT bug: it returns nulls instead of 0. > With a table: > {code:java} > scala> spark.sql("""CREATE TABLE outer_table USING parquet AS SELECT * FROM > VALUES > | (1, 1), > | (2, 1), > | (3, 3), > | (6, 6), > | (7, 7), > | (9, 9) AS inner_table(a, b)""") > val res6: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("CREATE TABLE null_table USING parquet AS SELECT CAST(null > AS int) AS a, CAST(null as int) AS b ;") > val res7: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("""SELECT ( SELECT COUNT(null_table.a) AS aggAlias FROM > null_table WHERE null_table.a = outer_table.a) FROM outer_table""").collect() > val res8: Array[org.apache.spark.sql.Row] = Array([0], [0], [0], [0], [0], > [0]) {code} > With a view: > > {code:java} > spark.sql("CREATE TEMPORARY VIEW outer_view(a, b) AS VALUES (1, 1), (2, > 1),(3, 3), (6, 6), (7, 7), (9, 9);") > spark.sql("CREATE TEMPORARY VIEW null_view(a, b) AS SELECT CAST(null AS int), > CAST(null as int);") > spark.sql("""SELECT ( SELECT COUNT(null_view.a) AS aggAlias FROM null_view > WHERE null_view.a = outer_view.a) FROM outer_view""").collect() > val res2: Array[org.apache.spark.sql.Row] = Array([null], [null], [null], > [null], [null], [null]){code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-46743) Count bug introduced for scalar subquery when using TEMPORARY VIEW, as compared to using table
[ https://issues.apache.org/jira/browse/SPARK-46743?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-46743: -- Assignee: Apache Spark > Count bug introduced for scalar subquery when using TEMPORARY VIEW, as > compared to using table > -- > > Key: SPARK-46743 > URL: https://issues.apache.org/jira/browse/SPARK-46743 > Project: Spark > Issue Type: Bug > Components: Optimizer >Affects Versions: 3.5.0 >Reporter: Andy Lam >Assignee: Apache Spark >Priority: Major > Labels: pull-request-available > > Using the temp view reproduces the COUNT bug: it returns nulls instead of 0. > With a table: > {code:java} > scala> spark.sql("""CREATE TABLE outer_table USING parquet AS SELECT * FROM > VALUES > | (1, 1), > | (2, 1), > | (3, 3), > | (6, 6), > | (7, 7), > | (9, 9) AS inner_table(a, b)""") > val res6: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("CREATE TABLE null_table USING parquet AS SELECT CAST(null > AS int) AS a, CAST(null as int) AS b ;") > val res7: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("""SELECT ( SELECT COUNT(null_table.a) AS aggAlias FROM > null_table WHERE null_table.a = outer_table.a) FROM outer_table""").collect() > val res8: Array[org.apache.spark.sql.Row] = Array([0], [0], [0], [0], [0], > [0]) {code} > With a view: > > {code:java} > spark.sql("CREATE TEMPORARY VIEW outer_view(a, b) AS VALUES (1, 1), (2, > 1),(3, 3), (6, 6), (7, 7), (9, 9);") > spark.sql("CREATE TEMPORARY VIEW null_view(a, b) AS SELECT CAST(null AS int), > CAST(null as int);") > spark.sql("""SELECT ( SELECT COUNT(null_view.a) AS aggAlias FROM null_view > WHERE null_view.a = outer_view.a) FROM outer_view""").collect() > val res2: Array[org.apache.spark.sql.Row] = Array([null], [null], [null], > [null], [null], [null]){code} > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
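Until the underlying decorrelation is fixed, one user-side mitigation, offered as a sketch under the assumption that only the NULL-vs-0 result for empty groups is at issue, is to coalesce the scalar subquery explicitly so both the table and the view path return 0:
{code:scala}
// Workaround sketch, not the fix tracked by this ticket: COALESCE maps
// the incorrectly produced NULL back to 0 for empty groups.
spark.sql(
  """SELECT COALESCE(
    |  (SELECT COUNT(null_view.a) FROM null_view
    |   WHERE null_view.a = outer_view.a), 0) AS aggAlias
    |FROM outer_view""".stripMargin).collect()
{code}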
[jira] [Assigned] (SPARK-43259) Assign a name to the error class _LEGACY_ERROR_TEMP_2024
[ https://issues.apache.org/jira/browse/SPARK-43259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-43259: -- Assignee: Apache Spark > Assign a name to the error class _LEGACY_ERROR_TEMP_2024 > > > Key: SPARK-43259 > URL: https://issues.apache.org/jira/browse/SPARK-43259 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2024* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47100) Upgrade netty to 4.1.107.Final and netty-tcnative to 2.0.62.Final
[ https://issues.apache.org/jira/browse/SPARK-47100?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-47100. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 45178 [https://github.com/apache/spark/pull/45178] > Upgrade netty to 4.1.107.Final and netty-tcnative to 2.0.62.Final > - > > Key: SPARK-47100 > URL: https://issues.apache.org/jira/browse/SPARK-47100 > Project: Spark > Issue Type: Sub-task > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org