[jira] [Resolved] (SPARK-39488) Simplify the error handling of TempResolvedColumn
[ https://issues.apache.org/jira/browse/SPARK-39488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-39488.
---------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 36809
[https://github.com/apache/spark/pull/36809]

> Simplify the error handling of TempResolvedColumn
> -------------------------------------------------
>
>                 Key: SPARK-39488
>                 URL: https://issues.apache.org/jira/browse/SPARK-39488
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Wenchen Fan
>            Assignee: Wenchen Fan
>            Priority: Major
>             Fix For: 3.4.0
[jira] [Assigned] (SPARK-39488) Simplify the error handling of TempResolvedColumn
[ https://issues.apache.org/jira/browse/SPARK-39488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-39488:
-----------------------------------
    Assignee: Wenchen Fan
[jira] [Assigned] (SPARK-39490) Support `ipFamilyPolicy` and `ipFamilies` in Driver Service
[ https://issues.apache.org/jira/browse/SPARK-39490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39490:
------------------------------------
    Assignee: Dongjoon Hyun  (was: Apache Spark)

> Support `ipFamilyPolicy` and `ipFamilies` in Driver Service
> -----------------------------------------------------------
>
>                 Key: SPARK-39490
>                 URL: https://issues.apache.org/jira/browse/SPARK-39490
>             Project: Spark
>          Issue Type: Sub-task
>          Components: Kubernetes
>    Affects Versions: 3.4.0
>            Reporter: Dongjoon Hyun
>            Assignee: Dongjoon Hyun
>            Priority: Major
>
> The K8s IPv4/IPv6 dual-stack feature reached the `Stable` stage in v1.23:
> - https://kubernetes.io/docs/concepts/services-networking/dual-stack/
> -- v1.16 [alpha]
> -- v1.21 [beta]
> -- v1.23 [stable]
> To support IPv6-only environments, we need to control this feature.
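The linked pull request is not quoted in this digest, so the exact configuration names are an assumption; the sketch below only illustrates how service-level dual-stack settings would plausibly be exposed via Spark conf keys modeled on the issue title. On the Kubernetes side, `ipFamilyPolicy` and `ipFamilies` are real fields of the Service spec.

{code:scala}
import org.apache.spark.SparkConf

// Hypothetical sketch: the conf key names below are assumptions modeled on
// the issue title, not confirmed from the pull request.
val conf = new SparkConf()
  .set("spark.kubernetes.driver.service.ipFamilyPolicy", "PreferDualStack")
  .set("spark.kubernetes.driver.service.ipFamilies", "IPv6")
{code}

On an IPv6-only cluster, settings like these would let the driver Service be created with `ipFamilies: [IPv6]` instead of the Kubernetes default of `IPv4`.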
[jira] [Commented] (SPARK-39490) Support `ipFamilyPolicy` and `ipFamilies` in Driver Service
[ https://issues.apache.org/jira/browse/SPARK-39490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554911#comment-17554911 ]

Apache Spark commented on SPARK-39490:
--------------------------------------
User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/36887
[jira] [Assigned] (SPARK-39490) Support `ipFamilyPolicy` and `ipFamilies` in Driver Service
[ https://issues.apache.org/jira/browse/SPARK-39490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39490:
------------------------------------
    Assignee: Apache Spark  (was: Dongjoon Hyun)
[jira] [Updated] (SPARK-39490) Support `ipFamilyPolicy` and `ipFamilies` in Driver Service
[ https://issues.apache.org/jira/browse/SPARK-39490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-39490:
----------------------------------
    Description: the K8s dual-stack description quoted above, with the closing line "To support IPv6-only environments, we need to control this feature." appended  (was: the same text without that closing line)
[jira] [Updated] (SPARK-39490) Support `ipFamilyPolicy` and `ipFamilies` in Driver Service
[ https://issues.apache.org/jira/browse/SPARK-39490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-39490:
----------------------------------
    Description: set to the K8s dual-stack stage history quoted above (v1.16 alpha, v1.21 beta, v1.23 stable)
[jira] [Updated] (SPARK-39490) Support `ipFamilyPolicy` and `ipFamilies` in Driver Service
[ https://issues.apache.org/jira/browse/SPARK-39490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun updated SPARK-39490:
----------------------------------
    Summary: Support `ipFamilyPolicy` and `ipFamilies` in Driver Service  (was: Support ipFamilyPolicy and ipFamilies in Driver Service)
[jira] [Assigned] (SPARK-39490) Support ipFamilyPolicy and ipFamilies in Driver Service
[ https://issues.apache.org/jira/browse/SPARK-39490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-39490:
-------------------------------------
    Assignee: Dongjoon Hyun
[jira] [Created] (SPARK-39490) Support ipFamilyPolicy and ipFamilies in Driver Service
Dongjoon Hyun created SPARK-39490:
----------------------------------

             Summary: Support ipFamilyPolicy and ipFamilies in Driver Service
                 Key: SPARK-39490
                 URL: https://issues.apache.org/jira/browse/SPARK-39490
             Project: Spark
          Issue Type: Sub-task
          Components: Kubernetes
    Affects Versions: 3.4.0
            Reporter: Dongjoon Hyun
[jira] [Commented] (SPARK-39399) proxy-user support not working for Spark on k8s in cluster deploy mode
[ https://issues.apache.org/jira/browse/SPARK-39399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554904#comment-17554904 ]

pralabhkumar commented on SPARK-39399:
--------------------------------------
ping [~hyukjin.kwon], please help us with this, or point us to someone who can take it forward.

> proxy-user support not working for Spark on k8s in cluster deploy mode
> ----------------------------------------------------------------------
>
>                 Key: SPARK-39399
>                 URL: https://issues.apache.org/jira/browse/SPARK-39399
>             Project: Spark
>          Issue Type: Bug
>          Components: Kubernetes, Spark Core
>    Affects Versions: 3.2.0
>            Reporter: Shrikant
>            Priority: Major
>
> As part of https://issues.apache.org/jira/browse/SPARK-25355, proxy-user
> support was added for Spark on K8s, but the PR only added the proxy user on
> the spark-submit command to the childArgs. The actual functionality of
> authenticating as the proxy user does not work in cluster deploy mode for
> Spark on K8s: we get an AccessControlException when trying to access
> kerberized HDFS through a proxy user.
>
> Spark-Submit:
> {code}
> $SPARK_HOME/bin/spark-submit \
> --master \
> --deploy-mode cluster \
> --name with_proxy_user_di \
> --proxy-user \
> --class org.apache.spark.examples.SparkPi \
> --conf spark.kubernetes.container.image= \
> --conf spark.kubernetes.driver.podTemplateFile=driver.yaml \
> --conf spark.kubernetes.executor.podTemplateFile=executor.yaml \
> --conf spark.kubernetes.driver.limit.cores=1 \
> --conf spark.executor.instances=1 \
> --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark \
> --conf spark.kubernetes.namespace= \
> --conf spark.kubernetes.kerberos.krb5.path=/etc/krb5.conf \
> --conf spark.eventLog.enabled=true \
> --conf spark.eventLog.dir=hdfs:///scaas/shs_logs \
> --conf spark.kubernetes.file.upload.path=hdfs:///tmp \
> --conf spark.kubernetes.container.image.pullPolicy=Always \
> --conf spark.driver.extraJavaOptions=-Dlog4j.configuration=file:///opt/log4j/log4j.properties \
> $SPARK_HOME/examples/jars/spark-examples_2.12-3.2.0-1.jar
> {code}
>
> Driver Logs:
> {code:java}
> ++ id -u
> + myuid=185
> ++ id -g
> + mygid=0
> + set +e
> ++ getent passwd 185
> + uidentry=
> + set -e
> + '[' -z '' ']'
> + '[' -w /etc/passwd ']'
> + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
> + SPARK_CLASSPATH=':/opt/spark/jars/*'
> + env
> + grep SPARK_JAVA_OPT_
> + sort -t_ -k4 -n
> + sed 's/[^=]*=\(.*\)/\1/g'
> + readarray -t SPARK_EXECUTOR_JAVA_OPTS
> + '[' -n '' ']'
> + '[' -z ']'
> + '[' -z ']'
> + '[' -n '' ']'
> + '[' -z x ']'
> + SPARK_CLASSPATH='/opt/hadoop/conf::/opt/spark/jars/*'
> + '[' -z x ']'
> + SPARK_CLASSPATH='/opt/spark/conf:/opt/hadoop/conf::/opt/spark/jars/*'
> + case "$1" in
> + shift 1
> + CMD=("$SPARK_HOME/bin/spark-submit" --conf "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client "$@")
> + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf spark.driver.bindAddress= --deploy-mode client --proxy-user proxy_user --properties-file /opt/spark/conf/spark.properties --class org.apache.spark.examples.SparkPi spark-internal
> WARNING: An illegal reflective access operation has occurred
> WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/opt/spark/jars/spark-unsafe_2.12-3.2.0-1.jar) to constructor java.nio.DirectByteBuffer(long,int)
> WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
> WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
> WARNING: All illegal access operations will be denied in a future release
> 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginSuccess with annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", sampleName="Ops", always=false, type=DEFAULT, value={"Rate of successful kerberos logins and latency (milliseconds)"}, valueName="Time")
> 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.loginFailure with annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", sampleName="Ops", always=false, type=DEFAULT, value={"Rate of failed kerberos logins and latency (milliseconds)"}, valueName="Time")
> 22/04/26 08:54:38 DEBUG MutableMetricsFactory: field org.apache.hadoop.metrics2.lib.MutableRate org.apache.hadoop.security.UserGroupInformation$UgiMetrics.getGroups with annotation @org.apache.hadoop.metrics2.annotation.Metric(about="", sampleName="Ops", always=false, type=DEFAULT, value={"GetGroups"}, valueName="Time")
> 22/04/26 08:54:38 DEBUG MutableMetricsFactory: fi
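For context, here is a minimal sketch of the Hadoop UGI impersonation pattern that --proxy-user is expected to trigger in the driver. This is not the Spark code in question; the user name is taken from the logs above and the body of run() is a placeholder.

{code:scala}
import java.security.PrivilegedExceptionAction
import org.apache.hadoop.security.UserGroupInformation

// Minimal sketch of Hadoop's proxy-user pattern. If the K8s driver never
// wraps its work in a doAs like this, HDFS calls carry the submitting
// user's credentials and kerberized HDFS raises AccessControlException.
val realUser = UserGroupInformation.getLoginUser()
val proxyUgi = UserGroupInformation.createProxyUser("proxy_user", realUser)
proxyUgi.doAs(new PrivilegedExceptionAction[Unit] {
  override def run(): Unit = {
    // Placeholder: filesystem access here would run as proxy_user.
  }
})
{code}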
[jira] [Commented] (SPARK-39074) Fail on uploading test files, not when downloading them
[ https://issues.apache.org/jira/browse/SPARK-39074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554899#comment-17554899 ]

Hyukjin Kwon commented on SPARK-39074:
--------------------------------------
Reverted at https://github.com/apache/spark/commit/ae10ff8837385871c3f72b2b7bb97dd235872602

> Fail on uploading test files, not when downloading them
> --------------------------------------------------------
>
>                 Key: SPARK-39074
>                 URL: https://issues.apache.org/jira/browse/SPARK-39074
>             Project: Spark
>          Issue Type: Improvement
>          Components: Project Infra
>    Affects Versions: 3.4.0
>            Reporter: Enrico Minack
>            Priority: Minor
>
> The CI workflow "Report test results" fails when there are no artifacts to
> be downloaded from the triggering workflow. In some situations, the
> triggering workflow is not skipped, but all test jobs are skipped because no
> code changes are detected. In that situation, no test files are uploaded,
> which makes the triggered workflow fail.
> Downloading no test files can have two reasons:
> 1. No tests have been executed or no test files have been generated.
> 2. No code has been built and tested, deliberately.
> You want to be notified in the first situation so you can fix the CI.
> Therefore, CI should fail when code is built and tests are run but no test
> result files are found.
[jira] [Assigned] (SPARK-39383) Support V2 data sources with DEFAULT values
[ https://issues.apache.org/jira/browse/SPARK-39383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang reassigned SPARK-39383:
--------------------------------------
    Assignee: Daniel

> Support V2 data sources with DEFAULT values
> -------------------------------------------
>
>                 Key: SPARK-39383
>                 URL: https://issues.apache.org/jira/browse/SPARK-39383
>             Project: Spark
>          Issue Type: Sub-task
>          Components: SQL
>    Affects Versions: 3.4.0
>            Reporter: Daniel
>            Assignee: Daniel
>            Priority: Major
>             Fix For: 3.4.0
[jira] [Resolved] (SPARK-39383) Support V2 data sources with DEFAULT values
[ https://issues.apache.org/jira/browse/SPARK-39383?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gengliang Wang resolved SPARK-39383.
------------------------------------
    Fix Version/s: 3.4.0
       Resolution: Fixed

Issue resolved by pull request 36880
[https://github.com/apache/spark/pull/36880]
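The ticket carries no description, so the following is only an illustrative sketch of what column DEFAULT support looks like at the SQL level. It assumes an active SparkSession `spark`; the table name and provider are hypothetical, and DEFAULT-column support must be enabled for the provider in use.

{code:scala}
// Illustrative sketch only: table name and provider are hypothetical.
spark.sql("CREATE TABLE t (id INT, score INT DEFAULT 100) USING parquet")
spark.sql("INSERT INTO t (id) VALUES (1)") // score falls back to the default
spark.sql("SELECT * FROM t").show()        // (1, 100)
{code}

This issue extends the same behavior to V2 data sources.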
[jira] [Reopened] (SPARK-39074) Fail on uploading test files, not when downloading them
[ https://issues.apache.org/jira/browse/SPARK-39074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon reopened SPARK-39074:
----------------------------------
    Assignee: (was: Enrico Minack)
[jira] [Commented] (SPARK-39074) Fail on uploading test files, not when downloading them
[ https://issues.apache.org/jira/browse/SPARK-39074?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554895#comment-17554895 ]

Hyukjin Kwon commented on SPARK-39074:
--------------------------------------
Fixed in https://github.com/apache/spark/commit/ae10ff8837385871c3f72b2b7bb97dd235872602
[jira] [Assigned] (SPARK-39074) Fail on uploading test files, not when downloading them
[ https://issues.apache.org/jira/browse/SPARK-39074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39074:
------------------------------------
    Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-39074) Fail on uploading test files, not when downloading them
[ https://issues.apache.org/jira/browse/SPARK-39074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39074:
------------------------------------
    Assignee: Apache Spark
[jira] [Updated] (SPARK-39074) Fail on uploading test files, not when downloading them
[ https://issues.apache.org/jira/browse/SPARK-39074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-39074:
---------------------------------
    Fix Version/s: (was: 3.4.0)
[jira] [Commented] (SPARK-39489) Improve EventLoggingListener and ReplayListener performance by replacing Json4S ASTs with Jackson trees
[ https://issues.apache.org/jira/browse/SPARK-39489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554874#comment-17554874 ]

Apache Spark commented on SPARK-39489:
--------------------------------------
User 'JoshRosen' has created a pull request for this issue:
https://github.com/apache/spark/pull/36885

> Improve EventLoggingListener and ReplayListener performance by replacing
> Json4S ASTs with Jackson trees
> -------------------------------------------------------------------------
>
>                 Key: SPARK-39489
>                 URL: https://issues.apache.org/jira/browse/SPARK-39489
>             Project: Spark
>          Issue Type: Improvement
>          Components: Spark Core
>    Affects Versions: 3.0.0
>            Reporter: Josh Rosen
>            Assignee: Josh Rosen
>            Priority: Major
>
> Spark's event log JsonProtocol currently uses Json4s ASTs to generate and
> parse JSON. Performance overheads from Json4s account for a significant
> proportion of all time spent in JsonProtocol. If we replace Json4s usage
> with direct usage of Jackson APIs, we can significantly improve performance
> (~2x improvement for both writing and reading in my local microbenchmarks).
> This performance improvement translates to faster history server load times
> and reduced load on the Spark driver (and a reduced likelihood of dropping
> events because the listener cannot keep up, and therefore a reduced
> likelihood of inconsistent Spark UIs).
> Reducing our usage of Json4s is also a step towards eventually removing our
> dependency on Json4s: Spark's current use of Json4s creates library
> conflicts for end users who want to adopt Json4s 4 (see discussion on the
> PRs for SPARK-36408). If Spark can eventually remove its Json4s dependency,
> we will completely eliminate such conflicts.
[jira] [Created] (SPARK-39489) Improve EventLoggingListener and ReplayListener performance by replacing Json4S ASTs with Jackson trees
Josh Rosen created SPARK-39489:
-------------------------------

             Summary: Improve EventLoggingListener and ReplayListener performance by replacing Json4S ASTs with Jackson trees
                 Key: SPARK-39489
                 URL: https://issues.apache.org/jira/browse/SPARK-39489
             Project: Spark
          Issue Type: Improvement
          Components: Spark Core
    Affects Versions: 3.0.0
            Reporter: Josh Rosen
            Assignee: Josh Rosen

(Description as quoted above.)
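To make the Json4s-vs-Jackson point concrete, here is a minimal sketch (not the actual JsonProtocol code) of writing an event with Jackson's streaming generator; the event name and fields are illustrative.

{code:scala}
import java.io.StringWriter
import com.fasterxml.jackson.core.JsonFactory

// Sketch: Jackson's streaming JsonGenerator writes fields directly to the
// output, avoiding the intermediate Json4s AST allocation and rendering
// that JsonProtocol currently pays for on every event.
val writer = new StringWriter()
val gen = new JsonFactory().createGenerator(writer)
gen.writeStartObject()
gen.writeStringField("Event", "SparkListenerApplicationStart")
gen.writeNumberField("Timestamp", 1655337600000L)
gen.writeEndObject()
gen.close()
// writer.toString:
// {"Event":"SparkListenerApplicationStart","Timestamp":1655337600000}
{code}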
[jira] [Assigned] (SPARK-39488) Simplify the error handling of TempResolvedColumn
[ https://issues.apache.org/jira/browse/SPARK-39488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39488:
------------------------------------
    Assignee: (was: Apache Spark)
[jira] [Assigned] (SPARK-39488) Simplify the error handling of TempResolvedColumn
[ https://issues.apache.org/jira/browse/SPARK-39488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39488:
------------------------------------
    Assignee: Apache Spark
[jira] [Commented] (SPARK-39488) Simplify the error handling of TempResolvedColumn
[ https://issues.apache.org/jira/browse/SPARK-39488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554851#comment-17554851 ]

Apache Spark commented on SPARK-39488:
--------------------------------------
User 'cloud-fan' has created a pull request for this issue:
https://github.com/apache/spark/pull/36809
[jira] [Created] (SPARK-39488) Simplify the error handling of TempResolvedColumn
Wenchen Fan created SPARK-39488:
--------------------------------

             Summary: Simplify the error handling of TempResolvedColumn
                 Key: SPARK-39488
                 URL: https://issues.apache.org/jira/browse/SPARK-39488
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.4.0
            Reporter: Wenchen Fan
[jira] [Resolved] (SPARK-39476) Disable unwrap-cast optimization when casting from Long to Float/Double or from Integer to Float
[ https://issues.apache.org/jira/browse/SPARK-39476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan resolved SPARK-39476.
---------------------------------
    Fix Version/s: 3.3.1
                   3.2.2
                   3.1.3
                   3.4.0
       Resolution: Fixed

Issue resolved by pull request 36873
[https://github.com/apache/spark/pull/36873]

> Disable unwrap-cast optimization when casting from Long to Float/Double or
> from Integer to Float
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-39476
>                 URL: https://issues.apache.org/jira/browse/SPARK-39476
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.1.0, 3.1.1, 3.1.2, 3.2.0, 3.2.1, 3.3.0
>            Reporter: EdisonWang
>            Assignee: EdisonWang
>            Priority: Minor
>             Fix For: 3.3.1, 3.2.2, 3.1.3, 3.4.0
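The ticket does not spell out the motivation, but the hazard of unwrapping such casts is easy to demonstrate: Long-to-Float conversion is lossy, so distinct longs can compare equal after the cast. A self-contained illustration, with values chosen around Float's 24-bit significand:

{code:scala}
// Float has a 24-bit significand, so 2^24 + 1 is not exactly representable.
val a = 16777217L                // 2^24 + 1
val b = 16777216L                // 2^24
println(a.toFloat == b.toFloat)  // true: both round to 1.6777216E7
println(a == b)                  // false
// Rewriting a predicate like `cast(longCol AS float) = 1.6777216E7f` into a
// predicate on longCol alone must account for this rounding, or it changes
// which rows match.
{code}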
[jira] [Assigned] (SPARK-39476) Disable unwrap-cast optimization when casting from Long to Float/Double or from Integer to Float
[ https://issues.apache.org/jira/browse/SPARK-39476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Wenchen Fan reassigned SPARK-39476:
-----------------------------------
    Assignee: EdisonWang
[jira] [Resolved] (SPARK-39465) Log4j version upgrade to 2.17.2
[ https://issues.apache.org/jira/browse/SPARK-39465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-39465.
--------------------------------
    Resolution: Done

> Log4j version upgrade to 2.17.2
> -------------------------------
>
>                 Key: SPARK-39465
>                 URL: https://issues.apache.org/jira/browse/SPARK-39465
>             Project: Spark
>          Issue Type: Dependency upgrade
>          Components: Java API
>    Affects Versions: 3.2.1
>         Environment: Production
>            Reporter: Chethan G B
>            Priority: Major
>
> Hi Team,
> There were talks about upgrading log4j to the latest available version as
> part of a security fix. We wanted to know whether it has already been
> upgraded.
> Note: we are using the dependencies below:
> {code:xml}
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-core_2.12</artifactId>
>   <version>3.0.1</version>
> </dependency>
> <dependency>
>   <groupId>org.apache.spark</groupId>
>   <artifactId>spark-sql_2.12</artifactId>
>   <version>3.0.1</version>
> </dependency>
> {code}
> Kindly let us know when the log4j upgrade will be available for users?
[jira] [Reopened] (SPARK-39465) Log4j version upgrade to 2.17.2
[ https://issues.apache.org/jira/browse/SPARK-39465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen reopened SPARK-39465:
--------------------------------
[jira] [Comment Edited] (SPARK-39465) Log4j version upgrade to 2.17.2
[ https://issues.apache.org/jira/browse/SPARK-39465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554844#comment-17554844 ]

Josh Rosen edited comment on SPARK-39465 at 6/16/22 1:21 AM:
-------------------------------------------------------------
Spark uses Log4J 2.x starting in Spark 3.3.0+; see SPARK-37814.

The migration from Log4J 1.x to Log4J 2.x is too large a change for us to backport to existing Spark versions (see [related discussion on another ticket|https://issues.apache.org/jira/browse/SPARK-37883?focusedCommentId=17481521&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-17481521]). As a result, if you want to use Log4J 2.x then you will need to upgrade to Spark 3.3.0. The [Spark 3.3.0 release vote just passed yesterday|https://lists.apache.org/thread/zg6k1spw6k1c7brgo6t7qldvsqbmfytm], so the release should be published in the next couple of days.

was (Author: joshrosen): the same comment, without the link to the related discussion on SPARK-37883.
[jira] [Resolved] (SPARK-39465) Log4j version upgrade to 2.17.2
[ https://issues.apache.org/jira/browse/SPARK-39465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Josh Rosen resolved SPARK-39465.
--------------------------------
    Resolution: Won't Fix
[jira] [Commented] (SPARK-39465) Log4j version upgrade to 2.17.2
[ https://issues.apache.org/jira/browse/SPARK-39465?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554844#comment-17554844 ]

Josh Rosen commented on SPARK-39465:
------------------------------------
Spark uses Log4J 2.x starting in Spark 3.3.0+; see SPARK-37814.

The migration from Log4J 1.x to Log4J 2.x is too large a change for us to backport to existing Spark versions (see related discussion on another ticket). As a result, if you want to use Log4J 2.x then you will need to upgrade to Spark 3.3.0. The [Spark 3.3.0 release vote just passed yesterday|https://lists.apache.org/thread/zg6k1spw6k1c7brgo6t7qldvsqbmfytm], so the release should be published in the next couple of days.
[jira] [Assigned] (SPARK-39485) When fetching hiveMetastoreJars from path, IsolatedClientLoader should get hive settings from origLoader.
[ https://issues.apache.org/jira/browse/SPARK-39485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Apache Spark reassigned SPARK-39485:
------------------------------------
    Assignee: (was: Apache Spark)

> When fetching hiveMetastoreJars from path, IsolatedClientLoader should get
> hive settings from origLoader.
> ---------------------------------------------------------------------------
>
>                 Key: SPARK-39485
>                 URL: https://issues.apache.org/jira/browse/SPARK-39485
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.1
>            Reporter: SeongHoon Ku
>            Priority: Major
>
> Hi all,
> I run a Spark application in YARN cluster deploy mode with
> spark.sql.hive.metastore.jars=path and Hive metastore version 2.3.2.
> "spark.yarn.dist.files" is set so that the driver can refer to the
> hive-related xml files in cluster mode:
> {code}
> spark.yarn.dist.files viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hive-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hivemetastore-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hiveserver2-site.xml
> {code}
> The application failed with the following error:
> {code}
> 22/06/14 13:51:46 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.apache.spark.sql.AnalysisException: java.lang.ExceptionInInitializerError: null
> at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:111)
> at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224)
> at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150)
> at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140)
> at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45)
> at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60)
> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118)
> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118)
> at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:298)
> at org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.listNamespaces(V2SessionCatalog.scala:205)
> at org.apache.spark.sql.execution.datasources.v2.ShowNamespacesExec.run(ShowNamespacesExec.scala:42)
> at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43)
> at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43)
> at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49)
> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110)
> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103)
> at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163)
> at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90)
> at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
> at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64)
> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110)
> at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106)
> at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481)
> at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82)
> at org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481)
> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30)
> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267)
> at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263)
> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30)
> at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30
[jira] [Assigned] (SPARK-39485) When fetching hiveMetastoreJars from path, IsolatedClientLoader should get hive settings from origLoader.
[ https://issues.apache.org/jira/browse/SPARK-39485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39485: Assignee: Apache Spark > When fetching hiveMetastoreJars from path, IsolatedClientLoader should get > hive settings from origLoader. > - > > Key: SPARK-39485 > URL: https://issues.apache.org/jira/browse/SPARK-39485 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: SeongHoon Ku >Assignee: Apache Spark >Priority: Major > > Hi all, > I made a spark application where deploy-mode is YARN cluster and > spark.sql.hive.metastore.jars is path and hive metastore version is 2.3.2. > And "spark.yarn.dist.files" was set so that the driver could refer to > hive-related xml files in cluster mode. > {code} > spark.yarn.dist.files > viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hive-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hivemetastore-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hiveserver2-site.xml > {code} > application failed with the following error. > {code} > 22/06/14 13:51:46 INFO yarn.ApplicationMaster: Unregistering > ApplicationMaster with FAILED (diag message: User class threw exception: > org.apache.spark.sql.AnalysisException: > java.lang.ExceptionInInitializerError: null > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:111) > at > org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) > at > org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) > at > org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:298) > at > org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.listNamespaces(V2SessionCatalog.scala:205) > at > org.apache.spark.sql.execution.datasources.v2.ShowNamespacesExec.run(ShowNamespacesExec.scala:42) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110) > at > 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPru
[jira] [Commented] (SPARK-39485) When fetching hiveMetastoreJars from path, IsolatedClientLoader should get hive settings from origLoader.
[ https://issues.apache.org/jira/browse/SPARK-39485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554840#comment-17554840 ] Apache Spark commented on SPARK-39485: -- User 'koodin9' has created a pull request for this issue: https://github.com/apache/spark/pull/36884 > When fetching hiveMetastoreJars from path, IsolatedClientLoader should get > hive settings from origLoader. > - > > Key: SPARK-39485 > URL: https://issues.apache.org/jira/browse/SPARK-39485 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: SeongHoon Ku >Priority: Major > > Hi all, > I made a spark application where deploy-mode is YARN cluster and > spark.sql.hive.metastore.jars is path and hive metastore version is 2.3.2. > And "spark.yarn.dist.files" was set so that the driver could refer to > hive-related xml files in cluster mode. > {code} > spark.yarn.dist.files > viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hive-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hivemetastore-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hiveserver2-site.xml > {code} > application failed with the following error. > {code} > 22/06/14 13:51:46 INFO yarn.ApplicationMaster: Unregistering > ApplicationMaster with FAILED (diag message: User class threw exception: > org.apache.spark.sql.AnalysisException: > java.lang.ExceptionInInitializerError: null > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:111) > at > org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) > at > org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) > at > org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:298) > at > org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.listNamespaces(V2SessionCatalog.scala:205) > at > org.apache.spark.sql.execution.datasources.v2.ShowNamespacesExec.run(ShowNamespacesExec.scala:42) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at > 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) >
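A minimal sketch of the resource lookup at the heart of this report (simplified; this is not the actual IsolatedClientLoader code): the hive-site.xml shipped via spark.yarn.dist.files is visible to the original application classloader, so the isolated loader built from spark.sql.hive.metastore.jars=path needs to inherit those settings rather than resolve them against its own bare jar list.

{code:scala}
// Simplified sketch, not Spark internals: check which classloader can see
// the distributed hive-site.xml. Run inside the driver JVM.
val origLoader = Thread.currentThread().getContextClassLoader
val hiveSite = origLoader.getResource("hive-site.xml")
println(s"hive-site.xml via original loader: $hiveSite") // non-null when dist.files worked
{code}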
[jira] [Updated] (SPARK-39061) Incorrect results or NPE when using Inline function against an array of dynamically created structs
[ https://issues.apache.org/jira/browse/SPARK-39061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Hyukjin Kwon updated SPARK-39061:
---------------------------------
    Fix Version/s: 3.3.1
                   (was: 3.3.0)

> Incorrect results or NPE when using Inline function against an array of
> dynamically created structs
> ------------------------------------------------------------------------
>
>                 Key: SPARK-39061
>                 URL: https://issues.apache.org/jira/browse/SPARK-39061
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.2.1, 3.3.0, 3.4.0
>            Reporter: Bruce Robbins
>            Assignee: Bruce Robbins
>            Priority: Major
>              Labels: correctness
>             Fix For: 3.2.2, 3.3.1
>
> The following query returns incorrect results:
> {noformat}
> spark-sql> select inline(array(named_struct('a', 1, 'b', 2), null));
> 1	2
> -1	-1
> Time taken: 4.053 seconds, Fetched 2 row(s)
> spark-sql>
> {noformat}
> In Hive, the last row is {{NULL, NULL}}:
> {noformat}
> Beeline version 2.3.9 by Apache Hive
> 0: jdbc:hive2://localhost:1> select inline(array(named_struct('a', 1, 'b', 2), null));
> +-------+-------+
> |   a   |   b   |
> +-------+-------+
> | 1     | 2     |
> | NULL  | NULL  |
> +-------+-------+
> 2 rows selected (1.355 seconds)
> 0: jdbc:hive2://localhost:1>
> {noformat}
> If the struct has string fields, you get a {{NullPointerException}}:
> {noformat}
> spark-sql> select inline(array(named_struct('a', '1', 'b', '2'), null));
> 22/04/28 16:51:54 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2)
> java.lang.NullPointerException: null
> 	at org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT]
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_0$(Unknown Source) ~[?:?]
> 	at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source) ~[?:?]
> 	at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) ~[spark-sql_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT]
> {noformat}
> You can work around the issue by casting the null entry of the array:
> {noformat}
> spark-sql> select inline(array(named_struct('a', 1, 'b', 2), cast(null as struct<a:int,b:int>)));
> 1	2
> NULL	NULL
> Time taken: 0.068 seconds, Fetched 2 row(s)
> spark-sql>
> {noformat}
> As far as I can tell, this issue only happens with arrays of structs where
> the structs are created in an inline table or in a projection.
> The fields of the struct are not set to {{nullable = true}} when there is
> no example in the array where the field is set to {{null}}. As a result,
> {{GenerateUnsafeProjection.createCode}} generates bad code: it has no code
> to create a row of null columns, so it just creates a row from variables
> set with default values.
[jira] [Assigned] (SPARK-39061) Incorrect results or NPE when using Inline function against an array of dynamically created structs
[ https://issues.apache.org/jira/browse/SPARK-39061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-39061: Assignee: Bruce Robbins > Incorrect results or NPE when using Inline function against an array of > dynamically created structs > --- > > Key: SPARK-39061 > URL: https://issues.apache.org/jira/browse/SPARK-39061 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1, 3.3.0, 3.4.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Labels: correctness > > The following query returns incorrect results: > {noformat} > spark-sql> select inline(array(named_struct('a', 1, 'b', 2), null)); > 1 2 > -1-1 > Time taken: 4.053 seconds, Fetched 2 row(s) > spark-sql> > {noformat} > In Hive, the last row is {{NULL, NULL}}: > {noformat} > Beeline version 2.3.9 by Apache Hive > 0: jdbc:hive2://localhost:1> select inline(array(named_struct('a', 1, > 'b', 2), null)); > +---+---+ > | a | b | > +---+---+ > | 1 | 2 | > | NULL | NULL | > +---+---+ > 2 rows selected (1.355 seconds) > 0: jdbc:hive2://localhost:1> > {noformat} > If the struct has string fields, you get a {{NullPointerException}}: > {noformat} > spark-sql> select inline(array(named_struct('a', '1', 'b', '2'), null)); > 22/04/28 16:51:54 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2) > java.lang.NullPointerException: null > at > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) > ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_0$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > ~[spark-sql_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] > {noformat} > You can work around the issue by casting the null entry of the array: > {noformat} > spark-sql> select inline(array(named_struct('a', 1, 'b', 2), cast(null as > struct))); > 1 2 > NULL NULL > Time taken: 0.068 seconds, Fetched 2 row(s) > spark-sql> > {noformat} > As far as I can tell, this issue only happens with arrays of structs where > the structs are created in an inline table or in a projection. > The fields of the struct are not getting set to {{nullable = true}} when > there is no example in the array where the field is set to {{null}}. As a > result, {{GenerateUnsafeProjection.createCode}} generates bad code: it has no > code to create a row of null columns, so it just creates a row from variables > set with default values. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39061) Incorrect results or NPE when using Inline function against an array of dynamically created structs
[ https://issues.apache.org/jira/browse/SPARK-39061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-39061. -- Fix Version/s: 3.3.0 3.2.2 Resolution: Fixed Issue resolved by pull request 36883 [https://github.com/apache/spark/pull/36883] > Incorrect results or NPE when using Inline function against an array of > dynamically created structs > --- > > Key: SPARK-39061 > URL: https://issues.apache.org/jira/browse/SPARK-39061 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1, 3.3.0, 3.4.0 >Reporter: Bruce Robbins >Assignee: Bruce Robbins >Priority: Major > Labels: correctness > Fix For: 3.3.0, 3.2.2 > > > The following query returns incorrect results: > {noformat} > spark-sql> select inline(array(named_struct('a', 1, 'b', 2), null)); > 1 2 > -1-1 > Time taken: 4.053 seconds, Fetched 2 row(s) > spark-sql> > {noformat} > In Hive, the last row is {{NULL, NULL}}: > {noformat} > Beeline version 2.3.9 by Apache Hive > 0: jdbc:hive2://localhost:1> select inline(array(named_struct('a', 1, > 'b', 2), null)); > +---+---+ > | a | b | > +---+---+ > | 1 | 2 | > | NULL | NULL | > +---+---+ > 2 rows selected (1.355 seconds) > 0: jdbc:hive2://localhost:1> > {noformat} > If the struct has string fields, you get a {{NullPointerException}}: > {noformat} > spark-sql> select inline(array(named_struct('a', '1', 'b', '2'), null)); > 22/04/28 16:51:54 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2) > java.lang.NullPointerException: null > at > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) > ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_0$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > ~[spark-sql_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] > {noformat} > You can work around the issue by casting the null entry of the array: > {noformat} > spark-sql> select inline(array(named_struct('a', 1, 'b', 2), cast(null as > struct))); > 1 2 > NULL NULL > Time taken: 0.068 seconds, Fetched 2 row(s) > spark-sql> > {noformat} > As far as I can tell, this issue only happens with arrays of structs where > the structs are created in an inline table or in a projection. > The fields of the struct are not getting set to {{nullable = true}} when > there is no example in the array where the field is set to {{null}}. As a > result, {{GenerateUnsafeProjection.createCode}} generates bad code: it has no > code to create a row of null columns, so it just creates a row from variables > set with default values. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
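The root-cause claim — struct fields not being marked {{nullable = true}} — can be checked by inspecting the schema the analyzer infers for the array. A minimal sketch; field names follow the report's example:
{code}
# Sketch: inspect the nullability inferred for the array elements, which the
# report identifies as the root cause of the bad generated projection.
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

df = spark.sql("SELECT array(named_struct('a', 1, 'b', 2), null) AS arr")
elem = df.schema["arr"].dataType.elementType  # StructType of the entries
print([(f.name, f.nullable) for f in elem.fields])
# On affected versions the fields report as non-nullable even though the
# array contains a null entry, so GenerateUnsafeProjection emits no
# null-handling code and writes default values instead of nulls.
{code}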
[jira] [Commented] (SPARK-38292) Support `na_filter` for pyspark.pandas.read_csv
[ https://issues.apache.org/jira/browse/SPARK-38292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554836#comment-17554836 ] Hyukjin Kwon commented on SPARK-38292: -- please go ahead! > Support `na_filter` for pyspark.pandas.read_csv > --- > > Key: SPARK-38292 > URL: https://issues.apache.org/jira/browse/SPARK-38292 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Haejoon Lee >Priority: Major > > pandas supports the `na_filter` parameter for the `read_csv` function. > (https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html) > We also want to support it, to follow the behavior of pandas. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
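For reference, the pandas semantics being requested: with na_filter=False, empty fields stay as empty strings rather than being converted to NaN. A minimal sketch of the behavior pyspark.pandas.read_csv would mirror:
{code}
# Sketch of the pandas semantics the ticket asks pyspark.pandas to follow.
# With na_filter=False, pandas keeps empty fields as empty strings instead
# of converting them to NaN, and skips NA detection entirely.
import io
import pandas as pd

csv = io.StringIO("a,b\n1,\n2,x\n")
print(pd.read_csv(csv))                   # column b becomes [NaN, 'x']
csv.seek(0)
print(pd.read_csv(csv, na_filter=False))  # column b becomes ['', 'x']

# The request is for pyspark.pandas.read_csv to accept the same keyword,
# e.g. ps.read_csv("data.csv", na_filter=False); it was not supported when
# this issue was filed.
{code}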
[jira] [Resolved] (SPARK-39482) Add build and test documentation on IPv6
[ https://issues.apache.org/jira/browse/SPARK-39482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-39482. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36879 [https://github.com/apache/spark/pull/36879] > Add build and test documentation on IPv6 > > > Key: SPARK-39482 > URL: https://issues.apache.org/jira/browse/SPARK-39482 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39457) Support IPv6-only environment
[ https://issues.apache.org/jira/browse/SPARK-39457?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554830#comment-17554830 ] DB Tsai commented on SPARK-39457: - If there is any IPv6 issue on the Hadoop client side, we might hit it once we get Spark fully working in a pure IPv6 environment. We will test it once we get there. > Support IPv6-only environment > - > > Key: SPARK-39457 > URL: https://issues.apache.org/jira/browse/SPARK-39457 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: DB Tsai >Priority: Major > Labels: releasenotes > > Spark doesn't fully work in a pure IPv6 environment that has no IPv4 at > all. This is an umbrella JIRA tracking the support of pure IPv6 deployment. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
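A hedged sketch of settings commonly used when experimenting on IPv6-only hosts. java.net.preferIPv6Addresses is a standard JVM property; whether this set of options is sufficient is exactly what this umbrella issue tracks, so treat it as a starting point rather than a recipe:
{code}
# Hedged sketch: JVM flags often involved when testing Spark on an
# IPv6-only host. These settings alone are not guaranteed to make a pure
# IPv6 deployment work; that is the subject of this umbrella JIRA.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")
    .config("spark.driver.extraJavaOptions",
            "-Djava.net.preferIPv6Addresses=true")
    .config("spark.executor.extraJavaOptions",
            "-Djava.net.preferIPv6Addresses=true")
    .getOrCreate()
)
{code}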
[jira] [Closed] (SPARK-39486) When fetching hiveMetastoreJars from path, IsolatedClientLoader should get hive settings from origLoader.
[ https://issues.apache.org/jira/browse/SPARK-39486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SeongHoon Ku closed SPARK-39486. duplicated https://issues.apache.org/jira/browse/SPARK-39485 > When fetching hiveMetastoreJars from path, IsolatedClientLoader should get > hive settings from origLoader. > - > > Key: SPARK-39486 > URL: https://issues.apache.org/jira/browse/SPARK-39486 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: SeongHoon Ku >Priority: Major > > Hi all, > I made a spark application where deploy-mode is YARN cluster and > spark.sql.hive.metastore.jars is path and hive metastore version is 2.3.2. > And "spark.yarn.dist.files" was set so that the driver could refer to > hive-related xml files in cluster mode. > {code} > spark.yarn.dist.files > viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hive-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hivemetastore-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hiveserver2-site.xml > {code} > application failed with the following error. > {code} > 22/06/14 13:51:46 INFO yarn.ApplicationMaster: Unregistering > ApplicationMaster with FAILED (diag message: User class threw exception: > org.apache.spark.sql.AnalysisException: > java.lang.ExceptionInInitializerError: null > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:111) > at > org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) > at > org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) > at > org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:298) > at > org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.listNamespaces(V2SessionCatalog.scala:205) > at > org.apache.spark.sql.execution.datasources.v2.ShowNamespacesExec.run(ShowNamespacesExec.scala:42) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110) > at > 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(Logic
[jira] [Closed] (SPARK-39487) When fetching hiveMetastoreJars from path, IsolatedClientLoader should get hive settings from origLoader.
[ https://issues.apache.org/jira/browse/SPARK-39487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SeongHoon Ku closed SPARK-39487. duplicated https://issues.apache.org/jira/browse/SPARK-39485 > When fetching hiveMetastoreJars from path, IsolatedClientLoader should get > hive settings from origLoader. > - > > Key: SPARK-39487 > URL: https://issues.apache.org/jira/browse/SPARK-39487 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: SeongHoon Ku >Priority: Major > > Hi all, > I made a spark application where deploy-mode is YARN cluster and > spark.sql.hive.metastore.jars is path and hive metastore version is 2.3.2. > And "spark.yarn.dist.files" was set so that the driver could refer to > hive-related xml files in cluster mode. > {code} > spark.yarn.dist.files > viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hive-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hivemetastore-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hiveserver2-site.xml > {code} > application failed with the following error. > {code} > 22/06/14 13:51:46 INFO yarn.ApplicationMaster: Unregistering > ApplicationMaster with FAILED (diag message: User class threw exception: > org.apache.spark.sql.AnalysisException: > java.lang.ExceptionInInitializerError: null > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:111) > at > org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) > at > org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) > at > org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:298) > at > org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.listNamespaces(V2SessionCatalog.scala:205) > at > org.apache.spark.sql.execution.datasources.v2.ShowNamespacesExec.run(ShowNamespacesExec.scala:42) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110) > at > 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(Logic
[jira] [Resolved] (SPARK-39487) When fetching hiveMetastoreJars from path, IsolatedClientLoader should get hive settings from origLoader.
[ https://issues.apache.org/jira/browse/SPARK-39487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SeongHoon Ku resolved SPARK-39487. -- Resolution: Duplicate > When fetching hiveMetastoreJars from path, IsolatedClientLoader should get > hive settings from origLoader. > - > > Key: SPARK-39487 > URL: https://issues.apache.org/jira/browse/SPARK-39487 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: SeongHoon Ku >Priority: Major > > Hi all, > I made a spark application where deploy-mode is YARN cluster and > spark.sql.hive.metastore.jars is path and hive metastore version is 2.3.2. > And "spark.yarn.dist.files" was set so that the driver could refer to > hive-related xml files in cluster mode. > {code} > spark.yarn.dist.files > viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hive-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hivemetastore-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hiveserver2-site.xml > {code} > application failed with the following error. > {code} > 22/06/14 13:51:46 INFO yarn.ApplicationMaster: Unregistering > ApplicationMaster with FAILED (diag message: User class threw exception: > org.apache.spark.sql.AnalysisException: > java.lang.ExceptionInInitializerError: null > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:111) > at > org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) > at > org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) > at > org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:298) > at > org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.listNamespaces(V2SessionCatalog.scala:205) > at > org.apache.spark.sql.execution.datasources.v2.ShowNamespacesExec.run(ShowNamespacesExec.scala:42) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110) > at > 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) > at > o
[jira] [Resolved] (SPARK-39486) When fetching hiveMetastoreJars from path, IsolatedClientLoader should get hive settings from origLoader.
[ https://issues.apache.org/jira/browse/SPARK-39486?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] SeongHoon Ku resolved SPARK-39486. -- Resolution: Duplicate > When fetching hiveMetastoreJars from path, IsolatedClientLoader should get > hive settings from origLoader. > - > > Key: SPARK-39486 > URL: https://issues.apache.org/jira/browse/SPARK-39486 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1 >Reporter: SeongHoon Ku >Priority: Major > > Hi all, > I made a spark application where deploy-mode is YARN cluster and > spark.sql.hive.metastore.jars is path and hive metastore version is 2.3.2. > And "spark.yarn.dist.files" was set so that the driver could refer to > hive-related xml files in cluster mode. > {code} > spark.yarn.dist.files > viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hive-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hivemetastore-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hiveserver2-site.xml > {code} > application failed with the following error. > {code} > 22/06/14 13:51:46 INFO yarn.ApplicationMaster: Unregistering > ApplicationMaster with FAILED (diag message: User class threw exception: > org.apache.spark.sql.AnalysisException: > java.lang.ExceptionInInitializerError: null > at > org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:111) > at > org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) > at > org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) > at > org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) > at > org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) > at > org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:298) > at > org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.listNamespaces(V2SessionCatalog.scala:205) > at > org.apache.spark.sql.execution.datasources.v2.ShowNamespacesExec.run(ShowNamespacesExec.scala:42) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43) > at > org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) > at > org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) > at > org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) > at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) > at > org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) > at > org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110) > at > 
org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) > at > org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) > at > org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) > at > o
[jira] [Created] (SPARK-39487) When fetching hiveMetastoreJars from path, IsolatedClientLoader should get hive settings from origLoader.
SeongHoon Ku created SPARK-39487: Summary: When fetching hiveMetastoreJars from path, IsolatedClientLoader should get hive settings from origLoader. Key: SPARK-39487 URL: https://issues.apache.org/jira/browse/SPARK-39487 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1 Reporter: SeongHoon Ku Hi all, I made a spark application where deploy-mode is YARN cluster and spark.sql.hive.metastore.jars is path and hive metastore version is 2.3.2. And "spark.yarn.dist.files" was set so that the driver could refer to hive-related xml files in cluster mode. {code} spark.yarn.dist.files viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hive-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hivemetastore-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hiveserver2-site.xml {code} application failed with the following error. {code} 22/06/14 13:51:46 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.apache.spark.sql.AnalysisException: java.lang.ExceptionInInitializerError: null at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:111) at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:298) at org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.listNamespaces(V2SessionCatalog.scala:205) at org.apache.spark.sql.execution.datasources.v2.ShowNamespacesExec.run(ShowNamespacesExec.scala:42) at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43) at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43) at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457) at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106) at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93) at org.apache.spark.sql.execution.QueryExecution.commandExecuted(Q
[jira] [Created] (SPARK-39486) When fetching hiveMetastoreJars from path, IsolatedClientLoader should get hive settings from origLoader.
SeongHoon Ku created SPARK-39486: Summary: When fetching hiveMetastoreJars from path, IsolatedClientLoader should get hive settings from origLoader. Key: SPARK-39486 URL: https://issues.apache.org/jira/browse/SPARK-39486 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1 Reporter: SeongHoon Ku Hi all, I made a spark application where deploy-mode is YARN cluster and spark.sql.hive.metastore.jars is path and hive metastore version is 2.3.2. And "spark.yarn.dist.files" was set so that the driver could refer to hive-related xml files in cluster mode. {code} spark.yarn.dist.files viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hive-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hivemetastore-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hiveserver2-site.xml {code} application failed with the following error. {code} 22/06/14 13:51:46 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.apache.spark.sql.AnalysisException: java.lang.ExceptionInInitializerError: null at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:111) at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:298) at org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.listNamespaces(V2SessionCatalog.scala:205) at org.apache.spark.sql.execution.datasources.v2.ShowNamespacesExec.run(ShowNamespacesExec.scala:42) at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43) at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43) at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457) at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106) at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93) at org.apache.spark.sql.execution.QueryExecution.commandExecuted(Q
[jira] [Created] (SPARK-39485) When fetching hiveMetastoreJars from path, IsolatedClientLoader should get hive settings from origLoader.
SeongHoon Ku created SPARK-39485: Summary: When fetching hiveMetastoreJars from path, IsolatedClientLoader should get hive settings from origLoader. Key: SPARK-39485 URL: https://issues.apache.org/jira/browse/SPARK-39485 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1 Reporter: SeongHoon Ku Hi all, I made a spark application where deploy-mode is YARN cluster and spark.sql.hive.metastore.jars is path and hive metastore version is 2.3.2. And "spark.yarn.dist.files" was set so that the driver could refer to hive-related xml files in cluster mode. {code} spark.yarn.dist.files viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hive-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hivemetastore-site.xml,viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hiveserver2-site.xml {code} application failed with the following error. {code} 22/06/14 13:51:46 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED (diag message: User class threw exception: org.apache.spark.sql.AnalysisException: java.lang.ExceptionInInitializerError: null at org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:111) at org.apache.spark.sql.hive.HiveExternalCatalog.databaseExists(HiveExternalCatalog.scala:224) at org.apache.spark.sql.internal.SharedState.externalCatalog$lzycompute(SharedState.scala:150) at org.apache.spark.sql.internal.SharedState.externalCatalog(SharedState.scala:140) at org.apache.spark.sql.hive.HiveSessionStateBuilder.externalCatalog(HiveSessionStateBuilder.scala:45) at org.apache.spark.sql.hive.HiveSessionStateBuilder.$anonfun$catalog$1(HiveSessionStateBuilder.scala:60) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog$lzycompute(SessionCatalog.scala:118) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.externalCatalog(SessionCatalog.scala:118) at org.apache.spark.sql.catalyst.catalog.SessionCatalog.listDatabases(SessionCatalog.scala:298) at org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog.listNamespaces(V2SessionCatalog.scala:205) at org.apache.spark.sql.execution.datasources.v2.ShowNamespacesExec.run(ShowNamespacesExec.scala:42) at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result$lzycompute(V2CommandExec.scala:43) at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.result(V2CommandExec.scala:43) at org.apache.spark.sql.execution.datasources.v2.V2CommandExec.executeCollect(V2CommandExec.scala:49) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.$anonfun$applyOrElse$1(QueryExecution.scala:110) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$5(SQLExecution.scala:103) at org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:163) at org.apache.spark.sql.execution.SQLExecution$.$anonfun$withNewExecutionId$1(SQLExecution.scala:90) at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775) at org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:64) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:110) at org.apache.spark.sql.execution.QueryExecution$$anonfun$eagerlyExecuteCommands$1.applyOrElse(QueryExecution.scala:106) at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:481) at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:82) at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:481) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning(AnalysisHelper.scala:267) at org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.transformDownWithPruning$(AnalysisHelper.scala:263) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDownWithPruning(LogicalPlan.scala:30) at org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:457) at org.apache.spark.sql.execution.QueryExecution.eagerlyExecuteCommands(QueryExecution.scala:106) at org.apache.spark.sql.execution.QueryExecution.commandExecuted$lzycompute(QueryExecution.scala:93) at org.apache.spark.sql.execution.QueryExecution.commandExecuted(Q
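The setup described in these three duplicate reports can be expressed as session configuration. A hedged sketch, assuming Spark 3.1+ where spark.sql.hive.metastore.jars accepts the value path together with spark.sql.hive.metastore.jars.path; the jar location below is hypothetical, while the dist.files value is taken from the report:
{code}
# Hedged sketch of the configuration the reports describe: a Hive 2.3.2
# metastore client loaded from a path, with Hive XML files shipped to the
# YARN cluster-mode driver. The viewfs:// jar glob is a hypothetical
# placeholder; adjust to your environment.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .config("spark.sql.hive.metastore.version", "2.3.2")
    .config("spark.sql.hive.metastore.jars", "path")
    .config("spark.sql.hive.metastore.jars.path",
            "viewfs:///app/hive-2.3.2/lib/*.jar")  # hypothetical location
    .config("spark.yarn.dist.files",
            "viewfs:///app/spark-3.2.1-bin-without-hadoop/conf/hive-site.xml")
    .enableHiveSupport()
    .getOrCreate()
)
# The reports say IsolatedClientLoader then fails with an
# ExceptionInInitializerError because it does not pick up the Hive settings
# from the original class loader.
{code}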
[jira] [Assigned] (SPARK-39469) Infer date type for CSV schema inference
[ https://issues.apache.org/jira/browse/SPARK-39469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39469: Assignee: (was: Apache Spark) > Infer date type for CSV schema inference > > > Key: SPARK-39469 > URL: https://issues.apache.org/jira/browse/SPARK-39469 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1 >Reporter: Jonathan Cui >Priority: Major > > 1. If a column contains only dates, it should be of “date” type in the > inferred schema > * If the date format and the timestamp format are identical (e.g. both are > /mm/dd), entries will default to being interpreted as Date > 2. If a column contains dates and timestamps, it should be of “timestamp” > type in the inferred schema > > A similar issue was opened in the past but was reverted due to the lack of > strict pattern matching. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39469) Infer date type for CSV schema inference
[ https://issues.apache.org/jira/browse/SPARK-39469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554809#comment-17554809 ] Apache Spark commented on SPARK-39469: -- User 'Jonathancui123' has created a pull request for this issue: https://github.com/apache/spark/pull/36871 > Infer date type for CSV schema inference > > > Key: SPARK-39469 > URL: https://issues.apache.org/jira/browse/SPARK-39469 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1 >Reporter: Jonathan Cui >Priority: Major > > 1. If a column contains only dates, it should be of “date” type in the > inferred schema > * If the date format and the timestamp format are identical (e.g. both are > /mm/dd), entries will default to being interpreted as Date > 2. If a column contains dates and timestamps, it should be of “timestamp” > type in the inferred schema > > A similar issue was opened in the past but was reverted due to the lack of > strict pattern matching. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39469) Infer date type for CSV schema inference
[ https://issues.apache.org/jira/browse/SPARK-39469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39469: Assignee: Apache Spark > Infer date type for CSV schema inference > > > Key: SPARK-39469 > URL: https://issues.apache.org/jira/browse/SPARK-39469 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1 >Reporter: Jonathan Cui >Assignee: Apache Spark >Priority: Major > > 1. If a column contains only dates, it should be of “date” type in the > inferred schema > * If the date format and the timestamp format are identical (e.g. both are > /mm/dd), entries will default to being interpreted as Date > 2. If a column contains dates and timestamps, it should be of “timestamp” > type in the inferred schema > > A similar issue was opened in the past but was reverted due to the lack of > strict pattern matching. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39469) Infer date type for CSV schema inference
[ https://issues.apache.org/jira/browse/SPARK-39469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jonathan Cui updated SPARK-39469: - Description: 1. If a column contains only dates, it should be of “date” type in the inferred schema * If the date format and the timestamp format are identical (e.g. both are /mm/dd), entries will default to being interpreted as Date 2. If a column contains dates and timestamps, it should be of “timestamp” type in the inferred schema A similar issue was opened in the past but was reverted due to the lack of strict pattern matching. was: 1. If a column contains only dates, it should be of “date” type in the inferred schema * If the date format and the timestamp format are identical (e.g. both are /mm/dd), entries will default to being interpreted as Date 2. If a column contains dates and timestamps, it should be of “timestamp” type in the inferred schema > Infer date type for CSV schema inference > > > Key: SPARK-39469 > URL: https://issues.apache.org/jira/browse/SPARK-39469 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.1 >Reporter: Jonathan Cui >Priority: Major > > 1. If a column contains only dates, it should be of “date” type in the > inferred schema > * If the date format and the timestamp format are identical (e.g. both are > /mm/dd), entries will default to being interpreted as Date > 2. If a column contains dates and timestamps, it should be of “timestamp” > type in the inferred schema > > A similar issue was opened in the past but was reverted due to the lack of > strict pattern matching. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
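A minimal sketch of the inference behavior the ticket targets, using only existing CSV reader options (header, inferSchema, dateFormat, timestampFormat); the patterns below are illustrative, not the ticket's exact formats:
{code}
# Sketch: CSV schema inference for a date-only column. Before the change
# proposed here, such a column is inferred as string (or as timestamp when
# the date and timestamp formats overlap); the proposal is DateType.
import tempfile
from pyspark.sql import SparkSession

spark = SparkSession.builder.master("local[*]").getOrCreate()

path = tempfile.mkdtemp()
with open(f"{path}/data.csv", "w") as f:
    f.write("d,ts\n2022/06/15,2022/06/15 10:00:00\n")

df = (spark.read
      .option("header", True)
      .option("inferSchema", True)
      .option("dateFormat", "yyyy/MM/dd")
      .option("timestampFormat", "yyyy/MM/dd HH:mm:ss")
      .csv(path))
df.printSchema()
# Desired result per the ticket: d -> date, ts -> timestamp; a column that
# mixes dates and timestamps should still infer as timestamp.
{code}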
[jira] [Assigned] (SPARK-39061) Incorrect results or NPE when using Inline function against an array of dynamically created structs
[ https://issues.apache.org/jira/browse/SPARK-39061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39061: Assignee: (was: Apache Spark) > Incorrect results or NPE when using Inline function against an array of > dynamically created structs > --- > > Key: SPARK-39061 > URL: https://issues.apache.org/jira/browse/SPARK-39061 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1, 3.3.0, 3.4.0 >Reporter: Bruce Robbins >Priority: Major > Labels: correctness > > The following query returns incorrect results: > {noformat} > spark-sql> select inline(array(named_struct('a', 1, 'b', 2), null)); > 1 2 > -1-1 > Time taken: 4.053 seconds, Fetched 2 row(s) > spark-sql> > {noformat} > In Hive, the last row is {{NULL, NULL}}: > {noformat} > Beeline version 2.3.9 by Apache Hive > 0: jdbc:hive2://localhost:1> select inline(array(named_struct('a', 1, > 'b', 2), null)); > +---+---+ > | a | b | > +---+---+ > | 1 | 2 | > | NULL | NULL | > +---+---+ > 2 rows selected (1.355 seconds) > 0: jdbc:hive2://localhost:1> > {noformat} > If the struct has string fields, you get a {{NullPointerException}}: > {noformat} > spark-sql> select inline(array(named_struct('a', '1', 'b', '2'), null)); > 22/04/28 16:51:54 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2) > java.lang.NullPointerException: null > at > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) > ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_0$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > ~[spark-sql_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] > {noformat} > You can work around the issue by casting the null entry of the array: > {noformat} > spark-sql> select inline(array(named_struct('a', 1, 'b', 2), cast(null as > struct))); > 1 2 > NULL NULL > Time taken: 0.068 seconds, Fetched 2 row(s) > spark-sql> > {noformat} > As far as I can tell, this issue only happens with arrays of structs where > the structs are created in an inline table or in a projection. > The fields of the struct are not getting set to {{nullable = true}} when > there is no example in the array where the field is set to {{null}}. As a > result, {{GenerateUnsafeProjection.createCode}} generates bad code: it has no > code to create a row of null columns, so it just creates a row from variables > set with default values. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39061) Incorrect results or NPE when using Inline function against an array of dynamically created structs
[ https://issues.apache.org/jira/browse/SPARK-39061?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554801#comment-17554801 ] Apache Spark commented on SPARK-39061: -- User 'bersprockets' has created a pull request for this issue: https://github.com/apache/spark/pull/36883 > Incorrect results or NPE when using Inline function against an array of > dynamically created structs > --- > > Key: SPARK-39061 > URL: https://issues.apache.org/jira/browse/SPARK-39061 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1, 3.3.0, 3.4.0 >Reporter: Bruce Robbins >Priority: Major > Labels: correctness > > The following query returns incorrect results: > {noformat} > spark-sql> select inline(array(named_struct('a', 1, 'b', 2), null)); > 1 2 > -1-1 > Time taken: 4.053 seconds, Fetched 2 row(s) > spark-sql> > {noformat} > In Hive, the last row is {{NULL, NULL}}: > {noformat} > Beeline version 2.3.9 by Apache Hive > 0: jdbc:hive2://localhost:1> select inline(array(named_struct('a', 1, > 'b', 2), null)); > +---+---+ > | a | b | > +---+---+ > | 1 | 2 | > | NULL | NULL | > +---+---+ > 2 rows selected (1.355 seconds) > 0: jdbc:hive2://localhost:1> > {noformat} > If the struct has string fields, you get a {{NullPointerException}}: > {noformat} > spark-sql> select inline(array(named_struct('a', '1', 'b', '2'), null)); > 22/04/28 16:51:54 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2) > java.lang.NullPointerException: null > at > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) > ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_0$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > ~[spark-sql_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] > {noformat} > You can work around the issue by casting the null entry of the array: > {noformat} > spark-sql> select inline(array(named_struct('a', 1, 'b', 2), cast(null as > struct))); > 1 2 > NULL NULL > Time taken: 0.068 seconds, Fetched 2 row(s) > spark-sql> > {noformat} > As far as I can tell, this issue only happens with arrays of structs where > the structs are created in an inline table or in a projection. > The fields of the struct are not getting set to {{nullable = true}} when > there is no example in the array where the field is set to {{null}}. As a > result, {{GenerateUnsafeProjection.createCode}} generates bad code: it has no > code to create a row of null columns, so it just creates a row from variables > set with default values. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39061) Incorrect results or NPE when using Inline function against an array of dynamically created structs
[ https://issues.apache.org/jira/browse/SPARK-39061?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39061: Assignee: Apache Spark > Incorrect results or NPE when using Inline function against an array of > dynamically created structs > --- > > Key: SPARK-39061 > URL: https://issues.apache.org/jira/browse/SPARK-39061 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.1, 3.3.0, 3.4.0 >Reporter: Bruce Robbins >Assignee: Apache Spark >Priority: Major > Labels: correctness > > The following query returns incorrect results: > {noformat} > spark-sql> select inline(array(named_struct('a', 1, 'b', 2), null)); > 1 2 > -1-1 > Time taken: 4.053 seconds, Fetched 2 row(s) > spark-sql> > {noformat} > In Hive, the last row is {{NULL, NULL}}: > {noformat} > Beeline version 2.3.9 by Apache Hive > 0: jdbc:hive2://localhost:1> select inline(array(named_struct('a', 1, > 'b', 2), null)); > +---+---+ > | a | b | > +---+---+ > | 1 | 2 | > | NULL | NULL | > +---+---+ > 2 rows selected (1.355 seconds) > 0: jdbc:hive2://localhost:1> > {noformat} > If the struct has string fields, you get a {{NullPointerException}}: > {noformat} > spark-sql> select inline(array(named_struct('a', '1', 'b', '2'), null)); > 22/04/28 16:51:54 ERROR Executor: Exception in task 0.0 in stage 2.0 (TID 2) > java.lang.NullPointerException: null > at > org.apache.spark.sql.catalyst.expressions.codegen.UnsafeWriter.write(UnsafeWriter.java:110) > ~[spark-catalyst_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.generate_doConsume_0$(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown > Source) ~[?:?] > at > org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43) > ~[spark-sql_2.12-3.4.0-SNAPSHOT.jar:3.4.0-SNAPSHOT] > {noformat} > You can work around the issue by casting the null entry of the array: > {noformat} > spark-sql> select inline(array(named_struct('a', 1, 'b', 2), cast(null as > struct))); > 1 2 > NULL NULL > Time taken: 0.068 seconds, Fetched 2 row(s) > spark-sql> > {noformat} > As far as I can tell, this issue only happens with arrays of structs where > the structs are created in an inline table or in a projection. > The fields of the struct are not getting set to {{nullable = true}} when > there is no example in the array where the field is set to {{null}}. As a > result, {{GenerateUnsafeProjection.createCode}} generates bad code: it has no > code to create a row of null columns, so it just creates a row from variables > set with default values. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39468) Improve RpcAddress to add [] to IPv6 if needed
[ https://issues.apache.org/jira/browse/SPARK-39468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554800#comment-17554800 ] Apache Spark commented on SPARK-39468: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/36882 > Improve RpcAddress to add [] to IPv6 if needed > -- > > Key: SPARK-39468 > URL: https://issues.apache.org/jira/browse/SPARK-39468 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
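The change itself lives in Spark's Scala RpcAddress class, but the bracketing rule in the title is simple enough to sketch; the normalize_host helper below is hypothetical and only illustrates the idea, it is not Spark's API.

{code:python}
# Hypothetical sketch of the bracket rule: a bare IPv6 literal must be
# wrapped in [] before it can appear in a host:port URI, otherwise its
# colons are ambiguous with the port separator.
def normalize_host(host: str) -> str:
    if ":" in host and not host.startswith("["):
        return f"[{host}]"
    return host

assert normalize_host("::1") == "[::1]"          # bare IPv6 gets brackets
assert normalize_host("[::1]") == "[::1]"        # already bracketed: no-op
assert normalize_host("10.0.0.1") == "10.0.0.1"  # IPv4 untouched
{code}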
[jira] [Commented] (SPARK-39484) V2 write for type struct fails to handle case sensitivity on field names during resolution of V2 write command
[ https://issues.apache.org/jira/browse/SPARK-39484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554777#comment-17554777 ] Apache Spark commented on SPARK-39484: -- User 'edgarRd' has created a pull request for this issue: https://github.com/apache/spark/pull/36881 > V2 write for type struct fails to handle case sensitivity on field names > during resolution of V2 write command > -- > > Key: SPARK-39484 > URL: https://issues.apache.org/jira/browse/SPARK-39484 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.2.1 > Environment: {{{}master{}}}, {{3.1.1}} >Reporter: Edgar Rodriguez >Priority: Minor > > Summary: > When a V2 write uses an input with a {{struct}} type which contains > differences in the casing of field names, the {{caseSensitive}} config is not > being honored, always doing a strict case sensitive comparison. > Repro: > {code:java} > CREATE TABLE tmp.test_table_to (key int, object struct) USING > ICEBERG; > CREATE TABLE tmp.test_table_from (key int, object struct) USING > HIVE; > INSERT OVERWRITE tmp.test_table_to SELECT 1 as key, object FROM > tmp.test_table_from;{code} > The above results in Exception: > {code:java} > Error in query: unresolved operator 'OverwriteByExpression RelationV2[key#3, > object#4] spark_catalog.tmp.test_table_to, true, false; > 'OverwriteByExpression RelationV2[key#3, object#4] > spark_catalog.tmp.test_table_to, true, false > +- Project [1 AS key#0, object#2] > +- SubqueryAlias spark_catalog.tmp.test_table_from > +- HiveTableRelation [`tmp`.`test_table_from`, > org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Data Cols: > [key#1, object#2], Partition Cols: []]{code} > > If the casing matches in the struct field names, the v2 write works as > expected. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39484) V2 write for type struct fails to handle case sensitivity on field names during resolution of V2 write command
[ https://issues.apache.org/jira/browse/SPARK-39484?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554776#comment-17554776 ] Edgar Rodriguez commented on SPARK-39484: - Proposed solution PR: https://github.com/apache/spark/pull/36881 > V2 write for type struct fails to handle case sensitivity on field names > during resolution of V2 write command > -- > > Key: SPARK-39484 > URL: https://issues.apache.org/jira/browse/SPARK-39484 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.2.1 > Environment: {{{}master{}}}, {{3.1.1}} >Reporter: Edgar Rodriguez >Priority: Minor > > Summary: > When a V2 write uses an input with a {{struct}} type which contains > differences in the casing of field names, the {{caseSensitive}} config is not > being honored, always doing a strict case sensitive comparison. > Repro: > {code:java} > CREATE TABLE tmp.test_table_to (key int, object struct) USING > ICEBERG; > CREATE TABLE tmp.test_table_from (key int, object struct) USING > HIVE; > INSERT OVERWRITE tmp.test_table_to SELECT 1 as key, object FROM > tmp.test_table_from;{code} > The above results in Exception: > {code:java} > Error in query: unresolved operator 'OverwriteByExpression RelationV2[key#3, > object#4] spark_catalog.tmp.test_table_to, true, false; > 'OverwriteByExpression RelationV2[key#3, object#4] > spark_catalog.tmp.test_table_to, true, false > +- Project [1 AS key#0, object#2] > +- SubqueryAlias spark_catalog.tmp.test_table_from > +- HiveTableRelation [`tmp`.`test_table_from`, > org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Data Cols: > [key#1, object#2], Partition Cols: []]{code} > > If the casing matches in the struct field names, the v2 write works as > expected. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39484) V2 write for type struct fails to handle case sensitivity on field names during resolution of V2 write command
[ https://issues.apache.org/jira/browse/SPARK-39484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39484: Assignee: Apache Spark > V2 write for type struct fails to handle case sensitivity on field names > during resolution of V2 write command > -- > > Key: SPARK-39484 > URL: https://issues.apache.org/jira/browse/SPARK-39484 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.2.1 > Environment: {{{}master{}}}, {{3.1.1}} >Reporter: Edgar Rodriguez >Assignee: Apache Spark >Priority: Minor > > Summary: > When a V2 write uses an input with a {{struct}} type which contains > differences in the casing of field names, the {{caseSensitive}} config is not > being honored, always doing a strict case sensitive comparison. > Repro: > {code:java} > CREATE TABLE tmp.test_table_to (key int, object struct) USING > ICEBERG; > CREATE TABLE tmp.test_table_from (key int, object struct) USING > HIVE; > INSERT OVERWRITE tmp.test_table_to SELECT 1 as key, object FROM > tmp.test_table_from;{code} > The above results in Exception: > {code:java} > Error in query: unresolved operator 'OverwriteByExpression RelationV2[key#3, > object#4] spark_catalog.tmp.test_table_to, true, false; > 'OverwriteByExpression RelationV2[key#3, object#4] > spark_catalog.tmp.test_table_to, true, false > +- Project [1 AS key#0, object#2] > +- SubqueryAlias spark_catalog.tmp.test_table_from > +- HiveTableRelation [`tmp`.`test_table_from`, > org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Data Cols: > [key#1, object#2], Partition Cols: []]{code} > > If the casing matches in the struct field names, the v2 write works as > expected. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39484) V2 write for type struct fails to handle case sensitivity on field names during resolution of V2 write command
[ https://issues.apache.org/jira/browse/SPARK-39484?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39484: Assignee: (was: Apache Spark) > V2 write for type struct fails to handle case sensitivity on field names > during resolution of V2 write command > -- > > Key: SPARK-39484 > URL: https://issues.apache.org/jira/browse/SPARK-39484 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.1.1, 3.2.1 > Environment: {{{}master{}}}, {{3.1.1}} >Reporter: Edgar Rodriguez >Priority: Minor > > Summary: > When a V2 write uses an input with a {{struct}} type which contains > differences in the casing of field names, the {{caseSensitive}} config is not > being honored, always doing a strict case sensitive comparison. > Repro: > {code:java} > CREATE TABLE tmp.test_table_to (key int, object struct) USING > ICEBERG; > CREATE TABLE tmp.test_table_from (key int, object struct) USING > HIVE; > INSERT OVERWRITE tmp.test_table_to SELECT 1 as key, object FROM > tmp.test_table_from;{code} > The above results in Exception: > {code:java} > Error in query: unresolved operator 'OverwriteByExpression RelationV2[key#3, > object#4] spark_catalog.tmp.test_table_to, true, false; > 'OverwriteByExpression RelationV2[key#3, object#4] > spark_catalog.tmp.test_table_to, true, false > +- Project [1 AS key#0, object#2] > +- SubqueryAlias spark_catalog.tmp.test_table_from > +- HiveTableRelation [`tmp`.`test_table_from`, > org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Data Cols: > [key#1, object#2], Partition Cols: []]{code} > > If the casing matches in the struct field names, the v2 write works as > expected. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39484) V2 write for type struct fails to handle case sensitivity on field names during resolution of V2 write command
Edgar Rodriguez created SPARK-39484: --- Summary: V2 write for type struct fails to handle case sensitivity on field names during resolution of V2 write command Key: SPARK-39484 URL: https://issues.apache.org/jira/browse/SPARK-39484 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.1, 3.1.1 Environment: {{{}master{}}}, {{3.1.1}} Reporter: Edgar Rodriguez Summary: When a V2 write uses an input with a {{struct}} type whose field names differ in casing, the {{caseSensitive}} config is not honored; a strict case-sensitive comparison is always performed. Repro: {code:java} CREATE TABLE tmp.test_table_to (key int, object struct) USING ICEBERG; CREATE TABLE tmp.test_table_from (key int, object struct) USING HIVE; INSERT OVERWRITE tmp.test_table_to SELECT 1 as key, object FROM tmp.test_table_from;{code} The above results in an exception: {code:java} Error in query: unresolved operator 'OverwriteByExpression RelationV2[key#3, object#4] spark_catalog.tmp.test_table_to, true, false; 'OverwriteByExpression RelationV2[key#3, object#4] spark_catalog.tmp.test_table_to, true, false +- Project [1 AS key#0, object#2] +- SubqueryAlias spark_catalog.tmp.test_table_from +- HiveTableRelation [`tmp`.`test_table_from`, org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe, Data Cols: [key#1, object#2], Partition Cols: []]{code} If the casing of the struct field names matches, the v2 write works as expected. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
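A hedged sketch of what honoring {{caseSensitive}} means for the struct-field comparison described above; fields_match and the field names used are illustrative, not the resolution code the linked PR actually changes.

{code:python}
# Illustrative comparison rule (an analogy to, not a copy of, the V2
# write resolution code): field names should be compared according to
# spark.sql.caseSensitive instead of always strictly.
def fields_match(from_field: str, to_field: str, case_sensitive: bool) -> bool:
    if case_sensitive:
        return from_field == to_field
    return from_field.lower() == to_field.lower()

# With the default spark.sql.caseSensitive=false, 'objField' in the
# source struct should resolve against 'objfield' in the target struct.
assert fields_match("objField", "objfield", case_sensitive=False)
assert not fields_match("objField", "objfield", case_sensitive=True)
{code}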
[jira] [Commented] (SPARK-39383) Support V2 data sources with DEFAULT values
[ https://issues.apache.org/jira/browse/SPARK-39383?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554761#comment-17554761 ] Apache Spark commented on SPARK-39383: -- User 'dtenedor' has created a pull request for this issue: https://github.com/apache/spark/pull/36880 > Support V2 data sources with DEFAULT values > --- > > Key: SPARK-39383 > URL: https://issues.apache.org/jira/browse/SPARK-39383 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.4.0 >Reporter: Daniel >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39483) Construct the schema from `np.dtype` when `createDataFrame` from a NumPy array
[ https://issues.apache.org/jira/browse/SPARK-39483?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554733#comment-17554733 ] Apache Spark commented on SPARK-39483: -- User 'xinrong-databricks' has created a pull request for this issue: https://github.com/apache/spark/pull/36870 > Construct the schema from `np.dtype` when `createDataFrame` from a NumPy > array > --- > > Key: SPARK-39483 > URL: https://issues.apache.org/jira/browse/SPARK-39483 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Construct the schema from `np.dtype` when `createDataFrame` from a NumPy > array. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39483) Construct the schema from `np.dtype` when `createDataFrame` from a NumPy array
[ https://issues.apache.org/jira/browse/SPARK-39483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39483: Assignee: (was: Apache Spark) > Construct the schema from `np.dtype` when `createDataFrame` from a NumPy > array > --- > > Key: SPARK-39483 > URL: https://issues.apache.org/jira/browse/SPARK-39483 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Priority: Major > > Construct the schema from `np.dtype` when `createDataFrame` from a NumPy > array. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39483) Construct the schema from `np.dtype` when `createDataFrame` from a NumPy array
[ https://issues.apache.org/jira/browse/SPARK-39483?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39483: Assignee: Apache Spark > Construct the schema from `np.dtype` when `createDataFrame` from a NumPy > array > --- > > Key: SPARK-39483 > URL: https://issues.apache.org/jira/browse/SPARK-39483 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Xinrong Meng >Assignee: Apache Spark >Priority: Major > > Construct the schema from `np.dtype` when `createDataFrame` from a NumPy > array. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39483) Construct the schema from `np.dtype` when `createDataFrame` from a NumPy array
Xinrong Meng created SPARK-39483: Summary: Construct the schema from `np.dtype` when `createDataFrame` from a NumPy array Key: SPARK-39483 URL: https://issues.apache.org/jira/browse/SPARK-39483 Project: Spark Issue Type: Sub-task Components: PySpark Affects Versions: 3.4.0 Reporter: Xinrong Meng Construct the schema from `np.dtype` when `createDataFrame` from a NumPy array. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
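A sketch of the idea behind this sub-task, assuming a direct dtype-to-Spark-type lookup: derive the schema from the NumPy array's dtype up front instead of inferring it from sampled rows. The mapping table and the spark_type_for helper are illustrative, not PySpark's actual implementation.

{code:python}
# Illustrative dtype -> Spark SQL type lookup (the exact mapping PySpark
# adopts may differ).
import numpy as np
from pyspark.sql.types import DoubleType, FloatType, IntegerType, LongType

_DTYPE_TO_SPARK = {
    np.dtype("int32"): IntegerType(),
    np.dtype("int64"): LongType(),
    np.dtype("float32"): FloatType(),
    np.dtype("float64"): DoubleType(),
}

def spark_type_for(arr: np.ndarray):
    try:
        return _DTYPE_TO_SPARK[arr.dtype]
    except KeyError:
        raise TypeError(f"no Spark type mapping for dtype {arr.dtype}")

assert isinstance(spark_type_for(np.array([1.0, 2.0])), DoubleType)
{code}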
[jira] [Assigned] (SPARK-39482) Add build and test documentation on IPv6
[ https://issues.apache.org/jira/browse/SPARK-39482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39482: Assignee: Dongjoon Hyun (was: Apache Spark) > Add build and test documentation on IPv6 > > > Key: SPARK-39482 > URL: https://issues.apache.org/jira/browse/SPARK-39482 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39482) Add build and test documentation on IPv6
[ https://issues.apache.org/jira/browse/SPARK-39482?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554698#comment-17554698 ] Apache Spark commented on SPARK-39482: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/36879 > Add build and test documentation on IPv6 > > > Key: SPARK-39482 > URL: https://issues.apache.org/jira/browse/SPARK-39482 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39482) Add build and test documentation on IPv6
[ https://issues.apache.org/jira/browse/SPARK-39482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39482: Assignee: Apache Spark (was: Dongjoon Hyun) > Add build and test documentation on IPv6 > > > Key: SPARK-39482 > URL: https://issues.apache.org/jira/browse/SPARK-39482 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39482) Add build and test documentation on IPv6
[ https://issues.apache.org/jira/browse/SPARK-39482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-39482: -- Summary: Add build and test documentation on IPv6 (was: Add IPv6 documentation) > Add build and test documentation on IPv6 > > > Key: SPARK-39482 > URL: https://issues.apache.org/jira/browse/SPARK-39482 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39482) Add IPv6 documentation
[ https://issues.apache.org/jira/browse/SPARK-39482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-39482: - Assignee: Dongjoon Hyun > Add IPv6 documentation > -- > > Key: SPARK-39482 > URL: https://issues.apache.org/jira/browse/SPARK-39482 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 3.4.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39482) Add IPv6 documentation
Dongjoon Hyun created SPARK-39482: - Summary: Add IPv6 documentation Key: SPARK-39482 URL: https://issues.apache.org/jira/browse/SPARK-39482 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 3.4.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39481) Pandas UDF executed twice if used in projection followed by filter
[ https://issues.apache.org/jira/browse/SPARK-39481?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Timothy Dijamco updated SPARK-39481: Description: In this scenario, a Pandas UDF will be executed twice: # Projection that applies a Pandas UDF # Filter In the {{explain}} output of the example below, the Optimized Logical Plan and Physical Plan contain {{ArrowEvalPython}} twice: {code:python} from pyspark.sql import SparkSession import pyspark.sql.functions as F spark = SparkSession.builder.master('local[1]').getOrCreate() df = spark.createDataFrame( [ [1, 'one'], [2, 'two'], [3, 'three'], ], 'int_col int, string_col string', ) @F.pandas_udf('int') def copy_int_col(s): return s df = df.withColumn('int_col_copy', copy_int_col(df['int_col'])) df = df.filter(F.col('int_col_copy') >= 3) df.explain(True) {code} {code:java} == Parsed Logical Plan == 'Filter ('int_col_copy >= 3) +- Project [int_col#322, string_col#323, copy_int_col(int_col#322) AS int_col_copy#327] +- LogicalRDD [int_col#322, string_col#323], false == Analyzed Logical Plan == int_col: int, string_col: string, int_col_copy: int Filter (int_col_copy#327 >= 3) +- Project [int_col#322, string_col#323, copy_int_col(int_col#322) AS int_col_copy#327] +- LogicalRDD [int_col#322, string_col#323], false == Optimized Logical Plan == Project [int_col#322, string_col#323, pythonUDF0#332 AS int_col_copy#327] +- ArrowEvalPython [copy_int_col(int_col#322)], [pythonUDF0#332], 200 +- Project [int_col#322, string_col#323] +- Filter (pythonUDF0#331 >= 3) +- ArrowEvalPython [copy_int_col(int_col#322)], [pythonUDF0#331], 200 +- LogicalRDD [int_col#322, string_col#323], false == Physical Plan == *(3) Project [int_col#322, string_col#323, pythonUDF0#332 AS int_col_copy#327] +- ArrowEvalPython [copy_int_col(int_col#322)], [pythonUDF0#332], 200 +- *(2) Project [int_col#322, string_col#323] +- *(2) Filter (pythonUDF0#331 >= 3) +- ArrowEvalPython [copy_int_col(int_col#322)], [pythonUDF0#331], 200 +- *(1) Scan ExistingRDD[int_col#322,string_col#323] {code} If the Pandas UDF is marked as non-deterministic (e.g. {{{}copy_int_col = copy_int_col.asNondeterministic(){}}}), then it is not executed twice. 
was: In this scenario, a Pandas UDF will be executed twice: # Projection that applies a Pandas UDF # Filter In the {{explain}} output of the example below, the Optimized Logical Plan and Physical Plan contain {{ArrowEvalPython}} twice: {code:python} from pyspark.sql import SparkSession import pyspark.sql.functions as F spark = SparkSession.builder.master('local[1]').getOrCreate() df = spark.createDataFrame( [ [1, 'one'], [2, 'two'], [3, 'three'], ], 'int_col int, string_col string', ) @F.pandas_udf('int') def copy_int_col(s): return s df = df.withColumn('int_col_copy', copy_int_col(df['int_col'])) df = df.filter(F.col('int_col_copy') >= 3) df.explain(True) {code} {code:java} == Parsed Logical Plan == 'Filter ('int_col_copy >= 3) +- Project [int_col#322, string_col#323, copy_int_col(int_col#322) AS int_col_copy#327] +- LogicalRDD [int_col#322, string_col#323], false == Analyzed Logical Plan == int_col: int, string_col: string, int_col_copy: int Filter (int_col_copy#327 >= 3) +- Project [int_col#322, string_col#323, copy_int_col(int_col#322) AS int_col_copy#327] +- LogicalRDD [int_col#322, string_col#323], false == Optimized Logical Plan == Project [int_col#322, string_col#323, pythonUDF0#332 AS int_col_copy#327] +- ArrowEvalPython [copy_int_col(int_col#322)], [pythonUDF0#332], 200 +- Project [int_col#322, string_col#323] +- Filter (pythonUDF0#331 >= 3) +- ArrowEvalPython [copy_int_col(int_col#322)], [pythonUDF0#331], 200 +- LogicalRDD [int_col#322, string_col#323], false == Physical Plan == *(3) Project [int_col#322, string_col#323, pythonUDF0#332 AS int_col_copy#327] +- ArrowEvalPython [copy_int_col(int_col#322)], [pythonUDF0#332], 200 +- *(2) Project [int_col#322, string_col#323] +- *(2) Filter (pythonUDF0#331 >= 3) +- ArrowEvalPython [copy_int_col(int_col#322)], [pythonUDF0#331], 200 +- *(1) Scan ExistingRDD[int_col#322,string_col#323] {code} If the Pandas UDF is marked as non-deterministic (e.g. {{{}copy_int_col = copy_int_col.asNondeterministic(){}}}, then it is not executed twice. > Pandas UDF executed twice if used in projection followed by filter > -- > > Key: SPARK-39481 > URL: https://issues.apache.org/jira/browse/SPARK-39481 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.2.1 >Reporter: Timothy Dijamco >Priority: Minor > > In this scenario, a Pandas UDF will be executed twice: >
[jira] [Created] (SPARK-39481) Pandas UDF executed twice if used in projection followed by filter
Timothy Dijamco created SPARK-39481: --- Summary: Pandas UDF executed twice if used in projection followed by filter Key: SPARK-39481 URL: https://issues.apache.org/jira/browse/SPARK-39481 Project: Spark Issue Type: Bug Components: PySpark Affects Versions: 3.2.1 Reporter: Timothy Dijamco In this scenario, a Pandas UDF will be executed twice: # Projection that applies a Pandas UDF # Filter In the {{explain}} output of the example below, the Optimized Logical Plan and Physical Plan contain {{ArrowEvalPython}} twice: {code:python} from pyspark.sql import SparkSession import pyspark.sql.functions as F spark = SparkSession.builder.master('local[1]').getOrCreate() df = spark.createDataFrame( [ [1, 'one'], [2, 'two'], [3, 'three'], ], 'int_col int, string_col string', ) @F.pandas_udf('int') def copy_int_col(s): return s df = df.withColumn('int_col_copy', copy_int_col(df['int_col'])) df = df.filter(F.col('int_col_copy') >= 3) df.explain(True) {code} {code:java} == Parsed Logical Plan == 'Filter ('int_col_copy >= 3) +- Project [int_col#322, string_col#323, copy_int_col(int_col#322) AS int_col_copy#327] +- LogicalRDD [int_col#322, string_col#323], false == Analyzed Logical Plan == int_col: int, string_col: string, int_col_copy: int Filter (int_col_copy#327 >= 3) +- Project [int_col#322, string_col#323, copy_int_col(int_col#322) AS int_col_copy#327] +- LogicalRDD [int_col#322, string_col#323], false == Optimized Logical Plan == Project [int_col#322, string_col#323, pythonUDF0#332 AS int_col_copy#327] +- ArrowEvalPython [copy_int_col(int_col#322)], [pythonUDF0#332], 200 +- Project [int_col#322, string_col#323] +- Filter (pythonUDF0#331 >= 3) +- ArrowEvalPython [copy_int_col(int_col#322)], [pythonUDF0#331], 200 +- LogicalRDD [int_col#322, string_col#323], false == Physical Plan == *(3) Project [int_col#322, string_col#323, pythonUDF0#332 AS int_col_copy#327] +- ArrowEvalPython [copy_int_col(int_col#322)], [pythonUDF0#332], 200 +- *(2) Project [int_col#322, string_col#323] +- *(2) Filter (pythonUDF0#331 >= 3) +- ArrowEvalPython [copy_int_col(int_col#322)], [pythonUDF0#331], 200 +- *(1) Scan ExistingRDD[int_col#322,string_col#323] {code} If the Pandas UDF is marked as non-deterministic (e.g. {{{}copy_int_col = copy_int_col.asNondeterministic(){}}}, then it is not executed twice. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
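A self-contained sketch of the workaround mentioned at the end of the report: marking the pandas UDF non-deterministic keeps the optimizer from duplicating it below the filter.

{code:python}
from pyspark.sql import SparkSession
import pyspark.sql.functions as F

spark = SparkSession.builder.master('local[1]').getOrCreate()
df = spark.createDataFrame(
    [[1, 'one'], [2, 'two'], [3, 'three']],
    'int_col int, string_col string',
)

@F.pandas_udf('int')
def copy_int_col(s):
    return s

# Workaround from the report: a non-deterministic expression cannot be
# pushed through the filter, so only one ArrowEvalPython node remains.
copy_int_col = copy_int_col.asNondeterministic()

df = df.withColumn('int_col_copy', copy_int_col(df['int_col']))
df = df.filter(F.col('int_col_copy') >= 3)
df.explain(True)  # ArrowEvalPython should now appear only once
{code}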
[jira] [Assigned] (SPARK-39480) Parquet bit-packing de/encode optimization
[ https://issues.apache.org/jira/browse/SPARK-39480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39480: Assignee: Apache Spark > Parquet bit-packing de/encode optimization > -- > > Key: SPARK-39480 > URL: https://issues.apache.org/jira/browse/SPARK-39480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Fang-Xie >Assignee: Apache Spark >Priority: Major > Fix For: 3.3.0 > > Attachments: image-2022-06-15-22-50-12-759.png, > image-2022-06-15-22-52-56-792.png, image-2022-06-15-22-53-35-937.png, > image-2022-06-15-22-54-07-288.png > > > Current Spark use Parquet-mr as parquet reader/writer library, but the > built-in bit-packing en/decode is not efficient enough. > Our optimization for Parquet bit-packing en/decode with jdk.incubator.vector > in Open JDK18 brings prominent performance improvement. > Due to Vector API is added to OpenJDK since 16, So this optimization request > JDK16 or higher. > *Below are our test results* > Functional test is based on open-source parquet-mr Bit-pack decoding > function: *_public final void unpack8Values(final byte[] in, final int inPos, > final int[] out, final int outPos)_* __ > compared with our implementation with vector API *_public final void > unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final > int outPos)_* > We tested 10 pairs (open source parquet bit unpacking vs ours optimized > vectorized SIMD implementation) decode function with bit > width=\{1,2,3,4,5,6,7,8,9,10}, below are test results: > !image-2022-06-15-22-50-12-759.png|width=513,height=259! > > We integrated our bit-packing decode implementation into parquet-mr, tested > the parquet batch reader ability from Spark VectorizedParquetRecordReader > which get parquet column data by the batch way. We construct parquet file > with different row count and column count, the column data type is Int32, the > maximum int value is 127 which satisfies bit pack encode with bit width=7, > the count of the row is from 10k to 100 million and the count of the column > is from 1 to 4. > !image-2022-06-15-22-52-56-792.png|width=328,height=167! > !image-2022-06-15-22-53-35-937.png|width=354,height=175! > !image-2022-06-15-22-54-07-288.png|width=352,height=173! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39480) Parquet bit-packing de/encode optimization
[ https://issues.apache.org/jira/browse/SPARK-39480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39480: Assignee: (was: Apache Spark) > Parquet bit-packing de/encode optimization > -- > > Key: SPARK-39480 > URL: https://issues.apache.org/jira/browse/SPARK-39480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Fang-Xie >Priority: Major > Fix For: 3.3.0 > > Attachments: image-2022-06-15-22-50-12-759.png, > image-2022-06-15-22-52-56-792.png, image-2022-06-15-22-53-35-937.png, > image-2022-06-15-22-54-07-288.png > > > Current Spark use Parquet-mr as parquet reader/writer library, but the > built-in bit-packing en/decode is not efficient enough. > Our optimization for Parquet bit-packing en/decode with jdk.incubator.vector > in Open JDK18 brings prominent performance improvement. > Due to Vector API is added to OpenJDK since 16, So this optimization request > JDK16 or higher. > *Below are our test results* > Functional test is based on open-source parquet-mr Bit-pack decoding > function: *_public final void unpack8Values(final byte[] in, final int inPos, > final int[] out, final int outPos)_* __ > compared with our implementation with vector API *_public final void > unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final > int outPos)_* > We tested 10 pairs (open source parquet bit unpacking vs ours optimized > vectorized SIMD implementation) decode function with bit > width=\{1,2,3,4,5,6,7,8,9,10}, below are test results: > !image-2022-06-15-22-50-12-759.png|width=513,height=259! > > We integrated our bit-packing decode implementation into parquet-mr, tested > the parquet batch reader ability from Spark VectorizedParquetRecordReader > which get parquet column data by the batch way. We construct parquet file > with different row count and column count, the column data type is Int32, the > maximum int value is 127 which satisfies bit pack encode with bit width=7, > the count of the row is from 10k to 100 million and the count of the column > is from 1 to 4. > !image-2022-06-15-22-52-56-792.png|width=328,height=167! > !image-2022-06-15-22-53-35-937.png|width=354,height=175! > !image-2022-06-15-22-54-07-288.png|width=352,height=173! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39480) Parquet bit-packing de/encode optimization
[ https://issues.apache.org/jira/browse/SPARK-39480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554672#comment-17554672 ] Apache Spark commented on SPARK-39480: -- User 'Fang-Xie' has created a pull request for this issue: https://github.com/apache/spark/pull/36878 > Parquet bit-packing de/encode optimization > -- > > Key: SPARK-39480 > URL: https://issues.apache.org/jira/browse/SPARK-39480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Fang-Xie >Priority: Major > Fix For: 3.3.0 > > Attachments: image-2022-06-15-22-50-12-759.png, > image-2022-06-15-22-52-56-792.png, image-2022-06-15-22-53-35-937.png, > image-2022-06-15-22-54-07-288.png > > > Current Spark use Parquet-mr as parquet reader/writer library, but the > built-in bit-packing en/decode is not efficient enough. > Our optimization for Parquet bit-packing en/decode with jdk.incubator.vector > in Open JDK18 brings prominent performance improvement. > Due to Vector API is added to OpenJDK since 16, So this optimization request > JDK16 or higher. > *Below are our test results* > Functional test is based on open-source parquet-mr Bit-pack decoding > function: *_public final void unpack8Values(final byte[] in, final int inPos, > final int[] out, final int outPos)_* __ > compared with our implementation with vector API *_public final void > unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final > int outPos)_* > We tested 10 pairs (open source parquet bit unpacking vs ours optimized > vectorized SIMD implementation) decode function with bit > width=\{1,2,3,4,5,6,7,8,9,10}, below are test results: > !image-2022-06-15-22-50-12-759.png|width=513,height=259! > > We integrated our bit-packing decode implementation into parquet-mr, tested > the parquet batch reader ability from Spark VectorizedParquetRecordReader > which get parquet column data by the batch way. We construct parquet file > with different row count and column count, the column data type is Int32, the > maximum int value is 127 which satisfies bit pack encode with bit width=7, > the count of the row is from 10k to 100 million and the count of the column > is from 1 to 4. > !image-2022-06-15-22-52-56-792.png|width=328,height=167! > !image-2022-06-15-22-53-35-937.png|width=354,height=175! > !image-2022-06-15-22-54-07-288.png|width=352,height=173! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39480) Parquet bit-packing de/encode optimization
[ https://issues.apache.org/jira/browse/SPARK-39480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Xie updated SPARK-39480: - Description: Current Spark use Parquet-mr as parquet reader/writer library, but the built-in bit-packing en/decode is not efficient enough. Our optimization for Parquet bit-packing en/decode with jdk.incubator.vector in Open JDK18 brings prominent performance improvement. Due to Vector API is added to OpenJDK since 16, So this optimization request JDK16 or higher. *Below are our test results* Functional test is based on open-source parquet-mr Bit-pack decoding function: *_public final void unpack8Values(final byte[] in, final int inPos, final int[] out, final int outPos)_* __ compared with our implementation with vector API *_public final void unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final int outPos)_* We tested 10 pairs (open source parquet bit unpacking vs ours optimized vectorized SIMD implementation) decode function with bit width=\{1,2,3,4,5,6,7,8,9,10}, below are test results: !image-2022-06-15-22-50-12-759.png|width=513,height=259! We integrated our bit-packing decode implementation into parquet-mr, tested the parquet batch reader ability from Spark VectorizedParquetRecordReader which get parquet column data by the batch way. We construct parquet file with different row count and column count, the column data type is Int32, the maximum int value is 127 which satisfies bit pack encode with bit width=7, the count of the row is from 10k to 100 million and the count of the column is from 1 to 4. !image-2022-06-15-22-52-56-792.png|width=328,height=167! !image-2022-06-15-22-53-35-937.png|width=354,height=175! !image-2022-06-15-22-54-07-288.png|width=352,height=173! was: Current Spark use Parquet-mr as parquet reader/writer library, but the built-in bit-packing en/decode is not efficient enough. Our optimization for Parquet bit-packing en/decode with jdk.incubator.vector in Open JDK18 brings prominent performance improvement. Due to Vector API is added to OpenJDK since 16, So this optimization request JDK16 or higher. *Below are our test results* Functional test is based on open-source parquet-mr Bit-pack decoding function: *_public final void unpack8Values(final byte[] in, final int inPos, final int[] out, final int outPos)_* __ compared with our implementation with vector API *_public final void unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final int outPos)_* We tested 10 pairs (open source parquet bit unpacking vs ours optimized vectorized SIMD implementation) decode function with bit width=\{1,2,3,4,5,6,7,8,9,10}, below are test results: !image-2022-06-15-22-50-12-759.png|width=513,height=259! > Parquet bit-packing de/encode optimization > -- > > Key: SPARK-39480 > URL: https://issues.apache.org/jira/browse/SPARK-39480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Fang-Xie >Priority: Major > Fix For: 3.3.0 > > Attachments: image-2022-06-15-22-50-12-759.png, > image-2022-06-15-22-52-56-792.png, image-2022-06-15-22-53-35-937.png, > image-2022-06-15-22-54-07-288.png > > > Current Spark use Parquet-mr as parquet reader/writer library, but the > built-in bit-packing en/decode is not efficient enough. > Our optimization for Parquet bit-packing en/decode with jdk.incubator.vector > in Open JDK18 brings prominent performance improvement. > Due to Vector API is added to OpenJDK since 16, So this optimization request > JDK16 or higher. 
> *Below are our test results* > Functional test is based on open-source parquet-mr Bit-pack decoding > function: *_public final void unpack8Values(final byte[] in, final int inPos, > final int[] out, final int outPos)_* __ > compared with our implementation with vector API *_public final void > unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final > int outPos)_* > We tested 10 pairs (open source parquet bit unpacking vs ours optimized > vectorized SIMD implementation) decode function with bit > width=\{1,2,3,4,5,6,7,8,9,10}, below are test results: > !image-2022-06-15-22-50-12-759.png|width=513,height=259! > > We integrated our bit-packing decode implementation into parquet-mr, tested > the parquet batch reader ability from Spark VectorizedParquetRecordReader > which get parquet column data by the batch way. We construct parquet file > with different row count and column count, the column data type is Int32, the > maximum int value is 127 which satisfies bit pack encode with bit width=7, > the count of the row is from 10k to 100 million and the count of the column > is from 1 to 4. > !image-2022-06-15-22-52-56-792.png|width=328,height=167! > !i
[jira] [Updated] (SPARK-39480) Parquet bit-packing de/encode optimization
[ https://issues.apache.org/jira/browse/SPARK-39480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Xie updated SPARK-39480: - Attachment: image-2022-06-15-22-54-07-288.png > Parquet bit-packing de/encode optimization > -- > > Key: SPARK-39480 > URL: https://issues.apache.org/jira/browse/SPARK-39480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Fang-Xie >Priority: Major > Fix For: 3.3.0 > > Attachments: image-2022-06-15-22-50-12-759.png, > image-2022-06-15-22-52-56-792.png, image-2022-06-15-22-53-35-937.png, > image-2022-06-15-22-54-07-288.png > > > Current Spark use Parquet-mr as parquet reader/writer library, but the > built-in bit-packing en/decode is not efficient enough. > Our optimization for Parquet bit-packing en/decode with jdk.incubator.vector > in Open JDK18 brings prominent performance improvement. > Due to Vector API is added to OpenJDK since 16, So this optimization request > JDK16 or higher. > *Below are our test results* > Functional test is based on open-source parquet-mr Bit-pack decoding > function: *_public final void unpack8Values(final byte[] in, final int inPos, > final int[] out, final int outPos)_* __ > compared with our implementation with vector API *_public final void > unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final > int outPos)_* > We tested 10 pairs (open source parquet bit unpacking vs ours optimized > vectorized SIMD implementation) decode function with bit > width=\{1,2,3,4,5,6,7,8,9,10}, below are test results: > !image-2022-06-15-22-50-12-759.png|width=513,height=259! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39480) Parquet bit-packing de/encode optimization
[ https://issues.apache.org/jira/browse/SPARK-39480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Xie updated SPARK-39480: - Attachment: image-2022-06-15-22-53-35-937.png > Parquet bit-packing de/encode optimization > -- > > Key: SPARK-39480 > URL: https://issues.apache.org/jira/browse/SPARK-39480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Fang-Xie >Priority: Major > Fix For: 3.3.0 > > Attachments: image-2022-06-15-22-50-12-759.png, > image-2022-06-15-22-52-56-792.png, image-2022-06-15-22-53-35-937.png > > > Current Spark use Parquet-mr as parquet reader/writer library, but the > built-in bit-packing en/decode is not efficient enough. > Our optimization for Parquet bit-packing en/decode with jdk.incubator.vector > in Open JDK18 brings prominent performance improvement. > Due to Vector API is added to OpenJDK since 16, So this optimization request > JDK16 or higher. > *Below are our test results* > Functional test is based on open-source parquet-mr Bit-pack decoding > function: *_public final void unpack8Values(final byte[] in, final int inPos, > final int[] out, final int outPos)_* __ > compared with our implementation with vector API *_public final void > unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final > int outPos)_* > We tested 10 pairs (open source parquet bit unpacking vs ours optimized > vectorized SIMD implementation) decode function with bit > width=\{1,2,3,4,5,6,7,8,9,10}, below are test results: > !image-2022-06-15-22-50-12-759.png|width=513,height=259! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39480) Parquet bit-packing de/encode optimization
[ https://issues.apache.org/jira/browse/SPARK-39480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Xie updated SPARK-39480: - Attachment: image-2022-06-15-22-52-56-792.png > Parquet bit-packing de/encode optimization > -- > > Key: SPARK-39480 > URL: https://issues.apache.org/jira/browse/SPARK-39480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Fang-Xie >Priority: Major > Fix For: 3.3.0 > > Attachments: image-2022-06-15-22-50-12-759.png, > image-2022-06-15-22-52-56-792.png > > > Current Spark use Parquet-mr as parquet reader/writer library, but the > built-in bit-packing en/decode is not efficient enough. > Our optimization for Parquet bit-packing en/decode with jdk.incubator.vector > in Open JDK18 brings prominent performance improvement. > Due to Vector API is added to OpenJDK since 16, So this optimization request > JDK16 or higher. > *Below are our test results* > Functional test is based on open-source parquet-mr Bit-pack decoding > function: *_public final void unpack8Values(final byte[] in, final int inPos, > final int[] out, final int outPos)_* __ > compared with our implementation with vector API *_public final void > unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final > int outPos)_* > We tested 10 pairs (open source parquet bit unpacking vs ours optimized > vectorized SIMD implementation) decode function with bit > width=\{1,2,3,4,5,6,7,8,9,10}, below are test results: > !image-2022-06-15-22-50-12-759.png|width=513,height=259! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39480) Parquet bit-packing de/encode optimization
[ https://issues.apache.org/jira/browse/SPARK-39480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Xie updated SPARK-39480: - Attachment: (was: image-2022-06-15-22-48-46-554.png) > Parquet bit-packing de/encode optimization > -- > > Key: SPARK-39480 > URL: https://issues.apache.org/jira/browse/SPARK-39480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Fang-Xie >Priority: Major > Fix For: 3.3.0 > > Attachments: image-2022-06-15-22-50-12-759.png > > > Current Spark use Parquet-mr as parquet reader/writer library, but the > built-in bit-packing en/decode is not efficient enough. > Our optimization for Parquet bit-packing en/decode with jdk.incubator.vector > in Open JDK18 brings prominent performance improvement. > Due to Vector API is added to OpenJDK since 16, So this optimization request > JDK16 or higher. > *Below are our test results* > Functional test is based on open-source parquet-mr Bit-pack decoding > function: *_public final void unpack8Values(final byte[] in, final int inPos, > final int[] out, final int outPos)_* __ > compared with our implementation with vector API *_public final void > unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final > int outPos)_* > We tested 10 pairs (open source parquet bit unpacking vs ours optimized > vectorized SIMD implementation) decode function with bit > width=\{1,2,3,4,5,6,7,8,9,10}, below are test results: > !image-2022-06-15-22-50-12-759.png|width=513,height=259! -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39480) Parquet bit-packing de/encode optimization
[ https://issues.apache.org/jira/browse/SPARK-39480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Xie updated SPARK-39480: - Description: Current Spark use Parquet-mr as parquet reader/writer library, but the built-in bit-packing en/decode is not efficient enough. Our optimization for Parquet bit-packing en/decode with jdk.incubator.vector in Open JDK18 brings prominent performance improvement. Due to Vector API is added to OpenJDK since 16, So this optimization request JDK16 or higher. *Below are our test results* Functional test is based on open-source parquet-mr Bit-pack decoding function: *_public final void unpack8Values(final byte[] in, final int inPos, final int[] out, final int outPos)_* __ compared with our implementation with vector API *_public final void unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final int outPos)_* We tested 10 pairs (open source parquet bit unpacking vs ours optimized vectorized SIMD implementation) decode function with bit width=\{1,2,3,4,5,6,7,8,9,10}, below are test results: !image-2022-06-15-22-50-12-759.png|width=513,height=259! was: Current Spark use Parquet-mr as parquet reader/writer library, but the built-in bit-packing en/decode is not efficient enough. Our optimization for Parquet bit-packing en/decode with jdk.incubator.vector in Open JDK18 brings prominent performance improvement. Due to Vector API is added to OpenJDK since 16, So this optimization request JDK16 or higher. *Below are our test results* Functional test is based on open-source parquet-mr Bit-pack decoding function: *_public final void unpack8Values(final byte[] in, final int inPos, final int[] out, final int outPos)_* __ compared with our implementation with vector API *_public final void unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final int outPos)_* We tested 10 pairs (open source parquet bit unpacking vs ours optimized vectorized SIMD implementation) decode function with bit width=\{1,2,3,4,5,6,7,8,9,10}, below are test results: > Parquet bit-packing de/encode optimization > -- > > Key: SPARK-39480 > URL: https://issues.apache.org/jira/browse/SPARK-39480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Fang-Xie >Priority: Major > Fix For: 3.3.0 > > Attachments: image-2022-06-15-22-48-46-554.png, > image-2022-06-15-22-50-12-759.png > > > Current Spark use Parquet-mr as parquet reader/writer library, but the > built-in bit-packing en/decode is not efficient enough. > Our optimization for Parquet bit-packing en/decode with jdk.incubator.vector > in Open JDK18 brings prominent performance improvement. > Due to Vector API is added to OpenJDK since 16, So this optimization request > JDK16 or higher. > *Below are our test results* > Functional test is based on open-source parquet-mr Bit-pack decoding > function: *_public final void unpack8Values(final byte[] in, final int inPos, > final int[] out, final int outPos)_* __ > compared with our implementation with vector API *_public final void > unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final > int outPos)_* > We tested 10 pairs (open source parquet bit unpacking vs ours optimized > vectorized SIMD implementation) decode function with bit > width=\{1,2,3,4,5,6,7,8,9,10}, below are test results: > !image-2022-06-15-22-50-12-759.png|width=513,height=259! 
-- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39480) Parquet bit-packing de/encode optimization
[ https://issues.apache.org/jira/browse/SPARK-39480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Xie updated SPARK-39480: - Attachment: image-2022-06-15-22-50-12-759.png > Parquet bit-packing de/encode optimization > -- > > Key: SPARK-39480 > URL: https://issues.apache.org/jira/browse/SPARK-39480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Fang-Xie >Priority: Major > Fix For: 3.3.0 > > Attachments: image-2022-06-15-22-48-46-554.png, > image-2022-06-15-22-50-12-759.png > > > Current Spark use Parquet-mr as parquet reader/writer library, but the > built-in bit-packing en/decode is not efficient enough. > Our optimization for Parquet bit-packing en/decode with jdk.incubator.vector > in Open JDK18 brings prominent performance improvement. > Due to Vector API is added to OpenJDK since 16, So this optimization request > JDK16 or higher. > *Below are our test results* > Functional test is based on open-source parquet-mr Bit-pack decoding > function: *_public final void unpack8Values(final byte[] in, final int inPos, > final int[] out, final int outPos)_* __ > compared with our implementation with vector API *_public final void > unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final > int outPos)_* > We tested 10 pairs (open source parquet bit unpacking vs ours optimized > vectorized SIMD implementation) decode function with bit > width=\{1,2,3,4,5,6,7,8,9,10}, below are test results: > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39480) Parquet bit-packing de/encode optimization
[ https://issues.apache.org/jira/browse/SPARK-39480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554639#comment-17554639 ] Fang-Xie commented on SPARK-39480: -- !image-2022-06-15-22-48-46-554.png! > Parquet bit-packing de/encode optimization > -- > > Key: SPARK-39480 > URL: https://issues.apache.org/jira/browse/SPARK-39480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Fang-Xie >Priority: Major > Fix For: 3.3.0 > > Attachments: image-2022-06-15-22-48-46-554.png > > > Current Spark use Parquet-mr as parquet reader/writer library, but the > built-in bit-packing en/decode is not efficient enough. > Our optimization for Parquet bit-packing en/decode with jdk.incubator.vector > in Open JDK18 brings prominent performance improvement. > Due to Vector API is added to OpenJDK since 16, So this optimization request > JDK16 or higher. > *Below are our test results* > Functional test is based on open-source parquet-mr Bit-pack decoding > function: *_public final void unpack8Values(final byte[] in, final int inPos, > final int[] out, final int outPos)_* __ > compared with our implementation with vector API *_public final void > unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final > int outPos)_* > We tested 10 pairs (open source parquet bit unpacking vs ours optimized > vectorized SIMD implementation) decode function with bit > width=\{1,2,3,4,5,6,7,8,9,10}, below are test results: > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] (SPARK-39480) Parquet bit-packing de/encode optimization
[ https://issues.apache.org/jira/browse/SPARK-39480 ] Fang-Xie deleted comment on SPARK-39480: -- was (Author: JIRAUSER288151): !image-2022-06-15-22-48-46-554.png! > Parquet bit-packing de/encode optimization > -- > > Key: SPARK-39480 > URL: https://issues.apache.org/jira/browse/SPARK-39480 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Fang-Xie >Priority: Major > Fix For: 3.3.0 > > Attachments: image-2022-06-15-22-48-46-554.png > > > Current Spark use Parquet-mr as parquet reader/writer library, but the > built-in bit-packing en/decode is not efficient enough. > Our optimization for Parquet bit-packing en/decode with jdk.incubator.vector > in Open JDK18 brings prominent performance improvement. > Due to Vector API is added to OpenJDK since 16, So this optimization request > JDK16 or higher. > *Below are our test results* > Functional test is based on open-source parquet-mr Bit-pack decoding > function: *_public final void unpack8Values(final byte[] in, final int inPos, > final int[] out, final int outPos)_* __ > compared with our implementation with vector API *_public final void > unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final > int outPos)_* > We tested 10 pairs (open source parquet bit unpacking vs ours optimized > vectorized SIMD implementation) decode function with bit > width=\{1,2,3,4,5,6,7,8,9,10}, below are test results: > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-39480) Parquet bit-packing de/encode optimization
[ https://issues.apache.org/jira/browse/SPARK-39480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fang-Xie updated SPARK-39480: - Attachment: image-2022-06-15-22-48-46-554.png > Parquet bit-packing de/encode optimization > -- > > Key: SPARK-39480 > URL: https://issues.apache.org/jira/browse/SPARK-39480 > Project: Spark > Issue Type: Improvement > Components: SQL > Affects Versions: 3.3.0 > Reporter: Fang-Xie > Priority: Major > Fix For: 3.3.0 > > Attachments: image-2022-06-15-22-48-46-554.png > > > Spark currently uses Parquet-mr as its parquet reader/writer library, but the built-in bit-packing en/decoding is not efficient enough. > Our optimization of Parquet bit-packing en/decoding with jdk.incubator.vector in OpenJDK 18 brings a prominent performance improvement. > Because the Vector API has been part of OpenJDK since JDK 16, this optimization requires JDK 16 or higher. > *Below are our test results* > The functional test is based on the open-source parquet-mr bit-pack decoding function *_public final void unpack8Values(final byte[] in, final int inPos, final int[] out, final int outPos)_*, compared with our Vector API implementation *_public final void unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final int outPos)_*. > We tested 10 pairs of decode functions (open-source parquet bit unpacking vs. our optimized vectorized SIMD implementation) with bit width = \{1,2,3,4,5,6,7,8,9,10}; the test results are below: > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-39480) Parquet bit-packing de/encode optimization
Fang-Xie created SPARK-39480: Summary: Parquet bit-packing de/encode optimization Key: SPARK-39480 URL: https://issues.apache.org/jira/browse/SPARK-39480 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Fang-Xie Fix For: 3.3.0 Spark currently uses Parquet-mr as its parquet reader/writer library, but the built-in bit-packing en/decoding is not efficient enough. Our optimization of Parquet bit-packing en/decoding with jdk.incubator.vector in OpenJDK 18 brings a prominent performance improvement. Because the Vector API has been part of OpenJDK since JDK 16, this optimization requires JDK 16 or higher. *Below are our test results* The functional test is based on the open-source parquet-mr bit-pack decoding function *_public final void unpack8Values(final byte[] in, final int inPos, final int[] out, final int outPos)_*, compared with our Vector API implementation *_public final void unpack8Values_vec(final byte[] in, final int inPos, final int[] out, final int outPos)_*. We tested 10 pairs of decode functions (open-source parquet bit unpacking vs. our optimized vectorized SIMD implementation) with bit width = \{1,2,3,4,5,6,7,8,9,10}; the test results are below: -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
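As a concrete illustration of the vectorized approach, the simplest case (bit width 1, where one byte holds 8 values) can be expressed with jdk.incubator.vector by broadcasting the packed byte across eight int lanes and shifting each lane right by its index. This is a minimal sketch, not the attached patch: the class and method names are invented, it requires JDK 16+ with --add-modules jdk.incubator.vector, and real speedups depend on the hardware backing the 256-bit species:

{code:java}
import jdk.incubator.vector.IntVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

// Sketch of a SIMD unpack for bit width 1: SPECIES_256 gives eight 32-bit
// lanes, matching the 8 output values produced from one packed byte.
public class Unpack1BitVec {
    private static final VectorSpecies<Integer> SPECIES = IntVector.SPECIES_256;
    private static final int[] LANE_SHIFTS = {0, 1, 2, 3, 4, 5, 6, 7};

    static void unpack8Values1Bit(byte[] in, int inPos, int[] out, int outPos) {
        IntVector shifts = IntVector.fromArray(SPECIES, LANE_SHIFTS, 0);
        IntVector.broadcast(SPECIES, in[inPos] & 0xFF)   // copy the byte into all 8 lanes
                 .lanewise(VectorOperators.LSHR, shifts) // shift lane i right by i
                 .lanewise(VectorOperators.AND, 1)       // keep the low bit of each lane
                 .intoArray(out, outPos);
    }

    public static void main(String[] args) {
        int[] out = new int[8];
        unpack8Values1Bit(new byte[]{(byte) 0b10110001}, 0, out, 0);
        for (int v : out) System.out.print(v + " "); // LSB-first: 1 0 0 0 1 1 0 1
    }
}
{code}

Larger bit widths follow the same pattern but need per-width shift tables and masks, which is why a generic SIMD unpacker is more involved than this one-bit case.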
[jira] [Updated] (SPARK-39074) Fail on uploading test files, not when downloading them
[ https://issues.apache.org/jira/browse/SPARK-39074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-39074: - Priority: Minor (was: Major) > Fail on uploading test files, not when downloading them > --- > > Key: SPARK-39074 > URL: https://issues.apache.org/jira/browse/SPARK-39074 > Project: Spark > Issue Type: Improvement > Components: Project Infra > Affects Versions: 3.4.0 > Reporter: Enrico Minack > Assignee: Enrico Minack > Priority: Minor > Fix For: 3.4.0 > > > The CI workflow "Report test results" fails when there are no artifacts to download from the triggering workflow. In some situations the triggering workflow itself is not skipped, but all of its test jobs are skipped because no code changes are detected. > In that situation no test files are uploaded, which makes the triggered workflow fail. > Finding no test files to download can have two reasons: > 1. No tests have been executed, or no test files have been generated. > 2. No code has been built and tested, deliberately. > You want to be notified in the first situation so that you can fix the CI. Therefore, CI should fail when code is built and tests are run but no test result files are found. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39074) Fail on uploading test files, not when downloading them
[ https://issues.apache.org/jira/browse/SPARK-39074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-39074: Assignee: Enrico Minack > Fail on uploading test files, not when downloading them > --- > > Key: SPARK-39074 > URL: https://issues.apache.org/jira/browse/SPARK-39074 > Project: Spark > Issue Type: Improvement > Components: Project Infra > Affects Versions: 3.4.0 > Reporter: Enrico Minack > Assignee: Enrico Minack > Priority: Major > > > The CI workflow "Report test results" fails when there are no artifacts to download from the triggering workflow. In some situations the triggering workflow itself is not skipped, but all of its test jobs are skipped because no code changes are detected. > In that situation no test files are uploaded, which makes the triggered workflow fail. > Finding no test files to download can have two reasons: > 1. No tests have been executed, or no test files have been generated. > 2. No code has been built and tested, deliberately. > You want to be notified in the first situation so that you can fix the CI. Therefore, CI should fail when code is built and tests are run but no test result files are found. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-39074) Fail on uploading test files, not when downloading them
[ https://issues.apache.org/jira/browse/SPARK-39074?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-39074. -- Fix Version/s: 3.4.0 Resolution: Fixed Issue resolved by pull request 36413 [https://github.com/apache/spark/pull/36413] > Fail on uploading test files, not when downloading them > --- > > Key: SPARK-39074 > URL: https://issues.apache.org/jira/browse/SPARK-39074 > Project: Spark > Issue Type: Improvement > Components: Project Infra > Affects Versions: 3.4.0 > Reporter: Enrico Minack > Assignee: Enrico Minack > Priority: Major > Fix For: 3.4.0 > > > The CI workflow "Report test results" fails when there are no artifacts to download from the triggering workflow. In some situations the triggering workflow itself is not skipped, but all of its test jobs are skipped because no code changes are detected. > In that situation no test files are uploaded, which makes the triggered workflow fail. > Finding no test files to download can have two reasons: > 1. No tests have been executed, or no test files have been generated. > 2. No code has been built and tested, deliberately. > You want to be notified in the first situation so that you can fix the CI. Therefore, CI should fail when code is built and tests are run but no test result files are found. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
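The "fail at upload time" logic can be illustrated with a small, hypothetical guard. The actual change lives in the GitHub Actions workflow configuration, not in Java; the class name and report path below are invented purely to show the shape of the check, namely failing the producing job when tests ran but left no report files behind:

{code:java}
import java.io.IOException;
import java.nio.file.*;
import java.util.stream.Stream;

// Hypothetical guard: exit non-zero when a test run produced no report files,
// so the failure surfaces in the uploading workflow rather than in the
// downstream workflow that finds nothing to download.
public class RequireTestReports {
    public static void main(String[] args) throws IOException {
        Path reportDir = Paths.get(args.length > 0 ? args[0] : "target/test-reports");
        boolean hasReports = false;
        if (Files.isDirectory(reportDir)) {
            try (Stream<Path> files = Files.walk(reportDir)) {
                hasReports = files.anyMatch(p -> p.toString().endsWith(".xml"));
            }
        }
        if (!hasReports) {
            System.err.println("Tests ran but no test report files were found under " + reportDir);
            System.exit(1); // fail here, at upload time
        }
    }
}
{code}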
[jira] [Commented] (SPARK-38292) Support `na_filter` for pyspark.pandas.read_csv
[ https://issues.apache.org/jira/browse/SPARK-38292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554551#comment-17554551 ] pralabhkumar commented on SPARK-38292: -- [~itholic] I would like to work on this. > Support `na_filter` for pyspark.pandas.read_csv > --- > > Key: SPARK-38292 > URL: https://issues.apache.org/jira/browse/SPARK-38292 > Project: Spark > Issue Type: Improvement > Components: PySpark > Affects Versions: 3.3.0 > Reporter: Haejoon Lee > Priority: Major > > pandas supports the `na_filter` parameter for the `read_csv` function (https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html). > We also want to support it, to follow the behavior of pandas. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
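For context on the semantics being requested: in pandas, na_filter controls whether the CSV reader detects missing-value sentinels at all, and disabling it keeps raw strings (which can speed up reading files known to contain no NAs). The following toy Java sketch shows only the idea; the sentinel list is illustrative and much shorter than pandas' actual default na_values set:

{code:java}
import java.util.Set;

// Toy illustration of na_filter semantics: with the filter on, sentinel
// strings become nulls; with it off, every field is kept verbatim.
public class NaFilterDemo {
    private static final Set<String> NA_VALUES = Set.of("", "NA", "NaN", "null");

    static String parseField(String raw, boolean naFilter) {
        return (naFilter && NA_VALUES.contains(raw)) ? null : raw;
    }

    public static void main(String[] args) {
        System.out.println(parseField("NaN", true));   // null (detected as missing)
        System.out.println(parseField("NaN", false));  // "NaN" kept as a literal string
    }
}
{code}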
[jira] [Resolved] (SPARK-39477) Remove "Number of queries" info from the golden files of SQLQueryTestSuite
[ https://issues.apache.org/jira/browse/SPARK-39477?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-39477. -- Fix Version/s: 3.4.0 Resolution: Fixed Fixed in https://github.com/apache/spark/pull/36875 > Remove "Number of queries" info from the golden files of SQLQueryTestSuite > -- > > Key: SPARK-39477 > URL: https://issues.apache.org/jira/browse/SPARK-39477 > Project: Spark > Issue Type: Test > Components: Tests > Affects Versions: 3.4.0 > Reporter: Gengliang Wang > Assignee: Gengliang Wang > Priority: Minor > Fix For: 3.4.0 > > -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39479) DS V2 supports push down math functions(non ANSI)
[ https://issues.apache.org/jira/browse/SPARK-39479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554463#comment-17554463 ] Apache Spark commented on SPARK-39479: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/36877 > DS V2 supports push down math functions(non ANSI) > - > > Key: SPARK-39479 > URL: https://issues.apache.org/jira/browse/SPARK-39479 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: jiaan.geng > Priority: Major > > Currently, Spark has many math functions that are not defined in the ANSI standard, but these functions are commonly used. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
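To make the idea of "pushing down" a math function concrete: DS V2 pushdown translates a Spark scalar function expression into SQL text the underlying source (e.g. a JDBC database) can evaluate, falling back to Spark-side evaluation when no translation exists. The sketch below is hypothetical; the class name, method, and supported-function list are invented for illustration and are not Spark's actual API:

{code:java}
import java.util.Locale;

// Hypothetical sketch of function-to-SQL translation for pushdown.
public class MathFunctionSQLBuilder {
    // Returns dialect SQL for (functionName, argument SQL), or null when the
    // function is unsupported so the caller falls back to evaluating it in
    // Spark after the scan.
    public static String translate(String functionName, String argSql) {
        switch (functionName.toUpperCase(Locale.ROOT)) {
            case "SIN":
            case "COS":
            case "SQRT":
            case "LOG10":
                return functionName.toUpperCase(Locale.ROOT) + "(" + argSql + ")";
            default:
                return null; // not pushed down
        }
    }

    public static void main(String[] args) {
        System.out.println(translate("sin", "col1"));  // SIN(col1): pushed to the source
        System.out.println(translate("cbrt", "col1")); // null: evaluated by Spark instead
    }
}
{code}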
[jira] [Assigned] (SPARK-39479) DS V2 supports push down math functions(non ANSI)
[ https://issues.apache.org/jira/browse/SPARK-39479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39479: Assignee: Apache Spark > DS V2 supports push down math functions(non ANSI) > - > > Key: SPARK-39479 > URL: https://issues.apache.org/jira/browse/SPARK-39479 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: jiaan.geng > Assignee: Apache Spark > Priority: Major > > Currently, Spark has many math functions that are not defined in the ANSI standard, but these functions are commonly used. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-39479) DS V2 supports push down math functions(non ANSI)
[ https://issues.apache.org/jira/browse/SPARK-39479?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-39479: Assignee: (was: Apache Spark) > DS V2 supports push down math functions(non ANSI) > - > > Key: SPARK-39479 > URL: https://issues.apache.org/jira/browse/SPARK-39479 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: jiaan.geng > Priority: Major > > Currently, Spark has many math functions that are not defined in the ANSI standard, but these functions are commonly used. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-39479) DS V2 supports push down math functions(non ANSI)
[ https://issues.apache.org/jira/browse/SPARK-39479?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17554462#comment-17554462 ] Apache Spark commented on SPARK-39479: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/36877 > DS V2 supports push down math functions(non ANSI) > - > > Key: SPARK-39479 > URL: https://issues.apache.org/jira/browse/SPARK-39479 > Project: Spark > Issue Type: Sub-task > Components: SQL > Affects Versions: 3.4.0 > Reporter: jiaan.geng > Priority: Major > > Currently, Spark has many math functions that are not defined in the ANSI standard, but these functions are commonly used. -- This message was sent by Atlassian Jira (v8.20.7#820007) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org