[jira] [Commented] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471014#comment-17471014 ] jingxiong zhong commented on SPARK-37708:
-
[~hyukjin.kwon] In the end, we found that the operating system in the image was different and that Python would not run there. With a CentOS-based image it works normally.
> pyspark adding third-party Dependencies on k8s
> --
>
> Key: SPARK-37708
> URL: https://issues.apache.org/jira/browse/SPARK-37708
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, PySpark
> Affects Versions: 3.2.0
> Environment: pyspark3.2
> Reporter: jingxiong zhong
> Priority: Major
>
> I have a question: how do I add my Python dependencies to a Spark job? For example:
> {code:sh}
> spark-submit \
> --archives s3a://path/python3.6.9.tgz#python3.6.9 \
> --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \
> --conf "spark.pyspark.python=python3.6.9/bin/python3" \
> --name "piroottest" \
> ./examples/src/main/python/pi.py 10
> {code}
> This does not run the job successfully; it throws the following error:
> {code:sh}
> Traceback (most recent call last):
>   File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in <module>
>     from pyspark.sql import SparkSession
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in <module>
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, in <module>
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in <module>
>     async def _ag():
>   File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", line 7, in <module>
>     from _ctypes import Union, Structure, Array
> ImportError: libffi.so.6: cannot open shared object file: No such file or directory
> {code}
> Or is there another way to add Python dependencies?
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
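The traceback above shows `_ctypes` failing to load because `libffi.so.6` is absent from the container image, not a problem with `--archives` itself. A quick, hedged way to catch this class of problem before submitting a job is to try importing the stdlib modules that wrap native libraries inside the target image; this minimal sketch assumes only that a `python3` is on `PATH` (swap in the interpreter from your unpacked archive as appropriate):

```shell
# Probe the interpreter for missing native dependencies. If the image lacks
# libffi, the ctypes import fails with the same ImportError seen in the issue.
python3 - <<'EOF'
import ctypes  # loads _ctypes, which links against libffi
import zlib    # another stdlib module backed by a shared library
print("native deps OK")
EOF
```

Running this inside the driver/executor image (e.g. via `kubectl run` or `docker run`) surfaces the mismatch between the OS the archive was built on and the OS in the image, which is what the commenter ultimately identified.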
[jira] [Assigned] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37708:
Assignee: Apache Spark
[jira] [Assigned] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37708:
Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471012#comment-17471012 ] Apache Spark commented on SPARK-37708:
--
User 'zhongjingxiong' has created a pull request for this issue: https://github.com/apache/spark/pull/35142
[jira] [Commented] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471013#comment-17471013 ] Apache Spark commented on SPARK-37708:
--
User 'zhongjingxiong' has created a pull request for this issue: https://github.com/apache/spark/pull/35142
[jira] [Assigned] (SPARK-37843) Suppress NoSuchFieldError at setMDCForTask
[ https://issues.apache.org/jira/browse/SPARK-37843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-37843:
-
Assignee: Dongjoon Hyun (was: Apache Spark)
> Suppress NoSuchFieldError at setMDCForTask
> --
>
> Key: SPARK-37843
> URL: https://issues.apache.org/jira/browse/SPARK-37843
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.3.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Fix For: 3.3.0
>
> {code}
> 00:57:11 2022-01-07 15:57:11.693 - stderr> Exception in thread "Executor task launch worker-0" java.lang.NoSuchFieldError: mdc
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at org.apache.log4j.MDCFriend.fixForJava9(MDCFriend.java:11)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at org.slf4j.impl.Log4jMDCAdapter.<clinit>(Log4jMDCAdapter.java:38)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at org.slf4j.impl.StaticMDCBinder.getMDCA(StaticMDCBinder.java:59)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at org.slf4j.MDC.bwCompatibleGetMDCAdapterFromBinder(MDC.java:99)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at org.slf4j.MDC.<clinit>(MDC.java:108)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$setMDCForTask(Executor.scala:750)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:441)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at java.base/java.lang.Thread.run(Thread.java:833)
> {code}
[jira] [Resolved] (SPARK-37843) Suppress NoSuchFieldError at setMDCForTask
[ https://issues.apache.org/jira/browse/SPARK-37843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-37843.
---
Fix Version/s: 3.3.0
Resolution: Fixed
Issue resolved by pull request 35141
[https://github.com/apache/spark/pull/35141]
[jira] [Commented] (SPARK-37843) Suppress NoSuchFieldError at setMDCForTask
[ https://issues.apache.org/jira/browse/SPARK-37843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470963#comment-17470963 ] Apache Spark commented on SPARK-37843:
--
User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/35141
[jira] [Assigned] (SPARK-37843) Suppress NoSuchFieldError at setMDCForTask
[ https://issues.apache.org/jira/browse/SPARK-37843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37843:
Assignee: Apache Spark
[jira] [Assigned] (SPARK-37843) Suppress NoSuchFieldError at setMDCForTask
[ https://issues.apache.org/jira/browse/SPARK-37843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37843:
Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-37843) Suppress NoSuchFieldError at setMDCForTask
Dongjoon Hyun created SPARK-37843:
-
Summary: Suppress NoSuchFieldError at setMDCForTask
Key: SPARK-37843
URL: https://issues.apache.org/jira/browse/SPARK-37843
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun
[jira] [Commented] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470870#comment-17470870 ] Apache Spark commented on SPARK-37829:
--
User 'cdegroc' has created a pull request for this issue: https://github.com/apache/spark/pull/35140
> An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
> ---
>
> Key: SPARK-37829
> URL: https://issues.apache.org/jira/browse/SPARK-37829
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
> Reporter: Clément de Groc
> Priority: Major
>
> Doing an outer-join using {{joinWith}} on {{DataFrame}}s used to return missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with {{null}} values in Spark 3+.
> The issue can be reproduced with [the following test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5], which succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0.
> The problem only arises when working with DataFrames: Datasets of case classes work as expected, as demonstrated by [this other test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223].
> I couldn't find an explanation for this change in the Migration Guide, so I'm assuming this is a bug.
> A {{git bisect}} pointed me to [that commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. Reverting the commit solves the problem.
> A similar solution, but without reverting, is shown [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a].
> Happy to help if you think of another approach / can provide some guidance.
[jira] [Commented] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470867#comment-17470867 ] Clément de Groc commented on SPARK-37829:
-
Opened two equivalent PRs: one with a fix and one with a revert.
[jira] [Commented] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470866#comment-17470866 ] Clément de Groc commented on SPARK-37829:
-
CC [~cloud_fan]. Tagging you as you're the original author and you might have more context and/or ideas on how to best solve this.
[jira] [Commented] (SPARK-37771) Race condition in withHiveState and limited logic in IsolatedClientLoader result in ClassNotFoundException
[ https://issues.apache.org/jira/browse/SPARK-37771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470842#comment-17470842 ] Steve Loughran commented on SPARK-37771: probably related to HADOOP-17372, which makes sure the hive classloader isn't picked up for class lookups in the config try with hadoop 3.3.1 binaries > Race condition in withHiveState and limited logic in IsolatedClientLoader > result in ClassNotFoundException > -- > > Key: SPARK-37771 > URL: https://issues.apache.org/jira/browse/SPARK-37771 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.0, 3.1.2, 3.2.0 >Reporter: Ivan Sadikov >Priority: Major > > There is a race condition between creating a Hive client and loading classes > that do not appear in shared prefixes config. For example, we confirmed that > the code fails for the following configuration: > {code:java} > spark.sql.hive.metastore.version 0.13.0 > spark.sql.hive.metastore.jars maven > spark.sql.hive.metastore.sharedPrefixes com.amazonaws prefix> > spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem{code} > And code: > {code:java} > -- Prerequisite commands to set up the table > -- drop table if exists ivan_test_2; > -- create table ivan_test_2 (a int, part string) using csv location > 's3://bucket/hive-test' partitioned by (part); > -- insert into ivan_test_2 values (1, 'a'); > -- Command that triggers failure > ALTER TABLE ivan_test_2 ADD PARTITION (part='b') LOCATION > 's3://bucket/hive-test'{code} > > Stacktrace (line numbers might differ): > {code:java} > 21/12/22 04:37:05 DEBUG IsolatedClientLoader: shared class: > org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider > 21/12/22 04:37:05 DEBUG IsolatedClientLoader: shared class: > org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider > 21/12/22 04:37:05 DEBUG IsolatedClientLoader: hive class: > com.amazonaws.auth.EnvironmentVariableCredentialsProvider - null > 21/12/22 04:37:05 ERROR 
S3AFileSystem: Failed to initialize S3AFileSystem for > path s3://bucket/hive-test > java.io.IOException: From option fs.s3a.aws.credentials.provider > java.lang.ClassNotFoundException: Class > com.amazonaws.auth.EnvironmentVariableCredentialsProvider not found > at > org.apache.hadoop.fs.s3a.S3AUtils.loadAWSProviderClasses(S3AUtils.java:725) > at > org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:688) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:411) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469) > at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365) > at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:112) > at > org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:144) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createLocationForAddedPartition(HiveMetaStore.java:1993) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_core(HiveMetaStore.java:1865) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_req(HiveMetaStore.java:1910) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105) > at com.sun.proxy.$Proxy58.add_partitions_req(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:457) > at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89) > at com.sun.proxy.$Proxy59.add_partitions(Unknown Source) > at > org.apache.hadoop.hive.ql.metadata.Hive.createPartitions(Hive.java:1514) > at > org.apache.spark.sql.hive.client.Shim_v0_13.createPartitions(HiveShim.scala:773) >
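The shared-vs-isolated lookup that the DEBUG lines above are reporting can be sketched as a simple prefix check (a hypothetical Python sketch; the real logic is IsolatedClientLoader's shared-class test in Scala, and the default prefix list below is an assumption for illustration):

```python
# Hypothetical sketch of the shared-vs-"hive class" lookup decision.
# A class matching a shared prefix is loaded from the application
# classloader; anything else goes to the isolated Hive classloader.

DEFAULT_SHARED = ["org.apache.hadoop.fs.s3a.", "org.slf4j.", "scala."]

def is_shared_class(name, shared_prefixes):
    return any(name.startswith(p) for p in shared_prefixes)

provider = "com.amazonaws.auth.EnvironmentVariableCredentialsProvider"

# With spark.sql.hive.metastore.sharedPrefixes including com.amazonaws,
# the AWS provider should resolve as a shared class...
print(is_shared_class(provider, DEFAULT_SHARED + ["com.amazonaws"]))  # True

# ...but the DEBUG log shows it treated as a "hive class", which the issue
# attributes to a race while the client/prefixes are still being set up.
print(is_shared_class(provider, DEFAULT_SHARED))  # False
```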
[jira] [Assigned] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37829: Assignee: Apache Spark > An outer-join using joinWith on DataFrames returns Rows with null fields > instead of null values > --- > > Key: SPARK-37829 > URL: https://issues.apache.org/jira/browse/SPARK-37829 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: Clément de Groc >Assignee: Apache Spark >Priority: Major > > Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return > missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with > {{null}} values in Spark 3+. > The issue can be reproduced with [the following > test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] > that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. > The problem only arises when working with DataFrames: Datasets of case > classes work as expected as demonstrated by [this other > test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. > I couldn't find an explanation for this change in the Migration guide so I'm > assuming this is a bug. > A {{git bisect}} pointed me to [that > commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. > Reverting the commit solves the problem. > A similar solution, but without reverting, is shown > [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a]. > Happy to help if you think of another approach / can provide some guidance. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37829: Assignee: (was: Apache Spark) > An outer-join using joinWith on DataFrames returns Rows with null fields > instead of null values > --- > > Key: SPARK-37829 > URL: https://issues.apache.org/jira/browse/SPARK-37829 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: Clément de Groc >Priority: Major > > Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return > missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with > {{null}} values in Spark 3+. > The issue can be reproduced with [the following > test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] > that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. > The problem only arises when working with DataFrames: Datasets of case > classes work as expected as demonstrated by [this other > test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. > I couldn't find an explanation for this change in the Migration guide so I'm > assuming this is a bug. > A {{git bisect}} pointed me to [that > commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. > Reverting the commit solves the problem. > A similar solution, but without reverting, is shown > [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a]. > Happy to help if you think of another approach / can provide some guidance. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470824#comment-17470824 ] Apache Spark commented on SPARK-37829: -- User 'cdegroc' has created a pull request for this issue: https://github.com/apache/spark/pull/35139 > An outer-join using joinWith on DataFrames returns Rows with null fields > instead of null values > --- > > Key: SPARK-37829 > URL: https://issues.apache.org/jira/browse/SPARK-37829 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: Clément de Groc >Priority: Major > > Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return > missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with > {{null}} values in Spark 3+. > The issue can be reproduced with [the following > test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] > that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. > The problem only arises when working with DataFrames: Datasets of case > classes work as expected as demonstrated by [this other > test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. > I couldn't find an explanation for this change in the Migration guide so I'm > assuming this is a bug. > A {{git bisect}} pointed me to [that > commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. > Reverting the commit solves the problem. > A similar solution, but without reverting, is shown > [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a]. > Happy to help if you think of another approach / can provide some guidance. 
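For readers following along, the behavioural change under discussion can be sketched outside Spark with a plain-Python stand-in (`full_outer_join` and its parameters are hypothetical helpers for illustration, not Spark API):

```python
# Sketch of the two outer-join behaviours described in this issue.
# missing_as_row=False -> unmatched side is None (Spark 2.4.8 joinWith).
# missing_as_row=True  -> unmatched side is a record of all-None fields
#                         (Spark 3.x joinWith on DataFrames).

def full_outer_join(left, right, key, missing_as_row=False):
    def blank(rows):
        fields = rows[0].keys() if rows else []
        return {f: None for f in fields}

    out = []
    left_keys = {r[key] for r in left}
    right_by_key = {r[key]: r for r in right}
    for l in left:
        r = right_by_key.get(l[key])
        if r is None and missing_as_row:
            r = blank(right)
        out.append((l, r))
    for r in right:
        if r[key] not in left_keys:
            l = blank(left) if missing_as_row else None
            out.append((l, r))
    return out

left = [{"id": 1, "v": "a"}]
right = [{"id": 2, "w": "b"}]

print(full_outer_join(left, right, "id"))
# 2.4.8-style: [({'id': 1, 'v': 'a'}, None), (None, {'id': 2, 'w': 'b'})]
print(full_outer_join(left, right, "id", missing_as_row=True))
# 3.x-style: the unmatched sides become {'id': None, ...} records instead
```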
[jira] [Updated] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clément de Groc updated SPARK-37829: Description: Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with {{null}} values in Spark 3+. The issue can be reproduced with [the following test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. The problem only arises when working with DataFrames: Datasets of case classes work as expected as demonstrated by [this other test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. I couldn't find an explanation for this change in the Migration guide so I'm assuming this is a bug. A {{git bisect}} pointed me to [that commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. Reverting the commit solves the problem. A similar solution, but without reverting, is shown [here|https://github.com/cdegroc/spark/commit/3838dd5617dae51e8b323b07df7cb36e3e6728c3]. Happy to help if you think of another approach / can provide some guidance. was: Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with {{null}} values in Spark 3+. The issue can be reproduced with [the following test|https://github.com/cdegroc/spark/commit/a499805bc7f40e741bfa9badec2588972e53d604] that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. The problem only arises when working with DataFrames: Datasets of case classes work as expected as demonstrated by [this other test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. 
I couldn't find an explanation for this change in the Migration guide so I'm assuming this is a bug. A {{git bisect}} pointed me to [that commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. Reverting the commit solves the problem. A similar solution, but without reverting, is shown [here|https://github.com/cdegroc/spark/commit/3838dd5617dae51e8b323b07df7cb36e3e6728c3]. Happy to help if you think of another approach / can provide some guidance. > An outer-join using joinWith on DataFrames returns Rows with null fields > instead of null values > --- > > Key: SPARK-37829 > URL: https://issues.apache.org/jira/browse/SPARK-37829 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: Clément de Groc >Priority: Major > > Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return > missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with > {{null}} values in Spark 3+. > The issue can be reproduced with [the following > test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] > that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. > The problem only arises when working with DataFrames: Datasets of case > classes work as expected as demonstrated by [this other > test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. > I couldn't find an explanation for this change in the Migration guide so I'm > assuming this is a bug. > A {{git bisect}} pointed me to [that > commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. > Reverting the commit solves the problem. > A similar solution, but without reverting, is shown > [here|https://github.com/cdegroc/spark/commit/3838dd5617dae51e8b323b07df7cb36e3e6728c3]. > Happy to help if you think of another approach / can provide some guidance. 
[jira] [Updated] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clément de Groc updated SPARK-37829: Description: Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with {{null}} values in Spark 3+. The issue can be reproduced with [the following test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. The problem only arises when working with DataFrames: Datasets of case classes work as expected as demonstrated by [this other test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. I couldn't find an explanation for this change in the Migration guide so I'm assuming this is a bug. A {{git bisect}} pointed me to [that commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. Reverting the commit solves the problem. A similar solution, but without reverting, is shown [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a]. Happy to help if you think of another approach / can provide some guidance. was: Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with {{null}} values in Spark 3+. The issue can be reproduced with [the following test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. The problem only arises when working with DataFrames: Datasets of case classes work as expected as demonstrated by [this other test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. 
I couldn't find an explanation for this change in the Migration guide so I'm assuming this is a bug. A {{git bisect}} pointed me to [that commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. Reverting the commit solves the problem. A similar solution, but without reverting, is shown [here|https://github.com/cdegroc/spark/commit/3838dd5617dae51e8b323b07df7cb36e3e6728c3]. Happy to help if you think of another approach / can provide some guidance. > An outer-join using joinWith on DataFrames returns Rows with null fields > instead of null values > --- > > Key: SPARK-37829 > URL: https://issues.apache.org/jira/browse/SPARK-37829 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: Clément de Groc >Priority: Major > > Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return > missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with > {{null}} values in Spark 3+. > The issue can be reproduced with [the following > test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] > that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. > The problem only arises when working with DataFrames: Datasets of case > classes work as expected as demonstrated by [this other > test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. > I couldn't find an explanation for this change in the Migration guide so I'm > assuming this is a bug. > A {{git bisect}} pointed me to [that > commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. > Reverting the commit solves the problem. > A similar solution, but without reverting, is shown > [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a]. > Happy to help if you think of another approach / can provide some guidance. 
[jira] [Commented] (SPARK-35703) Relax constraint for Spark bucket join and remove HashClusteredDistribution
[ https://issues.apache.org/jira/browse/SPARK-35703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470796#comment-17470796 ] Apache Spark commented on SPARK-35703: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/35138 > Relax constraint for Spark bucket join and remove HashClusteredDistribution > --- > > Key: SPARK-35703 > URL: https://issues.apache.org/jira/browse/SPARK-35703 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Fix For: 3.3.0 > > > Currently Spark has {{HashClusteredDistribution}} and > {{ClusteredDistribution}}. The only difference between the two is that the > former is stricter when deciding whether bucket join is allowed to avoid > shuffle: compared to the latter, it requires an *exact* match between the > clustering keys from the output partitioning (i.e., {{HashPartitioning}}) and > the join keys. However, this is unnecessary, as we should be able to avoid > shuffle when the set of clustering keys is a subset of the join keys, just like > {{ClusteredDistribution}}. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
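The relaxation described in the issue can be sketched as two predicates (hypothetical names; Spark's real checks live in the Scala Distribution/Partitioning classes):

```python
# Hypothetical sketch of the two "can we skip the shuffle?" rules.

def satisfies_strict(partitioning_keys, join_keys):
    # HashClusteredDistribution-style: exact match required.
    return list(partitioning_keys) == list(join_keys)

def satisfies_relaxed(partitioning_keys, join_keys):
    # ClusteredDistribution-style: clustering keys may be a subset
    # of the join keys.
    return set(partitioning_keys) <= set(join_keys)

# A table bucketed by (a), joined on keys (a, b):
print(satisfies_strict(["a"], ["a", "b"]))   # False -> shuffle forced
print(satisfies_relaxed(["a"], ["a", "b"]))  # True  -> bucket join allowed
```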
[jira] [Commented] (SPARK-37840) Dynamically update the loaded Hive UDF JAR
[ https://issues.apache.org/jira/browse/SPARK-37840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470793#comment-17470793 ] Rakesh Raushan commented on SPARK-37840: We can dynamically update our UDF jars after loading them. I will try to raise a PR soon for this. > Dynamically update the loaded Hive UDF JAR > -- > > Key: SPARK-37840 > URL: https://issues.apache.org/jira/browse/SPARK-37840 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > In the production environment, the Spark Thrift Server needs to be restarted if > JAR files are updated after UDF JAR files are loaded. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37417) Inline type hints for python/pyspark/ml/linalg/__init__.py
[ https://issues.apache.org/jira/browse/SPARK-37417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470783#comment-17470783 ] Maciej Szymkiewicz commented on SPARK-37417: {{ml}} and {{mllib}} linalg types should be consistent, so let's wait with this one until the other one is resolved. > Inline type hints for python/pyspark/ml/linalg/__init__.py > -- > > Key: SPARK-37417 > URL: https://issues.apache.org/jira/browse/SPARK-37417 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/linalg/__init__.pyi to > python/pyspark/ml/linalg/__init__.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
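For context, "inlining" a .pyi stub means moving its annotations into the .py source itself. A minimal generic sketch, with illustrative names rather than the actual linalg signatures:

```python
# Before inlining, the annotations lived in a parallel stub (module.pyi):
#     def norm(v: List[float]) -> float: ...
# After inlining, the .py source carries the hints directly:

import math
from typing import List

def norm(v: List[float]) -> float:
    """Euclidean norm, annotated inline rather than in a .pyi stub."""
    return math.sqrt(sum(x * x for x in v))

print(norm([3.0, 4.0]))  # 5.0
```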
[jira] [Commented] (SPARK-37415) Inline type hints for python/pyspark/ml/util.py
[ https://issues.apache.org/jira/browse/SPARK-37415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470768#comment-17470768 ] Maciej Szymkiewicz commented on SPARK-37415: I am going to handle this one, once prerequisites are resolved. > Inline type hints for python/pyspark/ml/util.py > --- > > Key: SPARK-37415 > URL: https://issues.apache.org/jira/browse/SPARK-37415 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/util.pyi to > python/pyspark/ml/util.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37416) Inline type hints for python/pyspark/ml/wrapper.py
[ https://issues.apache.org/jira/browse/SPARK-37416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470767#comment-17470767 ] Maciej Szymkiewicz commented on SPARK-37416: I am going to handle this one, once prerequisites are resolved. > Inline type hints for python/pyspark/ml/wrapper.py > -- > > Key: SPARK-37416 > URL: https://issues.apache.org/jira/browse/SPARK-37416 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/wrapper.pyi to > python/pyspark/ml/wrapper.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37837) Enable black formatter in dev Python scripts
[ https://issues.apache.org/jira/browse/SPARK-37837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-37837. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35127 [https://github.com/apache/spark/pull/35127] > Enable black formatter in dev Python scripts > > > Key: SPARK-37837 > URL: https://issues.apache.org/jira/browse/SPARK-37837 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.3.0 > > > The black formatter is only enabled for python/pyspark to minimize side effects, > e.g., reformatting auto-generated or third-party Python scripts. > This JIRA aims to enable the black formatter in the dev directory, where there are no > generated Python scripts to exclude. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37837) Enable black formatter in dev Python scripts
[ https://issues.apache.org/jira/browse/SPARK-37837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-37837: - Assignee: Hyukjin Kwon > Enable black formatter in dev Python scripts > > > Key: SPARK-37837 > URL: https://issues.apache.org/jira/browse/SPARK-37837 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > The black formatter is only enabled for python/pyspark to minimize side effects, > e.g., reformatting auto-generated or third-party Python scripts. > This JIRA aims to enable the black formatter in the dev directory, where there are no > generated Python scripts to exclude. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37419) Inline type hints for python/pyspark/ml/param/shared.py
[ https://issues.apache.org/jira/browse/SPARK-37419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz resolved SPARK-37419. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34674 [https://github.com/apache/spark/pull/34674] > Inline type hints for python/pyspark/ml/param/shared.py > --- > > Key: SPARK-37419 > URL: https://issues.apache.org/jira/browse/SPARK-37419 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Major > Fix For: 3.3.0 > > > Inline type hints from python/pyspark/ml/param/shared.pyi to > python/pyspark/ml/param/shared.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37419) Inline type hints for python/pyspark/ml/param/shared.py
[ https://issues.apache.org/jira/browse/SPARK-37419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz reassigned SPARK-37419: -- Assignee: Maciej Szymkiewicz > Inline type hints for python/pyspark/ml/param/shared.py > --- > > Key: SPARK-37419 > URL: https://issues.apache.org/jira/browse/SPARK-37419 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/param/shared.pyi to > python/pyspark/ml/param/shared.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37836) Enable more flake8 rules for PEP 8 compliance
[ https://issues.apache.org/jira/browse/SPARK-37836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37836. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35126 [https://github.com/apache/spark/pull/35126] > Enable more flake8 rules for PEP 8 compliance > - > > Key: SPARK-37836 > URL: https://issues.apache.org/jira/browse/SPARK-37836 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.3.0 > > > Most of the disabled linter rules here: > https://github.com/apache/spark/blob/master/dev/tox.ini#L19-L31 > should be enabled to comply with PEP 8. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
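Illustratively, enabling more flake8 rules amounts to shrinking the ignore list in dev/tox.ini; a hedged fragment (the exact codes kept disabled below are an assumption for illustration, not the actual change):

```ini
# dev/tox.ini fragment -- illustrative sketch only
[flake8]
max-line-length = 100
# Previously a long list of E/W codes was ignored here; after this change
# most PEP 8 rules are enforced, leaving only deliberate exceptions, e.g.
# the usual black-compatibility pair:
ignore = E203, W503
```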
[jira] [Assigned] (SPARK-37836) Enable more flake8 rules for PEP 8 compliance
[ https://issues.apache.org/jira/browse/SPARK-37836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-37836: Assignee: Hyukjin Kwon > Enable more flake8 rules for PEP 8 compliance > - > > Key: SPARK-37836 > URL: https://issues.apache.org/jira/browse/SPARK-37836 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > Most of the disabled linter rules here: > https://github.com/apache/spark/blob/master/dev/tox.ini#L19-L31 > should be enabled to comply with PEP 8. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37397) Inline type hints for python/pyspark/ml/base.py
[ https://issues.apache.org/jira/browse/SPARK-37397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470654#comment-17470654 ] Maciej Szymkiewicz commented on SPARK-37397: I'll handle this once SPARK-37418 prerequisites are resolved. > Inline type hints for python/pyspark/ml/base.py > --- > > Key: SPARK-37397 > URL: https://issues.apache.org/jira/browse/SPARK-37397 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/base.pyi to > python/pyspark/ml/base.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37397) Inline type hints for python/pyspark/ml/base.py
[ https://issues.apache.org/jira/browse/SPARK-37397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz reassigned SPARK-37397: -- Assignee: (was: Maciej Szymkiewicz) > Inline type hints for python/pyspark/ml/base.py > --- > > Key: SPARK-37397 > URL: https://issues.apache.org/jira/browse/SPARK-37397 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/base.pyi to > python/pyspark/ml/base.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37418) Inline type hints for python/pyspark/ml/param/__init__.py
[ https://issues.apache.org/jira/browse/SPARK-37418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37418: Assignee: (was: Apache Spark) > Inline type hints for python/pyspark/ml/param/__init__.py > - > > Key: SPARK-37418 > URL: https://issues.apache.org/jira/browse/SPARK-37418 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/param/__init__.pyi to > python/pyspark/ml/param/__init__.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37418) Inline type hints for python/pyspark/ml/param/__init__.py
[ https://issues.apache.org/jira/browse/SPARK-37418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470651#comment-17470651 ] Apache Spark commented on SPARK-37418: -- User 'zero323' has created a pull request for this issue: https://github.com/apache/spark/pull/35136 > Inline type hints for python/pyspark/ml/param/__init__.py > - > > Key: SPARK-37418 > URL: https://issues.apache.org/jira/browse/SPARK-37418 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/param/__init__.pyi to > python/pyspark/ml/param/__init__.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37397) Inline type hints for python/pyspark/ml/base.py
[ https://issues.apache.org/jira/browse/SPARK-37397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz reassigned SPARK-37397: -- Assignee: Maciej Szymkiewicz > Inline type hints for python/pyspark/ml/base.py > --- > > Key: SPARK-37397 > URL: https://issues.apache.org/jira/browse/SPARK-37397 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/base.pyi to > python/pyspark/ml/base.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37418) Inline type hints for python/pyspark/ml/param/__init__.py
[ https://issues.apache.org/jira/browse/SPARK-37418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37418: Assignee: Apache Spark > Inline type hints for python/pyspark/ml/param/__init__.py > - > > Key: SPARK-37418 > URL: https://issues.apache.org/jira/browse/SPARK-37418 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Apache Spark >Priority: Major > > Inline type hints from python/pyspark/ml/param/__init__.pyi to > python/pyspark/ml/param/__init__.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37193) DynamicJoinSelection.shouldDemoteBroadcastHashJoin should not apply to outer joins
[ https://issues.apache.org/jira/browse/SPARK-37193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37193: Assignee: Apache Spark (was: Eugene Koifman) > DynamicJoinSelection.shouldDemoteBroadcastHashJoin should not apply to outer > joins > -- > > Key: SPARK-37193 > URL: https://issues.apache.org/jira/browse/SPARK-37193 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: Eugene Koifman >Assignee: Apache Spark >Priority: Major > Fix For: 3.3.0 > > > {{DynamicJoinSelection.shouldDemoteBroadcastHashJoin}} will prevent AQE from > converting a sort-merge join into a broadcast join because SMJ is faster when > the side that would be broadcast has a lot of empty partitions. > This makes sense for inner joins, which can short-circuit if one side is > empty. > For left or right outer joins, the streaming side still has to be processed, so > demoting the broadcast join doesn't have the same advantage. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37193) DynamicJoinSelection.shouldDemoteBroadcastHashJoin should not apply to outer joins
[ https://issues.apache.org/jira/browse/SPARK-37193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37193: Assignee: Eugene Koifman (was: Apache Spark) > DynamicJoinSelection.shouldDemoteBroadcastHashJoin should not apply to outer > joins > -- > > Key: SPARK-37193 > URL: https://issues.apache.org/jira/browse/SPARK-37193 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Fix For: 3.3.0 > > > {{DynamicJoinSelection.shouldDemoteBroadcastHashJoin}} will prevent AQE from > converting a sort-merge join into a broadcast join because SMJ is faster when > the side that would be broadcast has a lot of empty partitions. > This makes sense for inner joins, which can short-circuit if one side is > empty. > For left or right outer joins, the streaming side still has to be processed, so > demoting the broadcast join doesn't have the same advantage. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-37193) DynamicJoinSelection.shouldDemoteBroadcastHashJoin should not apply to outer joins
[ https://issues.apache.org/jira/browse/SPARK-37193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reopened SPARK-37193: - > DynamicJoinSelection.shouldDemoteBroadcastHashJoin should not apply to outer > joins > -- > > Key: SPARK-37193 > URL: https://issues.apache.org/jira/browse/SPARK-37193 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Fix For: 3.3.0 > > > {{DynamicJoinSelection.shouldDemoteBroadcastHashJoin}} will prevent AQE from > converting a sort-merge join into a broadcast join because SMJ is faster when > the side that would be broadcast has a lot of empty partitions. > This makes sense for inner joins, which can short-circuit if one side is > empty. > For left or right outer joins, the streaming side still has to be processed, so > demoting the broadcast join doesn't have the same advantage. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
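The demotion logic discussed in SPARK-37193 can be sketched as a small decision function: demote the broadcast hash join only when the join type can short-circuit on an empty side. This is an illustrative model, not Spark's actual `DynamicJoinSelection` code; the class name, method name, and the 0.2 threshold are assumptions for the sketch (the real ratio comes from Spark's adaptive-execution configuration).

```java
// Hypothetical sketch of the proposed rule: only demote a broadcast hash join
// to sort-merge join when the join can short-circuit on an empty side.
public class DemoteDecision {
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    // True when many empty partitions make SMJ attractive AND the join is
    // inner, so an empty side lets the join short-circuit.
    static boolean shouldDemote(JoinType type, int emptyPartitions, int totalPartitions) {
        double emptyRatio = (double) emptyPartitions / totalPartitions;
        boolean mostlyEmpty = emptyRatio > 0.2; // placeholder threshold, not Spark's actual value
        return mostlyEmpty && type == JoinType.INNER;
    }

    public static void main(String[] args) {
        // Inner join with many empty partitions: demotion applies.
        System.out.println(shouldDemote(JoinType.INNER, 8, 10));      // true
        // Outer join: the streaming side must be processed anyway, keep broadcast.
        System.out.println(shouldDemote(JoinType.LEFT_OUTER, 8, 10)); // false
    }
}
```

The sketch makes the issue's point concrete: the empty-partition ratio alone is not enough; the join type must also permit short-circuiting.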
[jira] [Commented] (SPARK-35442) Support propagate empty relation through aggregate
[ https://issues.apache.org/jira/browse/SPARK-35442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470581#comment-17470581 ] Apache Spark commented on SPARK-35442: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/35135 > Support propagate empty relation through aggregate > -- > > Key: SPARK-35442 > URL: https://issues.apache.org/jira/browse/SPARK-35442 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Minor > > The Aggregate in AQE is different from others, the `LogicalQueryStage` looks > like `LogicalQueryStage(Aggregate, BaseAggregate)`. We should handle this > case specially. > Logically, if the Aggregate grouping expression is not empty, we can > eliminate it safely. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35442) Support propagate empty relation through aggregate
[ https://issues.apache.org/jira/browse/SPARK-35442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35442: Assignee: Apache Spark > Support propagate empty relation through aggregate > -- > > Key: SPARK-35442 > URL: https://issues.apache.org/jira/browse/SPARK-35442 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Assignee: Apache Spark >Priority: Minor > > The Aggregate in AQE is different from others, the `LogicalQueryStage` looks > like `LogicalQueryStage(Aggregate, BaseAggregate)`. We should handle this > case specially. > Logically, if the Aggregate grouping expression is not empty, we can > eliminate it safely. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35442) Support propagate empty relation through aggregate
[ https://issues.apache.org/jira/browse/SPARK-35442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35442: Assignee: (was: Apache Spark) > Support propagate empty relation through aggregate > -- > > Key: SPARK-35442 > URL: https://issues.apache.org/jira/browse/SPARK-35442 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Minor > > The Aggregate in AQE is different from others, the `LogicalQueryStage` looks > like `LogicalQueryStage(Aggregate, BaseAggregate)`. We should handle this > case specially. > Logically, if the Aggregate grouping expression is not empty, we can > eliminate it safely. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35442) Support propagate empty relation through aggregate
[ https://issues.apache.org/jira/browse/SPARK-35442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470580#comment-17470580 ] Apache Spark commented on SPARK-35442: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/35135 > Support propagate empty relation through aggregate > -- > > Key: SPARK-35442 > URL: https://issues.apache.org/jira/browse/SPARK-35442 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Minor > > The Aggregate in AQE is different from others, the `LogicalQueryStage` looks > like `LogicalQueryStage(Aggregate, BaseAggregate)`. We should handle this > case specially. > Logically, if the Aggregate grouping expression is not empty, we can > eliminate it safely. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37833) Add `precondition` job for skip the main GitHub Action jobs
[ https://issues.apache.org/jira/browse/SPARK-37833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470508#comment-17470508 ] Hyukjin Kwon commented on SPARK-37833: -- Reverted at https://github.com/apache/spark/commit/213c299cc615afafc1b8b244aa84ddefc99bd614 and https://github.com/apache/spark/commit/11950d02b7a51552a824119f7764c2fede9c4c0d > Add `precondition` job for skip the main GitHub Action jobs > --- > > Key: SPARK-37833 > URL: https://issues.apache.org/jira/browse/SPARK-37833 > Project: Spark > Issue Type: Improvement > Components: Project Infra, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37833) Add `precondition` job for skip the main GitHub Action jobs
[ https://issues.apache.org/jira/browse/SPARK-37833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-37833: - Fix Version/s: (was: 3.3.0) > Add `precondition` job for skip the main GitHub Action jobs > --- > > Key: SPARK-37833 > URL: https://issues.apache.org/jira/browse/SPARK-37833 > Project: Spark > Issue Type: Improvement > Components: Project Infra, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-37833) Add `precondition` job for skip the main GitHub Action jobs
[ https://issues.apache.org/jira/browse/SPARK-37833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-37833: -- Assignee: (was: Dongjoon Hyun) > Add `precondition` job for skip the main GitHub Action jobs > --- > > Key: SPARK-37833 > URL: https://issues.apache.org/jira/browse/SPARK-37833 > Project: Spark > Issue Type: Improvement > Components: Project Infra, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35442) Support propagate empty relation through aggregate
[ https://issues.apache.org/jira/browse/SPARK-35442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-35442: -- Summary: Support propagate empty relation through aggregate (was: Eliminate unnecessary join through Aggregate) > Support propagate empty relation through aggregate > -- > > Key: SPARK-35442 > URL: https://issues.apache.org/jira/browse/SPARK-35442 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Minor > > The Aggregate in AQE is different from others, the `LogicalQueryStage` looks > like `LogicalQueryStage(Aggregate, BaseAggregate)`. We should handle this > case specially. > Logically, if the Aggregate grouping expression is not empty, we can > eliminate it safely. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35442) Eliminate unnecessary join through Aggregate
[ https://issues.apache.org/jira/browse/SPARK-35442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-35442: -- Description: The Aggregate in AQE is different from others, the `LogicalQueryStage` looks like `LogicalQueryStage(Aggregate, BaseAggregate)`. We should handle this case specially. Logically, if the Aggregate grouping expression is not empty, we can eliminate it safely. was: If Aggregate and Join have the same output partitioning, the plan will look like: {code:java} SortMergeJoin Sort HashAggregate Shuffle Sort xxx{code} Currently `EliminateUnnecessaryJoin` doesn't support optimizing this case. Logically, if the Aggregate grouping expression is not empty, we can eliminate it safely. > Eliminate unnecessary join through Aggregate > > > Key: SPARK-35442 > URL: https://issues.apache.org/jira/browse/SPARK-35442 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Minor > > The Aggregate in AQE is different from others, the `LogicalQueryStage` looks > like `LogicalQueryStage(Aggregate, BaseAggregate)`. We should handle this > case specially. > Logically, if the Aggregate grouping expression is not empty, we can > eliminate it safely. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
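The optimization rationale behind SPARK-35442 — a grouped aggregate over an empty input produces no rows, while a global aggregate (no grouping keys) still produces exactly one row — can be illustrated with plain Java streams. This is a toy model of the reasoning, not Spark's actual `PropagateEmptyRelation` rule:

```java
// Toy model: a grouped aggregate over zero input rows yields zero groups,
// so the optimizer may safely replace the subtree with an empty relation.
// A global aggregate still yields one row (e.g. COUNT(*) = 0), so it may not.
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class EmptyRelationAggregate {
    // Group rows by key and count; models Aggregate with a grouping expression.
    static Map<String, Long> groupedCount(List<String> keys) {
        return keys.stream()
                   .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> empty = List.of();
        // Non-empty grouping over an empty relation: result is empty.
        System.out.println(groupedCount(empty).isEmpty()); // true
        // Global aggregate over empty input still produces a value: COUNT(*) = 0.
        System.out.println(empty.stream().count());        // 0
    }
}
```

This is why the issue restricts the rule to aggregates whose grouping expression is non-empty: only then is "empty in, empty out" guaranteed.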
[jira] [Commented] (SPARK-37833) Add `precondition` job for skip the main GitHub Action jobs
[ https://issues.apache.org/jira/browse/SPARK-37833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470496#comment-17470496 ] Apache Spark commented on SPARK-37833: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/35133 > Add `precondition` job for skip the main GitHub Action jobs > --- > > Key: SPARK-37833 > URL: https://issues.apache.org/jira/browse/SPARK-37833 > Project: Spark > Issue Type: Improvement > Components: Project Infra, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37833) Add `precondition` job for skip the main GitHub Action jobs
[ https://issues.apache.org/jira/browse/SPARK-37833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470495#comment-17470495 ] Apache Spark commented on SPARK-37833: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/35133 > Add `precondition` job for skip the main GitHub Action jobs > --- > > Key: SPARK-37833 > URL: https://issues.apache.org/jira/browse/SPARK-37833 > Project: Spark > Issue Type: Improvement > Components: Project Infra, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37842) Use `multi-catch` to simplify duplicate exception handling behavior in Java code
[ https://issues.apache.org/jira/browse/SPARK-37842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37842: Assignee: Apache Spark > Use `multi-catch` to simplify duplicate exception handling behavior in Java > code > > > Key: SPARK-37842 > URL: https://issues.apache.org/jira/browse/SPARK-37842 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37842) Use `multi-catch` to simplify duplicate exception handling behavior in Java code
[ https://issues.apache.org/jira/browse/SPARK-37842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470494#comment-17470494 ] Apache Spark commented on SPARK-37842: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/35134 > Use `multi-catch` to simplify duplicate exception handling behavior in Java > code > > > Key: SPARK-37842 > URL: https://issues.apache.org/jira/browse/SPARK-37842 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37842) Use `multi-catch` to simplify duplicate exception handling behavior in Java code
[ https://issues.apache.org/jira/browse/SPARK-37842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37842: Assignee: (was: Apache Spark) > Use `multi-catch` to simplify duplicate exception handling behavior in Java > code > > > Key: SPARK-37842 > URL: https://issues.apache.org/jira/browse/SPARK-37842 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37842) Use `multi-catch` to simplify duplicate exception handling behavior in Java code
Yang Jie created SPARK-37842: Summary: Use `multi-catch` to simplify duplicate exception handling behavior in Java code Key: SPARK-37842 URL: https://issues.apache.org/jira/browse/SPARK-37842 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
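The refactor proposed in SPARK-37842 is the standard Java 7 multi-catch construct: two catch blocks with identical handler bodies collapse into one clause. A minimal self-contained illustration (the class and exception choices here are hypothetical, not taken from Spark's code):

```java
// Before: catch (IOException e) { handle(e); } catch (IllegalStateException e) { handle(e); }
// After: a single multi-catch clause, available since Java 7.
import java.io.IOException;

public class MultiCatchDemo {
    static String run(boolean io) {
        try {
            if (io) throw new IOException("io failure");
            throw new IllegalStateException("state failure");
        } catch (IOException | IllegalStateException e) {
            // One handler covers both types; `e` is implicitly final and its
            // static type is the closest common supertype of the alternatives.
            return "handled: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // handled: io failure
        System.out.println(run(false)); // handled: state failure
    }
}
```

Note the alternatives in a multi-catch must not be in a subtype relationship with each other, so this rewrite only applies where the duplicated handlers catch unrelated exception types.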
[jira] [Assigned] (SPARK-37841) BasicWriteTaskStatsTracker should not try get status for a skipped file
[ https://issues.apache.org/jira/browse/SPARK-37841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37841: Assignee: Apache Spark > BasicWriteTaskStatsTracker should not try get status for a skipped file > --- > > Key: SPARK-37841 > URL: https://issues.apache.org/jira/browse/SPARK-37841 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > > https://github.com/apache/spark/pull/35117#issuecomment-1007171965 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37841) BasicWriteTaskStatsTracker should not try get status for a skipped file
[ https://issues.apache.org/jira/browse/SPARK-37841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37841: Assignee: (was: Apache Spark) > BasicWriteTaskStatsTracker should not try get status for a skipped file > --- > > Key: SPARK-37841 > URL: https://issues.apache.org/jira/browse/SPARK-37841 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kent Yao >Priority: Major > > https://github.com/apache/spark/pull/35117#issuecomment-1007171965 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37841) BasicWriteTaskStatsTracker should not try get status for a skipped file
[ https://issues.apache.org/jira/browse/SPARK-37841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470485#comment-17470485 ] Apache Spark commented on SPARK-37841: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/35132 > BasicWriteTaskStatsTracker should not try get status for a skipped file > --- > > Key: SPARK-37841 > URL: https://issues.apache.org/jira/browse/SPARK-37841 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kent Yao >Priority: Major > > https://github.com/apache/spark/pull/35117#issuecomment-1007171965 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37841) BasicWriteTaskStatsTracker should not try get status for a skipped file
Kent Yao created SPARK-37841: Summary: BasicWriteTaskStatsTracker should not try get status for a skipped file Key: SPARK-37841 URL: https://issues.apache.org/jira/browse/SPARK-37841 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0, 3.3.0 Reporter: Kent Yao https://github.com/apache/spark/pull/35117#issuecomment-1007171965 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37833) Add `precondition` jobs for skip the main GitHub Action jobs
[ https://issues.apache.org/jira/browse/SPARK-37833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-37833. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35121 [https://github.com/apache/spark/pull/35121] > Add `precondition` jobs for skip the main GitHub Action jobs > > > Key: SPARK-37833 > URL: https://issues.apache.org/jira/browse/SPARK-37833 > Project: Spark > Issue Type: Improvement > Components: Project Infra, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37833) Add `precondition` job for skip the main GitHub Action jobs
[ https://issues.apache.org/jira/browse/SPARK-37833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37833: -- Summary: Add `precondition` job for skip the main GitHub Action jobs (was: Add `precondition` jobs for skip the main GitHub Action jobs) > Add `precondition` job for skip the main GitHub Action jobs > --- > > Key: SPARK-37833 > URL: https://issues.apache.org/jira/browse/SPARK-37833 > Project: Spark > Issue Type: Improvement > Components: Project Infra, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37833) Add `precondition` jobs for skip the main GitHub Action jobs
[ https://issues.apache.org/jira/browse/SPARK-37833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-37833: - Assignee: Dongjoon Hyun > Add `precondition` jobs for skip the main GitHub Action jobs > > > Key: SPARK-37833 > URL: https://issues.apache.org/jira/browse/SPARK-37833 > Project: Spark > Issue Type: Improvement > Components: Project Infra, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37827) Put the some built-in table properties into V1Table.propertie to adapt to V2 command
[ https://issues.apache.org/jira/browse/SPARK-37827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37827: Assignee: (was: Apache Spark) > Put the some built-in table properties into V1Table.propertie to adapt to V2 > command > > > Key: SPARK-37827 > URL: https://issues.apache.org/jira/browse/SPARK-37827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: PengLei >Priority: Major > > Currently, we have no built-in table properties in V1Table.properties, so we > cannot get the correct result when some V2 commands run against a V1Table. > E.g. for `SHOW CREATE TABLE` (V2), we get the provider, location, comment, and options > from table.properties, but right now there is nothing in table.properties. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37827) Put the some built-in table properties into V1Table.propertie to adapt to V2 command
[ https://issues.apache.org/jira/browse/SPARK-37827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37827: Assignee: Apache Spark > Put the some built-in table properties into V1Table.propertie to adapt to V2 > command > > > Key: SPARK-37827 > URL: https://issues.apache.org/jira/browse/SPARK-37827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: PengLei >Assignee: Apache Spark >Priority: Major > > Currently, we have no built-in table properties in V1Table.properties, so we > cannot get the correct result when some V2 commands run against a V1Table. > E.g. for `SHOW CREATE TABLE` (V2), we get the provider, location, comment, and options > from table.properties, but right now there is nothing in table.properties. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37827) Put the some built-in table properties into V1Table.propertie to adapt to V2 command
[ https://issues.apache.org/jira/browse/SPARK-37827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470461#comment-17470461 ] Apache Spark commented on SPARK-37827: -- User 'Peng-Lei' has created a pull request for this issue: https://github.com/apache/spark/pull/35131 > Put the some built-in table properties into V1Table.propertie to adapt to V2 > command > > > Key: SPARK-37827 > URL: https://issues.apache.org/jira/browse/SPARK-37827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: PengLei >Priority: Major > > Currently, we have no built-in table properties in V1Table.properties, so we > cannot get the correct result when some V2 commands run against a V1Table. > E.g. for `SHOW CREATE TABLE` (V2), we get the provider, location, comment, and options > from table.properties, but right now there is nothing in table.properties. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37839) DS V2 supports partial aggregate push-down AVG
[ https://issues.apache.org/jira/browse/SPARK-37839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37839: Assignee: (was: Apache Spark) > DS V2 supports partial aggregate push-down AVG > -- > > Key: SPARK-37839 > URL: https://issues.apache.org/jira/browse/SPARK-37839 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > Currently, DS V2 supports complete aggregate push-down AVG. But supporting > partial aggregate push-down for AVG would be very useful. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37839) DS V2 supports partial aggregate push-down AVG
[ https://issues.apache.org/jira/browse/SPARK-37839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37839: Assignee: Apache Spark > DS V2 supports partial aggregate push-down AVG > -- > > Key: SPARK-37839 > URL: https://issues.apache.org/jira/browse/SPARK-37839 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > > Currently, DS V2 supports complete aggregate push-down AVG. But supporting > partial aggregate push-down for AVG would be very useful. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37839) DS V2 supports partial aggregate push-down AVG
[ https://issues.apache.org/jira/browse/SPARK-37839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470450#comment-17470450 ] Apache Spark commented on SPARK-37839: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/35130 > DS V2 supports partial aggregate push-down AVG > -- > > Key: SPARK-37839 > URL: https://issues.apache.org/jira/browse/SPARK-37839 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > Currently, DS V2 supports complete aggregate push-down AVG. But supporting > partial aggregate push-down for AVG would be very useful. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37840) Dynamically update the loaded Spark UDF JAR
[ https://issues.apache.org/jira/browse/SPARK-37840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470446#comment-17470446 ] melin commented on SPARK-37840: --- [~cloud_fan] > Dynamically update the loaded Spark UDF JAR > --- > > Key: SPARK-37840 > URL: https://issues.apache.org/jira/browse/SPARK-37840 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > In the production environment, Spark ThriftServer needs to be restarted if > UDF JAR files are updated after they have been loaded. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37840) Dynamically update the loaded Hive UDF JAR
[ https://issues.apache.org/jira/browse/SPARK-37840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-37840: -- Summary: Dynamically update the loaded Hive UDF JAR (was: Dynamically update the loaded Spark UDF JAR) > Dynamically update the loaded Hive UDF JAR > -- > > Key: SPARK-37840 > URL: https://issues.apache.org/jira/browse/SPARK-37840 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > In the production environment, Spark ThriftServer needs to be restarted if > UDF JAR files are updated after they have been loaded. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37840) Dynamically update the loaded Spark UDF JAR
melin created SPARK-37840:
--------------------------

             Summary: Dynamically update the loaded Spark UDF JAR
                 Key: SPARK-37840
                 URL: https://issues.apache.org/jira/browse/SPARK-37840
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: melin


In a production environment, Spark ThriftServer has to be restarted whenever JAR files are updated after the UDFs have been loaded.
[jira] [Updated] (SPARK-37839) DS V2 supports partial aggregate push-down AVG
[ https://issues.apache.org/jira/browse/SPARK-37839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiaan.geng updated SPARK-37839:
-------------------------------
    Description: Currently, DS V2 supports complete aggregate push-down AVG. But, supports partial aggregate push-down for AVG is very useful.  (was: Currently, DS V2 supports complete aggregate push-down for AVG. But, supports partial aggregate push-down for AVG is very useful.)

> DS V2 supports partial aggregate push-down AVG
> ----------------------------------------------
>
>                 Key: SPARK-37839
>                 URL: https://issues.apache.org/jira/browse/SPARK-37839
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: jiaan.geng
>            Priority: Major
>
> Currently, DS V2 supports complete aggregate push-down of AVG, but
> supporting partial aggregate push-down for AVG would also be very useful.
[jira] [Updated] (SPARK-37839) DS V2 supports partial aggregate push-down AVG
[ https://issues.apache.org/jira/browse/SPARK-37839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiaan.geng updated SPARK-37839:
-------------------------------
    Summary: DS V2 supports partial aggregate push-down AVG  (was: DS V2 supports partial aggregate push-down for AVG)

> DS V2 supports partial aggregate push-down AVG
> ----------------------------------------------
>
>                 Key: SPARK-37839
>                 URL: https://issues.apache.org/jira/browse/SPARK-37839
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: jiaan.geng
>            Priority: Major
>
> Currently, DS V2 supports complete aggregate push-down for AVG, but
> supporting partial aggregate push-down for AVG would also be very useful.
[jira] [Created] (SPARK-37839) DS V2 supports partial aggregate push-down for AVG
jiaan.geng created SPARK-37839:
-------------------------------

             Summary: DS V2 supports partial aggregate push-down for AVG
                 Key: SPARK-37839
                 URL: https://issues.apache.org/jira/browse/SPARK-37839
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: jiaan.geng


Currently, DS V2 supports complete aggregate push-down for AVG, but supporting partial aggregate push-down for AVG would also be very useful.
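The motivation behind partial (rather than complete) push-down of AVG can be sketched outside Spark: per-partition averages cannot be merged correctly when partitions have different sizes, so the data source instead returns partial (SUM, COUNT) pairs that the engine combines into the exact average. A minimal Python sketch of that merge semantics (the function names and partition data are illustrative, not Spark DS V2 APIs):

```python
# Sketch: why AVG is pushed down as partial SUM/COUNT pairs.
# Averaging per-partition averages is wrong for uneven partitions;
# merging (sum, count) pairs yields the exact global average.

def partial_avg(partition):
    """What a source returns for a pushed-down partial AVG: (sum, count)."""
    return (sum(partition), len(partition))

def merge_partials(partials):
    """Final aggregation on the engine side: combine (sum, count) pairs."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

partitions = [[1.0, 2.0, 3.0], [10.0]]           # uneven partition sizes
partials = [partial_avg(p) for p in partitions]  # [(6.0, 3), (10.0, 1)]

exact = merge_partials(partials)                 # (6 + 10) / (3 + 1) = 4.0
naive = sum(s / c for s, c in partials) / len(partials)  # (2 + 10) / 2 = 6.0
print(exact, naive)
```

The naive merge of finished averages gives 6.0 while the true average is 4.0, which is why a source that can only compute final AVG per partition cannot take the push-down at all, whereas partial push-down keeps the result exact.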