[jira] [Commented] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471014#comment-17471014 ] jingxiong zhong commented on SPARK-37708:
-
[~hyukjin.kwon] In the end, we found that the operating system in the image was different and that Python would not run there. With a CentOS-based image it works normally.
> pyspark adding third-party Dependencies on k8s
> --
>
> Key: SPARK-37708
> URL: https://issues.apache.org/jira/browse/SPARK-37708
> Project: Spark
> Issue Type: Bug
> Components: Kubernetes, PySpark
> Affects Versions: 3.2.0
> Environment: pyspark3.2
> Reporter: jingxiong zhong
> Priority: Major
>
> I have a question: how do I add my Python dependencies to a Spark job? For example:
> {code:sh}
> spark-submit \
> --archives s3a://path/python3.6.9.tgz#python3.6.9 \
> --conf "spark.pyspark.driver.python=python3.6.9/bin/python3" \
> --conf "spark.pyspark.python=python3.6.9/bin/python3" \
> --name "piroottest" \
> ./examples/src/main/python/pi.py 10
> {code}
> This does not run the job successfully; it throws the following error:
> {code:sh}
> Traceback (most recent call last):
>   File "/tmp/spark-63b77184-6e89-4121-bc32-6a1b793e0c85/pi.py", line 21, in <module>
>     from pyspark.sql import SparkSession
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/__init__.py", line 121, in <module>
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/__init__.py", line 42, in <module>
>   File "/opt/spark/python/lib/pyspark.zip/pyspark/sql/types.py", line 27, in <module>
>     async def _ag():
>   File "/opt/spark/work-dir/python3.6.9/lib/python3.6/ctypes/__init__.py", line 7, in <module>
>     from _ctypes import Union, Structure, Array
> ImportError: libffi.so.6: cannot open shared object file: No such file or directory
> {code}
> Or is there another way to add Python dependencies?
--
This message was sent by Atlassian Jira
(v8.20.1#820001)
-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
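The traceback above shows `_ctypes` failing to load because `libffi.so.6` is absent from the container image, not a problem with `--archives` itself. A quick, hedged way to catch this class of problem before submitting a job is to try importing the stdlib modules that wrap native libraries inside the target image; this minimal sketch assumes only that a `python3` is on `PATH` (swap in the interpreter from your unpacked archive as appropriate):

```shell
# Probe the interpreter for missing native dependencies. If the image lacks
# libffi, the ctypes import fails with the same ImportError seen in the issue.
python3 - <<'EOF'
import ctypes  # loads _ctypes, which links against libffi
import zlib    # another stdlib module backed by a shared library
print("native deps OK")
EOF
```

Running this inside the driver/executor image (e.g. via `kubectl run` or `docker run`) surfaces the mismatch between the OS the archive was built on and the OS in the image, which is what the commenter ultimately identified.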
[jira] [Assigned] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37708:
Assignee: Apache Spark
[jira] [Assigned] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37708:
Assignee: (was: Apache Spark)
[jira] [Commented] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471012#comment-17471012 ] Apache Spark commented on SPARK-37708:
--
User 'zhongjingxiong' has created a pull request for this issue: https://github.com/apache/spark/pull/35142
[jira] [Commented] (SPARK-37708) pyspark adding third-party Dependencies on k8s
[ https://issues.apache.org/jira/browse/SPARK-37708?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17471013#comment-17471013 ] Apache Spark commented on SPARK-37708:
--
User 'zhongjingxiong' has created a pull request for this issue: https://github.com/apache/spark/pull/35142
[jira] [Assigned] (SPARK-37843) Suppress NoSuchFieldError at setMDCForTask
[ https://issues.apache.org/jira/browse/SPARK-37843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-37843:
-
Assignee: Dongjoon Hyun (was: Apache Spark)
> Suppress NoSuchFieldError at setMDCForTask
> --
>
> Key: SPARK-37843
> URL: https://issues.apache.org/jira/browse/SPARK-37843
> Project: Spark
> Issue Type: Bug
> Components: Spark Core
> Affects Versions: 3.3.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Fix For: 3.3.0
>
> {code}
> 00:57:11 2022-01-07 15:57:11.693 - stderr> Exception in thread "Executor task launch worker-0" java.lang.NoSuchFieldError: mdc
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at org.apache.log4j.MDCFriend.fixForJava9(MDCFriend.java:11)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at org.slf4j.impl.Log4jMDCAdapter.<clinit>(Log4jMDCAdapter.java:38)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at org.slf4j.impl.StaticMDCBinder.getMDCA(StaticMDCBinder.java:59)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at org.slf4j.MDC.bwCompatibleGetMDCAdapterFromBinder(MDC.java:99)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at org.slf4j.MDC.<clinit>(MDC.java:108)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at org.apache.spark.executor.Executor.org$apache$spark$executor$Executor$$setMDCForTask(Executor.scala:750)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:441)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635)
> 00:57:11 2022-01-07 15:57:11.693 - stderr>    at java.base/java.lang.Thread.run(Thread.java:833)
> {code}
[jira] [Resolved] (SPARK-37843) Suppress NoSuchFieldError at setMDCForTask
[ https://issues.apache.org/jira/browse/SPARK-37843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-37843.
---
Fix Version/s: 3.3.0
Resolution: Fixed
Issue resolved by pull request 35141
[https://github.com/apache/spark/pull/35141]
[jira] [Commented] (SPARK-37843) Suppress NoSuchFieldError at setMDCForTask
[ https://issues.apache.org/jira/browse/SPARK-37843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470963#comment-17470963 ] Apache Spark commented on SPARK-37843:
--
User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/35141
[jira] [Assigned] (SPARK-37843) Suppress NoSuchFieldError at setMDCForTask
[ https://issues.apache.org/jira/browse/SPARK-37843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37843:
Assignee: Apache Spark
[jira] [Assigned] (SPARK-37843) Suppress NoSuchFieldError at setMDCForTask
[ https://issues.apache.org/jira/browse/SPARK-37843?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37843:
Assignee: (was: Apache Spark)
[jira] [Created] (SPARK-37843) Suppress NoSuchFieldError at setMDCForTask
Dongjoon Hyun created SPARK-37843:
-
Summary: Suppress NoSuchFieldError at setMDCForTask
Key: SPARK-37843
URL: https://issues.apache.org/jira/browse/SPARK-37843
Project: Spark
Issue Type: Bug
Components: Spark Core
Affects Versions: 3.3.0
Reporter: Dongjoon Hyun
[jira] [Commented] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470870#comment-17470870 ] Apache Spark commented on SPARK-37829:
--
User 'cdegroc' has created a pull request for this issue: https://github.com/apache/spark/pull/35140
> An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
> ---
>
> Key: SPARK-37829
> URL: https://issues.apache.org/jira/browse/SPARK-37829
> Project: Spark
> Issue Type: Bug
> Components: SQL
> Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0
> Reporter: Clément de Groc
> Priority: Major
>
> Doing an outer-join using {{joinWith}} on {{DataFrame}}s used to return missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with {{null}} values in Spark 3+.
> The issue can be reproduced with [the following test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5], which succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0.
> The problem only arises when working with DataFrames: Datasets of case classes work as expected, as demonstrated by [this other test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223].
> I couldn't find an explanation for this change in the Migration Guide, so I'm assuming this is a bug.
> A {{git bisect}} pointed me to [that commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. Reverting the commit solves the problem.
> A similar solution, but without reverting, is shown [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a].
> Happy to help if you think of another approach / can provide some guidance.
[jira] [Commented] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470867#comment-17470867 ] Clément de Groc commented on SPARK-37829:
-
Opened two equivalent PRs: one with a fix and one with a revert.
[jira] [Commented] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470866#comment-17470866 ] Clément de Groc commented on SPARK-37829:
-
CC [~cloud_fan]. Tagging you as you're the original author and you might have more context and/or ideas on how to best solve this.
[jira] [Commented] (SPARK-37771) Race condition in withHiveState and limited logic in IsolatedClientLoader result in ClassNotFoundException
[ https://issues.apache.org/jira/browse/SPARK-37771?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470842#comment-17470842 ] Steve Loughran commented on SPARK-37771: probably related to HADOOP-17372, which makes sure the hive classloader isn't picked up for class lookups in the config try with hadoop 3.3.1 binaries > Race condition in withHiveState and limited logic in IsolatedClientLoader > result in ClassNotFoundException > -- > > Key: SPARK-37771 > URL: https://issues.apache.org/jira/browse/SPARK-37771 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.0, 3.1.2, 3.2.0 >Reporter: Ivan Sadikov >Priority: Major > > There is a race condition between creating a Hive client and loading classes > that do not appear in shared prefixes config. For example, we confirmed that > the code fails for the following configuration: > {code:java} > spark.sql.hive.metastore.version 0.13.0 > spark.sql.hive.metastore.jars maven > spark.sql.hive.metastore.sharedPrefixes com.amazonaws prefix> > spark.hadoop.fs.s3a.impl org.apache.hadoop.fs.s3a.S3AFileSystem{code} > And code: > {code:java} > -- Prerequisite commands to set up the table > -- drop table if exists ivan_test_2; > -- create table ivan_test_2 (a int, part string) using csv location > 's3://bucket/hive-test' partitioned by (part); > -- insert into ivan_test_2 values (1, 'a'); > -- Command that triggers failure > ALTER TABLE ivan_test_2 ADD PARTITION (part='b') LOCATION > 's3://bucket/hive-test'{code} > > Stacktrace (line numbers might differ): > {code:java} > 21/12/22 04:37:05 DEBUG IsolatedClientLoader: shared class: > org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider > 21/12/22 04:37:05 DEBUG IsolatedClientLoader: shared class: > org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider > 21/12/22 04:37:05 DEBUG IsolatedClientLoader: hive class: > com.amazonaws.auth.EnvironmentVariableCredentialsProvider - null > 21/12/22 04:37:05 ERROR 
S3AFileSystem: Failed to initialize S3AFileSystem for > path s3://bucket/hive-test > java.io.IOException: From option fs.s3a.aws.credentials.provider > java.lang.ClassNotFoundException: Class > com.amazonaws.auth.EnvironmentVariableCredentialsProvider not found > at > org.apache.hadoop.fs.s3a.S3AUtils.loadAWSProviderClasses(S3AUtils.java:725) > at > org.apache.hadoop.fs.s3a.S3AUtils.createAWSCredentialProviderSet(S3AUtils.java:688) > at > org.apache.hadoop.fs.s3a.S3AFileSystem.initialize(S3AFileSystem.java:411) > at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:3469) > at org.apache.hadoop.fs.FileSystem.access$300(FileSystem.java:174) > at org.apache.hadoop.fs.FileSystem$Cache.getInternal(FileSystem.java:3574) > at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:3521) > at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:540) > at org.apache.hadoop.fs.Path.getFileSystem(Path.java:365) > at org.apache.hadoop.hive.metastore.Warehouse.getFs(Warehouse.java:112) > at > org.apache.hadoop.hive.metastore.Warehouse.getDnsPath(Warehouse.java:144) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createLocationForAddedPartition(HiveMetaStore.java:1993) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_core(HiveMetaStore.java:1865) > at > org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.add_partitions_req(HiveMetaStore.java:1910) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingHMSHandler.invoke(RetryingHMSHandler.java:105) > at com.sun.proxy.$Proxy58.add_partitions_req(Unknown Source) > at > org.apache.hadoop.hive.metastore.HiveMetaStoreClient.add_partitions(HiveMetaStoreClient.java:457) > at 
sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at > org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.invoke(RetryingMetaStoreClient.java:89) > at com.sun.proxy.$Proxy59.add_partitions(Unknown Source) > at > org.apache.hadoop.hive.ql.metadata.Hive.createPartitions(Hive.java:1514) > at > org.apache.spark.sql.hive.client.Shim_v0_13.createPartitions(HiveShim.scala:773) >
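The shared-vs-isolated lookup that the DEBUG lines above are reporting can be sketched as a simple prefix check (a hypothetical Python sketch; the real logic is IsolatedClientLoader's shared-class test in Scala, and the default prefix list below is an assumption for illustration):

```python
# Hypothetical sketch of the shared-vs-"hive class" lookup decision.
# A class matching a shared prefix is loaded from the application
# classloader; anything else goes to the isolated Hive classloader.

DEFAULT_SHARED = ["org.apache.hadoop.fs.s3a.", "org.slf4j.", "scala."]

def is_shared_class(name, shared_prefixes):
    return any(name.startswith(p) for p in shared_prefixes)

provider = "com.amazonaws.auth.EnvironmentVariableCredentialsProvider"

# With spark.sql.hive.metastore.sharedPrefixes including com.amazonaws,
# the AWS provider should resolve as a shared class...
print(is_shared_class(provider, DEFAULT_SHARED + ["com.amazonaws"]))  # True

# ...but the DEBUG log shows it treated as a "hive class", which the issue
# attributes to a race while the client/prefixes are still being set up.
print(is_shared_class(provider, DEFAULT_SHARED))  # False
```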
[jira] [Assigned] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37829: Assignee: Apache Spark > An outer-join using joinWith on DataFrames returns Rows with null fields > instead of null values > --- > > Key: SPARK-37829 > URL: https://issues.apache.org/jira/browse/SPARK-37829 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: Clément de Groc >Assignee: Apache Spark >Priority: Major > > Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return > missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with > {{null}} values in Spark 3+. > The issue can be reproduced with [the following > test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] > that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. > The problem only arises when working with DataFrames: Datasets of case > classes work as expected as demonstrated by [this other > test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. > I couldn't find an explanation for this change in the Migration guide so I'm > assuming this is a bug. > A {{git bisect}} pointed me to [that > commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. > Reverting the commit solves the problem. > A similar solution, but without reverting, is shown > [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a]. > Happy to help if you think of another approach / can provide some guidance. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37829: Assignee: (was: Apache Spark) > An outer-join using joinWith on DataFrames returns Rows with null fields > instead of null values > --- > > Key: SPARK-37829 > URL: https://issues.apache.org/jira/browse/SPARK-37829 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: Clément de Groc >Priority: Major > > Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return > missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with > {{null}} values in Spark 3+. > The issue can be reproduced with [the following > test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] > that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. > The problem only arises when working with DataFrames: Datasets of case > classes work as expected as demonstrated by [this other > test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. > I couldn't find an explanation for this change in the Migration guide so I'm > assuming this is a bug. > A {{git bisect}} pointed me to [that > commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. > Reverting the commit solves the problem. > A similar solution, but without reverting, is shown > [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a]. > Happy to help if you think of another approach / can provide some guidance. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470824#comment-17470824 ] Apache Spark commented on SPARK-37829: -- User 'cdegroc' has created a pull request for this issue: https://github.com/apache/spark/pull/35139 > An outer-join using joinWith on DataFrames returns Rows with null fields > instead of null values > --- > > Key: SPARK-37829 > URL: https://issues.apache.org/jira/browse/SPARK-37829 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: Clément de Groc >Priority: Major > > Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return > missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with > {{null}} values in Spark 3+. > The issue can be reproduced with [the following > test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] > that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. > The problem only arises when working with DataFrames: Datasets of case > classes work as expected as demonstrated by [this other > test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. > I couldn't find an explanation for this change in the Migration guide so I'm > assuming this is a bug. > A {{git bisect}} pointed me to [that > commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. > Reverting the commit solves the problem. > A similar solution, but without reverting, is shown > [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a]. > Happy to help if you think of another approach / can provide some guidance. 
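For readers following along, the behavioural change under discussion can be sketched outside Spark with a plain-Python stand-in (`full_outer_join` and its parameters are hypothetical helpers for illustration, not Spark API):

```python
# Sketch of the two outer-join behaviours described in this issue.
# missing_as_row=False -> unmatched side is None (Spark 2.4.8 joinWith).
# missing_as_row=True  -> unmatched side is a record of all-None fields
#                         (Spark 3.x joinWith on DataFrames).

def full_outer_join(left, right, key, missing_as_row=False):
    def blank(rows):
        fields = rows[0].keys() if rows else []
        return {f: None for f in fields}

    out = []
    left_keys = {r[key] for r in left}
    right_by_key = {r[key]: r for r in right}
    for l in left:
        r = right_by_key.get(l[key])
        if r is None and missing_as_row:
            r = blank(right)
        out.append((l, r))
    for r in right:
        if r[key] not in left_keys:
            l = blank(left) if missing_as_row else None
            out.append((l, r))
    return out

left = [{"id": 1, "v": "a"}]
right = [{"id": 2, "w": "b"}]

print(full_outer_join(left, right, "id"))
# 2.4.8-style: [({'id': 1, 'v': 'a'}, None), (None, {'id': 2, 'w': 'b'})]
print(full_outer_join(left, right, "id", missing_as_row=True))
# 3.x-style: the unmatched sides become {'id': None, ...} records instead
```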
[jira] [Updated] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clément de Groc updated SPARK-37829: Description: Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with {{null}} values in Spark 3+. The issue can be reproduced with [the following test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. The problem only arises when working with DataFrames: Datasets of case classes work as expected as demonstrated by [this other test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. I couldn't find an explanation for this change in the Migration guide so I'm assuming this is a bug. A {{git bisect}} pointed me to [that commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. Reverting the commit solves the problem. A similar solution, but without reverting, is shown [here|https://github.com/cdegroc/spark/commit/3838dd5617dae51e8b323b07df7cb36e3e6728c3]. Happy to help if you think of another approach / can provide some guidance. was: Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with {{null}} values in Spark 3+. The issue can be reproduced with [the following test|https://github.com/cdegroc/spark/commit/a499805bc7f40e741bfa9badec2588972e53d604] that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. The problem only arises when working with DataFrames: Datasets of case classes work as expected as demonstrated by [this other test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. 
I couldn't find an explanation for this change in the Migration guide so I'm assuming this is a bug. A {{git bisect}} pointed me to [that commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. Reverting the commit solves the problem. A similar solution, but without reverting, is shown [here|https://github.com/cdegroc/spark/commit/3838dd5617dae51e8b323b07df7cb36e3e6728c3]. Happy to help if you think of another approach / can provide some guidance. > An outer-join using joinWith on DataFrames returns Rows with null fields > instead of null values > --- > > Key: SPARK-37829 > URL: https://issues.apache.org/jira/browse/SPARK-37829 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: Clément de Groc >Priority: Major > > Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return > missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with > {{null}} values in Spark 3+. > The issue can be reproduced with [the following > test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] > that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. > The problem only arises when working with DataFrames: Datasets of case > classes work as expected as demonstrated by [this other > test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. > I couldn't find an explanation for this change in the Migration guide so I'm > assuming this is a bug. > A {{git bisect}} pointed me to [that > commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. > Reverting the commit solves the problem. > A similar solution, but without reverting, is shown > [here|https://github.com/cdegroc/spark/commit/3838dd5617dae51e8b323b07df7cb36e3e6728c3]. > Happy to help if you think of another approach / can provide some guidance. 
[jira] [Updated] (SPARK-37829) An outer-join using joinWith on DataFrames returns Rows with null fields instead of null values
[ https://issues.apache.org/jira/browse/SPARK-37829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Clément de Groc updated SPARK-37829: Description: Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with {{null}} values in Spark 3+. The issue can be reproduced with [the following test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. The problem only arises when working with DataFrames: Datasets of case classes work as expected as demonstrated by [this other test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. I couldn't find an explanation for this change in the Migration guide so I'm assuming this is a bug. A {{git bisect}} pointed me to [that commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. Reverting the commit solves the problem. A similar solution, but without reverting, is shown [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a]. Happy to help if you think of another approach / can provide some guidance. was: Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with {{null}} values in Spark 3+. The issue can be reproduced with [the following test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. The problem only arises when working with DataFrames: Datasets of case classes work as expected as demonstrated by [this other test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. 
I couldn't find an explanation for this change in the Migration guide so I'm assuming this is a bug. A {{git bisect}} pointed me to [that commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. Reverting the commit solves the problem. A similar solution, but without reverting, is shown [here|https://github.com/cdegroc/spark/commit/3838dd5617dae51e8b323b07df7cb36e3e6728c3]. Happy to help if you think of another approach / can provide some guidance. > An outer-join using joinWith on DataFrames returns Rows with null fields > instead of null values > --- > > Key: SPARK-37829 > URL: https://issues.apache.org/jira/browse/SPARK-37829 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.0.0, 3.0.1, 3.0.2, 3.0.3, 3.1.0, 3.1.1, 3.1.2, 3.2.0 >Reporter: Clément de Groc >Priority: Major > > Doing an outer-join using {{joinWith}} on {{{}DataFrame{}}}s used to return > missing values as {{null}} in Spark 2.4.8, but returns them as {{Rows}} with > {{null}} values in Spark 3+. > The issue can be reproduced with [the following > test|https://github.com/cdegroc/spark/commit/79f4d6a1ec6c69b10b72dbc8f92ab6490d5ef5e5] > that succeeds on Spark 2.4.8 but fails starting from Spark 3.0.0. > The problem only arises when working with DataFrames: Datasets of case > classes work as expected as demonstrated by [this other > test|https://github.com/apache/spark/blob/v3.0.0/sql/core/src/test/scala/org/apache/spark/sql/DatasetSuite.scala#L1200-L1223]. > I couldn't find an explanation for this change in the Migration guide so I'm > assuming this is a bug. > A {{git bisect}} pointed me to [that > commit|https://github.com/apache/spark/commit/cd92f25be5a221e0d4618925f7bc9dfd3bb8cb59]. > Reverting the commit solves the problem. > A similar solution, but without reverting, is shown > [here|https://github.com/cdegroc/spark/commit/684c675bf070876a475a9b225f6c2f92edce4c8a]. > Happy to help if you think of another approach / can provide some guidance. 
[jira] [Commented] (SPARK-35703) Relax constraint for Spark bucket join and remove HashClusteredDistribution
[ https://issues.apache.org/jira/browse/SPARK-35703?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470796#comment-17470796 ] Apache Spark commented on SPARK-35703: -- User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/35138 > Relax constraint for Spark bucket join and remove HashClusteredDistribution > --- > > Key: SPARK-35703 > URL: https://issues.apache.org/jira/browse/SPARK-35703 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: Chao Sun >Assignee: Chao Sun >Priority: Major > Fix For: 3.3.0 > > > Currently Spark has {{HashClusteredDistribution}} and > {{ClusteredDistribution}}. The only difference between the two is that the > former is stricter when deciding whether bucket join is allowed to avoid > shuffle: compared to the latter, it requires an *exact* match between the > clustering keys from the output partitioning (i.e., {{HashPartitioning}}) and > the join keys. However, this is unnecessary, as we should be able to avoid > shuffle when the set of clustering keys is a subset of the join keys, just like > {{ClusteredDistribution}}. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
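The relaxation described in the issue can be sketched as two predicates (hypothetical names; Spark's real checks live in the Scala Distribution/Partitioning classes):

```python
# Hypothetical sketch of the two "can we skip the shuffle?" rules.

def satisfies_strict(partitioning_keys, join_keys):
    # HashClusteredDistribution-style: exact match required.
    return list(partitioning_keys) == list(join_keys)

def satisfies_relaxed(partitioning_keys, join_keys):
    # ClusteredDistribution-style: clustering keys may be a subset
    # of the join keys.
    return set(partitioning_keys) <= set(join_keys)

# A table bucketed by (a), joined on keys (a, b):
print(satisfies_strict(["a"], ["a", "b"]))   # False -> shuffle forced
print(satisfies_relaxed(["a"], ["a", "b"]))  # True  -> bucket join allowed
```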
[jira] [Commented] (SPARK-37840) Dynamically update the loaded Hive UDF JAR
[ https://issues.apache.org/jira/browse/SPARK-37840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470793#comment-17470793 ] Rakesh Raushan commented on SPARK-37840: We can dynamically update our UDF jars after loading them. I will try to raise a PR soon for this. > Dynamically update the loaded Hive UDF JAR > -- > > Key: SPARK-37840 > URL: https://issues.apache.org/jira/browse/SPARK-37840 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > In the production environment, the Spark Thrift Server needs to be restarted if > JAR files are updated after UDF JAR files are loaded. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37417) Inline type hints for python/pyspark/ml/linalg/__init__.py
[ https://issues.apache.org/jira/browse/SPARK-37417?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470783#comment-17470783 ] Maciej Szymkiewicz commented on SPARK-37417: {{ml}} and {{mllib}} linalg types should be consistent, so let's wait with this one until the other one is resolved. > Inline type hints for python/pyspark/ml/linalg/__init__.py > -- > > Key: SPARK-37417 > URL: https://issues.apache.org/jira/browse/SPARK-37417 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/linalg/__init__.pyi to > python/pyspark/ml/linalg/__init__.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
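For context, "inlining" a .pyi stub means moving its annotations into the .py source itself. A minimal generic sketch, with illustrative names rather than the actual linalg signatures:

```python
# Before inlining, the annotations lived in a parallel stub (module.pyi):
#     def norm(v: List[float]) -> float: ...
# After inlining, the .py source carries the hints directly:

import math
from typing import List

def norm(v: List[float]) -> float:
    """Euclidean norm, annotated inline rather than in a .pyi stub."""
    return math.sqrt(sum(x * x for x in v))

print(norm([3.0, 4.0]))  # 5.0
```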
[jira] [Commented] (SPARK-37415) Inline type hints for python/pyspark/ml/util.py
[ https://issues.apache.org/jira/browse/SPARK-37415?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470768#comment-17470768 ] Maciej Szymkiewicz commented on SPARK-37415: I am going to handle this one, once prerequisites are resolved. > Inline type hints for python/pyspark/ml/util.py > --- > > Key: SPARK-37415 > URL: https://issues.apache.org/jira/browse/SPARK-37415 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/util.pyi to > python/pyspark/ml/util.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37416) Inline type hints for python/pyspark/ml/wrapper.py
[ https://issues.apache.org/jira/browse/SPARK-37416?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470767#comment-17470767 ] Maciej Szymkiewicz commented on SPARK-37416: I am going to handle this one, once prerequisites are resolved. > Inline type hints for python/pyspark/ml/wrapper.py > -- > > Key: SPARK-37416 > URL: https://issues.apache.org/jira/browse/SPARK-37416 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/wrapper.pyi to > python/pyspark/ml/wrapper.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37837) Enable black formatter in dev Python scripts
[ https://issues.apache.org/jira/browse/SPARK-37837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-37837. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35127 [https://github.com/apache/spark/pull/35127] > Enable black formatter in dev Python scripts > > > Key: SPARK-37837 > URL: https://issues.apache.org/jira/browse/SPARK-37837 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.3.0 > > > The black formatter is only enabled for python/pyspark to minimize side effects, > e.g., reformatting auto-generated or third-party Python scripts. > This JIRA aims to enable the black formatter in the dev directory, where there are no > generated Python scripts to exclude. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37837) Enable black formatter in dev Python scripts
[ https://issues.apache.org/jira/browse/SPARK-37837?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-37837: - Assignee: Hyukjin Kwon > Enable black formatter in dev Python scripts > > > Key: SPARK-37837 > URL: https://issues.apache.org/jira/browse/SPARK-37837 > Project: Spark > Issue Type: Improvement > Components: Project Infra >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > The black formatter is only enabled for python/pyspark to minimize side effects, > e.g., reformatting auto-generated or third-party Python scripts. > This JIRA aims to enable the black formatter in the dev directory, where there are no > generated Python scripts to exclude. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37419) Inline type hints for python/pyspark/ml/param/shared.py
[ https://issues.apache.org/jira/browse/SPARK-37419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz resolved SPARK-37419. Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 34674 [https://github.com/apache/spark/pull/34674] > Inline type hints for python/pyspark/ml/param/shared.py > --- > > Key: SPARK-37419 > URL: https://issues.apache.org/jira/browse/SPARK-37419 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Major > Fix For: 3.3.0 > > > Inline type hints from python/pyspark/ml/param/shared.pyi to > python/pyspark/ml/param/shared.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37419) Inline type hints for python/pyspark/ml/param/shared.py
[ https://issues.apache.org/jira/browse/SPARK-37419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz reassigned SPARK-37419: -- Assignee: Maciej Szymkiewicz > Inline type hints for python/pyspark/ml/param/shared.py > --- > > Key: SPARK-37419 > URL: https://issues.apache.org/jira/browse/SPARK-37419 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/param/shared.pyi to > python/pyspark/ml/param/shared.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37836) Enable more flake8 rules for PEP 8 compliance
[ https://issues.apache.org/jira/browse/SPARK-37836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-37836. -- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35126 [https://github.com/apache/spark/pull/35126] > Enable more flake8 rules for PEP 8 compliance > - > > Key: SPARK-37836 > URL: https://issues.apache.org/jira/browse/SPARK-37836 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > Fix For: 3.3.0 > > > Most of the disabled linter rules here: > https://github.com/apache/spark/blob/master/dev/tox.ini#L19-L31 > should be enabled to comply with PEP 8. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
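Illustratively, enabling more flake8 rules amounts to shrinking the ignore list in dev/tox.ini; a hedged fragment (the exact codes kept disabled below are an assumption for illustration, not the actual change):

```ini
# dev/tox.ini fragment -- illustrative sketch only
[flake8]
max-line-length = 100
# Previously a long list of E/W codes was ignored here; after this change
# most PEP 8 rules are enforced, leaving only deliberate exceptions, e.g.
# the usual black-compatibility pair:
ignore = E203, W503
```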
[jira] [Assigned] (SPARK-37836) Enable more flake8 rules for PEP 8 compliance
[ https://issues.apache.org/jira/browse/SPARK-37836?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-37836: Assignee: Hyukjin Kwon > Enable more flake8 rules for PEP 8 compliance > - > > Key: SPARK-37836 > URL: https://issues.apache.org/jira/browse/SPARK-37836 > Project: Spark > Issue Type: Improvement > Components: Project Infra, PySpark >Affects Versions: 3.3.0 >Reporter: Hyukjin Kwon >Assignee: Hyukjin Kwon >Priority: Major > > Most of the disabled linter rules here: > https://github.com/apache/spark/blob/master/dev/tox.ini#L19-L31 > should be enabled to comply with PEP 8. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37397) Inline type hints for python/pyspark/ml/base.py
[ https://issues.apache.org/jira/browse/SPARK-37397?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470654#comment-17470654 ] Maciej Szymkiewicz commented on SPARK-37397: I'll handle this once SPARK-37418 prerequisites are resolved. > Inline type hints for python/pyspark/ml/base.py > --- > > Key: SPARK-37397 > URL: https://issues.apache.org/jira/browse/SPARK-37397 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/base.pyi to > python/pyspark/ml/base.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37397) Inline type hints for python/pyspark/ml/base.py
[ https://issues.apache.org/jira/browse/SPARK-37397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz reassigned SPARK-37397: -- Assignee: (was: Maciej Szymkiewicz) > Inline type hints for python/pyspark/ml/base.py > --- > > Key: SPARK-37397 > URL: https://issues.apache.org/jira/browse/SPARK-37397 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/base.pyi to > python/pyspark/ml/base.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37418) Inline type hints for python/pyspark/ml/param/__init__.py
[ https://issues.apache.org/jira/browse/SPARK-37418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37418: Assignee: (was: Apache Spark) > Inline type hints for python/pyspark/ml/param/__init__.py > - > > Key: SPARK-37418 > URL: https://issues.apache.org/jira/browse/SPARK-37418 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/param/__init__.pyi to > python/pyspark/ml/param/__init__.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37418) Inline type hints for python/pyspark/ml/param/__init__.py
[ https://issues.apache.org/jira/browse/SPARK-37418?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470651#comment-17470651 ] Apache Spark commented on SPARK-37418: -- User 'zero323' has created a pull request for this issue: https://github.com/apache/spark/pull/35136 > Inline type hints for python/pyspark/ml/param/__init__.py > - > > Key: SPARK-37418 > URL: https://issues.apache.org/jira/browse/SPARK-37418 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/param/__init__.pyi to > python/pyspark/ml/param/__init__.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37397) Inline type hints for python/pyspark/ml/base.py
[ https://issues.apache.org/jira/browse/SPARK-37397?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Maciej Szymkiewicz reassigned SPARK-37397: -- Assignee: Maciej Szymkiewicz > Inline type hints for python/pyspark/ml/base.py > --- > > Key: SPARK-37397 > URL: https://issues.apache.org/jira/browse/SPARK-37397 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Maciej Szymkiewicz >Priority: Major > > Inline type hints from python/pyspark/ml/base.pyi to > python/pyspark/ml/base.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37418) Inline type hints for python/pyspark/ml/param/__init__.py
[ https://issues.apache.org/jira/browse/SPARK-37418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37418: Assignee: Apache Spark > Inline type hints for python/pyspark/ml/param/__init__.py > - > > Key: SPARK-37418 > URL: https://issues.apache.org/jira/browse/SPARK-37418 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.3.0 >Reporter: Maciej Szymkiewicz >Assignee: Apache Spark >Priority: Major > > Inline type hints from python/pyspark/ml/param/__init__.pyi to > python/pyspark/ml/param/__init__.py. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37193) DynamicJoinSelection.shouldDemoteBroadcastHashJoin should not apply to outer joins
[ https://issues.apache.org/jira/browse/SPARK-37193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37193: Assignee: Apache Spark (was: Eugene Koifman) > DynamicJoinSelection.shouldDemoteBroadcastHashJoin should not apply to outer > joins > -- > > Key: SPARK-37193 > URL: https://issues.apache.org/jira/browse/SPARK-37193 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: Eugene Koifman >Assignee: Apache Spark >Priority: Major > Fix For: 3.3.0 > > > {{DynamicJoinSelection.shouldDemoteBroadcastHashJoin}} will prevent AQE from > converting a sort-merge join into a broadcast join because SMJ is faster when > the side that would be broadcast has a lot of empty partitions. > This makes sense for inner joins, which can short-circuit if one side is > empty. > For left or right outer joins, the streaming side still has to be processed, so > demoting the broadcast join doesn't have the same advantage. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37193) DynamicJoinSelection.shouldDemoteBroadcastHashJoin should not apply to outer joins
[ https://issues.apache.org/jira/browse/SPARK-37193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37193: Assignee: Eugene Koifman (was: Apache Spark) > DynamicJoinSelection.shouldDemoteBroadcastHashJoin should not apply to outer > joins > -- > > Key: SPARK-37193 > URL: https://issues.apache.org/jira/browse/SPARK-37193 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Fix For: 3.3.0 > > > {{DynamicJoinSelection.shouldDemoteBroadcastHashJoin}} will prevent AQE from > converting a sort-merge join into a broadcast join because SMJ is faster when > the side that would be broadcast has a lot of empty partitions. > This makes sense for inner joins, which can short-circuit if one side is > empty. > For left or right outer joins, the streaming side still has to be processed, so > demoting the broadcast join doesn't have the same advantage. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-37193) DynamicJoinSelection.shouldDemoteBroadcastHashJoin should not apply to outer joins
[ https://issues.apache.org/jira/browse/SPARK-37193?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reopened SPARK-37193: - > DynamicJoinSelection.shouldDemoteBroadcastHashJoin should not apply to outer > joins > -- > > Key: SPARK-37193 > URL: https://issues.apache.org/jira/browse/SPARK-37193 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.1 >Reporter: Eugene Koifman >Assignee: Eugene Koifman >Priority: Major > Fix For: 3.3.0 > > > {{DynamicJoinSelection.shouldDemoteBroadcastHashJoin}} will prevent AQE from > converting a sort-merge join into a broadcast join because SMJ is faster when > the side that would be broadcast has a lot of empty partitions. > This makes sense for inner joins, which can short-circuit if one side is > empty. > For left or right outer joins, the streaming side still has to be processed, so > demoting the broadcast join doesn't have the same advantage. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
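The demotion logic discussed in SPARK-37193 can be sketched as a small decision function: demote the broadcast hash join only when the join type can short-circuit on an empty side. This is an illustrative model, not Spark's actual `DynamicJoinSelection` code; the class name, method name, and the 0.2 threshold are assumptions for the sketch (the real ratio comes from Spark's adaptive-execution configuration).

```java
// Hypothetical sketch of the proposed rule: only demote a broadcast hash join
// to sort-merge join when the join can short-circuit on an empty side.
public class DemoteDecision {
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    // True when many empty partitions make SMJ attractive AND the join is
    // inner, so an empty side lets the join short-circuit.
    static boolean shouldDemote(JoinType type, int emptyPartitions, int totalPartitions) {
        double emptyRatio = (double) emptyPartitions / totalPartitions;
        boolean mostlyEmpty = emptyRatio > 0.2; // placeholder threshold, not Spark's actual value
        return mostlyEmpty && type == JoinType.INNER;
    }

    public static void main(String[] args) {
        // Inner join with many empty partitions: demotion applies.
        System.out.println(shouldDemote(JoinType.INNER, 8, 10));      // true
        // Outer join: the streaming side must be processed anyway, keep broadcast.
        System.out.println(shouldDemote(JoinType.LEFT_OUTER, 8, 10)); // false
    }
}
```

The sketch makes the issue's point concrete: the empty-partition ratio alone is not enough; the join type must also permit short-circuiting.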
[jira] [Commented] (SPARK-35442) Support propagate empty relation through aggregate
[ https://issues.apache.org/jira/browse/SPARK-35442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470581#comment-17470581 ] Apache Spark commented on SPARK-35442: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/35135 > Support propagate empty relation through aggregate > -- > > Key: SPARK-35442 > URL: https://issues.apache.org/jira/browse/SPARK-35442 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Minor > > The Aggregate in AQE is different from others, the `LogicalQueryStage` looks > like `LogicalQueryStage(Aggregate, BaseAggregate)`. We should handle this > case specially. > Logically, if the Aggregate grouping expression is not empty, we can > eliminate it safely. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35442) Support propagate empty relation through aggregate
[ https://issues.apache.org/jira/browse/SPARK-35442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35442: Assignee: Apache Spark > Support propagate empty relation through aggregate > -- > > Key: SPARK-35442 > URL: https://issues.apache.org/jira/browse/SPARK-35442 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Assignee: Apache Spark >Priority: Minor > > The Aggregate in AQE is different from others, the `LogicalQueryStage` looks > like `LogicalQueryStage(Aggregate, BaseAggregate)`. We should handle this > case specially. > Logically, if the Aggregate grouping expression is not empty, we can > eliminate it safely. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-35442) Support propagate empty relation through aggregate
[ https://issues.apache.org/jira/browse/SPARK-35442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-35442: Assignee: (was: Apache Spark) > Support propagate empty relation through aggregate > -- > > Key: SPARK-35442 > URL: https://issues.apache.org/jira/browse/SPARK-35442 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Minor > > The Aggregate in AQE is different from others, the `LogicalQueryStage` looks > like `LogicalQueryStage(Aggregate, BaseAggregate)`. We should handle this > case specially. > Logically, if the Aggregate grouping expression is not empty, we can > eliminate it safely. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-35442) Support propagate empty relation through aggregate
[ https://issues.apache.org/jira/browse/SPARK-35442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470580#comment-17470580 ] Apache Spark commented on SPARK-35442: -- User 'ulysses-you' has created a pull request for this issue: https://github.com/apache/spark/pull/35135 > Support propagate empty relation through aggregate > -- > > Key: SPARK-35442 > URL: https://issues.apache.org/jira/browse/SPARK-35442 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Minor > > The Aggregate in AQE is different from others, the `LogicalQueryStage` looks > like `LogicalQueryStage(Aggregate, BaseAggregate)`. We should handle this > case specially. > Logically, if the Aggregate grouping expression is not empty, we can > eliminate it safely. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37833) Add `precondition` job for skip the main GitHub Action jobs
[ https://issues.apache.org/jira/browse/SPARK-37833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470508#comment-17470508 ] Hyukjin Kwon commented on SPARK-37833: -- Reverted at https://github.com/apache/spark/commit/213c299cc615afafc1b8b244aa84ddefc99bd614 and https://github.com/apache/spark/commit/11950d02b7a51552a824119f7764c2fede9c4c0d > Add `precondition` job for skip the main GitHub Action jobs > --- > > Key: SPARK-37833 > URL: https://issues.apache.org/jira/browse/SPARK-37833 > Project: Spark > Issue Type: Improvement > Components: Project Infra, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37833) Add `precondition` job for skip the main GitHub Action jobs
[ https://issues.apache.org/jira/browse/SPARK-37833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon updated SPARK-37833: - Fix Version/s: (was: 3.3.0) > Add `precondition` job for skip the main GitHub Action jobs > --- > > Key: SPARK-37833 > URL: https://issues.apache.org/jira/browse/SPARK-37833 > Project: Spark > Issue Type: Improvement > Components: Project Infra, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-37833) Add `precondition` job for skip the main GitHub Action jobs
[ https://issues.apache.org/jira/browse/SPARK-37833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reopened SPARK-37833: -- Assignee: (was: Dongjoon Hyun) > Add `precondition` job for skip the main GitHub Action jobs > --- > > Key: SPARK-37833 > URL: https://issues.apache.org/jira/browse/SPARK-37833 > Project: Spark > Issue Type: Improvement > Components: Project Infra, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35442) Support propagate empty relation through aggregate
[ https://issues.apache.org/jira/browse/SPARK-35442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-35442: -- Summary: Support propagate empty relation through aggregate (was: Eliminate unnecessary join through Aggregate) > Support propagate empty relation through aggregate > -- > > Key: SPARK-35442 > URL: https://issues.apache.org/jira/browse/SPARK-35442 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Minor > > The Aggregate in AQE is different from others, the `LogicalQueryStage` looks > like `LogicalQueryStage(Aggregate, BaseAggregate)`. We should handle this > case specially. > Logically, if the Aggregate grouping expression is not empty, we can > eliminate it safely. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-35442) Eliminate unnecessary join through Aggregate
[ https://issues.apache.org/jira/browse/SPARK-35442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] XiDuo You updated SPARK-35442: -- Description: The Aggregate in AQE is different from others, the `LogicalQueryStage` looks like `LogicalQueryStage(Aggregate, BaseAggregate)`. We should handle this case specially. Logically, if the Aggregate grouping expression is not empty, we can eliminate it safely. was: If Aggregate and Join have the same output partitioning, the plan will look like: {code:java} SortMergeJoin Sort HashAggregate Shuffle Sort xxx{code} Currently `EliminateUnnecessaryJoin` doesn't support optimizing this case. Logically, if the Aggregate grouping expression is not empty, we can eliminate it safely. > Eliminate unnecessary join through Aggregate > > > Key: SPARK-35442 > URL: https://issues.apache.org/jira/browse/SPARK-35442 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.2.0 >Reporter: XiDuo You >Priority: Minor > > The Aggregate in AQE is different from others, the `LogicalQueryStage` looks > like `LogicalQueryStage(Aggregate, BaseAggregate)`. We should handle this > case specially. > Logically, if the Aggregate grouping expression is not empty, we can > eliminate it safely. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
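The optimization rationale behind SPARK-35442 — a grouped aggregate over an empty input produces no rows, while a global aggregate (no grouping keys) still produces exactly one row — can be illustrated with plain Java streams. This is a toy model of the reasoning, not Spark's actual `PropagateEmptyRelation` rule:

```java
// Toy model: a grouped aggregate over zero input rows yields zero groups,
// so the optimizer may safely replace the subtree with an empty relation.
// A global aggregate still yields one row (e.g. COUNT(*) = 0), so it may not.
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.stream.Collectors;

public class EmptyRelationAggregate {
    // Group rows by key and count; models Aggregate with a grouping expression.
    static Map<String, Long> groupedCount(List<String> keys) {
        return keys.stream()
                   .collect(Collectors.groupingBy(Function.identity(), Collectors.counting()));
    }

    public static void main(String[] args) {
        List<String> empty = List.of();
        // Non-empty grouping over an empty relation: result is empty.
        System.out.println(groupedCount(empty).isEmpty()); // true
        // Global aggregate over empty input still produces a value: COUNT(*) = 0.
        System.out.println(empty.stream().count());        // 0
    }
}
```

This is why the issue restricts the rule to aggregates whose grouping expression is non-empty: only then is "empty in, empty out" guaranteed.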
[jira] [Commented] (SPARK-37833) Add `precondition` job for skip the main GitHub Action jobs
[ https://issues.apache.org/jira/browse/SPARK-37833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470496#comment-17470496 ] Apache Spark commented on SPARK-37833: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/35133 > Add `precondition` job for skip the main GitHub Action jobs > --- > > Key: SPARK-37833 > URL: https://issues.apache.org/jira/browse/SPARK-37833 > Project: Spark > Issue Type: Improvement > Components: Project Infra, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37833) Add `precondition` job for skip the main GitHub Action jobs
[ https://issues.apache.org/jira/browse/SPARK-37833?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470495#comment-17470495 ] Apache Spark commented on SPARK-37833: -- User 'HyukjinKwon' has created a pull request for this issue: https://github.com/apache/spark/pull/35133 > Add `precondition` job for skip the main GitHub Action jobs > --- > > Key: SPARK-37833 > URL: https://issues.apache.org/jira/browse/SPARK-37833 > Project: Spark > Issue Type: Improvement > Components: Project Infra, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37842) Use `multi-catch` to simplify duplicate exception handling behavior in Java code
[ https://issues.apache.org/jira/browse/SPARK-37842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37842: Assignee: Apache Spark > Use `multi-catch` to simplify duplicate exception handling behavior in Java > code > > > Key: SPARK-37842 > URL: https://issues.apache.org/jira/browse/SPARK-37842 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Assignee: Apache Spark >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37842) Use `multi-catch` to simplify duplicate exception handling behavior in Java code
[ https://issues.apache.org/jira/browse/SPARK-37842?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470494#comment-17470494 ] Apache Spark commented on SPARK-37842: -- User 'LuciferYang' has created a pull request for this issue: https://github.com/apache/spark/pull/35134 > Use `multi-catch` to simplify duplicate exception handling behavior in Java > code > > > Key: SPARK-37842 > URL: https://issues.apache.org/jira/browse/SPARK-37842 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37842) Use `multi-catch` to simplify duplicate exception handling behavior in Java code
[ https://issues.apache.org/jira/browse/SPARK-37842?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37842: Assignee: (was: Apache Spark) > Use `multi-catch` to simplify duplicate exception handling behavior in Java > code > > > Key: SPARK-37842 > URL: https://issues.apache.org/jira/browse/SPARK-37842 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37842) Use `multi-catch` to simplify duplicate exception handling behavior in Java code
Yang Jie created SPARK-37842: Summary: Use `multi-catch` to simplify duplicate exception handling behavior in Java code Key: SPARK-37842 URL: https://issues.apache.org/jira/browse/SPARK-37842 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.3.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
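The refactor proposed in SPARK-37842 is the standard Java 7 multi-catch construct: two catch blocks with identical handler bodies collapse into one clause. A minimal self-contained illustration (the class and exception choices here are hypothetical, not taken from Spark's code):

```java
// Before: catch (IOException e) { handle(e); } catch (IllegalStateException e) { handle(e); }
// After: a single multi-catch clause, available since Java 7.
import java.io.IOException;

public class MultiCatchDemo {
    static String run(boolean io) {
        try {
            if (io) throw new IOException("io failure");
            throw new IllegalStateException("state failure");
        } catch (IOException | IllegalStateException e) {
            // One handler covers both types; `e` is implicitly final and its
            // static type is the closest common supertype of the alternatives.
            return "handled: " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(true));  // handled: io failure
        System.out.println(run(false)); // handled: state failure
    }
}
```

Note the alternatives in a multi-catch must not be in a subtype relationship with each other, so this rewrite only applies where the duplicated handlers catch unrelated exception types.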
[jira] [Assigned] (SPARK-37841) BasicWriteTaskStatsTracker should not try get status for a skipped file
[ https://issues.apache.org/jira/browse/SPARK-37841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37841: Assignee: Apache Spark > BasicWriteTaskStatsTracker should not try get status for a skipped file > --- > > Key: SPARK-37841 > URL: https://issues.apache.org/jira/browse/SPARK-37841 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Major > > https://github.com/apache/spark/pull/35117#issuecomment-1007171965 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37841) BasicWriteTaskStatsTracker should not try get status for a skipped file
[ https://issues.apache.org/jira/browse/SPARK-37841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37841: Assignee: (was: Apache Spark) > BasicWriteTaskStatsTracker should not try get status for a skipped file > --- > > Key: SPARK-37841 > URL: https://issues.apache.org/jira/browse/SPARK-37841 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kent Yao >Priority: Major > > https://github.com/apache/spark/pull/35117#issuecomment-1007171965 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37841) BasicWriteTaskStatsTracker should not try get status for a skipped file
[ https://issues.apache.org/jira/browse/SPARK-37841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470485#comment-17470485 ] Apache Spark commented on SPARK-37841: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/35132 > BasicWriteTaskStatsTracker should not try get status for a skipped file > --- > > Key: SPARK-37841 > URL: https://issues.apache.org/jira/browse/SPARK-37841 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.0, 3.3.0 >Reporter: Kent Yao >Priority: Major > > https://github.com/apache/spark/pull/35117#issuecomment-1007171965 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37841) BasicWriteTaskStatsTracker should not try get status for a skipped file
Kent Yao created SPARK-37841: Summary: BasicWriteTaskStatsTracker should not try get status for a skipped file Key: SPARK-37841 URL: https://issues.apache.org/jira/browse/SPARK-37841 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.2.0, 3.3.0 Reporter: Kent Yao https://github.com/apache/spark/pull/35117#issuecomment-1007171965 -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-37833) Add `precondition` jobs for skip the main GitHub Action jobs
[ https://issues.apache.org/jira/browse/SPARK-37833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-37833. --- Fix Version/s: 3.3.0 Resolution: Fixed Issue resolved by pull request 35121 [https://github.com/apache/spark/pull/35121] > Add `precondition` jobs for skip the main GitHub Action jobs > > > Key: SPARK-37833 > URL: https://issues.apache.org/jira/browse/SPARK-37833 > Project: Spark > Issue Type: Improvement > Components: Project Infra, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37833) Add `precondition` job for skip the main GitHub Action jobs
[ https://issues.apache.org/jira/browse/SPARK-37833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-37833: -- Summary: Add `precondition` job for skip the main GitHub Action jobs (was: Add `precondition` jobs for skip the main GitHub Action jobs) > Add `precondition` job for skip the main GitHub Action jobs > --- > > Key: SPARK-37833 > URL: https://issues.apache.org/jira/browse/SPARK-37833 > Project: Spark > Issue Type: Improvement > Components: Project Infra, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.3.0 > > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37833) Add `precondition` jobs for skip the main GitHub Action jobs
[ https://issues.apache.org/jira/browse/SPARK-37833?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-37833: - Assignee: Dongjoon Hyun > Add `precondition` jobs for skip the main GitHub Action jobs > > > Key: SPARK-37833 > URL: https://issues.apache.org/jira/browse/SPARK-37833 > Project: Spark > Issue Type: Improvement > Components: Project Infra, Tests >Affects Versions: 3.3.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37827) Put the some built-in table properties into V1Table.propertie to adapt to V2 command
[ https://issues.apache.org/jira/browse/SPARK-37827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37827: Assignee: (was: Apache Spark) > Put the some built-in table properties into V1Table.propertie to adapt to V2 > command > > > Key: SPARK-37827 > URL: https://issues.apache.org/jira/browse/SPARK-37827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: PengLei >Priority: Major > > Currently, we have no built-in table properties in V1Table.properties, so we > cannot get the correct result when some V2 commands run against a V1Table. > E.g. for `SHOW CREATE TABLE` (V2), we get the provider, location, comment, and options > from table.properties, but right now there is nothing in table.properties. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37827) Put the some built-in table properties into V1Table.propertie to adapt to V2 command
[ https://issues.apache.org/jira/browse/SPARK-37827?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37827: Assignee: Apache Spark > Put the some built-in table properties into V1Table.propertie to adapt to V2 > command > > > Key: SPARK-37827 > URL: https://issues.apache.org/jira/browse/SPARK-37827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: PengLei >Assignee: Apache Spark >Priority: Major > > Currently, we have no built-in table properties in V1Table.properties, so we > cannot get the correct result when some V2 commands run against a V1Table. > E.g. for `SHOW CREATE TABLE` (V2), we get the provider, location, comment, and options > from table.properties, but right now there is nothing in table.properties. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37827) Put the some built-in table properties into V1Table.propertie to adapt to V2 command
[ https://issues.apache.org/jira/browse/SPARK-37827?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470461#comment-17470461 ] Apache Spark commented on SPARK-37827: -- User 'Peng-Lei' has created a pull request for this issue: https://github.com/apache/spark/pull/35131 > Put the some built-in table properties into V1Table.propertie to adapt to V2 > command > > > Key: SPARK-37827 > URL: https://issues.apache.org/jira/browse/SPARK-37827 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.3.0 >Reporter: PengLei >Priority: Major > > Currently, we have no built-in table properties in V1Table.properties, so we > cannot get the correct result when some V2 commands run against a V1Table. > E.g. for `SHOW CREATE TABLE` (V2), we get the provider, location, comment, and options > from table.properties, but right now there is nothing in table.properties. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37839) DS V2 supports partial aggregate push-down AVG
[ https://issues.apache.org/jira/browse/SPARK-37839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37839: Assignee: (was: Apache Spark) > DS V2 supports partial aggregate push-down AVG > -- > > Key: SPARK-37839 > URL: https://issues.apache.org/jira/browse/SPARK-37839 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > Currently, DS V2 supports complete aggregate push-down AVG. But supporting > partial aggregate push-down for AVG would be very useful. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-37839) DS V2 supports partial aggregate push-down AVG
[ https://issues.apache.org/jira/browse/SPARK-37839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-37839: Assignee: Apache Spark > DS V2 supports partial aggregate push-down AVG > -- > > Key: SPARK-37839 > URL: https://issues.apache.org/jira/browse/SPARK-37839 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Assignee: Apache Spark >Priority: Major > > Currently, DS V2 supports complete aggregate push-down AVG. But supporting > partial aggregate push-down for AVG would be very useful. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37839) DS V2 supports partial aggregate push-down AVG
[ https://issues.apache.org/jira/browse/SPARK-37839?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470450#comment-17470450 ] Apache Spark commented on SPARK-37839: -- User 'beliefer' has created a pull request for this issue: https://github.com/apache/spark/pull/35130 > DS V2 supports partial aggregate push-down AVG > -- > > Key: SPARK-37839 > URL: https://issues.apache.org/jira/browse/SPARK-37839 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.3.0 >Reporter: jiaan.geng >Priority: Major > > Currently, DS V2 supports complete aggregate push-down AVG. But supporting > partial aggregate push-down for AVG would be very useful. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-37840) Dynamically update the loaded Spark UDF JAR
[ https://issues.apache.org/jira/browse/SPARK-37840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17470446#comment-17470446 ] melin commented on SPARK-37840: --- [~cloud_fan] > Dynamically update the loaded Spark UDF JAR > --- > > Key: SPARK-37840 > URL: https://issues.apache.org/jira/browse/SPARK-37840 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > In the production environment, Spark ThriftServer needs to be restarted if > UDF JAR files are updated after they have been loaded. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-37840) Dynamically update the loaded Hive UDF JAR
[ https://issues.apache.org/jira/browse/SPARK-37840?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] melin updated SPARK-37840: -- Summary: Dynamically update the loaded Hive UDF JAR (was: Dynamically update the loaded Spark UDF JAR) > Dynamically update the loaded Hive UDF JAR > -- > > Key: SPARK-37840 > URL: https://issues.apache.org/jira/browse/SPARK-37840 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.3.0 >Reporter: melin >Priority: Major > > In the production environment, Spark ThriftServer needs to be restarted if > UDF JAR files are updated after they have been loaded. -- This message was sent by Atlassian Jira (v8.20.1#820001) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-37840) Dynamically update the loaded Spark UDF JAR
melin created SPARK-37840:
--------------------------

             Summary: Dynamically update the loaded Spark UDF JAR
                 Key: SPARK-37840
                 URL: https://issues.apache.org/jira/browse/SPARK-37840
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: melin


In a production environment, Spark ThriftServer has to be restarted whenever JAR files are updated after the UDFs have been loaded.
[jira] [Updated] (SPARK-37839) DS V2 supports partial aggregate push-down AVG
[ https://issues.apache.org/jira/browse/SPARK-37839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiaan.geng updated SPARK-37839:
-------------------------------
    Description: Currently, DS V2 supports complete aggregate push-down AVG. But, supports partial aggregate push-down for AVG is very useful.  (was: Currently, DS V2 supports complete aggregate push-down for AVG. But, supports partial aggregate push-down for AVG is very useful.)

> DS V2 supports partial aggregate push-down AVG
> ----------------------------------------------
>
>                 Key: SPARK-37839
>                 URL: https://issues.apache.org/jira/browse/SPARK-37839
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: jiaan.geng
>            Priority: Major
>
> Currently, DS V2 supports complete aggregate push-down of AVG, but
> supporting partial aggregate push-down for AVG would also be very useful.
[jira] [Updated] (SPARK-37839) DS V2 supports partial aggregate push-down AVG
[ https://issues.apache.org/jira/browse/SPARK-37839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

jiaan.geng updated SPARK-37839:
-------------------------------
    Summary: DS V2 supports partial aggregate push-down AVG  (was: DS V2 supports partial aggregate push-down for AVG)

> DS V2 supports partial aggregate push-down AVG
> ----------------------------------------------
>
>                 Key: SPARK-37839
>                 URL: https://issues.apache.org/jira/browse/SPARK-37839
>             Project: Spark
>          Issue Type: New Feature
>          Components: SQL
>    Affects Versions: 3.3.0
>            Reporter: jiaan.geng
>            Priority: Major
>
> Currently, DS V2 supports complete aggregate push-down for AVG, but
> supporting partial aggregate push-down for AVG would also be very useful.
[jira] [Created] (SPARK-37839) DS V2 supports partial aggregate push-down for AVG
jiaan.geng created SPARK-37839:
-------------------------------

             Summary: DS V2 supports partial aggregate push-down for AVG
                 Key: SPARK-37839
                 URL: https://issues.apache.org/jira/browse/SPARK-37839
             Project: Spark
          Issue Type: New Feature
          Components: SQL
    Affects Versions: 3.3.0
            Reporter: jiaan.geng


Currently, DS V2 supports complete aggregate push-down for AVG, but supporting partial aggregate push-down for AVG would also be very useful.
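The motivation behind partial (rather than complete) push-down of AVG can be sketched outside Spark: per-partition averages cannot be merged correctly when partitions have different sizes, so the data source instead returns partial (SUM, COUNT) pairs that the engine combines into the exact average. A minimal Python sketch of that merge semantics (the function names and partition data are illustrative, not Spark DS V2 APIs):

```python
# Sketch: why AVG is pushed down as partial SUM/COUNT pairs.
# Averaging per-partition averages is wrong for uneven partitions;
# merging (sum, count) pairs yields the exact global average.

def partial_avg(partition):
    """What a source returns for a pushed-down partial AVG: (sum, count)."""
    return (sum(partition), len(partition))

def merge_partials(partials):
    """Final aggregation on the engine side: combine (sum, count) pairs."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count

partitions = [[1.0, 2.0, 3.0], [10.0]]           # uneven partition sizes
partials = [partial_avg(p) for p in partitions]  # [(6.0, 3), (10.0, 1)]

exact = merge_partials(partials)                 # (6 + 10) / (3 + 1) = 4.0
naive = sum(s / c for s, c in partials) / len(partials)  # (2 + 10) / 2 = 6.0
print(exact, naive)
```

The naive merge of finished averages gives 6.0 while the true average is 4.0, which is why a source that can only compute final AVG per partition cannot take the push-down at all, whereas partial push-down keeps the result exact.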