[jira] [Assigned] (SPARK-40239) Remove duplicated 'fraction' validation in RDD.sample

2022-08-27 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-40239:
-

Assignee: Ruifeng Zheng

> Remove duplicated 'fraction' validation in RDD.sample
> -
>
> Key: SPARK-40239
> URL: https://issues.apache.org/jira/browse/SPARK-40239
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Trivial
>
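For context, the duplication being removed can be illustrated with a small, hypothetical Python sketch (names and messages are illustrative, not Spark's actual code): keep a single `fraction` check at the public entry point and drop the repeated check in the internal sampler.

```python
import random

def sample(population, fraction, seed=None):
    # Single validation point: `fraction` is checked here once, so the
    # internal sampler below does not need to repeat the same check.
    if fraction < 0.0:
        raise ValueError(f"Fraction must be nonnegative, but got {fraction}")
    return _bernoulli_sample(population, fraction)

def _bernoulli_sample(population, fraction):
    # No duplicated `fraction` validation here anymore.
    return [x for x in population if random.random() < fraction]
```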




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40239) Remove duplicated 'fraction' validation in RDD.sample

2022-08-27 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-40239.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37682
[https://github.com/apache/spark/pull/37682]

> Remove duplicated 'fraction' validation in RDD.sample
> -
>
> Key: SPARK-40239
> URL: https://issues.apache.org/jira/browse/SPARK-40239
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Trivial
> Fix For: 3.4.0
>
>







[jira] [Updated] (SPARK-40149) Star expansion after outer join asymmetrically includes joining key

2022-08-27 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-40149:

Priority: Blocker  (was: Major)

> Star expansion after outer join asymmetrically includes joining key
> ---
>
> Key: SPARK-40149
> URL: https://issues.apache.org/jira/browse/SPARK-40149
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.2.1, 3.3.0, 3.2.2
>Reporter: Otakar Truněček
>Priority: Blocker
>
> When star expansion is used on the left side of a join, the result includes the 
> joining key, while on the right side of the join it doesn't. I would expect the 
> behaviour to be symmetric (either include the key on both sides or on neither). 
> Example:
> {code:python}
> from pyspark.sql import SparkSession
> import pyspark.sql.functions as f
> spark = SparkSession.builder.getOrCreate()
> df_left = spark.range(5).withColumn('val', f.lit('left'))
> df_right = spark.range(3, 7).withColumn('val', f.lit('right'))
> df_merged = (
> df_left
> .alias('left')
> .join(df_right.alias('right'), on='id', how='full_outer')
> .withColumn('left_all', f.struct('left.*'))
> .withColumn('right_all', f.struct('right.*'))
> )
> df_merged.show()
> {code}
> result:
> {code:java}
> +---+----+-----+------------+---------+
> | id| val|  val|    left_all|right_all|
> +---+----+-----+------------+---------+
> |  0|left| null|   {0, left}|   {null}|
> |  1|left| null|   {1, left}|   {null}|
> |  2|left| null|   {2, left}|   {null}|
> |  3|left|right|   {3, left}|  {right}|
> |  4|left|right|   {4, left}|  {right}|
> |  5|null|right|{null, null}|  {right}|
> |  6|null|right|{null, null}|  {right}|
> +---+----+-----+------------+---------+
> {code}
> This behaviour started with release 3.2.0. Previously the key was not 
> included on either side. 
> Result from Spark 3.1.3
> {code:java}
> +---+----+-----+--------+---------+
> | id| val|  val|left_all|right_all|
> +---+----+-----+--------+---------+
> |  0|left| null|  {left}|   {null}|
> |  6|null|right|  {null}|  {right}|
> |  5|null|right|  {null}|  {right}|
> |  1|left| null|  {left}|   {null}|
> |  3|left|right|  {left}|  {right}|
> |  2|left| null|  {left}|   {null}|
> |  4|left|right|  {left}|  {right}|
> +---+----+-----+--------+---------+ {code}
> I have a gut feeling this is related to these issues:
> https://issues.apache.org/jira/browse/SPARK-39376
> https://issues.apache.org/jira/browse/SPARK-34527
> https://issues.apache.org/jira/browse/SPARK-38603
>  






[jira] [Updated] (SPARK-40149) Star expansion after outer join asymmetrically includes joining key

2022-08-27 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-40149:

Target Version/s: 3.4.0




[jira] [Commented] (SPARK-40156) url_decode() exposes a Java error

2022-08-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40156?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17586140#comment-17586140
 ] 

Apache Spark commented on SPARK-40156:
--

User 'ming95' has created a pull request for this issue:
https://github.com/apache/spark/pull/37695

> url_decode() exposes a Java error
> -
>
> Key: SPARK-40156
> URL: https://issues.apache.org/jira/browse/SPARK-40156
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> Given a badly encoded string, Spark returns a raw Java error.
> It should instead return an ERROR_CLASS.
> spark-sql> SELECT url_decode('http%3A%2F%2spark.apache.org');
> 22/08/20 17:17:20 ERROR SparkSQLDriver: Failed in [SELECT 
> url_decode('http%3A%2F%2spark.apache.org')]
> java.lang.IllegalArgumentException: URLDecoder: Illegal hex characters in 
> escape (%) pattern - Error at index 1 in: "2s"
>  at java.base/java.net.URLDecoder.decode(URLDecoder.java:232)
>  at java.base/java.net.URLDecoder.decode(URLDecoder.java:142)
>  at 
> org.apache.spark.sql.catalyst.expressions.UrlCodec$.decode(urlExpressions.scala:113)
>  at 
> org.apache.spark.sql.catalyst.expressions.UrlCodec.decode(urlExpressions.scala)
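The failing input violates RFC 3986 percent-encoding, which allows `%` only when followed by exactly two hex digits (`%2s` is illegal). A hypothetical helper, not part of Spark, shows how a caller could pre-validate a string before decoding it:

```python
import re

# '%' must be followed by exactly two hex digits; any other character may
# appear freely. This is the rule 'http%3A%2F%2spark.apache.org' breaks.
_PERCENT_ENCODED = re.compile(r"(?:[^%]|%[0-9A-Fa-f]{2})*\Z")

def is_valid_percent_encoding(s: str) -> bool:
    return _PERCENT_ENCODED.match(s) is not None
```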






[jira] [Assigned] (SPARK-40240) PySpark rdd.takeSample should validate `num > maxSampleSize` at first

2022-08-27 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-40240:
-

Assignee: Ruifeng Zheng

> PySpark rdd.takeSample should validate `num > maxSampleSize` at first
> -
>
> Key: SPARK-40240
> URL: https://issues.apache.org/jira/browse/SPARK-40240
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>
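The ordering the ticket proposes can be sketched in plain Python (the cap and function below are illustrative; PySpark derives its real bound from `sys.maxsize`): validate `num` before doing any sampling work.

```python
import random
import sys

# Illustrative cap, standing in for PySpark's maxSampleSize.
MAX_SAMPLE_SIZE = sys.maxsize - 16

def take_sample(population, num):
    # Fail fast: reject an impossible `num` before any sampling work
    # (or, in Spark's case, before submitting any job).
    if num < 0:
        raise ValueError("Sample size cannot be negative.")
    if num > MAX_SAMPLE_SIZE:
        raise ValueError(f"Sample size cannot be greater than {MAX_SAMPLE_SIZE}.")
    return random.sample(population, min(num, len(population)))
```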







[jira] [Resolved] (SPARK-40240) PySpark rdd.takeSample should validate `num > maxSampleSize` at first

2022-08-27 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-40240.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37683
[https://github.com/apache/spark/pull/37683]

> PySpark rdd.takeSample should validate `num > maxSampleSize` at first
> -
>
> Key: SPARK-40240
> URL: https://issues.apache.org/jira/browse/SPARK-40240
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
> Fix For: 3.4.0
>
>







[jira] [Updated] (SPARK-40124) Update TPCDS v1.4 q32 for Plan Stability tests

2022-08-27 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-40124:
--
Fix Version/s: 3.2.3

> Update TPCDS v1.4 q32 for Plan Stability tests
> --
>
> Key: SPARK-40124
> URL: https://issues.apache.org/jira/browse/SPARK-40124
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Kapil Singh
>Assignee: Kapil Singh
>Priority: Major
> Fix For: 3.4.0, 3.3.1, 3.2.3
>
>







[jira] [Resolved] (SPARK-40234) Clean only MDC items set by Spark

2022-08-27 Thread L. C. Hsieh (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

L. C. Hsieh resolved SPARK-40234.
-
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 37680
[https://github.com/apache/spark/pull/37680]

> Clean only MDC items set by Spark
> -
>
> Key: SPARK-40234
> URL: https://issues.apache.org/jira/browse/SPARK-40234
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: L. C. Hsieh
>Assignee: L. C. Hsieh
>Priority: Major
> Fix For: 3.4.0
>
>
> Since SPARK-8981, the Spark executor has supported MDC. Before setting its MDC 
> items, the executor clears all MDC items, including items set not by Spark but 
> by users elsewhere. As a result, those custom MDC items no longer show up in 
> the executor log.
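The idea behind the fix can be sketched in Python, with a plain dict standing in for the logging framework's MDC map (all names here are illustrative): track the keys Spark itself sets and remove only those on cleanup, leaving user-set entries alone.

```python
# Stand-in for the MDC map; "user.requestId" was set by user code elsewhere.
mdc = {"user.requestId": "abc123"}
spark_set_keys = set()

def set_spark_mdc(key, value):
    spark_set_keys.add(key)
    mdc[key] = value

def clear_spark_mdc():
    # Previously the equivalent of mdc.clear() ran here, wiping user
    # entries too; now only Spark-set keys are removed.
    for key in spark_set_keys:
        mdc.pop(key, None)
    spark_set_keys.clear()
```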






[jira] [Assigned] (SPARK-40246) Logging isn't configurable via log4j2 with hadoop-provided profile

2022-08-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40246:


Assignee: (was: Apache Spark)

> Logging isn't configurable via log4j2 with hadoop-provided profile
> --
>
> Key: SPARK-40246
> URL: https://issues.apache.org/jira/browse/SPARK-40246
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.3.0
>Reporter: Adam Binford
>Priority: Major
>
> When building Spark with -Phadoop-provided (or using the 3.3.0 build without 
> Hadoop), there is no slf4j binding provided for log4j2, so the default 
> log4j2 properties are ignored and logging isn't configurable via 
> SparkContext.setLogLevel.
> Reproduction on a fresh Ubuntu container:
>  
> {noformat}
> apt-get update
> apt-get install -y wget
> wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
> wget 
> https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-without-hadoop.tgz
> tar -xvf hadoop-3.3.4.tar.gz -C /opt
> tar -xvf spark-3.3.0-bin-without-hadoop.tgz -C /opt
> export HADOOP_HOME=/opt/hadoop-3.3.4/
> export SPARK_HOME=/opt/spark-3.3.0-bin-without-hadoop/
> apt install -y openjdk-11-jre-headless python3
> export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
> export SPARK_DIST_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
> $SPARK_HOME/bin/pyspark
> {noformat}
> The default log level starts at INFO and you can't change it with 
> sc.setLogLevel






[jira] [Commented] (SPARK-40246) Logging isn't configurable via log4j2 with hadoop-provided profile

2022-08-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40246?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17586104#comment-17586104
 ] 

Apache Spark commented on SPARK-40246:
--

User 'Kimahriman' has created a pull request for this issue:
https://github.com/apache/spark/pull/37694




[jira] [Assigned] (SPARK-40246) Logging isn't configurable via log4j2 with hadoop-provided profile

2022-08-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40246:


Assignee: Apache Spark




[jira] [Updated] (SPARK-40246) Logging isn't configurable via log4j2 with hadoop-provided profile

2022-08-27 Thread Adam Binford (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40246?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Adam Binford updated SPARK-40246:
-
Component/s: Build
 (was: Spark Core)




[jira] [Created] (SPARK-40246) Logging isn't configurable via log4j2 with hadoop-provided profile

2022-08-27 Thread Adam Binford (Jira)
Adam Binford created SPARK-40246:


 Summary: Logging isn't configurable via log4j2 with 
hadoop-provided profile
 Key: SPARK-40246
 URL: https://issues.apache.org/jira/browse/SPARK-40246
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.0
Reporter: Adam Binford


When building Spark with -Phadoop-provided (or using the 3.3.0 build without 
Hadoop), there is no slf4j binding provided for log4j2, so the default 
log4j2 properties are ignored and logging isn't configurable via 
SparkContext.setLogLevel.

Reproduction on a fresh Ubuntu container:

 
{noformat}
apt-get update
apt-get install -y wget
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.4/hadoop-3.3.4.tar.gz
wget 
https://dlcdn.apache.org/spark/spark-3.3.0/spark-3.3.0-bin-without-hadoop.tgz
tar -xvf hadoop-3.3.4.tar.gz -C /opt
tar -xvf spark-3.3.0-bin-without-hadoop.tgz -C /opt
export HADOOP_HOME=/opt/hadoop-3.3.4/
export SPARK_HOME=/opt/spark-3.3.0-bin-without-hadoop/
apt install -y openjdk-11-jre-headless python3
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64/
export SPARK_DIST_CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath)
$SPARK_HOME/bin/pyspark
{noformat}
The default log level starts at INFO and you can't change it with sc.setLogLevel






[jira] [Assigned] (SPARK-40245) Fix FileScan equality check when partition or data filter columns are not read

2022-08-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40245:


Assignee: Apache Spark

> Fix FileScan equality check when partition or data filter columns are not read
> --
>
> Key: SPARK-40245
> URL: https://issues.apache.org/jira/browse/SPARK-40245
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Peter Toth
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-40245) Fix FileScan equality check when partition or data filter columns are not read

2022-08-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40245:


Assignee: (was: Apache Spark)




[jira] [Commented] (SPARK-40245) Fix FileScan equality check when partition or data filter columns are not read

2022-08-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17586097#comment-17586097
 ] 

Apache Spark commented on SPARK-40245:
--

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/37693




[jira] [Updated] (SPARK-40245) Fix FileScan equality check when partition or data filter columns are not read

2022-08-27 Thread Peter Toth (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40245?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Peter Toth updated SPARK-40245:
---
Summary: Fix FileScan equality check when partition or data filter columns 
are not read  (was: Fix FileScan canonicalization when partition or data filter 
columns are not read)




[jira] [Created] (SPARK-40245) Fix FileScan canonicalization when partition or data filter columns are not read

2022-08-27 Thread Peter Toth (Jira)
Peter Toth created SPARK-40245:
--

 Summary: Fix FileScan canonicalization when partition or data 
filter columns are not read
 Key: SPARK-40245
 URL: https://issues.apache.org/jira/browse/SPARK-40245
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: Peter Toth









[jira] [Resolved] (SPARK-40241) Correct the link of GenericUDTF

2022-08-27 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang resolved SPARK-40241.
-
Fix Version/s: 3.3.1
   3.1.4
   3.2.3
   3.4.0
   Resolution: Fixed

Issue resolved by pull request 37685
[https://github.com/apache/spark/pull/37685]

> Correct the link of GenericUDTF
> ---
>
> Key: SPARK-40241
> URL: https://issues.apache.org/jira/browse/SPARK-40241
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Trivial
> Fix For: 3.3.1, 3.1.4, 3.2.3, 3.4.0
>
>







[jira] [Assigned] (SPARK-40241) Correct the link of GenericUDTF

2022-08-27 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang reassigned SPARK-40241:
---

Assignee: Ruifeng Zheng




[jira] [Commented] (SPARK-40244) Correct the property name of data source option for csv

2022-08-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585717#comment-17585717
 ] 

Apache Spark commented on SPARK-40244:
--

User 'mukever' has created a pull request for this issue:
https://github.com/apache/spark/pull/37692

> Correct the property name of data source option for csv
> ---
>
> Key: SPARK-40244
> URL: https://issues.apache.org/jira/browse/SPARK-40244
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: 陈志祥
>Priority: Trivial
>










[jira] [Commented] (SPARK-40244) Correct the property name of data source option for csv

2022-08-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585712#comment-17585712
 ] 

Apache Spark commented on SPARK-40244:
--

User 'mukever' has created a pull request for this issue:
https://github.com/apache/spark/pull/37691




[jira] [Commented] (SPARK-40244) Correct the property name of data source option for csv

2022-08-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585711#comment-17585711
 ] 

Apache Spark commented on SPARK-40244:
--

User 'mukever' has created a pull request for this issue:
https://github.com/apache/spark/pull/37690







[jira] [Assigned] (SPARK-40244) Correct the property name of data source option for csv

2022-08-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40244:


Assignee: (was: Apache Spark)

> Correct the property name of data source option for csv
> ---
>
> Key: SPARK-40244
> URL: https://issues.apache.org/jira/browse/SPARK-40244
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: 陈志祥
>Priority: Trivial
>







[jira] [Assigned] (SPARK-40244) Correct the property name of data source option for csv

2022-08-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40244:


Assignee: Apache Spark

> Correct the property name of data source option for csv
> ---
>
> Key: SPARK-40244
> URL: https://issues.apache.org/jira/browse/SPARK-40244
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: 陈志祥
>Assignee: Apache Spark
>Priority: Trivial
>







[jira] [Commented] (SPARK-40244) Correct the property name of data source option for csv

2022-08-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585706#comment-17585706
 ] 

Apache Spark commented on SPARK-40244:
--

User 'mukever' has created a pull request for this issue:
https://github.com/apache/spark/pull/37689

> Correct the property name of data source option for csv
> ---
>
> Key: SPARK-40244
> URL: https://issues.apache.org/jira/browse/SPARK-40244
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation
>Affects Versions: 3.3.0
>Reporter: 陈志祥
>Priority: Trivial
>







[jira] [Created] (SPARK-40244) Correct the property name of data source option for csv

2022-08-27 Thread Jira
陈志祥 created SPARK-40244:
---

 Summary: Correct the property name of data source option for csv
 Key: SPARK-40244
 URL: https://issues.apache.org/jira/browse/SPARK-40244
 Project: Spark
  Issue Type: Improvement
  Components: Documentation
Affects Versions: 3.3.0
Reporter: 陈志祥









[jira] [Commented] (SPARK-40243) Enhance Hive UDF support documentation

2022-08-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585704#comment-17585704
 ] 

Apache Spark commented on SPARK-40243:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/37688

> Enhance Hive UDF support documentation
> --
>
> Key: SPARK-40243
> URL: https://issues.apache.org/jira/browse/SPARK-40243
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-40243) Enhance Hive UDF support documentation

2022-08-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40243:


Assignee: (was: Apache Spark)

> Enhance Hive UDF support documentation
> --
>
> Key: SPARK-40243
> URL: https://issues.apache.org/jira/browse/SPARK-40243
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-40243) Enhance Hive UDF support documentation

2022-08-27 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-40243:


Assignee: Apache Spark

> Enhance Hive UDF support documentation
> --
>
> Key: SPARK-40243
> URL: https://issues.apache.org/jira/browse/SPARK-40243
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>










[jira] [Commented] (SPARK-40039) Introducing a streaming checkpoint file manager based on Hadoop's Abortable interface

2022-08-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585701#comment-17585701
 ] 

Apache Spark commented on SPARK-40039:
--

User 'attilapiros' has created a pull request for this issue:
https://github.com/apache/spark/pull/37687

> Introducing a streaming checkpoint file manager based on Hadoop's Abortable 
> interface
> -
>
> Key: SPARK-40039
> URL: https://issues.apache.org/jira/browse/SPARK-40039
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Attila Zsolt Piros
>Priority: Major
>
> Currently on S3 the checkpoint file manager (called 
> FileContextBasedCheckpointFileManager) is based on rename: when a file is 
> opened for an atomic stream, a temporary file is used instead, and when the 
> stream is committed the temporary file is renamed to the final name.
> But on S3 a rename is actually a file copy, so this has serious performance 
> implications.
> On Hadoop 3 there is a new interface called *Abortable*, and *S3AFileSystem* 
> has this capability, implemented on top of S3's multipart upload: when the 
> file is committed a POST is sent 
> ([https://docs.aws.amazon.com/AmazonS3/latest/API/API_CompleteMultipartUpload.html])
> and when it is aborted a DELETE is sent 
> ([https://docs.aws.amazon.com/AmazonS3/latest/API/API_AbortMultipartUpload.html])
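The commit/abort contract described above can be sketched in plain Python. This is a conceptual illustration only: the real change is Scala code against Hadoop's Abortable and S3AFileSystem APIs, and the class names below are invented for the sketch.

```python
import os
import tempfile

class RenameBasedWriter:
    """Commit-by-rename, the strategy the rename-based manager uses:
    data goes to a temporary file, and commit() renames it into place.
    On HDFS that rename is a cheap metadata operation; on S3 it turns
    into a full object copy, which is the performance problem above."""
    def __init__(self, path):
        self.path = path
        fd, self.tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
        self.f = os.fdopen(fd, "w")
    def write(self, data):
        self.f.write(data)
    def commit(self):
        self.f.close()
        os.replace(self.tmp, self.path)  # the rename (= copy on S3)
    def abort(self):
        self.f.close()
        os.remove(self.tmp)

class AbortableWriter:
    """Abortable-style commit: parts are only buffered until commit()
    materializes them (like completing a multipart upload with one
    POST); abort() simply drops them (like AbortMultipartUpload), so
    nothing ever has to be copied or cleaned up server-side."""
    def __init__(self, path):
        self.path = path
        self.parts = []
    def write(self, data):
        self.parts.append(data)
    def commit(self):
        # analogous to CompleteMultipartUpload: parts become one object
        with open(self.path, "w") as f:
            f.write("".join(self.parts))
    def abort(self):
        # analogous to AbortMultipartUpload: discard the pending parts
        self.parts.clear()
```

Either writer leaves no partial file visible if the stream is aborted; the difference is only in what commit costs on an object store.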






[jira] [Created] (SPARK-40243) Enhance Hive UDF support documentation

2022-08-27 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-40243:
---

 Summary: Enhance Hive UDF support documentation
 Key: SPARK-40243
 URL: https://issues.apache.org/jira/browse/SPARK-40243
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.4.0
Reporter: Yuming Wang









[jira] [Updated] (SPARK-40242) Only return all physical plans after submitting pyspark script with several spark sql blocks inside

2022-08-27 Thread Liang Fenjie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Fenjie updated SPARK-40242:
-
Description: 
Background:

    In industrial development environments, we are used to writing several spark-sql 
blocks in one pyspark script. Without actually submitting the application to the 
cluster to run, it is hard to obtain all of its physical plans through any 
option of the "spark-submit" command.

 

Wish:

    I wish to add a "--genPlan" parameter to the "spark-submit" command, which 
would return all the physical plans of an application instead of actually 
submitting it to run. Any other approach that settles the matter is also welcome.

  was:
Background:

    In industrial development environments, we are used to writing several spark-sql 
blocks in one pyspark script. Without actually submitting the application to the 
cluster to run, it is hard to obtain all of its physical plans through any 
option of the "spark-submit" command.

 

Wish:

    I wish to add a "--genPlan" parameter to the "spark-submit" command, which 
would return all the physical plans of an application instead of actually 
submitting it to run. Any other approach that addresses the issue is also welcome.


> Only return all physical plans after submitting pyspark script with several 
> spark sql blocks inside
> ---
>
> Key: SPARK-40242
> URL: https://issues.apache.org/jira/browse/SPARK-40242
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Submit, SQL
>Affects Versions: 2.1.3
>Reporter: Liang Fenjie
>Priority: Major
>
> Background:
>     In industrial development environments, we are used to writing several 
> spark-sql blocks in one pyspark script. Without actually submitting the 
> application to the cluster to run, it is hard to obtain all of its physical 
> plans through any option of the "spark-submit" command.
>  
> Wish:
>     I wish to add a "--genPlan" parameter to the "spark-submit" command, which 
> would return all the physical plans of an application instead of actually 
> submitting it to run. Any other approach that settles the matter is also welcome.






[jira] [Updated] (SPARK-40242) Only return all physical plans after submitting pyspark script with several spark sql blocks inside

2022-08-27 Thread Liang Fenjie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang Fenjie updated SPARK-40242:
-
Description: 
Background:

    In industrial development environments, we are used to writing several spark-sql 
blocks in one pyspark script. Without actually submitting the application to the 
cluster to run, it is hard to obtain all of its physical plans through any 
option of the "spark-submit" command.

 

Wish:

    I wish to add a "--genPlan" parameter to the "spark-submit" command, which 
would return all the physical plans of an application instead of actually 
submitting it to run. Any other approach that addresses the issue is also welcome.

  was:
Background:

    In industrial development environments, we are used to writing several spark-sql 
blocks in one pyspark script. Without actually submitting the application to the 
cluster to run, it is hard to obtain all of its physical plans through any 
option of the "spark-submit" command.

 

Wish:

I wish to add a "--genPlan" parameter to the "spark-submit" command, which would 
return all the physical plans of an application instead of actually submitting 
it to run.


> Only return all physical plans after submitting pyspark script with several 
> spark sql blocks inside
> ---
>
> Key: SPARK-40242
> URL: https://issues.apache.org/jira/browse/SPARK-40242
> Project: Spark
>  Issue Type: Wish
>  Components: Spark Submit, SQL
>Affects Versions: 2.1.3
>Reporter: Liang Fenjie
>Priority: Major
>
> Background:
>     In industrial development environments, we are used to writing several 
> spark-sql blocks in one pyspark script. Without actually submitting the 
> application to the cluster to run, it is hard to obtain all of its physical 
> plans through any option of the "spark-submit" command.
>  
> Wish:
>     I wish to add a "--genPlan" parameter to the "spark-submit" command, which 
> would return all the physical plans of an application instead of actually 
> submitting it to run. Any other approach that addresses the issue is also welcome.






[jira] [Commented] (SPARK-40142) Make pyspark.sql.functions examples self-contained

2022-08-27 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40142?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17585680#comment-17585680
 ] 

Apache Spark commented on SPARK-40142:
--

User 'khalidmammadov' has created a pull request for this issue:
https://github.com/apache/spark/pull/37686

> Make pyspark.sql.functions examples self-contained
> --
>
> Key: SPARK-40142
> URL: https://issues.apache.org/jira/browse/SPARK-40142
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>










[jira] [Created] (SPARK-40242) Only return all physical plans after submitting pyspark script with several spark sql blocks inside

2022-08-27 Thread Liang Fenjie (Jira)
Liang Fenjie created SPARK-40242:


 Summary: Only return all physical plans after submitting pyspark 
script with several spark sql blocks inside
 Key: SPARK-40242
 URL: https://issues.apache.org/jira/browse/SPARK-40242
 Project: Spark
  Issue Type: Wish
  Components: Spark Submit, SQL
Affects Versions: 2.1.3
Reporter: Liang Fenjie


Background:

    In industrial development environments, we are used to writing several spark-sql 
blocks in one pyspark script. Without actually submitting the application to the 
cluster to run, it is hard to obtain all of its physical plans through any 
option of the "spark-submit" command.

 

Wish:

I wish to add a "--genPlan" parameter to the "spark-submit" command, which would 
return all the physical plans of an application instead of actually submitting 
it to run.
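As an illustration of one possible building block for such a tool: the SQL text passed to `spark.sql(...)` calls can be collected statically with Python's `ast` module and later prefixed with `EXPLAIN` instead of being executed. Note that the `--genPlan` flag is only proposed in this ticket, and `extract_sql_blocks` below is an invented helper for the sketch, not a Spark API.

```python
import ast

def extract_sql_blocks(source):
    """Collect the string literals passed to spark.sql(...) calls in a
    PySpark script, in source order.  A plan-generating mode could then
    run "EXPLAIN " + block for each one against the catalog instead of
    actually running the job."""
    blocks = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call)
                and isinstance(node.func, ast.Attribute)
                and node.func.attr == "sql"
                and isinstance(node.func.value, ast.Name)
                and node.func.value.id == "spark"
                and node.args
                and isinstance(node.args[0], ast.Constant)
                and isinstance(node.args[0].value, str)):
            blocks.append(node.args[0].value)
    return blocks

script = '''
df = spark.sql("SELECT id FROM t1")
spark.sql("INSERT INTO t2 SELECT * FROM t1")
'''
print(extract_sql_blocks(script))
```

This only catches literal SQL strings; queries built dynamically at runtime would still require executing the script, which is part of why the feature is non-trivial.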


