[jira] [Assigned] (SPARK-42530) Remove Hadoop 2 from PySpark installation guide

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42530?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42530:


Assignee: Apache Spark

> Remove Hadoop 2 from PySpark installation guide
> ---
>
> Key: SPARK-42530
> URL: https://issues.apache.org/jira/browse/SPARK-42530
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation, PySpark
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-40822) Use stable derived-column-alias algorithm, suitable for CREATE VIEW

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-40822?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692308#comment-17692308
 ] 

Apache Spark commented on SPARK-40822:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/40126

> Use stable derived-column-alias algorithm, suitable for CREATE VIEW 
> 
>
> Key: SPARK-40822
> URL: https://issues.apache.org/jira/browse/SPARK-40822
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Priority: Major
>
> Spark has the ability to derive column aliases for expressions if no alias was 
> provided by the user.
> E.g.
> CREATE TABLE T(c1 INT, c2 INT);
> SELECT c1, `(c1 + 1)`, c3 FROM (SELECT c1, c1 + 1, c1 * c2 AS c3 FROM T);
> This is a valuable feature. However, the current implementation works by 
> pretty printing the expression from the logical plan.  This has multiple 
> downsides:
>  * The derived names can be unintuitive, for example the brackets in `(c1 + 
> 1)`, or outright ugly, such as:
> SELECT `substr(hello, 1, 2147483647)` FROM (SELECT substr('hello', 1)) AS T;
>  * We cannot guarantee stability across versions, since the logical plan of an 
> expression may change.
> The latter is a major reason why we cannot allow CREATE VIEW without a column 
> list except in "trivial" cases.
> CREATE VIEW v AS SELECT c1, c1 + 1, c1 * c2 AS c3 FROM T;
> Not allowed to create a permanent view `spark_catalog`.`default`.`v` without 
> explicitly assigning an alias for expression (c1 + 1).
> There are two ways we can go about fixing this:
>  # Stop deriving column aliases from the expression. Instead, generate unique 
> names such as `_col_1` based on their position in the select list. This is 
> ugly and takes away the "nice" headers on result sets.
>  # Move the derivation of the name upstream. That is, instead of pretty 
> printing the logical plan, we pretty print the lexer output, or a sanitized 
> version of the expression as typed.
> The statement as typed is stable by definition. The lexer is stable because it 
> has no reason to change. And if it ever did, we would have a better chance to 
> manage the change.
> In this feature we propose the following semantic:
>  # If the column alias can be trivially derived (some of these can stack), do 
> so:
>  ** a (qualified) column reference => the unqualified column identifier
> cat.sch.tab.col => col
>  ** A field reference => the fieldname
> struct.field1.field2 => field2
>  ** A cast(column AS type) => column
> cast(col1 AS INT) => col1
>  ** A map lookup with literal key => keyname
> map.key => key
> map['key'] => key
>  ** A parameterless function => the unqualified function name
> current_schema() => current_schema
>  # Take the lexer tokens of the expression, eliminate comments, and append 
> them.
> foo(tab1.c1 + /* this is a plus*/
> 1) => `foo(tab1.c1+1)`
>  
> Of course we want this change under a config.
> If the config is set we can allow CREATE VIEW to exploit this and use the 
> derived expressions.
> PS: The exact mechanics of formatting the name are very much debatable, 
> e.g. spaces between tokens, squeezing out comments, upper-casing, preserving 
> quotes or double quotes...
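The trivial derivation rules proposed above can be sketched in plain Python. Everything here (the `derive_alias` helper name, the regex patterns, the fallback formatting) is illustrative only, not Spark's actual implementation:

```python
import re

def derive_alias(expr: str) -> str:
    """Illustrative sketch of the proposed alias-derivation rules;
    not Spark's actual implementation."""
    e = expr.strip()
    # cast(column AS type) => column (rules can stack, so keep going)
    m = re.fullmatch(r"cast\((\w+(?:\.\w+)*)\s+AS\s+\w+\)", e, re.IGNORECASE)
    if m:
        e = m.group(1)
    # map lookup with a literal key: map['key'] => key
    m = re.fullmatch(r"\w+\['(\w+)'\]", e)
    if m:
        return m.group(1)
    # parameterless function: current_schema() => current_schema
    m = re.fullmatch(r"(\w+)\(\)", e)
    if m:
        return m.group(1)
    # qualified column or field reference: cat.sch.tab.col => col
    if re.fullmatch(r"\w+(?:\.\w+)+", e):
        return e.split(".")[-1]
    if re.fullmatch(r"\w+", e):
        return e
    # otherwise: the expression as typed, comments and whitespace removed
    e = re.sub(r"/\*.*?\*/", "", e, flags=re.DOTALL)
    return re.sub(r"\s+", "", e)
```

Because the fallback works on the lexer-level text as typed rather than on the optimized logical plan, the derived name stays stable across Spark versions.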






[jira] [Commented] (SPARK-42468) Implement agg by (String, String)*

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42468?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692307#comment-17692307
 ] 

Apache Spark commented on SPARK-42468:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40125

> Implement agg by (String, String)*
> --
>
> Key: SPARK-42468
> URL: https://issues.apache.org/jira/browse/SPARK-42468
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-37980) Extend METADATA column to support row indices for file based data sources

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-37980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692170#comment-17692170
 ] 

Apache Spark commented on SPARK-37980:
--

User 'olaky' has created a pull request for this issue:
https://github.com/apache/spark/pull/40124

> Extend METADATA column to support row indices for file based data sources
> -
>
> Key: SPARK-37980
> URL: https://issues.apache.org/jira/browse/SPARK-37980
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.3.0
>Reporter: Prakhar Jain
>Assignee: Ala Luszczak
>Priority: Major
> Fix For: 3.4.0
>
>
> Spark recently added hidden metadata column support for file-based 
> data sources as part of SPARK-37273.
> We should extend it to support ROW_INDEX/ROW_POSITION as well.
>  
> Meaning of ROW_POSITION:
> ROW_INDEX/ROW_POSITION is basically the index of a row within a file. E.g. the 
> 5th row in the file will have ROW_INDEX 5.
>  
> Use cases:
> Row indexes can be used in a variety of ways. A (fileName, rowIndex) tuple 
> uniquely identifies a row in a table. This information can be used to mark 
> rows, e.g. by an indexer.
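The row-identity idea can be sketched in plain Python (this is not Spark's implementation; the `with_row_index` helper and its input shape are ours for illustration):

```python
def with_row_index(files):
    """Attach a per-file row index so that the (file_name, row_index)
    tuple uniquely identifies a row across all files of a table.
    `files` maps a file name to its ordered list of rows."""
    rows_with_ids = []
    for file_name, rows in files.items():
        for row_index, row in enumerate(rows):
            rows_with_ids.append((file_name, row_index, row))
    return rows_with_ids
```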






[jira] [Commented] (SPARK-42272) Use available ephemeral port for Spark Connect server in testing

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692143#comment-17692143
 ] 

Apache Spark commented on SPARK-42272:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40123

> Use available ephemeral port for Spark Connect server in testing
> 
>
> Key: SPARK-42272
> URL: https://issues.apache.org/jira/browse/SPARK-42272
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Tests
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently Spark Connect tests cannot run in parallel and require setting the 
> parallelism to 1:
> {code}
> python/run-tests --module pyspark-connect --parallelism 1
> {code}
> The main reason is that the port being used is hardcoded to the default 
> 15002. We should instead search for an available port and use it.
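The standard way to find a free ephemeral port is to bind to port 0 and let the OS pick one. A minimal sketch (the helper name is ours, not Spark's test utility):

```python
import socket

def find_free_port() -> int:
    # Binding to port 0 asks the OS for an arbitrary free ephemeral port.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

There is an inherent race between closing this probe socket and the server binding the port, but for test suites that start each server immediately after probing, this is usually acceptable.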









[jira] [Commented] (SPARK-42349) Support pandas cogroup with multiple df

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42349?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692129#comment-17692129
 ] 

Apache Spark commented on SPARK-42349:
--

User 'santosh-d3vpl3x' has created a pull request for this issue:
https://github.com/apache/spark/pull/40122

> Support pandas cogroup with multiple df
> ---
>
> Key: SPARK-42349
> URL: https://issues.apache.org/jira/browse/SPARK-42349
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.1
>Reporter: Santosh Pingale
>Priority: Trivial
>
> Currently PySpark supports `cogroup.applyInPandas` with only 2 DataFrames. The 
> improvement request is to support a variable number of DataFrames.
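The grouping semantics of a variadic cogroup can be sketched in plain Python (this is not the PySpark API; `cogroup` here is an illustrative stand-in):

```python
from collections import defaultdict

def cogroup(*tables, key):
    """For each distinct key value, yield one list of matching rows per
    input table (an empty list where a table has no rows for that key)."""
    grouped = []
    for rows in tables:
        g = defaultdict(list)
        for row in rows:
            g[row[key]].append(row)
        grouped.append(g)
    for k in sorted(set().union(*grouped)):
        yield k, [g.get(k, []) for g in grouped]
```

The user-supplied function would then receive one pandas DataFrame per input, rather than exactly two as today.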









[jira] [Assigned] (SPARK-42528) Optimize PercentileHeap

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42528:


Assignee: Apache Spark  (was: Alkis Evlogimenos)

> Optimize PercentileHeap
> ---
>
> Key: SPARK-42528
> URL: https://issues.apache.org/jira/browse/SPARK-42528
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Alkis Evlogimenos
>Assignee: Apache Spark
>Priority: Major
> Fix For: 3.4.0
>
>
> PercentileHeap is not fast enough when used inside the scheduler for 
> estimations, which slows down the scheduling rate and, as a result, query 
> execution time.
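A percentile-tracking heap is typically built from two heaps split at the target percentile. The sketch below shows that structure under our own assumptions; Spark's actual PercentileHeap may differ:

```python
import heapq
import math

class PercentileHeap:
    """Track the p-th percentile of a stream: a max-heap `lo` holds the
    smallest ceil(p * n) values, a min-heap `hi` holds the rest, so the
    percentile is always the top of `lo`. Illustrative sketch only."""

    def __init__(self, percentage: float = 0.5):
        self.percentage = percentage
        self.lo = []  # max-heap via negated values
        self.hi = []  # min-heap

    def insert(self, v) -> None:
        if self.lo and v > -self.lo[0]:
            heapq.heappush(self.hi, v)
        else:
            heapq.heappush(self.lo, -v)
        # Rebalance so that len(lo) == ceil(percentage * n).
        n = len(self.lo) + len(self.hi)
        target = max(1, math.ceil(self.percentage * n))
        while len(self.lo) > target:
            heapq.heappush(self.hi, -heapq.heappop(self.lo))
        while len(self.lo) < target:
            heapq.heappush(self.lo, -heapq.heappop(self.hi))

    def percentile(self):
        return -self.lo[0]
```

With this layout, `insert` is O(log n) and `percentile` is O(1), which matters when the scheduler reads the estimate after every task completion.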






[jira] [Assigned] (SPARK-42528) Optimize PercentileHeap

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42528?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42528:


Assignee: Alkis Evlogimenos  (was: Apache Spark)

> Optimize PercentileHeap
> ---
>
> Key: SPARK-42528
> URL: https://issues.apache.org/jira/browse/SPARK-42528
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Alkis Evlogimenos
>Assignee: Alkis Evlogimenos
>Priority: Major
> Fix For: 3.4.0
>
>
> PercentileHeap is not fast enough when used inside the scheduler for 
> estimations, which slows down the scheduling rate and, as a result, query 
> execution time.






[jira] [Commented] (SPARK-42528) Optimize PercentileHeap

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692109#comment-17692109
 ] 

Apache Spark commented on SPARK-42528:
--

User 'alkis' has created a pull request for this issue:
https://github.com/apache/spark/pull/40121

> Optimize PercentileHeap
> ---
>
> Key: SPARK-42528
> URL: https://issues.apache.org/jira/browse/SPARK-42528
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Alkis Evlogimenos
>Assignee: Alkis Evlogimenos
>Priority: Major
> Fix For: 3.4.0
>
>
> PercentileHeap is not fast enough when used inside the scheduler for 
> estimations, which slows down the scheduling rate and, as a result, query 
> execution time.






[jira] [Assigned] (SPARK-42527) Scala Client add Window functions

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42527:


Assignee: (was: Apache Spark)

> Scala Client add Window functions
> -
>
> Key: SPARK-42527
> URL: https://issues.apache.org/jira/browse/SPARK-42527
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Assigned] (SPARK-42527) Scala Client add Window functions

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42527?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42527:


Assignee: Apache Spark

> Scala Client add Window functions
> -
>
> Key: SPARK-42527
> URL: https://issues.apache.org/jira/browse/SPARK-42527
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42527) Scala Client add Window functions

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42527?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692026#comment-17692026
 ] 

Apache Spark commented on SPARK-42527:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40120

> Scala Client add Window functions
> -
>
> Key: SPARK-42527
> URL: https://issues.apache.org/jira/browse/SPARK-42527
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Assigned] (SPARK-42526) Add Classifier.getNumClasses back

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42526:


Assignee: (was: Apache Spark)

> Add Classifier.getNumClasses back
> -
>
> Key: SPARK-42526
> URL: https://issues.apache.org/jira/browse/SPARK-42526
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>







[jira] [Assigned] (SPARK-42526) Add Classifier.getNumClasses back

2023-02-22 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42526?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42526:


Assignee: Apache Spark

> Add Classifier.getNumClasses back
> -
>
> Key: SPARK-42526
> URL: https://issues.apache.org/jira/browse/SPARK-42526
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Commented] (SPARK-42526) Add Classifier.getNumClasses back

2023-02-22 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692018#comment-17692018
 ] 

Apache Spark commented on SPARK-42526:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40119

> Add Classifier.getNumClasses back
> -
>
> Key: SPARK-42526
> URL: https://issues.apache.org/jira/browse/SPARK-42526
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>







[jira] [Commented] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691973#comment-17691973
 ] 

Apache Spark commented on SPARK-26365:
--

User 'zwangsheng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40118

> spark-submit for k8s cluster doesn't propagate exit code
> 
>
> Key: SPARK-26365
> URL: https://issues.apache.org/jira/browse/SPARK-26365
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Spark Submit
>Affects Versions: 2.3.2, 2.4.0, 3.0.0, 3.1.0
>Reporter: Oscar Bonilla
>Priority: Major
> Attachments: spark-2.4.5-raise-exception-k8s-failure.patch, 
> spark-3.0.0-raise-exception-k8s-failure.patch
>
>
> When launching apps using spark-submit in a Kubernetes cluster, if the Spark 
> application fails (returns exit code = 1, for example), spark-submit will 
> still exit gracefully and return exit code = 0.
> This is problematic, since there's no way to know whether there's been a 
> problem with the Spark application.
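The desired behaviour can be sketched in plain Python (the names are ours, and this is not Spark's launcher code): run the child process and return its real exit code instead of unconditionally 0:

```python
import subprocess
import sys

def run_and_propagate(cmd) -> int:
    """Run `cmd` (e.g. an argv list for a spark-submit invocation) and
    return the child's exit code, so the caller can pass it to
    sys.exit() instead of always exiting 0."""
    return subprocess.run(cmd).returncode
```

A wrapper script would end with `sys.exit(run_and_propagate(argv))`, so CI systems and workflow managers can detect application failures.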









[jira] [Assigned] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26365:


Assignee: (was: Apache Spark)

> spark-submit for k8s cluster doesn't propagate exit code
> 
>
> Key: SPARK-26365
> URL: https://issues.apache.org/jira/browse/SPARK-26365
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Spark Submit
>Affects Versions: 2.3.2, 2.4.0, 3.0.0, 3.1.0
>Reporter: Oscar Bonilla
>Priority: Major
> Attachments: spark-2.4.5-raise-exception-k8s-failure.patch, 
> spark-3.0.0-raise-exception-k8s-failure.patch
>
>
> When launching apps using spark-submit in a Kubernetes cluster, if the Spark 
> application fails (returns exit code = 1, for example), spark-submit will 
> still exit gracefully and return exit code = 0.
> This is problematic, since there's no way to know whether there's been a 
> problem with the Spark application.






[jira] [Assigned] (SPARK-26365) spark-submit for k8s cluster doesn't propagate exit code

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-26365:


Assignee: Apache Spark

> spark-submit for k8s cluster doesn't propagate exit code
> 
>
> Key: SPARK-26365
> URL: https://issues.apache.org/jira/browse/SPARK-26365
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Spark Submit
>Affects Versions: 2.3.2, 2.4.0, 3.0.0, 3.1.0
>Reporter: Oscar Bonilla
>Assignee: Apache Spark
>Priority: Major
> Attachments: spark-2.4.5-raise-exception-k8s-failure.patch, 
> spark-3.0.0-raise-exception-k8s-failure.patch
>
>
> When launching apps using spark-submit in a Kubernetes cluster, if the Spark 
> application fails (returns exit code = 1, for example), spark-submit will 
> still exit gracefully and return exit code = 0.
> This is problematic, since there's no way to know whether there's been a 
> problem with the Spark application.






[jira] [Commented] (SPARK-42427) Conv should return an error if the internal conversion overflows

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42427?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691954#comment-17691954
 ] 

Apache Spark commented on SPARK-42427:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40117

> Conv should return an error if the internal conversion overflows
> 
>
> Key: SPARK-42427
> URL: https://issues.apache.org/jira/browse/SPARK-42427
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Gengliang Wang
>Assignee: Gengliang Wang
>Priority: Major
> Fix For: 3.4.0
>
>







[jira] [Commented] (SPARK-41391) The output column name of `groupBy.agg(count_distinct)` is incorrect

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691930#comment-17691930
 ] 

Apache Spark commented on SPARK-41391:
--

User 'ritikam2' has created a pull request for this issue:
https://github.com/apache/spark/pull/40116

> The output column name of `groupBy.agg(count_distinct)` is incorrect
> 
>
> Key: SPARK-41391
> URL: https://issues.apache.org/jira/browse/SPARK-41391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0, 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> scala> val df = spark.range(1, 10).withColumn("value", lit(1))
> df: org.apache.spark.sql.DataFrame = [id: bigint, value: int]
> scala> df.createOrReplaceTempView("table")
> scala> df.groupBy("id").agg(count_distinct($"value"))
> res1: org.apache.spark.sql.DataFrame = [id: bigint, count(value): bigint]
> scala> spark.sql(" SELECT id, COUNT(DISTINCT value) FROM table GROUP BY id ")
> res2: org.apache.spark.sql.DataFrame = [id: bigint, count(DISTINCT value): 
> bigint]
> scala> df.groupBy("id").agg(count_distinct($"*"))
> res3: org.apache.spark.sql.DataFrame = [id: bigint, count(unresolvedstar()): 
> bigint]
> scala> spark.sql(" SELECT id, COUNT(DISTINCT *) FROM table GROUP BY id ")
> res4: org.apache.spark.sql.DataFrame = [id: bigint, count(DISTINCT id, 
> value): bigint]






[jira] [Commented] (SPARK-41391) The output column name of `groupBy.agg(count_distinct)` is incorrect

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41391?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691929#comment-17691929
 ] 

Apache Spark commented on SPARK-41391:
--

User 'ritikam2' has created a pull request for this issue:
https://github.com/apache/spark/pull/40116

> The output column name of `groupBy.agg(count_distinct)` is incorrect
> 
>
> Key: SPARK-41391
> URL: https://issues.apache.org/jira/browse/SPARK-41391
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.0, 3.3.0, 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>
> scala> val df = spark.range(1, 10).withColumn("value", lit(1))
> df: org.apache.spark.sql.DataFrame = [id: bigint, value: int]
> scala> df.createOrReplaceTempView("table")
> scala> df.groupBy("id").agg(count_distinct($"value"))
> res1: org.apache.spark.sql.DataFrame = [id: bigint, count(value): bigint]
> scala> spark.sql(" SELECT id, COUNT(DISTINCT value) FROM table GROUP BY id ")
> res2: org.apache.spark.sql.DataFrame = [id: bigint, count(DISTINCT value): 
> bigint]
> scala> df.groupBy("id").agg(count_distinct($"*"))
> res3: org.apache.spark.sql.DataFrame = [id: bigint, count(unresolvedstar()): 
> bigint]
> scala> spark.sql(" SELECT id, COUNT(DISTINCT *) FROM table GROUP BY id ")
> res4: org.apache.spark.sql.DataFrame = [id: bigint, count(DISTINCT id, 
> value): bigint]






[jira] [Assigned] (SPARK-42525) collapse two adjacent windows with the same partition/order in subquery

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42525:


Assignee: Apache Spark

> collapse two adjacent windows with the same partition/order in subquery
> ---
>
> Key: SPARK-42525
> URL: https://issues.apache.org/jira/browse/SPARK-42525
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.3
>Reporter: zhuml
>Assignee: Apache Spark
>Priority: Major
>
> Extend the CollapseWindow rule to collapse adjacent Window nodes when one 
> window is in a subquery.
>  
> {code:java}
> select a, b, c, row_number() over (partition by a order by b) as d from
> ( select a, b, rank() over (partition by a order by b) as c from t1) t2
> == Optimized Logical Plan ==
> before
> Window [row_number() windowspecdefinition(a#11, b#12 ASC NULLS FIRST, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
> d#26], [a#11], [b#12 ASC NULLS FIRST]
> +- Window [rank(b#12) windowspecdefinition(a#11, b#12 ASC NULLS FIRST, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
> c#25], [a#11], [b#12 ASC NULLS FIRST]
>    +- InMemoryRelation [a#11, b#12], StorageLevel(disk, memory, deserialized, 
> 1 replicas)
>          +- *(1) Project [_1#6 AS a#11, _2#7 AS b#12]
>             +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, 
> scala.Tuple2, true]))._1 AS _1#6, knownnotnull(assertnotnull(input[0, 
> scala.Tuple2, true]))._2 AS _2#7]
>                +- *(1) MapElements 
> org.apache.spark.sql.DataFrameSuite$$Lambda$1517/1628848368@3a479fda, obj#5: 
> scala.Tuple2
>                   +- *(1) DeserializeToObject staticinvoke(class 
> java.lang.Long, ObjectType(class java.lang.Long), valueOf, id#0L, true, 
> false, true), obj#4: java.lang.Long
>                      +- *(1) Range (0, 10, step=1, splits=2)
> after
> Window [rank(b#12) windowspecdefinition(a#11, b#12 ASC NULLS FIRST, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
> c#25, row_number() windowspecdefinition(a#11, b#12 ASC NULLS FIRST, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
> d#26], [a#11], [b#12 ASC NULLS FIRST]
> +- InMemoryRelation [a#11, b#12], StorageLevel(disk, memory, deserialized, 1 
> replicas)
>       +- *(1) Project [_1#6 AS a#11, _2#7 AS b#12]
>          +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, 
> scala.Tuple2, true]))._1 AS _1#6, knownnotnull(assertnotnull(input[0, 
> scala.Tuple2, true]))._2 AS _2#7]
>             +- *(1) MapElements 
> org.apache.spark.sql.DataFrameSuite$$Lambda$1518/1928028672@4d7a64ca, obj#5: 
> scala.Tuple2
>                +- *(1) DeserializeToObject staticinvoke(class java.lang.Long, 
> ObjectType(class java.lang.Long), valueOf, id#0L, true, false, true), obj#4: 
> java.lang.Long
>                   +- *(1) Range (0, 10, step=1, splits=2){code}
>  
>  






[jira] [Commented] (SPARK-42525) collapse two adjacent windows with the same partition/order in subquery

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691904#comment-17691904
 ] 

Apache Spark commented on SPARK-42525:
--

User 'zml1206' has created a pull request for this issue:
https://github.com/apache/spark/pull/40115

> collapse two adjacent windows with the same partition/order in subquery
> ---
>
> Key: SPARK-42525
> URL: https://issues.apache.org/jira/browse/SPARK-42525
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.3
>Reporter: zhuml
>Priority: Major
>
> Extend the CollapseWindow rule to collapse two adjacent Window nodes when one 
> window is in a subquery.
>  
> {code:java}
> select a, b, c, row_number() over (partition by a order by b) as d from
> ( select a, b, rank() over (partition by a order by b) as c from t1) t2
> == Optimized Logical Plan ==
> before
> Window [row_number() windowspecdefinition(a#11, b#12 ASC NULLS FIRST, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
> d#26], [a#11], [b#12 ASC NULLS FIRST]
> +- Window [rank(b#12) windowspecdefinition(a#11, b#12 ASC NULLS FIRST, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
> c#25], [a#11], [b#12 ASC NULLS FIRST]
>    +- InMemoryRelation [a#11, b#12], StorageLevel(disk, memory, deserialized, 
> 1 replicas)
>          +- *(1) Project [_1#6 AS a#11, _2#7 AS b#12]
>             +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, 
> scala.Tuple2, true]))._1 AS _1#6, knownnotnull(assertnotnull(input[0, 
> scala.Tuple2, true]))._2 AS _2#7]
>                +- *(1) MapElements 
> org.apache.spark.sql.DataFrameSuite$$Lambda$1517/1628848368@3a479fda, obj#5: 
> scala.Tuple2
>                   +- *(1) DeserializeToObject staticinvoke(class 
> java.lang.Long, ObjectType(class java.lang.Long), valueOf, id#0L, true, 
> false, true), obj#4: java.lang.Long
>                      +- *(1) Range (0, 10, step=1, splits=2)
> after
> Window [rank(b#12) windowspecdefinition(a#11, b#12 ASC NULLS FIRST, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
> c#25, row_number() windowspecdefinition(a#11, b#12 ASC NULLS FIRST, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
> d#26], [a#11], [b#12 ASC NULLS FIRST]
> +- InMemoryRelation [a#11, b#12], StorageLevel(disk, memory, deserialized, 1 
> replicas)
>       +- *(1) Project [_1#6 AS a#11, _2#7 AS b#12]
>          +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, 
> scala.Tuple2, true]))._1 AS _1#6, knownnotnull(assertnotnull(input[0, 
> scala.Tuple2, true]))._2 AS _2#7]
>             +- *(1) MapElements 
> org.apache.spark.sql.DataFrameSuite$$Lambda$1518/1928028672@4d7a64ca, obj#5: 
> scala.Tuple2
>                +- *(1) DeserializeToObject staticinvoke(class java.lang.Long, 
> ObjectType(class java.lang.Long), valueOf, id#0L, true, false, true), obj#4: 
> java.lang.Long
>                   +- *(1) Range (0, 10, step=1, splits=2){code}
>  
>  
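The collapsed plan above evaluates rank() and row_number() in a single Window node. A minimal pure-Python sketch (an illustration only, not Spark code; all names are hypothetical) of why the two functions can share one pass over each partition when partitioning and ordering match:

```python
from itertools import groupby
from operator import itemgetter

def rank_and_row_number(rows):
    """Compute rank() and row_number() over (partition by a order by b)
    in one pass per partition, mirroring the collapsed Window node."""
    out = []
    rows = sorted(rows, key=itemgetter(0, 1))      # partition by a, order by b
    for _, part in groupby(rows, key=itemgetter(0)):
        prev_b, rank = object(), 0
        for row_number, (a, b) in enumerate(part, start=1):
            if b != prev_b:                        # rank only advances on a new b
                rank, prev_b = row_number, b
            out.append((a, b, rank, row_number))
    return out

print(rank_and_row_number([(1, 10), (1, 10), (1, 20), (2, 5)]))
# -> [(1, 10, 1, 1), (1, 10, 1, 2), (1, 20, 3, 3), (2, 5, 1, 1)]
```

Both window functions consume the rows of a partition in the same order, which is exactly the condition the CollapseWindow rule checks before merging the nodes.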






[jira] [Assigned] (SPARK-42525) collapse two adjacent windows with the same partition/order in subquery

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42525:


Assignee: (was: Apache Spark)

> collapse two adjacent windows with the same partition/order in subquery
> ---
>
> Key: SPARK-42525
> URL: https://issues.apache.org/jira/browse/SPARK-42525
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.3
>Reporter: zhuml
>Priority: Major
>
> Extend the CollapseWindow rule to collapse two adjacent Window nodes when one 
> window is in a subquery.
>  
> {code:java}
> select a, b, c, row_number() over (partition by a order by b) as d from
> ( select a, b, rank() over (partition by a order by b) as c from t1) t2
> == Optimized Logical Plan ==
> before
> Window [row_number() windowspecdefinition(a#11, b#12 ASC NULLS FIRST, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
> d#26], [a#11], [b#12 ASC NULLS FIRST]
> +- Window [rank(b#12) windowspecdefinition(a#11, b#12 ASC NULLS FIRST, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
> c#25], [a#11], [b#12 ASC NULLS FIRST]
>    +- InMemoryRelation [a#11, b#12], StorageLevel(disk, memory, deserialized, 
> 1 replicas)
>          +- *(1) Project [_1#6 AS a#11, _2#7 AS b#12]
>             +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, 
> scala.Tuple2, true]))._1 AS _1#6, knownnotnull(assertnotnull(input[0, 
> scala.Tuple2, true]))._2 AS _2#7]
>                +- *(1) MapElements 
> org.apache.spark.sql.DataFrameSuite$$Lambda$1517/1628848368@3a479fda, obj#5: 
> scala.Tuple2
>                   +- *(1) DeserializeToObject staticinvoke(class 
> java.lang.Long, ObjectType(class java.lang.Long), valueOf, id#0L, true, 
> false, true), obj#4: java.lang.Long
>                      +- *(1) Range (0, 10, step=1, splits=2)
> after
> Window [rank(b#12) windowspecdefinition(a#11, b#12 ASC NULLS FIRST, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
> c#25, row_number() windowspecdefinition(a#11, b#12 ASC NULLS FIRST, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
> d#26], [a#11], [b#12 ASC NULLS FIRST]
> +- InMemoryRelation [a#11, b#12], StorageLevel(disk, memory, deserialized, 1 
> replicas)
>       +- *(1) Project [_1#6 AS a#11, _2#7 AS b#12]
>          +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, 
> scala.Tuple2, true]))._1 AS _1#6, knownnotnull(assertnotnull(input[0, 
> scala.Tuple2, true]))._2 AS _2#7]
>             +- *(1) MapElements 
> org.apache.spark.sql.DataFrameSuite$$Lambda$1518/1928028672@4d7a64ca, obj#5: 
> scala.Tuple2
>                +- *(1) DeserializeToObject staticinvoke(class java.lang.Long, 
> ObjectType(class java.lang.Long), valueOf, id#0L, true, false, true), obj#4: 
> java.lang.Long
>                   +- *(1) Range (0, 10, step=1, splits=2){code}
>  
>  






[jira] [Commented] (SPARK-42525) collapse two adjacent windows with the same partition/order in subquery

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42525?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691905#comment-17691905
 ] 

Apache Spark commented on SPARK-42525:
--

User 'zml1206' has created a pull request for this issue:
https://github.com/apache/spark/pull/40115

> collapse two adjacent windows with the same partition/order in subquery
> ---
>
> Key: SPARK-42525
> URL: https://issues.apache.org/jira/browse/SPARK-42525
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.2.3
>Reporter: zhuml
>Assignee: Apache Spark
>Priority: Major
>
> Extend the CollapseWindow rule to collapse two adjacent Window nodes when one 
> window is in a subquery.
>  
> {code:java}
> select a, b, c, row_number() over (partition by a order by b) as d from
> ( select a, b, rank() over (partition by a order by b) as c from t1) t2
> == Optimized Logical Plan ==
> before
> Window [row_number() windowspecdefinition(a#11, b#12 ASC NULLS FIRST, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
> d#26], [a#11], [b#12 ASC NULLS FIRST]
> +- Window [rank(b#12) windowspecdefinition(a#11, b#12 ASC NULLS FIRST, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
> c#25], [a#11], [b#12 ASC NULLS FIRST]
>    +- InMemoryRelation [a#11, b#12], StorageLevel(disk, memory, deserialized, 
> 1 replicas)
>          +- *(1) Project [_1#6 AS a#11, _2#7 AS b#12]
>             +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, 
> scala.Tuple2, true]))._1 AS _1#6, knownnotnull(assertnotnull(input[0, 
> scala.Tuple2, true]))._2 AS _2#7]
>                +- *(1) MapElements 
> org.apache.spark.sql.DataFrameSuite$$Lambda$1517/1628848368@3a479fda, obj#5: 
> scala.Tuple2
>                   +- *(1) DeserializeToObject staticinvoke(class 
> java.lang.Long, ObjectType(class java.lang.Long), valueOf, id#0L, true, 
> false, true), obj#4: java.lang.Long
>                      +- *(1) Range (0, 10, step=1, splits=2)
> after
> Window [rank(b#12) windowspecdefinition(a#11, b#12 ASC NULLS FIRST, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
> c#25, row_number() windowspecdefinition(a#11, b#12 ASC NULLS FIRST, 
> specifiedwindowframe(RowFrame, unboundedpreceding$(), currentrow$())) AS 
> d#26], [a#11], [b#12 ASC NULLS FIRST]
> +- InMemoryRelation [a#11, b#12], StorageLevel(disk, memory, deserialized, 1 
> replicas)
>       +- *(1) Project [_1#6 AS a#11, _2#7 AS b#12]
>          +- *(1) SerializeFromObject [knownnotnull(assertnotnull(input[0, 
> scala.Tuple2, true]))._1 AS _1#6, knownnotnull(assertnotnull(input[0, 
> scala.Tuple2, true]))._2 AS _2#7]
>             +- *(1) MapElements 
> org.apache.spark.sql.DataFrameSuite$$Lambda$1518/1928028672@4d7a64ca, obj#5: 
> scala.Tuple2
>                +- *(1) DeserializeToObject staticinvoke(class java.lang.Long, 
> ObjectType(class java.lang.Long), valueOf, id#0L, true, false, true), obj#4: 
> java.lang.Long
>                   +- *(1) Range (0, 10, step=1, splits=2){code}
>  
>  






[jira] [Assigned] (SPARK-42513) Push down topK through join

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42513:


Assignee: (was: Apache Spark)

> Push down topK through join
> ---
>
> Key: SPARK-42513
> URL: https://issues.apache.org/jira/browse/SPARK-42513
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: after-UI.png, before-UI.png
>
>
> {code:scala}
> spark.range(1).selectExpr("id % 1 as a", "id as 
> b").write.saveAsTable("t1")
> spark.range(1).selectExpr("id % 1 as x", "id as 
> y").write.saveAsTable("t2")
> sql("select * from t1 left join t2 on a = x order by b limit 5").collect()
> spark.sql("set 
> spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.LimitPushDown")
> sql("select * from t1 left join t2 on a = x order by b limit 5").collect()
> {code}
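A pure-Python sketch (an illustration, not Spark's actual LimitPushDown rule; all names are hypothetical) of why the topK can be pushed below a left outer join when the sort key comes from the left side: the join preserves every left row, so taking the k smallest left rows first gives the same answer while joining far fewer rows.

```python
import heapq

t1 = [(i % 3, i) for i in range(100)]          # (a, b)
t2 = [(i % 3, i * 10) for i in range(100)]     # (x, y)

def left_join(left, right):
    """Hash-based left outer join on a = x, preserving every left row."""
    index = {}
    for x, y in right:
        index.setdefault(x, []).append((x, y))
    return [(a, b, x, y)
            for a, b in left
            for x, y in index.get(a, [(None, None)])]

def topk(rows, k):
    return heapq.nsmallest(k, rows, key=lambda r: r[1])   # order by b limit k

k = 5
plain = topk(left_join(t1, t2), k)
pushed = topk(left_join(topk(t1, k), t2), k)   # topK pushed below the join
assert plain == pushed
```

The same reasoning fails for the right side of a left join (right rows can be dropped or duplicated), which is why the pushdown is only legal for the preserved side.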






[jira] [Assigned] (SPARK-42513) Push down topK through join

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42513?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42513:


Assignee: Apache Spark

> Push down topK through join
> ---
>
> Key: SPARK-42513
> URL: https://issues.apache.org/jira/browse/SPARK-42513
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
> Attachments: after-UI.png, before-UI.png
>
>
> {code:scala}
> spark.range(1).selectExpr("id % 1 as a", "id as 
> b").write.saveAsTable("t1")
> spark.range(1).selectExpr("id % 1 as x", "id as 
> y").write.saveAsTable("t2")
> sql("select * from t1 left join t2 on a = x order by b limit 5").collect()
> spark.sql("set 
> spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.LimitPushDown")
> sql("select * from t1 left join t2 on a = x order by b limit 5").collect()
> {code}






[jira] [Commented] (SPARK-42513) Push down topK through join

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691896#comment-17691896
 ] 

Apache Spark commented on SPARK-42513:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/40114

> Push down topK through join
> ---
>
> Key: SPARK-42513
> URL: https://issues.apache.org/jira/browse/SPARK-42513
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: after-UI.png, before-UI.png
>
>
> {code:scala}
> spark.range(1).selectExpr("id % 1 as a", "id as 
> b").write.saveAsTable("t1")
> spark.range(1).selectExpr("id % 1 as x", "id as 
> y").write.saveAsTable("t2")
> sql("select * from t1 left join t2 on a = x order by b limit 5").collect()
> spark.sql("set 
> spark.sql.optimizer.excludedRules=org.apache.spark.sql.catalyst.optimizer.LimitPushDown")
> sql("select * from t1 left join t2 on a = x order by b limit 5").collect()
> {code}






[jira] [Commented] (SPARK-42509) WindowGroupLimitExec supports codegen

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691876#comment-17691876
 ] 

Apache Spark commented on SPARK-42509:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40113

> WindowGroupLimitExec supports codegen
> -
>
> Key: SPARK-42509
> URL: https://issues.apache.org/jira/browse/SPARK-42509
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] [Assigned] (SPARK-42509) WindowGroupLimitExec supports codegen

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42509:


Assignee: (was: Apache Spark)

> WindowGroupLimitExec supports codegen
> -
>
> Key: SPARK-42509
> URL: https://issues.apache.org/jira/browse/SPARK-42509
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] [Commented] (SPARK-41933) Provide local mode that automatically starts the server

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691875#comment-17691875
 ] 

Apache Spark commented on SPARK-41933:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40112

> Provide local mode that automatically starts the server
> ---
>
> Key: SPARK-41933
> URL: https://issues.apache.org/jira/browse/SPARK-41933
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently the Spark Connect server has to be started manually, which makes it 
> troublesome for end users and developers to try Spark Connect out.






[jira] [Assigned] (SPARK-42509) WindowGroupLimitExec supports codegen

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42509?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42509:


Assignee: Apache Spark

> WindowGroupLimitExec supports codegen
> -
>
> Key: SPARK-42509
> URL: https://issues.apache.org/jira/browse/SPARK-42509
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42509) WindowGroupLimitExec supports codegen

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42509?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691873#comment-17691873
 ] 

Apache Spark commented on SPARK-42509:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40113

> WindowGroupLimitExec supports codegen
> -
>
> Key: SPARK-42509
> URL: https://issues.apache.org/jira/browse/SPARK-42509
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: jiaan.geng
>Priority: Major
>







[jira] [Commented] (SPARK-41933) Provide local mode that automatically starts the server

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691871#comment-17691871
 ] 

Apache Spark commented on SPARK-41933:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40112

> Provide local mode that automatically starts the server
> ---
>
> Key: SPARK-41933
> URL: https://issues.apache.org/jira/browse/SPARK-41933
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.4.0
>
>
> Currently the Spark Connect server has to be started manually, which makes it 
> troublesome for end users and developers to try Spark Connect out.






[jira] [Assigned] (SPARK-42524) Upgrade numpy and pandas in the release Dockerfile

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42524:


Assignee: (was: Apache Spark)

> Upgrade numpy and pandas in the release Dockerfile
> --
>
> Key: SPARK-42524
> URL: https://issues.apache.org/jira/browse/SPARK-42524
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Otherwise, errors are raised as shown below when building release docs.
> {code}
> ImportError: Warning: Latest version of pandas (1.5.3) is required to 
> generate the documentation; however, your version was 1.1.5
> ImportError: this version of pandas is incompatible with numpy < 1.20.3
> your numpy version is 1.19.4.
> Please upgrade numpy to >= 1.20.3 to use this pandas version
> {code}
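The quoted ImportError is produced by a minimum-version guard of roughly this shape (a hedged sketch; the helper name is hypothetical, and pandas' real check lives in its numpy compatibility layer):

```python
def check_min_version(actual: str, minimum: str) -> bool:
    """Compare dotted version strings numerically, e.g. '1.19.4' < '1.20.3'.
    A plain string comparison would get this wrong ('1.19' > '1.2')."""
    to_tuple = lambda v: tuple(int(p) for p in v.split("."))
    return to_tuple(actual) >= to_tuple(minimum)

assert not check_min_version("1.19.4", "1.20.3")   # numpy too old -> ImportError
assert check_min_version("1.5.3", "1.1.5")         # pandas new enough
```

Upgrading numpy and pandas in the release Dockerfile to at least the minimums in the error message makes both guards pass.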






[jira] [Assigned] (SPARK-42524) Upgrade numpy and pandas in the release Dockerfile

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42524?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42524:


Assignee: Apache Spark

> Upgrade numpy and pandas in the release Dockerfile
> --
>
> Key: SPARK-42524
> URL: https://issues.apache.org/jira/browse/SPARK-42524
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Otherwise, errors are raised as shown below when building release docs.
> {code}
> ImportError: Warning: Latest version of pandas (1.5.3) is required to 
> generate the documentation; however, your version was 1.1.5
> ImportError: this version of pandas is incompatible with numpy < 1.20.3
> your numpy version is 1.19.4.
> Please upgrade numpy to >= 1.20.3 to use this pandas version
> {code}






[jira] [Commented] (SPARK-42524) Upgrade numpy and pandas in the release Dockerfile

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691867#comment-17691867
 ] 

Apache Spark commented on SPARK-42524:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40111

> Upgrade numpy and pandas in the release Dockerfile
> --
>
> Key: SPARK-42524
> URL: https://issues.apache.org/jira/browse/SPARK-42524
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Otherwise, errors are raised as shown below when building release docs.
> {code}
> ImportError: Warning: Latest version of pandas (1.5.3) is required to 
> generate the documentation; however, your version was 1.1.5
> ImportError: this version of pandas is incompatible with numpy < 1.20.3
> your numpy version is 1.19.4.
> Please upgrade numpy to >= 1.20.3 to use this pandas version
> {code}






[jira] [Commented] (SPARK-41775) Implement training functions as input

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691836#comment-17691836
 ] 

Apache Spark commented on SPARK-41775:
--

User 'rithwik-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40110

> Implement training functions as input
> -
>
> Key: SPARK-41775
> URL: https://issues.apache.org/jira/browse/SPARK-41775
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.4.0
>Reporter: Rithwik Ediga Lakhamsani
>Assignee: Rithwik Ediga Lakhamsani
>Priority: Major
> Fix For: 3.4.0
>
>
> Sidenote: make formatting updates described in 
> https://github.com/apache/spark/pull/39188
>  
> Currently, `Distributor().run(...)` takes only files as input. We will now add 
> support for passing functions as well. This requires the following process on 
> each task in the executor nodes:
> 1. Take the input function and its args and pickle them
> 2. Create a temp train.py file that looks like
> {code:python}
> import cloudpickle
> import os
> if __name__ == "__main__":
>     with open(f"{tempdir}/train_input.pkl", "rb") as f:
>         train, args = cloudpickle.load(f)
>     output = train(*args)
>     if output and os.environ.get("RANK", "") == "0":  # this is for 
> partitionId == 0
>         with open(f"{tempdir}/train_output.pkl", "wb") as f:
>             cloudpickle.dump(output, f) {code}
> 3. Run that train.py file with `torchrun`
> 4. Check if `train_output.pkl` has been created by the process with 
> partitionId == 0; if it has, deserialize it and return that output through 
> `.collect()`
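The round trip in steps 1, 2, and 4 can be exercised with a self-contained sketch. It substitutes stdlib pickle for cloudpickle (Spark needs cloudpickle because it can also serialize closures and lambdas), assumes the same file layout as the description, and runs the generated-file logic inline rather than under torchrun:

```python
import os
import pickle
import tempfile

def train(x, y):                     # stand-in for the user's training function
    return x + y

tempdir = tempfile.mkdtemp()

# Step 1: pickle the function and its args (driver side).
with open(f"{tempdir}/train_input.pkl", "wb") as f:
    pickle.dump((train, (1, 2)), f)

# Steps 2-3 normally happen inside the generated train.py under torchrun;
# here the same logic runs inline. RANK defaults to "0" so the sketch is
# standalone (torchrun would set it for real).
with open(f"{tempdir}/train_input.pkl", "rb") as f:
    fn, args = pickle.load(f)
output = fn(*args)
if output and os.environ.get("RANK", "0") == "0":    # partitionId == 0
    with open(f"{tempdir}/train_output.pkl", "wb") as f:
        pickle.dump(output, f)

# Step 4: deserialize the result.
with open(f"{tempdir}/train_output.pkl", "rb") as f:
    print(pickle.load(f))            # -> 3
```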






[jira] [Assigned] (SPARK-42522) Fix DataFrameWriterV2 to find the default source

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42522:


Assignee: Apache Spark

> Fix DataFrameWriterV2 to find the default source
> 
>
> Key: SPARK-42522
> URL: https://issues.apache.org/jira/browse/SPARK-42522
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Assignee: Apache Spark
>Priority: Major
>
> {code:python}
> df.writeTo("test_table").create()
> {code}
> throws:
> {noformat}
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkClassNotFoundException) [DATA_SOURCE_NOT_FOUND] Failed 
> to find the data source: . Please find packages at 
> `https://spark.apache.org/third-party-projects.html`.
> {noformat}






[jira] [Commented] (SPARK-42522) Fix DataFrameWriterV2 to find the default source

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691801#comment-17691801
 ] 

Apache Spark commented on SPARK-42522:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40109

> Fix DataFrameWriterV2 to find the default source
> 
>
> Key: SPARK-42522
> URL: https://issues.apache.org/jira/browse/SPARK-42522
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> {code:python}
> df.writeTo("test_table").create()
> {code}
> throws:
> {noformat}
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkClassNotFoundException) [DATA_SOURCE_NOT_FOUND] Failed 
> to find the data source: . Please find packages at 
> `https://spark.apache.org/third-party-projects.html`.
> {noformat}






[jira] [Assigned] (SPARK-42522) Fix DataFrameWriterV2 to find the default source

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42522:


Assignee: (was: Apache Spark)

> Fix DataFrameWriterV2 to find the default source
> 
>
> Key: SPARK-42522
> URL: https://issues.apache.org/jira/browse/SPARK-42522
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Takuya Ueshin
>Priority: Major
>
> {code:python}
> df.writeTo("test_table").create()
> {code}
> throws:
> {noformat}
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (org.apache.spark.SparkClassNotFoundException) [DATA_SOURCE_NOT_FOUND] Failed 
> to find the data source: . Please find packages at 
> `https://spark.apache.org/third-party-projects.html`.
> {noformat}






[jira] [Assigned] (SPARK-42521) Add NULL values for INSERT commands with user-specified lists of fewer columns than the target table

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42521:


Assignee: (was: Apache Spark)

> Add NULL values for INSERT commands with user-specified lists of fewer 
> columns than the target table
> 
>
> Key: SPARK-42521
> URL: https://issues.apache.org/jira/browse/SPARK-42521
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
>
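The proposed behavior matches what SQLite already does for nullable columns omitted from a user-specified column list, which gives a quick runnable sketch of the expected semantics (SQLite standing in for Spark SQL here):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (c1 INTEGER, c2 INTEGER, c3 TEXT)")
# The column list names fewer columns than the target table has;
# the missing columns c2 and c3 are filled with NULL.
conn.execute("INSERT INTO t (c1) VALUES (42)")
print(conn.execute("SELECT * FROM t").fetchall())   # -> [(42, None, None)]
```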







[jira] [Assigned] (SPARK-42521) Add NULL values for INSERT commands with user-specified lists of fewer columns than the target table

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42521?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42521:


Assignee: Apache Spark

> Add NULL values for INSERT commands with user-specified lists of fewer 
> columns than the target table
> 
>
> Key: SPARK-42521
> URL: https://issues.apache.org/jira/browse/SPARK-42521
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42521) Add NULL values for INSERT commands with user-specified lists of fewer columns than the target table

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691786#comment-17691786
 ] 

Apache Spark commented on SPARK-42521:
--

User 'dtenedor' has created a pull request for this issue:
https://github.com/apache/spark/pull/40108

> Add NULL values for INSERT commands with user-specified lists of fewer 
> columns than the target table
> 
>
> Key: SPARK-42521
> URL: https://issues.apache.org/jira/browse/SPARK-42521
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
>







[jira] [Commented] (SPARK-42521) Add NULL values for INSERT commands with user-specified lists of fewer columns than the target table

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691785#comment-17691785
 ] 

Apache Spark commented on SPARK-42521:
--

User 'dtenedor' has created a pull request for this issue:
https://github.com/apache/spark/pull/40108

> Add NULL values for INSERT commands with user-specified lists of fewer 
> columns than the target table
> 
>
> Key: SPARK-42521
> URL: https://issues.apache.org/jira/browse/SPARK-42521
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Daniel
>Priority: Major
>







[jira] [Commented] (SPARK-42520) Spark Connect Scala Client: Window

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42520?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691774#comment-17691774
 ] 

Apache Spark commented on SPARK-42520:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40107

> Spark Connect Scala Client: Window
> --
>
> Key: SPARK-42520
> URL: https://issues.apache.org/jira/browse/SPARK-42520
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42520) Spark Connect Scala Client: Window

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42520:


Assignee: Apache Spark  (was: Rui Wang)

> Spark Connect Scala Client: Window
> --
>
> Key: SPARK-42520
> URL: https://issues.apache.org/jira/browse/SPARK-42520
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42520) Spark Connect Scala Client: Window

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42520?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42520:


Assignee: Rui Wang  (was: Apache Spark)

> Spark Connect Scala Client: Window
> --
>
> Key: SPARK-42520
> URL: https://issues.apache.org/jira/browse/SPARK-42520
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42518) Scala client Write API V2

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42518:


Assignee: (was: Apache Spark)

> Scala client Write API V2
> -
>
> Key: SPARK-42518
> URL: https://issues.apache.org/jira/browse/SPARK-42518
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Implement the Dataset#writeTo method.






[jira] [Commented] (SPARK-42518) Scala client Write API V2

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42518?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691758#comment-17691758
 ] 

Apache Spark commented on SPARK-42518:
--

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40075

> Scala client Write API V2
> -
>
> Key: SPARK-42518
> URL: https://issues.apache.org/jira/browse/SPARK-42518
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Implement the Dataset#writeTo method.






[jira] [Assigned] (SPARK-42518) Scala client Write API V2

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42518?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42518:


Assignee: Apache Spark

> Scala client Write API V2
> -
>
> Key: SPARK-42518
> URL: https://issues.apache.org/jira/browse/SPARK-42518
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Apache Spark
>Priority: Major
>
> Implement the Dataset#writeTo method.






[jira] [Commented] (SPARK-42002) Implement DataFrameWriterV2 (ReadwriterV2Tests)

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691721#comment-17691721
 ] 

Apache Spark commented on SPARK-42002:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40106

> Implement DataFrameWriterV2 (ReadwriterV2Tests)
> ---
>
> Key: SPARK-42002
> URL: https://issues.apache.org/jira/browse/SPARK-42002
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code}
> pyspark/sql/tests/test_readwriter.py:182 (ReadwriterV2ParityTests.test_api)
> self = 
>  testMethod=test_api>
> def test_api(self):
> df = self.df
> >   writer = df.writeTo("testcat.t")
> ../test_readwriter.py:185: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = DataFrame[key: bigint, value: string], args = ('testcat.t',), kwargs = 
> {}
> def writeTo(self, *args: Any, **kwargs: Any) -> None:
> >   raise NotImplementedError("writeTo() is not implemented.")
> E   NotImplementedError: writeTo() is not implemented.
> ../../connect/dataframe.py:1529: NotImplementedError
> {code}
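For context, `writeTo` returns a chainable builder in the classic `DataFrameWriterV2` API. A minimal sketch of that builder shape (the class name and string output here are illustrative, not the Spark Connect implementation being added):

```python
class DataFrameWriterV2Sketch:
    """Illustrative shape of the V2 writer builder: writeTo(table)
    yields a builder whose option() calls chain, and create() acts."""
    def __init__(self, table):
        self.table = table
        self._options = {}

    def option(self, key, value):
        self._options[key] = value
        return self  # chainable, like the real DataFrameWriterV2

    def create(self):
        # Stand-in for issuing a CREATE TABLE AS SELECT.
        return f"CREATE TABLE {self.table} OPTIONS {sorted(self._options)}"

writer = DataFrameWriterV2Sketch("testcat.t").option("mergeSchema", "true")
assert writer.create() == "CREATE TABLE testcat.t OPTIONS ['mergeSchema']"
```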






[jira] [Commented] (SPARK-42002) Implement DataFrameWriterV2 (ReadwriterV2Tests)

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42002?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691720#comment-17691720
 ] 

Apache Spark commented on SPARK-42002:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40106

> Implement DataFrameWriterV2 (ReadwriterV2Tests)
> ---
>
> Key: SPARK-42002
> URL: https://issues.apache.org/jira/browse/SPARK-42002
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Sandeep Singh
>Priority: Major
> Fix For: 3.4.0
>
>
> {code}
> pyspark/sql/tests/test_readwriter.py:182 (ReadwriterV2ParityTests.test_api)
> self = 
>  testMethod=test_api>
> def test_api(self):
> df = self.df
> >   writer = df.writeTo("testcat.t")
> ../test_readwriter.py:185: 
> _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
> _ 
> self = DataFrame[key: bigint, value: string], args = ('testcat.t',), kwargs = 
> {}
> def writeTo(self, *args: Any, **kwargs: Any) -> None:
> >   raise NotImplementedError("writeTo() is not implemented.")
> E   NotImplementedError: writeTo() is not implemented.
> ../../connect/dataframe.py:1529: NotImplementedError
> {code}






[jira] [Commented] (SPARK-42516) Non-captured session time zone in view creation

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42516?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691715#comment-17691715
 ] 

Apache Spark commented on SPARK-42516:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/40103

> Non-captured session time zone in view creation
> ---
>
> Key: SPARK-42516
> URL: https://issues.apache.org/jira/browse/SPARK-42516
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> The session time zone config is captured only when it is set explicitly; if it 
> is not, the view is instantiated with the current settings. That might 
> confuse users, for instance:
> Set the session time zone explicitly before view creation:
> {code:java}
> TODO
> {code}
> Set the same time zone implicitly, via the JVM time zone and the default 
> value of the SQL config spark.sql.session.timeZone.
> {code:java}
> TODO
> {code}
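The capture problem above can be modeled without Spark (a toy `View` class; the real view stores SQL configs in its catalog metadata): if only explicitly set configs are snapshotted at creation time, a view created under an implicit time zone silently follows later session changes.

```python
class View:
    """Toy model: a view snapshots only explicitly set session configs."""
    def __init__(self, session_conf):
        # Only explicit settings are captured at creation time.
        self.captured = dict(session_conf)

    def time_zone(self, current_conf, jvm_default):
        # Falls back to the *current* session, then the JVM default, so a
        # view created without an explicit setting is not stable over time.
        key = "spark.sql.session.timeZone"
        return self.captured.get(key, current_conf.get(key, jvm_default))

explicit = View({"spark.sql.session.timeZone": "UTC"})
implicit = View({})  # relied on the JVM default at creation time

# Later the session time zone changes to PST:
assert explicit.time_zone({"spark.sql.session.timeZone": "PST"}, "UTC") == "UTC"
assert implicit.time_zone({"spark.sql.session.timeZone": "PST"}, "UTC") == "PST"
```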






[jira] [Assigned] (SPARK-42516) Non-captured session time zone in view creation

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42516:


Assignee: Max Gekk  (was: Apache Spark)

> Non-captured session time zone in view creation
> ---
>
> Key: SPARK-42516
> URL: https://issues.apache.org/jira/browse/SPARK-42516
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Max Gekk
>Priority: Major
>
> The session time zone config is captured only when it is set explicitly; if it 
> is not, the view is instantiated with the current settings. That might 
> confuse users, for instance:
> Set the session time zone explicitly before view creation:
> {code:java}
> TODO
> {code}
> Set the same time zone implicitly, via the JVM time zone and the default 
> value of the SQL config spark.sql.session.timeZone.
> {code:java}
> TODO
> {code}






[jira] [Assigned] (SPARK-42516) Non-captured session time zone in view creation

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42516?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42516:


Assignee: Apache Spark  (was: Max Gekk)

> Non-captured session time zone in view creation
> ---
>
> Key: SPARK-42516
> URL: https://issues.apache.org/jira/browse/SPARK-42516
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Max Gekk
>Assignee: Apache Spark
>Priority: Major
>
> The session time zone config is captured only when it is set explicitly; if it 
> is not, the view is instantiated with the current settings. That might 
> confuse users, for instance:
> Set the session time zone explicitly before view creation:
> {code:java}
> TODO
> {code}
> Set the same time zone implicitly, via the JVM time zone and the default 
> value of the SQL config spark.sql.session.timeZone.
> {code:java}
> TODO
> {code}






[jira] [Assigned] (SPARK-42514) Scala Client add partition transforms functions

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42514:


Assignee: (was: Apache Spark)

> Scala Client add partition transforms functions
> ---
>
> Key: SPARK-42514
> URL: https://issues.apache.org/jira/browse/SPARK-42514
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Commented] (SPARK-42514) Scala Client add partition transforms functions

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42514?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691692#comment-17691692
 ] 

Apache Spark commented on SPARK-42514:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/40105

> Scala Client add partition transforms functions
> ---
>
> Key: SPARK-42514
> URL: https://issues.apache.org/jira/browse/SPARK-42514
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Assigned] (SPARK-42514) Scala Client add partition transforms functions

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42514?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42514:


Assignee: Apache Spark

> Scala Client add partition transforms functions
> ---
>
> Key: SPARK-42514
> URL: https://issues.apache.org/jira/browse/SPARK-42514
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42510) Implement `DataFrame.mapInPandas`

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42510:


Assignee: Apache Spark

> Implement `DataFrame.mapInPandas`
> -
>
> Key: SPARK-42510
> URL: https://issues.apache.org/jira/browse/SPARK-42510
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Assignee: Apache Spark
>Priority: Major
>
> Implement `DataFrame.mapInPandas`






[jira] [Commented] (SPARK-42510) Implement `DataFrame.mapInPandas`

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691549#comment-17691549
 ] 

Apache Spark commented on SPARK-42510:
--

User 'xinrong-meng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40104

> Implement `DataFrame.mapInPandas`
> -
>
> Key: SPARK-42510
> URL: https://issues.apache.org/jira/browse/SPARK-42510
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement `DataFrame.mapInPandas`






[jira] [Assigned] (SPARK-42510) Implement `DataFrame.mapInPandas`

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42510?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42510:


Assignee: (was: Apache Spark)

> Implement `DataFrame.mapInPandas`
> -
>
> Key: SPARK-42510
> URL: https://issues.apache.org/jira/browse/SPARK-42510
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Xinrong Meng
>Priority: Major
>
> Implement `DataFrame.mapInPandas`






[jira] [Assigned] (SPARK-42508) Extract the common .ml classes to `mllib-common`

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42508:


Assignee: (was: Apache Spark)

> Extract the common .ml classes to `mllib-common`
> 
>
> Key: SPARK-42508
> URL: https://issues.apache.org/jira/browse/SPARK-42508
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-42508) Extract the common .ml classes to `mllib-common`

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691445#comment-17691445
 ] 

Apache Spark commented on SPARK-42508:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40097

> Extract the common .ml classes to `mllib-common`
> 
>
> Key: SPARK-42508
> URL: https://issues.apache.org/jira/browse/SPARK-42508
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Commented] (SPARK-42508) Extract the common .ml classes to `mllib-common`

2023-02-21 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42508?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691446#comment-17691446
 ] 

Apache Spark commented on SPARK-42508:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/40097

> Extract the common .ml classes to `mllib-common`
> 
>
> Key: SPARK-42508
> URL: https://issues.apache.org/jira/browse/SPARK-42508
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Assigned] (SPARK-42508) Extract the common .ml classes to `mllib-common`

2023-02-21 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42508:


Assignee: Apache Spark

> Extract the common .ml classes to `mllib-common`
> 
>
> Key: SPARK-42508
> URL: https://issues.apache.org/jira/browse/SPARK-42508
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ML
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42507) Simplify ORC schema merging conflict error check

2023-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691420#comment-17691420
 ] 

Apache Spark commented on SPARK-42507:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40101

> Simplify ORC schema merging conflict error check
> 
>
> Key: SPARK-42507
> URL: https://issues.apache.org/jira/browse/SPARK-42507
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-42507) Simplify ORC schema merging conflict error check

2023-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42507:


Assignee: (was: Apache Spark)

> Simplify ORC schema merging conflict error check
> 
>
> Key: SPARK-42507
> URL: https://issues.apache.org/jira/browse/SPARK-42507
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-42507) Simplify ORC schema merging conflict error check

2023-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691419#comment-17691419
 ] 

Apache Spark commented on SPARK-42507:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/40101

> Simplify ORC schema merging conflict error check
> 
>
> Key: SPARK-42507
> URL: https://issues.apache.org/jira/browse/SPARK-42507
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Assigned] (SPARK-42507) Simplify ORC schema merging conflict error check

2023-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42507:


Assignee: Apache Spark

> Simplify ORC schema merging conflict error check
> 
>
> Key: SPARK-42507
> URL: https://issues.apache.org/jira/browse/SPARK-42507
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 3.4.0
>Reporter: Dongjoon Hyun
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Assigned] (SPARK-42506) Fix Sort's maxRowsPerPartition if maxRows does not exist

2023-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42506:


Assignee: (was: Apache Spark)

> Fix Sort's maxRowsPerPartition if maxRows does not exist
> 
>
> Key: SPARK-42506
> URL: https://issues.apache.org/jira/browse/SPARK-42506
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42506) Fix Sort's maxRowsPerPartition if maxRows does not exist

2023-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42506:


Assignee: Apache Spark

> Fix Sort's maxRowsPerPartition if maxRows does not exist
> 
>
> Key: SPARK-42506
> URL: https://issues.apache.org/jira/browse/SPARK-42506
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>







[jira] [Commented] (SPARK-42506) Fix Sort's maxRowsPerPartition if maxRows does not exist

2023-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691414#comment-17691414
 ] 

Apache Spark commented on SPARK-42506:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/40100

> Fix Sort's maxRowsPerPartition if maxRows does not exist
> 
>
> Key: SPARK-42506
> URL: https://issues.apache.org/jira/browse/SPARK-42506
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Yuming Wang
>Priority: Major
>







[jira] [Assigned] (SPARK-42504) NestedColumnAliasing support pruning adjacent projects

2023-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42504:


Assignee: Apache Spark

> NestedColumnAliasing support pruning adjacent projects
> --
>
> Key: SPARK-42504
> URL: https://issues.apache.org/jira/browse/SPARK-42504
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Assignee: Apache Spark
>Priority: Major
>
> CollapseProject won't combine adjacent projects into one, e.g. when a 
> non-cheap expression is accessed more than once by the project below. As a 
> result, some adjacent project nodes may appear that NestedColumnAliasing 
> does not support pruning.
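A toy illustration of the situation (the node structure and names are illustrative, not Catalyst's API): collapsing is skipped when a non-cheap lower expression is referenced more than once above, leaving adjacent Project nodes behind.

```python
from dataclasses import dataclass

@dataclass
class Project:
    exprs: dict   # output name -> expression, as a space-separated token string
    child: object # child plan node (another Project, or a leaf name)

def is_cheap(expr):
    # Illustrative rule: only bare column references count as cheap.
    return expr.isidentifier()

def collapse(plan):
    """Merge Project(Project(x)) only when safe: every non-cheap lower
    expression is referenced at most once by the upper project."""
    if isinstance(plan, Project) and isinstance(plan.child, Project):
        lower = plan.child
        for name, expr in lower.exprs.items():
            refs = sum(e.split().count(name) for e in plan.exprs.values())
            if not is_cheap(expr) and refs > 1:
                return plan  # kept adjacent: the case NestedColumnAliasing must handle
        # Safe: substitute lower expressions into the upper project.
        merged = {n: " ".join(lower.exprs.get(t, t) for t in e.split())
                  for n, e in plan.exprs.items()}
        return Project(merged, lower.child)
    return plan

lower = Project({"a": "f ( x )"}, "table")  # f(x) is non-cheap
upper = Project({"p": "a + a"}, lower)      # references `a` twice
assert collapse(upper) is upper             # not collapsed: projects stay adjacent
```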






[jira] [Commented] (SPARK-42504) NestedColumnAliasing support pruning adjacent projects

2023-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691354#comment-17691354
 ] 

Apache Spark commented on SPARK-42504:
--

User 'ulysses-you' has created a pull request for this issue:
https://github.com/apache/spark/pull/40098

> NestedColumnAliasing support pruning adjacent projects
> --
>
> Key: SPARK-42504
> URL: https://issues.apache.org/jira/browse/SPARK-42504
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Priority: Major
>
> CollapseProject won't combine adjacent projects into one, e.g. when a 
> non-cheap expression is accessed more than once by the project below. As a 
> result, some adjacent project nodes may appear that NestedColumnAliasing 
> does not support pruning.






[jira] [Assigned] (SPARK-42504) NestedColumnAliasing support pruning adjacent projects

2023-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42504:


Assignee: (was: Apache Spark)

> NestedColumnAliasing support pruning adjacent projects
> --
>
> Key: SPARK-42504
> URL: https://issues.apache.org/jira/browse/SPARK-42504
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: XiDuo You
>Priority: Major
>
> CollapseProject won't combine adjacent projects into one, e.g. when a 
> non-cheap expression is accessed more than once by the project below. As a 
> result, some adjacent project nodes may appear that NestedColumnAliasing 
> does not support pruning.






[jira] [Commented] (SPARK-41823) DataFrame.join creating ambiguous column names

2023-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691290#comment-17691290
 ] 

Apache Spark commented on SPARK-41823:
--

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/40094

> DataFrame.join creating ambiguous column names
> --
>
> Key: SPARK-41823
> URL: https://issues.apache.org/jira/browse/SPARK-41823
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Ruifeng Zheng
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
> Failed example:
>     df.join(df2, df.name == df2.name, 'inner').drop('name').show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df.join(df2, df.name == df2.name, 'inner').drop('name').show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, 
> `name`].
>     Plan: {code}
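The error above can be reproduced conceptually without Spark: after the join, the schema contains two columns named `name`, so an unqualified reference cannot be resolved. A plain-Python sketch (the resolver here is a toy; in real PySpark, renaming one side first with `withColumnRenamed` is a common workaround):

```python
def resolve(schema, ref):
    """Toy resolver mimicking [AMBIGUOUS_REFERENCE]: an unqualified
    reference must match exactly one column in the joined schema."""
    matches = [c for c in schema if c == ref]
    if len(matches) > 1:
        raise ValueError(f"[AMBIGUOUS_REFERENCE] Reference `{ref}` is ambiguous")
    if not matches:
        raise KeyError(ref)
    return ref

joined = ["name", "age", "name", "height"]  # the join kept both `name` columns
ambiguous = False
try:
    resolve(joined, "name")
except ValueError:
    ambiguous = True

# Renaming one side first (e.g. df2.withColumnRenamed("name", "name2") in
# PySpark) leaves a single match, so the drop/show pipeline can resolve it:
renamed = ["name", "age", "name2", "height"]
assert ambiguous
assert resolve(renamed, "name") == "name"
```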






[jira] [Commented] (SPARK-41823) DataFrame.join creating ambiguous column names

2023-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691288#comment-17691288
 ] 

Apache Spark commented on SPARK-41823:
--

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/40094

> DataFrame.join creating ambiguous column names
> --
>
> Key: SPARK-41823
> URL: https://issues.apache.org/jira/browse/SPARK-41823
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Ruifeng Zheng
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
> Failed example:
>     df.join(df2, df.name == df2.name, 'inner').drop('name').show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df.join(df2, df.name == df2.name, 'inner').drop('name').show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, 
> `name`].
>     Plan: {code}
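The error above arises because both sides of the join carry a column named `name`. A common workaround, independent of the eventual fix in this ticket, is to rename the duplicate column on one side before joining. Below is a minimal plain-Python stand-in for that pattern (dicts for rows, since no live SparkSession is assumed here); in PySpark the equivalent would be `df2.withColumnRenamed("name", "name2")` or aliasing via `df.alias(...)` / `df2.alias(...)`.

```python
# Workaround sketch for [AMBIGUOUS_REFERENCE]: rename the duplicate
# column on one side before joining, so every column name is unique.
# Plain-Python stand-in; the PySpark equivalent is assumed to be
# df2.withColumnRenamed("name", "name2").

def rename_column(rows, old, new):
    """Return rows with column `old` renamed to `new`."""
    return [{(new if k == old else k): v for k, v in r.items()} for r in rows]

def inner_join(left, right, left_key, right_key):
    """Naive inner join on two distinct, unambiguous key names."""
    return [
        {**l, **r}
        for l in left
        for r in right
        if l[left_key] == r[right_key]
    ]

df = [{"name": "Alice", "age": 2}, {"name": "Bob", "age": 5}]
df2 = [{"name": "Alice", "height": 80}]

# After the rename, the join condition and any later drop() are unambiguous.
joined = inner_join(df, rename_column(df2, "name", "name2"), "name", "name2")
print(joined)
```

After the rename, each output row has distinct `name` and `name2` columns, so a later `drop("name")` targets exactly one column.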



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41823) DataFrame.join creating ambiguous column names

2023-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691289#comment-17691289
 ] 

Apache Spark commented on SPARK-41823:
--

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/40094

> DataFrame.join creating ambiguous column names
> --
>
> Key: SPARK-41823
> URL: https://issues.apache.org/jira/browse/SPARK-41823
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Sandeep Singh
>Assignee: Ruifeng Zheng
>Priority: Major
>
> {code:java}
> File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 254, in pyspark.sql.connect.dataframe.DataFrame.drop
> Failed example:
>     df.join(df2, df.name == df2.name, 'inner').drop('name').show()
> Exception raised:
>     Traceback (most recent call last):
>       File 
> "/usr/local/Cellar/python@3.10/3.10.8/Frameworks/Python.framework/Versions/3.10/lib/python3.10/doctest.py",
>  line 1350, in __run
>         exec(compile(example.source, filename, "single",
>       File "", line 
> 1, in 
>         df.join(df2, df.name == df2.name, 'inner').drop('name').show()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 534, in show
>         print(self._show_string(n, truncate, vertical))
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 423, in _show_string
>         ).toPandas()
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/dataframe.py", 
> line 1031, in toPandas
>         return self._session.client.to_pandas(query)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 413, in to_pandas
>         return self._execute_and_fetch(req)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 573, in _execute_and_fetch
>         self._handle_error(rpc_error)
>       File 
> "/Users/s.singh/personal/spark-oss/python/pyspark/sql/connect/client.py", 
> line 619, in _handle_error
>         raise SparkConnectAnalysisException(
>     pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [AMBIGUOUS_REFERENCE] Reference `name` is ambiguous, could be: [`name`, 
> `name`].
>     Plan: {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41812) DataFrame.join: ambiguous column

2023-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691287#comment-17691287
 ] 

Apache Spark commented on SPARK-41812:
--

User 'grundprinzip' has created a pull request for this issue:
https://github.com/apache/spark/pull/40094

> DataFrame.join: ambiguous column
> 
>
> Key: SPARK-41812
> URL: https://issues.apache.org/jira/browse/SPARK-41812
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Hyukjin Kwon
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.4.0
>
>
> {code}
> File "/.../spark/python/pyspark/sql/connect/column.py", line 106, in 
> pyspark.sql.connect.column.Column.eqNullSafe
> Failed example:
> df1.join(df2, df1["value"] == df2["value"]).count()
> Exception raised:
> Traceback (most recent call last):
>   File "/.../miniconda3/envs/python3.9/lib/python3.9/doctest.py", line 
> 1336, in __run
> exec(compile(example.source, filename, "single",
>   File "", line 
> 1, in 
> df1.join(df2, df1["value"] == df2["value"]).count()
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 151, in 
> count
> pdd = self.agg(_invoke_function("count", lit(1))).toPandas()
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1031, 
> in toPandas
> return self._session.client.to_pandas(query)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 413, in 
> to_pandas
> return self._execute_and_fetch(req)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 573, in 
> _execute_and_fetch
> self._handle_error(rpc_error)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 619, in 
> _handle_error
> raise SparkConnectAnalysisException(
> pyspark.sql.connect.client.SparkConnectAnalysisException: 
> [AMBIGUOUS_REFERENCE] Reference `value` is ambiguous, could be: [`value`, 
> `value`].
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42500) ConstantPropagation support more cases

2023-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691136#comment-17691136
 ] 

Apache Spark commented on SPARK-42500:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/40093

> ConstantPropagation support more cases
> --
>
> Key: SPARK-42500
> URL: https://issues.apache.org/jira/browse/SPARK-42500
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42500) ConstantPropagation support more cases

2023-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42500:


Assignee: Apache Spark

> ConstantPropagation support more cases
> --
>
> Key: SPARK-42500
> URL: https://issues.apache.org/jira/browse/SPARK-42500
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42500) ConstantPropagation support more cases

2023-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42500:


Assignee: (was: Apache Spark)

> ConstantPropagation support more cases
> --
>
> Key: SPARK-42500
> URL: https://issues.apache.org/jira/browse/SPARK-42500
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42500) ConstantPropagation support more cases

2023-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691135#comment-17691135
 ] 

Apache Spark commented on SPARK-42500:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/40093

> ConstantPropagation support more cases
> --
>
> Key: SPARK-42500
> URL: https://issues.apache.org/jira/browse/SPARK-42500
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42498) reduce spark connect service retry time

2023-02-20 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42498?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691056#comment-17691056
 ] 

Apache Spark commented on SPARK-42498:
--

User 'nija-at' has created a pull request for this issue:
https://github.com/apache/spark/pull/40066

> reduce spark connect service retry time
> ---
>
> Key: SPARK-42498
> URL: https://issues.apache.org/jira/browse/SPARK-42498
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.3.2
>Reporter: Niranjan Jayakar
>Priority: Major
>
> https://github.com/apache/spark/blob/5fc44dabe5084fb784f064afe691951a3c270793/python/pyspark/sql/connect/client.py#L411
>  
> Currently, 15 retries with the current backoff strategy result in the client
> sitting in the retry loop for ~400 seconds in the worst case. This means
> applications and users of the Spark Connect client will hang for more than
> 6 minutes with no response.
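To see how a capped exponential backoff adds up to several minutes, here is a small sketch. The retry count matches the 15 mentioned above, but the initial backoff, multiplier, and cap below are illustrative assumptions, not the client's actual defaults.

```python
# Sketch: worst-case wait of a capped exponential-backoff retry loop.
# The parameter values are illustrative assumptions, not the actual
# Spark Connect client configuration.

def total_backoff(retries, first_backoff_s, multiplier, max_backoff_s):
    """Sum the sleep intervals for `retries` attempts of capped
    exponential backoff."""
    total, backoff = 0.0, first_backoff_s
    for _ in range(retries):
        total += backoff
        backoff = min(backoff * multiplier, max_backoff_s)
    return total

# With 15 retries, even modest parameters keep the client in the
# loop for minutes once the backoff hits the cap.
worst_case = total_backoff(retries=15, first_backoff_s=0.05,
                           multiplier=2.0, max_backoff_s=60.0)
print(f"worst-case wait: ~{worst_case:.0f} s")
```

The total is dominated by the capped tail, which is why reducing either the retry count or the cap shortens the worst case most effectively.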



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42498) reduce spark connect service retry time

2023-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42498:


Assignee: Apache Spark

> reduce spark connect service retry time
> ---
>
> Key: SPARK-42498
> URL: https://issues.apache.org/jira/browse/SPARK-42498
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.3.2
>Reporter: Niranjan Jayakar
>Assignee: Apache Spark
>Priority: Major
>
> https://github.com/apache/spark/blob/5fc44dabe5084fb784f064afe691951a3c270793/python/pyspark/sql/connect/client.py#L411
>  
> Currently, 15 retries with the current backoff strategy result in the client
> sitting in the retry loop for ~400 seconds in the worst case. This means
> applications and users of the Spark Connect client will hang for more than
> 6 minutes with no response.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42498) reduce spark connect service retry time

2023-02-20 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42498:


Assignee: (was: Apache Spark)

> reduce spark connect service retry time
> ---
>
> Key: SPARK-42498
> URL: https://issues.apache.org/jira/browse/SPARK-42498
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.3.2
>Reporter: Niranjan Jayakar
>Priority: Major
>
> https://github.com/apache/spark/blob/5fc44dabe5084fb784f064afe691951a3c270793/python/pyspark/sql/connect/client.py#L411
>  
> Currently, 15 retries with the current backoff strategy result in the client
> sitting in the retry loop for ~400 seconds in the worst case. This means
> applications and users of the Spark Connect client will hang for more than
> 6 minutes with no response.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42475) Getting Started: Live Notebook for Spark Connect

2023-02-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42475:


Assignee: (was: Apache Spark)

> Getting Started: Live Notebook for Spark Connect
> 
>
> Key: SPARK-42475
> URL: https://issues.apache.org/jira/browse/SPARK-42475
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> It would be great to have a Live Notebook for Spark Connect in the [Getting 
> Started|https://spark.apache.org/docs/latest/api/python/getting_started/index.html]
>  section to help users quickly get started with Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42475) Getting Started: Live Notebook for Spark Connect

2023-02-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42475?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691016#comment-17691016
 ] 

Apache Spark commented on SPARK-42475:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40092

> Getting Started: Live Notebook for Spark Connect
> 
>
> Key: SPARK-42475
> URL: https://issues.apache.org/jira/browse/SPARK-42475
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Priority: Major
>
> It would be great to have a Live Notebook for Spark Connect in the [Getting 
> Started|https://spark.apache.org/docs/latest/api/python/getting_started/index.html]
>  section to help users quickly get started with Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42475) Getting Started: Live Notebook for Spark Connect

2023-02-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42475:


Assignee: Apache Spark

> Getting Started: Live Notebook for Spark Connect
> 
>
> Key: SPARK-42475
> URL: https://issues.apache.org/jira/browse/SPARK-42475
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> It would be great to have a Live Notebook for Spark Connect in the [Getting 
> Started|https://spark.apache.org/docs/latest/api/python/getting_started/index.html]
>  section to help users quickly get started with Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41952) Upgrade Parquet to fix off-heap memory leaks in Zstd codec

2023-02-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41952:


Assignee: Apache Spark

> Upgrade Parquet to fix off-heap memory leaks in Zstd codec
> --
>
> Key: SPARK-41952
> URL: https://issues.apache.org/jira/browse/SPARK-41952
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 3.1.3, 3.3.1, 3.2.3
>Reporter: Alexey Kudinkin
>Assignee: Apache Spark
>Priority: Critical
>
> Recently, a native memory leak was discovered in Parquet when it uses the
> Zstd decompressor from the luben/zstd-jni library (PARQUET-2160).
> This is problematic to the point where we can't use Parquet with Zstd, due to
> pervasive OOMs taking down our executors and disrupting our jobs.
> Luckily, a fix addressing this has already landed in Parquet:
> [https://github.com/apache/parquet-mr/pull/982]
>  
> Now, we just need to make sure that:
>  # an updated version of Parquet is released in a timely manner
>  # Spark is upgraded to this new version in the upcoming release
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41952) Upgrade Parquet to fix off-heap memory leaks in Zstd codec

2023-02-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691015#comment-17691015
 ] 

Apache Spark commented on SPARK-41952:
--

User 'pan3793' has created a pull request for this issue:
https://github.com/apache/spark/pull/40091

> Upgrade Parquet to fix off-heap memory leaks in Zstd codec
> --
>
> Key: SPARK-41952
> URL: https://issues.apache.org/jira/browse/SPARK-41952
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 3.1.3, 3.3.1, 3.2.3
>Reporter: Alexey Kudinkin
>Priority: Critical
>
> Recently, a native memory leak was discovered in Parquet when it uses the
> Zstd decompressor from the luben/zstd-jni library (PARQUET-2160).
> This is problematic to the point where we can't use Parquet with Zstd, due to
> pervasive OOMs taking down our executors and disrupting our jobs.
> Luckily, a fix addressing this has already landed in Parquet:
> [https://github.com/apache/parquet-mr/pull/982]
>  
> Now, we just need to make sure that:
>  # an updated version of Parquet is released in a timely manner
>  # Spark is upgraded to this new version in the upcoming release
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41952) Upgrade Parquet to fix off-heap memory leaks in Zstd codec

2023-02-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41952:


Assignee: (was: Apache Spark)

> Upgrade Parquet to fix off-heap memory leaks in Zstd codec
> --
>
> Key: SPARK-41952
> URL: https://issues.apache.org/jira/browse/SPARK-41952
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 3.1.3, 3.3.1, 3.2.3
>Reporter: Alexey Kudinkin
>Priority: Critical
>
> Recently, a native memory leak was discovered in Parquet when it uses the
> Zstd decompressor from the luben/zstd-jni library (PARQUET-2160).
> This is problematic to the point where we can't use Parquet with Zstd, due to
> pervasive OOMs taking down our executors and disrupting our jobs.
> Luckily, a fix addressing this has already landed in Parquet:
> [https://github.com/apache/parquet-mr/pull/982]
>  
> Now, we just need to make sure that:
>  # an updated version of Parquet is released in a timely manner
>  # Spark is upgraded to this new version in the upcoming release
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41741) [SQL] ParquetFilters StringStartsWith push down matching string do not use UTF-8

2023-02-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41741:


Assignee: (was: Apache Spark)

> [SQL] ParquetFilters StringStartsWith push down matching string do not use 
> UTF-8
> 
>
> Key: SPARK-41741
> URL: https://issues.apache.org/jira/browse/SPARK-41741
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jiale He
>Priority: Major
> Attachments: image-2022-12-28-18-00-00-861.png, 
> image-2022-12-28-18-00-21-586.png, image-2023-01-09-11-10-31-262.png, 
> image-2023-01-09-18-27-53-479.png, 
> part-0-30432312-7cdb-43ef-befe-93bcfd174878-c000.snappy.parquet
>
>
> Hello ~
>  
> I found a problem, and there are two ways to work around it.
>  
> When a Parquet filter is pushed down for a query using a LIKE '***%'
> statement, an error may occur if the system default encoding is not UTF-8.
>  
> As far as I know, there are two ways to bypass this problem:
> 1. spark.executor.extraJavaOptions="-Dfile.encoding=UTF-8"
> 2. spark.sql.parquet.filterPushdown.string.startsWith=false
>  
> The following information reproduces the problem.
> The Parquet sample file is in the attachment:
> {code:java}
> spark.read.parquet("file:///home/kylin/hjldir/part-0-30432312-7cdb-43ef-befe-93bcfd174878-c000.snappy.parquet").createTempView("tmp")
> spark.sql("select * from tmp where `1` like '啦啦乐乐%'").show(false) {code}
> !image-2022-12-28-18-00-00-861.png|width=879,height=430!
>  
>   !image-2022-12-28-18-00-21-586.png|width=799,height=731!
>  
> I think the correct code should be:
> {code:java}
> private val strToBinary = 
> Binary.fromReusedByteArray(v.getBytes(StandardCharsets.UTF_8)) {code}
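The quoted one-line fix pins the charset when converting the pushdown prefix to bytes: in the JVM, `String.getBytes()` with no argument uses the platform-default encoding, while Parquet stores strings as UTF-8. A quick Python illustration (a stand-in for the JVM behaviour, using GBK as an example non-UTF-8 default) shows why the byte sequences diverge:

```python
# Sketch of the root cause: the same string yields different byte
# sequences under different encodings. Parquet stores strings as
# UTF-8, so a prefix filter built from platform-default-encoded
# bytes (GBK here, as an example) would never match the stored data.

s = "啦啦乐乐"

utf8_bytes = s.encode("utf-8")  # what Parquet actually stores (3 bytes/char)
gbk_bytes = s.encode("gbk")     # what a non-UTF-8 default could produce

print(utf8_bytes.hex())
print(gbk_bytes.hex())
assert utf8_bytes != gbk_bytes  # the pushed-down prefix cannot match
```

Passing `StandardCharsets.UTF_8` to `getBytes`, as in the quoted fix, removes the dependency on the platform default entirely.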



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-41741) [SQL] ParquetFilters StringStartsWith push down matching string do not use UTF-8

2023-02-19 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41741?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-41741:


Assignee: Apache Spark

> [SQL] ParquetFilters StringStartsWith push down matching string do not use 
> UTF-8
> 
>
> Key: SPARK-41741
> URL: https://issues.apache.org/jira/browse/SPARK-41741
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jiale He
>Assignee: Apache Spark
>Priority: Major
> Attachments: image-2022-12-28-18-00-00-861.png, 
> image-2022-12-28-18-00-21-586.png, image-2023-01-09-11-10-31-262.png, 
> image-2023-01-09-18-27-53-479.png, 
> part-0-30432312-7cdb-43ef-befe-93bcfd174878-c000.snappy.parquet
>
>
> Hello ~
>  
> I found a problem, and there are two ways to work around it.
>  
> When a Parquet filter is pushed down for a query using a LIKE '***%'
> statement, an error may occur if the system default encoding is not UTF-8.
>  
> As far as I know, there are two ways to bypass this problem:
> 1. spark.executor.extraJavaOptions="-Dfile.encoding=UTF-8"
> 2. spark.sql.parquet.filterPushdown.string.startsWith=false
>  
> The following information reproduces the problem.
> The Parquet sample file is in the attachment:
> {code:java}
> spark.read.parquet("file:///home/kylin/hjldir/part-0-30432312-7cdb-43ef-befe-93bcfd174878-c000.snappy.parquet").createTempView("tmp")
> spark.sql("select * from tmp where `1` like '啦啦乐乐%'").show(false) {code}
> !image-2022-12-28-18-00-00-861.png|width=879,height=430!
>  
>   !image-2022-12-28-18-00-21-586.png|width=799,height=731!
>  
> I think the correct code should be:
> {code:java}
> private val strToBinary = 
> Binary.fromReusedByteArray(v.getBytes(StandardCharsets.UTF_8)) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-41741) [SQL] ParquetFilters StringStartsWith push down matching string do not use UTF-8

2023-02-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-41741?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691010#comment-17691010
 ] 

Apache Spark commented on SPARK-41741:
--

User 'wangyum' has created a pull request for this issue:
https://github.com/apache/spark/pull/40090

> [SQL] ParquetFilters StringStartsWith push down matching string do not use 
> UTF-8
> 
>
> Key: SPARK-41741
> URL: https://issues.apache.org/jira/browse/SPARK-41741
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Jiale He
>Priority: Major
> Attachments: image-2022-12-28-18-00-00-861.png, 
> image-2022-12-28-18-00-21-586.png, image-2023-01-09-11-10-31-262.png, 
> image-2023-01-09-18-27-53-479.png, 
> part-0-30432312-7cdb-43ef-befe-93bcfd174878-c000.snappy.parquet
>
>
> Hello ~
>  
> I found a problem, and there are two ways to work around it.
>  
> When a Parquet filter is pushed down for a query using a LIKE '***%'
> statement, an error may occur if the system default encoding is not UTF-8.
>  
> As far as I know, there are two ways to bypass this problem:
> 1. spark.executor.extraJavaOptions="-Dfile.encoding=UTF-8"
> 2. spark.sql.parquet.filterPushdown.string.startsWith=false
>  
> The following information reproduces the problem.
> The Parquet sample file is in the attachment:
> {code:java}
> spark.read.parquet("file:///home/kylin/hjldir/part-0-30432312-7cdb-43ef-befe-93bcfd174878-c000.snappy.parquet").createTempView("tmp")
> spark.sql("select * from tmp where `1` like '啦啦乐乐%'").show(false) {code}
> !image-2022-12-28-18-00-00-861.png|width=879,height=430!
>  
>   !image-2022-12-28-18-00-21-586.png|width=799,height=731!
>  
> I think the correct code should be:
> {code:java}
> private val strToBinary = 
> Binary.fromReusedByteArray(v.getBytes(StandardCharsets.UTF_8)) {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42495) Scala Client: Add 2nd batch of functions

2023-02-19 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17691001#comment-17691001
 ] 

Apache Spark commented on SPARK-42495:
--

User 'hvanhovell' has created a pull request for this issue:
https://github.com/apache/spark/pull/40089

> Scala Client: Add 2nd batch of functions
> 
>
> Key: SPARK-42495
> URL: https://issues.apache.org/jira/browse/SPARK-42495
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Herman van Hövell
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


