[jira] [Commented] (SPARK-42555) Add JDBC to DataFrameReader

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42555?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696410#comment-17696410
 ] 

Apache Spark commented on SPARK-42555:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40277

> Add JDBC to DataFrameReader
> ---
>
> Key: SPARK-42555
> URL: https://issues.apache.org/jira/browse/SPARK-42555
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42638) current_user() is blocked from VALUES, but current_timestamp() is not

2023-03-03 Thread zzzzming95 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42638?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696409#comment-17696409
 ] 

zzzzming95 commented on SPARK-42638:


Maybe we can use `INSERT ... SELECT` to achieve the same effect?
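A minimal spark-shell sketch of that workaround, assuming a hypothetical target table `t` (illustration only, not code from the issue):

{code:java}
// Inline tables (VALUES) reject current_user(), but a regular SELECT does not,
// so routing the value through INSERT ... SELECT sidesteps the restriction.
spark.sql("CREATE TABLE t (u STRING) USING parquet")  // hypothetical table
spark.sql("INSERT INTO t SELECT current_user()")      // no inline table involved
spark.sql("SELECT * FROM t").show()
{code}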

> current_user() is blocked from VALUES, but current_timestamp() is not
> -
>
> Key: SPARK-42638
> URL: https://issues.apache.org/jira/browse/SPARK-42638
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Serge Rielau
>Priority: Major
>
> VALUES(current_user());
> returns:
> cannot evaluate expression current_user() in inline table definition.; line 1 
> pos 8
>  
> The same statement with current_timestamp() works.
> It appears current_user() is recognized as non-deterministic, but it is 
> constant within the statement, just like current_timestamp().
> PS: It's not clear why we block non-deterministic functions to begin with.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-40885) Spark will filter out data field sorting when dynamic partitions and data fields are sorted at the same time

2023-03-03 Thread zzzzming95 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zzzzming95 resolved SPARK-40885.

Resolution: Fixed

> Spark will filter out data field sorting when dynamic partitions and data 
> fields are sorted at the same time
> 
>
> Key: SPARK-40885
> URL: https://issues.apache.org/jira/browse/SPARK-40885
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.1.2, 3.3.0, 3.2.2, 3.4.0
>Reporter: zzzzming95
>Priority: Major
> Attachments: 1666494504884.jpg
>
>
> When writing with dynamic partitions and sorting by both the partition field 
> and a data field, Spark drops the sort on the data field.
>  
> Reproduce SQL:
> {code:java}
> CREATE TABLE `sort_table`(
>   `id` int,
>   `name` string)
> PARTITIONED BY (
>   `dt` string)
> stored as textfile
> LOCATION 'sort_table';
> 
> CREATE TABLE `test_table`(
>   `id` int,
>   `name` string)
> PARTITIONED BY (
>   `dt` string)
> stored as textfile
> LOCATION 'test_table';
> 
> -- generate test data
> insert into test_table partition(dt=20221011)
> select 10,"15" union all select 1,"10" union all select 5,"50"
> union all select 20,"2" union all select 30,"14";
> 
> set spark.hadoop.hive.exec.dynamic.partition=true;
> set spark.hadoop.hive.exec.dynamic.partition.mode=nonstrict;
> 
> -- this sql sorts by the partition field (`dt`) and a data field (`name`),
> -- but the sort on `name` does not take effect
> insert overwrite table sort_table partition(dt)
> select id, name, dt from test_table order by name, dt;
> {code}
>  
> The Sort operator in the DAG has only one sort field, but the SQL actually 
> specifies two. (See the attached image.)
>  
> It relates to this issue: https://issues.apache.org/jira/browse/SPARK-40588



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42556) Dataset.colregex should link a plan_id when it only matches a single column.

2023-03-03 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42556?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-42556.
---
Fix Version/s: 3.4.0
   Resolution: Fixed

Issue resolved by pull request 40265
[https://github.com/apache/spark/pull/40265]

> Dataset.colregex should link a plan_id when it only matches a single column.
> 
>
> Key: SPARK-42556
> URL: https://issues.apache.org/jira/browse/SPARK-42556
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
> Fix For: 3.4.0
>
>
> When colregex returns a single column, it should link the plan's plan_id. For 
> reference, here is the non-connect Dataset code that does this:
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1512]
> This also needs to be fixed for the Python client.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42562) UnresolvedLambdaVariables in python do not need unique names

2023-03-03 Thread Jira


[ 
https://issues.apache.org/jira/browse/SPARK-42562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696397#comment-17696397
 ] 

Herman van Hövell commented on SPARK-42562:
---

It currently generates unique names, but it doesn't need to. I think we should 
remove that.

> UnresolvedLambdaVariables in python do not need unique names
> 
>
> Key: SPARK-42562
> URL: https://issues.apache.org/jira/browse/SPARK-42562
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> UnresolvedLambdaVariables do not need unique names in python. We already did 
> this for the scala client, and it is good to have parity between the two 
> implementations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42669) Short circuit local relation RPCs

2023-03-03 Thread Jira
Herman van Hövell created SPARK-42669:
-

 Summary: Short circuit local relation RPCs
 Key: SPARK-42669
 URL: https://issues.apache.org/jira/browse/SPARK-42669
 Project: Spark
  Issue Type: New Feature
  Components: Connect
Affects Versions: 3.4.0
Reporter: Herman van Hövell


Operations on a LocalRelation can mostly be done locally (without sending 
RPCs). We should leverage this.
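As a hedged illustration of the idea (not the actual Connect client code), a request like the schema of a client-side local relation could be answered without a round trip:

{code:java}
// The client already holds the data and schema of a local relation,
// so e.g. a schema request need not go over the wire.
import spark.implicits._
val df = Seq((1, "a"), (2, "b")).toDF("id", "value")  // becomes a LocalRelation
df.schema  // candidate for a local, RPC-free answer
{code}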



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42552) Get ParseException when run sql: "SELECT 1 UNION SELECT 1;"

2023-03-03 Thread jiang13021 (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42552?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696395#comment-17696395
 ] 

jiang13021 commented on SPARK-42552:


The problem may be at this location: 
[https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/parser/ParseDriver.scala#L126]

When the `PredictionMode` is `SLL`, `AstBuilder` throws `ParseException` 
instead of `ParseCancellationException`, so the parser doesn't retry in `LL` 
mode. In fact, if we use `LL` mode, we can parse this SQL correctly.
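For reference, a simplified sketch of the two-stage ANTLR strategy in `ParseDriver.parse` (assuming `parser`, `tokenStream`, and `toResult` as defined in that file); only a `ParseCancellationException` from the fast pass triggers the `LL` retry:

{code:java}
import org.antlr.v4.runtime.atn.PredictionMode
import org.antlr.v4.runtime.misc.ParseCancellationException

try {
  // Fast path: SLL prediction is cheaper but not complete.
  parser.getInterpreter.setPredictionMode(PredictionMode.SLL)
  toResult(parser)
} catch {
  case _: ParseCancellationException =>
    // Expected signal to retry: rewind and re-parse with full LL prediction.
    tokenStream.seek(0)
    parser.reset()
    parser.getInterpreter.setPredictionMode(PredictionMode.LL)
    toResult(parser)
}
{code}

If the `SLL` pass throws a `ParseException` instead, this catch does not match and the `LL` retry is skipped, which is the behavior described above.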

> Get ParseException when run sql: "SELECT 1 UNION SELECT 1;"
> ---
>
> Key: SPARK-42552
> URL: https://issues.apache.org/jira/browse/SPARK-42552
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.3
> Environment: Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 
> 1.8.0_345)
> Spark version 3.2.3-SNAPSHOT
>Reporter: jiang13021
>Priority: Major
> Fix For: 3.2.3
>
>
> When I run this SQL:
> {code:java}
> scala> spark.sql("SELECT 1 UNION SELECT 1;") {code}
> I get ParseException:
> {code:java}
> org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input 'SELECT' expecting {<EOF>, ';'}(line 1, pos 15)
> 
> == SQL ==
> SELECT 1 UNION SELECT 1;
> ---------------^^^
> 
>   at 
> org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:266)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:127)
>   at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:77)
>   at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:616)
>   at 
> org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
>   at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:616)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
>   ... 47 elided
>  {code}
> If I run it with parentheses, it works well:
> {code:java}
> scala> spark.sql("(SELECT 1) UNION (SELECT 1);") 
> res4: org.apache.spark.sql.DataFrame = [1: int]{code}
> This appears to be a bug.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42630) Make `parse_data_type` use new proto message `DDLParse`

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696392#comment-17696392
 ] 

Apache Spark commented on SPARK-42630:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40276

> Make `parse_data_type` use new proto message `DDLParse`
> ---
>
> Key: SPARK-42630
> URL: https://issues.apache.org/jira/browse/SPARK-42630
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42630) Make `parse_data_type` use new proto message `DDLParse`

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42630?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696391#comment-17696391
 ] 

Apache Spark commented on SPARK-42630:
--

User 'ueshin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40276

> Make `parse_data_type` use new proto message `DDLParse`
> ---
>
> Key: SPARK-42630
> URL: https://issues.apache.org/jira/browse/SPARK-42630
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.4.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42555) Add JDBC to DataFrameReader

2023-03-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-42555.
---
Fix Version/s: 3.4.1
 Assignee: jiaan.geng
   Resolution: Fixed

> Add JDBC to DataFrameReader
> ---
>
> Key: SPARK-42555
> URL: https://issues.apache.org/jira/browse/SPARK-42555
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: jiaan.geng
>Priority: Major
> Fix For: 3.4.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42562) UnresolvedLambdaVariables in python do not need unique names

2023-03-03 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696390#comment-17696390
 ] 

jiaan.geng commented on SPARK-42562:


[~hvanhovell] I don't understand this issue. Could you tell me where 
UnresolvedLambdaVariables need unique names in Python?

> UnresolvedLambdaVariables in python do not need unique names
> 
>
> Key: SPARK-42562
> URL: https://issues.apache.org/jira/browse/SPARK-42562
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> UnresolvedLambdaVariables do not need unique names in python. We already did 
> this for the scala client, and it is good to have parity between the two 
> implementations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-42562) UnresolvedLambdaVariables in python do not need unique names

2023-03-03 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696390#comment-17696390
 ] 

jiaan.geng edited comment on SPARK-42562 at 3/4/23 2:59 AM:


[~hvanhovell] I don't understand this issue. Could you tell me where 
UnresolvedLambdaVariables need unique names in Python?


was (Author: beliefer):
[~hvanhovell]I don't understand this issue. Could you tell me where 
UnresolvedLambdaVariables need unique names in python ?

> UnresolvedLambdaVariables in python do not need unique names
> 
>
> Key: SPARK-42562
> URL: https://issues.apache.org/jira/browse/SPARK-42562
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> UnresolvedLambdaVariables do not need unique names in python. We already did 
> this for the scala client, and it is good to have parity between the two 
> implementations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42552) Get ParseException when run sql: "SELECT 1 UNION SELECT 1;"

2023-03-03 Thread jiang13021 (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

jiang13021 updated SPARK-42552:
---
Priority: Major  (was: Minor)

> Get ParseException when run sql: "SELECT 1 UNION SELECT 1;"
> ---
>
> Key: SPARK-42552
> URL: https://issues.apache.org/jira/browse/SPARK-42552
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.3
> Environment: Scala version 2.12.15 (OpenJDK 64-Bit Server VM, Java 
> 1.8.0_345)
> Spark version 3.2.3-SNAPSHOT
>Reporter: jiang13021
>Priority: Major
> Fix For: 3.2.3
>
>
> When I run this SQL:
> {code:java}
> scala> spark.sql("SELECT 1 UNION SELECT 1;") {code}
> I get ParseException:
> {code:java}
> org.apache.spark.sql.catalyst.parser.ParseException:
> mismatched input 'SELECT' expecting {<EOF>, ';'}(line 1, pos 15)
> 
> == SQL ==
> SELECT 1 UNION SELECT 1;
> ---------------^^^
> 
>   at 
> org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:266)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:127)
>   at 
> org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:51)
>   at 
> org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:77)
>   at org.apache.spark.sql.SparkSession.$anonfun$sql$2(SparkSession.scala:616)
>   at 
> org.apache.spark.sql.catalyst.QueryPlanningTracker.measurePhase(QueryPlanningTracker.scala:111)
>   at org.apache.spark.sql.SparkSession.$anonfun$sql$1(SparkSession.scala:616)
>   at org.apache.spark.sql.SparkSession.withActive(SparkSession.scala:775)
>   at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:613)
>   ... 47 elided
>  {code}
> If I run it with parentheses, it works well:
> {code:java}
> scala> spark.sql("(SELECT 1) UNION (SELECT 1);") 
> res4: org.apache.spark.sql.DataFrame = [1: int]{code}
> This appears to be a bug.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-42557) Add Broadcast to functions

2023-03-03 Thread jiaan.geng (Jira)


[ https://issues.apache.org/jira/browse/SPARK-42557 ]


jiaan.geng deleted comment on SPARK-42557:


was (Author: beliefer):
I will take a look!

> Add Broadcast to functions
> --
>
> Key: SPARK-42557
> URL: https://issues.apache.org/jira/browse/SPARK-42557
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Add the {{broadcast}} function to functions.scala. Please check if we can get 
> the same semantics as the current implementation using unresolved hints.
> https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1246-L1261
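A hedged sketch of one way the Connect client could get those semantics through hints (not the actual implementation; `Dataset.hint` emits an unresolved "broadcast" hint that the server resolves):

{code:java}
// Route broadcast through the existing hint mechanism instead of building
// a ResolvedHint on the client, which the Connect client cannot do.
def broadcast[T](df: Dataset[T]): Dataset[T] = df.hint("broadcast")

// Usage: mark the smaller side of a join for a broadcast join.
// largeDf.join(broadcast(smallDf), "key")
{code}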



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42557) Add Broadcast to functions

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42557:


Assignee: (was: Apache Spark)

> Add Broadcast to functions
> --
>
> Key: SPARK-42557
> URL: https://issues.apache.org/jira/browse/SPARK-42557
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Add the {{broadcast}} function to functions.scala. Please check if we can get 
> the same semantics as the current implementation using unresolved hints.
> https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1246-L1261



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42557) Add Broadcast to functions

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42557?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42557:


Assignee: Apache Spark

> Add Broadcast to functions
> --
>
> Key: SPARK-42557
> URL: https://issues.apache.org/jira/browse/SPARK-42557
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Assignee: Apache Spark
>Priority: Major
>
> Add the {{broadcast}} function to functions.scala. Please check if we can get 
> the same semantics as the current implementation using unresolved hints.
> https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1246-L1261



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42557) Add Broadcast to functions

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696387#comment-17696387
 ] 

Apache Spark commented on SPARK-42557:
--

User 'beliefer' has created a pull request for this issue:
https://github.com/apache/spark/pull/40275

> Add Broadcast to functions
> --
>
> Key: SPARK-42557
> URL: https://issues.apache.org/jira/browse/SPARK-42557
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Add the {{broadcast}} function to functions.scala. Please check if we can get 
> the same semantics as the current implementation using unresolved hints.
> https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1246-L1261



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42562) UnresolvedLambdaVariables in python do not need unique names

2023-03-03 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696386#comment-17696386
 ] 

jiaan.geng commented on SPARK-42562:


I will take a look!

> UnresolvedLambdaVariables in python do not need unique names
> 
>
> Key: SPARK-42562
> URL: https://issues.apache.org/jira/browse/SPARK-42562
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> UnresolvedLambdaVariables do not need unique names in python. We already did 
> this for the scala client, and it is good to have parity between the two 
> implementations.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42563) Implement SparkSession.newSession

2023-03-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-42563.
---
Resolution: Duplicate

> Implement SparkSession.newSession
> -
>
> Key: SPARK-42563
> URL: https://issues.apache.org/jira/browse/SPARK-42563
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Implement SparkSession.newSession for Connect.
> {code:java}
> /**
>  * Start a new session with isolated SQL configurations, temporary tables, 
> registered
>  * functions are isolated, but sharing the underlying `SparkContext` and 
> cached data.
>  *
>  * @note Other than the `SparkContext`, all shared state is initialized 
> lazily.
>  * This method will force the initialization of the shared state to ensure 
> that parent
>  * and child sessions are set up with the same shared state. If the 
> underlying catalog
>  * implementation is Hive, this will initialize the metastore, which may take 
> some time.
>  *
>  * @since 2.0.0
>  */
> def newSession(): SparkSession {code}
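A usage sketch of the isolation described above, assuming a spark-shell style `spark` session:

{code:java}
// The child session shares the SparkContext but gets isolated SQL conf,
// temporary views, and registered functions.
val child = spark.newSession()
child.conf.set("spark.sql.shuffle.partitions", "8")  // does not affect `spark`
spark.conf.get("spark.sql.shuffle.partitions")       // unchanged in the parent
{code}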



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42467) Spark Connect Scala Client: GroupBy and Aggregation

2023-03-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-42467.
---
Fix Version/s: 3.4.1
   Resolution: Fixed

> Spark Connect Scala Client: GroupBy and Aggregation
> ---
>
> Key: SPARK-42467
> URL: https://issues.apache.org/jira/browse/SPARK-42467
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42667) Spark Connect: newSession API

2023-03-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell updated SPARK-42667:
--
Epic Link: SPARK-42554

> Spark Connect: newSession API
> -
>
> Key: SPARK-42667
> URL: https://issues.apache.org/jira/browse/SPARK-42667
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42667) Spark Connect: newSession API

2023-03-03 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-42667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-42667.
---
Fix Version/s: 3.4.1
   Resolution: Fixed

> Spark Connect: newSession API
> -
>
> Key: SPARK-42667
> URL: https://issues.apache.org/jira/browse/SPARK-42667
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
> Fix For: 3.4.1
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42175) Implement more methods in the Scala Client Dataset API

2023-03-03 Thread Zhen Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Li resolved SPARK-42175.
-
Resolution: Duplicate

> Implement more methods in the Scala Client Dataset API
> --
>
> Key: SPARK-42175
> URL: https://issues.apache.org/jira/browse/SPARK-42175
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> Also fix the TODOs in the MiMa compatibility test. 
> https://github.com/apache/spark/pull/39712



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42215) Better Scala Client Integration test

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42215:


Assignee: (was: Apache Spark)

> Better Scala Client Integration test
> 
>
> Key: SPARK-42215
> URL: https://issues.apache.org/jira/browse/SPARK-42215
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> The current Scala client has a few integration tests that require a build 
> before the client tests can run. This is not very convenient for Maven 
> developers, as they cannot simply run `mvn clean install` to execute all tests.
>  
> Look into marking these tests as ITs, and into other ways for Maven to run 
> tests after the packages are built.
>  
> Make sure the tests run in SBT as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42215) Better Scala Client Integration test

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696378#comment-17696378
 ] 

Apache Spark commented on SPARK-42215:
--

User 'zhenlineo' has created a pull request for this issue:
https://github.com/apache/spark/pull/40274

> Better Scala Client Integration test
> 
>
> Key: SPARK-42215
> URL: https://issues.apache.org/jira/browse/SPARK-42215
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Priority: Major
>
> The current Scala client has a few integration tests that require a build 
> before the client tests can run. This is not very convenient for Maven 
> developers, as they cannot simply run `mvn clean install` to execute all tests.
>  
> Look into marking these tests as ITs, and into other ways for Maven to run 
> tests after the packages are built.
>  
> Make sure the tests run in SBT as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42215) Better Scala Client Integration test

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42215?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42215:


Assignee: Apache Spark

> Better Scala Client Integration test
> 
>
> Key: SPARK-42215
> URL: https://issues.apache.org/jira/browse/SPARK-42215
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Zhen Li
>Assignee: Apache Spark
>Priority: Major
>
> The current Scala client has a few integration tests that require a build 
> before the client tests can run. This is not very convenient for Maven 
> developers, as they cannot simply run `mvn clean install` to execute all tests.
>  
> Look into marking these tests as ITs, and into other ways for Maven to run 
> tests after the packages are built.
>  
> Make sure the tests run in SBT as well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42668) Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696374#comment-17696374
 ] 

Apache Spark commented on SPARK-42668:
--

User 'anishshri-db' has created a pull request for this issue:
https://github.com/apache/spark/pull/40273

> Catch exception while trying to close compressed stream in 
> HDFSStateStoreProvider abort
> ---
>
> Key: SPARK-42668
> URL: https://issues.apache.org/jira/browse/SPARK-42668
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Catch exception while trying to close compressed stream in 
> HDFSStateStoreProvider abort
> We have seen some cases where the task exits as cancelled/failed which 
> triggers the abort in the task completion listener for 
> HDFSStateStoreProvider. As part of this, we cancel the backing stream and 
> close the compressed stream. However, different stores such as Azure blob 
> store could throw exceptions which are not caught in the current path, 
> leading to job failures. This change proposes to fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42668) Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42668:


Assignee: (was: Apache Spark)

> Catch exception while trying to close compressed stream in 
> HDFSStateStoreProvider abort
> ---
>
> Key: SPARK-42668
> URL: https://issues.apache.org/jira/browse/SPARK-42668
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Catch exception while trying to close compressed stream in 
> HDFSStateStoreProvider abort
> We have seen some cases where the task exits as cancelled/failed which 
> triggers the abort in the task completion listener for 
> HDFSStateStoreProvider. As part of this, we cancel the backing stream and 
> close the compressed stream. However, different stores such as Azure blob 
> store could throw exceptions which are not caught in the current path, 
> leading to job failures. This change proposes to fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42668) Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42668?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42668:


Assignee: Apache Spark

> Catch exception while trying to close compressed stream in 
> HDFSStateStoreProvider abort
> ---
>
> Key: SPARK-42668
> URL: https://issues.apache.org/jira/browse/SPARK-42668
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Assignee: Apache Spark
>Priority: Major
>
> Catch exception while trying to close compressed stream in 
> HDFSStateStoreProvider abort
> We have seen some cases where the task exits as cancelled/failed which 
> triggers the abort in the task completion listener for 
> HDFSStateStoreProvider. As part of this, we cancel the backing stream and 
> close the compressed stream. However, different stores such as Azure blob 
> store could throw exceptions which are not caught in the current path, 
> leading to job failures. This change proposes to fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42668) Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort

2023-03-03 Thread Anish Shrigondekar (Jira)
Anish Shrigondekar created SPARK-42668:
--

 Summary: Catch exception while trying to close compressed stream 
in HDFSStateStoreProvider abort
 Key: SPARK-42668
 URL: https://issues.apache.org/jira/browse/SPARK-42668
 Project: Spark
  Issue Type: Task
  Components: Structured Streaming
Affects Versions: 3.4.0
Reporter: Anish Shrigondekar


Catch exception while trying to close compressed stream in 
HDFSStateStoreProvider abort

We have seen some cases where the task exits as cancelled/failed which triggers 
the abort in the task completion listener for HDFSStateStoreProvider. As part 
of this, we cancel the backing stream and close the compressed stream. However, 
different stores such as Azure blob store could throw exceptions which are not 
caught in the current path, leading to job failures. This change proposes to 
fix this issue.
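A hedged sketch of the proposed hardening; `compressedStream`, `cancelBackingStream`, and `logWarning` stand in for the actual HDFSStateStoreProvider members and are illustrative only:

{code:java}
import scala.util.control.NonFatal

def abort(): Unit = {
  // Some stores (e.g. Azure blob store) can throw on close; catch and log so
  // that aborting a cancelled/failed task does not itself fail the job.
  try {
    compressedStream.close()
  } catch {
    case NonFatal(e) =>
      logWarning(s"Exception while closing compressed stream during abort: $e")
  }
  cancelBackingStream()  // always cancel the backing stream
}
{code}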



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42668) Catch exception while trying to close compressed stream in HDFSStateStoreProvider abort

2023-03-03 Thread Anish Shrigondekar (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42668?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696368#comment-17696368
 ] 

Anish Shrigondekar commented on SPARK-42668:


Will send out the fix soon

 

cc - [~kabhwan] 

> Catch exception while trying to close compressed stream in 
> HDFSStateStoreProvider abort
> ---
>
> Key: SPARK-42668
> URL: https://issues.apache.org/jira/browse/SPARK-42668
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 3.4.0
>Reporter: Anish Shrigondekar
>Priority: Major
>
> Catch exception while trying to close compressed stream in 
> HDFSStateStoreProvider abort
> We have seen some cases where the task exits as cancelled/failed which 
> triggers the abort in the task completion listener for 
> HDFSStateStoreProvider. As part of this, we cancel the backing stream and 
> close the compressed stream. However, different stores such as Azure blob 
> store could throw exceptions which are not caught in the current path, 
> leading to job failures. This change proposes to fix this issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42667) Spark Connect: newSession API

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42667?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696350#comment-17696350
 ] 

Apache Spark commented on SPARK-42667:
--

User 'amaliujia' has created a pull request for this issue:
https://github.com/apache/spark/pull/40272

> Spark Connect: newSession API
> -
>
> Key: SPARK-42667
> URL: https://issues.apache.org/jira/browse/SPARK-42667
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42667) Spark Connect: newSession API

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42667:


Assignee: Rui Wang  (was: Apache Spark)

> Spark Connect: newSession API
> -
>
> Key: SPARK-42667
> URL: https://issues.apache.org/jira/browse/SPARK-42667
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42667) Spark Connect: newSession API

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42667:


Assignee: Apache Spark  (was: Rui Wang)

> Spark Connect: newSession API
> -
>
> Key: SPARK-42667
> URL: https://issues.apache.org/jira/browse/SPARK-42667
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Rui Wang
>Assignee: Apache Spark
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42667) Spark Connect: newSession API

2023-03-03 Thread Rui Wang (Jira)
Rui Wang created SPARK-42667:


 Summary: Spark Connect: newSession API
 Key: SPARK-42667
 URL: https://issues.apache.org/jira/browse/SPARK-42667
 Project: Spark
  Issue Type: Task
  Components: Connect
Affects Versions: 3.4.1
Reporter: Rui Wang
Assignee: Rui Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42666) Fix `createDataFrame` to work properly with rows and schema

2023-03-03 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42666:

Summary: Fix `createDataFrame` to work properly with rows and schema  (was: 
Fix `createDataFrame` to work properly)

> Fix `createDataFrame` to work properly with rows and schema
> ---
>
> Key: SPARK-42666
> URL: https://issues.apache.org/jira/browse/SPARK-42666
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The code below is not working properly in Spark Connect:
> {code:java}
> >>> sdf = spark.range(10)
> >>> spark.createDataFrame(sdf.tail(5), sdf.schema) 
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 94, in 
> __repr__
>     return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes))
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 162, in 
> dtypes
>     return [(str(f.name), f.dataType.simpleString()) for f in 
> self.schema.fields]
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1346, in 
> schema
>     self._schema = self._session.client.schema(query)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 614, in schema
>     proto_schema = self._analyze(method="schema", plan=plan).schema
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 755, in 
> _analyze
>     self._handle_error(rpc_error)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 894, in 
> _handle_error
>     raise convert_exception(info, status.message) from None
> pyspark.errors.exceptions.connect.AnalysisException: 
> [NULLABLE_COLUMN_OR_FIELD] Column or field `id` is nullable while it's 
> required to be non-nullable.{code}
> whereas it works properly in regular PySpark:
> {code:java}
> >>> sdf = spark.range(10)
> >>> spark.createDataFrame(sdf.tail(5), sdf.schema).show()
> +---+
> | id|
> +---+
> |  5|
> |  6|
> |  7|
> |  8|
> |  9|
> +---+ {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42666) Fix `createDataFrame` to work properly

2023-03-03 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42666:

Summary: Fix `createDataFrame` to work properly  (was: Fix `tail` to work 
properly)

> Fix `createDataFrame` to work properly
> --
>
> Key: SPARK-42666
> URL: https://issues.apache.org/jira/browse/SPARK-42666
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> The code below is not working properly in Spark Connect:
> {code:java}
> >>> sdf = spark.range(10)
> >>> spark.createDataFrame(sdf.tail(5), sdf.schema) 
> Traceback (most recent call last):
>   File "", line 1, in 
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 94, in 
> __repr__
>     return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes))
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 162, in 
> dtypes
>     return [(str(f.name), f.dataType.simpleString()) for f in 
> self.schema.fields]
>   File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1346, in 
> schema
>     self._schema = self._session.client.schema(query)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 614, in schema
>     proto_schema = self._analyze(method="schema", plan=plan).schema
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 755, in 
> _analyze
>     self._handle_error(rpc_error)
>   File "/.../spark/python/pyspark/sql/connect/client.py", line 894, in 
> _handle_error
>     raise convert_exception(info, status.message) from None
> pyspark.errors.exceptions.connect.AnalysisException: 
> [NULLABLE_COLUMN_OR_FIELD] Column or field `id` is nullable while it's 
> required to be non-nullable.{code}
> whereas it works properly in regular PySpark:
> {code:java}
> >>> sdf = spark.range(10)
> >>> spark.createDataFrame(sdf.tail(5), sdf.schema).show()
> +---+
> | id|
> +---+
> |  5|
> |  6|
> |  7|
> |  8|
> |  9|
> +---+ {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42662) Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696317#comment-17696317
 ] 

Apache Spark commented on SPARK-42662:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40270

> Support `withSequenceColumn` as PySpark DataFrame internal function.
> 
>
> Key: SPARK-42662
> URL: https://issues.apache.org/jira/browse/SPARK-42662
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark, PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Turn `withSequenceColumn` into a PySpark-internal API so that the 
> distributed-sequence index of the pandas API on Spark is also supported in 
> Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42666) Fix `tail` to work properly

2023-03-03 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-42666:
---

 Summary: Fix `tail` to work properly
 Key: SPARK-42666
 URL: https://issues.apache.org/jira/browse/SPARK-42666
 Project: Spark
  Issue Type: Sub-task
  Components: Connect
Affects Versions: 3.5.0
Reporter: Haejoon Lee


The code below is not working properly in Spark Connect:
{code:java}
>>> sdf = spark.range(10)
>>> spark.createDataFrame(sdf.tail(5), sdf.schema) 
Traceback (most recent call last):
  File "", line 1, in 
  File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 94, in 
__repr__
    return "DataFrame[%s]" % (", ".join("%s: %s" % c for c in self.dtypes))
  File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 162, in dtypes
    return [(str(f.name), f.dataType.simpleString()) for f in 
self.schema.fields]
  File "/.../spark/python/pyspark/sql/connect/dataframe.py", line 1346, in 
schema
    self._schema = self._session.client.schema(query)
  File "/.../spark/python/pyspark/sql/connect/client.py", line 614, in schema
    proto_schema = self._analyze(method="schema", plan=plan).schema
  File "/.../spark/python/pyspark/sql/connect/client.py", line 755, in _analyze
    self._handle_error(rpc_error)
  File "/.../spark/python/pyspark/sql/connect/client.py", line 894, in 
_handle_error
    raise convert_exception(info, status.message) from None
pyspark.errors.exceptions.connect.AnalysisException: [NULLABLE_COLUMN_OR_FIELD] 
Column or field `id` is nullable while it's required to be non-nullable.{code}
whereas it works properly in regular PySpark:
{code:java}
>>> sdf = spark.range(10)
>>> spark.createDataFrame(sdf.tail(5), sdf.schema).show()
+---+
| id|
+---+
|  5|
|  6|
|  7|
|  8|
|  9|
+---+ {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42662) Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42662:


Assignee: (was: Apache Spark)

> Support `withSequenceColumn` as PySpark DataFrame internal function.
> 
>
> Key: SPARK-42662
> URL: https://issues.apache.org/jira/browse/SPARK-42662
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark, PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Turn `withSequenceColumn` into a PySpark-internal API so that the 
> distributed-sequence index of the pandas API on Spark is also supported in 
> Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42662) Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42662?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696314#comment-17696314
 ] 

Apache Spark commented on SPARK-42662:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40270

> Support `withSequenceColumn` as PySpark DataFrame internal function.
> 
>
> Key: SPARK-42662
> URL: https://issues.apache.org/jira/browse/SPARK-42662
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark, PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Turn `withSequenceColumn` into a PySpark-internal API so that the 
> distributed-sequence index of the pandas API on Spark is also supported in 
> Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42662) Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42662:


Assignee: Apache Spark

> Support `withSequenceColumn` as PySpark DataFrame internal function.
> 
>
> Key: SPARK-42662
> URL: https://issues.apache.org/jira/browse/SPARK-42662
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Pandas API on Spark, PySpark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> Turn `withSequenceColumn` into PySpark internal API to support the 
> distributed-sequence index of the pandas API on Spark in Spark Connect as 
> well.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42665) `simple udf` test failed using Maven

2023-03-03 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42665:
-
Description: 
{code:java}
simple udf *** FAILED ***
  io.grpc.StatusRuntimeException: INTERNAL: 
org.apache.spark.sql.ClientE2ETestSuite
  at io.grpc.Status.asRuntimeException(Status.java:535)
  at 
io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
  at 
org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:61)
  at 
org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:106)
  at 
org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:123)
  at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2426)
  at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2747)
  at org.apache.spark.sql.Dataset.collect(Dataset.scala:2425)
  at 
org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$8(ClientE2ETestSuite.scala:85)
  at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) 
{code}

> `simple udf` test failed using Maven 
> -
>
> Key: SPARK-42665
> URL: https://issues.apache.org/jira/browse/SPARK-42665
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> simple udf *** FAILED ***
>   io.grpc.StatusRuntimeException: INTERNAL: 
> org.apache.spark.sql.ClientE2ETestSuite
>   at io.grpc.Status.asRuntimeException(Status.java:535)
>   at 
> io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:61)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:106)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:123)
>   at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2426)
>   at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2747)
>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:2425)
>   at 
> org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$8(ClientE2ETestSuite.scala:85)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42665) `simple udf` test failed using Maven

2023-03-03 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42665:
-
Attachment: (was: image-2023-03-04-01-41-51-522.png)

> `simple udf` test failed using Maven 
> -
>
> Key: SPARK-42665
> URL: https://issues.apache.org/jira/browse/SPARK-42665
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
>
> {code:java}
> simple udf *** FAILED ***
>   io.grpc.StatusRuntimeException: INTERNAL: 
> org.apache.spark.sql.ClientE2ETestSuite
>   at io.grpc.Status.asRuntimeException(Status.java:535)
>   at 
> io.grpc.stub.ClientCalls$BlockingResponseStream.hasNext(ClientCalls.java:660)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.org$apache$spark$sql$connect$client$SparkResult$$processResponses(SparkResult.scala:61)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.length(SparkResult.scala:106)
>   at 
> org.apache.spark.sql.connect.client.SparkResult.toArray(SparkResult.scala:123)
>   at org.apache.spark.sql.Dataset.$anonfun$collect$1(Dataset.scala:2426)
>   at org.apache.spark.sql.Dataset.withResult(Dataset.scala:2747)
>   at org.apache.spark.sql.Dataset.collect(Dataset.scala:2425)
>   at 
> org.apache.spark.sql.ClientE2ETestSuite.$anonfun$new$8(ClientE2ETestSuite.scala:85)
>   at scala.runtime.java8.JFunction0$mcV$sp.apply(JFunction0$mcV$sp.java:23) 
> {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42665) `simple udf` test failed using Maven

2023-03-03 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-42665:
-
Attachment: image-2023-03-04-01-41-51-522.png

> `simple udf` test failed using Maven 
> -
>
> Key: SPARK-42665
> URL: https://issues.apache.org/jira/browse/SPARK-42665
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Yang Jie
>Priority: Major
> Attachments: image-2023-03-04-01-41-51-522.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42665) `simple udf` test failed using Maven

2023-03-03 Thread Yang Jie (Jira)
Yang Jie created SPARK-42665:


 Summary: `simple udf` test failed using Maven 
 Key: SPARK-42665
 URL: https://issues.apache.org/jira/browse/SPARK-42665
 Project: Spark
  Issue Type: Bug
  Components: Connect
Affects Versions: 3.4.0
Reporter: Yang Jie
 Attachments: image-2023-03-04-01-41-51-522.png





--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42258) pyspark.sql.functions should not expose typing.cast

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42258:


Assignee: Apache Spark

> pyspark.sql.functions should not expose typing.cast
> ---
>
> Key: SPARK-42258
> URL: https://issues.apache.org/jira/browse/SPARK-42258
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.1
>Reporter: Furcy Pin
>Assignee: Apache Spark
>Priority: Minor
>
> In pyspark, the `pyspark.sql.functions` module imports and exposes the 
> method `typing.cast`.
> This may lead to user errors that are hard to spot.
> *Example*
> It took me a few minutes to understand why the following code:
>  
> {code:java}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as f
> spark = SparkSession.builder.getOrCreate()
> df = spark.sql("""SELECT 1 as a""")
> df.withColumn("a", f.cast("STRING", f.col("a"))).printSchema()  {code}
> which executes without any problem, gives the following result:
>  
>  
> {code:java}
> root
> |-- a: integer (nullable = false){code}
> This is because `f.cast` here calls `typing.cast`, and the correct syntax is:
> {code:java}
> df.withColumn("a", f.col("a").cast("STRING")).printSchema(){code}
>  
> which indeed gives:
> {code:java}
> root
>  |-- a: string (nullable = false) {code}
> *Suggested solution*
> Option 1: The methods imported in the module `pyspark.sql.functions` could be 
> obfuscated to prevent this. For instance:
> {code:java}
> from typing import cast as _cast{code}
> Option 2: only import `typing` and replace all occurrences of `cast` with 
> `typing.cast`
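> A minimal sketch of Option 1, assuming a module-local alias (the helper name 
> below is illustrative, not the actual pyspark source):
> {code:java}
> # Private alias: the module keeps using typing.cast internally,
> # but no longer re-exports a public name called `cast`.
> from typing import cast as _cast
> 
> def _to_str(value):
>     # Internal call sites switch to the alias.
>     return _cast(str, value)
> {code}
> After such a change, `f.cast(...)` would fail fast with an AttributeError 
> instead of silently returning its argument unchanged.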



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42258) pyspark.sql.functions should not expose typing.cast

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42258:


Assignee: (was: Apache Spark)

> pyspark.sql.functions should not expose typing.cast
> ---
>
> Key: SPARK-42258
> URL: https://issues.apache.org/jira/browse/SPARK-42258
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.1
>Reporter: Furcy Pin
>Priority: Minor
>
> In pyspark, the `pyspark.sql.functions` module imports and exposes the 
> method `typing.cast`.
> This may lead to user errors that are hard to spot.
> *Example*
> It took me a few minutes to understand why the following code:
>  
> {code:java}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as f
> spark = SparkSession.builder.getOrCreate()
> df = spark.sql("""SELECT 1 as a""")
> df.withColumn("a", f.cast("STRING", f.col("a"))).printSchema()  {code}
> which executes without any problem, gives the following result:
>  
>  
> {code:java}
> root
> |-- a: integer (nullable = false){code}
> This is because `f.cast` here calls `typing.cast`, and the correct syntax is:
> {code:java}
> df.withColumn("a", f.col("a").cast("STRING")).printSchema(){code}
>  
> which indeed gives:
> {code:java}
> root
>  |-- a: string (nullable = false) {code}
> *Suggested solution*
> Option 1: The methods imported in the module `pyspark.sql.functions` could be 
> obfuscated to prevent this. For instance:
> {code:java}
> from typing import cast as _cast{code}
> Option 2: only import `typing` and replace all occurrences of `cast` with 
> `typing.cast`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42258) pyspark.sql.functions should not expose typing.cast

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696237#comment-17696237
 ] 

Apache Spark commented on SPARK-42258:
--

User 'FurcyPin' has created a pull request for this issue:
https://github.com/apache/spark/pull/40271

> pyspark.sql.functions should not expose typing.cast
> ---
>
> Key: SPARK-42258
> URL: https://issues.apache.org/jira/browse/SPARK-42258
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.3.1
>Reporter: Furcy Pin
>Priority: Minor
>
> In pyspark, the `pyspark.sql.functions` module imports and exposes the 
> method `typing.cast`.
> This may lead to user errors that are hard to spot.
> *Example*
> It took me a few minutes to understand why the following code:
>  
> {code:java}
> from pyspark.sql import SparkSession
> from pyspark.sql import functions as f
> spark = SparkSession.builder.getOrCreate()
> df = spark.sql("""SELECT 1 as a""")
> df.withColumn("a", f.cast("STRING", f.col("a"))).printSchema()  {code}
> which executes without any problem, gives the following result:
>  
>  
> {code:java}
> root
> |-- a: integer (nullable = false){code}
> This is because `f.cast` here calls `typing.cast`, and the correct syntax is:
> {code:java}
> df.withColumn("a", f.col("a").cast("STRING")).printSchema(){code}
>  
> which indeed gives:
> {code:java}
> root
>  |-- a: string (nullable = false) {code}
> *Suggested solution*
> Option 1: The methods imported in the module `pyspark.sql.functions` could be 
> obfuscated to prevent this. For instance:
> {code:java}
> from typing import cast as _cast{code}
> Option 2: only import `typing` and replace all occurrences of `cast` with 
> `typing.cast`



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42663) Fix `default_session` to work properly

2023-03-03 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42663:

Description: 
Currently, default_session is not working properly in Spark Connect as below:
{code:java}
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(java.util.NoSuchElementException) default_index_type
{code}
It should work as expected in regular PySpark as below:
{code:java}
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
'sequence'{code}

  was:
Currently, default_session is not working properly in Spark Connect as below 
since `SparkSession.conf.get` is not working as expected:
{code:java}
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(java.util.NoSuchElementException) default_index_type
{code}
It should work as expected in regular PySpark as below:
{code:java}
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
'sequence'{code}


> Fix `default_session` to work properly
> --
>
> Key: SPARK-42663
> URL: https://issues.apache.org/jira/browse/SPARK-42663
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ps
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, default_session is not working properly in Spark Connect as below:
> {code:java}
> >>> spark = default_session()
> >>> spark.conf.set("default_index_type", "sequence")
> >>> spark.conf.get("default_index_type")
> 'sequence'
> >>>
> >>> spark = default_session()
> >>> spark.conf.get("default_index_type")
> Traceback (most recent call last):
> ...
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (java.util.NoSuchElementException) default_index_type
> {code}
> It should work as expected in regular PySpark as below:
> {code:java}
> >>> spark = default_session()
> >>> spark.conf.set("default_index_type", "sequence")
> >>> spark.conf.get("default_index_type")
> 'sequence'
> >>>
> >>> spark = default_session()
> >>> spark.conf.get("default_index_type")
> 'sequence'{code}
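> Until this is fixed, a hedged client-side workaround is to read the conf with 
> an explicit default, which avoids the NoSuchElementException (the key below 
> is just the one from this report):
> {code:java}
> # RuntimeConfig.get accepts a default value that is returned
> # when the key is not set on the server side.
> index_type = spark.conf.get("default_index_type", "sequence")
> {code}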



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42663) Fix `default_session` to work properly

2023-03-03 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42663:

Summary: Fix `default_session` to work properly  (was: Fix `default_session 
` to work properly)

> Fix `default_session` to work properly
> --
>
> Key: SPARK-42663
> URL: https://issues.apache.org/jira/browse/SPARK-42663
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ps
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, default_session is not working properly in Spark Connect as below 
> since `SparkSession.conf.get` is not working as expected:
> {code:java}
> >>> spark = default_session()
> >>> spark.conf.set("default_index_type", "sequence")
> >>> spark.conf.get("default_index_type")
> 'sequence'
> >>>
> >>> spark = default_session()
> >>> spark.conf.get("default_index_type")
> Traceback (most recent call last):
> ...
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (java.util.NoSuchElementException) default_index_type
> {code}
> It should work as expected in regular PySpark as below:
> {code:java}
> >>> spark = default_session()
> >>> spark.conf.set("default_index_type", "sequence")
> >>> spark.conf.get("default_index_type")
> 'sequence'
> >>>
> >>> spark = default_session()
> >>> spark.conf.get("default_index_type")
> 'sequence'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42663) Fix `default_session ` to work properly

2023-03-03 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42663:

Summary: Fix `default_session ` to work properly  (was: Fix 
`SparkSession.conf.get` to work properly)

> Fix `default_session ` to work properly
> ---
>
> Key: SPARK-42663
> URL: https://issues.apache.org/jira/browse/SPARK-42663
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ps
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, default_session is not working properly in Spark Connect as below 
> since `SparkSession.conf.get` is not working as expected:
> {code:java}
> >>> spark = default_session()
> >>> spark.conf.set("default_index_type", "sequence")
> >>> spark.conf.get("default_index_type")
> 'sequence'
> >>>
> >>> spark = default_session()
> >>> spark.conf.get("default_index_type")
> Traceback (most recent call last):
> ...
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (java.util.NoSuchElementException) default_index_type
> {code}
> It should work as expected in regular PySpark as below:
> {code:java}
> >>> spark = default_session()
> >>> spark.conf.set("default_index_type", "sequence")
> >>> spark.conf.get("default_index_type")
> 'sequence'
> >>>
> >>> spark = default_session()
> >>> spark.conf.get("default_index_type")
> 'sequence'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42664) Support bloomFilter for DataFrameStatFunctions

2023-03-03 Thread Yang Jie (Jira)
Yang Jie created SPARK-42664:


 Summary: Support bloomFilter for DataFrameStatFunctions
 Key: SPARK-42664
 URL: https://issues.apache.org/jira/browse/SPARK-42664
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 3.4.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42663) Fix `SparkSession.conf.get` to work properly

2023-03-03 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42663:

Description: 
Currently, default_session is not working properly in Spark Connect as below 
since `SparkSession.conf.get` is not working as expected:
{code:java}
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(java.util.NoSuchElementException) default_index_type
{code}
It should work as expected in regular PySpark as below:
{code:java}
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
'sequence'{code}

  was:
Currently, default_session is not working properly in Spark Connect:
{code:java}
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(java.util.NoSuchElementException) default_index_type
{code}
It should work as expected in regular PySpark as below:
{code:java}
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
'sequence'{code}


> Fix `SparkSession.conf.get` to work properly
> 
>
> Key: SPARK-42663
> URL: https://issues.apache.org/jira/browse/SPARK-42663
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ps
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, default_session is not working properly in Spark Connect as below 
> since `SparkSession.conf.get` is not working as expected:
> {code:java}
> >>> spark = default_session()
> >>> spark.conf.set("default_index_type", "sequence")
> >>> spark.conf.get("default_index_type")
> 'sequence'
> >>>
> >>> spark = default_session()
> >>> spark.conf.get("default_index_type")
> Traceback (most recent call last):
> ...
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (java.util.NoSuchElementException) default_index_type
> {code}
> It should work as expected in regular PySpark as below:
> {code:java}
> >>> spark = default_session()
> >>> spark.conf.set("default_index_type", "sequence")
> >>> spark.conf.get("default_index_type")
> 'sequence'
> >>>
> >>> spark = default_session()
> >>> spark.conf.get("default_index_type")
> 'sequence'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42663) Fix `SparkSession.conf.get` to work properly

2023-03-03 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42663:

Summary: Fix `SparkSession.conf.get` to work properly  (was: Fix 
default_session to work properly in Spark Connect)

> Fix `SparkSession.conf.get` to work properly
> 
>
> Key: SPARK-42663
> URL: https://issues.apache.org/jira/browse/SPARK-42663
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ps
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, default_session is not working properly in Spark Connect:
> {code:java}
> >>> spark = default_session()
> >>> spark.conf.set("default_index_type", "sequence")
> >>> spark.conf.get("default_index_type")
> 'sequence'
> >>>
> >>> spark = default_session()
> >>> spark.conf.get("default_index_type")
> Traceback (most recent call last):
> ...
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (java.util.NoSuchElementException) default_index_type
> {code}
> It should work as expected in regular PySpark as below:
> {code:java}
> >>> spark = default_session()
> >>> spark.conf.set("default_index_type", "sequence")
> >>> spark.conf.get("default_index_type")
> 'sequence'
> >>>
> >>> spark = default_session()
> >>> spark.conf.get("default_index_type")
> 'sequence'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42663) Fix default_session to work properly in Spark Connect

2023-03-03 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42663:

Description: 
Currently, default_session is not working properly in Spark Connect:
{code:java}
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(java.util.NoSuchElementException) default_index_type
{code}
It should work as expected in regular PySpark as below:
{code:java}
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
'sequence'{code}

  was:
Currently, default_session is not working properly in Spark Connect:

 
{code:java}
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(java.util.NoSuchElementException) default_index_type
{code}
It should work as expected in regular PySpark as below:
{code:java}
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
'sequence'{code}


> Fix default_session to work properly in Spark Connect
> -
>
> Key: SPARK-42663
> URL: https://issues.apache.org/jira/browse/SPARK-42663
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ps
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, default_session is not working properly in Spark Connect:
> {code:java}
> >>> spark = default_session()
> >>> spark.conf.set("default_index_type", "sequence")
> >>> spark.conf.get("default_index_type")
> 'sequence'
> >>>
> >>> spark = default_session()
> >>> spark.conf.get("default_index_type")
> Traceback (most recent call last):
> ...
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (java.util.NoSuchElementException) default_index_type
> {code}
> It should work as expected in regular PySpark as below:
> {code:java}
> >>> spark = default_session()
> >>> spark.conf.set("default_index_type", "sequence")
> >>> spark.conf.get("default_index_type")
> 'sequence'
> >>>
> >>> spark = default_session()
> >>> spark.conf.get("default_index_type")
> 'sequence'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42663) Fix default_session to work properly in Spark Connect

2023-03-03 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-42663:
---

 Summary: Fix default_session to work properly in Spark Connect
 Key: SPARK-42663
 URL: https://issues.apache.org/jira/browse/SPARK-42663
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, ps
Affects Versions: 3.5.0
Reporter: Haejoon Lee


Currently, default_session is not working properly in Spark Connect:
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(java.util.NoSuchElementException) default_index_type
It should work as expected in regular PySpark as below:
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
'sequence'



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42663) Fix default_session to work properly in Spark Connect

2023-03-03 Thread Haejoon Lee (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Haejoon Lee updated SPARK-42663:

Description: 
Currently, default_session is not working properly in Spark Connect:

 
{code:java}
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(java.util.NoSuchElementException) default_index_type
{code}
It should work as expected in regular PySpark as below:
{code:java}
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
'sequence'{code}

  was:
Currently, default_session is not working properly in Spark Connect:
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
Traceback (most recent call last):
...
pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
(java.util.NoSuchElementException) default_index_type
It should work as expected in regular PySpark as below:
>>> spark = default_session()
>>> spark.conf.set("default_index_type", "sequence")
>>> spark.conf.get("default_index_type")
'sequence'
>>>
>>> spark = default_session()
>>> spark.conf.get("default_index_type")
'sequence'


> Fix default_session to work properly in Spark Connect
> -
>
> Key: SPARK-42663
> URL: https://issues.apache.org/jira/browse/SPARK-42663
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, ps
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> Currently, default_session is not working properly in Spark Connect:
>  
> {code:java}
> >>> spark = default_session()
> >>> spark.conf.set("default_index_type", "sequence")
> >>> spark.conf.get("default_index_type")
> 'sequence'
> >>>
> >>> spark = default_session()
> >>> spark.conf.get("default_index_type")
> Traceback (most recent call last):
> ...
> pyspark.errors.exceptions.connect.SparkConnectGrpcException: 
> (java.util.NoSuchElementException) default_index_type
> {code}
> It should work as expected in regular PySpark as below:
> {code:java}
> >>> spark = default_session()
> >>> spark.conf.set("default_index_type", "sequence")
> >>> spark.conf.get("default_index_type")
> 'sequence'
> >>>
> >>> spark = default_session()
> >>> spark.conf.get("default_index_type")
> 'sequence'{code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42497) Support of pandas API on Spark for Spark Connect.

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42497:


Assignee: (was: Apache Spark)

> Support of pandas API on Spark for Spark Connect.
> -
>
> Key: SPARK-42497
> URL: https://issues.apache.org/jira/browse/SPARK-42497
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should enable `pandas API on Spark` on Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42497) Support of pandas API on Spark for Spark Connect.

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696206#comment-17696206
 ] 

Apache Spark commented on SPARK-42497:
--

User 'itholic' has created a pull request for this issue:
https://github.com/apache/spark/pull/40270

> Support of pandas API on Spark for Spark Connect.
> -
>
> Key: SPARK-42497
> URL: https://issues.apache.org/jira/browse/SPARK-42497
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
>
> We should enable `pandas API on Spark` on Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42497) Support of pandas API on Spark for Spark Connect.

2023-03-03 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-42497:


Assignee: Apache Spark

> Support of pandas API on Spark for Spark Connect.
> -
>
> Key: SPARK-42497
> URL: https://issues.apache.org/jira/browse/SPARK-42497
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>
> We should enable `pandas API on Spark` on Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42662) Support `withSequenceColumn` as PySpark DataFrame internal function.

2023-03-03 Thread Haejoon Lee (Jira)
Haejoon Lee created SPARK-42662:
---

 Summary: Support `withSequenceColumn` as PySpark DataFrame 
internal function.
 Key: SPARK-42662
 URL: https://issues.apache.org/jira/browse/SPARK-42662
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, Pandas API on Spark, PySpark
Affects Versions: 3.5.0
Reporter: Haejoon Lee


Turn `withSequenceColumn` into PySpark internal API to support the 
distributed-sequence index of the pandas API on Spark in Spark Connect as well.
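For context, a rough sketch of the semantics such a sequence column provides, 
built only from public PySpark APIs (`attach_seq` is a hypothetical helper, not 
the internal `withSequenceColumn`):
{code:java}
from pyspark.sql import Row

def attach_seq(df):
    # zipWithIndex assigns a contiguous 0-based index across partitions,
    # unlike monotonically_increasing_id, which only guarantees ordering.
    return df.rdd.zipWithIndex().map(
        lambda pair: Row(seq=pair[1], **pair[0].asDict())
    ).toDF()
{code}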



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42500) ConstantPropagation support more cases

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696177#comment-17696177
 ] 

Apache Spark commented on SPARK-42500:
--

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/40268

> ConstantPropagation support more cases
> --
>
> Key: SPARK-42500
> URL: https://issues.apache.org/jira/browse/SPARK-42500
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42500) ConstantPropagation support more cases

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696176#comment-17696176
 ] 

Apache Spark commented on SPARK-42500:
--

User 'peter-toth' has created a pull request for this issue:
https://github.com/apache/spark/pull/40268

> ConstantPropagation support more cases
> --
>
> Key: SPARK-42500
> URL: https://issues.apache.org/jira/browse/SPARK-42500
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Yuming Wang
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42648) Upgrade versions-maven-plugin to 2.15.0

2023-03-03 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen updated SPARK-42648:
-
Priority: Trivial  (was: Major)

> Upgrade versions-maven-plugin to 2.15.0
> ---
>
> Key: SPARK-42648
> URL: https://issues.apache.org/jira/browse/SPARK-42648
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Trivial
> Fix For: 3.5.0
>
>
> https://github.com/mojohaus/versions/releases/tag/2.15.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42648) Upgrade versions-maven-plugin to 2.15.0

2023-03-03 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-42648:


Assignee: Yang Jie

> Upgrade versions-maven-plugin to 2.15.0
> ---
>
> Key: SPARK-42648
> URL: https://issues.apache.org/jira/browse/SPARK-42648
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>
> https://github.com/mojohaus/versions/releases/tag/2.15.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42648) Upgrade versions-maven-plugin to 2.15.0

2023-03-03 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-42648.
--
Fix Version/s: 3.5.0
   Resolution: Fixed

Issue resolved by pull request 40248
[https://github.com/apache/spark/pull/40248]

> Upgrade versions-maven-plugin to 2.15.0
> ---
>
> Key: SPARK-42648
> URL: https://issues.apache.org/jira/browse/SPARK-42648
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.5.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 3.5.0
>
>
> https://github.com/mojohaus/versions/releases/tag/2.15.0



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42653) Artifact transfer from Scala/JVM client to Server

2023-03-03 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696144#comment-17696144
 ] 

Apache Spark commented on SPARK-42653:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/40267

> Artifact transfer from Scala/JVM client to Server
> -
>
> Key: SPARK-42653
> URL: https://issues.apache.org/jira/browse/SPARK-42653
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Assignee: Venkata Sai Akhil Gudesa
>Priority: Major
> Fix For: 3.4.1
>
>
> In the decoupled client-server architecture of Spark Connect, a remote client 
> may use a local JAR or a new class in their UDF that may not be present on 
> the server. To handle these cases of missing "artifacts", we need to 
> implement a mechanism to transfer artifacts from the client side over to the 
> server side as per the protocol defined in 
> https://github.com/apache/spark/pull/40147 
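> As a usage sketch (hedged: the entry point below is the PySpark Connect 
> counterpart, and its exact name and per-version availability is an 
> assumption), a client would ship a local jar before referencing its classes 
> in a UDF:
> {code:java}
> # Upload a local artifact so the server can resolve classes used by UDFs.
> spark.addArtifacts("/path/to/my-udf-deps.jar")
> {code}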



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42653) Artifact transfer from Scala/JVM client to Server

2023-03-03 Thread Herman van Hövell (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Herman van Hövell resolved SPARK-42653.
---
Fix Version/s: 3.4.1
 Assignee: Venkata Sai Akhil Gudesa
   Resolution: Fixed

> Artifact transfer from Scala/JVM client to Server
> -
>
> Key: SPARK-42653
> URL: https://issues.apache.org/jira/browse/SPARK-42653
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Venkata Sai Akhil Gudesa
>Assignee: Venkata Sai Akhil Gudesa
>Priority: Major
> Fix For: 3.4.1
>
>
> In the decoupled client-server architecture of Spark Connect, a remote client 
> may use a local JAR or a new class in their UDF that may not be present on 
> the server. To handle these cases of missing "artifacts", we need to 
> implement a mechanism to transfer artifacts from the client side over to the 
> server side as per the protocol defined in 
> https://github.com/apache/spark/pull/40147 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42558) Implement DataFrameStatFunctions

2023-03-03 Thread Herman van Hövell (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42558?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696123#comment-17696123
 ] 

Herman van Hövell commented on SPARK-42558:
---

[~LuciferYang] that is fine for now. We can add support for BloomFilters and 
CMS later.

> Implement DataFrameStatFunctions
> 
>
> Key: SPARK-42558
> URL: https://issues.apache.org/jira/browse/SPARK-42558
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Implement DataFrameStatFunctions for connect, and hook it up to Dataset.
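> For reference, the long-standing API surface this mirrors (PySpark shown; 
> the Connect client targets the same semantics):
> {code:java}
> df = spark.range(100).withColumnRenamed("id", "x")
> # approxQuantile with relativeError=0.0 computes exact quantiles.
> quartiles = df.stat.approxQuantile("x", [0.25, 0.5, 0.75], 0.0)
> {code}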



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42661) CSV Reader - multiline without quoted fields

2023-03-03 Thread Florian FERREIRA (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Florian FERREIRA updated SPARK-42661:
-
Attachment: Capture d’écran 2023-03-03 à 12.18.07.png

> CSV Reader - multiline without quoted fields
> 
>
> Key: SPARK-42661
> URL: https://issues.apache.org/jira/browse/SPARK-42661
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1
> Environment: unquoted data
> {code}
> NAME,Address,CITY
> Atlassian,Level 6 341 George Street
> Sydney NSW 2000 Australia,Sydney
> Github,88 Colin P Kelly Junior Street
> San Francisco CA 94107 USA,San Francisco
> {code}
> quoted data : 
> {code}
> "NAME","Address","CITY"
> "Atlassian","Level 6 341 George Street
> Sydney NSW 2000 Australia","Sydney"
> "Github","88 Colin P Kelly Junior Street
> San Francisco CA 94107 USA","San Francisco"
> {code}
>Reporter: Florian FERREIRA
>Priority: Minor
> Attachments: Capture d’écran 2023-03-03 à 12.18.07.png
>
>
> Hello,
> We are facing an issue with the CSV format.
> When we try to read a multiline file without quoted fields, the parsed 
> result is incorrect.
> With quoted fields, everything works as expected (see the screenshot).
> You can reproduce it easily with this code (just replace the file paths):
> {code:java}
> spark.read.options(Map(
> "multiline" -> "true",
> "quote" -> "",
> "header" -> "true",
>   )).csv("/Users/fferreira/correct_multiline.csv").show(false)
> spark.read.options(Map(
> "multiline" -> "true",
> "header" -> "true",  
> )).csv("/Users/fferreira/correct_multiline_with_quote.csv").show(false)
> {code}
> We continue to investigate on our side.
> Thank you.
> !image-2023-03-03-12-11-21-258.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-42661) CSV Reader - multiline without quoted fields

2023-03-03 Thread Florian FERREIRA (Jira)
Florian FERREIRA created SPARK-42661:


 Summary: CSV Reader - multiline without quoted fields
 Key: SPARK-42661
 URL: https://issues.apache.org/jira/browse/SPARK-42661
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.3.1
 Environment: unquoted data
{code}
NAME,Address,CITY
Atlassian,Level 6 341 George Street
Sydney NSW 2000 Australia,Sydney
Github,88 Colin P Kelly Junior Street
San Francisco CA 94107 USA,San Francisco
{code}

quoted data : 
{code}
"NAME","Address","CITY"
"Atlassian","Level 6 341 George Street
Sydney NSW 2000 Australia","Sydney"
"Github","88 Colin P Kelly Junior Street
San Francisco CA 94107 USA","San Francisco"
{code}
Reporter: Florian FERREIRA


Hello,

We are facing an issue with the CSV format.
When we try to read a multiline file without quoted fields, the parsed 
result is incorrect.

With quoted fields, everything works as expected (see the screenshot).

You can reproduce it easily with this code (just replace the file paths):
{code:java}
spark.read.options(Map(
"multiline" -> "true",
"quote" -> "",
"header" -> "true",
  )).csv("/Users/fferreira/correct_multiline.csv").show(false)

spark.read.options(Map(
"multiline" -> "true",
"header" -> "true",  
)).csv("/Users/fferreira/correct_multiline_with_quote.csv").show(false)
{code}
We continue to investigate on our side.

Thank you.

!image-2023-03-03-12-11-21-258.png!
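For anyone reproducing this in PySpark, a hedged equivalent of the Scala 
snippet above (paths are illustrative; the unquoted sample is written to a 
local file first):
{code:java}
data = """NAME,Address,CITY
Atlassian,Level 6 341 George Street
Sydney NSW 2000 Australia,Sydney
Github,88 Colin P Kelly Junior Street
San Francisco CA 94107 USA,San Francisco
"""
with open("/tmp/unquoted_multiline.csv", "w") as fh:
    fh.write(data)

# Same options as the Scala repro: multiline parsing with quoting disabled.
spark.read.options(multiline="true", quote="", header="true") \
    .csv("/tmp/unquoted_multiline.csv").show(truncate=False)
{code}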



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42661) CSV Reader - multiline without quoted fields

2023-03-03 Thread Florian FERREIRA (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Florian FERREIRA updated SPARK-42661:
-
Priority: Minor  (was: Major)

> CSV Reader - multiline without quoted fields
> 
>
> Key: SPARK-42661
> URL: https://issues.apache.org/jira/browse/SPARK-42661
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1
> Environment: unquoted data
> {code}
> NAME,Address,CITY
> Atlassian,Level 6 341 George Street
> Sydney NSW 2000 Australia,Sydney
> Github,88 Colin P Kelly Junior Street
> San Francisco CA 94107 USA,San Francisco
> {code}
> quoted data : 
> {code}
> "NAME","Address","CITY"
> "Atlassian","Level 6 341 George Street
> Sydney NSW 2000 Australia","Sydney"
> "Github","88 Colin P Kelly Junior Street
> San Francisco CA 94107 USA","San Francisco"
> {code}
>Reporter: Florian FERREIRA
>Priority: Minor
>
> Hello,
> We are facing an issue with the CSV format.
> When we try to read a multiline file without quoted fields, the parsed 
> result is incorrect.
> With quoted fields, everything works as expected (see the screenshot).
> You can reproduce it easily with this code (just replace the file paths):
> {code:java}
> spark.read.options(Map(
> "multiline" -> "true",
> "quote" -> "",
> "header" -> "true",
>   )).csv("/Users/fferreira/correct_multiline.csv").show(false)
> spark.read.options(Map(
> "multiline" -> "true",
> "header" -> "true",  
> )).csv("/Users/fferreira/correct_multiline_with_quote.csv").show(false)
> {code}
> We continue to investigate on our side.
> Thank you.
> !image-2023-03-03-12-11-21-258.png!



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-42557) Add Broadcast to functions

2023-03-03 Thread jiaan.geng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-42557?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17696097#comment-17696097
 ] 

jiaan.geng commented on SPARK-42557:


I will take a look!

> Add Broadcast to functions
> --
>
> Key: SPARK-42557
> URL: https://issues.apache.org/jira/browse/SPARK-42557
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> Add the {{broadcast}} function to functions.scala. Please check if we can get 
> the same semantics as the current implementation using unresolved hints.
> https://github.com/apache/spark/blame/master/sql/core/src/main/scala/org/apache/spark/sql/functions.scala#L1246-L1261
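> A minimal usage sketch of the existing hint (PySpark shown; the Scala 
> function at the link wraps the plan the same way):
> {code:java}
> from pyspark.sql.functions import broadcast
> 
> large = spark.range(1000000)
> small = spark.range(100)
> # broadcast() marks `small` with a broadcast hint, steering the planner
> # toward a broadcast hash join regardless of size estimates.
> joined = large.join(broadcast(small), "id")
> {code}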



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-42555) Add JDBC to DataFrameReader

2023-03-03 Thread jiaan.geng (Jira)


[ https://issues.apache.org/jira/browse/SPARK-42555 ]


jiaan.geng deleted comment on SPARK-42555:


was (Author: beliefer):
I will take a look!

> Add JDBC to DataFrameReader
> ---
>
> Key: SPARK-42555
> URL: https://issues.apache.org/jira/browse/SPARK-42555
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] (SPARK-42556) Dataset.colregex should link a plan_id when it only matches a single column.

2023-03-03 Thread jiaan.geng (Jira)


[ https://issues.apache.org/jira/browse/SPARK-42556 ]


jiaan.geng deleted comment on SPARK-42556:


was (Author: beliefer):
I'm working on.

> Dataset.colregex should link a plan_id when it only matches a single column.
> 
>
> Key: SPARK-42556
> URL: https://issues.apache.org/jira/browse/SPARK-42556
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect
>Affects Versions: 3.4.0
>Reporter: Herman van Hövell
>Priority: Major
>
> When colRegex returns a single column, it should link the plan's plan_id. For 
> reference, here is the non-connect Dataset code that does this:
> [https://github.com/apache/spark/blob/master/sql/core/src/main/scala/org/apache/spark/sql/Dataset.scala#L1512]
> This also needs to be fixed for the Python client.
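> For reference, the call this affects (PySpark shown; the column name is 
> illustrative):
> {code:java}
> df = spark.createDataFrame([(1, "a")], ["id", "name"])
> # colRegex returns a Column; when the regex matches exactly one column,
> # the returned expression should carry the plan_id link.
> single = df.select(df.colRegex("`id`"))
> {code}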



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org