[jira] [Updated] (SPARK-47896) Upgrade netty to `4.1.109.Final`

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47896?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47896:
---
Labels: pull-request-available  (was: )

> Upgrade netty to `4.1.109.Final`
> 
>
> Key: SPARK-47896
> URL: https://issues.apache.org/jira/browse/SPARK-47896
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-47172) Upgrade Transport block cipher mode to GCM

2024-04-17 Thread Mridul Muralidharan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838474#comment-17838474
 ] 

Mridul Muralidharan edited comment on SPARK-47172 at 4/18/24 5:47 AM:
--

We do not backport features to released versions, so TLS will be in 4.x, not 
3.x.
Given the security implications of SPARK-47318, it was backported to 3.4 and 
3.5, as it was fixing a security issue in existing functionality.

This proposal reads like new feature development, which would typically be 
out of scope for 3.x.
Given TLS, it would not be very useful for 4.x either?


was (Author: mridulm80):
We do not backport features to released versions - so TLS will be in 4.x, not 
3.x
Given the security implications for  SPARK-47318, it was backported to 3.4 and 
3.5 - as it was fixing a security issue in existing functionality.

This proposal reads like a new feature development, while would be out of scope 
for 3.x
Given TLS, not very useful for 4.x either ?

> Upgrade Transport block cipher mode to GCM
> --
>
> Key: SPARK-47172
> URL: https://issues.apache.org/jira/browse/SPARK-47172
> Project: Spark
>  Issue Type: Improvement
>  Components: Security
>Affects Versions: 3.4.2, 3.5.0
>Reporter: Steve Weis
>Priority: Minor
>
> The cipher transformation currently used for encrypting RPC calls is an 
> unauthenticated mode (AES/CTR/NoPadding). This needs to be upgraded to an 
> authenticated mode (AES/GCM/NoPadding) to prevent ciphertext from being 
> modified in transit.
> The relevant line is here: 
> [https://github.com/apache/spark/blob/a939a7d0fd9c6b23c879cbee05275c6fbc939e38/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java#L220]
> GCM is relatively more computationally expensive than CTR and adds a 16-byte 
> block of authentication tag data to each payload. 
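
To illustrate the difference described above, here is a minimal sketch using only the JDK's javax.crypto API. It is not Spark's transport code: the class and helper names are hypothetical, and the key, IV and nonce sizes (16-byte key, 16-byte CTR IV, 12-byte GCM nonce, 128-bit tag) are assumptions chosen for the demonstration. It shows that CTR decrypts a modified ciphertext without complaint, while GCM appends an authentication tag and rejects the modification.

```java
import javax.crypto.AEADBadTagException;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

// Hypothetical demo class; none of these helpers exist in Spark.
class CipherModeSketch {
    static final SecureRandom RNG = new SecureRandom();

    static byte[] randomBytes(int n) {
        byte[] b = new byte[n];
        RNG.nextBytes(b);
        return b;
    }

    // Unauthenticated mode: output has the same length as the input.
    static byte[] ctr(int mode, byte[] key, byte[] iv, byte[] input) throws Exception {
        Cipher c = Cipher.getInstance("AES/CTR/NoPadding");
        c.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(iv));
        return c.doFinal(input);
    }

    // Authenticated mode: a 128-bit tag is appended on encryption and verified on decryption.
    static byte[] gcm(int mode, byte[] key, byte[] nonce, byte[] input) throws Exception {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(mode, new SecretKeySpec(key, "AES"), new GCMParameterSpec(128, nonce));
        return c.doFinal(input);
    }

    // Flips one bit of a GCM ciphertext and reports whether decryption is rejected.
    static boolean gcmRejectsTampering(byte[] key, byte[] nonce, byte[] ciphertext) throws Exception {
        byte[] tampered = ciphertext.clone();
        tampered[0] ^= 0x01;
        try {
            gcm(Cipher.DECRYPT_MODE, key, nonce, tampered);
            return false;
        } catch (AEADBadTagException e) {
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        byte[] key = randomBytes(16);
        byte[] msg = "rpc payload".getBytes(StandardCharsets.UTF_8);

        // CTR: a flipped ciphertext bit silently becomes a flipped plaintext bit.
        byte[] iv = randomBytes(16);
        byte[] ctrCt = ctr(Cipher.ENCRYPT_MODE, key, iv, msg);
        ctrCt[0] ^= 0x01;
        byte[] ctrPt = ctr(Cipher.DECRYPT_MODE, key, iv, ctrCt);
        System.out.println("CTR accepted modified ciphertext, plaintext length " + ctrPt.length);

        // GCM: the ciphertext is longer by the tag, and modification is detected.
        byte[] nonce = randomBytes(12);
        byte[] gcmCt = gcm(Cipher.ENCRYPT_MODE, key, nonce, msg);
        System.out.println("GCM overhead in bytes: " + (gcmCt.length - msg.length));
        System.out.println("GCM rejected tampering: " + gcmRejectsTampering(key, nonce, gcmCt));
    }
}
```

A GCM nonce must never be reused under the same key, so a real transport would need a nonce scheme in addition to swapping the transformation string. That is outside the scope of this sketch.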






[jira] [Resolved] (SPARK-47591) Hive-thriftserver: Migrate logInfo with variables to structured logging framework

2024-04-17 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-47591.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45926
[https://github.com/apache/spark/pull/45926]

> Hive-thriftserver: Migrate logInfo with variables to structured logging 
> framework
> -
>
> Key: SPARK-47591
> URL: https://issues.apache.org/jira/browse/SPARK-47591
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Assignee: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-47896) Upgrade netty to `4.1.109.Final`

2024-04-17 Thread BingKun Pan (Jira)
BingKun Pan created SPARK-47896:
---

 Summary: Upgrade netty to `4.1.109.Final`
 Key: SPARK-47896
 URL: https://issues.apache.org/jira/browse/SPARK-47896
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: BingKun Pan









[jira] [Commented] (SPARK-47172) Upgrade Transport block cipher mode to GCM

2024-04-17 Thread Mridul Muralidharan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838474#comment-17838474
 ] 

Mridul Muralidharan commented on SPARK-47172:
-

We do not backport features to released versions - so TLS will be in 4.x, not 
3.x
Given the security implications for  SPARK-47318, it was backported to 3.4 and 
3.5 - as it was fixing a security issue in existing functionality.

This proposal reads like new feature development, which would be out of scope 
for 3.x.
Given TLS, it would not be very useful for 4.x either?

> Upgrade Transport block cipher mode to GCM
> --
>
> Key: SPARK-47172
> URL: https://issues.apache.org/jira/browse/SPARK-47172
> Project: Spark
>  Issue Type: Improvement
>  Components: Security
>Affects Versions: 3.4.2, 3.5.0
>Reporter: Steve Weis
>Priority: Minor
>
> The cipher transformation currently used for encrypting RPC calls is an 
> unauthenticated mode (AES/CTR/NoPadding). This needs to be upgraded to an 
> authenticated mode (AES/GCM/NoPadding) to prevent ciphertext from being 
> modified in transit.
> The relevant line is here: 
> [https://github.com/apache/spark/blob/a939a7d0fd9c6b23c879cbee05275c6fbc939e38/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java#L220]
> GCM is relatively more computationally expensive than CTR and adds a 16-byte 
> block of authentication tag data to each payload. 






[jira] [Updated] (SPARK-47895) group by all should be idempotent

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47895:
---
Labels: pull-request-available  (was: )

> group by all should be idempotent
> -
>
> Key: SPARK-47895
> URL: https://issues.apache.org/jira/browse/SPARK-47895
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47895) group by all should be idempotent

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47895?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan updated SPARK-47895:

Summary: group by all should be idempotent  (was: group by ordinal should 
be idempotent)

> group by all should be idempotent
> -
>
> Key: SPARK-47895
> URL: https://issues.apache.org/jira/browse/SPARK-47895
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Wenchen Fan
>Priority: Major
>







[jira] [Created] (SPARK-47895) group by ordinal should be idempotent

2024-04-17 Thread Wenchen Fan (Jira)
Wenchen Fan created SPARK-47895:
---

 Summary: group by ordinal should be idempotent
 Key: SPARK-47895
 URL: https://issues.apache.org/jira/browse/SPARK-47895
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: Wenchen Fan









[jira] [Assigned] (SPARK-47882) createTableColumnTypes need to be mapped to database types instead of using directly

2024-04-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47882:
-

Assignee: Kent Yao

> createTableColumnTypes need to be mapped to database types instead of using 
> directly
> 
>
> Key: SPARK-47882
> URL: https://issues.apache.org/jira/browse/SPARK-47882
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.2, 4.0.0, 3.5.1
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47882) createTableColumnTypes need to be mapped to database types instead of using directly

2024-04-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47882?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47882.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46093
[https://github.com/apache/spark/pull/46093]

> createTableColumnTypes need to be mapped to database types instead of using 
> directly
> 
>
> Key: SPARK-47882
> URL: https://issues.apache.org/jira/browse/SPARK-47882
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.4.2, 4.0.0, 3.5.1
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47894) Add `Environment` page to Master UI

2024-04-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47894.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46111
[https://github.com/apache/spark/pull/46111]

> Add `Environment` page to Master UI
> ---
>
> Key: SPARK-47894
> URL: https://issues.apache.org/jira/browse/SPARK-47894
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Web UI
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47894) Add `Environment` page to Master UI

2024-04-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47894:
-

Assignee: Dongjoon Hyun

> Add `Environment` page to Master UI
> ---
>
> Key: SPARK-47894
> URL: https://issues.apache.org/jira/browse/SPARK-47894
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Web UI
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-47839) Fix Aggregate bug in RewriteWithExpression

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47839:
---

Assignee: Kelvin Jiang

> Fix Aggregate bug in RewriteWithExpression
> --
>
> Key: SPARK-47839
> URL: https://issues.apache.org/jira/browse/SPARK-47839
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kelvin Jiang
>Assignee: Kelvin Jiang
>Priority: Major
>  Labels: pull-request-available
>
> The following query will fail:
> {code:SQL}
> SELECT NULLIF(id + 1, 1)
> from range(10)
> group by id
> {code}
> This is because {{NullIf}} gets rewritten to {{With}}, then 
> {{RewriteWithExpression}} tries to pull common expression {{id + 1}} out of 
> the aggregate, resulting in an invalid plan.






[jira] [Resolved] (SPARK-47839) Fix Aggregate bug in RewriteWithExpression

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47839?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47839.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46034
[https://github.com/apache/spark/pull/46034]

> Fix Aggregate bug in RewriteWithExpression
> --
>
> Key: SPARK-47839
> URL: https://issues.apache.org/jira/browse/SPARK-47839
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kelvin Jiang
>Assignee: Kelvin Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The following query will fail:
> {code:SQL}
> SELECT NULLIF(id + 1, 1)
> from range(10)
> group by id
> {code}
> This is because {{NullIf}} gets rewritten to {{With}}, then 
> {{RewriteWithExpression}} tries to pull common expression {{id + 1}} out of 
> the aggregate, resulting in an invalid plan.






[jira] [Resolved] (SPARK-47846) Add support for Variant schema in from_json

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47846.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46046
[https://github.com/apache/spark/pull/46046]

> Add support for Variant schema in from_json
> ---
>
> Key: SPARK-47846
> URL: https://issues.apache.org/jira/browse/SPARK-47846
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Harsh Motwani
>Assignee: Harsh Motwani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Adding support for the variant type in the from_json expression.
> "select from_json('<json_string>', 'variant')" should interpret json_string 
> as a variant type.






[jira] [Assigned] (SPARK-47846) Add support for Variant schema in from_json

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47846?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47846:
---

Assignee: Harsh Motwani

> Add support for Variant schema in from_json
> ---
>
> Key: SPARK-47846
> URL: https://issues.apache.org/jira/browse/SPARK-47846
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Harsh Motwani
>Assignee: Harsh Motwani
>Priority: Major
>  Labels: pull-request-available
>
> Adding support for the variant type in the from_json expression.
> "select from_json('<json_string>', 'variant')" should interpret json_string 
> as a variant type.






[jira] [Comment Edited] (SPARK-47429) Rename errorClass to errorCondition

2024-04-17 Thread BingKun Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838412#comment-17838412
 ] 

BingKun Pan edited comment on SPARK-47429 at 4/18/24 1:26 AM:
--

This will be a huge task: I roughly counted 4k+ places where the variable 
`errorClass` is used.

!image-2024-04-18-09-26-04-493.png|width=543,height=32!

But making the terminology consistent with the SQL standard is really great!

 


was (Author: panbingkun):
This will be a very huge task, and I roughly counted almost 4k+ places where 
the variable `errorClass` is used

!image-2024-04-18-09-26-04-493.png|width=543,height=32!

But this consistent of terms that follow by SQL standards is really very great!

 

 

> Rename errorClass to errorCondition
> ---
>
> Key: SPARK-47429
> URL: https://issues.apache.org/jira/browse/SPARK-47429
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Priority: Minor
> Attachments: image-2024-04-18-09-26-04-493.png
>
>
> We've agreed on the parent task to rename {{errorClass}} to align it more 
> closely with the SQL standard, and take advantage of the opportunity to break 
> backwards compatibility offered by the Spark version change from 3.5 to 4.0.
> This ticket also covers renaming {{subClass}} as well.
> This is a subtask so the changes are in their own PR and easier to review 
> apart from other things.






[jira] [Comment Edited] (SPARK-47429) Rename errorClass to errorCondition

2024-04-17 Thread BingKun Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838412#comment-17838412
 ] 

BingKun Pan edited comment on SPARK-47429 at 4/18/24 1:26 AM:
--

This will be a huge task: I roughly counted 4k+ places where the variable 
`errorClass` is used.

!image-2024-04-18-09-26-04-493.png|width=543,height=32!

But making the terminology consistent with the SQL standard is really great!

 

 


was (Author: panbingkun):
This will be a very huge task, and I roughly counted almost 4k+ places where 
the variable `errorClass` is used

!image-2024-04-18-09-22-20-736.png|width=680,height=42!

But this consistent of terms that follow by SQL standards is really very great!

 

> Rename errorClass to errorCondition
> ---
>
> Key: SPARK-47429
> URL: https://issues.apache.org/jira/browse/SPARK-47429
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Priority: Minor
> Attachments: image-2024-04-18-09-26-04-493.png
>
>
> We've agreed on the parent task to rename {{errorClass}} to align it more 
> closely with the SQL standard, and take advantage of the opportunity to break 
> backwards compatibility offered by the Spark version change from 3.5 to 4.0.
> This ticket also covers renaming {{subClass}} as well.
> This is a subtask so the changes are in their own PR and easier to review 
> apart from other things.






[jira] [Commented] (SPARK-47429) Rename errorClass to errorCondition

2024-04-17 Thread BingKun Pan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47429?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838412#comment-17838412
 ] 

BingKun Pan commented on SPARK-47429:
-

This will be a huge task: I roughly counted 4k+ places where the variable 
`errorClass` is used.

!image-2024-04-18-09-22-20-736.png|width=680,height=42!

But making the terminology consistent with the SQL standard is really great!

 

> Rename errorClass to errorCondition
> ---
>
> Key: SPARK-47429
> URL: https://issues.apache.org/jira/browse/SPARK-47429
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Nicholas Chammas
>Priority: Minor
>
> We've agreed on the parent task to rename {{errorClass}} to align it more 
> closely with the SQL standard, and take advantage of the opportunity to break 
> backwards compatibility offered by the Spark version change from 3.5 to 4.0.
> This ticket also covers renaming {{subClass}} as well.
> This is a subtask so the changes are in their own PR and easier to review 
> apart from other things.






[jira] [Resolved] (SPARK-47891) Improve docstring of mapInPandas

2024-04-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47891.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46108
[https://github.com/apache/spark/pull/46108]

> Improve docstring of mapInPandas
> 
>
> Key: SPARK-47891
> URL: https://issues.apache.org/jira/browse/SPARK-47891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Assignee: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Improve docstring of mapInPandas
>  * "using a Python native function that takes and outputs a pandas DataFrame" 
> is confusing because the function takes and outputs an "ITERATOR of pandas 
> DataFrames" instead.
>  * "All columns are passed together as an iterator of pandas DataFrames" 
> can easily mislead users into thinking the entire DataFrame will be passed 
> together; "a batch of rows" is used instead.






[jira] [Updated] (SPARK-47894) Add `Environment` page to Master UI

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47894?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47894:
---
Labels: pull-request-available  (was: )

> Add `Environment` page to Master UI
> ---
>
> Key: SPARK-47894
> URL: https://issues.apache.org/jira/browse/SPARK-47894
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core, Web UI
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47894) Add `Environment` page to Master UI

2024-04-17 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47894:
-

 Summary: Add `Environment` page to Master UI
 Key: SPARK-47894
 URL: https://issues.apache.org/jira/browse/SPARK-47894
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Core, Web UI
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Created] (SPARK-47892) XML: Stop ignoring CDATA within rows.

2024-04-17 Thread Yousof Hosny (Jira)
Yousof Hosny created SPARK-47892:


 Summary: XML: Stop ignoring CDATA within rows. 
 Key: SPARK-47892
 URL: https://issues.apache.org/jira/browse/SPARK-47892
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Yousof Hosny
 Fix For: 4.0.0


This change ignores CDATA within row tags as well as outside of them. We should 
only ignore CDATA found outside of row tags, since CDATA inside a row is 
considered data within the row.
[https://github.com/apache/spark/pull/45487]

 

NOTE: With the current parser implementation, once CDATA elements within row 
tags are no longer ignored, there remains the edge case of a matching closing 
row tag inside CDATA, which will be parsed as a valid end tag. 
Example:
{code:java}
  {code}
After no longer ignoring CDATA within rows, the closing tag in the example 
above will be matched by the parser, which is incorrect. 
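
The ticket's own example is not preserved in this message, so the following is only a hypothetical illustration of the edge case, not Spark's XML parser. The row tag name ROW, the sample document and both helper methods are assumptions. The sketch contrasts a naive search for the closing row tag, which stops at the copy inside the CDATA section, with a scan that skips CDATA sections before matching.

```java
// Hypothetical demo class; it does not reproduce Spark's XML parsing logic.
class CdataEndTagSketch {
    static final String CDATA_OPEN = "<![CDATA[";
    static final String CDATA_CLOSE = "]]>";

    // Naive scan: returns the first occurrence of the end tag, even inside CDATA.
    static int naiveRowEnd(String xml, String endTag) {
        return xml.indexOf(endTag);
    }

    // CDATA-aware scan: skips every CDATA section that opens before the end tag.
    static int cdataAwareRowEnd(String xml, String endTag) {
        int pos = 0;
        while (pos < xml.length()) {
            int tag = xml.indexOf(endTag, pos);
            if (tag < 0) {
                return -1;
            }
            int cdata = xml.indexOf(CDATA_OPEN, pos);
            if (cdata < 0 || cdata > tag) {
                return tag; // no CDATA section opens before this end tag
            }
            int close = xml.indexOf(CDATA_CLOSE, cdata + CDATA_OPEN.length());
            if (close < 0) {
                return -1; // unterminated CDATA: no valid end tag follows
            }
            pos = close + CDATA_CLOSE.length();
        }
        return -1;
    }

    public static void main(String[] args) {
        // Assumed sample row: the CDATA text contains what looks like a closing row tag.
        String row = "<ROW><a><![CDATA[text </ROW> more]]></a></ROW>";
        System.out.println("naive match at index " + naiveRowEnd(row, "</ROW>"));
        System.out.println("CDATA-aware match at index " + cdataAwareRowEnd(row, "</ROW>"));
    }
}
```

The naive scan reports the position inside the CDATA section, which is the incorrect match described in the note. The CDATA-aware scan reports the real end of the row.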






[jira] [Updated] (SPARK-47891) Improve docstring of mapInPandas

2024-04-17 Thread Xinrong Meng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xinrong Meng updated SPARK-47891:
-
Description: 
Improve docstring of mapInPandas
 * "using a Python native function that takes and outputs a pandas DataFrame" 
is confusing because the function takes and outputs an "ITERATOR of pandas 
DataFrames" instead.
 * "All columns are passed together as an iterator of pandas DataFrames" can 
easily mislead users into thinking the entire DataFrame will be passed 
together; "a batch of rows" is used instead.

  was:Improve docstring of mapInPandas


> Improve docstring of mapInPandas
> 
>
> Key: SPARK-47891
> URL: https://issues.apache.org/jira/browse/SPARK-47891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Improve docstring of mapInPandas
>  * "using a Python native function that takes and outputs a pandas DataFrame" 
> is confusing because the function takes and outputs an "ITERATOR of pandas 
> DataFrames" instead.
>  * "All columns are passed together as an iterator of pandas DataFrames" 
> can easily mislead users into thinking the entire DataFrame will be passed 
> together; "a batch of rows" is used instead.






[jira] [Updated] (SPARK-47891) Improve docstring of mapInPandas

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47891:
---
Labels: pull-request-available  (was: )

> Improve docstring of mapInPandas
> 
>
> Key: SPARK-47891
> URL: https://issues.apache.org/jira/browse/SPARK-47891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Xinrong Meng
>Priority: Major
>  Labels: pull-request-available
>
> Improve docstring of mapInPandas






[jira] [Commented] (SPARK-47172) Upgrade Transport block cipher mode to GCM

2024-04-17 Thread Steve Weis (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47172?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838380#comment-17838380
 ] 

Steve Weis commented on SPARK-47172:


[~mridulm80] What about 3.x? If we backported TLS support, that would be a 
better option. I mentioned this before and it sounded like there was no 
support for backporting TLS at this time.

> Upgrade Transport block cipher mode to GCM
> --
>
> Key: SPARK-47172
> URL: https://issues.apache.org/jira/browse/SPARK-47172
> Project: Spark
>  Issue Type: Improvement
>  Components: Security
>Affects Versions: 3.4.2, 3.5.0
>Reporter: Steve Weis
>Priority: Minor
>
> The cipher transformation currently used for encrypting RPC calls is an 
> unauthenticated mode (AES/CTR/NoPadding). This needs to be upgraded to an 
> authenticated mode (AES/GCM/NoPadding) to prevent ciphertext from being 
> modified in transit.
> The relevant line is here: 
> [https://github.com/apache/spark/blob/a939a7d0fd9c6b23c879cbee05275c6fbc939e38/common/network-common/src/main/java/org/apache/spark/network/util/TransportConf.java#L220]
> GCM is relatively more computationally expensive than CTR and adds a 16-byte 
> block of authentication tag data to each payload. 






[jira] [Updated] (SPARK-47889) Setup gradle as build tool for operator repository

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47889:
---
Labels: pull-request-available  (was: )

> Setup gradle as build tool for operator repository
> --
>
> Key: SPARK-47889
> URL: https://issues.apache.org/jira/browse/SPARK-47889
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: kubernetes-operator-0.1.0
>Reporter: Zhou JIANG
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47889) Setup gradle as build tool for operator repository

2024-04-17 Thread Zhou JIANG (Jira)
Zhou JIANG created SPARK-47889:
--

 Summary: Setup gradle as build tool for operator repository
 Key: SPARK-47889
 URL: https://issues.apache.org/jira/browse/SPARK-47889
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes
Affects Versions: kubernetes-operator-0.1.0
Reporter: Zhou JIANG









[jira] [Resolved] (SPARK-47584) SQL core: Migrate logWarn with variables to structured logging framework

2024-04-17 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47584?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-47584.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46057
[https://github.com/apache/spark/pull/46057]

> SQL core: Migrate logWarn with variables to structured logging framework
> 
>
> Key: SPARK-47584
> URL: https://issues.apache.org/jira/browse/SPARK-47584
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Assignee: BingKun Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47627) MERGE with WITH SCHEMA EVOLUTION keywords

2024-04-17 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47627?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-47627.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45748
[https://github.com/apache/spark/pull/45748]

> MERGE with WITH SCHEMA EVOLUTION keywords
> -
>
> Key: SPARK-47627
> URL: https://issues.apache.org/jira/browse/SPARK-47627
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Pengfei Xu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47360) Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck (all collations)

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47360?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47360.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46003
[https://github.com/apache/spark/pull/46003]

> Overlay, FormatString, Length, BitLength, OctetLength, SoundEx, Luhncheck 
> (all collations)
> --
>
> Key: SPARK-47360
> URL: https://issues.apache.org/jira/browse/SPARK-47360
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Nikola Mandic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47726) Document push-based shuffle metrics

2024-04-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47726.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45872
[https://github.com/apache/spark/pull/45872]

> Document push-based shuffle metrics
> ---
>
> Key: SPARK-47726
> URL: https://issues.apache.org/jira/browse/SPARK-47726
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.4.2, 4.0.0, 3.5.1
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> This is to add documentation for the metrics related to push-based shuffle. 
> It's a follow up documentation ticket from: 
> https://issues.apache.org/jira/browse/SPARK-36620
> Related to this, note also: https://issues.apache.org/jira/browse/SPARK-42203






[jira] [Assigned] (SPARK-47726) Document push-based shuffle metrics

2024-04-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47726:
-

Assignee: Luca Canali

> Document push-based shuffle metrics
> ---
>
> Key: SPARK-47726
> URL: https://issues.apache.org/jira/browse/SPARK-47726
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.4.2, 4.0.0, 3.5.1
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Minor
>  Labels: pull-request-available
>
> This is to add documentation for the metrics related to push-based shuffle. 
> It's a follow up documentation ticket from: 
> https://issues.apache.org/jira/browse/SPARK-36620
> Related to this, note also: https://issues.apache.org/jira/browse/SPARK-42203






[jira] [Resolved] (SPARK-47416) Add benchmark for stringpredicate expressions

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47416?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47416.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46078
[https://github.com/apache/spark/pull/46078]

> Add benchmark for stringpredicate expressions
> -
>
> Key: SPARK-47416
> URL: https://issues.apache.org/jira/browse/SPARK-47416
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47886) Postgres: Add test and doc for Postgres special numeric values

2024-04-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47886?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47886.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46102
[https://github.com/apache/spark/pull/46102]

> Postgres: Add test and doc for Postgres special numeric values
> --
>
> Key: SPARK-47886
> URL: https://issues.apache.org/jira/browse/SPARK-47886
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47887) Remove unused import `spark/connect/common.proto` from `spark/connect/relations.proto`

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47887:
---
Labels: pull-request-available  (was: )

> Remove unused import `spark/connect/common.proto` from 
> `spark/connect/relations.proto`
> --
>
> Key: SPARK-47887
> URL: https://issues.apache.org/jira/browse/SPARK-47887
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>
> Fix compile warning:
>  
> {code:java}
> spark/connect/relations.proto:26:1: warning: Import spark/connect/common.proto is unused.
> {code}






[jira] [Created] (SPARK-47887) Remove unused import `spark/connect/common.proto` from `spark/connect/relations.proto`

2024-04-17 Thread Yang Jie (Jira)
Yang Jie created SPARK-47887:


 Summary: Remove unused import `spark/connect/common.proto` from 
`spark/connect/relations.proto`
 Key: SPARK-47887
 URL: https://issues.apache.org/jira/browse/SPARK-47887
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Yang Jie


Fix compile warning:

 
{code:java}
spark/connect/relations.proto:26:1: warning: Import spark/connect/common.proto is unused.
{code}






[jira] [Resolved] (SPARK-47830) Reeanble ResourceProfileTests for pyspark-connect

2024-04-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47830?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47830.
--
Fix Version/s: 4.0.0
 Assignee: Hyukjin Kwon
   Resolution: Fixed

fixed in https://github.com/apache/spark/pull/46090

> Reeanble ResourceProfileTests for pyspark-connect
> -
>
> Key: SPARK-47830
> URL: https://issues.apache.org/jira/browse/SPARK-47830
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47885) Make pyspark.resource compatible with pyspark-connect

2024-04-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47885.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46100
[https://github.com/apache/spark/pull/46100]

> Make pyspark.resource compatible with pyspark-connect
> -
>
> Key: SPARK-47885
> URL: https://issues.apache.org/jira/browse/SPARK-47885
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47540) SPIP: Pure Python Package (Spark Connect)

2024-04-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47540?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47540.
--
Fix Version/s: 4.0.0
 Assignee: Hyukjin Kwon
   Resolution: Done

> SPIP: Pure Python Package (Spark Connect)
> -
>
> Key: SPARK-47540
> URL: https://issues.apache.org/jira/browse/SPARK-47540
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Critical
> Fix For: 4.0.0
>
>
> *Q1. What are you trying to do? Articulate your objectives using absolutely 
> no jargon.*
> As part of the [Spark 
> Connect|https://spark.apache.org/docs/latest/spark-connect-overview.html] 
> development, we have introduced Scala and Python clients. While the Scala 
> client is already provided as a separate library and is available in Maven, 
> the Python client is not. This proposal aims for end users to install the 
> pure Python package for Spark Connect by using pip install pyspark-connect.
> The pure Python package contains only Python source code without jars, which 
> reduces the size of the package significantly and widens the use cases of 
> PySpark. See also [Introducing Spark Connect - The Power of Apache Spark, 
> Everywhere|https://www.databricks.com/blog/2022/07/07/introducing-spark-connect-the-power-of-apache-spark-everywhere.html].
> *Q2. What problem is this proposal NOT designed to solve?*
> This proposal does not aim to:
> - Change the existing PySpark package, e.g., pip install pyspark is not 
> affected.
> - Implement full compatibility with classic PySpark, e.g., implementing the 
> RDD API.
> - Address how to launch the Spark Connect server. The Spark Connect server is 
> launched by users themselves.
> - Support local mode. Without launching a Spark Connect server, users cannot 
> use this package.
> - Change the [official release channel|https://spark.apache.org/downloads.html]; 
> only PyPI is affected.
> *Q3. How is it done today, and what are the limits of current practice?*
> Currently, we run pip install pyspark, and the package is over 300MB because 
> of its dependent jars. In addition, PySpark requires setting up other 
> environments, such as a JDK installation.
> This is not suitable when the running environment and its resources are 
> limited, for example on edge devices such as smart home devices.
> Requiring a non-Python environment is not Python friendly.
> *Q4. What is new in your approach and why do you think it will be successful?*
> It provides a pure Python library, which eliminates other environment 
> requirements such as the JDK, reduces resource usage by decoupling the Spark 
> Driver, and reduces the package size.
> *Q5. Who cares? If you are successful, what difference will it make?*
> Users who want to leverage Spark in a limited environment, and who want to 
> decouple from the JVM running the Spark Driver in order to run Spark as a 
> service. They can simply pip install pyspark-connect, which does not require 
> other dependencies (except Python dependencies, just like other Python 
> libraries).
> *Q6. What are the risks?*
> Because we do not change the existing PySpark package, I do not see any major 
> risk in classic PySpark itself. We will reuse the same Python source, and 
> therefore we should make sure no Py4J is used, and no JVM access is made. 
> This requirement might confuse the developers. At the very least, we should 
> add the dedicated CI to make sure the pure Python package works.
> *Q7. How long will it take?*
> I expect around one month including CI set up. In fact, the prototype is 
> ready so I expect this to be done sooner.
> *Q8. What are the mid-term and final “exams” to check for success?*
> The mid-term goal is to set up a scheduled CI job that builds the pure Python 
> library and runs all the tests against it.
> The final goal would be to properly test the end-to-end use case, starting 
> from pip installation.
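
The Q6 requirement above (no Py4J use in the pure Python package, checked by a dedicated CI job) could be approximated by a static import scan. The following is only a minimal sketch under stated assumptions: the helper name find_py4j_imports is hypothetical and is not part of Spark or its CI, and the scan only catches static import statements, not dynamic imports or JVM access made by other means.

```python
import ast
from typing import List


def find_py4j_imports(source: str) -> List[int]:
    """Return the line numbers of statements in `source` that import py4j.

    Hypothetical helper for illustration; not part of Spark.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # node.module is None for "from . import x"
            modules = [node.module or ""]
        else:
            continue
        if any(m == "py4j" or m.startswith("py4j.") for m in modules):
            hits.append(node.lineno)
    return sorted(hits)


sample = "import os\nfrom py4j.java_gateway import JavaGateway\nimport py4j\n"
print(find_py4j_imports(sample))  # lines 2 and 3 import py4j
```

A CI job could run such a scan over each module shipped in the pure Python package and fail when the returned list is non-empty.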






[jira] [Updated] (SPARK-47351) Between

2024-04-17 Thread Uroš Bojanić (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47351:
-
Summary: Between  (was: TBD)

> Between
> ---
>
> Key: SPARK-47351
> URL: https://issues.apache.org/jira/browse/SPARK-47351
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>







[jira] [Updated] (SPARK-47408) Distinct

2024-04-17 Thread Uroš Bojanić (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47408?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47408:
-
Summary: Distinct  (was: TBD)

> Distinct
> 
>
> Key: SPARK-47408
> URL: https://issues.apache.org/jira/browse/SPARK-47408
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>







[jira] [Updated] (SPARK-47352) Fix Upper, Lower, InitCap collation awareness

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47352?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47352:
---
Labels: pull-request-available  (was: )

> Fix Upper, Lower, InitCap collation awareness
> -
>
> Key: SPARK-47352
> URL: https://issues.apache.org/jira/browse/SPARK-47352
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47863) endsWith and startsWith don't work correctly for some collations

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47863.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46097
[https://github.com/apache/spark/pull/46097]

> endsWith and startsWith don't work correctly for some collations
> 
>
> Key: SPARK-47863
> URL: https://issues.apache.org/jira/browse/SPARK-47863
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Vladimir Golubev
>Assignee: Vladimir Golubev
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> *CollationSupport.EndsWith* and *CollationSupport.StartsWith* use 
> *CollationAwareUTF8String.matchAt*, which operates on byte offsets to 
> compare prefixes/suffixes. This is not correct, since sometimes string parts 
> (suffix/prefix) of different lengths are actually equal in the context of 
> case-insensitive and lower-case collations.
> Example test cases that highlight the problem:
> - *assertContains("The İo", "i̇o", "UNICODE_CI", true);* for 
> *CollationSupportSuite.testContains*.
> - *assertEndsWith("The İo", "i̇o", "UNICODE_CI", true);* for 
> *CollationSupportSuite.testEndsWith*.
> The first passes, since it uses *StringSearch* directly; the second one 
> does not.
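
The length mismatch described above can be reproduced outside Spark. The sketch below is an illustration only: it uses Python's built-in str.lower() as a stand-in for a case-insensitive collation (it is not ICU's UNICODE_CI and not Spark's implementation), and both function names are hypothetical. It shows that "İ" (U+0130) is 2 bytes in UTF-8 while its lowercase form "i̇" (U+0069 U+0307) is 3 bytes, so slicing the text by the suffix's byte length compares the wrong span.

```python
def naive_ends_with(text: str, suffix: str) -> bool:
    """Flawed approach: cut the tail of `text` by the suffix's byte length,
    then compare case-insensitively."""
    text_bytes = text.encode("utf-8")
    suffix_bytes = suffix.encode("utf-8")
    if len(suffix_bytes) > len(text_bytes):
        return False
    tail = text_bytes[len(text_bytes) - len(suffix_bytes):]
    # errors="replace" guards against a cut that lands inside a character
    return tail.decode("utf-8", errors="replace").lower() == suffix.lower()


def folded_ends_with(text: str, suffix: str) -> bool:
    """Fold both sides first, then compare on the folded strings."""
    return text.lower().endswith(suffix.lower())


text = "The \u0130o"   # "The İo"
suffix = "i\u0307o"    # "i̇o": i + combining dot above + o

print(len("\u0130".encode("utf-8")), len("\u0130".lower().encode("utf-8")))
print(naive_ends_with(text, suffix), folded_ends_with(text, suffix))
```

Here the naive version takes the last 4 bytes of the text, which decode to " İo" (with a leading space), so the comparison fails even though the folded strings do match.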






[jira] [Assigned] (SPARK-47863) endsWith and startsWith don't work correctly for some collations

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47863:
---

Assignee: Vladimir Golubev

> endsWith and startsWith don't work correctly for some collations
> 
>
> Key: SPARK-47863
> URL: https://issues.apache.org/jira/browse/SPARK-47863
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Vladimir Golubev
>Assignee: Vladimir Golubev
>Priority: Major
>  Labels: pull-request-available
>
> *CollationSupport.EndsWith* and *CollationSupport.StartsWith* use 
> *CollationAwareUTF8String.matchAt*, which operates on byte offsets to 
> compare prefixes/suffixes. This is not correct, since sometimes string parts 
> (suffix/prefix) of different lengths are actually equal in the context of 
> case-insensitive and lower-case collations.
> Example test cases that highlight the problem:
> - *assertContains("The İo", "i̇o", "UNICODE_CI", true);* for 
> *CollationSupportSuite.testContains*.
> - *assertEndsWith("The İo", "i̇o", "UNICODE_CI", true);* for 
> *CollationSupportSuite.testEndsWith*.
> The first passes, since it uses *StringSearch* directly; the second one 
> does not.






[jira] [Updated] (SPARK-47421) TBD

2024-04-17 Thread Uroš Bojanić (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47421:
-
Summary: TBD  (was: Split, SplitPart (binary & lowercase collation only))

> TBD
> ---
>
> Key: SPARK-47421
> URL: https://issues.apache.org/jira/browse/SPARK-47421
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>







[jira] [Updated] (SPARK-47353) NullIf

2024-04-17 Thread Uroš Bojanić (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47353:
-
Summary: NullIf  (was: TBD)

> NullIf
> --
>
> Key: SPARK-47353
> URL: https://issues.apache.org/jira/browse/SPARK-47353
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>







[jira] [Updated] (SPARK-47355) Min & Max

2024-04-17 Thread Uroš Bojanić (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47355?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47355:
-
Summary: Min & Max  (was: TBD)

> Min & Max
> -
>
> Key: SPARK-47355
> URL: https://issues.apache.org/jira/browse/SPARK-47355
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>







[jira] [Updated] (SPARK-47354) Case

2024-04-17 Thread Uroš Bojanić (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47354:
-
Summary: Case  (was: TBD)

> Case
> 
>
> Key: SPARK-47354
> URL: https://issues.apache.org/jira/browse/SPARK-47354
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>







[jira] [Updated] (SPARK-47421) Coalesce

2024-04-17 Thread Uroš Bojanić (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47421:
-
Summary: Coalesce  (was: TBD)

> Coalesce
> 
>
> Key: SPARK-47421
> URL: https://issues.apache.org/jira/browse/SPARK-47421
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>







[jira] [Updated] (SPARK-47350) SplitPart (binary & lowercase collation only)

2024-04-17 Thread Uroš Bojanić (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47350:
-
Summary: SplitPart (binary & lowercase collation only)  (was: SplitPart 
(binary & lowercase collation))

> SplitPart (binary & lowercase collation only)
> -
>
> Key: SPARK-47350
> URL: https://issues.apache.org/jira/browse/SPARK-47350
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>







[jira] [Updated] (SPARK-47350) SplitPart (binary & lowercase collation)

2024-04-17 Thread Uroš Bojanić (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47350?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47350:
-
Summary: SplitPart (binary & lowercase collation)  (was: TBD)

> SplitPart (binary & lowercase collation)
> 
>
> Key: SPARK-47350
> URL: https://issues.apache.org/jira/browse/SPARK-47350
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>







[jira] [Resolved] (SPARK-47884) Switch ANSI SQL CI job to NON-ANSI SQL CI job

2024-04-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47884.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46099
[https://github.com/apache/spark/pull/46099]

> Switch ANSI SQL CI job to NON-ANSI SQL CI job
> -
>
> Key: SPARK-47884
> URL: https://issues.apache.org/jira/browse/SPARK-47884
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47864) Enhance "Installation" page to cover all installable options

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47864:
--

Assignee: Apache Spark

> Enhance "Installation" page to cover all installable options
> 
>
> Key: SPARK-47864
> URL: https://issues.apache.org/jira/browse/SPARK-47864
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> Like the Installation page of pandas, our Installation documentation might 
> need to cover all installable options along with their related dependencies.






[jira] [Assigned] (SPARK-47864) Enhance "Installation" page to cover all installable options

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47864:
--

Assignee: (was: Apache Spark)

> Enhance "Installation" page to cover all installable options
> 
>
> Key: SPARK-47864
> URL: https://issues.apache.org/jira/browse/SPARK-47864
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Like the Installation page of pandas, our Installation documentation might 
> need to cover all installable options along with their related dependencies.






[jira] [Assigned] (SPARK-47864) Enhance "Installation" page to cover all installable options

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47864:
--

Assignee: Apache Spark

> Enhance "Installation" page to cover all installable options
> 
>
> Key: SPARK-47864
> URL: https://issues.apache.org/jira/browse/SPARK-47864
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> Like the Installation page of pandas, our Installation documentation might 
> need to cover all installable options along with their related dependencies.






[jira] [Assigned] (SPARK-47864) Enhance "Installation" page to cover all installable options

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47864:
--

Assignee: (was: Apache Spark)

> Enhance "Installation" page to cover all installable options
> 
>
> Key: SPARK-47864
> URL: https://issues.apache.org/jira/browse/SPARK-47864
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Like the Installation page of pandas, our Installation documentation might 
> need to cover all installable options along with their related dependencies.






[jira] [Updated] (SPARK-47885) Make pyspark.resource compatible with pyspark-connect

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47885?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47885:
---
Labels: pull-request-available  (was: )

> Make pyspark.resource compatible with pyspark-connect
> -
>
> Key: SPARK-47885
> URL: https://issues.apache.org/jira/browse/SPARK-47885
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47807) Make pyspark.ml compatible with pyspark-connect

2024-04-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-47807:
-
Summary: Make pyspark.ml compatible with pyspark-connect  (was: Make 
pyspark.ml compatible witbh pyspark-connect)

> Make pyspark.ml compatible with pyspark-connect
> ---
>
> Key: SPARK-47807
> URL: https://issues.apache.org/jira/browse/SPARK-47807
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-47885) Make pyspark.resource compatible with pyspark-connect

2024-04-17 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47885:


 Summary: Make pyspark.resource compatible with pyspark-connect
 Key: SPARK-47885
 URL: https://issues.apache.org/jira/browse/SPARK-47885
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Updated] (SPARK-47884) Switch ANSI SQL CI job to NON-ANSI SQL CI job

2024-04-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47884:
--
Summary: Switch ANSI SQL CI job to NON-ANSI SQL CI job  (was: Switch ANSI 
SQL CI to NON-ANSI SQL CI)

> Switch ANSI SQL CI job to NON-ANSI SQL CI job
> -
>
> Key: SPARK-47884
> URL: https://issues.apache.org/jira/browse/SPARK-47884
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-47884) Switch ANSI SQL CI job to NON-ANSI SQL CI job

2024-04-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47884:
-

Assignee: Dongjoon Hyun

> Switch ANSI SQL CI job to NON-ANSI SQL CI job
> -
>
> Key: SPARK-47884
> URL: https://issues.apache.org/jira/browse/SPARK-47884
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47884) Switch ANSI SQL CI to NON-ANSI SQL CI

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47884?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47884:
---
Labels: pull-request-available  (was: )

> Switch ANSI SQL CI to NON-ANSI SQL CI
> -
>
> Key: SPARK-47884
> URL: https://issues.apache.org/jira/browse/SPARK-47884
> Project: Spark
>  Issue Type: Sub-task
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47884) Switch ANSI SQL CI to NON-ANSI SQL CI

2024-04-17 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47884:
-

 Summary: Switch ANSI SQL CI to NON-ANSI SQL CI
 Key: SPARK-47884
 URL: https://issues.apache.org/jira/browse/SPARK-47884
 Project: Spark
  Issue Type: Sub-task
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Resolved] (SPARK-44444) Use ANSI mode by default

2024-04-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44444.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46013
[https://github.com/apache/spark/pull/46013]

> Use ANSI mode by default
> 
>
> Key: SPARK-44444
> URL: https://issues.apache.org/jira/browse/SPARK-44444
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> To avoid data issues.






[jira] [Updated] (SPARK-44444) Use ANSI SQL mode by default

2024-04-17 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44444?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44444:
--
Summary: Use ANSI SQL mode by default  (was: Use ANSI mode by default)

> Use ANSI SQL mode by default
> 
>
> Key: SPARK-44444
> URL: https://issues.apache.org/jira/browse/SPARK-44444
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Yuming Wang
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> To avoid data issues.






[jira] [Assigned] (SPARK-47822) Prohibit Hash expressions from hashing Variant type

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47822:
---

Assignee: Harsh Motwani

> Prohibit Hash expressions from hashing Variant type
> ---
>
> Key: SPARK-47822
> URL: https://issues.apache.org/jira/browse/SPARK-47822
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Harsh Motwani
>Assignee: Harsh Motwani
>Priority: Major
>  Labels: pull-request-available
>
> Prohibit hash functions from being applied to the Variant type, because 
> they are not implemented for the Variant type and crash during 
> execution.






[jira] [Resolved] (SPARK-47822) Prohibit Hash expressions from hashing Variant type

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47822?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47822.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46017
[https://github.com/apache/spark/pull/46017]

> Prohibit Hash expressions from hashing Variant type
> ---
>
> Key: SPARK-47822
> URL: https://issues.apache.org/jira/browse/SPARK-47822
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Harsh Motwani
>Assignee: Harsh Motwani
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Prohibit hash functions from being applied to the Variant type, because 
> they are not implemented for the Variant type and crash during 
> execution.






[jira] [Resolved] (SPARK-47821) Add is_variant_null expression

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47821.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46011
[https://github.com/apache/spark/pull/46011]

> Add is_variant_null expression
> --
>
> Key: SPARK-47821
> URL: https://issues.apache.org/jira/browse/SPARK-47821
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Richard Chen
>Assignee: Richard Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Adds an `is_variant_null` expression, which returns whether a given variant 
> value represents a variant null (note the difference between a variant null 
> and an engine null).
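
The distinction between a variant null and an engine null can be illustrated with a small model. This is a hedged sketch in plain Python, not Spark code: `VARIANT_NULL`, `parse_json_model`, and `is_variant_null_model` are hypothetical names, Python's `None` stands in for an engine (SQL) null, and the sketch assumes that an engine null input yields false rather than null.

```python
import json


class _VariantNull:
    """Sentinel for a null stored inside a variant value (hypothetical model)."""

    def __repr__(self):
        return "VariantNull"


VARIANT_NULL = _VariantNull()


def parse_json_model(text):
    """Model of parsing JSON text into a variant value.

    None input models an engine null: there is no variant value at all.
    The JSON literal null becomes a variant value that holds a null.
    """
    if text is None:
        return None  # engine null, no variant exists
    value = json.loads(text)
    return VARIANT_NULL if value is None else value


def is_variant_null_model(value):
    """True only for a variant null; an engine null (None) gives False (assumption)."""
    return value is VARIANT_NULL


if __name__ == "__main__":
    print(is_variant_null_model(parse_json_model("null")))  # variant null
    print(is_variant_null_model(parse_json_model(None)))    # engine null
    print(is_variant_null_model(parse_json_model("1")))     # ordinary value
```

In this model only the first call reports a variant null, because the other two inputs are either absent or hold a non-null value.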






[jira] [Resolved] (SPARK-47867) Support Variant in JSON scan.

2024-04-17 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47867?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47867.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46071
[https://github.com/apache/spark/pull/46071]

> Support Variant in JSON scan.
> -
>
> Key: SPARK-47867
> URL: https://issues.apache.org/jira/browse/SPARK-47867
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47863) endsWith and startsWith don't work correctly for some collations

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47863?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47863:
---
Labels: pull-request-available  (was: )

> endsWith and startsWith don't work correctly for some collations
> 
>
> Key: SPARK-47863
> URL: https://issues.apache.org/jira/browse/SPARK-47863
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Vladimir Golubev
>Priority: Major
>  Labels: pull-request-available
>
> *CollationSupport.EndsWith* and *CollationSupport.StartsWith* use 
> *CollationAwareUTF8String.matchAt*, which operates on byte offsets to 
> compare prefixes/suffixes. This is not correct, since string parts 
> (suffix/prefix) of different lengths can be equal under 
> case-insensitive and lower-case collations.
> Example test cases that highlight the problem:
> - assertContains("The İo", "i̇o", "UNICODE_CI", true); for 
> CollationSupportSuite.testContains.
> - assertEndsWith("The İo", "i̇o", "UNICODE_CI", true); for 
> CollationSupportSuite.testEndsWith.
> The first passes, since it uses *StringSearch* directly; the second one 
> does not.
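
The length mismatch described above can be reproduced outside Spark. The following is a minimal Python sketch, not Spark's implementation: plain lowercasing stands in for the UNICODE_CI collation, and `naive_ends_with` and `lowered_ends_with` are hypothetical helpers. The first mimics a comparison that slices the text by the suffix's byte length; the second normalizes both sides before comparing.

```python
def naive_ends_with(text: str, suffix: str) -> bool:
    """Slice the last len(suffix) bytes of text, then compare case-insensitively.

    This models a byte-offset comparison: it assumes that equal strings
    occupy the same number of bytes, which does not hold for every character.
    """
    text_bytes = text.encode("utf-8")
    suffix_bytes = suffix.encode("utf-8")
    if len(suffix_bytes) > len(text_bytes):
        return False
    tail = text_bytes[len(text_bytes) - len(suffix_bytes):]
    try:
        return tail.decode("utf-8").lower() == suffix.lower()
    except UnicodeDecodeError:
        # The slice cut through a multi-byte character.
        return False


def lowered_ends_with(text: str, suffix: str) -> bool:
    """Lowercase both sides first, then compare, so lengths are taken after mapping."""
    return text.lower().endswith(suffix.lower())


if __name__ == "__main__":
    text = "The \u0130o"    # U+0130: LATIN CAPITAL LETTER I WITH DOT ABOVE
    suffix = "i\u0307o"     # 'i' followed by U+0307 COMBINING DOT ABOVE
    print(len("\u0130".encode("utf-8")), len("\u0130".lower().encode("utf-8")))
    print(naive_ends_with(text, suffix))
    print(lowered_ends_with(text, suffix))
```

The capital letter occupies two UTF-8 bytes while its lowercase form occupies three, so the byte-length slice starts one byte too early and the naive check misses a suffix that the normalized check finds.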






[jira] [Updated] (SPARK-47864) Enhance "Installation" page to cover all installable options

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47864?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47864:
---
Labels: pull-request-available  (was: )

> Enhance "Installation" page to cover all installable options
> 
>
> Key: SPARK-47864
> URL: https://issues.apache.org/jira/browse/SPARK-47864
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Priority: Major
>  Labels: pull-request-available
>
> Like the Installation page of pandas, we might need to cover all installable 
> options and their related dependencies in our Installation documentation.






[jira] [Updated] (SPARK-47883) Make CollectTailExec execute lazily

2024-04-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47883:
---
Labels: pull-request-available  (was: )

> Make CollectTailExec execute lazily 
> 
>
> Key: SPARK-47883
> URL: https://issues.apache.org/jira/browse/SPARK-47883
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47883) Make CollectTailExec execute lazily

2024-04-17 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng updated SPARK-47883:
--
Summary: Make CollectTailExec execute lazily   (was: Make CollectTailExec 
lazily execute)

> Make CollectTailExec execute lazily 
> 
>
> Key: SPARK-47883
> URL: https://issues.apache.org/jira/browse/SPARK-47883
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>







[jira] [Created] (SPARK-47883) Make CollectTailExec lazily execute

2024-04-17 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-47883:
-

 Summary: Make CollectTailExec lazily execute
 Key: SPARK-47883
 URL: https://issues.apache.org/jira/browse/SPARK-47883
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng





