[jira] [Assigned] (SPARK-47210) Implicit casting on collated expressions

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47210:
--

Assignee: Apache Spark

> Implicit casting on collated expressions
> 
>
> Key: SPARK-47210
> URL: https://issues.apache.org/jira/browse/SPARK-47210
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> *What changes were proposed in this pull request?*
> This PR adds automatic casting and collation resolution following `PGSQL` 
> behaviour:
> 1. Collations set at the metadata level are implicit
> 2. Collations set using the `COLLATE` expression are explicit
> 3. When expressions with multiple collations are combined, the output will be:
> - if there are explicit collations and all of them are equal, that collation 
> will be the output
> - if there are multiple different explicit collations, 
> `COLLATION_MISMATCH.EXPLICIT` will be thrown
> - if there are no explicit collations and only a single type of non-default 
> collation, that one will be used
> - if there are no explicit collations and multiple non-default implicit ones, 
> `COLLATION_MISMATCH.IMPLICIT` will be thrown
> Additionally, `INDETERMINATE_COLLATION` should only be thrown on comparison 
> operations; we should be able to combine different implicit collations for 
> certain operations, such as concat, and possibly others in the future.
> This is why another predefined collation id named `INDETERMINATE_COLLATION_ID` 
> was added; it means that the result is a combination of conflicting 
> non-default implicit collations. Right now it has an id of -1, so it fails if 
> it ever reaches the `CollatorFactory`.
> *Why are the changes needed?*
> We need to be able to compare columns and values with different collations 
> and provide a way to explicitly change the collation we want to use.
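The precedence rules quoted above can be sketched in a few lines. This is a hypothetical illustration, not Spark's actual implementation: the `Collation` class, the collation names, and the default `UTF8_BINARY` are assumptions made for the example.

```python
# Toy model of the PGSQL-style collation resolution rules described above.
# All names here (Collation, UTF8_BINARY, UNICODE_CI) are illustrative only.
from dataclasses import dataclass

EXPLICIT, IMPLICIT = "explicit", "implicit"
DEFAULT = "UTF8_BINARY"  # assumed default collation name

@dataclass(frozen=True)
class Collation:
    name: str
    strength: str  # EXPLICIT (set via COLLATE) or IMPLICIT (from metadata)

def resolve(collations):
    """Resolve the output collation of an expression combining `collations`."""
    explicit = {c.name for c in collations if c.strength == EXPLICIT}
    if len(explicit) > 1:
        raise ValueError("COLLATION_MISMATCH.EXPLICIT")
    if explicit:
        # A single explicit collation (however many times it appears) wins.
        return explicit.pop()
    # No explicit collations: look at non-default implicit ones.
    implicit = {c.name for c in collations if c.name != DEFAULT}
    if len(implicit) > 1:
        raise ValueError("COLLATION_MISMATCH.IMPLICIT")
    return implicit.pop() if implicit else DEFAULT
```

For example, combining an explicit `UNICODE_CI` with an implicit default yields `UNICODE_CI`, while two different explicit collations raise the mismatch error.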



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47210) Implicit casting on collated expressions

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47210:
--

Assignee: (was: Apache Spark)

> Implicit casting on collated expressions
> 
>
> Key: SPARK-47210
> URL: https://issues.apache.org/jira/browse/SPARK-47210
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
>
> *What changes were proposed in this pull request?*
> This PR adds automatic casting and collation resolution following `PGSQL` 
> behaviour:
> 1. Collations set at the metadata level are implicit
> 2. Collations set using the `COLLATE` expression are explicit
> 3. When expressions with multiple collations are combined, the output will be:
> - if there are explicit collations and all of them are equal, that collation 
> will be the output
> - if there are multiple different explicit collations, 
> `COLLATION_MISMATCH.EXPLICIT` will be thrown
> - if there are no explicit collations and only a single type of non-default 
> collation, that one will be used
> - if there are no explicit collations and multiple non-default implicit ones, 
> `COLLATION_MISMATCH.IMPLICIT` will be thrown
> Additionally, `INDETERMINATE_COLLATION` should only be thrown on comparison 
> operations; we should be able to combine different implicit collations for 
> certain operations, such as concat, and possibly others in the future.
> This is why another predefined collation id named `INDETERMINATE_COLLATION_ID` 
> was added; it means that the result is a combination of conflicting 
> non-default implicit collations. Right now it has an id of -1, so it fails if 
> it ever reaches the `CollatorFactory`.
> *Why are the changes needed?*
> We need to be able to compare columns and values with different collations 
> and provide a way to explicitly change the collation we want to use.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47621) Refine docstring of `try_sum`, `try_avg`, `avg`, `sum`, `mean`

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47621:
---
Labels: pull-request-available  (was: )

> Refine docstring of `try_sum`, `try_avg`, `avg`, `sum`, `mean`
> --
>
> Key: SPARK-47621
> URL: https://issues.apache.org/jira/browse/SPARK-47621
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47625) Addition of Indeterminate Collation Support

2024-03-28 Thread Mihailo Milosevic (Jira)
Mihailo Milosevic created SPARK-47625:
-

 Summary: Addition of Indeterminate Collation Support
 Key: SPARK-47625
 URL: https://issues.apache.org/jira/browse/SPARK-47625
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Mihailo Milosevic






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47626) Addition for Map Implicit Casting of Collated Strings

2024-03-28 Thread Mihailo Milosevic (Jira)
Mihailo Milosevic created SPARK-47626:
-

 Summary: Addition for Map Implicit Casting of Collated Strings
 Key: SPARK-47626
 URL: https://issues.apache.org/jira/browse/SPARK-47626
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Mihailo Milosevic






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47625) Addition of Indeterminate Collation Support

2024-03-28 Thread Mihailo Milosevic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihailo Milosevic updated SPARK-47625:
--
Description: 
{{INDETERMINATE_COLLATION}} should only be thrown on comparison operations and 
when storing data to memory; we should be able to combine different implicit 
collations for certain operations, such as concat, and possibly others in the future.
This is why we have to add another predefined collation id named 
{{INDETERMINATE_COLLATION_ID}}, which means that the result is a combination of 
conflicting non-default implicit collations. Right now it would have an id of -1, 
so it would fail if it ever reaches the {{CollatorFactory}}.

> Addition of Indeterminate Collation Support
> ---
>
> Key: SPARK-47625
> URL: https://issues.apache.org/jira/browse/SPARK-47625
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Priority: Major
>
> {{INDETERMINATE_COLLATION}} should only be thrown on comparison operations 
> and when storing data to memory; we should be able to combine different 
> implicit collations for certain operations, such as concat, and possibly 
> others in the future.
> This is why we have to add another predefined collation id named 
> {{INDETERMINATE_COLLATION_ID}}, which means that the result is a combination 
> of conflicting non-default implicit collations. Right now it would have an id 
> of -1, so it would fail if it ever reaches the {{CollatorFactory}}.
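The sentinel value of -1 quoted above is easy to picture with a toy factory lookup. This sketch is not Spark's `CollatorFactory`; the ids and names are assumptions for illustration.

```python
# Illustrative sketch: an id of -1 can never match a real registered collation,
# so any attempt to fetch a collator for an indeterminate result fails fast.
INDETERMINATE_COLLATION_ID = -1

class CollatorFactory:
    """Toy stand-in for a collator registry keyed by non-negative ids."""
    _collators = {0: "UTF8_BINARY", 1: "UNICODE_CI"}  # assumed example ids

    @classmethod
    def fetch(cls, collation_id):
        if collation_id not in cls._collators:
            raise KeyError(
                f"INDETERMINATE_COLLATION: no collator registered for id {collation_id}"
            )
        return cls._collators[collation_id]
```

An expression whose collation resolves to the indeterminate sentinel can still be stored or concatenated, but any comparison that asks the factory for a collator will raise.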



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47409) StringTrim & StringTrimLeft/Right/Both (all collations)

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47409:
---
Labels: pull-request-available  (was: )

> StringTrim & StringTrimLeft/Right/Both (all collations)
> ---
>
> Key: SPARK-47409
> URL: https://issues.apache.org/jira/browse/SPARK-47409
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringTrim* built-in string function in 
> Spark (including {*}StringTrimBoth{*}, {*}StringTrimLeft{*}, 
> {*}StringTrimRight{*}). First confirm the expected behaviour of these 
> functions when given collated strings, and then move on to implementation 
> and testing. One way to go about this is to consider using 
> {_}StringSearch{_}, an efficient ICU service for string matching. Implement 
> the corresponding unit tests (CollationStringExpressionsSuite) and E2E tests 
> (CollationSuite) to reflect how this function should be used with collation 
> in Spark SQL, and feel free to use your chosen Spark SQL editor to experiment 
> with the existing functions to learn more about how they work. In addition, 
> look into the possible use-cases and implementations of similar functions 
> within other open-source DBMSs, such as 
> [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal of this Jira ticket is to implement the *StringTrim* function so that 
> it supports all collation types currently supported in Spark. To understand 
> what changes were introduced to enable full collation support for other 
> existing functions in Spark, take a look at the Spark PRs and Jira tickets 
> for completed tasks under this parent (for example: Contains, StartsWith, 
> EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and the 
> [Collator|http://example.com/] class, as well as _StringSearch_, using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html]
>  and [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html].
>  Also refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
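To make the "expected behaviour under collations" question concrete, here is a minimal sketch of what collation-aware trimming could mean. It uses Python's `casefold` as a rough stand-in for a case-insensitive collation; Spark's actual semantics, and its ICU `StringSearch`-based matching, may well differ from this.

```python
# Sketch: trim characters from both ends, matching trim characters under a
# (simulated) collation. `case_insensitive=True` roughly mimics a _CI collation
# via casefold(); real ICU collations are far more nuanced than this.
def collated_trim(s: str, trim_chars: str, case_insensitive: bool = False) -> str:
    def key(ch: str) -> str:
        return ch.casefold() if case_insensitive else ch

    trim_set = {key(ch) for ch in trim_chars}
    start, end = 0, len(s)
    # Advance past matching leading characters.
    while start < end and key(s[start]) in trim_set:
        start += 1
    # Retreat past matching trailing characters.
    while end > start and key(s[end - 1]) in trim_set:
        end -= 1
    return s[start:end]
```

Under a binary collation, `TRIM('x' FROM 'xxabcXX')` would leave the trailing `XX` intact, while a case-insensitive collation would strip both ends; pinning down which of these Spark should do per collation type is exactly the first step the ticket asks for.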



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47626) Addition for Map Implicit Casting of Collated Strings

2024-03-28 Thread Mihailo Milosevic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihailo Milosevic updated SPARK-47626:
--
Description: The initial ticket for collation implicit casting, SPARK-47210, 
introduced support for casting arrays and plain string types. This ticket needs 
to dive into the problem of casting MapType.  (was: Initial 
PR for addition of collation implicit casting [SPARK-47210] introduced support 
for casting of arrays and normal string types.)

> Addition for Map Implicit Casting of Collated Strings
> -
>
> Key: SPARK-47626
> URL: https://issues.apache.org/jira/browse/SPARK-47626
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Priority: Major
>
> The initial ticket for collation implicit casting, SPARK-47210, introduced 
> support for casting arrays and plain string types. This ticket needs to dive 
> into the problem of casting MapType.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47210) Implicit casting on collated expressions

2024-03-28 Thread Mihailo Milosevic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihailo Milosevic updated SPARK-47210:
--
Epic Link: (was: SPARK-46830)

> Implicit casting on collated expressions
> 
>
> Key: SPARK-47210
> URL: https://issues.apache.org/jira/browse/SPARK-47210
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
>
> *What changes were proposed in this pull request?*
> This PR adds automatic casting and collation resolution following `PGSQL` 
> behaviour:
> 1. Collations set at the metadata level are implicit
> 2. Collations set using the `COLLATE` expression are explicit
> 3. When expressions with multiple collations are combined, the output will be:
> - if there are explicit collations and all of them are equal, that collation 
> will be the output
> - if there are multiple different explicit collations, 
> `COLLATION_MISMATCH.EXPLICIT` will be thrown
> - if there are no explicit collations and only a single type of non-default 
> collation, that one will be used
> - if there are no explicit collations and multiple non-default implicit ones, 
> `COLLATION_MISMATCH.IMPLICIT` will be thrown
> Additionally, `INDETERMINATE_COLLATION` should only be thrown on comparison 
> operations; we should be able to combine different implicit collations for 
> certain operations, such as concat, and possibly others in the future.
> This is why another predefined collation id named `INDETERMINATE_COLLATION_ID` 
> was added; it means that the result is a combination of conflicting 
> non-default implicit collations. Right now it has an id of -1, so it fails if 
> it ever reaches the `CollatorFactory`.
> *Why are the changes needed?*
> We need to be able to compare columns and values with different collations 
> and provide a way to explicitly change the collation we want to use.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47210) Addition of implicit casting without indeterminate support

2024-03-28 Thread Mihailo Milosevic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihailo Milosevic updated SPARK-47210:
--
Summary: Addition of implicit casting without indeterminate support  (was: 
Implicit casting on collated expressions)

> Addition of implicit casting without indeterminate support
> --
>
> Key: SPARK-47210
> URL: https://issues.apache.org/jira/browse/SPARK-47210
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
>
> *What changes were proposed in this pull request?*
> This PR adds automatic casting and collation resolution following `PGSQL` 
> behaviour:
> 1. Collations set at the metadata level are implicit
> 2. Collations set using the `COLLATE` expression are explicit
> 3. When expressions with multiple collations are combined, the output will be:
> - if there are explicit collations and all of them are equal, that collation 
> will be the output
> - if there are multiple different explicit collations, 
> `COLLATION_MISMATCH.EXPLICIT` will be thrown
> - if there are no explicit collations and only a single type of non-default 
> collation, that one will be used
> - if there are no explicit collations and multiple non-default implicit ones, 
> `COLLATION_MISMATCH.IMPLICIT` will be thrown
> Additionally, `INDETERMINATE_COLLATION` should only be thrown on comparison 
> operations; we should be able to combine different implicit collations for 
> certain operations, such as concat, and possibly others in the future.
> This is why another predefined collation id named `INDETERMINATE_COLLATION_ID` 
> was added; it means that the result is a combination of conflicting 
> non-default implicit collations. Right now it has an id of -1, so it fails if 
> it ever reaches the `CollatorFactory`.
> *Why are the changes needed?*
> We need to be able to compare columns and values with different collations 
> and provide a way to explicitly change the collation we want to use.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47210) Implicit casting on collated expressions

2024-03-28 Thread Mihailo Milosevic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihailo Milosevic updated SPARK-47210:
--
Parent: SPARK-47624
Issue Type: Sub-task  (was: Improvement)

> Implicit casting on collated expressions
> 
>
> Key: SPARK-47210
> URL: https://issues.apache.org/jira/browse/SPARK-47210
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
>
> *What changes were proposed in this pull request?*
> This PR adds automatic casting and collation resolution following `PGSQL` 
> behaviour:
> 1. Collations set at the metadata level are implicit
> 2. Collations set using the `COLLATE` expression are explicit
> 3. When expressions with multiple collations are combined, the output will be:
> - if there are explicit collations and all of them are equal, that collation 
> will be the output
> - if there are multiple different explicit collations, 
> `COLLATION_MISMATCH.EXPLICIT` will be thrown
> - if there are no explicit collations and only a single type of non-default 
> collation, that one will be used
> - if there are no explicit collations and multiple non-default implicit ones, 
> `COLLATION_MISMATCH.IMPLICIT` will be thrown
> Additionally, `INDETERMINATE_COLLATION` should only be thrown on comparison 
> operations; we should be able to combine different implicit collations for 
> certain operations, such as concat, and possibly others in the future.
> This is why another predefined collation id named `INDETERMINATE_COLLATION_ID` 
> was added; it means that the result is a combination of conflicting 
> non-default implicit collations. Right now it has an id of -1, so it fails if 
> it ever reaches the `CollatorFactory`.
> *Why are the changes needed?*
> We need to be able to compare columns and values with different collations 
> and provide a way to explicitly change the collation we want to use.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47210) Addition of implicit casting without indeterminate support

2024-03-28 Thread Mihailo Milosevic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47210?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihailo Milosevic updated SPARK-47210:
--
Description: 
*What changes were proposed in this pull request?*
This PR adds automatic casting and collation resolution following `PGSQL` 
behaviour:

1. Collations set at the metadata level are implicit
2. Collations set using the `COLLATE` expression are explicit
3. When expressions with multiple collations are combined, the output will be:
 - if there are explicit collations and all of them are equal, that collation 
will be the output
 - if there are multiple different explicit collations, 
`COLLATION_MISMATCH.EXPLICIT` will be thrown
 - if there are no explicit collations and only a single type of non-default 
collation, that one will be used
 - if there are no explicit collations and multiple non-default implicit ones, 
`COLLATION_MISMATCH.IMPLICIT` will be thrown

*Why are the changes needed?*
We need to be able to compare columns and values with different collations and 
provide a way to explicitly change the collation we want to use.

  was:
*What changes were proposed in this pull request?*
This PR adds automatic casting and collations resolution as per `PGSQL` 
behaviour:

1. Collations set on the metadata level are implicit
2. Collations set using the `COLLATE` expression are explicit
3. When there is a combination of expressions of multiple collations the output 
will be:
- if there are explicit collations and all of them are equal then that 
collation will be the output
- if there are multiple different explicit collations 
`COLLATION_MISMATCH.EXPLICIT` will be thrown
- if there are no explicit collations and only a single type of non default 
collation, that one will be used
- if there are no explicit collations and multiple non-default implicit ones 
`COLLATION_MISMATCH.IMPLICIT` will be thrown


Another thing is that `INDETERMINATE_COLLATION` should only be thrown on 
comparison operations, and we should be able to combine different implicit 
collations for certain operations like concat and possible others in the future.
This is why I had to add another predefined collation id named 
`INDETERMINATE_COLLATION_ID` which means that the result is a combination of 
conflicting non-default implicit collations. Right now it has an id of -1 so it 
fails if it ever goes to the `CollatorFactory`.


*Why are the changes needed?*
We need to be able to compare columns and values with different collations and 
set a way of explicitly changing the collation we want to use.


> Addition of implicit casting without indeterminate support
> --
>
> Key: SPARK-47210
> URL: https://issues.apache.org/jira/browse/SPARK-47210
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
>
> *What changes were proposed in this pull request?*
> This PR adds automatic casting and collation resolution following `PGSQL` 
> behaviour:
> 1. Collations set at the metadata level are implicit
> 2. Collations set using the `COLLATE` expression are explicit
> 3. When expressions with multiple collations are combined, the output will be:
>  - if there are explicit collations and all of them are equal, that collation 
> will be the output
>  - if there are multiple different explicit collations, 
> `COLLATION_MISMATCH.EXPLICIT` will be thrown
>  - if there are no explicit collations and only a single type of non-default 
> collation, that one will be used
>  - if there are no explicit collations and multiple non-default implicit ones, 
> `COLLATION_MISMATCH.IMPLICIT` will be thrown
> *Why are the changes needed?*
> We need to be able to compare columns and values with different collations 
> and provide a way to explicitly change the collation we want to use.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47624) Collation Implicit Casting Support

2024-03-28 Thread Mihailo Milosevic (Jira)
Mihailo Milosevic created SPARK-47624:
-

 Summary: Collation Implicit Casting Support
 Key: SPARK-47624
 URL: https://issues.apache.org/jira/browse/SPARK-47624
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Mihailo Milosevic






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47626) Addition for Map Implicit Casting of Collated Strings

2024-03-28 Thread Mihailo Milosevic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47626?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihailo Milosevic updated SPARK-47626:
--
Description: The initial PR for collation implicit casting, [SPARK-47210], 
introduced support for casting arrays and plain string types.

> Addition for Map Implicit Casting of Collated Strings
> -
>
> Key: SPARK-47626
> URL: https://issues.apache.org/jira/browse/SPARK-47626
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Priority: Major
>
> The initial PR for collation implicit casting, [SPARK-47210], introduced 
> support for casting arrays and plain string types.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47622) Spark creates a lot of tiny blocks for a single driverLog file of size less than the dfs.blocksize

2024-03-28 Thread Srinivasu Majeti (Jira)
Srinivasu Majeti created SPARK-47622:


 Summary: Spark creates a lot of tiny blocks for a single driverLog 
file of size less than the dfs.blocksize
 Key: SPARK-47622
 URL: https://issues.apache.org/jira/browse/SPARK-47622
 Project: Spark
  Issue Type: Bug
  Components: Spark Shell, Spark Submit
Affects Versions: 3.3.2
Reporter: Srinivasu Majeti


Upon reviewing the Spark code, we found that /user/spark/driverLogs files are 
synced to HDFS with the hsync option, as shown below.
{code:java}
 hdfsStream.hsync(EnumSet.allOf(classOf[HdfsDataOutputStream.SyncFlag]))

Ref: 
https://github.com/apache/spark/blob/a3c04ec1145662e4227d57cd953bffce96b8aad7/core/src/main/scala/org/apache/spark/util/logging/DriverLogger.scala{code}
As a result, a new tiny block is synced every 5 seconds, so a small HDFS file 
can end up with many blocks: 8 in the example below.
{code:java}
[r...@ccycloud-3.smajeti.root.comops.site subdir0]# hdfs fsck 
/user/spark/driverLogs/application_1710495774861_0002_driver.log
Connecting to namenode via 
https://ccycloud-3.smajeti.root.comops.site:20102/fsck?ugi=hdfs=%2Fuser%2Fspark%2FdriverLogs%2Fapplication_1710495774861_0002_driver.log
FSCK started by hdfs (auth:KERBEROS_SSL) from /10.140.136.139 for path 
/user/spark/driverLogs/application_1710495774861_0002_driver.log at Thu Mar 28 
06:37:29 UTC 2024

Status: HEALTHY
 Number of data-nodes:  4
 Number of racks:   1
 Total dirs:0
 Total symlinks:0

Replicated Blocks:
 Total size:157574 B
 Total files:   1
 Total blocks (validated):  8 (avg. block size 19696 B)
 Minimally replicated blocks:   8 (100.0 %) {code}
HdfsDataOutputStream.SyncFlag includes two flags: UPDATE_LENGTH and END_BLOCK. 
This has been the expected behavior for some time: these flags keep the 
reported size of the HDFS driver log file up to date, and to achieve that, the 
current block is ended/closed on every 5-second sync. Each sync therefore 
creates a new block for the same HDFS driver log file. This hsync behavior was 
introduced five years ago by the fix for SPARK-29105 (SHS may delete driver 
log file of in-progress application).

But this leaves the NameNode managing a lot of block metadata, which can 
become an overhead in large clusters.
{code:java}
public static enum SyncFlag {
UPDATE_LENGTH,
END_BLOCK;

private SyncFlag() {
}
}
{code}
I don't see any configurable option to avoid this, and skipping this type of 
hsync may have side effects in Spark, as the SPARK-29105 bug showed.
We only have two options, both of which need manual intervention:
1. Periodically clean up these driver logs
2. Periodically merge these small-block files into files with 128 MB blocks

Can we provide a configurable option to merge these blocks when closing the 
spark-shell or when closing the driver log file?
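The block-count arithmetic behind the report is worth spelling out: with END_BLOCK set, each 5-second sync closes the current block, so the number of blocks tracks the number of sync intervals that saw writes, not the file size. The sketch below is a back-of-the-envelope model, not HDFS code, and the numbers are illustrative.

```python
# Toy model: with END_BLOCK on every sync, one HDFS block is produced per
# sync interval in which the driver wrote anything, regardless of dfs.blocksize.
def blocks_from_end_block_syncs(write_times_sec, sync_interval_sec=5):
    """Count distinct sync intervals that contain at least one write."""
    return len({int(t // sync_interval_sec) for t in write_times_sec})

# A driver that logs once per second for 40 seconds yields 8 tiny blocks,
# consistent with the fsck output above (8 blocks, ~19.7 KB average for a
# 157574 B file), even though dfs.blocksize is typically 128 MB.
```

This is why a long-running but low-volume driver can leave hundreds of kilobyte-sized blocks for the NameNode to track.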



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47628) Fix Postgres bit array issue 'Cannot cast to boolean'

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47628:
---
Labels: pull-request-available  (was: )

> Fix Postgres bit array issue 'Cannot cast to boolean'
> -
>
> Key: SPARK-47628
> URL: https://issues.apache.org/jira/browse/SPARK-47628
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> [info]   Cause: org.postgresql.util.PSQLException: Cannot cast to boolean: 
> "10101"
> [info]   at 
> org.postgresql.jdbc.BooleanTypeUtil.cannotCoerceException(BooleanTypeUtil.java:99)
> [info]   at 
> org.postgresql.jdbc.BooleanTypeUtil.fromString(BooleanTypeUtil.java:67)
> [info]   at 
> org.postgresql.jdbc.ArrayDecoding$7.parseValue(ArrayDecoding.java:267)
> [info]   at 
> org.postgresql.jdbc.ArrayDecoding$AbstractObjectStringArrayDecoder.populateFromString(ArrayDecoding.java:128)
> [info]   at 
> org.postgresql.jdbc.ArrayDecoding.readStringArray(ArrayDecoding.java:763)
> [info]   at org.postgresql.jdbc.PgArray.buildArray(PgArray.java:320)
> [info]   at org.postgresql.jdbc.PgArray.getArrayImpl(PgArray.java:179)
> [info]   at org.postgresql.jdbc.PgArray.getArray(PgArray.java:116)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$25(JdbcUtils.scala:548)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.nullSafeConvert(JdbcUtils.scala:561)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$24(JdbcUtils.scala:548)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$24$adapted(JdbcUtils.scala:545)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:365)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:346)
> [info]   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> [info]   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>  {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47621) Refine docstring of `try_sum`, `try_avg`, `avg`, `sum`, `mean`

2024-03-28 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47621?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-47621.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45745
[https://github.com/apache/spark/pull/45745]

> Refine docstring of `try_sum`, `try_avg`, `avg`, `sum`, `mean`
> --
>
> Key: SPARK-47621
> URL: https://issues.apache.org/jira/browse/SPARK-47621
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47559) Codegen Support for variant `parse_json`

2024-03-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47559:
---

Assignee: BingKun Pan

> Codegen Support for variant `parse_json`
> 
>
> Key: SPARK-47559
> URL: https://issues.apache.org/jira/browse/SPARK-47559
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47559) Codegen Support for variant `parse_json`

2024-03-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47559.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45714
[https://github.com/apache/spark/pull/45714]

> Codegen Support for variant `parse_json`
> 
>
> Key: SPARK-47559
> URL: https://issues.apache.org/jira/browse/SPARK-47559
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47629) Add `common/variant` to maven daily test module list

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47629:
---
Labels: pull-request-available  (was: )

> Add `common/variant` to maven daily test module list
> 
>
> Key: SPARK-47629
> URL: https://issues.apache.org/jira/browse/SPARK-47629
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47629) Add `common/variant` to maven daily test module list

2024-03-28 Thread Yang Jie (Jira)
Yang Jie created SPARK-47629:


 Summary: Add `common/variant` to maven daily test module list
 Key: SPARK-47629
 URL: https://issues.apache.org/jira/browse/SPARK-47629
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Yang Jie









[jira] [Updated] (SPARK-47475) Support `spark.kubernetes.jars.avoidDownloadSchemes` for K8s Cluster Mode

2024-03-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47475:
--
Summary: Support `spark.kubernetes.jars.avoidDownloadSchemes` for K8s 
Cluster Mode  (was: Jars Download from Driver Caused Executor Scalability Issue)

> Support `spark.kubernetes.jars.avoidDownloadSchemes` for K8s Cluster Mode
> -
>
> Key: SPARK-47475
> URL: https://issues.apache.org/jira/browse/SPARK-47475
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, Kubernetes, Spark Core
>Affects Versions: 3.4.0, 3.5.0
>Reporter: Jiale Tan
>Assignee: Jiale Tan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Under K8s cluster deployment mode, all the jars, including the primary 
> resource jar and the jars from {{--jars}} or {{spark.jars}}, are downloaded 
> to the driver's local disk and then served to the executors through the file 
> server running on the driver.
> When the jars are big and the application requests a lot of executors, the 
> massive concurrent jar downloads from the driver cause network saturation. 
> The executors' jar downloads then time out, causing the executors to be 
> terminated. From the user's point of view, the application is trapped in a 
> loop of massive executor loss and re-provisioning but never gets as many live 
> executors as requested, which leads to job SLA breaches or sometimes job 
> failure.
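The failure loop described in the ticket is easy to sanity-check with a back-of-envelope calculation. In the sketch below, the jar size, executor count, and driver uplink bandwidth are illustrative assumptions, not figures from this ticket:

```python
def jar_download_seconds(jar_mb, executors, driver_gbps):
    """Time for all executors to finish fetching a jar when every download
    is served from the single driver uplink (total bytes / link capacity)."""
    total_bits = jar_mb * 8e6 * executors   # MB -> bits, times executor count
    return total_bits / (driver_gbps * 1e9)

# A 500 MB jar fanned out to 1000 executors over a 10 Gbps driver uplink
# keeps the link saturated for ~400 seconds:
wait = jar_download_seconds(jar_mb=500, executors=1000, driver_gbps=10)
```

With a wait like that, executor fetches time out and the replacement executors repeat the same download. Letting executors fetch jars of selected URI schemes directly from shared storage, which is what a setting like `spark.kubernetes.jars.avoidDownloadSchemes` enables, takes the driver uplink out of the path.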






[jira] [Assigned] (SPARK-47628) Fix Postgres bit array issue 'Cannot cast to boolean'

2024-03-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47628:
-

Assignee: Kent Yao

> Fix Postgres bit array issue 'Cannot cast to boolean'
> -
>
> Key: SPARK-47628
> URL: https://issues.apache.org/jira/browse/SPARK-47628
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
>
> {code:java}
> [info]   Cause: org.postgresql.util.PSQLException: Cannot cast to boolean: 
> "10101"
> [info]   at 
> org.postgresql.jdbc.BooleanTypeUtil.cannotCoerceException(BooleanTypeUtil.java:99)
> [info]   at 
> org.postgresql.jdbc.BooleanTypeUtil.fromString(BooleanTypeUtil.java:67)
> [info]   at 
> org.postgresql.jdbc.ArrayDecoding$7.parseValue(ArrayDecoding.java:267)
> [info]   at 
> org.postgresql.jdbc.ArrayDecoding$AbstractObjectStringArrayDecoder.populateFromString(ArrayDecoding.java:128)
> [info]   at 
> org.postgresql.jdbc.ArrayDecoding.readStringArray(ArrayDecoding.java:763)
> [info]   at org.postgresql.jdbc.PgArray.buildArray(PgArray.java:320)
> [info]   at org.postgresql.jdbc.PgArray.getArrayImpl(PgArray.java:179)
> [info]   at org.postgresql.jdbc.PgArray.getArray(PgArray.java:116)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$25(JdbcUtils.scala:548)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.nullSafeConvert(JdbcUtils.scala:561)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$24(JdbcUtils.scala:548)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$24$adapted(JdbcUtils.scala:545)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:365)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:346)
> [info]   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> [info]   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>  {code}
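The trace above shows the JDBC driver's array decoder handing each bit-string element (e.g. "10101" from a bit(5)[] column) to its string-to-boolean coercion, which throws. The Python sketch below illustrates the data shapes involved; the coercion function is a simplified stand-in for the driver's BooleanTypeUtil, and the per-bit decode is only an illustration, not Spark's actual fix:

```python
def pg_boolean_from_string(s):
    """Simplified stand-in for the JDBC driver's strict string-to-boolean
    coercion (the real BooleanTypeUtil accepts a few more spellings)."""
    if s in ("t", "true", "1"):
        return True
    if s in ("f", "false", "0"):
        return False
    raise ValueError('Cannot cast to boolean: "%s"' % s)

def decode_bit_array_element(s):
    """What a bit/varbit array element actually needs: one value per bit."""
    return [c == "1" for c in s]

try:
    pg_boolean_from_string("10101")       # the failing path in the stack trace
    failure = None
except ValueError as e:
    failure = str(e)

bits = decode_bit_array_element("10101")  # one boolean per bit
```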






[jira] [Resolved] (SPARK-47628) Fix Postgres bit array issue 'Cannot cast to boolean'

2024-03-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47628.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45751
[https://github.com/apache/spark/pull/45751]

> Fix Postgres bit array issue 'Cannot cast to boolean'
> -
>
> Key: SPARK-47628
> URL: https://issues.apache.org/jira/browse/SPARK-47628
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code:java}
> [info]   Cause: org.postgresql.util.PSQLException: Cannot cast to boolean: 
> "10101"
> [info]   at 
> org.postgresql.jdbc.BooleanTypeUtil.cannotCoerceException(BooleanTypeUtil.java:99)
> [info]   at 
> org.postgresql.jdbc.BooleanTypeUtil.fromString(BooleanTypeUtil.java:67)
> [info]   at 
> org.postgresql.jdbc.ArrayDecoding$7.parseValue(ArrayDecoding.java:267)
> [info]   at 
> org.postgresql.jdbc.ArrayDecoding$AbstractObjectStringArrayDecoder.populateFromString(ArrayDecoding.java:128)
> [info]   at 
> org.postgresql.jdbc.ArrayDecoding.readStringArray(ArrayDecoding.java:763)
> [info]   at org.postgresql.jdbc.PgArray.buildArray(PgArray.java:320)
> [info]   at org.postgresql.jdbc.PgArray.getArrayImpl(PgArray.java:179)
> [info]   at org.postgresql.jdbc.PgArray.getArray(PgArray.java:116)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$25(JdbcUtils.scala:548)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.nullSafeConvert(JdbcUtils.scala:561)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$24(JdbcUtils.scala:548)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$24$adapted(JdbcUtils.scala:545)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:365)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:346)
> [info]   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> [info]   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>  {code}






[jira] [Commented] (SPARK-47622) Spark creates lot of tiny blocks for a single driverLog file of size less than a dfs.blocksize

2024-03-28 Thread Srinivasu Majeti (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47622?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17831831#comment-17831831
 ] 

Srinivasu Majeti commented on SPARK-47622:
--

CCing [~vanzin] to take a look and advise on next steps. Thank you!

> Spark creates lot of tiny blocks for a single driverLog file of size less 
> than a dfs.blocksize
> --
>
> Key: SPARK-47622
> URL: https://issues.apache.org/jira/browse/SPARK-47622
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Shell, Spark Submit
>Affects Versions: 3.3.2
>Reporter: Srinivasu Majeti
>Priority: Major
>
> Upon reviewing the Spark code, we found that /user/spark/driverLogs files are 
> synced to HDFS with the hsync option, as shown below.
> {code:java}
>  hdfsStream.hsync(EnumSet.allOf(classOf[HdfsDataOutputStream.SyncFlag]))
> Ref: 
> https://github.com/apache/spark/blob/a3c04ec1145662e4227d57cd953bffce96b8aad7/core/src/main/scala/org/apache/spark/util/logging/DriverLogger.scala{code}
> As a result, a new tiny block is created and synced every 5 seconds, so a 
> small HDFS file can end up with 8 blocks, as shown in the example below.
> {code:java}
> [r...@ccycloud-3.smajeti.root.comops.site subdir0]# hdfs fsck 
> /user/spark/driverLogs/application_1710495774861_0002_driver.log
> Connecting to namenode via 
> https://ccycloud-3.smajeti.root.comops.site:20102/fsck?ugi=hdfs&path=%2Fuser%2Fspark%2FdriverLogs%2Fapplication_1710495774861_0002_driver.log
> FSCK started by hdfs (auth:KERBEROS_SSL) from /10.140.136.139 for path 
> /user/spark/driverLogs/application_1710495774861_0002_driver.log at Thu Mar 
> 28 06:37:29 UTC 2024
> Status: HEALTHY
>  Number of data-nodes:4
>  Number of racks: 1
>  Total dirs:  0
>  Total symlinks:  0
> Replicated Blocks:
>  Total size:  157574 B
>  Total files: 1
>  Total blocks (validated):8 (avg. block size 19696 B)
>  Minimally replicated blocks: 8 (100.0 %) {code}
> HdfsDataOutputStream.SyncFlag includes two flags, UPDATE_LENGTH and END_BLOCK. 
> This has been the expected behavior for some time now: the flags make the 
> latest size of the HDFS driver log file visible, and to achieve that, the 
> current block is ended/closed on every 5-second sync. Each new sync therefore 
> creates a new block for the same HDFS driver log file. This hsync behavior was 
> introduced 5 years back by the fix for SPARK-29105 (SHS may delete driver log 
> file of in-progress application).
> But this leaves the NameNode managing a lot of block metadata, which becomes 
> an overhead at times in large clusters.
> {code:java}
> public static enum SyncFlag {
> UPDATE_LENGTH,
> END_BLOCK;
> private SyncFlag() {
> }
> }
> {code}
> I don't see any configurable option to avoid this, and dropping this type of 
> hsync may have side effects in Spark, as we saw with the SPARK-29105 bug.
> We only have two options, both needing manual intervention:
> 1. Keep cleaning these driver logs after some time
> 2. Keep merging these small-block files into files with 128 MB blocks
> Can we provide some customizable option to merge these blocks while closing 
> the spark-shell or when the driver log file is closed?






[jira] [Assigned] (SPARK-47614) Rename `JavaModuleOptions` to `JVMRuntimeOptions`

2024-03-28 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-47614:


Assignee: BingKun Pan

> Rename `JavaModuleOptions` to `JVMRuntimeOptions`
> -
>
> Key: SPARK-47614
> URL: https://issues.apache.org/jira/browse/SPARK-47614
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47614) Rename `JavaModuleOptions` to `JVMRuntimeOptions`

2024-03-28 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-47614.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45735
[https://github.com/apache/spark/pull/45735]

> Rename `JavaModuleOptions` to `JVMRuntimeOptions`
> -
>
> Key: SPARK-47614
> URL: https://issues.apache.org/jira/browse/SPARK-47614
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47628) Fix Postgres bit array issue 'Cannot cast to boolean'

2024-03-28 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47628?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao updated SPARK-47628:
-
Description: 
{code:java}
[info]   Cause: org.postgresql.util.PSQLException: Cannot cast to boolean: 
"10101"
[info]   at 
org.postgresql.jdbc.BooleanTypeUtil.cannotCoerceException(BooleanTypeUtil.java:99)
[info]   at 
org.postgresql.jdbc.BooleanTypeUtil.fromString(BooleanTypeUtil.java:67)
[info]   at 
org.postgresql.jdbc.ArrayDecoding$7.parseValue(ArrayDecoding.java:267)
[info]   at 
org.postgresql.jdbc.ArrayDecoding$AbstractObjectStringArrayDecoder.populateFromString(ArrayDecoding.java:128)
[info]   at 
org.postgresql.jdbc.ArrayDecoding.readStringArray(ArrayDecoding.java:763)
[info]   at org.postgresql.jdbc.PgArray.buildArray(PgArray.java:320)
[info]   at org.postgresql.jdbc.PgArray.getArrayImpl(PgArray.java:179)
[info]   at org.postgresql.jdbc.PgArray.getArray(PgArray.java:116)
[info]   at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$25(JdbcUtils.scala:548)
[info]   at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.nullSafeConvert(JdbcUtils.scala:561)
[info]   at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$24(JdbcUtils.scala:548)
[info]   at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$24$adapted(JdbcUtils.scala:545)
[info]   at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:365)
[info]   at 
org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:346)
[info]   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
[info]   at 
org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37) 
{code}

> Fix Postgres bit array issue 'Cannot cast to boolean'
> -
>
> Key: SPARK-47628
> URL: https://issues.apache.org/jira/browse/SPARK-47628
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Priority: Major
>
> {code:java}
> [info]   Cause: org.postgresql.util.PSQLException: Cannot cast to boolean: 
> "10101"
> [info]   at 
> org.postgresql.jdbc.BooleanTypeUtil.cannotCoerceException(BooleanTypeUtil.java:99)
> [info]   at 
> org.postgresql.jdbc.BooleanTypeUtil.fromString(BooleanTypeUtil.java:67)
> [info]   at 
> org.postgresql.jdbc.ArrayDecoding$7.parseValue(ArrayDecoding.java:267)
> [info]   at 
> org.postgresql.jdbc.ArrayDecoding$AbstractObjectStringArrayDecoder.populateFromString(ArrayDecoding.java:128)
> [info]   at 
> org.postgresql.jdbc.ArrayDecoding.readStringArray(ArrayDecoding.java:763)
> [info]   at org.postgresql.jdbc.PgArray.buildArray(PgArray.java:320)
> [info]   at org.postgresql.jdbc.PgArray.getArrayImpl(PgArray.java:179)
> [info]   at org.postgresql.jdbc.PgArray.getArray(PgArray.java:116)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$25(JdbcUtils.scala:548)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.nullSafeConvert(JdbcUtils.scala:561)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$24(JdbcUtils.scala:548)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$.$anonfun$makeGetter$24$adapted(JdbcUtils.scala:545)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:365)
> [info]   at 
> org.apache.spark.sql.execution.datasources.jdbc.JdbcUtils$$anon$1.getNext(JdbcUtils.scala:346)
> [info]   at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
> [info]   at 
> org.apache.spark.InterruptibleIterator.hasNext(InterruptibleIterator.scala:37)
>  {code}






[jira] [Created] (SPARK-47628) Fix Postgres bit array issue 'Cannot cast to boolean'

2024-03-28 Thread Kent Yao (Jira)
Kent Yao created SPARK-47628:


 Summary: Fix Postgres bit array issue 'Cannot cast to boolean'
 Key: SPARK-47628
 URL: https://issues.apache.org/jira/browse/SPARK-47628
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Kent Yao









[jira] [Updated] (SPARK-47614) Update some outdated comments about JavaModuleOptions

2024-03-28 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-47614:

Component/s: Documentation

> Update some outdated comments about JavaModuleOptions
> -
>
> Key: SPARK-47614
> URL: https://issues.apache.org/jira/browse/SPARK-47614
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, Spark Core
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47629) Add `common/variant` and `connector/kinesis-asl` to maven daily test module list

2024-03-28 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-47629:
-
Summary: Add `common/variant` and `connector/kinesis-asl` to maven daily 
test module list  (was: Add `common/variant` to maven daily test module list)

> Add `common/variant` and `connector/kinesis-asl` to maven daily test module 
> list
> 
>
> Key: SPARK-47629
> URL: https://issues.apache.org/jira/browse/SPARK-47629
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47614) Update some outdated comments about JavaModuleOptions

2024-03-28 Thread BingKun Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47614?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

BingKun Pan updated SPARK-47614:

Summary: Update some outdated comments about JavaModuleOptions  (was: 
Rename `JavaModuleOptions` to `JVMRuntimeOptions`)

> Update some outdated comments about JavaModuleOptions
> -
>
> Key: SPARK-47614
> URL: https://issues.apache.org/jira/browse/SPARK-47614
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47634) Legacy support for map normalization

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47634:
---
Labels: pull-request-available  (was: )

> Legacy support for map normalization
> 
>
> Key: SPARK-47634
> URL: https://issues.apache.org/jira/browse/SPARK-47634
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Stevo Mitric
>Priority: Major
>  Labels: pull-request-available
>
> Add legacy support for creating a map without normalizing keys before 
> inserting in `ArrayBasedMapBuilder`.
>  
> The key normalization change was made under SPARK-47563: 
> https://issues.apache.org/jira/browse/SPARK-47563






[jira] [Resolved] (SPARK-47630) Upgrade `zstd-jni` to 1.5.6-1

2024-03-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47630.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45756
[https://github.com/apache/spark/pull/45756]

> Upgrade `zstd-jni` to 1.5.6-1
> -
>
> Key: SPARK-47630
> URL: https://issues.apache.org/jira/browse/SPARK-47630
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-47632) Ban `com.amazonaws:aws-java-sdk-bundle` dependency

2024-03-28 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47632:
-

 Summary: Ban `com.amazonaws:aws-java-sdk-bundle` dependency
 Key: SPARK-47632
 URL: https://issues.apache.org/jira/browse/SPARK-47632
 Project: Spark
  Issue Type: Sub-task
  Components: Build
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-47632) Ban `com.amazonaws:aws-java-sdk-bundle` dependency

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47632:
---
Labels: pull-request-available  (was: )

> Ban `com.amazonaws:aws-java-sdk-bundle` dependency
> --
>
> Key: SPARK-47632
> URL: https://issues.apache.org/jira/browse/SPARK-47632
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47632) Ban `com.amazonaws:aws-java-sdk-bundle` dependency

2024-03-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47632.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

This is resolved via https://github.com/apache/spark/pull/45759

> Ban `com.amazonaws:aws-java-sdk-bundle` dependency
> --
>
> Key: SPARK-47632
> URL: https://issues.apache.org/jira/browse/SPARK-47632
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47632) Ban `com.amazonaws:aws-java-sdk-bundle` dependency

2024-03-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47632?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47632:
-

Assignee: Dongjoon Hyun

> Ban `com.amazonaws:aws-java-sdk-bundle` dependency
> --
>
> Key: SPARK-47632
> URL: https://issues.apache.org/jira/browse/SPARK-47632
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-47492) Relax definition of whitespace in lexer

2024-03-28 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang reassigned SPARK-47492:
--

Assignee: Serge Rielau

> Relax definition of whitespace in lexer
> ---
>
> Key: SPARK-47492
> URL: https://issues.apache.org/jira/browse/SPARK-47492
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
>
> There have been multiple incidents where queries "copied" in from other 
> sources resulted in "weird" syntax errors, which ultimately boiled down to 
> whitespace characters that the lexer does not recognize as such.
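The failure mode is easy to demonstrate. In the sketch below, the set of whitespace characters the lexer accepts is an assumption for illustration; the point is that pasted text often carries Unicode spaces (such as a non-breaking space, U+00A0) that render like ordinary spaces but are lexed as unknown characters:

```python
LEXER_WHITESPACE = {" ", "\t", "\r", "\n", "\f"}  # assumed ASCII-only set

def suspicious_whitespace(query):
    """Positions (index, codepoint) of characters that render like spaces
    but are not in the lexer's whitespace set."""
    return [
        (i, hex(ord(c)))
        for i, c in enumerate(query)
        if (c.isspace() or c == "\u200b") and c not in LEXER_WHITESPACE
    ]

# A query pasted with a non-breaking space (U+00A0) before the table name:
hits = suspicious_whitespace("SELECT * FROM\u00a0t")  # [(13, '0xa0')]
```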






[jira] [Resolved] (SPARK-47492) Relax definition of whitespace in lexer

2024-03-28 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-47492.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45620
[https://github.com/apache/spark/pull/45620]

> Relax definition of whitespace in lexer
> ---
>
> Key: SPARK-47492
> URL: https://issues.apache.org/jira/browse/SPARK-47492
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Assignee: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> There have been multiple incidents where queries "copied" in from other 
> sources resulted in "weird" syntax errors, which ultimately boiled down to 
> whitespace characters that the lexer does not recognize as such.






[jira] [Created] (SPARK-47635) Use Java 21 instead of 21-jre in K8s Dockerfile

2024-03-28 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47635:
-

 Summary: Use Java 21 instead of 21-jre in K8s Dockerfile
 Key: SPARK-47635
 URL: https://issues.apache.org/jira/browse/SPARK-47635
 Project: Spark
  Issue Type: Sub-task
  Components: Kubernetes
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun


{code}
$ docker run -it --rm azul/zulu-openjdk:21-jre jmap
docker: Error response from daemon: failed to create task for container: failed 
to create shim task: OCI runtime create failed: runc create failed: unable to 
start container process: exec: "jmap": executable file not found in $PATH: 
unknown.
{code}






[jira] [Created] (SPARK-47631) Remove unused `SQLConf.parquetOutputCommitterClass` method

2024-03-28 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47631:
-

 Summary: Remove unused `SQLConf.parquetOutputCommitterClass` method
 Key: SPARK-47631
 URL: https://issues.apache.org/jira/browse/SPARK-47631
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-47631) Remove unused `SQLConf.parquetOutputCommitterClass` method

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47631:
---
Labels: pull-request-available  (was: )

> Remove unused `SQLConf.parquetOutputCommitterClass` method
> --
>
> Key: SPARK-47631
> URL: https://issues.apache.org/jira/browse/SPARK-47631
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47635) Use Java 21 instead of 21-jre in K8s Dockerfile

2024-03-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47635:
--
Affects Version/s: 3.5.1
   3.5.0

> Use Java 21 instead of 21-jre in K8s Dockerfile
> ---
>
> Key: SPARK-47635
> URL: https://issues.apache.org/jira/browse/SPARK-47635
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.5.0, 4.0.0, 3.5.1
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> $ docker run -it --rm azul/zulu-openjdk:21-jre jmap
> docker: Error response from daemon: failed to create task for container: 
> failed to create shim task: OCI runtime create failed: runc create failed: 
> unable to start container process: exec: "jmap": executable file not found in 
> $PATH: unknown.
> {code}






[jira] [Created] (SPARK-47634) Legacy support for map normalization

2024-03-28 Thread Stevo Mitric (Jira)
Stevo Mitric created SPARK-47634:


 Summary: Legacy support for map normalization
 Key: SPARK-47634
 URL: https://issues.apache.org/jira/browse/SPARK-47634
 Project: Spark
  Issue Type: Task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Stevo Mitric


Add legacy support for creating a map without normalizing keys before inserting 
them into `ArrayBasedMapBuilder`.

 

The key normalization change was introduced under SPARK-47563: 
https://issues.apache.org/jira/browse/SPARK-47563
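As a loose illustration of what key normalization does (the names and exact semantics below are assumptions for illustration, not Spark's actual code), a map builder might fold `-0.0` into `0.0` before inserting a key, while the proposed legacy flag would skip that step:

```python
def normalize_key(key):
    """Loosely mimic the float-key normalization applied before a key
    is inserted into a map builder: -0.0 is folded into +0.0 so that
    lookups on -0.0 and 0.0 hit the same entry. (Illustrative only;
    not ArrayBasedMapBuilder's actual code.)"""
    if isinstance(key, float) and key == 0.0:
        return 0.0  # folds -0.0 into +0.0
    return key


def build_map(pairs, legacy=False):
    """With legacy=True, keys are inserted unnormalized, which is the
    pre-normalization behavior the legacy flag would restore."""
    result = {}
    for k, v in pairs:
        result[k if legacy else normalize_key(k)] = v
    return result
```

The real normalization in `ArrayBasedMapBuilder` may cover further cases (for example NaN keys); this sketch only shows the shape of the legacy/non-legacy split.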






[jira] [Assigned] (SPARK-47635) Use Java 21 instead of 21-jre in K8s Dockerfile

2024-03-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47635:
-

Assignee: Dongjoon Hyun

> Use Java 21 instead of 21-jre in K8s Dockerfile
> ---
>
> Key: SPARK-47635
> URL: https://issues.apache.org/jira/browse/SPARK-47635
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 3.5.0, 4.0.0, 3.5.1
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> $ docker run -it --rm azul/zulu-openjdk:21-jre jmap
> docker: Error response from daemon: failed to create task for container: 
> failed to create shim task: OCI runtime create failed: runc create failed: 
> unable to start container process: exec: "jmap": executable file not found in 
> $PATH: unknown.
> {code}






[jira] [Created] (SPARK-47633) Cache miss for queries using JOIN LATERAL with join condition

2024-03-28 Thread Bruce Robbins (Jira)
Bruce Robbins created SPARK-47633:
-

 Summary: Cache miss for queries using JOIN LATERAL with join 
condition
 Key: SPARK-47633
 URL: https://issues.apache.org/jira/browse/SPARK-47633
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 4.0.0
Reporter: Bruce Robbins


For example:
{noformat}
CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2);

create or replace temp view v1 as
select *
from t1
join lateral (
  select c1 as a, c2 as b
  from t2)
on c1 = a;

cache table v1;

explain select * from v1;
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- BroadcastHashJoin [c1#180], [a#173], Inner, BuildRight, false
   :- LocalTableScan [c1#180, c2#181]
   +- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, 
false] as bigint)),false), [plan_id=113]
  +- LocalTableScan [a#173, b#174]
{noformat}
Note that there is no {{InMemoryRelation}}.

However, if you move the join condition into the subquery, the cached plan is 
used:
{noformat}
CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2);

create or replace temp view v2 as
select *
from t1
join lateral (
  select c1 as a, c2 as b
  from t2
  where t1.c1 = t2.c1);

cache table v2;

explain select * from v2;
== Physical Plan ==
AdaptiveSparkPlan isFinalPlan=false
+- Scan In-memory table v2 [c1#176, c2#177, a#178, b#179]
  +- InMemoryRelation [c1#176, c2#177, a#178, b#179], StorageLevel(disk, 
memory, deserialized, 1 replicas)
+- AdaptiveSparkPlan isFinalPlan=true
   +- == Final Plan ==
  *(1) Project [c1#26, c2#27, a#19, b#20]
  +- *(1) BroadcastHashJoin [c1#26], [c1#30], Inner, BuildLeft, 
false
 :- BroadcastQueryStage 0
 :  +- BroadcastExchange 
HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), 
[plan_id=37]
 : +- LocalTableScan [c1#26, c2#27]
 +- *(1) LocalTableScan [a#19, b#20, c1#30]
   +- == Initial Plan ==
  Project [c1#26, c2#27, a#19, b#20]
  +- BroadcastHashJoin [c1#26], [c1#30], Inner, BuildLeft, false
 :- BroadcastExchange 
HashedRelationBroadcastMode(List(cast(input[0, int, false] as bigint)),false), 
[plan_id=37]
 :  +- LocalTableScan [c1#26, c2#27]
 +- LocalTableScan [a#19, b#20, c1#30]
{noformat}







[jira] [Updated] (SPARK-47633) Cache miss for queries using JOIN LATERAL with join condition

2024-03-28 Thread Bruce Robbins (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruce Robbins updated SPARK-47633:
--
Affects Version/s: 3.5.1

> Cache miss for queries using JOIN LATERAL with join condition
> -
>
> Key: SPARK-47633
> URL: https://issues.apache.org/jira/browse/SPARK-47633
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Bruce Robbins
>Priority: Major
>
> For example:
> {noformat}
> CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
> CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2);
> create or replace temp view v1 as
> select *
> from t1
> join lateral (
>   select c1 as a, c2 as b
>   from t2)
> on c1 = a;
> cache table v1;
> explain select * from v1;
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- BroadcastHashJoin [c1#180], [a#173], Inner, BuildRight, false
>:- LocalTableScan [c1#180, c2#181]
>+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, 
> false] as bigint)),false), [plan_id=113]
>   +- LocalTableScan [a#173, b#174]
> {noformat}
> Note that there is no {{InMemoryRelation}}.
> However, if you move the join condition into the subquery, the cached plan is 
> used:
> {noformat}
> CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
> CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2);
> create or replace temp view v2 as
> select *
> from t1
> join lateral (
>   select c1 as a, c2 as b
>   from t2
>   where t1.c1 = t2.c1);
> cache table v2;
> explain select * from v2;
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- Scan In-memory table v2 [c1#176, c2#177, a#178, b#179]
>   +- InMemoryRelation [c1#176, c2#177, a#178, b#179], StorageLevel(disk, 
> memory, deserialized, 1 replicas)
> +- AdaptiveSparkPlan isFinalPlan=true
>+- == Final Plan ==
>   *(1) Project [c1#26, c2#27, a#19, b#20]
>   +- *(1) BroadcastHashJoin [c1#26], [c1#30], Inner, 
> BuildLeft, false
>  :- BroadcastQueryStage 0
>  :  +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, false] as 
> bigint)),false), [plan_id=37]
>  : +- LocalTableScan [c1#26, c2#27]
>  +- *(1) LocalTableScan [a#19, b#20, c1#30]
>+- == Initial Plan ==
>   Project [c1#26, c2#27, a#19, b#20]
>   +- BroadcastHashJoin [c1#26], [c1#30], Inner, BuildLeft, 
> false
>  :- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, false] as 
> bigint)),false), [plan_id=37]
>  :  +- LocalTableScan [c1#26, c2#27]
>  +- LocalTableScan [a#19, b#20, c1#30]
> {noformat}






[jira] [Updated] (SPARK-47633) Cache miss for queries using JOIN LATERAL with join condition

2024-03-28 Thread Bruce Robbins (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bruce Robbins updated SPARK-47633:
--
Affects Version/s: 3.4.2

> Cache miss for queries using JOIN LATERAL with join condition
> -
>
> Key: SPARK-47633
> URL: https://issues.apache.org/jira/browse/SPARK-47633
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.2, 4.0.0, 3.5.1
>Reporter: Bruce Robbins
>Priority: Major
>
> For example:
> {noformat}
> CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
> CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2);
> create or replace temp view v1 as
> select *
> from t1
> join lateral (
>   select c1 as a, c2 as b
>   from t2)
> on c1 = a;
> cache table v1;
> explain select * from v1;
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- BroadcastHashJoin [c1#180], [a#173], Inner, BuildRight, false
>:- LocalTableScan [c1#180, c2#181]
>+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, 
> false] as bigint)),false), [plan_id=113]
>   +- LocalTableScan [a#173, b#174]
> {noformat}
> Note that there is no {{InMemoryRelation}}.
> However, if you move the join condition into the subquery, the cached plan is 
> used:
> {noformat}
> CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
> CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2);
> create or replace temp view v2 as
> select *
> from t1
> join lateral (
>   select c1 as a, c2 as b
>   from t2
>   where t1.c1 = t2.c1);
> cache table v2;
> explain select * from v2;
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- Scan In-memory table v2 [c1#176, c2#177, a#178, b#179]
>   +- InMemoryRelation [c1#176, c2#177, a#178, b#179], StorageLevel(disk, 
> memory, deserialized, 1 replicas)
> +- AdaptiveSparkPlan isFinalPlan=true
>+- == Final Plan ==
>   *(1) Project [c1#26, c2#27, a#19, b#20]
>   +- *(1) BroadcastHashJoin [c1#26], [c1#30], Inner, 
> BuildLeft, false
>  :- BroadcastQueryStage 0
>  :  +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, false] as 
> bigint)),false), [plan_id=37]
>  : +- LocalTableScan [c1#26, c2#27]
>  +- *(1) LocalTableScan [a#19, b#20, c1#30]
>+- == Initial Plan ==
>   Project [c1#26, c2#27, a#19, b#20]
>   +- BroadcastHashJoin [c1#26], [c1#30], Inner, BuildLeft, 
> false
>  :- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, false] as 
> bigint)),false), [plan_id=37]
>  :  +- LocalTableScan [c1#26, c2#27]
>  +- LocalTableScan [a#19, b#20, c1#30]
> {noformat}






[jira] [Updated] (SPARK-47636) Use Java 17 instead of 17-jre image in K8s Dockerfile

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47636:
---
Labels: pull-request-available  (was: )

> Use Java 17 instead of 17-jre image in K8s Dockerfile
> -
>
> Key: SPARK-47636
> URL: https://issues.apache.org/jira/browse/SPARK-47636
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.5.0, 3.5.1
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47635) Use Java 21 instead of 21-jre in K8s Dockerfile

2024-03-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-47635:
--
Affects Version/s: (was: 3.5.0)
   (was: 3.5.1)

> Use Java 21 instead of 21-jre in K8s Dockerfile
> ---
>
> Key: SPARK-47635
> URL: https://issues.apache.org/jira/browse/SPARK-47635
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> $ docker run -it --rm azul/zulu-openjdk:21-jre jmap
> docker: Error response from daemon: failed to create task for container: 
> failed to create shim task: OCI runtime create failed: runc create failed: 
> unable to start container process: exec: "jmap": executable file not found in 
> $PATH: unknown.
> {code}






[jira] [Created] (SPARK-47636) Use Java 17 instead of 17-jre image in K8s Dockerfile

2024-03-28 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-47636:
-

 Summary: Use Java 17 instead of 17-jre image in K8s Dockerfile
 Key: SPARK-47636
 URL: https://issues.apache.org/jira/browse/SPARK-47636
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes
Affects Versions: 3.5.1, 3.5.0
Reporter: Dongjoon Hyun









[jira] [Updated] (SPARK-47633) Cache miss for queries using JOIN LATERAL with join condition

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47633:
---
Labels: pull-request-available  (was: )

> Cache miss for queries using JOIN LATERAL with join condition
> -
>
> Key: SPARK-47633
> URL: https://issues.apache.org/jira/browse/SPARK-47633
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.2, 4.0.0, 3.5.1
>Reporter: Bruce Robbins
>Priority: Major
>  Labels: pull-request-available
>
> For example:
> {noformat}
> CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
> CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2);
> create or replace temp view v1 as
> select *
> from t1
> join lateral (
>   select c1 as a, c2 as b
>   from t2)
> on c1 = a;
> cache table v1;
> explain select * from v1;
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- BroadcastHashJoin [c1#180], [a#173], Inner, BuildRight, false
>:- LocalTableScan [c1#180, c2#181]
>+- BroadcastExchange HashedRelationBroadcastMode(List(cast(input[0, int, 
> false] as bigint)),false), [plan_id=113]
>   +- LocalTableScan [a#173, b#174]
> {noformat}
> Note that there is no {{InMemoryRelation}}.
> However, if you move the join condition into the subquery, the cached plan is 
> used:
> {noformat}
> CREATE or REPLACE TEMP VIEW t1(c1, c2) AS VALUES (0, 1), (1, 2);
> CREATE or REPLACE TEMP VIEW t2(c1, c2) AS VALUES (0, 1), (1, 2);
> create or replace temp view v2 as
> select *
> from t1
> join lateral (
>   select c1 as a, c2 as b
>   from t2
>   where t1.c1 = t2.c1);
> cache table v2;
> explain select * from v2;
> == Physical Plan ==
> AdaptiveSparkPlan isFinalPlan=false
> +- Scan In-memory table v2 [c1#176, c2#177, a#178, b#179]
>   +- InMemoryRelation [c1#176, c2#177, a#178, b#179], StorageLevel(disk, 
> memory, deserialized, 1 replicas)
> +- AdaptiveSparkPlan isFinalPlan=true
>+- == Final Plan ==
>   *(1) Project [c1#26, c2#27, a#19, b#20]
>   +- *(1) BroadcastHashJoin [c1#26], [c1#30], Inner, 
> BuildLeft, false
>  :- BroadcastQueryStage 0
>  :  +- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, false] as 
> bigint)),false), [plan_id=37]
>  : +- LocalTableScan [c1#26, c2#27]
>  +- *(1) LocalTableScan [a#19, b#20, c1#30]
>+- == Initial Plan ==
>   Project [c1#26, c2#27, a#19, b#20]
>   +- BroadcastHashJoin [c1#26], [c1#30], Inner, BuildLeft, 
> false
>  :- BroadcastExchange 
> HashedRelationBroadcastMode(List(cast(input[0, int, false] as 
> bigint)),false), [plan_id=37]
>  :  +- LocalTableScan [c1#26, c2#27]
>  +- LocalTableScan [a#19, b#20, c1#30]
> {noformat}






[jira] [Resolved] (SPARK-47525) Support subquery correlation joining on map attributes

2024-03-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47525.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45673
[https://github.com/apache/spark/pull/45673]

> Support subquery correlation joining on map attributes
> --
>
> Key: SPARK-47525
> URL: https://issues.apache.org/jira/browse/SPARK-47525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Jack Chen
>Assignee: Jack Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently, when a subquery is correlated on a condition like `outer_map[1] = 
> inner_map[1]`, DecorrelateInnerQuery generates a join on the map itself,
> which is unsupported, so the query cannot run - for example:
>  
> {code:java}
> scala> Seq(Map(0 -> 0)).toDF.createOrReplaceTempView("v")
> scala> sql("select 
> v1.value[0] from v v1 where v1.value[0] > (select avg(v2.value[0]) from v v2 
> where v1.value[1] = v2.value[1])").explain
> org.apache.spark.sql.AnalysisException: 
> [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE]
>  Unsupported subquery expression: Correlated column reference 'v1.value' 
> cannot be map type. SQLSTATE: 0A000; line 1 pos 49
> at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedCorrelatedReferenceDataTypeError(QueryCompilationErrors.scala:2463)
> ... {code}
> However, if we rewrite the query to pull out the map access `outer_map[1]` 
> into the outer plan, it succeeds:
>  
> {code:java}
> scala> sql("""with tmp as (
> select value[0] as value0, value[1] as value1 from v
> )
> select v1.value0 from tmp v1 where v1.value0 > (select avg(v2.value0) from 
> tmp v2 where v1.value1 = v2.value1)""").explain{code}
> Another point that can be improved is that, even if the data type supports 
> join, we still don’t need to join on the full attribute, and we can get a 
> better plan by doing the same rewrite to pull out the extract expression.






[jira] [Assigned] (SPARK-47638) Skip column name validation in PS

2024-03-28 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-47638:
-

Assignee: Ruifeng Zheng

> Skip column name validation in PS
> -
>
> Key: SPARK-47638
> URL: https://issues.apache.org/jira/browse/SPARK-47638
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47638) Skip column name validation in PS

2024-03-28 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-47638.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45752
[https://github.com/apache/spark/pull/45752]

> Skip column name validation in PS
> -
>
> Key: SPARK-47638
> URL: https://issues.apache.org/jira/browse/SPARK-47638
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47638) Skip column name validation in PS

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47638?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47638:
---
Labels: pull-request-available  (was: )

> Skip column name validation in PS
> -
>
> Key: SPARK-47638
> URL: https://issues.apache.org/jira/browse/SPARK-47638
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47639) Support codegen for json_tuple

2024-03-28 Thread Xianming Lei (Jira)
Xianming Lei created SPARK-47639:


 Summary: Support codegen for json_tuple
 Key: SPARK-47639
 URL: https://issues.apache.org/jira/browse/SPARK-47639
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.1
Reporter: Xianming Lei


Using json_tuple can sometimes cause a performance regression because it does not 
support whole-stage codegen.
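For context, `json_tuple` extracts several top-level fields from a JSON string in a single parse, which is what makes it attractive over repeated `get_json_object` calls. A rough Python analogue of its behavior (illustrative only, not Spark's implementation):

```python
import json


def json_tuple(json_str, *fields):
    """Rough analogue of Spark SQL's json_tuple: parse the JSON
    document once, then return one value per requested top-level
    field, with None for fields that are absent or on parse failure."""
    try:
        obj = json.loads(json_str)
    except (ValueError, TypeError):
        return tuple(None for _ in fields)
    if not isinstance(obj, dict):
        return tuple(None for _ in fields)
    return tuple(obj.get(f) for f in fields)
```

Without whole-stage codegen, the interpreted generator evaluation path can negate the single-parse advantage, which is the regression this issue describes.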






[jira] [Created] (SPARK-47640) Support codegen for json_tuple

2024-03-28 Thread Xianming Lei (Jira)
Xianming Lei created SPARK-47640:


 Summary: Support codegen for json_tuple
 Key: SPARK-47640
 URL: https://issues.apache.org/jira/browse/SPARK-47640
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.1
Reporter: Xianming Lei


Using json_tuple can sometimes cause a performance regression because it does not 
support whole-stage codegen.






[jira] [Resolved] (SPARK-47640) Support codegen for json_tuple

2024-03-28 Thread Xianming Lei (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47640?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xianming Lei resolved SPARK-47640.
--
Resolution: Duplicate

> Support codegen for json_tuple
> --
>
> Key: SPARK-47640
> URL: https://issues.apache.org/jira/browse/SPARK-47640
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Xianming Lei
>Priority: Major
>
> Using json_tuple can sometimes cause a performance regression because it does 
> not support whole-stage codegen.






[jira] [Updated] (SPARK-47639) Support codegen for json_tuple

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47639?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47639:
---
Labels: pull-request-available  (was: )

> Support codegen for json_tuple
> --
>
> Key: SPARK-47639
> URL: https://issues.apache.org/jira/browse/SPARK-47639
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Xianming Lei
>Priority: Major
>  Labels: pull-request-available
>
> Using json_tuple can sometimes cause a performance regression because it does 
> not support whole-stage codegen.






[jira] [Created] (SPARK-47637) Use errorCapturingIdentifier rule in more places to improve error messages

2024-03-28 Thread Serge Rielau (Jira)
Serge Rielau created SPARK-47637:


 Summary: Use errorCapturingIdentifier rule in more places to 
improve error messages
 Key: SPARK-47637
 URL: https://issues.apache.org/jira/browse/SPARK-47637
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 4.0.0
Reporter: Serge Rielau


errorCapturingIdentifier parses identifiers that contain '-' so that 
INVALID_IDENTIFIER is raised 

instead of SYNTAX_ERROR for non-delimited identifiers containing a hyphen.
It is meant to be used wherever the context is not that of an expression.
This Jira applies that rule to a few identifiers that were previously missed.
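To illustrate the distinction (a toy sketch; the regexes and return values below are assumptions for illustration, not Spark's grammar), a rule that explicitly matches hyphenated identifiers can report a targeted error rather than a generic parse failure:

```python
import re

# Plain non-delimited identifier: letters, digits, underscores.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
# Identifier-like token containing one or more hyphens, e.g. "my-table".
HYPHENATED = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:-[A-Za-z0-9_]+)+\Z")


def classify_identifier(token):
    """Toy classifier showing the error-message split the rule enables."""
    if IDENT.fullmatch(token):
        return "OK"
    if HYPHENATED.match(token):
        # The dedicated rule recognizes the hyphenated form, so the
        # parser can raise a targeted INVALID_IDENTIFIER error
        # (e.g. suggesting backtick quoting) instead of SYNTAX_ERROR.
        return "INVALID_IDENTIFIER"
    return "SYNTAX_ERROR"
```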






[jira] [Resolved] (SPARK-47635) Use Java 21 instead of 21-jre in K8s Dockerfile

2024-03-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47635.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45761
[https://github.com/apache/spark/pull/45761]

> Use Java 21 instead of 21-jre in K8s Dockerfile
> ---
>
> Key: SPARK-47635
> URL: https://issues.apache.org/jira/browse/SPARK-47635
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> {code}
> $ docker run -it --rm azul/zulu-openjdk:21-jre jmap
> docker: Error response from daemon: failed to create task for container: 
> failed to create shim task: OCI runtime create failed: runc create failed: 
> unable to start container process: exec: "jmap": executable file not found in 
> $PATH: unknown.
> {code}






[jira] [Resolved] (SPARK-47631) Remove unused `SQLConf.parquetOutputCommitterClass` method

2024-03-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47631.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45757
[https://github.com/apache/spark/pull/45757]

> Remove unused `SQLConf.parquetOutputCommitterClass` method
> --
>
> Key: SPARK-47631
> URL: https://issues.apache.org/jira/browse/SPARK-47631
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-47525) Support subquery correlation joining on map attributes

2024-03-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47525?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47525:
---

Assignee: Jack Chen

> Support subquery correlation joining on map attributes
> --
>
> Key: SPARK-47525
> URL: https://issues.apache.org/jira/browse/SPARK-47525
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Jack Chen
>Assignee: Jack Chen
>Priority: Major
>  Labels: pull-request-available
>
> Currently, when a subquery is correlated on a condition like `outer_map[1] = 
> inner_map[1]`, DecorrelateInnerQuery generates a join on the map itself,
> which is unsupported, so the query cannot run - for example:
>  
> {code:java}
> scala> Seq(Map(0 -> 0)).toDF.createOrReplaceTempView("v")
> scala> sql("select 
> v1.value[0] from v v1 where v1.value[0] > (select avg(v2.value[0]) from v v2 
> where v1.value[1] = v2.value[1])").explain
> org.apache.spark.sql.AnalysisException: 
> [UNSUPPORTED_SUBQUERY_EXPRESSION_CATEGORY.UNSUPPORTED_CORRELATED_REFERENCE_DATA_TYPE]
>  Unsupported subquery expression: Correlated column reference 'v1.value' 
> cannot be map type. SQLSTATE: 0A000; line 1 pos 49
> at 
> org.apache.spark.sql.errors.QueryCompilationErrors$.unsupportedCorrelatedReferenceDataTypeError(QueryCompilationErrors.scala:2463)
> ... {code}
> However, if we rewrite the query to pull out the map access `outer_map[1]` 
> into the outer plan, it succeeds:
>  
> {code:java}
> scala> sql("""with tmp as (
> select value[0] as value0, value[1] as value1 from v
> )
> select v1.value0 from tmp v1 where v1.value0 > (select avg(v2.value0) from 
> tmp v2 where v1.value1 = v2.value1)""").explain{code}
> Another point that can be improved is that, even if the data type supports 
> join, we still don’t need to join on the full attribute, and we can get a 
> better plan by doing the same rewrite to pull out the extract expression.






[jira] [Updated] (SPARK-47637) Use errorCapturingIdentifier rule in more places to improve error messages

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47637?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47637:
---
Labels: pull-request-available  (was: )

> Use errorCapturingIdentifier rule in more places to improve error messages
> --
>
> Key: SPARK-47637
> URL: https://issues.apache.org/jira/browse/SPARK-47637
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Serge Rielau
>Priority: Major
>  Labels: pull-request-available
>
> errorCapturingIdentifier parses identifiers that include '-' so that 
> INVALID_IDENTIFIER is raised 
> instead of SYNTAX_ERROR for non-delimited identifiers containing a hyphen.
> It is meant to be used wherever the context is not that of an expression.
> This Jira replaces a few missed identifier occurrences with that rule.
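The intent of such a rule can be illustrated with a small sketch (hypothetical names and regexes, not Spark's actual ANTLR grammar): by deliberately matching the illegal hyphenated form, the parser can report a targeted INVALID_IDENTIFIER error with a quoting hint instead of a generic syntax error.

```python
import re

# Hypothetical sketch of the "error capturing identifier" idea: also match
# identifiers that illegally contain '-' so a precise error can be raised.
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")
IDENT_WITH_HYPHEN = re.compile(r"[A-Za-z_][A-Za-z0-9_]*(?:-[A-Za-z0-9_]+)+")

def parse_identifier(text: str) -> str:
    if IDENT.fullmatch(text):
        return text
    if IDENT_WITH_HYPHEN.fullmatch(text):
        # The error-capturing rule matched: raise a targeted error with a hint
        # instead of failing with a generic parse error.
        raise ValueError(
            f"INVALID_IDENTIFIER: '{text}' contains '-'; "
            f"quote it with backticks: `{text}`"
        )
    raise ValueError(f"SYNTAX_ERROR: cannot parse '{text}'")
```

The key design point is that the capturing rule is only used outside expression contexts, where a hyphen cannot be a minus operator.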






[jira] [Resolved] (SPARK-47623) Use `QuietTest` in parity tests

2024-03-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47623?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47623.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45747
[https://github.com/apache/spark/pull/45747]

> Use `QuietTest` in parity tests
> ---
>
> Key: SPARK-47623
> URL: https://issues.apache.org/jira/browse/SPARK-47623
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47511) Canonicalize With expressions by re-assigning IDs

2024-03-28 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47511?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47511.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45649
[https://github.com/apache/spark/pull/45649]

> Canonicalize With expressions by re-assigning IDs
> -
>
> Key: SPARK-47511
> URL: https://issues.apache.org/jira/browse/SPARK-47511
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Kelvin Jiang
>Assignee: Kelvin Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The current canonicalization of `With` expressions takes into account the IDs 
> of the common expressions, which come from a global, monotonically increasing 
> counter. This means that semantically identical queries with `With` expressions 
> (e.g. `NULLIF` expressions) will have inconsistent canonicalizations.
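The fix can be sketched abstractly (hypothetical tree shape, not Spark's actual expression classes): re-assign IDs in order of first appearance during canonicalization, so two trees that differ only in globally allocated IDs produce identical canonical forms.

```python
def canonicalize_ids(expr, mapping=None):
    """Re-map common-expression IDs to small, deterministic local IDs in
    order of first definition. Hypothetical tree shape: ("with", id, child)
    nodes define a common expression; ("ref", id) leaves reference one."""
    if mapping is None:
        mapping = {}
    tag = expr[0]
    if tag == "with":
        _, old_id, child = expr
        mapping[old_id] = len(mapping)  # next deterministic local ID
        return ("with", mapping[old_id], canonicalize_ids(child, mapping))
    if tag == "ref":
        return ("ref", mapping.get(expr[1], expr[1]))
    return expr
```

With this scheme, two `With` trees built at different times (and hence with different global IDs) canonicalize to the same tree, restoring plan-equality checks.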






[jira] [Resolved] (SPARK-47636) Use Java 17 instead of 17-jre image in K8s Dockerfile

2024-03-28 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47636?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47636.
---
Fix Version/s: 3.5.2
   Resolution: Fixed

Issue resolved by pull request 45762
[https://github.com/apache/spark/pull/45762]

> Use Java 17 instead of 17-jre image in K8s Dockerfile
> -
>
> Key: SPARK-47636
> URL: https://issues.apache.org/jira/browse/SPARK-47636
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.5.0, 3.5.1
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.5.2
>
>







[jira] [Assigned] (SPARK-47631) Remove unused `SQLConf.parquetOutputCommitterClass` method

2024-03-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47631:


Assignee: Dongjoon Hyun

> Remove unused `SQLConf.parquetOutputCommitterClass` method
> --
>
> Key: SPARK-47631
> URL: https://issues.apache.org/jira/browse/SPARK-47631
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Trivial
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47638) Skip column name validation in PS

2024-03-28 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-47638:
-

 Summary: Skip column name validation in PS
 Key: SPARK-47638
 URL: https://issues.apache.org/jira/browse/SPARK-47638
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PS
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-47642) Exclude `org.junit.jupiter` and `org.junit.platform` from `jmock-junit5`

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47642:
---
Labels: pull-request-available  (was: )

> Exclude `org.junit.jupiter` and `org.junit.platform` from `jmock-junit5`
> 
>
> Key: SPARK-47642
> URL: https://issues.apache.org/jira/browse/SPARK-47642
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47644) Refine docstrings of try_*

2024-03-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-47644:
-
Summary: Refine docstrings of try_*  (was: Improve docstrings of try_*)

> Refine docstrings of try_*
> --
>
> Key: SPARK-47644
> URL: https://issues.apache.org/jira/browse/SPARK-47644
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>







[jira] [Created] (SPARK-47642) Exclude `junit-jupiter-api` and `org.junit.platform` from `jmock-junit5`

2024-03-28 Thread Yang Jie (Jira)
Yang Jie created SPARK-47642:


 Summary: Exclude `junit-jupiter-api` and `org.junit.platform` from 
`jmock-junit5`
 Key: SPARK-47642
 URL: https://issues.apache.org/jira/browse/SPARK-47642
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 4.0.0
Reporter: Yang Jie









[jira] [Updated] (SPARK-47642) Exclude `org.junit.jupiter` and `org.junit.platform` from `jmock-junit5`

2024-03-28 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47642?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-47642:
-
Summary: Exclude `org.junit.jupiter` and `org.junit.platform` from 
`jmock-junit5`  (was: Exclude `junit-jupiter-api` and `org.junit.platform` from 
`jmock-junit5`)

> Exclude `org.junit.jupiter` and `org.junit.platform` from `jmock-junit5`
> 
>
> Key: SPARK-47642
> URL: https://issues.apache.org/jira/browse/SPARK-47642
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>







[jira] [Updated] (SPARK-47643) Add pyspark test for python streaming data source

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47643?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47643:
---
Labels: pull-request-available  (was: )

> Add pyspark test for python streaming data source
> -
>
> Key: SPARK-47643
> URL: https://issues.apache.org/jira/browse/SPARK-47643
> Project: Spark
>  Issue Type: Improvement
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Chaoqin Li
>Priority: Major
>  Labels: pull-request-available
>
> Add a PySpark end-to-end test for the Python streaming data source in a pure 
> Python environment. Currently there are only Scala tests for the Python 
> streaming data source.






[jira] [Updated] (SPARK-47644) Refine docstrings of try_*

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47644?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47644:
---
Labels: pull-request-available  (was: )

> Refine docstrings of try_*
> --
>
> Key: SPARK-47644
> URL: https://issues.apache.org/jira/browse/SPARK-47644
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47644) Improve docstrings of try_*

2024-03-28 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47644:


 Summary: Improve docstrings of try_*
 Key: SPARK-47644
 URL: https://issues.apache.org/jira/browse/SPARK-47644
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Resolved] (SPARK-47568) Fix race condition between maintenance thread and task thread for RocksDB snapshot

2024-03-28 Thread Jungtaek Lim (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jungtaek Lim resolved SPARK-47568.
--
Fix Version/s: 4.0.0
 Assignee: Bhuwan Sahni
   Resolution: Fixed

Issue resolved via https://github.com/apache/spark/pull/45724

> Fix race condition between maintenance thread and task thread for RocksDB 
> snapshot
> -
>
> Key: SPARK-47568
> URL: https://issues.apache.org/jira/browse/SPARK-47568
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 3.5.0, 4.0.0, 3.5.1, 3.5.2
>Reporter: Bhuwan Sahni
>Assignee: Bhuwan Sahni
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> There are currently some race conditions between the maintenance thread and 
> the task thread which can result in corrupted checkpoint state.
>  # The maintenance thread currently relies on the class variable {{lastSnapshot}} 
> to find the latest checkpoint and upload it to DFS. This checkpoint can be 
> modified at commit time by the task thread if a new snapshot is created.
>  # The task thread does not reset lastSnapshot at load time, which can result 
> in newer snapshots (if an old version is loaded) being considered valid and 
> uploaded to DFS. This results in VersionIdMismatch errors.
> This issue proposes to fix these problems by guarding modification of the 
> latestSnapshot variable, and setting latestSnapshot properly at load time.






[jira] [Assigned] (SPARK-47629) Add `common/variant` and `connector/kinesis-asl` to maven daily test module list

2024-03-28 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-47629:


Assignee: Yang Jie

> Add `common/variant` and `connector/kinesis-asl` to maven daily test module 
> list
> 
>
> Key: SPARK-47629
> URL: https://issues.apache.org/jira/browse/SPARK-47629
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47629) Add `common/variant` and `connector/kinesis-asl` to maven daily test module list

2024-03-28 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-47629.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45754
[https://github.com/apache/spark/pull/45754]

> Add `common/variant` and `connector/kinesis-asl` to maven daily test module 
> list
> 
>
> Key: SPARK-47629
> URL: https://issues.apache.org/jira/browse/SPARK-47629
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-47635) Use Java 21 instead of 21-jre in K8s Dockerfile

2024-03-28 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47635:
---
Labels: pull-request-available  (was: )

> Use Java 21 instead of 21-jre in K8s Dockerfile
> ---
>
> Key: SPARK-47635
> URL: https://issues.apache.org/jira/browse/SPARK-47635
> Project: Spark
>  Issue Type: Sub-task
>  Components: Kubernetes
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>  Labels: pull-request-available
>
> {code}
> $ docker run -it --rm azul/zulu-openjdk:21-jre jmap
> docker: Error response from daemon: failed to create task for container: 
> failed to create shim task: OCI runtime create failed: runc create failed: 
> unable to start container process: exec: "jmap": executable file not found in 
> $PATH: unknown.
> {code}


