[jira] [Resolved] (SPARK-46427) Change Python Data Source's description to be pretty in explain

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46427.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44379
[https://github.com/apache/spark/pull/44379]

> Change Python Data Source's description to be pretty in explain
> ---
>
> Key: SPARK-46427
> URL: https://issues.apache.org/jira/browse/SPARK-46427
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Now it's as below:
> {code}
> == Physical Plan ==
> *(1) Project [x#0, y#1]
> +- BatchScan test[x#0, y#1] class 
> org.apache.spark.sql.execution.python.PythonTableProvider$$anon$1$$anon$2 
> RuntimeFilters: []
> {code}
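For contrast with the plan above, here is a toy, pure-Python illustration of the underlying idea: a scan node described by its JVM-style class name (which is how the mangled `...$$anon$1$$anon$2` appears) versus a short, user-facing source name. This is not Spark's code; `PythonScan`, `raw_description`, and `pretty_description` are made-up names for illustration only.

```python
# Illustrative sketch only -- NOT Spark's implementation. It shows why
# describing a node by its class yields an unreadable name, while a
# user-facing source name reads better in explain output.
class PythonScan:
    def __init__(self, source_name: str):
        self.source_name = source_name

    def raw_description(self) -> str:
        # Describe the node by its (possibly anonymous) class name,
        # mirroring the mangled output shown in the plan above.
        return f"BatchScan test class {type(self).__module__}.{type(self).__qualname__}"

    def pretty_description(self) -> str:
        # A friendlier description based on the short, user-facing name.
        return f"BatchScan test (Python) {self.source_name}"


scan = PythonScan("my_source")
print(scan.raw_description())
print(scan.pretty_description())
```

The "pretty" form drops the class-loader noise and keeps only what a user can act on: the node type and the data source name.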



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46427) Change Python Data Source's description to be pretty in explain

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46427:


Assignee: Hyukjin Kwon

> Change Python Data Source's description to be pretty in explain
> ---
>
> Key: SPARK-46427
> URL: https://issues.apache.org/jira/browse/SPARK-46427
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> Now it's as below:
> {code}
> == Physical Plan ==
> *(1) Project [x#0, y#1]
> +- BatchScan test[x#0, y#1] class 
> org.apache.spark.sql.execution.python.PythonTableProvider$$anon$1$$anon$2 
> RuntimeFilters: []
> {code}






[jira] [Assigned] (SPARK-46403) Decode parquet binary with getBytesUnsafe method

2023-12-15 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng reassigned SPARK-46403:
--

Assignee: Wan Kun

> Decode parquet binary with getBytesUnsafe method
> 
>
> Key: SPARK-46403
> URL: https://issues.apache.org/jira/browse/SPARK-46403
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wan Kun
>Assignee: Wan Kun
>Priority: Major
>  Labels: pull-request-available
> Attachments: image-2023-12-14-16-30-39-104.png
>
>
> Currently Spark decodes a Parquet binary object with the getBytes() method.
> The *Binary.getBytes()* method always makes a new copy of the internal
> bytes.
> We can use the *Binary.getBytesUnsafe()* method to reuse the cached bytes
> when getBytes() has already been called and the copy is cached.
> !image-2023-12-14-16-30-39-104.png!
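The copy-versus-cache behavior described above can be modeled with a small toy class. This is an illustrative Python sketch of the pattern only, not the actual parquet `Binary` implementation (which is Java, and whose exact caching behavior varies by subclass).

```python
class ToyBinary:
    """Toy model of the copy-vs-cache pattern described above.

    Illustrative only: the real org.apache.parquet.io.api.Binary is Java.
    """

    def __init__(self, backing: bytearray):
        self._backing = backing  # internal, possibly shared buffer
        self._cached = None      # last materialized copy, if any

    def get_bytes(self):
        # Like Binary.getBytes(): always materializes a fresh copy of the
        # internal bytes (and remembers it).
        self._cached = bytes(self._backing)
        return self._cached

    def get_bytes_unsafe(self):
        # Like Binary.getBytesUnsafe(): if a copy was already made, return
        # it directly instead of copying again.
        if self._cached is not None:
            return self._cached
        return self.get_bytes()
```

Here repeated `get_bytes_unsafe()` calls return the very same object, which is the saved copy the decoder can reuse instead of allocating a new one per call.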






[jira] [Resolved] (SPARK-46403) Decode parquet binary with getBytesUnsafe method

2023-12-15 Thread Jiaan Geng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46403?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jiaan Geng resolved SPARK-46403.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44351
[https://github.com/apache/spark/pull/44351]

> Decode parquet binary with getBytesUnsafe method
> 
>
> Key: SPARK-46403
> URL: https://issues.apache.org/jira/browse/SPARK-46403
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wan Kun
>Assignee: Wan Kun
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
> Attachments: image-2023-12-14-16-30-39-104.png
>
>
> Currently Spark decodes a Parquet binary object with the getBytes() method.
> The *Binary.getBytes()* method always makes a new copy of the internal
> bytes.
> We can use the *Binary.getBytesUnsafe()* method to reuse the cached bytes
> when getBytes() has already been called and the copy is cached.
> !image-2023-12-14-16-30-39-104.png!






[jira] [Updated] (SPARK-46427) Change Python Data Source's description to be pretty in explain

2023-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46427?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46427:
---
Labels: pull-request-available  (was: )

> Change Python Data Source's description to be pretty in explain
> ---
>
> Key: SPARK-46427
> URL: https://issues.apache.org/jira/browse/SPARK-46427
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> Now it's as below:
> {code}
> == Physical Plan ==
> *(1) Project [x#0, y#1]
> +- BatchScan test[x#0, y#1] class 
> org.apache.spark.sql.execution.python.PythonTableProvider$$anon$1$$anon$2 
> RuntimeFilters: []
> {code}






[jira] [Created] (SPARK-46427) Change Python Data Source's description to be pretty in explain

2023-12-15 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46427:


 Summary: Change Python Data Source's description to be pretty in 
explain
 Key: SPARK-46427
 URL: https://issues.apache.org/jira/browse/SPARK-46427
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Now it's as below:

{code}

== Physical Plan ==
*(1) Project [x#0, y#1]
+- BatchScan test[x#0, y#1] class 
org.apache.spark.sql.execution.python.PythonTableProvider$$anon$1$$anon$2 
RuntimeFilters: []
{code}






[jira] [Resolved] (SPARK-46425) Pin the bundler version in CI

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46425.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44376
[https://github.com/apache/spark/pull/44376]

> Pin the bundler version in CI
> -
>
> Key: SPARK-46425
> URL: https://issues.apache.org/jira/browse/SPARK-46425
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> https://github.com/apache/spark/actions/runs/7226413850/job/19691970695
> {code}
> Requirement already satisfied: docutils<0.18.0 in 
> /usr/local/lib/python3.9/dist-packages (0.17.1)
> WARNING: Running pip as the 'root' user can result in broken permissions and 
> conflicting behaviour with the system package manager. It is recommended to 
> use a virtual environment instead: https://pip.pypa.io/warnings/venv
> ERROR:  Error installing bundler:
>   The last version of bundler (>= 0) to support your Ruby & RubyGems was 
> 2.4.22. Try installing it with `gem install bundler -v 2.4.22`
>   bundler requires Ruby version >= 3.0.0. The current ruby version is 
> 2.7.0.0.
> {code}






[jira] [Assigned] (SPARK-46425) Pin the bundler version in CI

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46425:


Assignee: Hyukjin Kwon

> Pin the bundler version in CI
> -
>
> Key: SPARK-46425
> URL: https://issues.apache.org/jira/browse/SPARK-46425
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> https://github.com/apache/spark/actions/runs/7226413850/job/19691970695
> {code}
> Requirement already satisfied: docutils<0.18.0 in 
> /usr/local/lib/python3.9/dist-packages (0.17.1)
> WARNING: Running pip as the 'root' user can result in broken permissions and 
> conflicting behaviour with the system package manager. It is recommended to 
> use a virtual environment instead: https://pip.pypa.io/warnings/venv
> ERROR:  Error installing bundler:
>   The last version of bundler (>= 0) to support your Ruby & RubyGems was 
> 2.4.22. Try installing it with `gem install bundler -v 2.4.22`
>   bundler requires Ruby version >= 3.0.0. The current ruby version is 
> 2.7.0.0.
> {code}






[jira] [Resolved] (SPARK-46414) Use prependBaseUri to render javascript imports

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46414.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44363
[https://github.com/apache/spark/pull/44363]

> Use prependBaseUri to render javascript imports
> ---
>
> Key: SPARK-46414
> URL: https://issues.apache.org/jira/browse/SPARK-46414
> Project: Spark
>  Issue Type: Sub-task
>  Components: UI
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-46422) Move `test_window` to `pyspark.pandas.tests.window.*`

2023-12-15 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-46422.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44371
[https://github.com/apache/spark/pull/44371]

> Move `test_window` to `pyspark.pandas.tests.window.*`
> -
>
> Key: SPARK-46422
> URL: https://issues.apache.org/jira/browse/SPARK-46422
> Project: Spark
>  Issue Type: Test
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-42829) Add Identifier to the cached RDD node on the Stages page

2023-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42829?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-42829:
---
Labels: pull-request-available  (was: )

> Add Identifier to the cached RDD node on the Stages page 
> -
>
> Key: SPARK-42829
> URL: https://issues.apache.org/jira/browse/SPARK-42829
> Project: Spark
>  Issue Type: Improvement
>  Components: Web UI
>Affects Versions: 3.3.2
>Reporter: Yian Liou
>Priority: Major
>  Labels: pull-request-available
> Attachments: Screen Shot 2023-03-20 at 3.55.40 PM.png
>
>
> On the Stages page in the Web UI, there is no way to tell which cached
> RDD is being executed in a particular stage. This Jira aims to add a repeat
> identifier to distinguish which cached RDD is being executed.






[jira] [Deleted] (SPARK-46426) Uses sum metrics for number of output length

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon deleted SPARK-46426:
-


> Uses sum metrics for number of output length
> 
>
> Key: SPARK-46426
> URL: https://issues.apache.org/jira/browse/SPARK-46426
> Project: Spark
>  Issue Type: Improvement
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Screenshot attached. It shouldn't look like bytes.






[jira] [Updated] (SPARK-46426) Uses sum metrics for number of output length

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46426:
-
Attachment: Screenshot 2023-12-15 at 3.23.09 PM.png

> Uses sum metrics for number of output length
> 
>
> Key: SPARK-46426
> URL: https://issues.apache.org/jira/browse/SPARK-46426
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
> Attachments: Screenshot 2023-12-15 at 3.23.09 PM.png
>
>







[jira] [Updated] (SPARK-46426) Uses sum metrics for number of output length

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46426:
-
Description: Screenshot attached. It shouldn't look like bytes.

> Uses sum metrics for number of output length
> 
>
> Key: SPARK-46426
> URL: https://issues.apache.org/jira/browse/SPARK-46426
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
> Attachments: Screenshot 2023-12-15 at 3.23.09 PM.png
>
>
> Screenshot attached. It shouldn't look like bytes.






[jira] [Created] (SPARK-46426) Uses sum metrics for number of output length

2023-12-15 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46426:


 Summary: Uses sum metrics for number of output length
 Key: SPARK-46426
 URL: https://issues.apache.org/jira/browse/SPARK-46426
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, SQL
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon
 Attachments: Screenshot 2023-12-15 at 3.23.09 PM.png








[jira] [Assigned] (SPARK-46423) Refactor Python Data Source instance loading

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46423:


Assignee: Hyukjin Kwon

> Refactor Python Data Source instance loading
> 
>
> Key: SPARK-46423
> URL: https://issues.apache.org/jira/browse/SPARK-46423
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> We should create the instance in lookupDataSourceV2 instead.






[jira] [Resolved] (SPARK-46423) Refactor Python Data Source instance loading

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46423.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44374
[https://github.com/apache/spark/pull/44374]

> Refactor Python Data Source instance loading
> 
>
> Key: SPARK-46423
> URL: https://issues.apache.org/jira/browse/SPARK-46423
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> We should create the instance in lookupDataSourceV2 instead.






[jira] [Updated] (SPARK-46425) Pin the bundler version in CI

2023-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46425?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46425:
---
Labels: pull-request-available  (was: )

> Pin the bundler version in CI
> -
>
> Key: SPARK-46425
> URL: https://issues.apache.org/jira/browse/SPARK-46425
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> https://github.com/apache/spark/actions/runs/7226413850/job/19691970695
> {code}
> Requirement already satisfied: docutils<0.18.0 in 
> /usr/local/lib/python3.9/dist-packages (0.17.1)
> WARNING: Running pip as the 'root' user can result in broken permissions and 
> conflicting behaviour with the system package manager. It is recommended to 
> use a virtual environment instead: https://pip.pypa.io/warnings/venv
> ERROR:  Error installing bundler:
>   The last version of bundler (>= 0) to support your Ruby & RubyGems was 
> 2.4.22. Try installing it with `gem install bundler -v 2.4.22`
>   bundler requires Ruby version >= 3.0.0. The current ruby version is 
> 2.7.0.0.
> {code}






[jira] [Created] (SPARK-46425) Pin the bundler version in CI

2023-12-15 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46425:


 Summary: Pin the bundler version in CI
 Key: SPARK-46425
 URL: https://issues.apache.org/jira/browse/SPARK-46425
 Project: Spark
  Issue Type: Improvement
  Components: Project Infra
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


https://github.com/apache/spark/actions/runs/7226413850/job/19691970695

{code}
Requirement already satisfied: docutils<0.18.0 in 
/usr/local/lib/python3.9/dist-packages (0.17.1)
WARNING: Running pip as the 'root' user can result in broken permissions and 
conflicting behaviour with the system package manager. It is recommended to use 
a virtual environment instead: https://pip.pypa.io/warnings/venv
ERROR:  Error installing bundler:
The last version of bundler (>= 0) to support your Ruby & RubyGems was 
2.4.22. Try installing it with `gem install bundler -v 2.4.22`
bundler requires Ruby version >= 3.0.0. The current ruby version is 
2.7.0.0.
{code}






[jira] [Updated] (SPARK-46424) Support PythonSQLMetrics.pythonMetrics

2023-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46424:
---
Labels: pull-request-available  (was: )

> Support PythonSQLMetrics.pythonMetrics
> --
>
> Key: SPARK-46424
> URL: https://issues.apache.org/jira/browse/SPARK-46424
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
>
> We should show stats such as `pythonDataSent`, `pythonDataReceived`, and
> `pythonNumRowsReceived`.






[jira] [Resolved] (SPARK-45807) DataSourceV2: Improve ViewCatalog API

2023-12-15 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45807?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-45807.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44330
[https://github.com/apache/spark/pull/44330]

> DataSourceV2: Improve ViewCatalog API
> -
>
> Key: SPARK-45807
> URL: https://issues.apache.org/jira/browse/SPARK-45807
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.5.0
>Reporter: Eduard Tudenhoefner
>Assignee: Eduard Tudenhoefner
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The goal is to add createOrReplaceView(..) and replaceView(..) methods to the 
> ViewCatalog API






[jira] [Updated] (SPARK-46424) Support PythonSQLMetrics.pythonMetrics

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46424:
-
Summary: Support PythonSQLMetrics.pythonMetrics  (was: Support 
PythonSQLMetrics.pythonMetrics via custom metrics API in DSv2)

> Support PythonSQLMetrics.pythonMetrics
> --
>
> Key: SPARK-46424
> URL: https://issues.apache.org/jira/browse/SPARK-46424
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> We should show stats such as `pythonDataSent`, `pythonDataReceived`, and
> `pythonNumRowsReceived`.






[jira] [Updated] (SPARK-46424) Support PythonSQLMetrics.pythonMetrics via custom metrics API in DSv2

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46424?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-46424:
-
Priority: Minor  (was: Major)

> Support PythonSQLMetrics.pythonMetrics via custom metrics API in DSv2
> -
>
> Key: SPARK-46424
> URL: https://issues.apache.org/jira/browse/SPARK-46424
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Minor
>
> We should show the stats such as `pythonDataSent`, `pythonDataReceived` and 
> `pythonNumRowsReceived`.






[jira] [Created] (SPARK-46424) Support PythonSQLMetrics.pythonMetrics via custom metrics API in DSv2

2023-12-15 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46424:


 Summary: Support PythonSQLMetrics.pythonMetrics via custom metrics 
API in DSv2
 Key: SPARK-46424
 URL: https://issues.apache.org/jira/browse/SPARK-46424
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, SQL
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


We should show stats such as `pythonDataSent`, `pythonDataReceived`, and
`pythonNumRowsReceived`.






[jira] [Updated] (SPARK-46423) Refactor Python Data Source instance loading

2023-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46423?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46423:
---
Labels: pull-request-available  (was: )

> Refactor Python Data Source instance loading
> 
>
> Key: SPARK-46423
> URL: https://issues.apache.org/jira/browse/SPARK-46423
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> We should create the instance in lookupDataSourceV2 instead.






[jira] [Assigned] (SPARK-46419) Reorganize `DatetimeIndexTests`: Factor out 3 slow tests

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46419:


Assignee: Ruifeng Zheng

> Reorganize `DatetimeIndexTests`: Factor out 3 slow tests
> 
>
> Key: SPARK-46419
> URL: https://issues.apache.org/jira/browse/SPARK-46419
> Project: Spark
>  Issue Type: Test
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-46419) Reorganize `DatetimeIndexTests`: Factor out 3 slow tests

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46419.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44369
[https://github.com/apache/spark/pull/44369]

> Reorganize `DatetimeIndexTests`: Factor out 3 slow tests
> 
>
> Key: SPARK-46419
> URL: https://issues.apache.org/jira/browse/SPARK-46419
> Project: Spark
>  Issue Type: Test
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-46423) Refactor Python Data Source instance loading

2023-12-15 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-46423:


 Summary: Refactor Python Data Source instance loading
 Key: SPARK-46423
 URL: https://issues.apache.org/jira/browse/SPARK-46423
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, SQL
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


We should create the instance in lookupDataSourceV2 instead.






[jira] [Assigned] (SPARK-45597) Support creating table using a Python data source in SQL

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45597:


Assignee: Hyukjin Kwon  (was: Allison Wang)

> Support creating table using a Python data source in SQL
> 
>
> Key: SPARK-45597
> URL: https://issues.apache.org/jira/browse/SPARK-45597
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Support creating a table using a Python data source in a SQL query.
> For instance:
> `CREATE TABLE tableName() USING  OPTIONS 
> `






[jira] [Assigned] (SPARK-45597) Support creating table using a Python data source in SQL

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-45597:


Assignee: Allison Wang

> Support creating table using a Python data source in SQL
> 
>
> Key: SPARK-45597
> URL: https://issues.apache.org/jira/browse/SPARK-45597
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Support creating a table using a Python data source in a SQL query.
> For instance:
> `CREATE TABLE tableName() USING  OPTIONS 
> `






[jira] [Resolved] (SPARK-45597) Support creating table using a Python data source in SQL

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-45597.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44305
[https://github.com/apache/spark/pull/44305]

> Support creating table using a Python data source in SQL
> 
>
> Key: SPARK-45597
> URL: https://issues.apache.org/jira/browse/SPARK-45597
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Support creating a table using a Python data source in a SQL query.
> For instance:
> `CREATE TABLE tableName() USING  OPTIONS 
> `






[jira] [Commented] (SPARK-32710) Add Hive Murmur3Hash expression

2023-12-15 Thread Eric Xiao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17797261#comment-17797261
 ] 

Eric Xiao commented on SPARK-32710:
---

Hi [~chengsu], I am interested in working on enabling Hive bucketing in Spark,
and I noticed there are a couple of tickets still open. May I take a stab at
this ticket?

This ticket also does not seem too complicated. A couple of questions:
 * Where in Spark would one start implementing the `murmur3hash` algorithm?
 * Is the scope of this ticket just to implement the exact hashing logic found
in the linked Hive code snippet?

> Add Hive Murmur3Hash expression
> ---
>
> Key: SPARK-32710
> URL: https://issues.apache.org/jira/browse/SPARK-32710
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Cheng Su
>Priority: Minor
>
> To allow Spark to write Hive 3 compatible bucketed tables, we need to follow 
> the same hash function as Hive/Presto. Hive's murmur3hash is quite different 
> from Spark's murmur3hash (different default seed; different logic for NULL, 
> array, map, and struct; details in 
> [https://github.com/apache/hive/blob/ece58fff1b53ea451bfc524c4c15f63ee12eca00/serde/src/java/org/apache/hadoop/hive/serde2/objectinspector/ObjectInspectorUtils.java#L813]).
>   So this ticket introduces a Hive murmur3hash expression.
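For context, the algorithm family both engines use is Murmur3 (x86, 32-bit); a minimal pure-Python sketch follows. This is an illustrative reimplementation, not Spark's or Hive's actual code; the engines differ in default seed (Spark's `hash()` uses seed 42; Hive's Murmur3 defines its own default) and in how NULL/array/map/struct values are folded in.

```python
# Illustrative pure-Python Murmur3_x86_32 -- NOT Spark's or Hive's implementation.
def murmur3_32(data: bytes, seed: int = 0) -> int:
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & 0xFFFFFFFF
    # body: process 4-byte little-endian blocks
    n = len(data) & ~3
    for i in range(0, n, 4):
        k = int.from_bytes(data[i:i + 4], "little")
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF  # rotl32(k, 15)
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
        h = ((h << 13) | (h >> 19)) & 0xFFFFFFFF  # rotl32(h, 13)
        h = (h * 5 + 0xE6546B64) & 0xFFFFFFFF
    # tail: 0-3 trailing bytes
    k = 0
    tail = data[n:]
    if len(tail) == 3:
        k ^= tail[2] << 16
    if len(tail) >= 2:
        k ^= tail[1] << 8
    if len(tail) >= 1:
        k ^= tail[0]
        k = (k * c1) & 0xFFFFFFFF
        k = ((k << 15) | (k >> 17)) & 0xFFFFFFFF
        k = (k * c2) & 0xFFFFFFFF
        h ^= k
    # finalization mix
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & 0xFFFFFFFF
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & 0xFFFFFFFF
    return h ^ (h >> 16)

# Same bytes, different seeds -> different bucket assignments, which is why
# Spark-written buckets are not Hive-readable without a matching expression.
print(murmur3_32(b"hello", 0))
print(murmur3_32(b"hello", 42))  # 42 is Spark's default hash() seed
```

Even with an identical core algorithm, a seed mismatch alone changes every bucket assignment, so a dedicated Hive-compatible expression is needed rather than reusing Spark's `hash()`.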






[jira] [Resolved] (SPARK-46409) Spark Connect Repl does not work with ClosureCleaner

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-46409.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44360
[https://github.com/apache/spark/pull/44360]

> Spark Connect Repl does not work with ClosureCleaner
> 
>
> Key: SPARK-46409
> URL: https://issues.apache.org/jira/browse/SPARK-46409
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Vsevolod Stepanov
>Assignee: Vsevolod Stepanov
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> SPARK-45136 added ClosureCleaner support to SparkConnect client. 
> Unfortunately, this change breaks ConnectRepl launched by 
> `./connector/connect/bin/spark-connect-scala-client`. To reproduce the issue:
>  # Run `./connector/connect/bin/spark-connect-shell`
>  # Run  `./connector/connect/bin/spark-connect-scala-client`
>  # In the REPL, execute this code:
> ```
> @ def plus1(x: Int): Int = x + 1
> @ val plus1_udf = udf(plus1 _)
> ```
> This will fail with the following error:
> ```
> java.lang.reflect.InaccessibleObjectException: Unable to make private native 
> java.lang.reflect.Field[] java.lang.Class.getDeclaredFields0(boolean) 
> accessible: module java.base does not "opens java.lang" to unnamed module 
> @45099dd3
>   
> java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
>   
> java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
>   java.lang.reflect.Method.checkCanSetAccessible(Method.java:199)
>   java.lang.reflect.Method.setAccessible(Method.java:193)
>   
> org.apache.spark.util.ClosureCleaner$.getFinalModifiersFieldForJava17(ClosureCleaner.scala:577)
>   
> org.apache.spark.util.ClosureCleaner$.setFieldAndIgnoreModifiers(ClosureCleaner.scala:560)
>   
> org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$18(ClosureCleaner.scala:533)
>   
> org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$18$adapted(ClosureCleaner.scala:525)
>   scala.collection.ArrayOps$WithFilter.foreach(ArrayOps.scala:73)
>   
> org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$16(ClosureCleaner.scala:525)
>   
> org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$16$adapted(ClosureCleaner.scala:522)
>   scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
>   scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
>   scala.collection.AbstractIterable.foreach(Iterable.scala:933)
>   scala.collection.IterableOps$WithFilter.foreach(Iterable.scala:903)
>   
> org.apache.spark.util.ClosureCleaner$.cleanupAmmoniteReplClosure(ClosureCleaner.scala:522)
>   org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:251)
>   
> org.apache.spark.sql.expressions.SparkConnectClosureCleaner$.clean(UserDefinedFunction.scala:210)
>   
> org.apache.spark.sql.expressions.ScalarUserDefinedFunction$.apply(UserDefinedFunction.scala:187)
>   
> org.apache.spark.sql.expressions.ScalarUserDefinedFunction$.apply(UserDefinedFunction.scala:180)
>   org.apache.spark.sql.functions$.udf(functions.scala:7956)
>   ammonite.$sess.cmd1$Helper.(cmd1.sc:1)
>   ammonite.$sess.cmd1$.(cmd1.sc:7)
> ```
>  
> This is because ClosureCleaner relies heavily on the reflection API, which is 
> restricted by default on Java 17. The rest of Spark bypasses this by adding 
> `--add-opens` JVM flags, see 
> https://issues.apache.org/jira/browse/SPARK-36796. We need to add these 
> options to the Spark Connect client launch script as well.
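A sketch of the kind of change described, assuming the same flag SPARK-36796 added elsewhere; the real launch script may need additional `--add-opens` entries.

```shell
# Hedged sketch: open java.lang to unnamed modules so ClosureCleaner's
# setAccessible() calls succeed on Java 17+. JDK_JAVA_OPTIONS is read by the
# java launcher on JDK 9+; the actual script change may list more entries.
export JDK_JAVA_OPTIONS="--add-opens=java.base/java.lang=ALL-UNNAMED"
# ...then launch: ./connector/connect/bin/spark-connect-scala-client
```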






[jira] [Assigned] (SPARK-46409) Spark Connect Repl does not work with ClosureCleaner

2023-12-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46409?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-46409:


Assignee: Vsevolod Stepanov

> Spark Connect Repl does not work with ClosureCleaner
> 
>
> Key: SPARK-46409
> URL: https://issues.apache.org/jira/browse/SPARK-46409
> Project: Spark
>  Issue Type: Bug
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Vsevolod Stepanov
>Assignee: Vsevolod Stepanov
>Priority: Major
>  Labels: pull-request-available
>
> SPARK-45136 added ClosureCleaner support to SparkConnect client. 
> Unfortunately, this change breaks ConnectRepl launched by 
> `./connector/connect/bin/spark-connect-scala-client`. To reproduce the issue:
>  # Run `./connector/connect/bin/spark-connect-shell`
>  # Run  `./connector/connect/bin/spark-connect-scala-client`
>  # In the REPL, execute this code:
> ```
> @ def plus1(x: Int): Int = x + 1
> @ val plus1_udf = udf(plus1 _)
> ```
> This will fail with the following error:
> ```
> java.lang.reflect.InaccessibleObjectException: Unable to make private native 
> java.lang.reflect.Field[] java.lang.Class.getDeclaredFields0(boolean) 
> accessible: module java.base does not "opens java.lang" to unnamed module 
> @45099dd3
>   
> java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:354)
>   
> java.lang.reflect.AccessibleObject.checkCanSetAccessible(AccessibleObject.java:297)
>   java.lang.reflect.Method.checkCanSetAccessible(Method.java:199)
>   java.lang.reflect.Method.setAccessible(Method.java:193)
>   
> org.apache.spark.util.ClosureCleaner$.getFinalModifiersFieldForJava17(ClosureCleaner.scala:577)
>   
> org.apache.spark.util.ClosureCleaner$.setFieldAndIgnoreModifiers(ClosureCleaner.scala:560)
>   
> org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$18(ClosureCleaner.scala:533)
>   
> org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$18$adapted(ClosureCleaner.scala:525)
>   scala.collection.ArrayOps$WithFilter.foreach(ArrayOps.scala:73)
>   
> org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$16(ClosureCleaner.scala:525)
>   
> org.apache.spark.util.ClosureCleaner$.$anonfun$cleanupAmmoniteReplClosure$16$adapted(ClosureCleaner.scala:522)
>   scala.collection.IterableOnceOps.foreach(IterableOnce.scala:576)
>   scala.collection.IterableOnceOps.foreach$(IterableOnce.scala:574)
>   scala.collection.AbstractIterable.foreach(Iterable.scala:933)
>   scala.collection.IterableOps$WithFilter.foreach(Iterable.scala:903)
>   
> org.apache.spark.util.ClosureCleaner$.cleanupAmmoniteReplClosure(ClosureCleaner.scala:522)
>   org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:251)
>   
> org.apache.spark.sql.expressions.SparkConnectClosureCleaner$.clean(UserDefinedFunction.scala:210)
>   
> org.apache.spark.sql.expressions.ScalarUserDefinedFunction$.apply(UserDefinedFunction.scala:187)
>   
> org.apache.spark.sql.expressions.ScalarUserDefinedFunction$.apply(UserDefinedFunction.scala:180)
>   org.apache.spark.sql.functions$.udf(functions.scala:7956)
>   ammonite.$sess.cmd1$Helper.(cmd1.sc:1)
>   ammonite.$sess.cmd1$.(cmd1.sc:7)
> ```
>  
> This is because ClosureCleaner relies heavily on the reflection API, which is 
> restricted by default on Java 17. The rest of Spark bypasses this by adding 
> `--add-opens` JVM flags, see 
> https://issues.apache.org/jira/browse/SPARK-36796. We need to add these 
> options to the Spark Connect client launch script as well.






[jira] [Resolved] (SPARK-23890) Hive ALTER TABLE CHANGE COLUMN for struct type no longer works

2023-12-15 Thread Andrew Otto (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Otto resolved SPARK-23890.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Ah! This is supported in DataSource v2 after all, just not via CHANGE COLUMN.  
Instead, you can add a field to a nested struct by addressing it with dotted 
notation:

 
ALTER TABLE otto.test_table03 ADD COLUMN s1.s1_f2_added STRING;

> Hive ALTER TABLE CHANGE COLUMN for struct type no longer works
> --
>
> Key: SPARK-23890
> URL: https://issues.apache.org/jira/browse/SPARK-23890
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.0.0, 3.0.0
>Reporter: Andrew Otto
>Priority: Major
>  Labels: bulk-closed, pull-request-available
> Fix For: 3.0.0
>
>
> As part of SPARK-14118, Spark SQL removed support for sending ALTER TABLE 
> CHANGE COLUMN commands to Hive.  This restriction was loosened in 
> [https://github.com/apache/spark/pull/12714] to allow for those commands if 
> they only change the column comment.
> Wikimedia has been evolving Parquet backed Hive tables with data originally 
> from JSON events by adding newly found columns to the Hive table schema, via 
> a Spark job we call 'Refine'.  We do this by recursively merging an input 
> DataFrame schema with a Hive table DataFrame schema, finding new fields, and 
> then issuing an ALTER TABLE statement to add the columns.  However, because 
> we allow for nested data types in the incoming JSON data, we make extensive 
> use of struct type fields.  In order to add newly detected fields in a nested 
> data type, we must alter the struct column and append the nested struct 
> field.  This requires a CHANGE COLUMN that alters the column type.  In 
> reality, the 'type' of the column is not changing; it is just a new field 
> being added to the struct, but to SQL this looks like a type change.
> -We were about to upgrade to Spark 2 but this new restriction in SQL DDL that 
> can be sent to Hive will block us.  I believe this is fixable by adding an 
> exception in 
> [command/ddl.scala|https://github.com/apache/spark/blob/v2.3.0/sql/core/src/main/scala/org/apache/spark/sql/execution/command/ddl.scala#L294-L325]
>  to allow ALTER TABLE CHANGE COLUMN with a new type, if the original type and 
> destination type are both struct types, and the destination type only adds 
> new fields.-
>  
> In this [PR|https://github.com/apache/spark/pull/21012], I was told that the 
> Spark 3 datasource v2 would support this.
> However, it is clear that it does not.  There is an [explicit 
> check|https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckAnalysis.scala#L1441]
>  and 
> [test|https://github.com/apache/spark/blob/e3f46ed57dc063566cdb9425b4d5e02c65332df1/sql/core/src/test/scala/org/apache/spark/sql/connector/AlterTableTests.scala#L583]
>  that prevents this from happening.
>  
>  
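The "Refine" workflow described above (recursively diffing an incoming schema against the Hive table schema to find new nested fields) can be sketched in plain Python, with struct schemas modeled as dicts. The field names below are hypothetical, not Wikimedia's real schema.

```python
# Sketch of a recursive schema merge: struct schemas are modeled as plain
# dicts (field name -> type string, or a nested dict for a struct type).
def find_new_fields(table_schema, incoming, prefix=""):
    """Return dotted paths of fields present in `incoming` but not in the table."""
    new = []
    for name, typ in incoming.items():
        path = prefix + name
        if name not in table_schema:
            new.append(path)  # a brand-new (possibly nested) field
        elif isinstance(typ, dict) and isinstance(table_schema[name], dict):
            # both sides are structs: recurse to find newly added nested fields
            new.extend(find_new_fields(table_schema[name], typ, path + "."))
    return new

table = {"id": "bigint", "s1": {"s1_f1": "string"}}
incoming = {"id": "bigint", "s1": {"s1_f1": "string", "s1_f2_added": "string"}}
print(find_new_fields(table, incoming))
```

Each dotted path found this way corresponds to one ADD COLUMN statement in the dotted notation shown in the resolution comment above.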






[jira] [Updated] (SPARK-46422) Move `test_window` to `pyspark.pandas.tests.window.*`

2023-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46422?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46422:
---
Labels: pull-request-available  (was: )

> Move `test_window` to `pyspark.pandas.tests.window.*`
> -
>
> Key: SPARK-46422
> URL: https://issues.apache.org/jira/browse/SPARK-46422
> Project: Spark
>  Issue Type: Test
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46422) Move `test_window` to `pyspark.pandas.tests.window.*`

2023-12-15 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46422:
-

 Summary: Move `test_window` to `pyspark.pandas.tests.window.*`
 Key: SPARK-46422
 URL: https://issues.apache.org/jira/browse/SPARK-46422
 Project: Spark
  Issue Type: Test
  Components: PS, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-46421) Broken support for explode on a Map in typed API

2023-12-15 Thread Emil Ejbyfeldt (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Emil Ejbyfeldt updated SPARK-46421:
---
Description: 
 
{code:java}
scala> spark.createDataset(Seq(Tuple1(Map(1 -> 2)))).select(explode($"_1").as[(Int, Int)])
org.apache.spark.sql.AnalysisException: 
[UNSUPPORTED_DESERIALIZER.FIELD_NUMBER_MISMATCH] The deserializer is not 
supported: try to map "STRUCT" to Tuple1, but failed as 
the number of fields does not line up.
  at 
org.apache.spark.sql.errors.QueryCompilationErrors$.fieldNumberMismatchForDeserializerError(QueryCompilationErrors.scala:357)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer$.fail(Analyzer.scala:3494)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveDeserializer$$validateTopLevelTupleFields(Analyzer.scala:3510)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer$$anonfun$apply$52$$anonfun$applyOrElse$228.applyOrElse(Analyzer.scala:3462)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer$$anonfun$apply$52$$anonfun$applyOrElse$228.applyOrElse(Analyzer.scala:3454)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsDownWithPruning$1(QueryPlan.scala:167)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:208)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:208)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:219)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:229)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:304)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:229)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDownWithPruning(QueryPlan.scala:167)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsWithPruning(QueryPlan.scala:138)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer$$anonfun$apply$52.applyOrElse(Analyzer.scala:3454)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer$$anonfun$apply$52.applyOrElse(Analyzer.scala:3449)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:138)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:138)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:32)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$2(AnalysisHelper.scala:135)
  at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1215)
  at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1214)
  at 
org.apache.spark.sql.catalyst.plans.logical.MapElements.mapChildren(object.scala:223)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:135)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:32)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$2(AnalysisHelper.scala:135)
  at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1215)
  at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1214)
  at 

[jira] [Created] (SPARK-46421) Broken support for explode on a Map in typed API

2023-12-15 Thread Emil Ejbyfeldt (Jira)
Emil Ejbyfeldt created SPARK-46421:
--

 Summary: Broken support for explode on a Map in typed API
 Key: SPARK-46421
 URL: https://issues.apache.org/jira/browse/SPARK-46421
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0
Reporter: Emil Ejbyfeldt


```
scala> spark.createDataset(Seq(Tuple1(Map(1 -> 2)))).select(explode($"_1").as[(Int, Int)])
org.apache.spark.sql.AnalysisException: 
[UNSUPPORTED_DESERIALIZER.FIELD_NUMBER_MISMATCH] The deserializer is not 
supported: try to map "STRUCT" to Tuple1, but failed as 
the number of fields does not line up.
  at 
org.apache.spark.sql.errors.QueryCompilationErrors$.fieldNumberMismatchForDeserializerError(QueryCompilationErrors.scala:357)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer$.fail(Analyzer.scala:3494)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveDeserializer$$validateTopLevelTupleFields(Analyzer.scala:3510)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer$$anonfun$apply$52$$anonfun$applyOrElse$228.applyOrElse(Analyzer.scala:3462)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer$$anonfun$apply$52$$anonfun$applyOrElse$228.applyOrElse(Analyzer.scala:3454)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDownWithPruning$1(TreeNode.scala:461)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDownWithPruning(TreeNode.scala:461)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$transformExpressionsDownWithPruning$1(QueryPlan.scala:167)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$1(QueryPlan.scala:208)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpression$1(QueryPlan.scala:208)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.recursiveTransform$1(QueryPlan.scala:219)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.$anonfun$mapExpressions$4(QueryPlan.scala:229)
  at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:304)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.mapExpressions(QueryPlan.scala:229)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsDownWithPruning(QueryPlan.scala:167)
  at 
org.apache.spark.sql.catalyst.plans.QueryPlan.transformExpressionsWithPruning(QueryPlan.scala:138)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer$$anonfun$apply$52.applyOrElse(Analyzer.scala:3454)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveDeserializer$$anonfun$apply$52.applyOrElse(Analyzer.scala:3449)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$3(AnalysisHelper.scala:138)
  at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(origin.scala:76)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:138)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:32)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$2(AnalysisHelper.scala:135)
  at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren(TreeNode.scala:1215)
  at 
org.apache.spark.sql.catalyst.trees.UnaryLike.mapChildren$(TreeNode.scala:1214)
  at 
org.apache.spark.sql.catalyst.plans.logical.MapElements.mapChildren(object.scala:223)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$1(AnalysisHelper.scala:135)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$.allowInvokingTransformsInAnalyzer(AnalysisHelper.scala:323)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning(AnalysisHelper.scala:134)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.resolveOperatorsUpWithPruning$(AnalysisHelper.scala:130)
  at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.resolveOperatorsUpWithPruning(LogicalPlan.scala:32)
  at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper.$anonfun$resolveOperatorsUpWithPruning$2(AnalysisHelper.scala:135)
  at 

[jira] [Updated] (SPARK-46420) Remove unused transport from SparkSQLCLIDriver

2023-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46420:
---
Labels: pull-request-available  (was: )

> Remove unused transport from SparkSQLCLIDriver
> --
>
> Key: SPARK-46420
> URL: https://issues.apache.org/jira/browse/SPARK-46420
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46420) Remove unused transport from SparkSQLCLIDriver

2023-12-15 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-46420:
-

 Summary: Remove unused transport from SparkSQLCLIDriver
 Key: SPARK-46420
 URL: https://issues.apache.org/jira/browse/SPARK-46420
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 4.0.0
Reporter: Cheng Pan









[jira] [Updated] (SPARK-46419) Reorganize `DatetimeIndexTests`: Factor out 3 slow tests

2023-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46419?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-46419:
---
Labels: pull-request-available  (was: )

> Reorganize `DatetimeIndexTests`: Factor out 3 slow tests
> 
>
> Key: SPARK-46419
> URL: https://issues.apache.org/jira/browse/SPARK-46419
> Project: Spark
>  Issue Type: Test
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-46419) Reorganize `DatetimeIndexTests`: Factor out 3 slow tests

2023-12-15 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-46419:
-

 Summary: Reorganize `DatetimeIndexTests`: Factor out 3 slow tests
 Key: SPARK-46419
 URL: https://issues.apache.org/jira/browse/SPARK-46419
 Project: Spark
  Issue Type: Test
  Components: PS, Tests
Affects Versions: 4.0.0
Reporter: Ruifeng Zheng









[jira] [Updated] (SPARK-40876) Spark's Vectorized ParquetReader should support type promotions

2023-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-40876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-40876:
---
Labels: pull-request-available  (was: )

> Spark's Vectorized ParquetReader should support type promotions
> ---
>
> Key: SPARK-40876
> URL: https://issues.apache.org/jira/browse/SPARK-40876
> Project: Spark
>  Issue Type: Improvement
>  Components: Input/Output
>Affects Versions: 3.3.0
>Reporter: Alexey Kudinkin
>Priority: Major
>  Labels: pull-request-available
>
> Currently, when reading a Parquet table using Spark's `VectorizedColumnReader`, 
> we hit an issue when specifying a requested (projection) schema in which one 
> field's type is widened from int32 to long.
> The expectation is that, since this is a legitimate primitive type promotion, 
> we should be able to read Ints into Longs with no problems (Avro, for 
> example, handles this fine).
> However, the `ParquetVectorUpdaterFactory.getUpdater` method fails with the 
> exception listed below.
> Looking at the code, it actually seems to allow the opposite: it allows 
> "down-sizing" Int32s persisted in Parquet to be read as Bytes or Shorts, for 
> example. I'm not sure what the rationale for this behavior is, and it seems 
> like a bug to me (as it will essentially lead to data truncation):
> {code:java}
> case INT32:
>   if (sparkType == DataTypes.IntegerType || canReadAsIntDecimal(descriptor, 
> sparkType)) {
> return new IntegerUpdater();
>   } else if (sparkType == DataTypes.LongType && isUnsignedIntTypeMatched(32)) 
> {
> // In `ParquetToSparkSchemaConverter`, we map parquet UINT32 to our 
> LongType.
> // For unsigned int32, it stores as plain signed int32 in Parquet when 
> dictionary
> // fallbacks. We read them as long values.
> return new UnsignedIntegerUpdater();
>   } else if (sparkType == DataTypes.ByteType) {
> return new ByteUpdater();
>   } else if (sparkType == DataTypes.ShortType) {
> return new ShortUpdater();
>   } else if (sparkType == DataTypes.DateType) {
> if ("CORRECTED".equals(datetimeRebaseMode)) {
>   return new IntegerUpdater();
> } else {
>   boolean failIfRebase = "EXCEPTION".equals(datetimeRebaseMode);
>   return new IntegerWithRebaseUpdater(failIfRebase);
> }
>   }
>   break; {code}
> Exception:
> {code:java}
> at 
> org.apache.spark.scheduler.DAGScheduler.failJobAndIndependentStages(DAGScheduler.scala:2454)
>     at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2(DAGScheduler.scala:2403)
>     at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$abortStage$2$adapted(DAGScheduler.scala:2402)
>     at 
> scala.collection.mutable.ResizableArray.foreach(ResizableArray.scala:62)
>     at 
> scala.collection.mutable.ResizableArray.foreach$(ResizableArray.scala:55)
>     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:49)
>     at 
> org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:2402)
>     at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1(DAGScheduler.scala:1160)
>     at 
> org.apache.spark.scheduler.DAGScheduler.$anonfun$handleTaskSetFailed$1$adapted(DAGScheduler.scala:1160)
>     at scala.Option.foreach(Option.scala:407)
>     at 
> org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:1160)
>     at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.doOnReceive(DAGScheduler.scala:2642)
>     at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2584)
>     at 
> org.apache.spark.scheduler.DAGSchedulerEventProcessLoop.onReceive(DAGScheduler.scala:2573)
>     at org.apache.spark.util.EventLoop$$anon$1.run(EventLoop.scala:49)
>     at org.apache.spark.scheduler.DAGScheduler.runJob(DAGScheduler.scala:938)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2214)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2235)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2254)
>     at org.apache.spark.SparkContext.runJob(SparkContext.scala:2279)
>     at org.apache.spark.rdd.RDD.$anonfun$collect$1(RDD.scala:1030)
>     at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>     at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>     at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
>     at org.apache.spark.rdd.RDD.collect(RDD.scala:1029)
>     at org.apache.spark.RangePartitioner$.sketch(Partitioner.scala:304)
>     at org.apache.spark.RangePartitioner.(Partitioner.scala:171)
>     at 
> 
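The truncation concern in SPARK-40876 above can be made concrete with a small stdlib-only sketch of what reading an int32 value into a wider vs. narrower integral type does to it. `read_i32_as` is a hypothetical helper for illustration, not Spark code.

```python
import ctypes

# Hypothetical helper: what happens to an int32 column value when it is read
# into another integral type. Widening (long) is lossless; the "down-sizing"
# paths the reader currently permits (short/byte) silently truncate.
def read_i32_as(v, target):
    if target == "long":
        return ctypes.c_int64(v).value   # int32 always fits in int64
    if target == "short":
        return ctypes.c_int16(v).value   # keeps the low 16 bits only
    if target == "byte":
        return ctypes.c_int8(v).value    # keeps the low 8 bits only
    raise ValueError(f"unsupported target type: {target}")

print(read_i32_as(100_000, "long"))   # lossless promotion
print(read_i32_as(100_000, "short"))  # truncated to a different value
```

This is why allowing int32 -> long looks strictly safer than the int32 -> short/byte paths the updater factory already accepts.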

[jira] [Resolved] (SPARK-46417) do not fail when calling getTable and throwException is false

2023-12-15 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao resolved SPARK-46417.
--
Fix Version/s: 3.4.3
   3.5.1
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 44364
[https://github.com/apache/spark/pull/44364]

> do not fail when calling getTable and throwException is false
> -
>
> Key: SPARK-46417
> URL: https://issues.apache.org/jira/browse/SPARK-46417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.3, 3.5.1, 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46417) do not fail when calling getTable and throwException is false

2023-12-15 Thread Kent Yao (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Kent Yao reassigned SPARK-46417:


Assignee: Wenchen Fan

> do not fail when calling getTable and throwException is false
> -
>
> Key: SPARK-46417
> URL: https://issues.apache.org/jira/browse/SPARK-46417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-46418) Reorganize `ReshapeTests`

2023-12-15 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-46418.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 44365
[https://github.com/apache/spark/pull/44365]

> Reorganize `ReshapeTests`
> -
>
> Key: SPARK-46418
> URL: https://issues.apache.org/jira/browse/SPARK-46418
> Project: Spark
>  Issue Type: Test
>  Components: PS, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46417) do not fail when calling getTable and throwException is false

2023-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46417:
--

Assignee: (was: Apache Spark)

> do not fail when calling getTable and throwException is false
> -
>
> Key: SPARK-46417
> URL: https://issues.apache.org/jira/browse/SPARK-46417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-46417) do not fail when calling getTable and throwException is false

2023-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-46417?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-46417:
--

Assignee: Apache Spark

> do not fail when calling getTable and throwException is false
> -
>
> Key: SPARK-46417
> URL: https://issues.apache.org/jira/browse/SPARK-46417
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45597) Support creating table using a Python data source in SQL

2023-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45597:
--

Assignee: (was: Apache Spark)

> Support creating table using a Python data source in SQL
> 
>
> Key: SPARK-45597
> URL: https://issues.apache.org/jira/browse/SPARK-45597
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Support creating table using a Python data source in SQL query:
> For instance:
> `CREATE TABLE tableName() USING  OPTIONS 
> `
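The quoted syntax sketch above (a `CREATE TABLE ... USING <provider> OPTIONS ...` statement backed by a Python data source) can be illustrated with a small, self-contained dispatch sketch. This is not the PySpark API: the `register`, `create_table`, and `RangeReader` names below are hypothetical, invented purely to show how a short provider name in SQL could be resolved to a registered Python reader class.

```python
# Hypothetical sketch of provider-name dispatch for
# `CREATE TABLE t USING <provider> OPTIONS (...)`.
# None of these names are real PySpark APIs.

_registry = {}


def register(name, reader_cls):
    """Map a short provider name to a Python reader class."""
    _registry[name] = reader_cls


class RangeReader:
    """Toy data source: yields the integers 0..n-1."""

    def __init__(self, options):
        self.n = int(options.get("n", 3))

    def read(self):
        return list(range(self.n))


def create_table(provider, options=None):
    """Resolve the provider name and instantiate its reader,
    as a `USING provider OPTIONS (...)` clause might."""
    reader_cls = _registry[provider]
    return reader_cls(options or {})


register("range_source", RangeReader)
table = create_table("range_source", {"n": 4})
print(table.read())  # [0, 1, 2, 3]
```

The real feature would wire this lookup into Spark's SQL table resolution instead of a module-level dictionary; the sketch only shows the name-to-class indirection the ticket's syntax implies.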



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-45597) Support creating table using a Python data source in SQL

2023-12-15 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-45597?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-45597:
--

Assignee: Apache Spark

> Support creating table using a Python data source in SQL
> 
>
> Key: SPARK-45597
> URL: https://issues.apache.org/jira/browse/SPARK-45597
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> Support creating table using a Python data source in SQL query:
> For instance:
> `CREATE TABLE tableName() USING  OPTIONS 
> `



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


