[jira] [Assigned] (SPARK-48662) Fix StructsToXml expression with collations

2024-06-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48662:


Assignee: Mihailo Milosevic

> Fix StructsToXml expression with collations
> ---
>
> Key: SPARK-48662
> URL: https://issues.apache.org/jira/browse/SPARK-48662
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Assignee: Mihailo Milosevic
>Priority: Major
>







[jira] [Resolved] (SPARK-48662) Fix StructsToXml expression with collations

2024-06-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48662?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48662.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47053
[https://github.com/apache/spark/pull/47053]

> Fix StructsToXml expression with collations
> ---
>
> Key: SPARK-48662
> URL: https://issues.apache.org/jira/browse/SPARK-48662
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Assignee: Mihailo Milosevic
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48672) Update Jakarta Servlet reference in security page

2024-06-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48672:


Assignee: Cheng Pan

> Update Jakarta Servlet reference in security page
> -
>
> Key: SPARK-48672
> URL: https://issues.apache.org/jira/browse/SPARK-48672
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
>







[jira] [Resolved] (SPARK-48672) Update Jakarta Servlet reference in security page

2024-06-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48672?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48672.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47044
[https://github.com/apache/spark/pull/47044]

> Update Jakarta Servlet reference in security page
> -
>
> Key: SPARK-48672
> URL: https://issues.apache.org/jira/browse/SPARK-48672
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Assignee: Cheng Pan
>Priority: Major
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48677) Upgrade `scalafmt` to 3.8.2

2024-06-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48677.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47048
[https://github.com/apache/spark/pull/47048]

> Upgrade `scalafmt` to 3.8.2
> ---
>
> Key: SPARK-48677
> URL: https://issues.apache.org/jira/browse/SPARK-48677
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48653) Fix Python data source error class references

2024-06-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48653.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47013
[https://github.com/apache/spark/pull/47013]

> Fix Python data source error class references
> -
>
> Key: SPARK-48653
> URL: https://issues.apache.org/jira/browse/SPARK-48653
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Fix invalid error class references.






[jira] [Assigned] (SPARK-48653) Fix Python data source error class references

2024-06-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48653:


Assignee: Allison Wang

> Fix Python data source error class references
> -
>
> Key: SPARK-48653
> URL: https://issues.apache.org/jira/browse/SPARK-48653
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Fix invalid error class references.






[jira] [Assigned] (SPARK-48635) Assign classes to join type errors and as-of join error

2024-06-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48635:


Assignee: Wei Guo

>  Assign classes to join type errors  and as-of join error
> -
>
> Key: SPARK-48635
> URL: https://issues.apache.org/jira/browse/SPARK-48635
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wei Guo
>Assignee: Wei Guo
>Priority: Minor
>  Labels: pull-request-available
>
> join type errors: 
> LEGACY_ERROR_TEMP[1319, 3216]
> as-of join error:
> _LEGACY_ERROR_TEMP_3217






[jira] [Resolved] (SPARK-48635) Assign classes to join type errors and as-of join error

2024-06-20 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48635.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46994
[https://github.com/apache/spark/pull/46994]

>  Assign classes to join type errors  and as-of join error
> -
>
> Key: SPARK-48635
> URL: https://issues.apache.org/jira/browse/SPARK-48635
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wei Guo
>Assignee: Wei Guo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> join type errors: 
> LEGACY_ERROR_TEMP[1319, 3216]
> as-of join error:
> _LEGACY_ERROR_TEMP_3217






[jira] [Reopened] (SPARK-48567) Pyspark StreamingQuery lastProgress and friend should return actual StreamingQueryProgress

2024-06-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-48567:
--
  Assignee: (was: Wei Liu)

Reverted at 
https://github.com/apache/spark/commit/d067fc6c1635dfe7730223021e912e78637bb791

> Pyspark StreamingQuery lastProgress and friend should return actual 
> StreamingQueryProgress
> --
>
> Key: SPARK-48567
> URL: https://issues.apache.org/jira/browse/SPARK-48567
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, SS
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-48567) Pyspark StreamingQuery lastProgress and friend should return actual StreamingQueryProgress

2024-06-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-48567:
-
Fix Version/s: (was: 4.0.0)

> Pyspark StreamingQuery lastProgress and friend should return actual 
> StreamingQueryProgress
> --
>
> Key: SPARK-48567
> URL: https://issues.apache.org/jira/browse/SPARK-48567
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, SS
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-48634) Avoid statically initialize threadpool at ExecutePlanResponseReattachableIterator

2024-06-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48634:


Assignee: Hyukjin Kwon

> Avoid statically initialize threadpool at 
> ExecutePlanResponseReattachableIterator
> -
>
> Key: SPARK-48634
> URL: https://issues.apache.org/jira/browse/SPARK-48634
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> Avoid having ExecutePlanResponseReattachableIterator._release_thread_pool 
> initialize a ThreadPool, which might be dragged in during pickling.
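As a rough illustration of the fix direction, here is a minimal sketch of lazy pool creation (the attribute, method name, and pool size are hypothetical, not the actual PySpark implementation):

{code:python}
from multiprocessing.pool import ThreadPool


class ExecutePlanResponseReattachableIterator:
    # Sketch: no ThreadPool is created at class-definition time, so pickling
    # instances of this class does not drag a pool along with them.
    _release_thread_pool = None

    @classmethod
    def _get_release_pool(cls):
        # Create the pool lazily, only when a release task is actually needed.
        if cls._release_thread_pool is None:
            cls._release_thread_pool = ThreadPool(1)
        return cls._release_thread_pool
{code}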






[jira] [Resolved] (SPARK-48634) Avoid statically initialize threadpool at ExecutePlanResponseReattachableIterator

2024-06-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48634.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46993
[https://github.com/apache/spark/pull/46993]

> Avoid statically initialize threadpool at 
> ExecutePlanResponseReattachableIterator
> -
>
> Key: SPARK-48634
> URL: https://issues.apache.org/jira/browse/SPARK-48634
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Avoid having ExecutePlanResponseReattachableIterator._release_thread_pool 
> initialize a ThreadPool, which might be dragged in during pickling.






[jira] [Resolved] (SPARK-48646) Refine Python data source API docstring and type hints

2024-06-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48646?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48646.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47003
[https://github.com/apache/spark/pull/47003]

> Refine Python data source API docstring and type hints
> --
>
> Key: SPARK-48646
> URL: https://issues.apache.org/jira/browse/SPARK-48646
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Improve the type hints and docstrings for datasource.py






[jira] [Resolved] (SPARK-48459) Implement DataFrameQueryContext in Spark Connect

2024-06-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48459.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46789
[https://github.com/apache/spark/pull/46789]

> Implement DataFrameQueryContext in Spark Connect
> 
>
> Key: SPARK-48459
> URL: https://issues.apache.org/jira/browse/SPARK-48459
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Implements the same change as https://github.com/apache/spark/pull/45377 in Spark 
> Connect.






[jira] [Assigned] (SPARK-48459) Implement DataFrameQueryContext in Spark Connect

2024-06-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48459:


Assignee: Hyukjin Kwon

> Implement DataFrameQueryContext in Spark Connect
> 
>
> Key: SPARK-48459
> URL: https://issues.apache.org/jira/browse/SPARK-48459
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> Implements the same change as https://github.com/apache/spark/pull/45377 in Spark 
> Connect.






[jira] [Resolved] (SPARK-48647) Refine the error message for YearMonthIntervalType in df.collect

2024-06-18 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48647.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47004
[https://github.com/apache/spark/pull/47004]

> Refine the error message for YearMonthIntervalType in df.collect
> 
>
> Key: SPARK-48647
> URL: https://issues.apache.org/jira/browse/SPARK-48647
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48648) Make tags properly threadlocal

2024-06-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48648:


Assignee: Hyukjin Kwon

> Make tags properly threadlocal
> --
>
> Key: SPARK-48648
> URL: https://issues.apache.org/jira/browse/SPARK-48648
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>
> Caused by https://github.com/apache/spark/pull/44210, which does not use a 
> threadlocal properly but just uses the value at the class level.
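For illustration, a minimal sketch of what "properly threadlocal" means here (class and method names are hypothetical, not the actual Spark Connect implementation):

{code:python}
import threading


class TagRegistry:
    # Sketch: each thread sees its own tag set via threading.local, instead of
    # all threads sharing a single class-level value.
    _local = threading.local()

    @classmethod
    def get_tags(cls):
        if not hasattr(cls._local, "tags"):
            cls._local.tags = set()
        return cls._local.tags

    @classmethod
    def add_tag(cls, tag):
        cls.get_tags().add(tag)
{code}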






[jira] [Resolved] (SPARK-48648) Make tags properly threadlocal

2024-06-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48648.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 47005
[https://github.com/apache/spark/pull/47005]

> Make tags properly threadlocal
> --
>
> Key: SPARK-48648
> URL: https://issues.apache.org/jira/browse/SPARK-48648
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Caused by https://github.com/apache/spark/pull/44210, which does not use a 
> threadlocal properly but just uses the value at the class level.






[jira] [Created] (SPARK-48648) Make tags properly threadlocal

2024-06-17 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48648:


 Summary: Make tags properly threadlocal
 Key: SPARK-48648
 URL: https://issues.apache.org/jira/browse/SPARK-48648
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Caused by https://github.com/apache/spark/pull/44210, which does not use a 
threadlocal properly but just uses the value at the class level.






[jira] [Assigned] (SPARK-48497) Add user guide for batch data source write API

2024-06-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48497:


Assignee: Allison Wang

> Add user guide for batch data source write API
> --
>
> Key: SPARK-48497
> URL: https://issues.apache.org/jira/browse/SPARK-48497
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
>
> Add examples for batch data source write.






[jira] [Resolved] (SPARK-48497) Add user guide for batch data source write API

2024-06-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48497.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46833
[https://github.com/apache/spark/pull/46833]

> Add user guide for batch data source write API
> --
>
> Key: SPARK-48497
> URL: https://issues.apache.org/jira/browse/SPARK-48497
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Allison Wang
>Assignee: Allison Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Add examples for batch data source write.






[jira] [Assigned] (SPARK-48567) Pyspark StreamingQuery lastProgress and friend should return actual StreamingQueryProgress

2024-06-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48567:


Assignee: Wei Liu

> Pyspark StreamingQuery lastProgress and friend should return actual 
> StreamingQueryProgress
> --
>
> Key: SPARK-48567
> URL: https://issues.apache.org/jira/browse/SPARK-48567
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, SS
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48567) Pyspark StreamingQuery lastProgress and friend should return actual StreamingQueryProgress

2024-06-17 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48567.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46921
[https://github.com/apache/spark/pull/46921]

> Pyspark StreamingQuery lastProgress and friend should return actual 
> StreamingQueryProgress
> --
>
> Key: SPARK-48567
> URL: https://issues.apache.org/jira/browse/SPARK-48567
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, SS
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48633) Upgrade scalacheck to 1.18.0

2024-06-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48633?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48633.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46992
[https://github.com/apache/spark/pull/46992]

> Upgrade scalacheck to 1.18.0
> 
>
> Key: SPARK-48633
> URL: https://issues.apache.org/jira/browse/SPARK-48633
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Wei Guo
>Assignee: Wei Guo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Created] (SPARK-48639) Add Origin to RelationCommon in protobuf definition

2024-06-16 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48639:


 Summary: Add Origin to RelationCommon in protobuf definition
 Key: SPARK-48639
 URL: https://issues.apache.org/jira/browse/SPARK-48639
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


SPARK-48459 adds the new protobuf message for Origin. We should reuse the 
definition in `RelationCommon` as well.






[jira] [Assigned] (SPARK-48555) Support Column type for several SQL functions in scala and python

2024-06-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48555:


Assignee: Ron Serruya

> Support Column type for several SQL functions in scala and python
> -
>
> Key: SPARK-48555
> URL: https://issues.apache.org/jira/browse/SPARK-48555
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, PySpark, Spark Core
>Affects Versions: 3.5.1
>Reporter: Ron Serruya
>Assignee: Ron Serruya
>Priority: Major
>  Labels: pull-request-available
>
> Currently, several SQL functions accept both native types and Columns, but 
> only accept native types in their scala/python APIs:
> * array_remove (works in SQL, scala, not in python)
> * array_position(works in SQL, scala, not in python)
> * map_contains_key (works in SQL, scala, not in python)
> * substring (works only in SQL)
> For example, this is possible in SQL:
> {code:python}
> spark.sql("select array_remove(col1, col2) from values(array(1,2,3), 2)")
> {code}
> But not in python:
> {code:python}
> df.select(F.array_remove(F.col("col1"), F.col("col2")))
> {code}
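For context, a minimal runnable sketch of the intended Python usage once Columns are accepted, next to a workaround that already works (the example DataFrame is constructed here only for illustration):

{code:python}
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.sql("select array(1, 2, 3) as col1, 2 as col2")

# Desired after this change: pass a Column as the element to remove.
df.select(F.array_remove(F.col("col1"), F.col("col2"))).show()

# Workaround on versions without the change: fall back to the SQL expression form.
df.select(F.expr("array_remove(col1, col2)")).show()
{code}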






[jira] [Resolved] (SPARK-48555) Support Column type for several SQL functions in scala and python

2024-06-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48555?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48555.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46901
[https://github.com/apache/spark/pull/46901]

> Support Column type for several SQL functions in scala and python
> -
>
> Key: SPARK-48555
> URL: https://issues.apache.org/jira/browse/SPARK-48555
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, PySpark, Spark Core
>Affects Versions: 3.5.1
>Reporter: Ron Serruya
>Assignee: Ron Serruya
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Currently, several SQL functions accept both native types and Columns, but 
> only accept native types in their scala/python APIs:
> * array_remove (works in SQL, scala, not in python)
> * array_position(works in SQL, scala, not in python)
> * map_contains_key (works in SQL, scala, not in python)
> * substring (works only in SQL)
> For example, this is possible in SQL:
> {code:python}
> spark.sql("select array_remove(col1, col2) from values(array(1,2,3), 2)")
> {code}
> But not in python:
> {code:python}
> df.select(F.array_remove(F.col("col1"), F.col("col2")))
> {code}






[jira] [Resolved] (SPARK-47777) Add spark connect test for python streaming data source

2024-06-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47777.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46906
[https://github.com/apache/spark/pull/46906]

> Add spark connect test for python streaming data source
> ---
>
> Key: SPARK-47777
> URL: https://issues.apache.org/jira/browse/SPARK-47777
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, SS, Tests
>Affects Versions: 3.5.1
>Reporter: Chaoqin Li
>Assignee: Chaoqin Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Make the Python streaming data source PySpark tests also run on Spark Connect. 






[jira] [Assigned] (SPARK-47777) Add spark connect test for python streaming data source

2024-06-16 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47777:


Assignee: Chaoqin Li

> Add spark connect test for python streaming data source
> ---
>
> Key: SPARK-47777
> URL: https://issues.apache.org/jira/browse/SPARK-47777
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, SS, Tests
>Affects Versions: 3.5.1
>Reporter: Chaoqin Li
>Assignee: Chaoqin Li
>Priority: Major
>  Labels: pull-request-available
>
> Make the Python streaming data source PySpark tests also run on Spark Connect. 






[jira] [Resolved] (SPARK-48302) Preserve nulls in map columns in PyArrow Tables

2024-06-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48302.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46837
[https://github.com/apache/spark/pull/46837]

> Preserve nulls in map columns in PyArrow Tables
> ---
>
> Key: SPARK-48302
> URL: https://issues.apache.org/jira/browse/SPARK-48302
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Ian Cook
>Assignee: Ian Cook
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Because of a limitation in PyArrow, when PyArrow Tables containing MapArray 
> columns with nested fields or timestamps are passed to 
> {{{}spark.createDataFrame(){}}}, null values in the MapArray columns are 
> replaced with empty lists.
> The PySpark function where this happens is 
> {{{}pyspark.sql.pandas.types._check_arrow_array_timestamps_localize{}}}.
> Also see [https://github.com/apache/arrow/issues/41684].
> See the skipped tests and the TODO mentioning SPARK-48302.
> [Update] A fix for this has been implemented in PyArrow in 
> [https://github.com/apache/arrow/pull/41757] by adding a {{mask}} argument to 
> {{{}pa.MapArray.from_arrays{}}}. This will be released in PyArrow 17.0.0. 
> Since older versions of PyArrow (which PySpark will still support for a 
> while) won't have this argument, we will need to do a check like:
> {{LooseVersion(pa.__version__) >= LooseVersion("17.0.0")}}
> or
> {{from inspect import signature}}
> {{"mask" in signature(pa.MapArray.from_arrays).parameters}}
> and only pass {{mask}} if that's true.
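A minimal sketch of the guard described above (the wrapper function is hypothetical; it forwards {{mask}} only when the installed PyArrow supports it):

{code:python}
from inspect import signature

import pyarrow as pa


def map_array_from_arrays(offsets, keys, items, mask=None):
    # Pass `mask` only if this PyArrow build (17.0.0+) accepts the argument.
    if "mask" in signature(pa.MapArray.from_arrays).parameters:
        return pa.MapArray.from_arrays(offsets, keys, items, mask=mask)
    return pa.MapArray.from_arrays(offsets, keys, items)
{code}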






[jira] [Assigned] (SPARK-48302) Preserve nulls in map columns in PyArrow Tables

2024-06-15 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48302?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48302:


Assignee: Ian Cook

> Preserve nulls in map columns in PyArrow Tables
> ---
>
> Key: SPARK-48302
> URL: https://issues.apache.org/jira/browse/SPARK-48302
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Ian Cook
>Assignee: Ian Cook
>Priority: Major
>  Labels: pull-request-available
>
> Because of a limitation in PyArrow, when PyArrow Tables containing MapArray 
> columns with nested fields or timestamps are passed to 
> {{{}spark.createDataFrame(){}}}, null values in the MapArray columns are 
> replaced with empty lists.
> The PySpark function where this happens is 
> {{{}pyspark.sql.pandas.types._check_arrow_array_timestamps_localize{}}}.
> Also see [https://github.com/apache/arrow/issues/41684].
> See the skipped tests and the TODO mentioning SPARK-48302.
> [Update] A fix for this has been implemented in PyArrow in 
> [https://github.com/apache/arrow/pull/41757] by adding a {{mask}} argument to 
> {{{}pa.MapArray.from_arrays{}}}. This will be released in PyArrow 17.0.0. 
> Since older versions of PyArrow (which PySpark will still support for a 
> while) won't have this argument, we will need to do a check like:
> {{LooseVersion(pa.__version__) >= LooseVersion("17.0.0")}}
> or
> {{from inspect import signature}}
> {{"mask" in signature(pa.MapArray.from_arrays).parameters}}
> and only pass {{mask}} if that's true.






[jira] [Created] (SPARK-48634) Avoid statically initialize threadpool at ExecutePlanResponseReattachableIterator

2024-06-14 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48634:


 Summary: Avoid statically initialize threadpool at 
ExecutePlanResponseReattachableIterator
 Key: SPARK-48634
 URL: https://issues.apache.org/jira/browse/SPARK-48634
 Project: Spark
  Issue Type: Bug
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Avoid having ExecutePlanResponseReattachableIterator._release_thread_pool 
initialize a ThreadPool, which might be dragged in during pickling.






[jira] [Resolved] (SPARK-48593) Fix the string representation of lambda function

2024-06-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48593.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46948
[https://github.com/apache/spark/pull/46948]

> Fix the string representation of lambda function
> 
>
> Key: SPARK-48593
> URL: https://issues.apache.org/jira/browse/SPARK-48593
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48593) Fix the string representation of lambda function

2024-06-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48593:


Assignee: Ruifeng Zheng

> Fix the string representation of lambda function
> 
>
> Key: SPARK-48593
> URL: https://issues.apache.org/jira/browse/SPARK-48593
> Project: Spark
>  Issue Type: Bug
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-48421) SPJ: Add documentation

2024-06-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48421:


Assignee: Szehon Ho

> SPJ: Add documentation
> --
>
> Key: SPARK-48421
> URL: https://issues.apache.org/jira/browse/SPARK-48421
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>Priority: Major
>  Labels: pull-request-available
>
> As part of SPARK-48329, we mentioned "Storage Partition Join" but noticed 
> there is no documentation describing the same.






[jira] [Resolved] (SPARK-48421) SPJ: Add documentation

2024-06-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48421.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46745
[https://github.com/apache/spark/pull/46745]

> SPJ: Add documentation
> --
>
> Key: SPARK-48421
> URL: https://issues.apache.org/jira/browse/SPARK-48421
> Project: Spark
>  Issue Type: Documentation
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Szehon Ho
>Assignee: Szehon Ho
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> As part of SPARK-48329, we mentioned "Storage Partition Join" but noticed 
> there is no documentation describing the same.






[jira] [Resolved] (SPARK-48591) Simplify the if-else branches with `F.lit`

2024-06-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48591.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46946
[https://github.com/apache/spark/pull/46946]

> Simplify the if-else branches with `F.lit`
> --
>
> Key: SPARK-48591
> URL: https://issues.apache.org/jira/browse/SPARK-48591
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48591) Simplify the if-else branches with `F.lit`

2024-06-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48591:


Assignee: Ruifeng Zheng

> Simplify the if-else branches with `F.lit`
> --
>
> Key: SPARK-48591
> URL: https://issues.apache.org/jira/browse/SPARK-48591
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>







[jira] [Assigned] (SPARK-48598) Propagate cached schema in dataframe operations

2024-06-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48598:


Assignee: Ruifeng Zheng

> Propagate cached schema in dataframe operations
> ---
>
> Key: SPARK-48598
> URL: https://issues.apache.org/jira/browse/SPARK-48598
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48598) Propagate cached schema in dataframe operations

2024-06-12 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48598.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46954
[https://github.com/apache/spark/pull/46954]

> Propagate cached schema in dataframe operations
> ---
>
> Key: SPARK-48598
> URL: https://issues.apache.org/jira/browse/SPARK-48598
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48569) Connect - StreamingQuery.name should return null when not specified

2024-06-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48569.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46920
[https://github.com/apache/spark/pull/46920]

> Connect - StreamingQuery.name should return null when not specified
> ---
>
> Key: SPARK-48569
> URL: https://issues.apache.org/jira/browse/SPARK-48569
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, SS
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-48564) Propagate cached schema in set operations

2024-06-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48564.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46915
[https://github.com/apache/spark/pull/46915]

> Propagate cached schema in set operations
> -
>
> Key: SPARK-48564
> URL: https://issues.apache.org/jira/browse/SPARK-48564
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48564) Propagate cached schema in set operations

2024-06-10 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48564?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48564:


Assignee: Ruifeng Zheng

> Propagate cached schema in set operations
> -
>
> Key: SPARK-48564
> URL: https://issues.apache.org/jira/browse/SPARK-48564
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-48560) Make StreamingQueryListener.spark settable

2024-06-09 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48560?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48560.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46909
[https://github.com/apache/spark/pull/46909]

> Make StreamingQueryListener.spark settable
> --
>
> Key: SPARK-48560
> URL: https://issues.apache.org/jira/browse/SPARK-48560
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Downstream users might already implement StreamingQueryListener.spark.
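A minimal sketch of what "settable" means here (the backing attribute name is hypothetical, not the actual PySpark implementation): expose the session as a read/write property so downstream listeners that assign to it keep working.

{code:python}
class StreamingQueryListener:
    # Sketch: a property with a setter instead of a read-only attribute, so
    # subclasses or downstream code can assign their own session.
    @property
    def spark(self):
        return getattr(self, "_spark", None)

    @spark.setter
    def spark(self, session):
        self._spark = session
{code}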






[jira] [Resolved] (SPARK-48552) multi-line CSV schema inference should also throw FAILED_READ_FILE

2024-06-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48552.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46890
[https://github.com/apache/spark/pull/46890]

> multi-line CSV schema inference should also throw FAILED_READ_FILE
> --
>
> Key: SPARK-48552
> URL: https://issues.apache.org/jira/browse/SPARK-48552
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Assigned] (SPARK-48552) multi-line CSV schema inference should also throw FAILED_READ_FILE

2024-06-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48552?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48552:


Assignee: Wenchen Fan

> multi-line CSV schema inference should also throw FAILED_READ_FILE
> --
>
> Key: SPARK-48552
> URL: https://issues.apache.org/jira/browse/SPARK-48552
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-48560) Make StreamingQueryListener.spark settable

2024-06-06 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48560:


 Summary: Make StreamingQueryListener.spark settable
 Key: SPARK-48560
 URL: https://issues.apache.org/jira/browse/SPARK-48560
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, Structured Streaming
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Downstream users might already implement StreamingQueryListener.spark.






[jira] [Assigned] (SPARK-47952) Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn

2024-06-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47952:


Assignee: TakawaAkirayo  (was: Adam Binford)

> Support retrieving the real SparkConnectService GRPC address and port 
> programmatically when running on Yarn
> ---
>
> Key: SPARK-47952
> URL: https://issues.apache.org/jira/browse/SPARK-47952
> Project: Spark
>  Issue Type: Story
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: TakawaAkirayo
>Assignee: TakawaAkirayo
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> 1. User Story:
> Our data analysts and data scientists use Jupyter notebooks provisioned on 
> Kubernetes (k8s) with limited CPU/memory resources to run Spark-shell/pyspark 
> in the terminal via Yarn Client mode. However, Yarn Client mode consumes 
> significant local memory if the job is heavy, and the total resource pool of 
> k8s for notebooks is limited. To leverage the abundant resources of our 
> Hadoop cluster for scalability purposes, we aim to utilize SparkConnect. This 
> allows the driver on Yarn with SparkConnectService started and uses 
> SparkConnect client to connect to the remote driver.
> To provide a seamless experience with one command startup for both server and 
> client, we've wrapped the following processes in one script:
> 1) Start a local coordinator server (implemented by us, not in this PR) with 
> a specified port.
> 2) Start SparkConnectServer by spark-submit via Yarn Cluster mode with 
> user-input Spark configurations and the local coordinator server's address 
> and port. Append an additional listener class in the configuration for 
> SparkConnectService callback with the actual address and port on Yarn to the 
> coordinator server.
> 3) Wait for the coordinator server to receive the address callback from the 
> SparkConnectService on Yarn and export the real address.
> 4) Start the client (pyspark --remote) with the remote address.
> Finally, a remote SparkConnect Server is started on Yarn with a local 
> SparkConnect client connected. Users no longer need to start the server 
> beforehand and connect to the remote server after they manually explore the 
> address on Yarn.
> 2. Problem statement of this change:
> 1) The specified port for the SparkConnectService GRPC server might be 
> occupied on the node of the Hadoop Cluster. To increase the success rate of 
> startup, it needs to retry on conflicts rather than fail directly.
> 2) Because the final binding port could be uncertain based on #1 and the 
> remote address is unpredictable on Yarn, we need to retrieve the address and 
> port programmatically and inject it automatically on the start of `pyspark 
> --remote`. The SparkConnectService needs to communicate its location back to 
> the launcher side.
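As an illustration of point 1) of the problem statement above, a minimal sketch of retrying on port conflicts instead of failing on the first occupied port (function name and retry count are hypothetical):

{code:python}
import socket


def find_available_port(start_port: int, max_attempts: int = 10) -> int:
    # Probe ports starting at start_port and return the first one that binds.
    for offset in range(max_attempts):
        port = start_port + offset
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind(("", port))
                return port
            except OSError:
                continue
    raise RuntimeError(f"no free port found after {max_attempts} attempts")
{code}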






[jira] [Resolved] (SPARK-47952) Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn

2024-06-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47952.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46182
[https://github.com/apache/spark/pull/46182]

> Support retrieving the real SparkConnectService GRPC address and port 
> programmatically when running on Yarn
> ---
>
> Key: SPARK-47952
> URL: https://issues.apache.org/jira/browse/SPARK-47952
> Project: Spark
>  Issue Type: Story
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: TakawaAkirayo
>Assignee: Adam Binford
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> 1. User Story:
> Our data analysts and data scientists use Jupyter notebooks provisioned on 
> Kubernetes (k8s) with limited CPU/memory resources to run Spark-shell/pyspark 
> in the terminal via Yarn Client mode. However, Yarn Client mode consumes 
> significant local memory if the job is heavy, and the total resource pool of 
> k8s for notebooks is limited. To leverage the abundant resources of our 
> Hadoop cluster for scalability purposes, we aim to utilize SparkConnect. This 
> allows the driver to run on Yarn with SparkConnectService started and uses a 
> SparkConnect client to connect to the remote driver.
> To provide a seamless experience with one command startup for both server and 
> client, we've wrapped the following processes in one script:
> 1) Start a local coordinator server (implemented by us, not in this PR) with 
> a specified port.
> 2) Start SparkConnectServer by spark-submit via Yarn Cluster mode with 
> user-input Spark configurations and the local coordinator server's address 
> and port. Append an additional listener class in the configuration for 
> SparkConnectService callback with the actual address and port on Yarn to the 
> coordinator server.
> 3) Wait for the coordinator server to receive the address callback from the 
> SparkConnectService on Yarn and export the real address.
> 4) Start the client (pyspark --remote) with the remote address.
> Finally, a remote SparkConnect Server is started on Yarn with a local 
> SparkConnect client connected. Users no longer need to start the server 
> beforehand and connect to the remote server after they manually explore the 
> address on Yarn.
> 2. Problem statement of this change:
> 1) The specified port for the SparkConnectService GRPC server might be 
> occupied on the node of the Hadoop Cluster. To increase the success rate of 
> startup, it needs to retry on conflicts rather than fail directly.
> 2) Because the final binding port could be uncertain based on #1 and the 
> remote address is unpredictable on Yarn, we need to retrieve the address and 
> port programmatically and inject it automatically on the start of `pyspark 
> --remote`. The SparkConnectService needs to communicate its location back to 
> the launcher side.






[jira] [Assigned] (SPARK-47952) Support retrieving the real SparkConnectService GRPC address and port programmatically when running on Yarn

2024-06-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47952:


Assignee: Adam Binford

> Support retrieving the real SparkConnectService GRPC address and port 
> programmatically when running on Yarn
> ---
>
> Key: SPARK-47952
> URL: https://issues.apache.org/jira/browse/SPARK-47952
> Project: Spark
>  Issue Type: Story
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: TakawaAkirayo
>Assignee: Adam Binford
>Priority: Minor
>  Labels: pull-request-available
>
> 1. User Story:
> Our data analysts and data scientists use Jupyter notebooks provisioned on 
> Kubernetes (k8s) with limited CPU/memory resources to run Spark-shell/pyspark 
> in the terminal via Yarn Client mode. However, Yarn Client mode consumes 
> significant local memory if the job is heavy, and the total resource pool of 
> k8s for notebooks is limited. To leverage the abundant resources of our 
> Hadoop cluster for scalability purposes, we aim to utilize SparkConnect. This 
> allows the driver to run on Yarn with SparkConnectService started and uses a 
> SparkConnect client to connect to the remote driver.
> To provide a seamless experience with one command startup for both server and 
> client, we've wrapped the following processes in one script:
> 1) Start a local coordinator server (implemented by us, not in this PR) with 
> a specified port.
> 2) Start SparkConnectServer by spark-submit via Yarn Cluster mode with 
> user-input Spark configurations and the local coordinator server's address 
> and port. Append an additional listener class in the configuration for 
> SparkConnectService callback with the actual address and port on Yarn to the 
> coordinator server.
> 3) Wait for the coordinator server to receive the address callback from the 
> SparkConnectService on Yarn and export the real address.
> 4) Start the client (pyspark --remote) with the remote address.
> Finally, a remote SparkConnect Server is started on Yarn with a local 
> SparkConnect client connected. Users no longer need to start the server 
> beforehand and connect to the remote server after they manually explore the 
> address on Yarn.
> 2. Problem statement of this change:
> 1) The specified port for the SparkConnectService GRPC server might be 
> occupied on the node of the Hadoop Cluster. To increase the success rate of 
> startup, it needs to retry on conflicts rather than fail directly.
> 2) Because the final binding port could be uncertain based on #1 and the 
> remote address is unpredictable on Yarn, we need to retrieve the address and 
> port programmatically and inject it automatically on the start of `pyspark 
> --remote`. The SparkConnectService needs to communicate its location back to 
> the launcher side.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48550) Directly use the parent Window class

2024-06-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48550.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46892
[https://github.com/apache/spark/pull/46892]

> Directly use the parent Window class
> 
>
> Key: SPARK-48550
> URL: https://issues.apache.org/jira/browse/SPARK-48550
> Project: Spark
>  Issue Type: Improvement
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48550) Directly use the parent Window class

2024-06-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48550:


Assignee: Ruifeng Zheng

> Directly use the parent Window class
> 
>
> Key: SPARK-48550
> URL: https://issues.apache.org/jira/browse/SPARK-48550
> Project: Spark
>  Issue Type: Improvement
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48533) Add test for cached schema

2024-06-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48533:


Assignee: Ruifeng Zheng

> Add test for cached schema
> --
>
> Key: SPARK-48533
> URL: https://issues.apache.org/jira/browse/SPARK-48533
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48533) Add test for cached schema

2024-06-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48533?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48533.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46871
[https://github.com/apache/spark/pull/46871]

> Add test for cached schema
> --
>
> Key: SPARK-48533
> URL: https://issues.apache.org/jira/browse/SPARK-48533
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48534) Support interruptOperation in streaming queries

2024-06-04 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48534:


 Summary: Support interruptOperation in streaming queries
 Key: SPARK-48534
 URL: https://issues.apache.org/jira/browse/SPARK-48534
 Project: Spark
  Issue Type: Improvement
  Components: Connect
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Similar to https://issues.apache.org/jira/browse/SPARK-48485, but we should 
also support interruptOperation for streaming queries.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48523) Add `grpc_max_message_size ` description to `client-connection-string.md`

2024-06-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48523.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46862
[https://github.com/apache/spark/pull/46862]

> Add `grpc_max_message_size ` description to `client-connection-string.md`
> -
>
> Key: SPARK-48523
> URL: https://issues.apache.org/jira/browse/SPARK-48523
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Documentation
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48523) Add `grpc_max_message_size ` description to `client-connection-string.md`

2024-06-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48523?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48523:


Assignee: BingKun Pan

> Add `grpc_max_message_size ` description to `client-connection-string.md`
> -
>
> Key: SPARK-48523
> URL: https://issues.apache.org/jira/browse/SPARK-48523
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Documentation
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48485) Support interruptTag and interruptAll in streaming queries

2024-06-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48485?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48485.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46819
[https://github.com/apache/spark/pull/46819]

> Support interruptTag and interruptAll in streaming queries
> --
>
> Key: SPARK-48485
> URL: https://issues.apache.org/jira/browse/SPARK-48485
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Spark Connect's interrupt API does not interrupt streaming queries. We should 
> support them.
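
A short usage sketch, assuming a Spark Connect session `spark`; the tag name 
and the rate/noop formats are placeholders:

{code}
# Tag operations from this session, start a streaming query, then interrupt by
# tag (or interrupt everything). Before this change the streaming query kept
# running; with it, tagged streaming queries are interrupted as well.
spark.addTag("nightly-ingest")

query = (spark.readStream.format("rate").load()
              .writeStream.format("noop").start())

spark.interruptTag("nightly-ingest")
# spark.interruptAll()   # or interrupt every operation on this session
{code}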



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48482) dropDuplicates and dropDuplicatesWithinWatermark should accept varargs

2024-06-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48482:


Assignee: Wei Liu

> dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
> --
>
> Key: SPARK-48482
> URL: https://issues.apache.org/jira/browse/SPARK-48482
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
>  Labels: pull-request-available
>
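
A small sketch of the API shape this adds on the PySpark side (the DataFrame 
and column names are placeholders); the existing list form keeps working:

{code}
df = spark.createDataFrame([(1, "a"), (1, "a"), (2, "b")], ["id", "name"])

df.dropDuplicates(["id", "name"])    # existing list form
df.dropDuplicates("id", "name")      # varargs form added by this change
# dropDuplicatesWithinWatermark gains the same varargs form for streaming DataFrames
{code}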




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48482) dropDuplicates and dropDuplicatesWithinWatermark should accept varargs

2024-06-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48482?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48482.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46817
[https://github.com/apache/spark/pull/46817]

> dropDuplicates and dropDuplicatesWithinWatermark should accept varargs
> --
>
> Key: SPARK-48482
> URL: https://issues.apache.org/jira/browse/SPARK-48482
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Wei Liu
>Assignee: Wei Liu
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48508) Client Side RPC optimization for Spark Connect

2024-06-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48508:


Assignee: Ruifeng Zheng

> Client Side RPC optimization for Spark Connect
> --
>
> Key: SPARK-48508
> URL: https://issues.apache.org/jira/browse/SPARK-48508
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48508) Client Side RPC optimization for Spark Connect

2024-06-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48508?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48508.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46848
[https://github.com/apache/spark/pull/46848]

> Client Side RPC optimization for Spark Connect
> --
>
> Key: SPARK-48508
> URL: https://issues.apache.org/jira/browse/SPARK-48508
> Project: Spark
>  Issue Type: Umbrella
>  Components: Connect
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48507) Use Hadoop 3.3.6 winutils in `build_sparkr_window`

2024-06-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48507.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46846
[https://github.com/apache/spark/pull/46846]

> Use Hadoop 3.3.6 winutils in `build_sparkr_window`
> --
>
> Key: SPARK-48507
> URL: https://issues.apache.org/jira/browse/SPARK-48507
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48507) Use Hadoop 3.3.6 winutils in `build_sparkr_window`

2024-06-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48507:


Assignee: BingKun Pan

> Use Hadoop 3.3.6 winutils in `build_sparkr_window`
> --
>
> Key: SPARK-48507
> URL: https://issues.apache.org/jira/browse/SPARK-48507
> Project: Spark
>  Issue Type: Improvement
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48504) Parent Window class for Spark Connect and Spark Classic

2024-06-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48504:


Assignee: Ruifeng Zheng

> Parent Window class for Spark Connect and Spark Classic
> ---
>
> Key: SPARK-48504
> URL: https://issues.apache.org/jira/browse/SPARK-48504
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48504) Parent Window class for Spark Connect and Spark Classic

2024-06-03 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48504.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46841
[https://github.com/apache/spark/pull/46841]

> Parent Window class for Spark Connect and Spark Classic
> ---
>
> Key: SPARK-48504
> URL: https://issues.apache.org/jira/browse/SPARK-48504
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48496) Use static regex Pattern instances in common/utils JavaUtils

2024-06-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48496.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/46829

> Use static regex Pattern instances in common/utils JavaUtils
> 
>
> Key: SPARK-48496
> URL: https://issues.apache.org/jira/browse/SPARK-48496
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Some methods in JavaUtils.java are recompiling regexes on every invocation; 
> we should instead store a single cached Pattern.
> This is a minor perf. issue that I spotted in the context of other profiling. 
> Not a huge bottleneck in the grand scheme of things, but simple and 
> straightforward to fix.
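
The actual change is in the Java {{JavaUtils}} class, but the same idea 
expressed in Python for illustration (the pattern itself is just an example): 
compile the regex once at module load instead of on every call:

{code}
import re

# Compiled once at import time, analogous to a static final Pattern field.
_TIME_STRING = re.compile(r"(-?[0-9]+)([a-z]+)?")

def parse_time_string(s):
    m = _TIME_STRING.match(s.strip().lower())
    if not m:
        raise ValueError("invalid time string: " + s)
    return int(m.group(1)), m.group(2)
{code}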



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48489) Throw a user-facing error when reading invalid schema from text DataSource

2024-06-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48489.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46823
[https://github.com/apache/spark/pull/46823]

> Throw a user-facing error when reading invalid schema from text DataSource
> ---
>
> Key: SPARK-48489
> URL: https://issues.apache.org/jira/browse/SPARK-48489
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.3
>Reporter: Stefan Bukorovic
>Assignee: Stefan Bukorovic
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The text DataSource produces a table schema with only one column, but it is 
> possible to try to create a table whose schema has multiple columns.
> Currently, when a user tries this, an assert in the code fails and throws an 
> internal Spark error. We should throw a better user-facing error instead.
>  
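
A minimal reproduction sketch, assuming an active `spark` session and any text 
file path; before this change the mismatch surfaced as an internal assertion, 
and afterwards it should be reported as a proper user-facing error:

{code}
from pyspark.sql.types import StructType, StructField, StringType

# The text source only ever produces a single string column, so a two-column
# schema is invalid here and should be rejected with a clear error message.
bad_schema = StructType([
    StructField("a", StringType()),
    StructField("b", StringType()),
])

spark.read.schema(bad_schema).text("/tmp/example.txt")
{code}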



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48489) Throw a user-facing error when reading invalid schema from text DataSource

2024-06-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48489?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48489:


Assignee: Stefan Bukorovic

> Throw a user-facing error when reading invalid schema from text DataSource
> ---
>
> Key: SPARK-48489
> URL: https://issues.apache.org/jira/browse/SPARK-48489
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.3
>Reporter: Stefan Bukorovic
>Assignee: Stefan Bukorovic
>Priority: Minor
>  Labels: pull-request-available
>
> The text DataSource produces a table schema with only one column, but it is 
> possible to try to create a table whose schema has multiple columns.
> Currently, when a user tries this, an assert in the code fails and throws an 
> internal Spark error. We should throw a better user-facing error instead.
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48374) Support additional PyArrow Table column types

2024-06-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48374:


Assignee: Ian Cook

> Support additional PyArrow Table column types
> -
>
> Key: SPARK-48374
> URL: https://issues.apache.org/jira/browse/SPARK-48374
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Ian Cook
>Assignee: Ian Cook
>Priority: Major
>  Labels: pull-request-available
>
> SPARK-48220 adds support for passing a PyArrow Table to 
> {{{}createDataFrame(){}}}, but there are a few PyArrow column types that are 
> not yet supported:
>  * fixed-size binary
>  * fixed-size list
>  * large list
>  
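
A sketch of the additional Arrow column types going through 
{{createDataFrame()}}, assuming a `spark` session and PyArrow installed:

{code}
import pyarrow as pa

table = pa.table({
    # fixed-size binary
    "fixed_bin": pa.array([b"ab", b"cd"], type=pa.binary(2)),
    # fixed-size list
    "fixed_list": pa.array([[1, 2], [3, 4]], type=pa.list_(pa.int64(), 2)),
    # large list
    "large_list": pa.array([[1], [2, 3]], type=pa.large_list(pa.int64())),
})

df = spark.createDataFrame(table)
df.printSchema()
{code}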



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48374) Support additional PyArrow Table column types

2024-06-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48374?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48374.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46688
[https://github.com/apache/spark/pull/46688]

> Support additional PyArrow Table column types
> -
>
> Key: SPARK-48374
> URL: https://issues.apache.org/jira/browse/SPARK-48374
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Ian Cook
>Assignee: Ian Cook
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> SPARK-48220 adds support for passing a PyArrow Table to 
> {{{}createDataFrame(){}}}, but there are a few PyArrow column types that are 
> not yet supported:
>  * fixed-size binary
>  * fixed-size list
>  * large list
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48220) Allow passing PyArrow Table to createDataFrame()

2024-06-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48220:


Assignee: Ian Cook

> Allow passing PyArrow Table to createDataFrame()
> 
>
> Key: SPARK-48220
> URL: https://issues.apache.org/jira/browse/SPARK-48220
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Input/Output, PySpark, SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Ian Cook
>Assignee: Ian Cook
>Priority: Major
>  Labels: pull-request-available
>
> SPARK-47365 added support for returning a Spark DataFrame as a PyArrow Table.
> It would be nice if we could also go in the opposite direction, enabling 
> users to create a Spark DataFrame from a PyArrow Table by passing the PyArrow 
> Table to {{spark.createDataFrame()}}.
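
A round-trip sketch, assuming a Spark 4.0 session `spark` where 
{{DataFrame.toArrow()}} from SPARK-47365 is available:

{code}
import pyarrow as pa

# PyArrow Table -> Spark DataFrame (this issue) ...
table = pa.table({"id": [1, 2, 3], "name": ["a", "b", "c"]})
df = spark.createDataFrame(table)

# ... and Spark DataFrame -> PyArrow Table (SPARK-47365).
table_back = df.toArrow()
print(table_back.schema)
{code}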



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48220) Allow passing PyArrow Table to createDataFrame()

2024-06-02 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48220.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46529
[https://github.com/apache/spark/pull/46529]

> Allow passing PyArrow Table to createDataFrame()
> 
>
> Key: SPARK-48220
> URL: https://issues.apache.org/jira/browse/SPARK-48220
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, Input/Output, PySpark, SQL
>Affects Versions: 4.0.0, 3.5.1
>Reporter: Ian Cook
>Assignee: Ian Cook
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> SPARK-47365 added support for returning a Spark DataFrame as a PyArrow Table.
> It would be nice if we could also go in the opposite direction, enabling 
> users to create a Spark DataFrame from a PyArrow Table by passing the PyArrow 
> Table to {{spark.createDataFrame()}}.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48485) Support interruptTag and interruptAll in streaming queries

2024-05-31 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48485:


 Summary: Support interruptTag and interruptAll in streaming queries
 Key: SPARK-48485
 URL: https://issues.apache.org/jira/browse/SPARK-48485
 Project: Spark
  Issue Type: Improvement
  Components: Connect, Structured Streaming
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Spark Connect's interrupt API does not interrupt streaming queries. We should 
support them.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48474:


Assignee: BingKun Pan

> Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
> ---
>
> Key: SPARK-48474
> URL: https://issues.apache.org/jira/browse/SPARK-48474
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48474) Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48474?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48474.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46808
[https://github.com/apache/spark/pull/46808]

> Fix the class name of the log in `SparkSubmitArguments` & `SparkSubmit`
> ---
>
> Key: SPARK-48474
> URL: https://issues.apache.org/jira/browse/SPARK-48474
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48467) Upgrade Maven to 3.9.7

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48467:


Assignee: BingKun Pan

> Upgrade Maven to 3.9.7
> --
>
> Key: SPARK-48467
> URL: https://issues.apache.org/jira/browse/SPARK-48467
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48467) Upgrade Maven to 3.9.7

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48467.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46798
[https://github.com/apache/spark/pull/46798]

> Upgrade Maven to 3.9.7
> --
>
> Key: SPARK-48467
> URL: https://issues.apache.org/jira/browse/SPARK-48467
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: BingKun Pan
>Assignee: BingKun Pan
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47716) SQLQueryTestSuite flaky case due to view name conflict

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47716:


Assignee: Jack Chen

> SQLQueryTestSuite flaky case due to view name conflict
> --
>
> Key: SPARK-47716
> URL: https://issues.apache.org/jira/browse/SPARK-47716
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jack Chen
>Assignee: Jack Chen
>Priority: Major
>  Labels: pull-request-available
>
> In SQLQueryTestSuite, the test case "Test logic for determining whether a 
> query is semantically sorted" can sometimes fail with an error
> {{Cannot create table or view `main`.`default`.`t1` because it already 
> exists.}}
> if run concurrently with other sql test cases that also create tables with 
> the same name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47716) SQLQueryTestSuite flaky case due to view name conflict

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47716?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47716.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45855
[https://github.com/apache/spark/pull/45855]

> SQLQueryTestSuite flaky case due to view name conflict
> --
>
> Key: SPARK-47716
> URL: https://issues.apache.org/jira/browse/SPARK-47716
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Jack Chen
>Assignee: Jack Chen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> In SQLQueryTestSuite, the test case "Test logic for determining whether a 
> query is semantically sorted" can sometimes fail with an error
> {{Cannot create table or view `main`.`default`.`t1` because it already 
> exists.}}
> if run concurrently with other sql test cases that also create tables with 
> the same name.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48461) Replace NullPointerExceptions with proper error classes in AssertNotNull expression

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48461?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48461.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46793
[https://github.com/apache/spark/pull/46793]

> Replace NullPointerExceptions with proper error classes in AssertNotNull 
> expression
> ---
>
> Key: SPARK-48461
> URL: https://issues.apache.org/jira/browse/SPARK-48461
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> [Code location 
> here|https://github.com/apache/spark/blob/f5d9b809881552c0e1b5af72b2a32caa25018eb3/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/objects/objects.scala#L1929]



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48446) Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48446:


Assignee: Yuchen Liu

> Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax
> --
>
> Key: SPARK-48446
> URL: https://issues.apache.org/jira/browse/SPARK-48446
> Project: Spark
>  Issue Type: Documentation
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Yuchen Liu
>Assignee: Yuchen Liu
>Priority: Minor
>  Labels: easyfix, pull-request-available
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> For dropDuplicates, the example on 
> [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#:~:text=)%20%5C%0A%20%20.-,dropDuplicates,-(%22guid%22]
>  is out of date compared with 
> [https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.dropDuplicates.html].
>  The argument should be a list.
> The discrepancy is also true for dropDuplicatesWithinWatermark.
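
The corrected syntax the guide should show, as a short sketch (the rate source 
and column names are placeholders):

{code}
events = (spark.readStream.format("rate").load()
               .withColumnRenamed("value", "guid"))

deduped = (events
    .withWatermark("timestamp", "10 minutes")
    .dropDuplicatesWithinWatermark(["guid"]))   # the argument is a list

unique = events.dropDuplicates(["guid"])        # same for dropDuplicates
{code}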



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48446) Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48446.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46797
[https://github.com/apache/spark/pull/46797]

> Update SS Doc of dropDuplicatesWithinWatermark to use the right syntax
> --
>
> Key: SPARK-48446
> URL: https://issues.apache.org/jira/browse/SPARK-48446
> Project: Spark
>  Issue Type: Documentation
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Yuchen Liu
>Assignee: Yuchen Liu
>Priority: Minor
>  Labels: easyfix, pull-request-available
> Fix For: 4.0.0
>
>   Original Estimate: 1h
>  Remaining Estimate: 1h
>
> For dropDuplicates, the example on 
> [https://spark.apache.org/docs/latest/structured-streaming-programming-guide.html#:~:text=)%20%5C%0A%20%20.-,dropDuplicates,-(%22guid%22]
>  is out of date compared with 
> [https://spark.apache.org/docs/latest/api/python/reference/pyspark.sql/api/pyspark.sql.DataFrame.dropDuplicates.html].
>  The argument should be a list.
> The discrepancy is also true for dropDuplicatesWithinWatermark.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48475) Optimize _get_jvm_function in PySpark.

2024-05-30 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48475?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48475.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46809
[https://github.com/apache/spark/pull/46809]

> Optimize _get_jvm_function in PySpark.
> --
>
> Key: SPARK-48475
> URL: https://issues.apache.org/jira/browse/SPARK-48475
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Chenhao Li
>Assignee: Chenhao Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48464) Refactor SQLConfSuite and StatisticsSuite

2024-05-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48464?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48464.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46796
[https://github.com/apache/spark/pull/46796]

> Refactor SQLConfSuite and StatisticsSuite
> -
>
> Key: SPARK-48464
> URL: https://issues.apache.org/jira/browse/SPARK-48464
> Project: Spark
>  Issue Type: Sub-task
>  Components: Tests
>Affects Versions: 4.0.0
>Reporter: Rui Wang
>Assignee: Rui Wang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48454) Directly use the parent dataframe class

2024-05-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48454.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46785
[https://github.com/apache/spark/pull/46785]

> Directly use the parent dataframe class
> ---
>
> Key: SPARK-48454
> URL: https://issues.apache.org/jira/browse/SPARK-48454
> Project: Spark
>  Issue Type: Improvement
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48454) Directly use the parent dataframe class

2024-05-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48454:


Assignee: Ruifeng Zheng

> Directly use the parent dataframe class
> ---
>
> Key: SPARK-48454
> URL: https://issues.apache.org/jira/browse/SPARK-48454
> Project: Spark
>  Issue Type: Improvement
>  Components: PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48442) Add parenthesis to awaitTermination call

2024-05-29 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48442?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48442.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46779
[https://github.com/apache/spark/pull/46779]

> Add parenthesis to awaitTermination call
> 
>
> Key: SPARK-48442
> URL: https://issues.apache.org/jira/browse/SPARK-48442
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.3
>Reporter: Riya Verma
>Assignee: Riya Verma
>Priority: Trivial
>  Labels: correctness, pull-request-available, starter
> Fix For: 4.0.0
>
>
> In {{test_stream_reader}} and {{test_stream_writer}} of 
> {*}test_python_streaming_datasource.py{*}, the call {{q.awaitTermination}} 
> does not invoke a function call as intended, but instead returns a python 
> function object. The fix is to change this to {{{}q.awaitTermination(){}}}.
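
A short illustration of the difference (source/sink formats and the timeout are 
placeholders):

{code}
df = spark.readStream.format("rate").load()
q = df.writeStream.format("noop").start()

q.awaitTermination        # bug: merely references the bound method, no blocking
q.awaitTermination(10)    # fix: actually blocks, here for up to 10 seconds
{code}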



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-48459) Implement DataFrameQueryContext in Spark Connect

2024-05-29 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-48459:


 Summary: Implement DataFrameQueryContext in Spark Connect
 Key: SPARK-48459
 URL: https://issues.apache.org/jira/browse/SPARK-48459
 Project: Spark
  Issue Type: Improvement
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon


Implement the same change as https://github.com/apache/spark/pull/45377 in Spark Connect



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48445) Don't inline UDFs with non-cheap children in CollapseProject

2024-05-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48445.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46780
[https://github.com/apache/spark/pull/46780]

> Don't inline UDFs with non-cheap children in CollapseProject
> 
>
> Key: SPARK-48445
> URL: https://issues.apache.org/jira/browse/SPARK-48445
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Kelvin Jiang
>Assignee: Kelvin Jiang
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Because UDFs (and certain other expressions) are considered cheap by 
> CollapseProject.isCheap, they are inlined and potentially duplicated (which 
> is ok, because rules like ExtractPythonUDFs will de-duplicate them). However, 
> if the UDFs contain other non-cheap expressions, those will also be 
> duplicated and can potentially cause performance regressions.
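
A rough sketch of the query shape in question; whether a child expression 
counts as cheap is decided by {{CollapseProject.isCheap}}, and {{sha2}} below 
is only a stand-in for an expensive child:

{code}
from pyspark.sql import functions as F

@F.udf("string")
def tag(x):
    return "v:" + x

df = spark.range(10).withColumn("value", F.col("id").cast("string"))

# The inner projection aliases udf(expensive_child); the outer projection
# references that alias twice. Collapsing the two projections inlines the
# alias, duplicating the expensive child under each reference.
inner = df.select(tag(F.sha2(F.col("value"), 256)).alias("c"))
outer = inner.select(F.col("c").alias("a"), F.col("c").alias("b"))
outer.explain()
{code}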



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48445) Don't inline UDFs with non-cheap children in CollapseProject

2024-05-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48445:


Assignee: Kelvin Jiang

> Don't inline UDFs with non-cheap children in CollapseProject
> 
>
> Key: SPARK-48445
> URL: https://issues.apache.org/jira/browse/SPARK-48445
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.1
>Reporter: Kelvin Jiang
>Assignee: Kelvin Jiang
>Priority: Major
>  Labels: pull-request-available
>
> Because UDFs (and certain other expressions) are considered cheap by 
> CollapseProject.isCheap, they are inlined and potentially duplicated (which 
> is ok, because rules like ExtractPythonUDFs will de-duplicate them). However, 
> if the UDFs contain other non-cheap expressions, those will also be 
> duplicated and can potentially cause performance regressions.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23015) spark-submit fails when submitting several jobs in parallel

2024-05-28 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17850230#comment-17850230
 ] 

Hyukjin Kwon commented on SPARK-23015:
--

Fixed in https://github.com/apache/spark/pull/43706

> spark-submit fails when submitting several jobs in parallel
> ---
>
> Key: SPARK-23015
> URL: https://issues.apache.org/jira/browse/SPARK-23015
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1
> Environment: Windows 10 (1709/16299.125)
> Spark 2.3.0
> Java 8, Update 151
>Reporter: Hugh Zabriskie
>Priority: Major
>  Labels: bulk-closed, pull-request-available
> Fix For: 4.0.0
>
>
> Spark Submit's launching library prints the command to execute the launcher 
> (org.apache.spark.launcher.main) to a temporary text file, reads the result 
> back into a variable, and then executes that command.
> {code}
> set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
> "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main 
> %* > %LAUNCHER_OUTPUT%
> {code}
> [bin/spark-class2.cmd, 
> L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66]
> That temporary text file is given a pseudo-random name by the %RANDOM% env 
> variable generator, which generates a number between 0 and 32767.
> This appears to be the cause of an error occurring when several spark-submit 
> jobs are launched simultaneously. The following error is returned from stderr:
> {quote}The process cannot access the file because it is being used by another 
> process. The system cannot find the file
> USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt.
> The process cannot access the file because it is being used by another 
> process.{quote}
> My hypothesis is that %RANDOM% is returning the same value for multiple jobs, 
> causing the launcher library to attempt to write to the same file from 
> multiple processes. Another mechanism is needed for reliably generating the 
> names of the temporary files so that the concurrency issue is resolved.
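
A quick back-of-the-envelope check of that hypothesis, as a birthday-style 
estimate over the 0..32767 range of %RANDOM%; the job counts are arbitrary 
examples:

{code}
# Probability that at least two of n concurrent spark-submit jobs draw the same
# %RANDOM% value out of 32768 possibilities.
def collision_probability(n, space=32768):
    p_distinct = 1.0
    for i in range(n):
        p_distinct *= (space - i) / space
    return 1.0 - p_distinct

print(collision_probability(10))    # roughly 0.1%
print(collision_probability(100))   # roughly 14%
{code}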



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Reopened] (SPARK-23015) spark-submit fails when submitting several jobs in parallel

2024-05-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reopened SPARK-23015:
--

> spark-submit fails when submitting several jobs in parallel
> ---
>
> Key: SPARK-23015
> URL: https://issues.apache.org/jira/browse/SPARK-23015
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1
> Environment: Windows 10 (1709/16299.125)
> Spark 2.3.0
> Java 8, Update 151
>Reporter: Hugh Zabriskie
>Priority: Major
>  Labels: bulk-closed, pull-request-available
>
> Spark Submit's launching library prints the command to execute the launcher 
> (org.apache.spark.launcher.main) to a temporary text file, reads the result 
> back into a variable, and then executes that command.
> {code}
> set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
> "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main 
> %* > %LAUNCHER_OUTPUT%
> {code}
> [bin/spark-class2.cmd, 
> L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66]
> That temporary text file is given a pseudo-random name by the %RANDOM% env 
> variable generator, which generates a number between 0 and 32767.
> This appears to be the cause of an error occurring when several spark-submit 
> jobs are launched simultaneously. The following error is returned from stderr:
> {quote}The process cannot access the file because it is being used by another 
> process. The system cannot find the file
> USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt.
> The process cannot access the file because it is being used by another 
> process.{quote}
> My hypothesis is that %RANDOM% is returning the same value for multiple jobs, 
> causing the launcher library to attempt to write to the same file from 
> multiple processes. Another mechanism is needed for reliably generating the 
> names of the temporary files so that the concurrency issue is resolved.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23015) spark-submit fails when submitting several jobs in parallel

2024-05-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-23015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-23015.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

> spark-submit fails when submitting several jobs in parallel
> ---
>
> Key: SPARK-23015
> URL: https://issues.apache.org/jira/browse/SPARK-23015
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Submit
>Affects Versions: 2.0.0, 2.0.1, 2.0.2, 2.1.0, 2.1.1, 2.1.2, 2.2.0, 2.2.1
> Environment: Windows 10 (1709/16299.125)
> Spark 2.3.0
> Java 8, Update 151
>Reporter: Hugh Zabriskie
>Priority: Major
>  Labels: bulk-closed, pull-request-available
> Fix For: 4.0.0
>
>
> Spark Submit's launching library prints the command to execute the launcher 
> (org.apache.spark.launcher.main) to a temporary text file, reads the result 
> back into a variable, and then executes that command.
> {code}
> set LAUNCHER_OUTPUT=%temp%\spark-class-launcher-output-%RANDOM%.txt
> "%RUNNER%" -Xmx128m -cp "%LAUNCH_CLASSPATH%" org.apache.spark.launcher.Main 
> %* > %LAUNCHER_OUTPUT%
> {code}
> [bin/spark-class2.cmd, 
> L67|https://github.com/apache/spark/blob/master/bin/spark-class2.cmd#L66]
> That temporary text file is given a pseudo-random name by the %RANDOM% env 
> variable generator, which generates a number between 0 and 32767.
> This appears to be the cause of an error occurring when several spark-submit 
> jobs are launched simultaneously. The following error is returned from stderr:
> {quote}The process cannot access the file because it is being used by another 
> process. The system cannot find the file
> USER/AppData/Local/Temp/spark-class-launcher-output-RANDOM.txt.
> The process cannot access the file because it is being used by another 
> process.{quote}
> My hypothesis is that %RANDOM% is returning the same value for multiple jobs, 
> causing the launcher library to attempt to write to the same file from 
> multiple processes. Another mechanism is needed for reliably generating the 
> names of the temporary files so that the concurrency issue is resolved.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-42965) metadata mismatch for StructField when running some tests.

2024-05-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-42965:


Assignee: Ruifeng Zheng

> metadata mismatch for StructField when running some tests.
> --
>
> Key: SPARK-42965
> URL: https://issues.apache.org/jira/browse/SPARK-42965
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 4.0.0
>
>
> For some reason, the metadata of `StructField` is different in a few tests 
> when using Spark Connect. However, the function works properly.
> For example, when running `python/run-tests --testnames 
> 'pyspark.pandas.tests.connect.data_type_ops.test_parity_binary_ops 
> BinaryOpsParityTests.test_add'` it complains `AssertionError: 
> ([InternalField(dtype=int64, struct_field=StructField('bool', LongType(), 
> False))], [StructField('bool', LongType(), False)])` because the metadata 
> differs (something like `\{'__autoGeneratedAlias': 'true'}`), but the fields 
> have the same name, type and nullability, so the function itself works fine.
> Therefore, we have temporarily added a branch for Spark Connect in the code 
> so that we can create InternalFrame properly to provide more pandas APIs in 
> Spark Connect. If a clear cause is found, we may need to revert it back to 
> its original state.
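
A minimal illustration of why the comparison trips, assuming PySpark's default 
equality semantics where {{StructField}} equality also compares the metadata 
dictionary:

{code}
from pyspark.sql.types import StructField, LongType

plain  = StructField("bool", LongType(), False)
tagged = StructField("bool", LongType(), False, {"__autoGeneratedAlias": "true"})

print(plain == tagged)   # False: same name/type/nullability, different metadata
{code}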



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48322) Drop internal metadata in `DataFrame.schema`

2024-05-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48322.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46636
[https://github.com/apache/spark/pull/46636]

> Drop internal metadata in `DataFrame.schema`
> 
>
> Key: SPARK-48322
> URL: https://issues.apache.org/jira/browse/SPARK-48322
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-42965) metadata mismatch for StructField when running some tests.

2024-05-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-42965.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46636
[https://github.com/apache/spark/pull/46636]

> metadata mismatch for StructField when running some tests.
> --
>
> Key: SPARK-42965
> URL: https://issues.apache.org/jira/browse/SPARK-42965
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, Pandas API on Spark
>Affects Versions: 3.5.0
>Reporter: Haejoon Lee
>Priority: Major
> Fix For: 4.0.0
>
>
> For some reason, the metadata of `StructField` is different in a few tests 
> when using Spark Connect. However, the function works properly.
> For example, when running `python/run-tests --testnames 
> 'pyspark.pandas.tests.connect.data_type_ops.test_parity_binary_ops 
> BinaryOpsParityTests.test_add'` it complains `AssertionError: 
> ([InternalField(dtype=int64, struct_field=StructField('bool', LongType(), 
> False))], [StructField('bool', LongType(), False)])` because the metadata 
> differs (something like `\{'__autoGeneratedAlias': 'true'}`), but the fields 
> have the same name, type and nullability, so the function itself works fine.
> Therefore, we have temporarily added a branch for Spark Connect in the code 
> so that we can create InternalFrame properly to provide more pandas APIs in 
> Spark Connect. If a clear cause is found, we may need to revert it back to 
> its original state.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-48322) Drop internal metadata in `DataFrame.schema`

2024-05-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48322?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-48322:


Assignee: Ruifeng Zheng

> Drop internal metadata in `DataFrame.schema`
> 
>
> Key: SPARK-48322
> URL: https://issues.apache.org/jira/browse/SPARK-48322
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PySpark, SQL
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-48438) Directly use the parent column class

2024-05-28 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-48438?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-48438.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46775
[https://github.com/apache/spark/pull/46775]

> Directly use the parent column class
> 
>
> Key: SPARK-48438
> URL: https://issues.apache.org/jira/browse/SPARK-48438
> Project: Spark
>  Issue Type: Improvement
>  Components: Connect, PS
>Affects Versions: 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


