[jira] [Updated] (SPARK-45056) Add process termination tests for Python foreachBatch and StreamingQueryListener
[ https://issues.apache.org/jira/browse/SPARK-45056?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45056: --- Labels: pull-request-available (was: ) > Add process termination tests for Python foreachBatch and > StreamingQueryListener > > > Key: SPARK-45056 > URL: https://issues.apache.org/jira/browse/SPARK-45056 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 4.0.0 >Reporter: Wei Liu >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45126) Multi-tenant history server
Ramu Ramaiah created SPARK-45126: Summary: Multi-tenant history server Key: SPARK-45126 URL: https://issues.apache.org/jira/browse/SPARK-45126 Project: Spark Issue Type: Wish Components: Spark Core Affects Versions: 3.4.1 Reporter: Ramu Ramaiah The Spark history server uses the configuration "spark.history.fs.logDirectory" to locate log events. This works well for a single tenant. In a multi-tenant deployment, however, the log events of all tenants are stored in a single directory, which provides no logical separation of events per tenant. The proposal/wish is to support a multi-tenant history server, wherein "spark.history.fs.logDirectory" can be a base directory whose sub-directories contain the log events for each tenant, named after the tenant, e.g. "tenant1", "tenant2", etc. When combined with the Spark driver/executor property "spark.eventLog.dir", the value of that property can then be set appropriately for each tenant.
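The layout proposed above could be sketched as two config fragments; the base path and tenant names below are illustrative assumptions, not part of the proposal:

```properties
# History server side: the proposed base directory containing one
# sub-directory per tenant ("tenant1", "tenant2", ...)
spark.history.fs.logDirectory=hdfs:///spark-events

# Driver/executor side, set per tenant (here a hypothetical "tenant1"):
spark.eventLog.enabled=true
spark.eventLog.dir=hdfs:///spark-events/tenant1
```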
[jira] [Updated] (SPARK-45125) Remove dev/github_jira_sync.py
[ https://issues.apache.org/jira/browse/SPARK-45125?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45125: --- Labels: pull-request-available (was: ) > Remove dev/github_jira_sync.py > -- > > Key: SPARK-45125 > URL: https://issues.apache.org/jira/browse/SPARK-45125 > Project: Spark > Issue Type: Task > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > > https://issues.apache.org/jira/browse/SPARK-44942 > https://issues.apache.org/jira/browse/INFRA-24962
[jira] [Created] (SPARK-45125) Remove dev/github_jira_sync.py
Kent Yao created SPARK-45125: Summary: Remove dev/github_jira_sync.py Key: SPARK-45125 URL: https://issues.apache.org/jira/browse/SPARK-45125 Project: Spark Issue Type: Task Components: Project Infra Affects Versions: 4.0.0 Reporter: Kent Yao https://issues.apache.org/jira/browse/SPARK-44942 https://issues.apache.org/jira/browse/INFRA-24962
[jira] [Updated] (SPARK-45122) Automate updating versions.json
[ https://issues.apache.org/jira/browse/SPARK-45122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45122: --- Labels: pull-request-available (was: ) > Automate updating versions.json > > > Key: SPARK-45122 > URL: https://issues.apache.org/jira/browse/SPARK-45122 > Project: Spark > Issue Type: Sub-task > Components: Documentation >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available
[jira] [Updated] (SPARK-45124) Do not use local user ID for Local Relations
[ https://issues.apache.org/jira/browse/SPARK-45124?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45124: --- Labels: pull-request-available (was: ) > Do not use local user ID for Local Relations > > > Key: SPARK-45124 > URL: https://issues.apache.org/jira/browse/SPARK-45124 > Project: Spark > Issue Type: Bug > Components: Connect >Affects Versions: 3.5.0 >Reporter: Hyukjin Kwon >Priority: Major > Labels: pull-request-available > > Allowing a fetch of a local relation using user-provided information is a > potential security risk since this allows users to fetch arbitrary local > relations.
[jira] [Created] (SPARK-45124) Do not use local user ID for Local Relations
Hyukjin Kwon created SPARK-45124: Summary: Do not use local user ID for Local Relations Key: SPARK-45124 URL: https://issues.apache.org/jira/browse/SPARK-45124 Project: Spark Issue Type: Bug Components: Connect Affects Versions: 3.5.0 Reporter: Hyukjin Kwon Allowing a fetch of a local relation using user-provided information is a potential security risk since this allows users to fetch arbitrary local relations.
[jira] [Updated] (SPARK-45120) Upgrade d3 from v3 to v7(v7.8.5) and apply api changes in UI
[ https://issues.apache.org/jira/browse/SPARK-45120?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45120: --- Labels: pull-request-available (was: ) > Upgrade d3 from v3 to v7(v7.8.5) and apply api changes in UI > > > Key: SPARK-45120 > URL: https://issues.apache.org/jira/browse/SPARK-45120 > Project: Spark > Issue Type: Bug > Components: Web UI >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-43123) special internal field metadata should not be leaked to catalogs
[ https://issues.apache.org/jira/browse/SPARK-43123?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43123: --- Labels: pull-request-available (was: ) > special internal field metadata should not be leaked to catalogs > > > Key: SPARK-43123 > URL: https://issues.apache.org/jira/browse/SPARK-43123 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.5.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0
[jira] [Created] (SPARK-45123) Raise TypeError for DataFrame.interpolate when all columns are object-dtype.
Haejoon Lee created SPARK-45123: --- Summary: Raise TypeError for DataFrame.interpolate when all columns are object-dtype. Key: SPARK-45123 URL: https://issues.apache.org/jira/browse/SPARK-45123 Project: Spark Issue Type: Bug Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee To match the pandas behavior.
[jira] [Created] (SPARK-45122) Automate updating versions.json
BingKun Pan created SPARK-45122: --- Summary: Automate updating versions.json Key: SPARK-45122 URL: https://issues.apache.org/jira/browse/SPARK-45122 Project: Spark Issue Type: Sub-task Components: Documentation Affects Versions: 4.0.0 Reporter: BingKun Pan
[jira] [Updated] (SPARK-45121) Support Series.empty for Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-45121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45121: --- Labels: pull-request-available (was: ) > Support Series.empty for Spark Connect. > --- > > Key: SPARK-45121 > URL: https://issues.apache.org/jira/browse/SPARK-45121 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > Labels: pull-request-available > > We should remove JVM dependency for Pandas API on Spark.
[jira] [Updated] (SPARK-45121) Support Series.empty for Spark Connect.
[ https://issues.apache.org/jira/browse/SPARK-45121?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Haejoon Lee updated SPARK-45121: Summary: Support Series.empty for Spark Connect. (was: Support Series.empty for Spark Connect.d) > Support Series.empty for Spark Connect. > --- > > Key: SPARK-45121 > URL: https://issues.apache.org/jira/browse/SPARK-45121 > Project: Spark > Issue Type: Sub-task > Components: Connect, Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Priority: Major > > We should remove JVM dependency for Pandas API on Spark.
[jira] [Created] (SPARK-45121) Support Series.empty for Spark Connect.d
Haejoon Lee created SPARK-45121: --- Summary: Support Series.empty for Spark Connect.d Key: SPARK-45121 URL: https://issues.apache.org/jira/browse/SPARK-45121 Project: Spark Issue Type: Sub-task Components: Connect, Pandas API on Spark Affects Versions: 4.0.0 Reporter: Haejoon Lee We should remove JVM dependency for Pandas API on Spark.
[jira] [Updated] (SPARK-45110) Upgrade rocksdbjni to 8.5.3
[ https://issues.apache.org/jira/browse/SPARK-45110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-45110: Affects Version/s: 3.5.0 > Upgrade rocksdbjni to 8.5.3 > --- > > Key: SPARK-45110 > URL: https://issues.apache.org/jira/browse/SPARK-45110 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.5.0, 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available
[jira] [Updated] (SPARK-45110) Upgrade rocksdbjni to 8.5.3
[ https://issues.apache.org/jira/browse/SPARK-45110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] BingKun Pan updated SPARK-45110: Issue Type: Bug (was: Improvement) > Upgrade rocksdbjni to 8.5.3 > --- > > Key: SPARK-45110 > URL: https://issues.apache.org/jira/browse/SPARK-45110 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available
[jira] [Created] (SPARK-45120) Upgrade d3 from v3 to v7(v7.8.5) and apply api changes in UI
Kent Yao created SPARK-45120: Summary: Upgrade d3 from v3 to v7(v7.8.5) and apply api changes in UI Key: SPARK-45120 URL: https://issues.apache.org/jira/browse/SPARK-45120 Project: Spark Issue Type: Bug Components: Web UI Affects Versions: 4.0.0 Reporter: Kent Yao
[jira] [Updated] (SPARK-43351) Support Golang in Spark Connect
[ https://issues.apache.org/jira/browse/SPARK-43351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-43351: --- Labels: pull-request-available (was: ) > Support Golang in Spark Connect > --- > > Key: SPARK-43351 > URL: https://issues.apache.org/jira/browse/SPARK-43351 > Project: Spark > Issue Type: New Feature > Components: Connect >Affects Versions: 3.5.0 >Reporter: BoYang >Assignee: BoYang >Priority: Major > Labels: pull-request-available > Fix For: 3.5.0 > > > Support Spark Connect client side in Go programming language
[jira] [Updated] (SPARK-44915) Validate checksum of remounted PVC's shuffle data before recovery
[ https://issues.apache.org/jira/browse/SPARK-44915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44915: --- Labels: pull-request-available (was: ) > Validate checksum of remounted PVC's shuffle data before recovery > - > > Key: SPARK-44915 > URL: https://issues.apache.org/jira/browse/SPARK-44915 > Project: Spark > Issue Type: Improvement > Components: Kubernetes >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > Labels: pull-request-available
[jira] [Updated] (SPARK-24203) Make executor's bindAddress configurable
[ https://issues.apache.org/jira/browse/SPARK-24203?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-24203: --- Labels: pull-request-available (was: ) > Make executor's bindAddress configurable > > > Key: SPARK-24203 > URL: https://issues.apache.org/jira/browse/SPARK-24203 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.1.1 >Reporter: Lukas Majercak >Assignee: Nishchal Venkataramana >Priority: Major > Labels: pull-request-available > Fix For: 3.0.0
[jira] [Updated] (SPARK-45119) Refine docstring of `inline`
[ https://issues.apache.org/jira/browse/SPARK-45119?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45119: --- Labels: pull-request-available (was: ) > Refine docstring of `inline` > > > Key: SPARK-45119 > URL: https://issues.apache.org/jira/browse/SPARK-45119 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > Refine docstring of the `inline` function
[jira] [Created] (SPARK-45119) Refine docstring of `inline`
Allison Wang created SPARK-45119: Summary: Refine docstring of `inline` Key: SPARK-45119 URL: https://issues.apache.org/jira/browse/SPARK-45119 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Allison Wang Refine docstring of the `inline` function
[jira] [Commented] (SPARK-24815) Structured Streaming should support dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763940#comment-17763940 ] Krystal Mitchell commented on SPARK-24815: -- Thank you [~pavan0831]. This draft PR will impact some of the projects we are currently working on. > Structured Streaming should support dynamic allocation > -- > > Key: SPARK-24815 > URL: https://issues.apache.org/jira/browse/SPARK-24815 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core, Structured Streaming >Affects Versions: 2.3.1 >Reporter: Karthik Palaniappan >Priority: Minor > Labels: pull-request-available > > For batch jobs, dynamic allocation is very useful for adding and removing > containers to match the actual workload. On multi-tenant clusters, it ensures > that a Spark job is taking no more resources than necessary. In cloud > environments, it enables autoscaling. > However, if you set spark.dynamicAllocation.enabled=true and run a structured > streaming job, the batch dynamic allocation algorithm kicks in. It requests > more executors if the task backlog is a certain size, and removes executors > if they idle for a certain period of time. > Quick thoughts: > 1) Dynamic allocation should be pluggable, rather than hardcoded to a > particular implementation in SparkContext.scala (this should be a separate > JIRA). > 2) We should make a structured streaming algorithm that's separate from the > batch algorithm. Eventually, continuous processing might need its own > algorithm. > 3) Spark should print a warning if you run a structured streaming job when > Core's dynamic allocation is enabled
[jira] [Updated] (SPARK-45118) Refactor converters for complex types to short cut when the element types don't need converters
[ https://issues.apache.org/jira/browse/SPARK-45118?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45118: --- Labels: pull-request-available (was: ) > Refactor converters for complex types to short cut when the element types > don't need converters > --- > > Key: SPARK-45118 > URL: https://issues.apache.org/jira/browse/SPARK-45118 > Project: Spark > Issue Type: Improvement > Components: PySpark >Affects Versions: 4.0.0 >Reporter: Takuya Ueshin >Priority: Major > Labels: pull-request-available
[jira] [Created] (SPARK-45118) Refactor converters for complex types to short cut when the element types don't need converters
Takuya Ueshin created SPARK-45118: - Summary: Refactor converters for complex types to short cut when the element types don't need converters Key: SPARK-45118 URL: https://issues.apache.org/jira/browse/SPARK-45118 Project: Spark Issue Type: Improvement Components: PySpark Affects Versions: 4.0.0 Reporter: Takuya Ueshin
[jira] [Resolved] (SPARK-44912) Spark 3.4 multi-column sum slows with many columns
[ https://issues.apache.org/jira/browse/SPARK-44912?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Brady Bickel resolved SPARK-44912. -- Resolution: Fixed Verified build containing linked issue fix solved the problem. > Spark 3.4 multi-column sum slows with many columns > -- > > Key: SPARK-44912 > URL: https://issues.apache.org/jira/browse/SPARK-44912 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0, 3.4.1 >Reporter: Brady Bickel >Priority: Major > > The code below is a minimal reproducible example of an issue I discovered with Pyspark 3.4.x. I want to sum the values of multiple columns and put the sum of those columns (per row) into a new column. This code works and returns in a reasonable amount of time in Pyspark 3.3.x, but is extremely slow in Pyspark 3.4.x when the number of columns grows. See below for execution timing summary as N varies.
> {code:java}
> import pyspark.sql.functions as F
> import random
> import string
> from functools import reduce
> from operator import add
> from pyspark.sql import SparkSession
>
> spark = SparkSession.builder.getOrCreate()
>
> # generate a dataframe N columns by M rows with random 8 digit column
> # names and random integers in [-5,10]
> N = 30
> M = 100
> columns = [''.join(random.choices(string.ascii_uppercase + string.digits, k=8)) for _ in range(N)]
> data = [tuple([random.randint(-5,10) for _ in range(N)]) for _ in range(M)]
> df = spark.sparkContext.parallelize(data).toDF(columns)
>
> # 3 ways to add a sum column, all of them slow for high N in spark 3.4
> df = df.withColumn("col_sum1", sum(df[col] for col in columns))
> df = df.withColumn("col_sum2", reduce(add, [F.col(col) for col in columns]))
> df = df.withColumn("col_sum3", F.expr("+".join(columns))) {code}
> Timing results for Spark 3.3:
> ||N||Exe Time (s)||
> |5|0.514|
> |10|0.248|
> |15|0.327|
> |20|0.403|
> |25|0.279|
> |30|0.322|
> |50|0.430|
> Timing results for Spark 3.4:
> ||N||Exe Time (s)||
> |5|0.379|
> |10|0.318|
> |15|0.405|
> |20|1.32|
> |25|28.8|
> |30|448|
> |50|>1 (did not finish)|
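All three variants in the report build one left-deep chain of `+` expressions over the columns. The fold pattern itself can be illustrated without a Spark installation; below is a plain-Python analogue where tuples stand in for rows and `operator.add` for the Column `+` (purely illustrative, not the report's reproducer):

```python
from functools import reduce
from operator import add

# Analogue of the report's reduce(add, [F.col(c) for c in columns]):
# reduce builds the left-deep sum ((c0 + c1) + c2) + ...
rows = [(1, 2, 3, 4), (-5, 10, 0, 2)]

row_sums = [reduce(add, row) for row in rows]
print(row_sums)  # → [10, 7]
```

In PySpark the same fold produces a deeply nested expression tree, which is where the 3.4.x analysis-time blowup reported above shows itself as N grows.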
[jira] [Updated] (SPARK-24815) Structured Streaming should support dynamic allocation
[ https://issues.apache.org/jira/browse/SPARK-24815?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-24815: --- Labels: pull-request-available (was: ) > Structured Streaming should support dynamic allocation > -- > > Key: SPARK-24815 > URL: https://issues.apache.org/jira/browse/SPARK-24815 > Project: Spark > Issue Type: Improvement > Components: Scheduler, Spark Core, Structured Streaming >Affects Versions: 2.3.1 >Reporter: Karthik Palaniappan >Priority: Minor > Labels: pull-request-available > > For batch jobs, dynamic allocation is very useful for adding and removing > containers to match the actual workload. On multi-tenant clusters, it ensures > that a Spark job is taking no more resources than necessary. In cloud > environments, it enables autoscaling. > However, if you set spark.dynamicAllocation.enabled=true and run a structured > streaming job, the batch dynamic allocation algorithm kicks in. It requests > more executors if the task backlog is a certain size, and removes executors > if they idle for a certain period of time. > Quick thoughts: > 1) Dynamic allocation should be pluggable, rather than hardcoded to a > particular implementation in SparkContext.scala (this should be a separate > JIRA). > 2) We should make a structured streaming algorithm that's separate from the > batch algorithm. Eventually, continuous processing might need its own > algorithm. > 3) Spark should print a warning if you run a structured streaming job when > Core's dynamic allocation is enabled
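For context, the batch behavior the description complains about is driven by the standard dynamic-allocation settings; a minimal configuration sketch (values are illustrative, not recommendations):

```properties
spark.dynamicAllocation.enabled=true
# an external shuffle service is needed so executors can be removed safely
spark.shuffle.service.enabled=true
# batch heuristics: scale up on task backlog, scale down on idle executors
spark.dynamicAllocation.schedulerBacklogTimeout=1s
spark.dynamicAllocation.executorIdleTimeout=60s
spark.dynamicAllocation.minExecutors=1
spark.dynamicAllocation.maxExecutors=20
```

These are exactly the knobs that kick in for a structured streaming query today, which is why the issue proposes a streaming-specific algorithm.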
[jira] [Updated] (SPARK-45084) ProgressReport should include an accurate effective shuffle partition number
[ https://issues.apache.org/jira/browse/SPARK-45084?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45084: --- Labels: pull-request-available (was: ) > ProgressReport should include an accurate effective shuffle partition number > > > Key: SPARK-45084 > URL: https://issues.apache.org/jira/browse/SPARK-45084 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 3.4.2 >Reporter: Siying Dong >Priority: Minor > Labels: pull-request-available > > Currently, there is a numShufflePartitions "metric" reported in the StateOperatorProgress part of the progress report. However, the number is computed by aggregating over executors, so in the case of task retries or speculative execution the metric can be higher than the number of shuffle partitions in the query plan. The number of shuffle partitions is useful for reporting purposes, so an accurate metric would be helpful.
[jira] [Assigned] (SPARK-44647) Support SPJ when join key is subset of partition keys
[ https://issues.apache.org/jira/browse/SPARK-44647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44647: - Assignee: Szehon Ho > Support SPJ when join key is subset of partition keys > - > > Key: SPARK-44647 > URL: https://issues.apache.org/jira/browse/SPARK-44647 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Szehon Ho >Assignee: Szehon Ho >Priority: Major > Labels: pull-request-available
[jira] [Resolved] (SPARK-44647) Support SPJ when join key is subset of partition keys
[ https://issues.apache.org/jira/browse/SPARK-44647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44647. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42306 [https://github.com/apache/spark/pull/42306] > Support SPJ when join key is subset of partition keys > - > > Key: SPARK-44647 > URL: https://issues.apache.org/jira/browse/SPARK-44647 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Szehon Ho >Assignee: Szehon Ho >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Updated] (SPARK-45117) Implement missing otherCopyArgs for the MultiCommutativeOp expression
[ https://issues.apache.org/jira/browse/SPARK-45117?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45117: --- Labels: pull-request-available (was: ) > Implement missing otherCopyArgs for the MultiCommutativeOp expression > - > > Key: SPARK-45117 > URL: https://issues.apache.org/jira/browse/SPARK-45117 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1 >Reporter: Supun Nakandala >Priority: Major > Labels: pull-request-available > > Calling toJSON on a `MultiCommutativeOp` throws an assertion error as it does > not implement the `otherCopyArgs` method.
[jira] [Created] (SPARK-45117) Implement missing otherCopyArgs for the MultiCommutativeOp expression
Supun Nakandala created SPARK-45117: --- Summary: Implement missing otherCopyArgs for the MultiCommutativeOp expression Key: SPARK-45117 URL: https://issues.apache.org/jira/browse/SPARK-45117 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.1 Reporter: Supun Nakandala Calling toJSON on a `MultiCommutativeOp` throws an assertion error as it does not implement the `otherCopyArgs` method.
[jira] [Commented] (SPARK-24203) Make executor's bindAddress configurable
[ https://issues.apache.org/jira/browse/SPARK-24203?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763821#comment-17763821 ] Ignite TC Bot commented on SPARK-24203: --- User 'gedeh' has created a pull request for this issue: https://github.com/apache/spark/pull/42870 > Make executor's bindAddress configurable > > > Key: SPARK-24203 > URL: https://issues.apache.org/jira/browse/SPARK-24203 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.1.1 >Reporter: Lukas Majercak >Assignee: Nishchal Venkataramana >Priority: Major > Fix For: 3.0.0
[jira] [Updated] (SPARK-45075) Alter table with invalid default value will not report error
[ https://issues.apache.org/jira/browse/SPARK-45075?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-45075: -- Fix Version/s: (was: 3.4.2) > Alter table with invalid default value will not report error > > > Key: SPARK-45075 > URL: https://issues.apache.org/jira/browse/SPARK-45075 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.1, 3.5.0 >Reporter: Jia Fan >Assignee: Jia Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0, 3.5.1 > > > create table t(i boolean, s bigint); > alter table t alter column s set default badvalue; > > The code does not report an error on DataSource V2, which is not aligned with the V1 behavior.
[jira] [Updated] (SPARK-45111) Upgrade maven to 3.9.4
[ https://issues.apache.org/jira/browse/SPARK-45111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-45111: - Priority: Minor (was: Major) > Upgrade maven to 3.9.4 > -- > > Key: SPARK-45111 > URL: https://issues.apache.org/jira/browse/SPARK-45111 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Minor > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Resolved] (SPARK-45111) Upgrade maven to 3.9.4
[ https://issues.apache.org/jira/browse/SPARK-45111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-45111. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42827 [https://github.com/apache/spark/pull/42827] > Upgrade maven to 3.9.4 > -- > > Key: SPARK-45111 > URL: https://issues.apache.org/jira/browse/SPARK-45111 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0
[jira] [Assigned] (SPARK-45111) Upgrade maven to 3.9.4
[ https://issues.apache.org/jira/browse/SPARK-45111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen reassigned SPARK-45111: Assignee: Yang Jie > Upgrade maven to 3.9.4 > -- > > Key: SPARK-45111 > URL: https://issues.apache.org/jira/browse/SPARK-45111 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43251) Assign a name to the error class _LEGACY_ERROR_TEMP_2015
[ https://issues.apache.org/jira/browse/SPARK-43251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk resolved SPARK-43251. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42845 [https://github.com/apache/spark/pull/42845] > Assign a name to the error class _LEGACY_ERROR_TEMP_2015 > > > Key: SPARK-43251 > URL: https://issues.apache.org/jira/browse/SPARK-43251 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Assignee: Deng Ziming >Priority: Minor > Labels: pull-request-available, starter > Fix For: 4.0.0 > > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2015* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43251) Assign a name to the error class _LEGACY_ERROR_TEMP_2015
[ https://issues.apache.org/jira/browse/SPARK-43251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Max Gekk reassigned SPARK-43251: Assignee: Deng Ziming > Assign a name to the error class _LEGACY_ERROR_TEMP_2015 > > > Key: SPARK-43251 > URL: https://issues.apache.org/jira/browse/SPARK-43251 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.5.0 >Reporter: Max Gekk >Assignee: Deng Ziming >Priority: Minor > Labels: pull-request-available, starter > > Choose a proper name for the error class *_LEGACY_ERROR_TEMP_2015* defined in > {*}core/src/main/resources/error/error-classes.json{*}. The name should be > short but complete (look at the example in error-classes.json). > Add a test which triggers the error from user code if such test still doesn't > exist. Check exception fields by using {*}checkError(){*}. The last function > checks valuable error fields only, and avoids dependencies from error text > message. In this way, tech editors can modify error format in > error-classes.json, and don't worry of Spark's internal tests. Migrate other > tests that might trigger the error onto checkError(). > If you cannot reproduce the error from user space (using SQL query), replace > the error by an internal error, see {*}SparkException.internalError(){*}. > Improve the error message format in error-classes.json if the current is not > clear. Propose a solution to users how to avoid and fix such kind of errors. > Please, look at the PR below as examples: > * [https://github.com/apache/spark/pull/38685] > * [https://github.com/apache/spark/pull/38656] > * [https://github.com/apache/spark/pull/38490] -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45092) Avoid analyze twice for failed queries
[ https://issues.apache.org/jira/browse/SPARK-45092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45092: --- Labels: pull-request-available (was: ) > Avoid analyze twice for failed queries > -- > > Key: SPARK-45092 > URL: https://issues.apache.org/jira/browse/SPARK-45092 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Kent Yao >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45069) SQL variable should always be resolved after outer reference
[ https://issues.apache.org/jira/browse/SPARK-45069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-45069. - Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42803 [https://github.com/apache/spark/pull/42803] > SQL variable should always be resolved after outer reference > > > Key: SPARK-45069 > URL: https://issues.apache.org/jira/browse/SPARK-45069 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45069) SQL variable should always be resolved after outer reference
[ https://issues.apache.org/jira/browse/SPARK-45069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-45069: --- Assignee: Wenchen Fan > SQL variable should always be resolved after outer reference > > > Key: SPARK-45069 > URL: https://issues.apache.org/jira/browse/SPARK-45069 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-36191) Support ORDER BY and LIMIT to be on the correlation path
[ https://issues.apache.org/jira/browse/SPARK-36191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-36191: --- Labels: pull-request-available (was: ) > Support ORDER BY and LIMIT to be on the correlation path > > > Key: SPARK-36191 > URL: https://issues.apache.org/jira/browse/SPARK-36191 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.2.0 >Reporter: Allison Wang >Priority: Major > Labels: pull-request-available > > A correlation path is defined as the sub-tree of all the operators that are > on the path from the operator hosting the correlated expressions up to the > operator producing the correlated values. > We want to support ORDER BY (Sort) and LIMIT operators to be on the > correlation path to achieve better feature parity with Postgres. Here is an > example query in `postgreSQL/join.sql`: > {code:SQL} > select * from > text_tbl t1 > left join int8_tbl i8 > on i8.q2 = 123, > lateral (select i8.q1, t2.f1 from text_tbl t2 limit 1) as ss > where t1.f1 = ss.f1; > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-42746) Add the LISTAGG() aggregate function
[ https://issues.apache.org/jira/browse/SPARK-42746?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-42746: --- Labels: pull-request-available (was: ) > Add the LISTAGG() aggregate function > > > Key: SPARK-42746 > URL: https://issues.apache.org/jira/browse/SPARK-42746 > Project: Spark > Issue Type: New Feature > Components: SQL >Affects Versions: 3.4.0 >Reporter: Max Gekk >Priority: Major > Labels: pull-request-available > > {{listagg()}} is a common and useful aggregate function that concatenates > string values in a column, optionally in a certain order. The systems below > already support such a function: > * Oracle: > [https://docs.oracle.com/cd/E11882_01/server.112/e41084/functions089.htm#SQLRF30030] > * Snowflake: [https://docs.snowflake.com/en/sql-reference/functions/listagg] > * Amazon Redshift: > [https://docs.aws.amazon.com/redshift/latest/dg/r_LISTAGG.html] > * Google BigQuery: > [https://cloud.google.com/bigquery/docs/reference/standard-sql/functions-and-operators#string_agg] > We need to introduce this new aggregate in Spark, both as a regular aggregate > and as a window function. > Proposed syntax: > {code:sql} > LISTAGG( [ DISTINCT ] <expr> [, <delimiter> ] ) [ WITHIN GROUP ( <order_by_clause> ) ] > {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
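As a rough illustration of the semantics being proposed (not the Spark implementation, and with made-up parameter names), a plain-Python sketch of LISTAGG: optional DISTINCT, optional WITHIN GROUP ordering, then delimiter-joined concatenation.

```python
def listagg(values, delimiter=",", distinct=False, order_key=None):
    """Emulate LISTAGG: drop NULLs, optionally de-duplicate (keeping the
    first occurrence), optionally sort (the WITHIN GROUP ... ORDER BY part),
    then join with the delimiter."""
    vals = [v for v in values if v is not None]
    if distinct:
        seen, deduped = set(), []
        for v in vals:
            if v not in seen:
                seen.add(v)
                deduped.append(v)
        vals = deduped
    if order_key is not None:
        vals = sorted(vals, key=order_key)
    return delimiter.join(str(v) for v in vals)

listagg(["b", None, "a", "b"], distinct=True, order_key=str)  # -> "a,b"
```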
[jira] [Updated] (SPARK-45116) Add some comment for param of JdbcDialect createTable
[ https://issues.apache.org/jira/browse/SPARK-45116?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45116: --- Labels: pull-request-available (was: ) > Add some comment for param of JdbcDialect createTable > - > > Key: SPARK-45116 > URL: https://issues.apache.org/jira/browse/SPARK-45116 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.5.1 >Reporter: Jia Fan >Priority: Minor > Labels: pull-request-available > > SPARK-41516 added {{createTable}} to {{JdbcDialect}} but did not add > comments for its parameters. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45116) Add some comment for param of JdbcDialect createTable
Jia Fan created SPARK-45116: --- Summary: Add some comment for param of JdbcDialect createTable Key: SPARK-45116 URL: https://issues.apache.org/jira/browse/SPARK-45116 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 3.5.1 Reporter: Jia Fan SPARK-41516 added {{createTable}} to {{JdbcDialect}} but did not add comments for its parameters. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-38958) Override S3 Client in Spark Write/Read calls
[ https://issues.apache.org/jira/browse/SPARK-38958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763725#comment-17763725 ] Steve Loughran commented on SPARK-38958: [~hershalb] hadoop trunk is now on v2 sdk, but we are still stabilising client binding. > Override S3 Client in Spark Write/Read calls > > > Key: SPARK-38958 > URL: https://issues.apache.org/jira/browse/SPARK-38958 > Project: Spark > Issue Type: New Feature > Components: Spark Core >Affects Versions: 3.2.1 >Reporter: Hershal >Priority: Major > > Hello, > I have been working to use spark to read and write data to S3. Unfortunately, > there are a few S3 headers that I need to add to my spark read/write calls. > After much looking, I have not found a way to replace the S3 client that > spark uses to make the read/write calls. I also have not found a > configuration that allows me to pass in S3 headers. Here is an example of > some common S3 request headers > ([https://docs.aws.amazon.com/AmazonS3/latest/API/RESTCommonRequestHeaders.html).] > Does there already exist functionality to add S3 headers to spark read/write > calls or pass in a custom client that would pass these headers on every > read/write request? Appreciate the help and feedback > > Thanks, -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-29670) Make executor's bindAddress configurable
[ https://issues.apache.org/jira/browse/SPARK-29670?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-29670: --- Labels: pull-request-available (was: ) > Make executor's bindAddress configurable > > > Key: SPARK-29670 > URL: https://issues.apache.org/jira/browse/SPARK-29670 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.4 >Reporter: Nishchal Venkataramana >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45069) SQL variable should always be resolved after outer reference
[ https://issues.apache.org/jira/browse/SPARK-45069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45069: --- Labels: pull-request-available (was: ) > SQL variable should always be resolved after outer reference > > > Key: SPARK-45069 > URL: https://issues.apache.org/jira/browse/SPARK-45069 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-32014) Support calling stored procedure on JDBC data source
[ https://issues.apache.org/jira/browse/SPARK-32014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763688#comment-17763688 ] Sumanto Pal commented on SPARK-32014: - Why isn't this prioritized? > Support calling stored procedure on JDBC data source > > > Key: SPARK-32014 > URL: https://issues.apache.org/jira/browse/SPARK-32014 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: Yoshi Matsuzaki >Priority: Major > > Currently, all queries via the JDBC data source are enveloped by an outer SELECT as > described below: > [https://spark.apache.org/docs/latest/sql-data-sources-jdbc.html] > {quote} > A query that will be used to read data into Spark. The specified query will > be parenthesized and used as a subquery in the FROM clause. Spark will also > assign an alias to the subquery clause. As an example, spark will issue a > query of the following form to the JDBC Source. > SELECT <columns> FROM (<user_specified_query>) spark_gen_alias > {quote} > Because of this behavior, we cannot call a stored procedure in major > databases, because stored procedure call syntax is usually not allowed to be > used in a subquery, since its returned value is optional. > For example, the Scala code below, which executes a query on Snowflake as a JDBC data > source, raises a syntax error, because the query "call proc()" is rewritten to > "select * from (call proc()) where 1 = 0", and it is invalid because CALL > cannot be in the middle of a query. > {code:scala} > val df: DataFrame = spark.read > .format("snowflake") > .options(options) > .option("query", "call proc()") > .load() > display(df) > {code} > I tested this with Snowflake, but it should happen in any major database > system. > I understand the JDBC data source reads and writes data through a DataFrame, > so the interfaces implemented are just read and write, but sometimes we > need to execute some queries before or after reading/writing, for > example, to preprocess the data with a stored procedure.
> I would appreciate it if you could consider implementing some interface/way > to allow us to call a stored procedure. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
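The rewrite described above can be modeled in a few lines of plain Python (a simplification: the real alias is generated, and the `WHERE 1=0` probe is appended for schema inference), which shows why a bare procedure call cannot survive the wrapping:

```python
def wrap_jdbc_query(user_query: str, probe_schema: bool = False) -> str:
    """Simplified model of how Spark's JDBC source embeds the `query` option
    as a parenthesized subquery in the FROM clause."""
    wrapped = f"SELECT * FROM ({user_query}) spark_gen_alias"
    if probe_schema:
        # Schema inference appends a predicate that returns no rows.
        wrapped += " WHERE 1=0"
    return wrapped

wrap_jdbc_query("call proc()", probe_schema=True)
# -> "SELECT * FROM (call proc()) spark_gen_alias WHERE 1=0"
# Most databases reject this, because CALL cannot appear inside a subquery.
```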
[jira] [Updated] (SPARK-45115) No way to exclude jars setting to classpath while doing spark-submit
[ https://issues.apache.org/jira/browse/SPARK-45115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumanto Pal updated SPARK-45115: Issue Type: Improvement (was: New Feature) > No way to exclude jars setting to classpath while doing spark-submit > > > Key: SPARK-45115 > URL: https://issues.apache.org/jira/browse/SPARK-45115 > Project: Spark > Issue Type: Improvement > Components: Spark Submit >Affects Versions: 3.4.1 >Reporter: Sumanto Pal >Priority: Blocker > Original Estimate: 336h > Remaining Estimate: 336h > > The challenge is that whenever you do spark-submit to start the application, the > jars present in the Spark home directory get added to the classpath automatically, > and there is no way to exclude specific jars. For example, we don't > want the slf4j jars present in the Spark home directory to be set on the classpath, > since slf4j is already in the codebase; this causes jar conflicts. This > forces users to change their codebase to support spark-submit or to manually > remove the jars from the Spark home directory. I believe this is not the right > practice, as it deviates from using Spark as it is supposed to be used and causes > hard-to-diagnose behaviors at various points with no clue; for example, linkage > errors are common with jar conflicts. > > There is a detailed Stack Overflow question on this issue. > refer : > https://stackoverflow.com/questions/76476618/linkageerror-facing-while-doing-spark-submit > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-45115) No way to exclude jars setting to classpath while doing spark-submit
Sumanto Pal created SPARK-45115: --- Summary: No way to exclude jars setting to classpath while doing spark-submit Key: SPARK-45115 URL: https://issues.apache.org/jira/browse/SPARK-45115 Project: Spark Issue Type: Bug Components: Spark Submit Affects Versions: 3.4.1 Reporter: Sumanto Pal The challenge is that whenever you do spark-submit to start the application, the jars present in the Spark home directory get added to the classpath automatically, and there is no way to exclude specific jars. For example, we don't want the slf4j jars present in the Spark home directory to be set on the classpath, since slf4j is already in the codebase; this causes jar conflicts. This forces users to change their codebase to support spark-submit or to manually remove the jars from the Spark home directory. I believe this is not the right practice, as it deviates from using Spark as it is supposed to be used and causes hard-to-diagnose behaviors at various points with no clue; for example, linkage errors are common with jar conflicts. There is a detailed Stack Overflow question on this issue. refer : https://stackoverflow.com/questions/76476618/linkageerror-facing-while-doing-spark-submit -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45115) No way to exclude jars setting to classpath while doing spark-submit
[ https://issues.apache.org/jira/browse/SPARK-45115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumanto Pal updated SPARK-45115: Target Version/s: (was: 3.4.1) > No way to exclude jars setting to classpath while doing spark-submit > > > Key: SPARK-45115 > URL: https://issues.apache.org/jira/browse/SPARK-45115 > Project: Spark > Issue Type: Bug > Components: Spark Submit >Affects Versions: 3.4.1 >Reporter: Sumanto Pal >Priority: Blocker > Original Estimate: 336h > Remaining Estimate: 336h > > The challenge is that whenever you do spark-submit to start the application, the > jars present in the Spark home directory get added to the classpath automatically, > and there is no way to exclude specific jars. For example, we don't > want the slf4j jars present in the Spark home directory to be set on the classpath, > since slf4j is already in the codebase; this causes jar conflicts. This > forces users to change their codebase to support spark-submit or to manually > remove the jars from the Spark home directory. I believe this is not the right > practice, as it deviates from using Spark as it is supposed to be used and causes > hard-to-diagnose behaviors at various points with no clue; for example, linkage > errors are common with jar conflicts. > > There is a detailed Stack Overflow question on this issue. > refer : > https://stackoverflow.com/questions/76476618/linkageerror-facing-while-doing-spark-submit > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45115) No way to exclude jars setting to classpath while doing spark-submit
[ https://issues.apache.org/jira/browse/SPARK-45115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sumanto Pal updated SPARK-45115: Issue Type: New Feature (was: Bug) > No way to exclude jars setting to classpath while doing spark-submit > > > Key: SPARK-45115 > URL: https://issues.apache.org/jira/browse/SPARK-45115 > Project: Spark > Issue Type: New Feature > Components: Spark Submit >Affects Versions: 3.4.1 >Reporter: Sumanto Pal >Priority: Blocker > Original Estimate: 336h > Remaining Estimate: 336h > > The challenge is that whenever you do spark-submit to start the application, the > jars present in the Spark home directory get added to the classpath automatically, > and there is no way to exclude specific jars. For example, we don't > want the slf4j jars present in the Spark home directory to be set on the classpath, > since slf4j is already in the codebase; this causes jar conflicts. This > forces users to change their codebase to support spark-submit or to manually > remove the jars from the Spark home directory. I believe this is not the right > practice, as it deviates from using Spark as it is supposed to be used and causes > hard-to-diagnose behaviors at various points with no clue; for example, linkage > errors are common with jar conflicts. > > There is a detailed Stack Overflow question on this issue. > refer : > https://stackoverflow.com/questions/76476618/linkageerror-facing-while-doing-spark-submit > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
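One commonly suggested mitigation for the conflict described above (not the requested exclusion mechanism, and both properties are documented by Spark as experimental) is to ask Spark to prefer the application's classes over the jars shipped in $SPARK_HOME/jars. A sketch of the relevant spark-defaults.conf entries:

```
# spark-defaults.conf (sketch): load user-supplied jars before Spark's own.
# This can mask duplicate-binding conflicts such as slf4j, but may introduce
# other incompatibilities, so treat it as a workaround only.
spark.driver.userClassPathFirst    true
spark.executor.userClassPathFirst  true
```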
[jira] [Assigned] (SPARK-45114) Adjust the `versionadded` and `versionchanged` information to the parameters
[ https://issues.apache.org/jira/browse/SPARK-45114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-45114: - Assignee: Ruifeng Zheng > Adjust the `versionadded` and `versionchanged` information to the parameters > > > Key: SPARK-45114 > URL: https://issues.apache.org/jira/browse/SPARK-45114 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-45114) Adjust the `versionadded` and `versionchanged` information to the parameters
[ https://issues.apache.org/jira/browse/SPARK-45114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-45114. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42867 [https://github.com/apache/spark/pull/42867] > Adjust the `versionadded` and `versionchanged` information to the parameters > > > Key: SPARK-45114 > URL: https://issues.apache.org/jira/browse/SPARK-45114 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44635) Handle shuffle fetch failures in decommissions
[ https://issues.apache.org/jira/browse/SPARK-44635?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-44635: --- Labels: pull-request-available (was: ) > Handle shuffle fetch failures in decommissions > -- > > Key: SPARK-44635 > URL: https://issues.apache.org/jira/browse/SPARK-44635 > Project: Spark > Issue Type: Improvement > Components: Spark Core >Affects Versions: 4.0.0 >Reporter: Bo Zhang >Priority: Major > Labels: pull-request-available > > Spark's decommission feature supports migration of shuffle data. However, the > shuffle data fetcher only looks at the location (`BlockManagerId`) captured when > it is initialized. This can lead to shuffle fetch failures when the shuffle > read tasks are long-running. > > To mitigate this, shuffle data fetchers should be able to look up the > updated locations after decommissions and fetch from there instead. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
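The mitigation described above amounts to re-resolving a block's location on failure instead of trusting the one captured at fetcher initialization. A hypothetical plain-Python model of that control flow (all names invented; this is not Spark's fetcher):

```python
def fetch_with_relocation(block_id, location, lookup_latest, fetch):
    """Try the originally-known location first; on a connection failure,
    ask for the block's current location (it may have been migrated off a
    decommissioned executor) and retry once."""
    try:
        return fetch(location, block_id)
    except ConnectionError:
        current = lookup_latest(block_id)
        if current is None or current == location:
            raise  # no newer location known; surface the failure
        return fetch(current, block_id)
```

A caller would supply `lookup_latest` backed by whatever tracks migrated shuffle blocks, and `fetch` as the actual network read.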
[jira] [Resolved] (SPARK-45113) Refine docstrings of `collect_list/collect_set`
[ https://issues.apache.org/jira/browse/SPARK-45113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-45113. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42866 [https://github.com/apache/spark/pull/42866] > Refine docstrings of `collect_list/collect_set` > --- > > Key: SPARK-45113 > URL: https://issues.apache.org/jira/browse/SPARK-45113 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45113) Refine docstrings of `collect_list/collect_set`
[ https://issues.apache.org/jira/browse/SPARK-45113?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-45113: Assignee: Yang Jie > Refine docstrings of `collect_list/collect_set` > --- > > Key: SPARK-45113 > URL: https://issues.apache.org/jira/browse/SPARK-45113 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Comment Edited] (SPARK-38215) InsertIntoHiveDir support convert metadata
[ https://issues.apache.org/jira/browse/SPARK-38215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763577#comment-17763577 ] Penglei Shi edited comment on SPARK-38215 at 9/11/23 10:48 AM: --- [~angerszhuuu] Hi, I found that Spark does not throw QueryCompilationErrors.cannotOverwritePathBeingReadFromError() when an INSERT ... DIRECTORY selects from a table whose path is the same as the inserted directory, which deletes the table files. Because DDLUtils.verifyNotReadPath only collects LogicalRelation rather than HiveTableRelation, the InsertIntoDir is converted to InsertIntoDataSourceDirCommand in RelationConversions even though the HiveTableRelation's location is the same as the inserted directory, and DataSourceAnalysis does not notice that. was (Author: penglei shi): [~angerszhuuu] Hi, I found that when an INSERT ... DIRECTORY selects from a table whose path is the same as the inserted directory, the table files are deleted in advance and the directory ends up empty. Because DDLUtils.verifyNotReadPath only collects LogicalRelation rather than HiveTableRelation, the InsertIntoDir is converted to InsertIntoDataSourceDirCommand even though the HiveTableRelation's location is the same as the inserted directory. > InsertIntoHiveDir support convert metadata > -- > > Key: SPARK-38215 > URL: https://issues.apache.org/jira/browse/SPARK-38215 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.3.0 > > > The current InsertIntoHiveDir command uses Hive SerDe to write data and does not > support conversion, so such SQL can't write Parquet with zstd. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
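The safety check being discussed boils down to path containment: refuse the write when the target directory and any read path overlap, since the overwrite would delete the input files before they are read. A hypothetical sketch of that predicate in plain Python (not the DDLUtils.verifyNotReadPath implementation):

```python
import os

def overlaps_read_path(output_dir, read_paths):
    """Return True if output_dir equals, contains, or is contained in any
    path being read. Pure string/path logic for illustration; a real check
    would compare fully qualified, resolved filesystem URIs."""
    out = os.path.normpath(output_dir)
    for p in read_paths:
        rp = os.path.normpath(p)
        if (rp == out
                or rp.startswith(out + os.sep)
                or out.startswith(rp + os.sep)):
            return True
    return False
```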
[jira] [Assigned] (SPARK-45085) Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and refactor some logic
[ https://issues.apache.org/jira/browse/SPARK-45085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45085: -- Assignee: (was: Apache Spark) > Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and > refactor some logic > - > > Key: SPARK-45085 > URL: https://issues.apache.org/jira/browse/SPARK-45085 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-45085) Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and refactor some logic
[ https://issues.apache.org/jira/browse/SPARK-45085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45085: -- Assignee: Apache Spark > Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and > refactor some logic > - > > Key: SPARK-45085 > URL: https://issues.apache.org/jira/browse/SPARK-45085 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Assignee: Apache Spark >Priority: Minor > Labels: pull-request-available > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-45085) Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and refactor some logic
[ https://issues.apache.org/jira/browse/SPARK-45085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45085: --- Labels: pull-request-available (was: ) > Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and > refactor some logic > - > > Key: SPARK-45085 > URL: https://issues.apache.org/jira/browse/SPARK-45085 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available >
[jira] [Commented] (SPARK-45085) Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and refactor some logic
[ https://issues.apache.org/jira/browse/SPARK-45085?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763619#comment-17763619 ] ASF GitHub Bot commented on SPARK-45085: User 'panbingkun' has created a pull request for this issue: https://github.com/apache/spark/pull/42824 > Merge UNSUPPORTED_TEMP_VIEW_OPERATION into UNSUPPORTED_VIEW_OPERATION and > refactor some logic > - > > Key: SPARK-45085 > URL: https://issues.apache.org/jira/browse/SPARK-45085 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: BingKun Pan >Priority: Minor > Labels: pull-request-available >
[jira] [Assigned] (SPARK-45069) SQL variable should always be resolved after outer reference
[ https://issues.apache.org/jira/browse/SPARK-45069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45069: -- Assignee: Apache Spark > SQL variable should always be resolved after outer reference > > > Key: SPARK-45069 > URL: https://issues.apache.org/jira/browse/SPARK-45069 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Assignee: Apache Spark >Priority: Major >
[jira] [Assigned] (SPARK-45069) SQL variable should always be resolved after outer reference
[ https://issues.apache.org/jira/browse/SPARK-45069?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot reassigned SPARK-45069: -- Assignee: (was: Apache Spark) > SQL variable should always be resolved after outer reference > > > Key: SPARK-45069 > URL: https://issues.apache.org/jira/browse/SPARK-45069 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Priority: Major >
[jira] [Commented] (SPARK-45069) SQL variable should always be resolved after outer reference
[ https://issues.apache.org/jira/browse/SPARK-45069?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763616#comment-17763616 ] ASF GitHub Bot commented on SPARK-45069: User 'cloud-fan' has created a pull request for this issue: https://github.com/apache/spark/pull/42803 > SQL variable should always be resolved after outer reference > > > Key: SPARK-45069 > URL: https://issues.apache.org/jira/browse/SPARK-45069 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 4.0.0 >Reporter: Wenchen Fan >Priority: Major >
[jira] [Updated] (SPARK-45102) Support keyword columns on filters that interact with HMS
[ https://issues.apache.org/jira/browse/SPARK-45102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45102: --- Labels: pull-request-available (was: ) > Support keyword columns on filters that interact with HMS > - > > Key: SPARK-45102 > URL: https://issues.apache.org/jira/browse/SPARK-45102 > Project: Spark > Issue Type: Improvement > Components: Connect >Affects Versions: 3.4.1 >Reporter: Steve Carlin >Priority: Major > Labels: pull-request-available > > Recently, https://issues.apache.org/jira/browse/HIVE-27665 was pushed to > Hive. It allows HMS to handle columns that are surrounded by backticks > in filters. A customer who hit this problem had a filter in > Spark like this: > where date='2015-01-06' > This didn't work because the word "date" is a keyword. For the > customer's query to work, the where clause had to be changed to this: > where `date`='2015-01-06' > Spark currently strips out the backticks before passing the filter to HMS. We need > a configurable flag to stop stripping the backticks.
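To make the described behavior concrete, here is a minimal, hypothetical Python sketch of quoting reserved-word columns in a filter string before sending it to HMS. This is not Spark's actual code; the `RESERVED` set, function name, and regex are illustrative assumptions:

```python
import re

# Illustrative subset of reserved words that HMS rejects as bare column names.
RESERVED = {"date", "timestamp", "user"}

def quote_keyword_columns(filter_expr: str) -> str:
    """Wrap reserved-word columns in backticks instead of stripping them,
    so an HMS that includes HIVE-27665 can parse the filter."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        return f"`{name}`" if name in RESERVED else name
    # Only touch bare identifiers appearing on the left of an equality.
    return re.sub(r"\b([A-Za-z_][A-Za-z0-9_]*)\b(?=\s*=)", repl, filter_expr)
```

For example, `quote_keyword_columns("date='2015-01-06'")` yields the backtick-quoted form from the issue description, while non-keyword columns pass through unchanged.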
[jira] [Updated] (SPARK-45112) Use UnresolvedFunction based resolution in SQL Dataset functions
[ https://issues.apache.org/jira/browse/SPARK-45112?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter Toth updated SPARK-45112: --- Summary: Use UnresolvedFunction based resolution in SQL Dataset functions (was: Use UnresolvedFunction in dataset functions) > Use UnresolvedFunction based resolution in SQL Dataset functions > > > Key: SPARK-45112 > URL: https://issues.apache.org/jira/browse/SPARK-45112 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 4.0.0 >Reporter: Peter Toth >Priority: Minor > Labels: pull-request-available >
[jira] [Created] (SPARK-45114) Adjust the `versionadded` and `versionchanged` information to the parameters
Ruifeng Zheng created SPARK-45114: - Summary: Adjust the `versionadded` and `versionchanged` information to the parameters Key: SPARK-45114 URL: https://issues.apache.org/jira/browse/SPARK-45114 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 4.0.0 Reporter: Ruifeng Zheng
[jira] [Updated] (SPARK-45114) Adjust the `versionadded` and `versionchanged` information to the parameters
[ https://issues.apache.org/jira/browse/SPARK-45114?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45114: --- Labels: pull-request-available (was: ) > Adjust the `versionadded` and `versionchanged` information to the parameters > > > Key: SPARK-45114 > URL: https://issues.apache.org/jira/browse/SPARK-45114 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > Labels: pull-request-available >
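For context on SPARK-45114: PySpark docstrings use Sphinx `.. versionadded::` and `.. versionchanged::` directives, and the ticket is about placing them next to the parameters they describe rather than only at the function level. A hypothetical example of the target layout (the function and version numbers are made up, not taken from PySpark):

```python
def sample_function(col, seed=None):
    """Hypothetical function showing parameter-level version directives.

    .. versionadded:: 3.0.0

    Parameters
    ----------
    col : str
        Name of the input column.
    seed : int, optional
        Random seed.

        .. versionadded:: 3.4.0
            The ``seed`` parameter was added in this version.
    """
    return col, seed
```

Sphinx renders the nested directive under the specific parameter, so a reader can see in which release each parameter appeared or changed.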
[jira] [Commented] (SPARK-38215) InsertIntoHiveDir support convert metadata
[ https://issues.apache.org/jira/browse/SPARK-38215?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17763577#comment-17763577 ] Penglei Shi commented on SPARK-38215: - [~angerszhuuu] Hi, I found that when an insert-into-directory command selects from a table whose location is the same path as the target directory, the table's files are deleted in advance and the directory ends up empty. Because DDLUtils.verifyNotReadPath only collects LogicalRelation nodes rather than HiveTableRelation, InsertIntoDir is converted to InsertIntoDataSourceDirCommand even though the HiveTableRelation's location is the same as the target directory. > InsertIntoHiveDir support convert metadata > -- > > Key: SPARK-38215 > URL: https://issues.apache.org/jira/browse/SPARK-38215 > Project: Spark > Issue Type: Task > Components: SQL >Affects Versions: 3.2.1 >Reporter: angerszhu >Assignee: angerszhu >Priority: Major > Fix For: 3.3.0 > > > The current InsertIntoHiveDir command uses the Hive SerDe to write data and doesn't > support conversion, so such SQL can't write Parquet with zstd.
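The check the comment describes can be sketched as follows. This is an illustrative Python model under stated assumptions (Spark's real implementation is Scala, and plan nodes are objects, not dicts); the point is that read-path collection must cover Hive relations as well as data-source relations:

```python
def collect_read_paths(plan_nodes):
    """Collect source locations from both data-source and Hive relations.
    The issue reported above is that only LogicalRelation was collected,
    so HiveTableRelation locations were missed."""
    return {
        node["location"]
        for node in plan_nodes
        if node["type"] in ("LogicalRelation", "HiveTableRelation")
    }

def verify_not_read_path(plan_nodes, output_path):
    """Refuse to overwrite a directory that is also being read from."""
    if output_path in collect_read_paths(plan_nodes):
        raise ValueError(
            f"Cannot overwrite a path that is also being read from: {output_path}"
        )
```

With HiveTableRelation included in the collection, an insert whose target directory equals the Hive table's location is rejected instead of silently emptying the table.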
[jira] [Updated] (SPARK-45020) org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database 'default' not found (state=08S01,code=0)
[ https://issues.apache.org/jira/browse/SPARK-45020?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] ASF GitHub Bot updated SPARK-45020: --- Labels: pull-request-available (was: ) > org.apache.spark.sql.catalyst.analysis.NoSuchDatabaseException: Database > 'default' not found (state=08S01,code=0) > - > > Key: SPARK-45020 > URL: https://issues.apache.org/jira/browse/SPARK-45020 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.1.0 >Reporter: Sruthi Mooriyathvariam >Priority: Minor > Labels: pull-request-available > > An alert fires when a Spark 3.1 cluster is created using a metastore shared > with Spark 2.4. The alert says the default database does not exist. This is > misleading, so we need to suppress it. > In the class SessionCatalog.scala, the method requireDbExists() does not handle the > case where db is the default database. Handling that case would suppress this > misleading alert.
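The proposed change amounts to a guard that treats the default database as always present. A hypothetical Python model of the described requireDbExists() behavior (the actual method lives in SessionCatalog.scala; names and exception types here are illustrative):

```python
DEFAULT_DATABASE = "default"

def require_db_exists(db, existing_dbs):
    """Skip the existence check for the built-in default database so the
    misleading "Database 'default' not found" alert is never raised."""
    if db == DEFAULT_DATABASE:
        return  # assumed to always exist, even with a shared older metastore
    if db not in existing_dbs:
        raise LookupError(f"Database '{db}' not found")
```

Under this sketch, a lookup of `default` succeeds even when the shared metastore has not materialized it, while lookups of genuinely missing databases still fail.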