[jira] [Updated] (SPARK-47793) Implement SimpleDataSourceStreamReader for python streaming data source

2024-04-21 Thread Chaoqin Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47793?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoqin Li updated SPARK-47793:
---
Epic Link: SPARK-46866

> Implement SimpleDataSourceStreamReader for python streaming data source
> ---
>
> Key: SPARK-47793
> URL: https://issues.apache.org/jira/browse/SPARK-47793
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, SS
>Affects Versions: 3.5.1
>Reporter: Chaoqin Li
>Priority: Major
>  Labels: pull-request-available
>
>  SimpleDataSourceStreamReader is a simplified version of the DataStreamReader 
> interface.
>  # It doesn’t require developers to reason about data partitioning.
>  # It doesn’t require getting the latest offset before reading data.
> There are three functions that need to be defined:
> 1. Read data and return the end offset.
> _def read(self, start: Offset) -> (Iterator[Tuple], Offset)_
> 2. Read data between the start and end offsets; this is required for 
> exactly-once reads.
> _def read2(self, start: Offset, end: Offset) -> Iterator[Tuple]_
> 3. Return the initial start offset of the streaming query.
> _def initialOffset(self) -> dict_
> Implementation: Wrap the SimpleDataSourceStreamReader instance in a 
> DataSourceStreamReader internally and make the prefetching and caching 
> transparent to the data source developer. The records prefetched in the python 
> process will be sent to the JVM as Arrow record batches.
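For illustration, the three methods above can be sketched as a plain-Python class. This is a standalone mock of the interface for a hypothetical counter source, not the actual pyspark base class: offsets are represented as plain dicts, and the class/field names are assumptions for the example.

```python
from typing import Iterator, Tuple

# Standalone sketch of the SimpleDataSourceStreamReader contract for a
# hypothetical counter source. A real implementation would extend pyspark's
# base class; here everything is plain Python and offsets are plain dicts.
class CounterSimpleStreamReader:
    def initialOffset(self) -> dict:
        # Offset the streaming query starts from on its very first run.
        return {"offset": 0}

    def read(self, start: dict) -> Tuple[Iterator[tuple], dict]:
        # Read a batch of rows starting at `start`; return (rows, end offset).
        end = {"offset": start["offset"] + 3}
        rows = iter([(i,) for i in range(start["offset"], end["offset"])])
        return rows, end

    def read2(self, start: dict, end: dict) -> Iterator[tuple]:
        # Deterministically replay the rows between two offsets, which is what
        # makes exactly-once reads possible on failure recovery.
        return iter([(i,) for i in range(start["offset"], end["offset"])])

reader = CounterSimpleStreamReader()
start = reader.initialOffset()
rows, end = reader.read(start)
print(list(rows))                      # [(0,), (1,), (2,)]
print(list(reader.read2(start, end)))  # [(0,), (1,), (2,)]
```

Note that the developer never reasons about partitions or latest offsets: `read` both produces data and advances the offset, and `read2` exists only for deterministic replay.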



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47920) Add documentation for python streaming data source

2024-04-21 Thread Chaoqin Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47920?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoqin Li updated SPARK-47920:
---
Epic Link: SPARK-46866

> Add documentation for python streaming data source
> --
>
> Key: SPARK-47920
> URL: https://issues.apache.org/jira/browse/SPARK-47920
> Project: Spark
>  Issue Type: New Feature
>  Components: PySpark, SS
>Affects Versions: 3.5.1
>Reporter: Chaoqin Li
>Priority: Major
>  Labels: pull-request-available
>
> Add documentation (user guide) for the Python data source API.
> The doc should explain how to develop and use DataSourceStreamReader and 
> DataSourceStreamWriter.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47777) Add spark connect test for python streaming data source

2024-04-21 Thread Chaoqin Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoqin Li updated SPARK-47777:
---
Epic Link: SPARK-46866

> Add spark connect test for python streaming data source
> ---
>
> Key: SPARK-47777
> URL: https://issues.apache.org/jira/browse/SPARK-47777
> Project: Spark
>  Issue Type: Test
>  Components: PySpark, SS, Tests
>Affects Versions: 3.5.1
>Reporter: Chaoqin Li
>Assignee: Chaoqin Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Make the python streaming data source pyspark tests also run on Spark Connect.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47273) Implement python stream writer interface

2024-04-21 Thread Chaoqin Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47273?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoqin Li updated SPARK-47273:
---
Epic Link: SPARK-46866

> Implement python stream writer interface
> 
>
> Key: SPARK-47273
> URL: https://issues.apache.org/jira/browse/SPARK-47273
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SS
>Affects Versions: 4.0.0
>Reporter: Chaoqin Li
>Assignee: Chaoqin Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> In order to support developing Spark streaming sinks in python, we need to 
> implement the python stream writer interface.
> Reuse PythonPartitionWriter to implement the serialization and execution of the 
> write callback on the executor.
> Implement a python worker process to run the python streaming data sink 
> committer and communicate with the JVM through a socket on the spark driver. 
> For each python streaming data sink instance, a long-lived python worker 
> process will be created. Inside the python process, the python write committer 
> will receive abort or commit function calls and send back results through the 
> socket.
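The commit/abort dispatch the worker performs can be sketched as follows. This is a plain-Python illustration of the protocol shape only: the JVM-side socket is replaced by in-memory request/reply lists, and the class and function names are hypothetical, not the actual pyspark internals.

```python
# Sketch of the dispatch loop a long-lived python sink worker runs: receive
# "commit" or "abort" requests (here, a list standing in for the socket),
# invoke the committer, and collect the replies to send back to the JVM.
class StreamingSinkCommitter:
    def __init__(self):
        self.committed_batches = []
        self.aborted_batches = []

    def commit(self, batch_id: int) -> str:
        # Finalize the writes for this microbatch.
        self.committed_batches.append(batch_id)
        return "committed"

    def abort(self, batch_id: int) -> str:
        # Roll back / clean up the writes for this microbatch.
        self.aborted_batches.append(batch_id)
        return "aborted"

def worker_loop(committer, requests):
    """Dispatch each (action, batch_id) request and return the replies that
    would be sent back over the socket."""
    replies = []
    for action, batch_id in requests:
        if action == "commit":
            replies.append(committer.commit(batch_id))
        elif action == "abort":
            replies.append(committer.abort(batch_id))
    return replies

c = StreamingSinkCommitter()
print(worker_loop(c, [("commit", 0), ("commit", 1), ("abort", 2)]))
# ['committed', 'committed', 'aborted']
```

Keeping the worker process alive across microbatches is what lets the committer hold state between commit calls instead of being re-created per batch.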



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47107) Implement partition reader for python streaming data source

2024-04-21 Thread Chaoqin Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47107?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Chaoqin Li updated SPARK-47107:
---
Epic Link: SPARK-46866

> Implement partition reader for python streaming data source
> ---
>
> Key: SPARK-47107
> URL: https://issues.apache.org/jira/browse/SPARK-47107
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, SS
>Affects Versions: 4.0.0
>Reporter: Chaoqin Li
>Assignee: Chaoqin Li
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> Piggyback on the PythonPartitionReaderFactory to implement reading a data 
> partition for the python streaming data source. Add a test case to verify that 
> the python streaming data source can read and process data end to end.
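Conceptually, a partition reader turns one partition descriptor into the rows for that partition. The sketch below is a plain-Python illustration of that idea only; the `RangePartition` and `read_partition` names are made up for the example and are not the actual PythonPartitionReaderFactory API.

```python
# Illustration of "reading a data partition": a partition value describes a
# slice of the data (here, an integer range), and the reader yields one row
# tuple per record in that slice. Names are hypothetical.
class RangePartition:
    def __init__(self, start: int, end: int):
        self.start = start
        self.end = end

def read_partition(partition: RangePartition):
    # Yield (value, value squared) for each record in the partition's range.
    for i in range(partition.start, partition.end):
        yield (i, i * i)

rows = list(read_partition(RangePartition(2, 5)))
print(rows)  # [(2, 4), (3, 9), (4, 16)]
```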



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic

2024-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47933?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47933:
---
Labels: pull-request-available  (was: )

> Parent Column class for Spark Connect and Spark Classic
> ---
>
> Key: SPARK-47933
> URL: https://issues.apache.org/jira/browse/SPARK-47933
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47932) Avoid using legacy commons-lang

2024-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47932:
---
Labels: pull-request-available  (was: )

> Avoid using legacy commons-lang
> ---
>
> Key: SPARK-47932
> URL: https://issues.apache.org/jira/browse/SPARK-47932
> Project: Spark
>  Issue Type: Test
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Minor
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47933) Parent Column class for Spark Connect and Spark Classic

2024-04-21 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47933:


 Summary: Parent Column class for Spark Connect and Spark Classic
 Key: SPARK-47933
 URL: https://issues.apache.org/jira/browse/SPARK-47933
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47902) Compute Current Time* expressions should be foldable

2024-04-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47902.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46120
[https://github.com/apache/spark/pull/46120]

> Compute Current Time* expressions should be foldable
> 
>
> Key: SPARK-47902
> URL: https://issues.apache.org/jira/browse/SPARK-47902
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Assignee: Aleksandar Tomic
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> The following PR - https://github.com/apache/spark/pull/44261 - changed the 
> "compute current time" family of expressions to be unevaluable, given that 
> these expressions are supposed to be replaced with literals by the query 
> optimizer. Unevaluable implies that these expressions are not foldable, even 
> though they will be replaced by literals.
> If these expressions are used in places that require constant folding (e.g. 
> RAND()), the new behavior would be to raise an error, which is a regression 
> compared to the behavior prior to Spark 4.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47902) Compute Current Time* expressions should be foldable

2024-04-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47902:
---

Assignee: Aleksandar Tomic

> Compute Current Time* expressions should be foldable
> 
>
> Key: SPARK-47902
> URL: https://issues.apache.org/jira/browse/SPARK-47902
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Aleksandar Tomic
>Assignee: Aleksandar Tomic
>Priority: Major
>  Labels: pull-request-available
>
> The following PR - https://github.com/apache/spark/pull/44261 - changed the 
> "compute current time" family of expressions to be unevaluable, given that 
> these expressions are supposed to be replaced with literals by the query 
> optimizer. Unevaluable implies that these expressions are not foldable, even 
> though they will be replaced by literals.
> If these expressions are used in places that require constant folding (e.g. 
> RAND()), the new behavior would be to raise an error, which is a regression 
> compared to the behavior prior to Spark 4.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-33826) InsertIntoHiveTable generate HDFS file with invalid user

2024-04-21 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-33826?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839449#comment-17839449
 ] 

angerszhu commented on SPARK-33826:
---

What is RIK? I'm not sure what you mean.

> InsertIntoHiveTable generate HDFS file with invalid user
> 
>
> Key: SPARK-33826
> URL: https://issues.apache.org/jira/browse/SPARK-33826
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.3.2, 3.0.0
>Reporter: Zhang Jianguo
>Priority: Minor
>
> *Arch:* Hive on Spark.
>  
> *Version:* Spark 2.3.2
>  
> *Conf:*
> Enable user impersonation
> hive.server2.enable.doAs=true
>  
> *Scenario:*
> The Thriftserver is running as login user A, and tasks run as user A too.
> The client executes SQL as user B.
>  
> Data generated by the SQL "insert into TABLE \[tbl\] select XXX from ." is 
> written to HDFS on the executor, and the executor doesn't know about user B.
>  
> *{color:#de350b}So the owner of the file written to HDFS will be user A when 
> it should be B.{color}*
>  
> I also checked the implementation of Spark 3.0.0; it could have the same issue.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47845) Support column type in split function in scala and python

2024-04-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-47845.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46045
[https://github.com/apache/spark/pull/46045]

> Support column type in split function in scala and python
> -
>
> Key: SPARK-47845
> URL: https://issues.apache.org/jira/browse/SPARK-47845
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, Spark Core
>Affects Versions: 3.5.1
>Reporter: Liu Cao
>Assignee: Liu Cao
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> I have a use case to split a String-typed column with different delimiters 
> defined in other columns of the dataframe. SQL already supports this, but the 
> scala / python functions currently don't.
>  
> A hypothetical example to illustrate:
> {code:java}
> import org.apache.spark.sql.functions.{col, split}
> val example = spark.createDataFrame(
>   Seq(
>     ("Doe, John", ", ", 2),
>     ("Smith,Jane", ",", 2),
>     ("Johnson", ",", 1)
>   )
> ).toDF("name", "delim", "expected_parts_count")
> example.createOrReplaceTempView("test_data")
> // works for SQL
> spark.sql("SELECT split(name, delim, expected_parts_count) AS name_parts FROM test_data").show()
> // currently doesn't compile for scala, but easy to support
> example.withColumn("name_parts", split(col("name"), col("delim"), col("expected_parts_count"))).show()
> {code}
>  
> It's a pretty simple patch; I can make a PR soon.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47845) Support column type in split function in scala and python

2024-04-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47845?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-47845:
-

Assignee: Liu Cao

> Support column type in split function in scala and python
> -
>
> Key: SPARK-47845
> URL: https://issues.apache.org/jira/browse/SPARK-47845
> Project: Spark
>  Issue Type: New Feature
>  Components: Connect, Spark Core
>Affects Versions: 3.5.1
>Reporter: Liu Cao
>Assignee: Liu Cao
>Priority: Major
>  Labels: pull-request-available
>
> I have a use case to split a String-typed column with different delimiters 
> defined in other columns of the dataframe. SQL already supports this, but the 
> scala / python functions currently don't.
>  
> A hypothetical example to illustrate:
> {code:java}
> import org.apache.spark.sql.functions.{col, split}
> val example = spark.createDataFrame(
>   Seq(
>     ("Doe, John", ", ", 2),
>     ("Smith,Jane", ",", 2),
>     ("Johnson", ",", 1)
>   )
> ).toDF("name", "delim", "expected_parts_count")
> example.createOrReplaceTempView("test_data")
> // works for SQL
> spark.sql("SELECT split(name, delim, expected_parts_count) AS name_parts FROM test_data").show()
> // currently doesn't compile for scala, but easy to support
> example.withColumn("name_parts", split(col("name"), col("delim"), col("expected_parts_count"))).show()
> {code}
>  
> It's a pretty simple patch; I can make a PR soon.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-47909) Parent DataFrame class for Spark Connect and Spark Classic

2024-04-21 Thread Hyukjin Kwon (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-47909?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839436#comment-17839436
 ] 

Hyukjin Kwon commented on SPARK-47909:
--

Yes, I am working on it today :-).

> Parent DataFrame class for Spark Connect and Spark Classic
> --
>
> Key: SPARK-47909
> URL: https://issues.apache.org/jira/browse/SPARK-47909
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-47909) Parent DataFrame class for Spark Connect and Spark Classic

2024-04-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47909.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 46129
[https://github.com/apache/spark/pull/46129]

> Parent DataFrame class for Spark Connect and Spark Classic
> --
>
> Key: SPARK-47909
> URL: https://issues.apache.org/jira/browse/SPARK-47909
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-47909) Parent DataFrame class for Spark Connect and Spark Classic

2024-04-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47909:


Assignee: Hyukjin Kwon

> Parent DataFrame class for Spark Connect and Spark Classic
> --
>
> Key: SPARK-47909
> URL: https://issues.apache.org/jira/browse/SPARK-47909
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47600) MLLib: Migrate logInfo with variables to structured logging framework

2024-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47600?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47600:
---
Labels: pull-request-available  (was: )

> MLLib: Migrate logInfo with variables to structured logging framework
> -
>
> Key: SPARK-47600
> URL: https://issues.apache.org/jira/browse/SPARK-47600
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47928) Speed up test "Add jar support Ivy URI in SQL"

2024-04-21 Thread Cheng Pan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Cheng Pan updated SPARK-47928:
--
Affects Version/s: 3.2.0
   (was: 4.0.0)

> Speed up test "Add jar support Ivy URI in SQL"
> --
>
> Key: SPARK-47928
> URL: https://issues.apache.org/jira/browse/SPARK-47928
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.2.0
>Reporter: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47351) StringToMap (all collations)

2024-04-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47351:
-
Summary: StringToMap (all collations)  (was: StringToMap)

> StringToMap (all collations)
> 
>
> Key: SPARK-47351
> URL: https://issues.apache.org/jira/browse/SPARK-47351
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47354) ParseJson & VariantExplode (all collations)

2024-04-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47354:
-
Summary: ParseJson & VariantExplode (all collations)  (was: ParseJson (all 
collations))

> ParseJson & VariantExplode (all collations)
> ---
>
> Key: SPARK-47354
> URL: https://issues.apache.org/jira/browse/SPARK-47354
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47421) Mask (all collations)

2024-04-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47421:
-
Summary: Mask (all collations)  (was: Mask)

> Mask (all collations)
> -
>
> Key: SPARK-47421
> URL: https://issues.apache.org/jira/browse/SPARK-47421
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47353) Mode (all collations)

2024-04-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47353?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47353:
-
Summary: Mode (all collations)  (was: Mode)

> Mode (all collations)
> -
>
> Key: SPARK-47353
> URL: https://issues.apache.org/jira/browse/SPARK-47353
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47354) ParseJson (all collations)

2024-04-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47354?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47354:
-
Summary: ParseJson (all collations)  (was: TBD)

> ParseJson (all collations)
> --
>
> Key: SPARK-47354
> URL: https://issues.apache.org/jira/browse/SPARK-47354
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47928) Speed up test "Add jar support Ivy URI in SQL"

2024-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47928?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47928:
---
Labels: pull-request-available  (was: )

> Speed up test "Add jar support Ivy URI in SQL"
> --
>
> Key: SPARK-47928
> URL: https://issues.apache.org/jira/browse/SPARK-47928
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Cheng Pan
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47928) Speed up test "Add jar support Ivy URI in SQL"

2024-04-21 Thread Cheng Pan (Jira)
Cheng Pan created SPARK-47928:
-

 Summary: Speed up test "Add jar support Ivy URI in SQL"
 Key: SPARK-47928
 URL: https://issues.apache.org/jira/browse/SPARK-47928
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 4.0.0
Reporter: Cheng Pan






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-41469) Task rerun on decommissioned executor can be avoided if shuffle data has migrated

2024-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-41469?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-41469:
---
Labels: pull-request-available  (was: )

> Task rerun on decommissioned executor can be avoided if shuffle data has 
> migrated
> -
>
> Key: SPARK-41469
> URL: https://issues.apache.org/jira/browse/SPARK-41469
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.1.3, 3.2.2, 3.3.1
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.0
>
>
> Currently, we will always rerun a finished shuffle map task if it once ran on 
> the lost executor. However, in the case where the executor loss is caused by 
> decommissioning, the shuffle data might have been migrated so that the task 
> doesn't need to rerun.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47692) Fix default StringType meaning in implicit casting

2024-04-21 Thread Mihailo Milosevic (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47692?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mihailo Milosevic updated SPARK-47692:
--
Summary: Fix default StringType meaning in implicit casting  (was: Addition 
of priority flag to StringType)

> Fix default StringType meaning in implicit casting
> --
>
> Key: SPARK-47692
> URL: https://issues.apache.org/jira/browse/SPARK-47692
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Mihailo Milosevic
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47421) Mask

2024-04-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47421?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47421:
-
Summary: Mask  (was: TBD)

> Mask
> 
>
> Key: SPARK-47421
> URL: https://issues.apache.org/jira/browse/SPARK-47421
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47351) StringToMap

2024-04-21 Thread Jira


 [ 
https://issues.apache.org/jira/browse/SPARK-47351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Uroš Bojanić updated SPARK-47351:
-
Summary: StringToMap  (was: TBD)

> StringToMap
> ---
>
> Key: SPARK-47351
> URL: https://issues.apache.org/jira/browse/SPARK-47351
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-47730) Support APP_ID and EXECUTOR_ID placeholder in labels

2024-04-21 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47730?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47730:
---
Labels: pull-request-available  (was: )

> Support APP_ID and EXECUTOR_ID placeholder in labels
> 
>
> Key: SPARK-47730
> URL: https://issues.apache.org/jira/browse/SPARK-47730
> Project: Spark
>  Issue Type: Improvement
>  Components: k8s
>Affects Versions: 3.5.1
>Reporter: Xi Chen
>Priority: Major
>  Labels: pull-request-available
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-47927) Nullability after join not respected in UDF

2024-04-21 Thread Emil Ejbyfeldt (Jira)
Emil Ejbyfeldt created SPARK-47927:
--

 Summary: Nullability after join not respected in UDF
 Key: SPARK-47927
 URL: https://issues.apache.org/jira/browse/SPARK-47927
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.3, 3.5.1, 4.0.0
Reporter: Emil Ejbyfeldt


{code:java}
val ds1 = Seq(1).toDS()
val ds2 = Seq[Int]().toDS()
val f = udf[(Int, Option[Int]), (Int, Option[Int])](identity)
ds1.join(ds2, ds1("value") === ds2("value"), "outer").select(f(struct(ds1("value"), ds2("value")))).show()
ds1.join(ds2, ds1("value") === ds2("value"), "outer").select(struct(ds1("value"), ds2("value"))).show() {code}
outputs
{code:java}
+---------------------------------------+
|UDF(struct(value, value, value, value))|
+---------------------------------------+
|                                 {1, 0}|
+---------------------------------------+

+--------------------+
|struct(value, value)|
+--------------------+
|           {1, NULL}|
+--------------------+ {code}

So when the result is passed to the UDF, the nullability after the join is not 
respected, and we incorrectly end up with a 0 value instead of a null/None value.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org