[jira] [Created] (SPARK-44906) Move substituteAppNExecIds logic into kubernetesConf.annotations method

2023-08-21 Thread Binjie Yang (Jira)
Binjie Yang created SPARK-44906:
---

 Summary: Move substituteAppNExecIds logic into 
kubernetesConf.annotations method 
 Key: SPARK-44906
 URL: https://issues.apache.org/jira/browse/SPARK-44906
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 3.4.1
Reporter: Binjie Yang


Move `Utils.substituteAppNExecIds` logic into `KubernetesConf.annotations` as 
the default behavior, making it easy for users to reuse instead of 
reimplementing the same logic.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44905) NullPointerException on stateful expression evaluation

2023-08-21 Thread Kent Yao (Jira)
Kent Yao created SPARK-44905:


 Summary: NullPointerException on stateful expression evaluation
 Key: SPARK-44905
 URL: https://issues.apache.org/jira/browse/SPARK-44905
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.1, 3.5.0, 4.0.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44728) Improve PySpark documentations

2023-08-21 Thread Ruifeng Zheng (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757209#comment-17757209
 ] 

Ruifeng Zheng commented on SPARK-44728:
---

A good docstring should contain the following sections:
 # Brief Description: A concise summary explaining the function's purpose.
 # Version Annotations: Annotations like versionadded and versionchanged to 
indicate when the function was added or modified across versions of the 
software.
 # Parameters: This section should list and describe all input parameters. If 
the function doesn't accept any parameters, this section can be omitted.
 # Returns: Detail what the function returns. If the function doesn't return 
anything, this section can be omitted.
 # See Also: A list of related API functions or methods. This section can be 
omitted if no related APIs exist.
 # Notes: Include additional information or warnings about the function's usage 
here.
 # Examples: Every example should begin with a brief description, followed by 
the example code, and conclude with the expected output. Any necessary import 
statements should be included at the beginning of each example. A minimal 
skeleton following these sections is sketched below.
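For illustration, a minimal skeleton covering these sections might look like 
the following (the function, versions, and contents are hypothetical, shown 
only to demonstrate the layout; the `spark` session in the example is the 
usual doctest fixture):
{code:python}
def add_one(col):
    """
    Return `col` incremented by one.

    .. versionadded:: 3.5.0

    .. versionchanged:: 4.0.0
        Supports Spark Connect.

    Parameters
    ----------
    col : :class:`~pyspark.sql.Column` or str
        Input column or column name.

    Returns
    -------
    :class:`~pyspark.sql.Column`
        The input value plus one.

    See Also
    --------
    pyspark.sql.functions.lit

    Notes
    -----
    This is a hypothetical function used only to illustrate the layout.

    Examples
    --------
    Increment the values of a column:

    >>> from pyspark.sql import functions as sf
    >>> spark.range(1).select(sf.col("id") + 1).show()
    +--------+
    |(id + 1)|
    +--------+
    |       1|
    +--------+
    """
{code}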

> Improve PySpark documentations
> --
>
> Key: SPARK-44728
> URL: https://issues.apache.org/jira/browse/SPARK-44728
> Project: Spark
>  Issue Type: Umbrella
>  Components: PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Allison Wang
>Priority: Major
>
> An umbrella Jira ticket to improve the PySpark documentation.
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44904) Correct the ‘versionadded’ of `sql.functions.approx_percentile` to 3.5.0.

2023-08-21 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-44904:
-
Summary: Correct the ‘versionadded’ of `sql.functions.approx_percentile` to 
3.5.0.  (was: Correct the ‘versionchanged’ of `sql.functions.approx_percentile` 
to 3.5.0.)

> Correct the ‘versionadded’ of `sql.functions.approx_percentile` to 3.5.0.
> -
>
> Key: SPARK-44904
> URL: https://issues.apache.org/jira/browse/SPARK-44904
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Yang Jie
>Priority: Minor
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44904) Correct the ‘versionchanged’ of `sql.functions.approx_percentile` to 3.5.0.

2023-08-21 Thread Yang Jie (Jira)
Yang Jie created SPARK-44904:


 Summary: Correct the ‘versionchanged’ of 
`sql.functions.approx_percentile` to 3.5.0.
 Key: SPARK-44904
 URL: https://issues.apache.org/jira/browse/SPARK-44904
 Project: Spark
  Issue Type: Improvement
  Components: Documentation, PySpark
Affects Versions: 3.5.0, 4.0.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43327) Trigger `committer.setupJob` before plan execute in `FileFormatWriter`

2023-08-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-43327.
-
Fix Version/s: 3.3.4
   Resolution: Fixed

Issue resolved by pull request 41154
[https://github.com/apache/spark/pull/41154]

> Trigger `committer.setupJob` before plan execute in `FileFormatWriter`
> --
>
> Key: SPARK-43327
> URL: https://issues.apache.org/jira/browse/SPARK-43327
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.3
>Reporter: ming95
>Assignee: ming95
>Priority: Major
> Fix For: 3.3.4
>
>
> SPARK-40588 resolved the case where `outputOrdering` might not work if AQE 
> is enabled:
> https://issues.apache.org/jira/browse/SPARK-40588
> However, because that fix materializes the AQE plan in advance (it triggers 
> getFinalPhysicalPlan), committer.setupJob(job) may never execute when 
> `AdaptiveSparkPlanExec#getFinalPhysicalPlan()` fails with an error. Normally 
> plan materialization should happen after committer.setupJob(job). This may 
> eventually result in the INSERT OVERWRITE directory being deleted.
>  
> {code:java}
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.spark.sql.QueryTest
> import org.apache.spark.sql.catalyst.TableIdentifier
> sql("CREATE TABLE IF NOT EXISTS spark32_overwrite(amt1 int) STORED AS ORC")
> sql("CREATE TABLE IF NOT EXISTS spark32_overwrite2(amt1 long) STORED AS ORC")
> sql("INSERT OVERWRITE TABLE spark32_overwrite2 select 644164")
> sql("set spark.sql.ansi.enabled=true")
> val loc = spark.sessionState.catalog
>   .getTableMetadata(TableIdentifier("spark32_overwrite")).location
> val fs = FileSystem.get(loc, spark.sparkContext.hadoopConfiguration)
> println("Location exists: " + fs.exists(new Path(loc)))
> try {
>   sql("INSERT OVERWRITE TABLE spark32_overwrite select amt1 from " +
>     "(select cast(amt1 as int) as amt1 from spark32_overwrite2 distribute by amt1)")
> } finally {
>   println("Location exists: " + fs.exists(new Path(loc)))
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43327) Trigger `committer.setupJob` before plan execute in `FileFormatWriter`

2023-08-21 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-43327:
---

Assignee: ming95

> Trigger `committer.setupJob` before plan execute in `FileFormatWriter`
> --
>
> Key: SPARK-43327
> URL: https://issues.apache.org/jira/browse/SPARK-43327
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.2.3
>Reporter: ming95
>Assignee: ming95
>Priority: Major
>
> SPARK-40588 resolved the case where `outputOrdering` might not work if AQE 
> is enabled:
> https://issues.apache.org/jira/browse/SPARK-40588
> However, because that fix materializes the AQE plan in advance (it triggers 
> getFinalPhysicalPlan), committer.setupJob(job) may never execute when 
> `AdaptiveSparkPlanExec#getFinalPhysicalPlan()` fails with an error. Normally 
> plan materialization should happen after committer.setupJob(job). This may 
> eventually result in the INSERT OVERWRITE directory being deleted.
>  
> {code:java}
> import org.apache.hadoop.fs.{FileSystem, Path}
> import org.apache.spark.sql.QueryTest
> import org.apache.spark.sql.catalyst.TableIdentifier
> sql("CREATE TABLE IF NOT EXISTS spark32_overwrite(amt1 int) STORED AS ORC")
> sql("CREATE TABLE IF NOT EXISTS spark32_overwrite2(amt1 long) STORED AS ORC")
> sql("INSERT OVERWRITE TABLE spark32_overwrite2 select 644164")
> sql("set spark.sql.ansi.enabled=true")
> val loc = spark.sessionState.catalog
>   .getTableMetadata(TableIdentifier("spark32_overwrite")).location
> val fs = FileSystem.get(loc, spark.sparkContext.hadoopConfiguration)
> println("Location exists: " + fs.exists(new Path(loc)))
> try {
>   sql("INSERT OVERWRITE TABLE spark32_overwrite select amt1 from " +
>     "(select cast(amt1 as int) as amt1 from spark32_overwrite2 distribute by amt1)")
> } finally {
>   println("Location exists: " + fs.exists(new Path(loc)))
> } {code}



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44854) Python timedelta to DayTimeIntervalType edge cases bug

2023-08-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44854.
--
Fix Version/s: 3.5.0
   4.0.0
   3.4.2
   Resolution: Fixed

Issue resolved by pull request 42541
[https://github.com/apache/spark/pull/42541]

> Python timedelta to DayTimeIntervalType edge cases bug
> --
>
> Key: SPARK-44854
> URL: https://issues.apache.org/jira/browse/SPARK-44854
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.4.0
>Reporter: Ocean HD
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 3.5.0, 4.0.0, 3.4.2
>
>   Original Estimate: 3h
>  Remaining Estimate: 3h
>
> h1. Python Timedelta to PySpark DayTimeIntervalType bug
> There is a bug where certain Python datetime.timedelta objects get converted 
> to a PySpark DayTimeIntervalType column holding a different value than the 
> one stored in the Python timedelta.
> A simple illustrative example can be produced with the below code:
>  
> {code:java}
> from datetime import timedelta
> from pyspark.sql.types import DayTimeIntervalType, StructField, StructType
> spark = ...spark session setup here...
> td = timedelta(days=4498031, seconds=16054, microseconds=81)
> df = spark.createDataFrame(
>     [(td,)],
>     StructType([StructField(name="timedelta_col",
>                             dataType=DayTimeIntervalType(),
>                             nullable=False)]))
> df.show(truncate=False)
> > +------------------------------------------------+
> > |timedelta_col                                   |
> > +------------------------------------------------+
> > |INTERVAL '4498031 04:27:35.000081' DAY TO SECOND|
> > +------------------------------------------------+
> print(str(td))
> > '4498031 days, 4:27:34.000081' {code}
> In the above example, look at the seconds: the original Python timedelta 
> object has 34 seconds, while the PySpark DayTimeIntervalType column has 35.
> h1. Fix
> This issue arises because the current conversion from a Python timedelta 
> uses the timedelta function `.total_seconds()` to get the number of seconds, 
> and then adds the microsecond component back in afterwards. Unfortunately, 
> with some timedeltas (ones with microsecond entries close to 1_000_000, I 
> believe), `.total_seconds()` ends up rounding *up* to the nearest second 
> (probably due to floating point precision), with the microseconds then added 
> on top of that. The effect is that 1 second gets added incorrectly.
> The issue can be fixed by doing the processing in a slightly different way. 
> Instead of:
> {code:java}
> (math.floor(dt.total_seconds()) * 1_000_000) + dt.microseconds{code}
> we construct the microsecond count from the timedelta's components:
> {code:java}
> (((dt.days * 86400) + dt.seconds) * 1_000_000) + dt.microseconds {code}
>  
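> For reference, a minimal standalone sketch of the component-based conversion 
> (an illustration of the arithmetic above, not the actual PySpark code):
> {code:python}
> import datetime
> 
> def timedelta_to_micros(dt: datetime.timedelta) -> int:
>     # Build the microsecond count from exact integer components, avoiding
>     # the float returned by dt.total_seconds().
>     return (((dt.days * 86400) + dt.seconds) * 1_000_000) + dt.microseconds
> 
> td = datetime.timedelta(days=4498031, seconds=16054, microseconds=81)
> assert timedelta_to_micros(td) == 388629894454000081
> {code}
>  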
> h1. Tests
> An illustrative edge case for timedeltas is the one above (which can also be 
> written as `datetime.timedelta(microseconds=388629894454000081)`).
>  
> A related edge case, which is already handled but not tested, exists when 
> there are both positive and negative components to the created timedelta 
> object. A test entry for this edge case is also included since it is 
> related.
> h1. PR
> Link to the PR addressing this issue: 
> https://github.com/apache/spark/pull/42541
> h1. Keywords to help people searching for this issue:
> datetime.timedelta
> timedelta
> pyspark.sql.types.DayTimeIntervalType
> DayTimeIntervalType
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44903) Refine docstring of `approx_count_distinct`

2023-08-21 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-44903:
-
Affects Version/s: 3.5.0

> Refine docstring of `approx_count_distinct`
> ---
>
> Key: SPARK-44903
> URL: https://issues.apache.org/jira/browse/SPARK-44903
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44903) Refine docstring of `approx_count_distinct`

2023-08-21 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-44903:
-
Component/s: Documentation
 PySpark
 (was: Pandas API on Spark)

> Refine docstring of `approx_count_distinct`
> ---
>
> Key: SPARK-44903
> URL: https://issues.apache.org/jira/browse/SPARK-44903
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44903) Refine docstring of `approx_count_distinct`

2023-08-21 Thread Yang Jie (Jira)
Yang Jie created SPARK-44903:


 Summary: Refine docstring of `approx_count_distinct`
 Key: SPARK-44903
 URL: https://issues.apache.org/jira/browse/SPARK-44903
 Project: Spark
  Issue Type: Sub-task
  Components: Pandas API on Spark
Affects Versions: 4.0.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44751) XML: Implement FileFormat Interface

2023-08-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44751:


Assignee: Sandip Agarwala

> XML: Implement FileFormat Interface
> ---
>
> Key: SPARK-44751
> URL: https://issues.apache.org/jira/browse/SPARK-44751
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Sandip Agarwala
>Assignee: Sandip Agarwala
>Priority: Major
>
> This will also address most of the review comments from the first XML PR:
> https://github.com/apache/spark/pull/41832



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44751) XML: Implement FileFormat Interface

2023-08-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44751.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42462
[https://github.com/apache/spark/pull/42462]

> XML: Implement FileFormat Interface
> ---
>
> Key: SPARK-44751
> URL: https://issues.apache.org/jira/browse/SPARK-44751
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Sandip Agarwala
>Assignee: Sandip Agarwala
>Priority: Major
> Fix For: 4.0.0
>
>
> This will also address most of the review comments from the first XML PR:
> https://github.com/apache/spark/pull/41832



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44214) Support Spark Driver Live Log UI

2023-08-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-44214:
-

Assignee: Dongjoon Hyun

> Support Spark Driver Live Log UI
> 
>
> Key: SPARK-44214
> URL: https://issues.apache.org/jira/browse/SPARK-44214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44214) Support Spark Driver Live Log UI

2023-08-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44214.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42542
[https://github.com/apache/spark/pull/42542]

> Support Spark Driver Live Log UI
> 
>
> Key: SPARK-44214
> URL: https://issues.apache.org/jira/browse/SPARK-44214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44902) The precision of LongDecimal is inconsistent with Hive.

2023-08-21 Thread Zhen Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhen Wang updated SPARK-44902:
--
Description: 
The precision of LongDecimal in Hive is 19 but it is 20 in Spark. This leads to 
type conversion errors in some cases.

 

Relevant code:

[https://github.com/apache/spark/blob/4646991abd7f4a47a1b8712e2017a2fae98f7c5a/sql/api/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L129|https://github.com/apache/spark/blob/4646991abd7f4a47a1b8712e2017a2fae98f7c5a/sql/api/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L129C51-L129C51]

[https://github.com/apache/hive/blob/3d3acc7a19399d749a39818573a76a0dbbaf2598/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/HiveDecimalUtils.java#L76]

 

Reproduce:

create table and view in hive:
{code:java}
create table t (value bigint);
create view v as select value * 0.1 from t; {code}
read in spark:
{code:java}
select * from v; {code}
error occurred:
{code:java}
org.apache.spark.sql.AnalysisException: [CANNOT_UP_CAST_DATATYPE] Cannot up 
cast `(value * 0.1)` from "DECIMAL(22,1)" to "DECIMAL(21,1)".The type path of 
the target object is:
You can either add an explicit cast to the input data or choose a higher 
precision type of the field in the target object   at 
org.apache.spark.sql.errors.QueryCompilationErrors$.upCastFailureError(QueryCompilationErrors.scala:285)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveUpCast$$fail(Analyzer.scala:3627)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$57$$anonfun$applyOrElse$235.applyOrElse(Analyzer.scala:3658)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$57$$anonfun$applyOrElse$235.applyOrElse(Analyzer.scala:3635)
 {code}
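
For context, my reading of the two code paths linked above (assuming the usual 
decimal multiplication rule: result precision = p1 + p2 + 1, result scale = 
s1 + s2): Spark maps BIGINT to DecimalType(20, 0), while Hive's LongDecimal 
uses precision 19, and the literal 0.1 is DECIMAL(1, 1). Spark therefore 
derives `value * 0.1` as DECIMAL(20 + 1 + 1, 0 + 1) = DECIMAL(22, 1), while 
the Hive view schema records DECIMAL(19 + 1 + 1, 1) = DECIMAL(21, 1), which is 
exactly the failed up-cast above.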

  was:
The precision of LongDecimal in Hive is 19 but it is 20 in Spark. This leads to 
type conversion errors in some cases.

 

Relevant code:

[https://github.com/apache/spark/blob/4646991abd7f4a47a1b8712e2017a2fae98f7c5a/sql/api/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L129|https://github.com/apache/spark/blob/4646991abd7f4a47a1b8712e2017a2fae98f7c5a/sql/api/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L129C51-L129C51]

[https://github.com/apache/hive/blob/3d3acc7a19399d749a39818573a76a0dbbaf2598/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/HiveDecimalUtils.java#L76]

 

Reproduce:

create table and view in hive:

 
{code:java}
create table t (value bigint);
create view v as select value * 0.1 from t; {code}
read in spark:

 

 
{code:java}
select * from v; {code}
error occurred:

 

 
{code:java}
org.apache.spark.sql.AnalysisException: [CANNOT_UP_CAST_DATATYPE] Cannot up 
cast `(value * 0.1)` from "DECIMAL(22,1)" to "DECIMAL(21,1)".The type path of 
the target object is:
You can either add an explicit cast to the input data or choose a higher 
precision type of the field in the target object   at 
org.apache.spark.sql.errors.QueryCompilationErrors$.upCastFailureError(QueryCompilationErrors.scala:285)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveUpCast$$fail(Analyzer.scala:3627)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$57$$anonfun$applyOrElse$235.applyOrElse(Analyzer.scala:3658)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$57$$anonfun$applyOrElse$235.applyOrElse(Analyzer.scala:3635)
 {code}
 

 


> The precision of LongDecimal is inconsistent with Hive.
> ---
>
> Key: SPARK-44902
> URL: https://issues.apache.org/jira/browse/SPARK-44902
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: Zhen Wang
>Priority: Major
>
> The precision of LongDecimal in Hive is 19 but it is 20 in Spark. This leads 
> to type conversion errors in some cases.
>  
> Relevant code:
> [https://github.com/apache/spark/blob/4646991abd7f4a47a1b8712e2017a2fae98f7c5a/sql/api/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L129|https://github.com/apache/spark/blob/4646991abd7f4a47a1b8712e2017a2fae98f7c5a/sql/api/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L129C51-L129C51]
> [https://github.com/apache/hive/blob/3d3acc7a19399d749a39818573a76a0dbbaf2598/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/HiveDecimalUtils.java#L76]
>  
> Reproduce:
> create table and view in hive:
> {code:java}
> create table t (value bigint);
> create view v as select value * 0.1 from t; {code}
> read in spark:
> {code:java}
> select * from v; {code}
> error occurred:
> {code:java}
> org.apache.spark.sql.AnalysisException: [CANNOT_UP_CAST_DATATYPE] 

[jira] [Created] (SPARK-44902) The precision of LongDecimal is inconsistent with Hive.

2023-08-21 Thread Zhen Wang (Jira)
Zhen Wang created SPARK-44902:
-

 Summary: The precision of LongDecimal is inconsistent with Hive.
 Key: SPARK-44902
 URL: https://issues.apache.org/jira/browse/SPARK-44902
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: Zhen Wang


The precision of LongDecimal in Hive is 19 but it is 20 in Spark. This leads to 
type conversion errors in some cases.

 

Relevant code:

[https://github.com/apache/spark/blob/4646991abd7f4a47a1b8712e2017a2fae98f7c5a/sql/api/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L129|https://github.com/apache/spark/blob/4646991abd7f4a47a1b8712e2017a2fae98f7c5a/sql/api/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L129C51-L129C51]

[https://github.com/apache/hive/blob/3d3acc7a19399d749a39818573a76a0dbbaf2598/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/HiveDecimalUtils.java#L76]

 

Reproduce:

create table and view in hive:

 
{code:java}
create table t (value bigint);
create view v as select value * 0.1 from t; {code}
read in spark:

 

 
{code:java}
select * from v; {code}
error occurred:

 

 
{code:java}
org.apache.spark.sql.AnalysisException: [CANNOT_UP_CAST_DATATYPE] Cannot up 
cast `(value * 0.1)` from "DECIMAL(22,1)" to "DECIMAL(21,1)".The type path of 
the target object is:
You can either add an explicit cast to the input data or choose a higher 
precision type of the field in the target object   at 
org.apache.spark.sql.errors.QueryCompilationErrors$.upCastFailureError(QueryCompilationErrors.scala:285)
 at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveUpCast$$fail(Analyzer.scala:3627)
  at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$57$$anonfun$applyOrElse$235.applyOrElse(Analyzer.scala:3658)
at 
org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$57$$anonfun$applyOrElse$235.applyOrElse(Analyzer.scala:3635)
 {code}
 

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44856) Improve Python UDTF arrow serializer performance

2023-08-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-44856:
-

Assignee: Michael Zhang

> Improve Python UDTF arrow serializer performance
> 
>
> Key: SPARK-44856
> URL: https://issues.apache.org/jira/browse/SPARK-44856
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Allison Wang
>Assignee: Michael Zhang
>Priority: Major
>
> Currently, there is a lot of overhead in the arrow serializer for Python 
> UDTFs. The overhead largely comes from converting arrow batches into pandas 
> series and converting the UDTF's results back into a pandas dataframe.
> We should try converting Python objects directly to arrow and vice versa to 
> avoid the expensive pandas conversion, similar to this converter: 
> [https://github.com/apache/spark/blob/be04ac1ace91f6da34b08a1510e41d3ab6f0377b/python/pyspark/sql/connect/conversion.py#L56]
>  
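> As a rough sketch of that direction (plain pyarrow, independent of the 
> actual PySpark serializer code):
> {code:python}
> import pyarrow as pa
> 
> rows = [{"id": 1, "name": "a"}, {"id": 2, "name": "b"}]
> schema = pa.schema([("id", pa.int64()), ("name", pa.string())])
> 
> # Direct conversion: Python objects -> arrow batch, no pandas round trip.
> batch = pa.RecordBatch.from_pylist(rows, schema=schema)
> 
> # And back: arrow batch -> Python objects.
> assert batch.to_pylist() == rows
> {code}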



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44900) Cached DataFrame keeps growing

2023-08-21 Thread Varun Nalla (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Varun Nalla updated SPARK-44900:

Priority: Critical  (was: Major)

> Cached DataFrame keeps growing
> --
>
> Key: SPARK-44900
> URL: https://issues.apache.org/jira/browse/SPARK-44900
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.0
>Reporter: Varun Nalla
>Priority: Critical
>
> Scenario:
> We have a Kafka streaming application where data lookups happen by joining 
> another DataFrame which is cached, with caching strategy MEMORY_AND_DISK.
> However, the size of the cached DataFrame keeps growing with every micro 
> batch the streaming application processes, and that is visible under the 
> Storage tab.
> A similar Stack Overflow thread was already raised:
> https://stackoverflow.com/questions/55601779/spark-dataframe-cache-keeps-growing



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44776) Add ProducedRowCount to SparkListenerConnectOperationFinished

2023-08-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-44776:


Assignee: Lingkai Kong

> Add ProducedRowCount to SparkListenerConnectOperationFinished
> -
>
> Key: SPARK-44776
> URL: https://issues.apache.org/jira/browse/SPARK-44776
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Lingkai Kong
>Assignee: Lingkai Kong
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> As title



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44776) Add ProducedRowCount to SparkListenerConnectOperationFinished

2023-08-21 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-44776.
--
Fix Version/s: 3.5.0
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 42454
[https://github.com/apache/spark/pull/42454]

> Add ProducedRowCount to SparkListenerConnectOperationFinished
> -
>
> Key: SPARK-44776
> URL: https://issues.apache.org/jira/browse/SPARK-44776
> Project: Spark
>  Issue Type: Task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Lingkai Kong
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>
> As title



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43506) Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0.

2023-08-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43506:
-

Assignee: Haejoon Lee

> Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0.
> ---
>
> Key: SPARK-43506
> URL: https://issues.apache.org/jira/browse/SPARK-43506
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43451) Enable RollingTests.test_rolling_count for pandas 2.0.0.

2023-08-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43451.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42551
[https://github.com/apache/spark/pull/42551]

> Enable RollingTests.test_rolling_count for pandas 2.0.0.
> 
>
> Key: SPARK-43451
> URL: https://issues.apache.org/jira/browse/SPARK-43451
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 4.0.0
>
>
> Enable RollingTests.test_rolling_count for pandas 2.0.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43506) Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0.

2023-08-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43506.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42551
[https://github.com/apache/spark/pull/42551]

> Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0.
> ---
>
> Key: SPARK-43506
> URL: https://issues.apache.org/jira/browse/SPARK-43506
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 4.0.0
>
>
> Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43451) Enable RollingTests.test_rolling_count for pandas 2.0.0.

2023-08-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43451:
-

Assignee: Haejoon Lee

> Enable RollingTests.test_rolling_count for pandas 2.0.0.
> 
>
> Key: SPARK-43451
> URL: https://issues.apache.org/jira/browse/SPARK-43451
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> Enable RollingTests.test_rolling_count for pandas 2.0.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43563) Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0.

2023-08-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43563:
-

Assignee: Haejoon Lee

> Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0.
> 
>
> Key: SPARK-43563
> URL: https://issues.apache.org/jira/browse/SPARK-43563
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-43459) Enable OpsOnDiffFramesGroupByTests for pandas 2.0.0.

2023-08-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-43459:
-

Assignee: Haejoon Lee

> Enable OpsOnDiffFramesGroupByTests for pandas 2.0.0.
> 
>
> Key: SPARK-43459
> URL: https://issues.apache.org/jira/browse/SPARK-43459
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
>
> Enable OpsOnDiffFramesGroupByTests.test_groupby_multiindex_columns for pandas 
> 2.0.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43563) Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0.

2023-08-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43563.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42551
[https://github.com/apache/spark/pull/42551]

> Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0.
> 
>
> Key: SPARK-43563
> URL: https://issues.apache.org/jira/browse/SPARK-43563
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 4.0.0
>
>
> Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-43459) Enable OpsOnDiffFramesGroupByTests for pandas 2.0.0.

2023-08-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-43459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-43459.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42551
[https://github.com/apache/spark/pull/42551]

> Enable OpsOnDiffFramesGroupByTests for pandas 2.0.0.
> 
>
> Key: SPARK-43459
> URL: https://issues.apache.org/jira/browse/SPARK-43459
> Project: Spark
>  Issue Type: Sub-task
>  Components: Pandas API on Spark
>Affects Versions: 4.0.0
>Reporter: Haejoon Lee
>Assignee: Haejoon Lee
>Priority: Major
> Fix For: 4.0.0
>
>
> Enable OpsOnDiffFramesGroupByTests.test_groupby_multiindex_columns for pandas 
> 2.0.0.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44901) Add API in 'analyze' method to return partitioning/ordering expressions

2023-08-21 Thread Daniel (Jira)
Daniel created SPARK-44901:
--

 Summary: Add API in 'analyze' method to return 
partitioning/ordering expressions
 Key: SPARK-44901
 URL: https://issues.apache.org/jira/browse/SPARK-44901
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 4.0.0
Reporter: Daniel






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44891) Enable Doctests of `rand`, `randn` and `log`

2023-08-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-44891.
---
Fix Version/s: 3.5.0
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 42584
[https://github.com/apache/spark/pull/42584]

> Enable Doctests of `rand`, `randn` and `log`
> 
>
> Key: SPARK-44891
> URL: https://issues.apache.org/jira/browse/SPARK-44891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44891) Enable Doctests of `rand`, `randn` and `log`

2023-08-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-44891:
-

Assignee: Ruifeng Zheng

> Enable Doctests of `rand`, `randn` and `log`
> 
>
> Key: SPARK-44891
> URL: https://issues.apache.org/jira/browse/SPARK-44891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44214) Support Spark Driver Live Log UI

2023-08-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44214:
--
Summary: Support Spark Driver Live Log UI  (was: Add driver log live UI)

> Support Spark Driver Live Log UI
> 
>
> Key: SPARK-44214
> URL: https://issues.apache.org/jira/browse/SPARK-44214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44900) Cached DataFrame keeps growing

2023-08-21 Thread Varun Nalla (Jira)
Varun Nalla created SPARK-44900:
---

 Summary: Cached DataFrame keeps growing
 Key: SPARK-44900
 URL: https://issues.apache.org/jira/browse/SPARK-44900
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 2.4.0
Reporter: Varun Nalla


Scenario:

We have a Kafka streaming application where data lookups happen by joining 
another DataFrame which is cached, with caching strategy MEMORY_AND_DISK.

However, the size of the cached DataFrame keeps growing with every micro batch 
the streaming application processes, and that is visible under the Storage 
tab.

A similar Stack Overflow thread was already raised:

https://stackoverflow.com/questions/55601779/spark-dataframe-cache-keeps-growing



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44748) Query execution to support PARTITION BY and ORDER BY clause for table arguments

2023-08-21 Thread Takuya Ueshin (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takuya Ueshin resolved SPARK-44748.
---
Fix Version/s: 4.0.0
 Assignee: Daniel
   Resolution: Fixed

Issue resolved by pull request 42420
https://github.com/apache/spark/pull/42420

> Query execution to support PARTITION BY and ORDER BY clause for table 
> arguments
> ---
>
> Key: SPARK-44748
> URL: https://issues.apache.org/jira/browse/SPARK-44748
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Daniel
>Assignee: Daniel
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44214) Add driver log live UI

2023-08-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44214:
--
Summary: Add driver log live UI  (was: Add driver log live UI for K8s 
environment)

> Add driver log live UI
> --
>
> Key: SPARK-44214
> URL: https://issues.apache.org/jira/browse/SPARK-44214
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes, Spark Core, Web UI
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44214) Add driver log live UI

2023-08-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44214:
--
Affects Version/s: 4.0.0
   (was: 3.5.0)

> Add driver log live UI
> --
>
> Key: SPARK-44214
> URL: https://issues.apache.org/jira/browse/SPARK-44214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44214) Add driver log live UI

2023-08-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-44214:
--
Component/s: (was: Kubernetes)

> Add driver log live UI
> --
>
> Key: SPARK-44214
> URL: https://issues.apache.org/jira/browse/SPARK-44214
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core, Web UI
>Affects Versions: 3.5.0
>Reporter: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44898) Upgrade `gcs-connector` to 2.2.17

2023-08-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-44898:
-

Assignee: Dongjoon Hyun

> Upgrade `gcs-connector` to 2.2.17
> -
>
> Key: SPARK-44898
> URL: https://issues.apache.org/jira/browse/SPARK-44898
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44898) Upgrade `gcs-connector` to 2.2.17

2023-08-21 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-44898.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42588
[https://github.com/apache/spark/pull/42588]

> Upgrade `gcs-connector` to 2.2.17
> -
>
> Key: SPARK-44898
> URL: https://issues.apache.org/jira/browse/SPARK-44898
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 4.0.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44899) Refine the docstring of `DataFrame.collect`

2023-08-21 Thread Allison Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Allison Wang updated SPARK-44899:
-
Summary: Refine the docstring of `DataFrame.collect`  (was: Refine the 
docstring of `DataFrame.collect()`)

> Refine the docstring of `DataFrame.collect`
> ---
>
> Key: SPARK-44899
> URL: https://issues.apache.org/jira/browse/SPARK-44899
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Allison Wang
>Priority: Major
>
> Improve the docstring of DataFrame.collect() and add more examples.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44899) Refine the docstring of `DataFrame.collect()`

2023-08-21 Thread Allison Wang (Jira)
Allison Wang created SPARK-44899:


 Summary: Refine the docstring of `DataFrame.collect()`
 Key: SPARK-44899
 URL: https://issues.apache.org/jira/browse/SPARK-44899
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 3.5.0, 4.0.0
Reporter: Allison Wang


Improve the docstring of DataFrame.collect() and add more examples.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44818) Fix race for pending interrupt issued before taskThread is initialized

2023-08-21 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen resolved SPARK-44818.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42504
[https://github.com/apache/spark/pull/42504]

> Fix race for pending interrupt issued before taskThread is initialized
> --
>
> Key: SPARK-44818
> URL: https://issues.apache.org/jira/browse/SPARK-44818
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core, Structured Streaming
>Affects Versions: 3.5.1
>Reporter: Anish Shrigondekar
>Assignee: Anish Shrigondekar
>Priority: Major
> Fix For: 4.0.0
>
>
> Fix race for pending interrupt issued before taskThread is initialized



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44818) Fix race for pending interrupt issued before taskThread is initialized

2023-08-21 Thread Josh Rosen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Josh Rosen reassigned SPARK-44818:
--

Assignee: Anish Shrigondekar

> Fix race for pending interrupt issued before taskThread is initialized
> --
>
> Key: SPARK-44818
> URL: https://issues.apache.org/jira/browse/SPARK-44818
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core, Structured Streaming
>Affects Versions: 3.5.1
>Reporter: Anish Shrigondekar
>Assignee: Anish Shrigondekar
>Priority: Major
>
> Fix race for pending interrupt issued before taskThread is initialized



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44460) Pass user auth credential to Python workers for foreachBatch and listener

2023-08-21 Thread Raghu Angadi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Raghu Angadi resolved SPARK-44460.
--
Resolution: Won't Fix

Not an issue in Apache Spark. 

> Pass user auth credential to Python workers for foreachBatch and listener
> -
>
> Key: SPARK-44460
> URL: https://issues.apache.org/jira/browse/SPARK-44460
> Project: Spark
>  Issue Type: Task
>  Components: Connect, Structured Streaming
>Affects Versions: 3.4.1
>Reporter: Raghu Angadi
>Priority: Major
>
> No user-specific credentials are sent to the Python worker that runs user 
> functions like foreachBatch() and the streaming listener.
> We might need to pass these in.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44460) Pass user auth credential to Python workers for foreachBatch and listener

2023-08-21 Thread Wei Liu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757087#comment-17757087
 ] 

Wei Liu commented on SPARK-44460:
-

[~rangadi] This seems to be a Databricks internal issue. See the updates in 
SC-138245

> Pass user auth credential to Python workers for foreachBatch and listener
> -
>
> Key: SPARK-44460
> URL: https://issues.apache.org/jira/browse/SPARK-44460
> Project: Spark
>  Issue Type: Task
>  Components: Connect, Structured Streaming
>Affects Versions: 3.4.1
>Reporter: Raghu Angadi
>Priority: Major
>
> No user-specific credentials are sent to the Python worker that runs user 
> functions like foreachBatch() and the streaming listener.
> We might need to pass these in.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44898) Upgrade `gcs-connector` to 2.2.17

2023-08-21 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-44898:
-

 Summary: Upgrade `gcs-connector` to 2.2.17
 Key: SPARK-44898
 URL: https://issues.apache.org/jira/browse/SPARK-44898
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: Dongjoon Hyun






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44884) Spark doesn't create SUCCESS file when external path is passed

2023-08-21 Thread Dipayan Dev (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757052#comment-17757052
 ] 

Dipayan Dev commented on SPARK-44884:
-

There is no reason to disable this feature in Spark 3.3.0. We have a lot of 
downstream applications that depend on the _SUCCESS file, and this behavior 
change wasn't mentioned anywhere in the release notes. Is there any workaround 
for this, or any way I can contribute? [~ste...@apache.org] 

> Spark doesn't create SUCCESS file when external path is passed
> --
>
> Key: SPARK-44884
> URL: https://issues.apache.org/jira/browse/SPARK-44884
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Dipayan Dev
>Priority: Critical
> Attachments: image-2023-08-20-18-08-38-531.png, 
> image-2023-08-20-18-46-53-342.png
>
>
> The issue is not happening in Spark 2.x (I am using 2.4.0), but only in 3.3.0
> Code to reproduce the issue.
>  
> {code:java}
> scala> spark.conf.set("spark.sql.orc.char.enabled", true)
> scala> val DF = Seq(("test1", 123)).toDF("name", "num")
> scala> DF.write.option("path", 
> "gs://test_dd123/").mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("test_schema.table_name")
> 23/08/20 12:31:43 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
> since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.   {code}
> The above code succeeds and creates the external Hive table, but {*}there is 
> no SUCCESS file generated{*}. The same code, when run on Spark 2.4.0, 
> generates a SUCCESS file.
> Below is the content of the bucket after table creation:
> !image-2023-08-20-18-08-38-531.png|width=453,height=162!
> But when I don't pass the external path, as follows, the SUCCESS file is 
> generated:
> {code:java}
> scala> 
> DF.write.mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("us_wm_supply_chain_rcv_pre_prod.test_tb1")
>  {code}
> !image-2023-08-20-18-46-53-342.png|width=465,height=166!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44884) Spark doesn't create SUCCESS file when external path is passed

2023-08-21 Thread Dipayan Dev (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757050#comment-17757050
 ] 

Dipayan Dev commented on SPARK-44884:
-

[~ste...@apache.org], I have set that as well, but there is still no _SUCCESS 
file when we pass an external path. I am not using any custom committer; it's 
the default Hadoop MapReduce one. Can you please point me to the code?
{code:java}
spark.conf.set("spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs", 
true) {code}

> Spark doesn't create SUCCESS file when external path is passed
> --
>
> Key: SPARK-44884
> URL: https://issues.apache.org/jira/browse/SPARK-44884
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Dipayan Dev
>Priority: Critical
> Attachments: image-2023-08-20-18-08-38-531.png, 
> image-2023-08-20-18-46-53-342.png
>
>
> The issue is not happening in Spark 2.x (I am using 2.4.0), but only in 3.3.0
> Code to reproduce the issue.
>  
> {code:java}
> scala> spark.conf.set("spark.sql.orc.char.enabled", true)
> scala> val DF = Seq(("test1", 123)).toDF("name", "num")
> scala> DF.write.option("path", 
> "gs://test_dd123/").mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("test_schema.table_name")
> 23/08/20 12:31:43 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
> since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.   {code}
> The above code succeeds and creates the external Hive table, but {*}no 
> SUCCESS file is generated{*}. The same code, when run on Spark 2.4.0, 
> generates a SUCCESS file.
> The content of the bucket after table creation:
>  
> !image-2023-08-20-18-08-38-531.png|width=453,height=162!
>  
> But when I don't pass the external path, as shown below, the SUCCESS file is 
> generated:
> {code:java}
> scala> 
> DF.write.mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("us_wm_supply_chain_rcv_pre_prod.test_tb1")
>  {code}
> !image-2023-08-20-18-46-53-342.png|width=465,height=166!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44884) Spark doesn't create SUCCESS file when external path is passed

2023-08-21 Thread Steve Loughran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757036#comment-17757036
 ] 

Steve Loughran commented on SPARK-44884:


The _SUCCESS file is created by the committer; for the hadoop-mapreduce 
committers, "mapreduce.fileoutputcommitter.marksuccessfuljobs" is the flag that 
enables it. If the file is not being created, it comes down to how saveAsTable 
commits work.
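
As a minimal sketch (assuming the default Hadoop FileOutputCommitter is in 
use; the bucket URI and column names are illustrative, not taken from this 
report), the flag can be set on the Hadoop configuration before writing:
{code:java}
// Minimal sketch, assuming the default Hadoop FileOutputCommitter is in use.
// "gs://some_bucket/out" is an illustrative path.
import spark.implicits._ // pre-imported inside spark-shell

spark.sparkContext.hadoopConfiguration
  .set("mapreduce.fileoutputcommitter.marksuccessfuljobs", "true")

Seq(("test1", 123)).toDF("name", "num")
  .write
  .mode("overwrite")
  .format("orc")
  .save("gs://some_bucket/out")
// If the committer honors the flag, a _SUCCESS marker should appear under
// gs://some_bucket/out/ on job commit.
{code}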

> Spark doesn't create SUCCESS file when external path is passed
> --
>
> Key: SPARK-44884
> URL: https://issues.apache.org/jira/browse/SPARK-44884
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Dipayan Dev
>Priority: Critical
> Attachments: image-2023-08-20-18-08-38-531.png, 
> image-2023-08-20-18-46-53-342.png
>
>
> The issue is not happening in Spark 2.x (I am using 2.4.0), but only in 3.3.0
> Code to reproduce the issue.
>  
> {code:java}
> scala> spark.conf.set("spark.sql.orc.char.enabled", true)
> scala> val DF = Seq(("test1", 123)).toDF("name", "num")
> scala> DF.write.option("path", 
> "gs://test_dd123/").mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("test_schema.table_name")
> 23/08/20 12:31:43 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, 
> since hive.security.authorization.manager is set to instance of 
> HiveAuthorizerFactory.   {code}
> The above code succeeds and creates the external Hive table, but {*}no 
> SUCCESS file is generated{*}. The same code, when run on Spark 2.4.0, 
> generates a SUCCESS file.
> The content of the bucket after table creation:
>  
> !image-2023-08-20-18-08-38-531.png|width=453,height=162!
>  
> But when I don't pass the external path, as shown below, the SUCCESS file is 
> generated:
> {code:java}
> scala> 
> DF.write.mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("us_wm_supply_chain_rcv_pre_prod.test_tb1")
>  {code}
> !image-2023-08-20-18-46-53-342.png|width=465,height=166!
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44897) Local Property Propagation to Subquery Broadcast Exec

2023-08-21 Thread Michael Chen (Jira)
Michael Chen created SPARK-44897:


 Summary: Local Property Propagation to Subquery Broadcast Exec
 Key: SPARK-44897
 URL: https://issues.apache.org/jira/browse/SPARK-44897
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.4.0
Reporter: Michael Chen


https://issues.apache.org/jira/browse/SPARK-32748 was opened to address this 
issue and then, I believe, mistakenly reverted. The claim was that propagating 
local properties to the dynamic pruning thread in SubqueryBroadcastExec is not 
necessary because the broadcast threads will propagate them anyway. However, 
when the dynamic pruning thread is the first to initialize the broadcast 
relation future, the local properties are not propagated correctly, because 
the values handed to the broadcast threads are already wrong by then.
I do not have a good way of reproducing this consistently, because the 
SubqueryBroadcastExec is generally not the first to initialize the broadcast 
relation future; but by adding a Thread.sleep(1) to the doPrepare method of 
SubqueryBroadcastExec, the following test always fails.
{code:java}
withSQLConf(StaticSQLConf.SUBQUERY_BROADCAST_MAX_THREAD_THRESHOLD.key -> "1") {
  withTable("a", "b") {
val confKey = "spark.sql.y"
val confValue1 = UUID.randomUUID().toString()
val confValue2 = UUID.randomUUID().toString()
Seq((confValue1, "1")).toDF("key", "value")
  .write
  .format("parquet")
  .partitionBy("key")
  .mode("overwrite")
  .saveAsTable("a")
val df1 = spark.table("a")

def generateBroadcastDataFrame(confKey: String, confValue: String): 
Dataset[String] = {
  val df = spark.range(1).mapPartitions { _ =>
Iterator(TaskContext.get.getLocalProperty(confKey))
  }.filter($"value".contains(confValue)).as("c")
  df.hint("broadcast")
}

// set local property and assert
val df2 = generateBroadcastDataFrame(confKey, confValue1)
spark.sparkContext.setLocalProperty(confKey, confValue1)
val checkDF = df1.join(df2).where($"a.key" === $"c.value").select($"a.key", 
$"c.value")
val checks = checkDF.collect()
assert(checks.forall(_.toSeq == Seq(confValue1, confValue1)))

// change local property and re-assert
Seq((confValue2, "1")).toDF("key", "value")
  .write
  .format("parquet")
  .partitionBy("key")
  .mode("overwrite")
  .saveAsTable("b")
val df3 = spark.table("b")
val df4 = generateBroadcastDataFrame(confKey, confValue2)
spark.sparkContext.setLocalProperty(confKey, confValue2)
val checks2DF = df3.join(df4).where($"b.key" === 
$"c.value").select($"b.key", $"c.value")
val checks2 = checks2DF.collect()
assert(checks2.forall(_.toSeq == Seq(confValue2, confValue2)))
assert(checks2.nonEmpty)
  }
} {code}
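
As a toy, non-Spark sketch of the underlying race (all names and the 
single-thread pool here are illustrative, not Spark internals): whichever 
thread forces a lazily initialized future decides which thread-local snapshot 
the background work observes, so a later property update on another thread is 
lost.
{code:java}
import java.util.concurrent.Executors
import scala.concurrent.{Await, ExecutionContext, Future}
import scala.concurrent.duration._

object LazyCaptureDemo {
  // Stand-in for Spark's per-thread "local properties".
  private val prop = new ThreadLocal[String] { override def initialValue(): String = "unset" }

  private val pool = ExecutionContext.fromExecutorService(Executors.newFixedThreadPool(1))

  // The property is read on the caller's thread when the lazy val is first
  // forced, mimicking how the broadcast relation future snapshots local
  // properties at initialization time.
  lazy val relation: Future[String] = {
    val snapshot = prop.get() // captured once, by whichever thread gets here first
    Future(snapshot)(pool)
  }

  def main(args: Array[String]): Unit = {
    val racer = new Thread(() => {
      prop.set("pruning-thread-value") // the "dynamic pruning" thread wins the race
      relation                         // forces initialization
      ()
    })
    racer.start(); racer.join()

    prop.set("main-thread-value")              // too late: the snapshot was already taken
    println(Await.result(relation, 5.seconds)) // prints "pruning-thread-value"
    pool.shutdown()
  }
}
{code}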



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-43288) DataSourceV2: CREATE TABLE LIKE

2023-08-21 Thread Ignite TC Bot (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-43288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756860#comment-17756860
 ] 

Ignite TC Bot commented on SPARK-43288:
---

User 'Hisoka-X' has created a pull request for this issue:
https://github.com/apache/spark/pull/42586

> DataSourceV2: CREATE TABLE LIKE
> ---
>
> Key: SPARK-43288
> URL: https://issues.apache.org/jira/browse/SPARK-43288
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.4.0
>Reporter: John Zhuge
>Priority: Major
>
> Support CREATE TABLE LIKE in DSv2.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44896) Consider adding information os_prio, cpu, elapsed, tid, nid, etc., from the jstack tool

2023-08-21 Thread Kent Yao (Jira)
Kent Yao created SPARK-44896:


 Summary: Consider adding information os_prio, cpu, elapsed, tid, 
nid, etc.,  from the jstack tool
 Key: SPARK-44896
 URL: https://issues.apache.org/jira/browse/SPARK-44896
 Project: Spark
  Issue Type: Sub-task
  Components: Web UI
Affects Versions: 4.0.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44894) Upgrade tink to 1.10

2023-08-21 Thread Yang Jie (Jira)
Yang Jie created SPARK-44894:


 Summary: Upgrade tink to 1.10
 Key: SPARK-44894
 URL: https://issues.apache.org/jira/browse/SPARK-44894
 Project: Spark
  Issue Type: Improvement
  Components: Build
Affects Versions: 4.0.0
Reporter: Yang Jie






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44895) Considering 'daemon', 'priority' from higher JDKs for ThreadStackTrace class

2023-08-21 Thread Kent Yao (Jira)
Kent Yao created SPARK-44895:


 Summary: Considering 'daemon', 'priority' from higher JDKs for 
ThreadStackTrace class
 Key: SPARK-44895
 URL: https://issues.apache.org/jira/browse/SPARK-44895
 Project: Spark
  Issue Type: Sub-task
  Components: Web UI
Affects Versions: 4.0.0
Reporter: Kent Yao


{code:java}
jshell> var t = java.lang.management.ManagementFactory.getThreadMXBean()
t ==> com.sun.management.internal.HotSpotThreadImpl@7daf6ecc

jshell> var tt = t.dumpAllThreads(true, true)
tt ==> ThreadInfo[10] { "main" prio=5 Id=1 RUNNABLE at ... k$NonfairSync@27fa135a }

jshell> for (java.lang.management.ThreadInfo t1: tt) {System.out.println(t1.toString());}
"main" prio=5 Id=1 RUNNABLE
    at java.management@20.0.1/sun.management.ThreadImpl.dumpThreads0(Native Method)
    at java.management@20.0.1/sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:540)
    at java.management@20.0.1/sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:527)
    at REPL.$JShell$12.do_it$Aux($JShell$12.java:7)
    at REPL.$JShell$12.do_it$($JShell$12.java:11)
    at java.base@20.0.1/java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(DirectMethodHandle$Holder)
    at java.base@20.0.1/java.lang.invoke.LambdaForm$MH/0x007001008c00.invoke(LambdaForm$MH)
    at java.base@20.0.1/java.lang.invoke.Invokers$Holder.invokeExact_MT(Invokers$Holder)
...

"Reference Handler" daemon prio=10 Id=8 RUNNABLE
    at java.base@20.0.1/java.lang.ref.Reference.waitForReferencePendingList(Native Method)
    at java.base@20.0.1/java.lang.ref.Reference.processPendingReferences(Reference.java:246)
    at java.base@20.0.1/java.lang.ref.Reference$ReferenceHandler.run(Reference.java:208)
{code}
The `daemon` and `prio=10` fields are not available in the ThreadInfo output on JDK 8.
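
A minimal sketch (reflection-based; `ThreadInfoCompat` and `describe` are 
illustrative names, not existing Spark code) of reading the JDK 9+ fields 
while still compiling and running on JDK 8:
{code:java}
import java.lang.management.{ManagementFactory, ThreadInfo}
import scala.util.Try

object ThreadInfoCompat {
  // ThreadInfo.isDaemon()/getPriority() exist only since JDK 9, so probe for
  // them reflectively and fall back to placeholders when running on JDK 8.
  private def call[T](ti: ThreadInfo, name: String): Option[T] =
    Try(classOf[ThreadInfo].getMethod(name).invoke(ti).asInstanceOf[T]).toOption

  def describe(ti: ThreadInfo): String = {
    val daemon = call[java.lang.Boolean](ti, "isDaemon").exists(_.booleanValue)
    val prio   = call[java.lang.Integer](ti, "getPriority").map(_.intValue).getOrElse(-1)
    s""""${ti.getThreadName}" daemon=$daemon prio=$prio ${ti.getThreadState}"""
  }

  def main(args: Array[String]): Unit =
    ManagementFactory.getThreadMXBean
      .dumpAllThreads(false, false)
      .foreach(ti => println(describe(ti)))
}
{code}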

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44883) Spark insertInto with location GCS bucket root causes NPE

2023-08-21 Thread Steve Loughran (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Steve Loughran resolved SPARK-44883.

Resolution: Duplicate

> Spark insertInto with location GCS bucket root causes NPE
> -
>
> Key: SPARK-44883
> URL: https://issues.apache.org/jira/browse/SPARK-44883
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.0
>Reporter: Dipayan Dev
>Priority: Minor
>
> In our organisation, we use a GCS bucket root location as the location of 
> our Hive table. Dataproc's latest 2.1 image uses *Spark* *3.3.0*, so this 
> needs to be fixed there.
> Spark Scala code to reproduce this issue
> {noformat}
> val DF = Seq(("test1", 123)).toDF("name", "num")
> DF.write.option("path", 
> "gs://test_dd123/").mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("schema_name.table_name")
> val DF1 = Seq(("test2", 125)).toDF("name", "num")
> DF1.write.mode(SaveMode.Overwrite).format("orc").insertInto("schema_name.table_name")
> java.lang.NullPointerException
>   at org.apache.hadoop.fs.Path.(Path.java:141)
>   at org.apache.hadoop.fs.Path.(Path.java:120)
>   at org.apache.hadoop.fs.Path.suffix(Path.java:441)
>   at 
> org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.$anonfun$getCustomPartitionLocations$1(InsertIntoHadoopFsRelationCommand.scala:254)
>  {noformat}
> Looks like the issue comes from Hadoop's Path class. 
> {noformat}
> scala> import org.apache.hadoop.fs.Path
> import org.apache.hadoop.fs.Path
> scala> val path: Path = new Path("gs://test_dd123/")
> path: org.apache.hadoop.fs.Path = gs://test_dd123/
> scala> path.suffix("/num=123")
> java.lang.NullPointerException
>   at org.apache.hadoop.fs.Path.(Path.java:150)
>   at org.apache.hadoop.fs.Path.(Path.java:129)
>   at org.apache.hadoop.fs.Path.suffix(Path.java:450){noformat}
> Path.suffix throws an NPE when writing to a GCS bucket root. 
>  
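
A possible workaround sketch (illustrative, not from this report): Path.suffix 
goes through getParent(), which is null at a bucket root, so building the 
child path with the (parent, child) constructor avoids the NPE:
{code:java}
import org.apache.hadoop.fs.Path

// Path.suffix() internally uses getParent(), which is null for a bucket
// root, hence the NPE. Constructing the child path directly is a possible
// way around it (untested against this exact setup).
val root = new Path("gs://test_dd123/")
val partition = new Path(root, "num=123") // gs://test_dd123/num=123
{code}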



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44893) ThreadInfo improvements for monitoring APIs

2023-08-21 Thread Kent Yao (Jira)
Kent Yao created SPARK-44893:


 Summary: ThreadInfo improvements for monitoring APIs
 Key: SPARK-44893
 URL: https://issues.apache.org/jira/browse/SPARK-44893
 Project: Spark
  Issue Type: Umbrella
  Components: Spark Core, Web UI
Affects Versions: 4.0.0
Reporter: Kent Yao






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44891) Enable Doctests of `rand`, `randn` and `log`

2023-08-21 Thread Hudson (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756827#comment-17756827
 ] 

Hudson commented on SPARK-44891:


User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/42584

> Enable Doctests of `rand`, `randn` and `log`
> 
>
> Key: SPARK-44891
> URL: https://issues.apache.org/jira/browse/SPARK-44891
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44892) Add official image Dockerfile for Spark 3.3.3

2023-08-21 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-44892:
---

 Summary: Add official image Dockerfile for Spark 3.3.3
 Key: SPARK-44892
 URL: https://issues.apache.org/jira/browse/SPARK-44892
 Project: Spark
  Issue Type: Sub-task
  Components: Spark Docker
Affects Versions: 3.3.3
Reporter: Yuming Wang






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44891) Enable Doctests of `rand`, `randn` and `log`

2023-08-21 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-44891:
-

 Summary: Enable Doctests of `rand`, `randn` and `log`
 Key: SPARK-44891
 URL: https://issues.apache.org/jira/browse/SPARK-44891
 Project: Spark
  Issue Type: Sub-task
  Components: Connect, PySpark
Affects Versions: 3.5.0, 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44889) Fix docstring of `monotonically_increasing_id`

2023-08-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng resolved SPARK-44889.
---
Fix Version/s: 3.5.0
   4.0.0
   Resolution: Fixed

Issue resolved by pull request 42582
[https://github.com/apache/spark/pull/42582]

> Fix docstring of `monotonically_increasing_id`
> --
>
> Key: SPARK-44889
> URL: https://issues.apache.org/jira/browse/SPARK-44889
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
> Fix For: 3.5.0, 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44889) Fix docstring of `monotonically_increasing_id`

2023-08-21 Thread Ruifeng Zheng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ruifeng Zheng reassigned SPARK-44889:
-

Assignee: Ruifeng Zheng

> Fix docstring of `monotonically_increasing_id`
> --
>
> Key: SPARK-44889
> URL: https://issues.apache.org/jira/browse/SPARK-44889
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Ruifeng Zheng
>Assignee: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-44888) Need to update the golden files of SQLQueryTestSuite for Java 21.

2023-08-21 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie reassigned SPARK-44888:


Assignee: Yang Jie

> Need to update the golden files of SQLQueryTestSuite for Java 21.
> -
>
> Key: SPARK-44888
> URL: https://issues.apache.org/jira/browse/SPARK-44888
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-44888) Need to update the golden files of SQLQueryTestSuite for Java 21.

2023-08-21 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie resolved SPARK-44888.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 42580
[https://github.com/apache/spark/pull/42580]

> Need to update the golden files of SQLQueryTestSuite for Java 21.
> -
>
> Key: SPARK-44888
> URL: https://issues.apache.org/jira/browse/SPARK-44888
> Project: Spark
>  Issue Type: Bug
>  Components: SQL, Tests
>Affects Versions: 4.0.0
>Reporter: Yang Jie
>Assignee: Yang Jie
>Priority: Major
> Fix For: 4.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44840) array_insert() gives wrong results for negative index

2023-08-21 Thread Aparna Garg (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756783#comment-17756783
 ] 

Aparna Garg commented on SPARK-44840:
-

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/42564

> array_insert() gives wrong results for negative index
> ---
>
> Key: SPARK-44840
> URL: https://issues.apache.org/jira/browse/SPARK-44840
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.4.0
>Reporter: Serge Rielau
>Assignee: Max Gekk
>Priority: Major
>
> Unlike in Snowflake, we decided that array_insert() is 1-based.
> This means 1 is the first element in an array and -1 is the last.
> This matches the behavior of functions such as substr() and element_at().
>  
> {code:java}
> > SELECT array_insert(array('a', 'b', 'c'), 1, 'z');
> ["z","a","b","c"]
> > SELECT array_insert(array('a', 'b', 'c'), 0, 'z');
> Error
> > SELECT array_insert(array('a', 'b', 'c'), -1, 'z');
> ["a","b","c","z"]
> > SELECT array_insert(array('a', 'b', 'c'), 5, 'z');
> ["a","b","c",NULL,"z"]
> > SELECT array_insert(array('a', 'b', 'c'), -5, 'z');
> ["z",NULL,"a","b","c"]
> > SELECT array_insert(array('a', 'b', 'c'), 2, cast(NULL AS STRING));
> ["a",NULL,"b","c"]
> {code}
>  
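
For comparison, a hedged sketch of the matching 1-based negative indexing in 
element_at() and substr() (outputs written from the documented semantics, not 
re-run here):
{code:java}
> SELECT element_at(array('a', 'b', 'c'), -1);
c
> SELECT substr('abc', -1);
c
{code}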



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44889) Fix docstring of `monotonically_increasing_id`

2023-08-21 Thread Aparna Garg (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756778#comment-17756778
 ] 

Aparna Garg commented on SPARK-44889:
-

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/42582

> Fix docstring of `monotonically_increasing_id`
> --
>
> Key: SPARK-44889
> URL: https://issues.apache.org/jira/browse/SPARK-44889
> Project: Spark
>  Issue Type: Sub-task
>  Components: Documentation, PySpark
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Ruifeng Zheng
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-44890) Miswritten remarks in pom file

2023-08-21 Thread chenyu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-44890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756746#comment-17756746
 ] 

chenyu commented on SPARK-44890:


I have submitted a patch:

https://github.com/apache/spark/pull/42583

 

> Miswritten remarks in pom file
> --
>
> Key: SPARK-44890
> URL: https://issues.apache.org/jira/browse/SPARK-44890
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: chenyu
>Priority: Minor
> Attachments: screenshot-1.png
>
>
> A spelling issue in the pom file comments ('dont update') hurts readability.
> The comment should follow the same writing style as elsewhere in the file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44890) Miswritten remarks in pom file

2023-08-21 Thread chenyu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

chenyu updated SPARK-44890:
---
Attachment: screenshot-1.png

> Miswritten remarks in pom file
> --
>
> Key: SPARK-44890
> URL: https://issues.apache.org/jira/browse/SPARK-44890
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 3.4.1
>Reporter: chenyu
>Priority: Minor
> Attachments: screenshot-1.png
>
>
> A spelling issue in the pom file comments ('dont update') hurts readability.
> The comment should follow the same writing style as elsewhere in the file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44890) Miswritten remarks in pom file

2023-08-21 Thread chenyu (Jira)
chenyu created SPARK-44890:
--

 Summary: Miswritten remarks in pom file
 Key: SPARK-44890
 URL: https://issues.apache.org/jira/browse/SPARK-44890
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.4.1
Reporter: chenyu


A spelling issue in the pom file comments ('dont update') hurts readability.

The comment should follow the same writing style as elsewhere in the file.



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44813) The JIRA Python misses our assignee when it searches user again

2023-08-21 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44813:

Fix Version/s: 3.3.4
   (was: 3.3.3)

> The JIRA Python misses our assignee when it searches user again
> ---
>
> Key: SPARK-44813
> URL: https://issues.apache.org/jira/browse/SPARK-44813
> Project: Spark
>  Issue Type: Bug
>  Components: Project Infra
>Affects Versions: 4.0.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.4.2, 3.5.0, 4.0.0, 3.3.4
>
>
> {code:java}
> >>> assignee = asf_jira.user("yao")
> >>> "SPARK-44801"
> 'SPARK-44801'
> >>> asf_jira.assign_issue(issue.key, assignee.name)
> response text = {"errorMessages":[],"errors":{"assignee":"User 'airhot' 
> cannot be assigned issues."}} {code}
>  
>  
>  



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-44857) Fix getBaseURI error in Spark Worker LogPage UI buttons

2023-08-21 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-44857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-44857:

Fix Version/s: 3.3.4
   (was: 3.3.3)

> Fix getBaseURI error in Spark Worker LogPage UI buttons
> ---
>
> Key: SPARK-44857
> URL: https://issues.apache.org/jira/browse/SPARK-44857
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, Web UI
>Affects Versions: 3.2.0, 3.2.4, 3.3.2, 3.4.1, 3.5.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.4.2, 3.5.0, 4.0.0, 3.3.4
>
> Attachments: Screenshot 2023-08-17 at 2.38.45 PM.png
>
>




--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-44889) Fix docstring of `monotonically_increasing_id`

2023-08-21 Thread Ruifeng Zheng (Jira)
Ruifeng Zheng created SPARK-44889:
-

 Summary: Fix docstring of `monotonically_increasing_id`
 Key: SPARK-44889
 URL: https://issues.apache.org/jira/browse/SPARK-44889
 Project: Spark
  Issue Type: Sub-task
  Components: Documentation, PySpark
Affects Versions: 3.5.0, 4.0.0
Reporter: Ruifeng Zheng






--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org