[jira] [Created] (SPARK-44906) Move substituteAppNExecIds logic into kubernetesConf.annotations method
Binjie Yang created SPARK-44906: --- Summary: Move substituteAppNExecIds logic into kubernetesConf.annotations method Key: SPARK-44906 URL: https://issues.apache.org/jira/browse/SPARK-44906 Project: Spark Issue Type: Improvement Components: Kubernetes Affects Versions: 3.4.1 Reporter: Binjie Yang Move the Utils.substituteAppNExecIds logic into KubernetesConf.annotations as the default behavior, so users can easily reuse it rather than reimplementing the same logic. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44905) NullPointerException on stateful expression evaluation
Kent Yao created SPARK-44905: Summary: NullPointerException on stateful expression evaluation Key: SPARK-44905 URL: https://issues.apache.org/jira/browse/SPARK-44905 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.1, 3.5.0, 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44728) Improve PySpark documentations
[ https://issues.apache.org/jira/browse/SPARK-44728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757209#comment-17757209 ] Ruifeng Zheng commented on SPARK-44728: --- A good docstring should contain the following sections: # Brief Description: A concise summary explaining the function's purpose. # Version Annotations: Annotations like versionadded and versionchanged to signify the addition or modification of the function in different versions of the software. # Parameters: This section should list and describe all input parameters. If the function doesn't accept any parameters, this section can be omitted. # Returns: Detail what the function returns. If the function doesn't return anything, this section can be omitted. # See Also: A list of related API functions or methods. This section can be omitted if no related APIs exist. # Notes: Include additional information or warnings about the function's usage here. # Examples: Every example should begin with a brief description, followed by the example code, and conclude with the expected output. Any necessary import statements should be included at the beginning of each example. > Improve PySpark documentations > -- > > Key: SPARK-44728 > URL: https://issues.apache.org/jira/browse/SPARK-44728 > Project: Spark > Issue Type: Umbrella > Components: PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Allison Wang >Priority: Major > > An umbrella Jira ticket to improve the PySpark documentation. > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
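The sections listed in the comment above can be illustrated with a minimal numpydoc-style docstring. The function below is hypothetical, written only to show the layout of each section:

```python
def inverse(x):
    """
    Compute the multiplicative inverse of a number.

    .. versionadded:: 3.5.0

    Parameters
    ----------
    x : float
        A non-zero number.

    Returns
    -------
    float
        The value ``1 / x``.

    See Also
    --------
    pow : Raise a number to a power.

    Notes
    -----
    Raises ``ZeroDivisionError`` when ``x`` is zero.

    Examples
    --------
    Invert a number:

    >>> inverse(4.0)
    0.25
    """
    return 1 / x
```

Tools such as doctest can then check the Examples section automatically, which is one reason every example should include its expected output.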
[jira] [Updated] (SPARK-44904) Correct the ‘versionadded’ of `sql.functions.approx_percentile` to 3.5.0.
[ https://issues.apache.org/jira/browse/SPARK-44904?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-44904: - Summary: Correct the ‘versionadded’ of `sql.functions.approx_percentile` to 3.5.0. (was: Correct the ‘versionchanged’ of `sql.functions.approx_percentile` to 3.5.0.) > Correct the ‘versionadded’ of `sql.functions.approx_percentile` to 3.5.0. > - > > Key: SPARK-44904 > URL: https://issues.apache.org/jira/browse/SPARK-44904 > Project: Spark > Issue Type: Improvement > Components: Documentation, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Yang Jie >Priority: Minor > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44904) Correct the ‘versionchanged’ of `sql.functions.approx_percentile` to 3.5.0.
Yang Jie created SPARK-44904: Summary: Correct the ‘versionchanged’ of `sql.functions.approx_percentile` to 3.5.0. Key: SPARK-44904 URL: https://issues.apache.org/jira/browse/SPARK-44904 Project: Spark Issue Type: Improvement Components: Documentation, PySpark Affects Versions: 3.5.0, 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43327) Trigger `committer.setupJob` before plan execute in `FileFormatWriter`
[ https://issues.apache.org/jira/browse/SPARK-43327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-43327. - Fix Version/s: 3.3.4 Resolution: Fixed Issue resolved by pull request 41154 [https://github.com/apache/spark/pull/41154] > Trigger `committer.setupJob` before plan execute in `FileFormatWriter` > -- > > Key: SPARK-43327 > URL: https://issues.apache.org/jira/browse/SPARK-43327 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.3 >Reporter: ming95 >Assignee: ming95 >Priority: Major > Fix For: 3.3.4 > > > SPARK-40588 resolved the case where `outputOrdering` might not work if AQE is > enabled. > https://issues.apache.org/jira/browse/SPARK-40588 > However, because that fix materializes the AQE plan in advance (it triggers > getFinalPhysicalPlan), committer.setupJob(job) may never execute when > `AdaptiveSparkPlanExec#getFinalPhysicalPlan()` fails with an error. > Normally the plan should be materialized after committer.setupJob(job). > This may ultimately result in the INSERT OVERWRITE target directory being deleted. 
> > {code:java} > import org.apache.hadoop.fs.{FileSystem, Path} > import org.apache.spark.sql.QueryTest > import org.apache.spark.sql.catalyst.TableIdentifier > sql("CREATE TABLE IF NOT EXISTS spark32_overwrite(amt1 int) STORED AS ORC") > sql("CREATE TABLE IF NOT EXISTS spark32_overwrite2(amt1 long) STORED AS ORC") > sql("INSERT OVERWRITE TABLE spark32_overwrite2 select 644164") > sql("set spark.sql.ansi.enabled=true") > val loc = > > spark.sessionState.catalog.getTableMetadata(TableIdentifier("spark32_overwrite")).location > val fs = FileSystem.get(loc, spark.sparkContext.hadoopConfiguration) > println("Location exists: " + fs.exists(new Path(loc))) > try { > sql("INSERT OVERWRITE TABLE spark32_overwrite select amt1 from " + > "(select cast(amt1 as int) as amt1 from spark32_overwrite2 distribute by > amt1)") > } finally { > println("Location exists: " + fs.exists(new Path(loc))) > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43327) Trigger `committer.setupJob` before plan execute in `FileFormatWriter`
[ https://issues.apache.org/jira/browse/SPARK-43327?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-43327: --- Assignee: ming95 > Trigger `committer.setupJob` before plan execute in `FileFormatWriter` > -- > > Key: SPARK-43327 > URL: https://issues.apache.org/jira/browse/SPARK-43327 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.2.3 >Reporter: ming95 >Assignee: ming95 >Priority: Major > > In SPARK-40588, the case where `outputOrdering` might not work if AQE is > enabled was resolved. > https://issues.apache.org/jira/browse/SPARK-40588 > However, because that fix materializes the AQE plan in advance (it triggers > getFinalPhysicalPlan), committer.setupJob(job) may never execute when > `AdaptiveSparkPlanExec#getFinalPhysicalPlan()` fails with an error. > Normally the plan should be materialized after committer.setupJob(job). > This may ultimately result in the INSERT OVERWRITE target directory being deleted. > > {code:java} > import org.apache.hadoop.fs.{FileSystem, Path} > import org.apache.spark.sql.QueryTest > import org.apache.spark.sql.catalyst.TableIdentifier > sql("CREATE TABLE IF NOT EXISTS spark32_overwrite(amt1 int) STORED AS ORC") > sql("CREATE TABLE IF NOT EXISTS spark32_overwrite2(amt1 long) STORED AS ORC") > sql("INSERT OVERWRITE TABLE spark32_overwrite2 select 644164") > sql("set spark.sql.ansi.enabled=true") > val loc = > > spark.sessionState.catalog.getTableMetadata(TableIdentifier("spark32_overwrite")).location > val fs = FileSystem.get(loc, spark.sparkContext.hadoopConfiguration) > println("Location exists: " + fs.exists(new Path(loc))) > try { > sql("INSERT OVERWRITE TABLE spark32_overwrite select amt1 from " + > "(select cast(amt1 as int) as amt1 from spark32_overwrite2 distribute by > amt1)") > } finally { > println("Location exists: " + fs.exists(new Path(loc))) > } {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: 
issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44854) Python timedelta to DayTimeIntervalType edge cases bug
[ https://issues.apache.org/jira/browse/SPARK-44854?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44854. -- Fix Version/s: 3.5.0 4.0.0 3.4.2 Resolution: Fixed Issue resolved by pull request 42541 [https://github.com/apache/spark/pull/42541] > Python timedelta to DayTimeIntervalType edge cases bug > -- > > Key: SPARK-44854 > URL: https://issues.apache.org/jira/browse/SPARK-44854 > Project: Spark > Issue Type: Bug > Components: PySpark >Affects Versions: 3.4.0 >Reporter: Ocean HD >Priority: Minor > Labels: pull-request-available > Fix For: 3.5.0, 4.0.0, 3.4.2 > > Original Estimate: 3h > Remaining Estimate: 3h > > h1. Python Timedelta to PySpark DayTimeIntervalType bug > There is a bug that exists which means certain Python datetime.timedelta > objects get converted to a PySpark DayTimeIntervalType column with a > different value to that which is stored in the Python timedelta. > A simple illustrative example can be produced with the below code: > > {code:java} > from datetime import timedelta > from pyspark.sql.types import DayTimeIntervalType, StructField, StructType > spark = ...spark session setup here... > td = timedelta(days=4498031, seconds=16054, microseconds=81) > df = spark.createDataFrame([(td,)], > StructType([StructField(name="timedelta_col", dataType=DayTimeIntervalType(), > nullable=False)])) > df.show(truncate=False) > > ++ > > |timedelta_col | > > ++ > > |INTERVAL '4498031 04:27:35.81' DAY TO SECOND| > > ++ > print(str(td)) > > '4498031 days, 4:27:34.81' {code} > In the above example, look at the seconds. The original python timedelta > object has 34 seconds, the pyspark DayTimeIntervalType column has 35 seconds. > h1. Fix > This issue arises because the current conversion from python timedelta uses > the timedelta function `.total_seconds()` to get the number of seconds, and > then adds the microsecond component back in afterwards. 
Unfortunately the > `.total_seconds()` function with some timedeltas (ones with microsecond > entries close to 1_000_000 I believe) ends up rounding *up* to the nearest > second (probably due to floating point precision), with the microseconds then > added on top of that. The effect is that 1 second gets added incorrectly. > The issue can be fixed by doing the processing in a slightly different way. > Instead of doing: > > {code:java} > (math.floor(dt.total_seconds()) * 1_000_000) + dt.microseconds{code} > > we construct the timedelta from its components: > > {code:java} > (((dt.days * 86400) + dt.seconds) * 1_000_000) + dt.microseconds {code} > > h1. Tests > An illustrative edge case example for timedeltas is the above (which can also > be written as `datetime.timedelta(microseconds=38862989445481)`) > > A related edge case which is already handled but not tested exists for the > situation where there are positive and negative components to the created > timedelta object. An entry for this edge case is also included as it is > related. > h1. PR > Link to the PR addressing this issue: > https://github.com/apache/spark/pull/42541 > h1. Keywords to help people searching for this issue: > datetime.timedelta > timedelta > pyspark.sql.types.DayTimeIntervalType > DayTimeIntervalType > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
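The rounding failure and the component-based fix described in the report can be reproduced in plain Python, independent of Spark. The input value here is illustrative (chosen so the microseconds sit just below 1_000_000 on a very large timedelta, the boundary where float rounding in `total_seconds()` kicks in); it is not the exact value from the report:

```python
import math
from datetime import timedelta

def to_micros_via_float(td):
    # Float-based path: total_seconds() can round *up* by a whole second
    # for large timedeltas whose microseconds are close to 1_000_000.
    return (math.floor(td.total_seconds()) * 1_000_000) + td.microseconds

def to_micros_via_components(td):
    # Fixed path: pure integer arithmetic on the timedelta's components,
    # so no floating-point rounding can occur.
    return (((td.days * 86400) + td.seconds) * 1_000_000) + td.microseconds

# Illustrative edge case: 388629894454.999981 seconds rounds to
# 388629894455.0 as a float, so the float-based path gains a second.
td = timedelta(days=4498031, seconds=16054, microseconds=999981)
exact = to_micros_via_components(td)  # 388629894454999981
off = to_micros_via_float(td)         # exactly one second too large
```

The difference between the two results is exactly 1_000_000 microseconds, i.e. the spurious extra second the bug report describes.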
[jira] [Updated] (SPARK-44903) Refine docstring of `approx_count_distinct`
[ https://issues.apache.org/jira/browse/SPARK-44903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-44903: - Affects Version/s: 3.5.0 > Refine docstring of `approx_count_distinct` > --- > > Key: SPARK-44903 > URL: https://issues.apache.org/jira/browse/SPARK-44903 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44903) Refine docstring of `approx_count_distinct`
[ https://issues.apache.org/jira/browse/SPARK-44903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie updated SPARK-44903: - Component/s: Documentation PySpark (was: Pandas API on Spark) > Refine docstring of `approx_count_distinct` > --- > > Key: SPARK-44903 > URL: https://issues.apache.org/jira/browse/SPARK-44903 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 4.0.0 >Reporter: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44903) Refine docstring of `approx_count_distinct`
Yang Jie created SPARK-44903: Summary: Refine docstring of `approx_count_distinct` Key: SPARK-44903 URL: https://issues.apache.org/jira/browse/SPARK-44903 Project: Spark Issue Type: Sub-task Components: Pandas API on Spark Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44751) XML: Implement FIleFormat Interface
[ https://issues.apache.org/jira/browse/SPARK-44751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44751: Assignee: Sandip Agarwala > XML: Implement FIleFormat Interface > --- > > Key: SPARK-44751 > URL: https://issues.apache.org/jira/browse/SPARK-44751 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Assignee: Sandip Agarwala >Priority: Major > > This will also address most of the review comments from the first XML PR: > https://github.com/apache/spark/pull/41832 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44751) XML: Implement FIleFormat Interface
[ https://issues.apache.org/jira/browse/SPARK-44751?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44751. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42462 [https://github.com/apache/spark/pull/42462] > XML: Implement FIleFormat Interface > --- > > Key: SPARK-44751 > URL: https://issues.apache.org/jira/browse/SPARK-44751 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Sandip Agarwala >Assignee: Sandip Agarwala >Priority: Major > Fix For: 4.0.0 > > > This will also address most of the review comments from the first XML PR: > https://github.com/apache/spark/pull/41832 -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44214) Support Spark Driver Live Log UI
[ https://issues.apache.org/jira/browse/SPARK-44214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44214: - Assignee: Dongjoon Hyun > Support Spark Driver Live Log UI > > > Key: SPARK-44214 > URL: https://issues.apache.org/jira/browse/SPARK-44214 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Web UI >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44214) Support Spark Driver Live Log UI
[ https://issues.apache.org/jira/browse/SPARK-44214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44214. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42542 [https://github.com/apache/spark/pull/42542] > Support Spark Driver Live Log UI > > > Key: SPARK-44214 > URL: https://issues.apache.org/jira/browse/SPARK-44214 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Web UI >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44902) The precision of LongDecimal is inconsistent with Hive.
[ https://issues.apache.org/jira/browse/SPARK-44902?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Zhen Wang updated SPARK-44902: -- Description: The precision of LongDecimal in Hive is 19 but it is 20 in Spark. This leads to type conversion errors in some cases. Relevant code: [https://github.com/apache/spark/blob/4646991abd7f4a47a1b8712e2017a2fae98f7c5a/sql/api/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L129|https://github.com/apache/spark/blob/4646991abd7f4a47a1b8712e2017a2fae98f7c5a/sql/api/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L129C51-L129C51] [https://github.com/apache/hive/blob/3d3acc7a19399d749a39818573a76a0dbbaf2598/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/HiveDecimalUtils.java#L76] Reproduce: create table and view in hive: {code:java} create table t (value bigint); create view v as select value * 0.1 from t; {code} read in spark: {code:java} select * from v; {code} error occurred: {code:java} org.apache.spark.sql.AnalysisException: [CANNOT_UP_CAST_DATATYPE] Cannot up cast `(value * 0.1)` from "DECIMAL(22,1)" to "DECIMAL(21,1)".The type path of the target object is: You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object at org.apache.spark.sql.errors.QueryCompilationErrors$.upCastFailureError(QueryCompilationErrors.scala:285) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveUpCast$$fail(Analyzer.scala:3627) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$57$$anonfun$applyOrElse$235.applyOrElse(Analyzer.scala:3658) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$57$$anonfun$applyOrElse$235.applyOrElse(Analyzer.scala:3635) {code} was: The precision of LongDecimal in Hive is 19 but it is 20 in Spark. This leads to type conversion errors in some cases. 
Relevant code: [https://github.com/apache/spark/blob/4646991abd7f4a47a1b8712e2017a2fae98f7c5a/sql/api/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L129|https://github.com/apache/spark/blob/4646991abd7f4a47a1b8712e2017a2fae98f7c5a/sql/api/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L129C51-L129C51] [https://github.com/apache/hive/blob/3d3acc7a19399d749a39818573a76a0dbbaf2598/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/HiveDecimalUtils.java#L76] Reproduce: create table and view in hive: {code:java} create table t (value bigint); create view v as select value * 0.1 from t; {code} read in spark: {code:java} select * from v; {code} error occurred: {code:java} org.apache.spark.sql.AnalysisException: [CANNOT_UP_CAST_DATATYPE] Cannot up cast `(value * 0.1)` from "DECIMAL(22,1)" to "DECIMAL(21,1)".The type path of the target object is: You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object at org.apache.spark.sql.errors.QueryCompilationErrors$.upCastFailureError(QueryCompilationErrors.scala:285) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveUpCast$$fail(Analyzer.scala:3627) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$57$$anonfun$applyOrElse$235.applyOrElse(Analyzer.scala:3658) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$57$$anonfun$applyOrElse$235.applyOrElse(Analyzer.scala:3635) {code} > The precision of LongDecimal is inconsistent with Hive. > --- > > Key: SPARK-44902 > URL: https://issues.apache.org/jira/browse/SPARK-44902 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 3.4.0 >Reporter: Zhen Wang >Priority: Major > > The precision of LongDecimal in Hive is 19 but it is 20 in Spark. This leads > to type conversion errors in some cases. 
> > Relevant code: > [https://github.com/apache/spark/blob/4646991abd7f4a47a1b8712e2017a2fae98f7c5a/sql/api/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L129|https://github.com/apache/spark/blob/4646991abd7f4a47a1b8712e2017a2fae98f7c5a/sql/api/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L129C51-L129C51] > [https://github.com/apache/hive/blob/3d3acc7a19399d749a39818573a76a0dbbaf2598/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/HiveDecimalUtils.java#L76] > > Reproduce: > create table and view in hive: > {code:java} > create table t (value bigint); > create view v as select value * 0.1 from t; {code} > read in spark: > {code:java} > select * from v; {code} > error occurred: > {code:java} > org.apache.spark.sql.AnalysisException: [CANNOT_UP_CAST_DATATYPE] > Cannot up cast `(value * 0.1)` from "DECIMAL(22,1)" to "DECIMAL(21,1)". {code}
[jira] [Created] (SPARK-44902) The precision of LongDecimal is inconsistent with Hive.
Zhen Wang created SPARK-44902: - Summary: The precision of LongDecimal is inconsistent with Hive. Key: SPARK-44902 URL: https://issues.apache.org/jira/browse/SPARK-44902 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Zhen Wang The precision of LongDecimal in Hive is 19 but it is 20 in Spark. This leads to type conversion errors in some cases. Relevant code: [https://github.com/apache/spark/blob/4646991abd7f4a47a1b8712e2017a2fae98f7c5a/sql/api/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L129|https://github.com/apache/spark/blob/4646991abd7f4a47a1b8712e2017a2fae98f7c5a/sql/api/src/main/scala/org/apache/spark/sql/types/DecimalType.scala#L129C51-L129C51] [https://github.com/apache/hive/blob/3d3acc7a19399d749a39818573a76a0dbbaf2598/serde/src/java/org/apache/hadoop/hive/serde2/typeinfo/HiveDecimalUtils.java#L76] Reproduce: create table and view in hive: {code:java} create table t (value bigint); create view v as select value * 0.1 from t; {code} read in spark: {code:java} select * from v; {code} error occurred: {code:java} org.apache.spark.sql.AnalysisException: [CANNOT_UP_CAST_DATATYPE] Cannot up cast `(value * 0.1)` from "DECIMAL(22,1)" to "DECIMAL(21,1)".The type path of the target object is: You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object at org.apache.spark.sql.errors.QueryCompilationErrors$.upCastFailureError(QueryCompilationErrors.scala:285) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$.org$apache$spark$sql$catalyst$analysis$Analyzer$ResolveUpCast$$fail(Analyzer.scala:3627) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$57$$anonfun$applyOrElse$235.applyOrElse(Analyzer.scala:3658) at org.apache.spark.sql.catalyst.analysis.Analyzer$ResolveUpCast$$anonfun$apply$57$$anonfun$applyOrElse$235.applyOrElse(Analyzer.scala:3635) {code} -- This message was sent by Atlassian Jira (v8.20.10#820010) - To 
unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
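The one-digit difference in LongDecimal explains the exact types in the error above, via the standard decimal multiplication sizing rule used by both engines (precision = p1 + p2 + 1, scale = s1 + s2, before any precision capping). A small sketch, with tuples written as (precision, scale):

```python
def multiply_result_type(left, right):
    # Decimal multiplication sizing rule (before precision capping):
    #   precision = p1 + p2 + 1, scale = s1 + s2
    (p1, s1), (p2, s2) = left, right
    return (p1 + p2 + 1, s1 + s2)

HIVE_BIGINT_DECIMAL = (19, 0)  # Hive maps bigint to DECIMAL(19, 0)
SPARK_LONG_DECIMAL = (20, 0)   # Spark's LongDecimal is DECIMAL(20, 0)
LITERAL_0_1 = (1, 1)           # the literal 0.1 is DECIMAL(1, 1)

hive_type = multiply_result_type(HIVE_BIGINT_DECIMAL, LITERAL_0_1)
spark_type = multiply_result_type(SPARK_LONG_DECIMAL, LITERAL_0_1)
```

Hive sizes `value * 0.1` as DECIMAL(21, 1) in the view schema, while Spark computes DECIMAL(22, 1) for the same expression, hence the failed up-cast from DECIMAL(22,1) to DECIMAL(21,1).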
[jira] [Assigned] (SPARK-44856) Improve Python UDTF arrow serializer performance
[ https://issues.apache.org/jira/browse/SPARK-44856?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-44856: - Assignee: Michael Zhang > Improve Python UDTF arrow serializer performance > > > Key: SPARK-44856 > URL: https://issues.apache.org/jira/browse/SPARK-44856 > Project: Spark > Issue Type: Sub-task > Components: PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Allison Wang >Assignee: Michael Zhang >Priority: Major > > Currently, there is a lot of overhead in the arrow serializer for Python > UDTFs. The overhead is largely from converting arrow batches into pandas > series and converting UDTF's results back to a pandas dataframe. > We should try directly converting Python object into arrow and vice versa to > avoid the expensive pandas conversion. Similar to this converter: > [https://github.com/apache/spark/blob/be04ac1ace91f6da34b08a1510e41d3ab6f0377b/python/pyspark/sql/connect/conversion.py#L56] > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44900) Cached DataFrame keeps growing
[ https://issues.apache.org/jira/browse/SPARK-44900?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Varun Nalla updated SPARK-44900: Priority: Critical (was: Major) > Cached DataFrame keeps growing > -- > > Key: SPARK-44900 > URL: https://issues.apache.org/jira/browse/SPARK-44900 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 2.4.0 >Reporter: Varun Nalla >Priority: Critical > > Scenario : > We have a Kafka streaming application where data lookups are performed by > joining against another cached DataFrame, using the > MEMORY_AND_DISK caching strategy. > However, the size of the cached DataFrame keeps growing with every micro > batch the streaming application processes, which is visible under the Storage tab. > A similar Stack Overflow thread was already raised. > https://stackoverflow.com/questions/55601779/spark-dataframe-cache-keeps-growing -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44776) Add ProducedRowCount to SparkListenerConnectOperationFinished
[ https://issues.apache.org/jira/browse/SPARK-44776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-44776: Assignee: Lingkai Kong > Add ProducedRowCount to SparkListenerConnectOperationFinished > - > > Key: SPARK-44776 > URL: https://issues.apache.org/jira/browse/SPARK-44776 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.1 >Reporter: Lingkai Kong >Assignee: Lingkai Kong >Priority: Major > Fix For: 3.5.0, 4.0.0 > > > As title -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44776) Add ProducedRowCount to SparkListenerConnectOperationFinished
[ https://issues.apache.org/jira/browse/SPARK-44776?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-44776. -- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull request 42454 [https://github.com/apache/spark/pull/42454] > Add ProducedRowCount to SparkListenerConnectOperationFinished > - > > Key: SPARK-44776 > URL: https://issues.apache.org/jira/browse/SPARK-44776 > Project: Spark > Issue Type: Task > Components: Connect >Affects Versions: 3.4.1 >Reporter: Lingkai Kong >Priority: Major > Fix For: 3.5.0, 4.0.0 > > > As title -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43506) Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43506: - Assignee: Haejoon Lee > Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0. > --- > > Key: SPARK-43506 > URL: https://issues.apache.org/jira/browse/SPARK-43506 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43451) Enable RollingTests.test_rolling_count for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43451. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42551 [https://github.com/apache/spark/pull/42551] > Enable RollingTests.test_rolling_count for pandas 2.0.0. > > > Key: SPARK-43451 > URL: https://issues.apache.org/jira/browse/SPARK-43451 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > Enable RollingTests.test_rolling_count for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43506) Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43506. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42551 [https://github.com/apache/spark/pull/42551] > Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0. > --- > > Key: SPARK-43506 > URL: https://issues.apache.org/jira/browse/SPARK-43506 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > Enable ArrowTests.test_toPandas_empty_columns for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43451) Enable RollingTests.test_rolling_count for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43451: - Assignee: Haejoon Lee > Enable RollingTests.test_rolling_count for pandas 2.0.0. > > > Key: SPARK-43451 > URL: https://issues.apache.org/jira/browse/SPARK-43451 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Enable RollingTests.test_rolling_count for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43563) Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43563: - Assignee: Haejoon Lee > Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0. > > > Key: SPARK-43563 > URL: https://issues.apache.org/jira/browse/SPARK-43563 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-43459) Enable OpsOnDiffFramesGroupByTests for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-43459: - Assignee: Haejoon Lee > Enable OpsOnDiffFramesGroupByTests for pandas 2.0.0. > > > Key: SPARK-43459 > URL: https://issues.apache.org/jira/browse/SPARK-43459 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > > Enable OpsOnDiffFramesGroupByTests.test_groupby_multiindex_columns for pandas > 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43563) Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43563. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42551 [https://github.com/apache/spark/pull/42551] > Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0. > > > Key: SPARK-43563 > URL: https://issues.apache.org/jira/browse/SPARK-43563 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > Enable CsvTests.test_read_csv_with_squeeze for pandas 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-43459) Enable OpsOnDiffFramesGroupByTests for pandas 2.0.0.
[ https://issues.apache.org/jira/browse/SPARK-43459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-43459. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42551 [https://github.com/apache/spark/pull/42551] > Enable OpsOnDiffFramesGroupByTests for pandas 2.0.0. > > > Key: SPARK-43459 > URL: https://issues.apache.org/jira/browse/SPARK-43459 > Project: Spark > Issue Type: Sub-task > Components: Pandas API on Spark >Affects Versions: 4.0.0 >Reporter: Haejoon Lee >Assignee: Haejoon Lee >Priority: Major > Fix For: 4.0.0 > > > Enable OpsOnDiffFramesGroupByTests.test_groupby_multiindex_columns for pandas > 2.0.0. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44901) Add API in 'analyze' method to return partitioning/ordering expressions
Daniel created SPARK-44901: -- Summary: Add API in 'analyze' method to return partitioning/ordering expressions Key: SPARK-44901 URL: https://issues.apache.org/jira/browse/SPARK-44901 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 4.0.0 Reporter: Daniel -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44891) Enable Doctests of `rand`, `randn` and `log`
[ https://issues.apache.org/jira/browse/SPARK-44891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-44891. --- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull request 42584 [https://github.com/apache/spark/pull/42584] > Enable Doctests of `rand`, `randn` and `log` > > > Key: SPARK-44891 > URL: https://issues.apache.org/jira/browse/SPARK-44891 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.5.0, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44891) Enable Doctests of `rand`, `randn` and `log`
[ https://issues.apache.org/jira/browse/SPARK-44891?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-44891: - Assignee: Ruifeng Zheng > Enable Doctests of `rand`, `randn` and `log` > > > Key: SPARK-44891 > URL: https://issues.apache.org/jira/browse/SPARK-44891 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44214) Support Spark Driver Live Log UI
[ https://issues.apache.org/jira/browse/SPARK-44214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44214: -- Summary: Support Spark Driver Live Log UI (was: Add driver log live UI) > Support Spark Driver Live Log UI > > > Key: SPARK-44214 > URL: https://issues.apache.org/jira/browse/SPARK-44214 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Web UI >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44900) Cached DataFrame keeps growing
Varun Nalla created SPARK-44900: --- Summary: Cached DataFrame keeps growing Key: SPARK-44900 URL: https://issues.apache.org/jira/browse/SPARK-44900 Project: Spark Issue Type: Bug Components: Spark Core Affects Versions: 2.4.0 Reporter: Varun Nalla Scenario: We have a Kafka streaming application where data lookups happen by joining another DF which is cached, with the MEMORY_AND_DISK caching strategy. However, the size of the cached DataFrame keeps growing for every micro-batch the streaming application processes, and that is visible under the Storage tab. A similar Stack Overflow thread was already raised: https://stackoverflow.com/questions/55601779/spark-dataframe-cache-keeps-growing -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44748) Query execution to support PARTITION BY and ORDER BY clause for table arguments
[ https://issues.apache.org/jira/browse/SPARK-44748?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Takuya Ueshin resolved SPARK-44748. --- Fix Version/s: 4.0.0 Assignee: Daniel Resolution: Fixed Issue resolved by pull request 42420 https://github.com/apache/spark/pull/42420 > Query execution to support PARTITION BY and ORDER BY clause for table > arguments > --- > > Key: SPARK-44748 > URL: https://issues.apache.org/jira/browse/SPARK-44748 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 4.0.0 >Reporter: Daniel >Assignee: Daniel >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44214) Add driver log live UI
[ https://issues.apache.org/jira/browse/SPARK-44214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44214: -- Summary: Add driver log live UI (was: Add driver log live UI for K8s environment) > Add driver log live UI > -- > > Key: SPARK-44214 > URL: https://issues.apache.org/jira/browse/SPARK-44214 > Project: Spark > Issue Type: Improvement > Components: Kubernetes, Spark Core, Web UI >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44214) Add driver log live UI
[ https://issues.apache.org/jira/browse/SPARK-44214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44214: -- Affects Version/s: 4.0.0 (was: 3.5.0) > Add driver log live UI > -- > > Key: SPARK-44214 > URL: https://issues.apache.org/jira/browse/SPARK-44214 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Web UI >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44214) Add driver log live UI
[ https://issues.apache.org/jira/browse/SPARK-44214?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-44214: -- Component/s: (was: Kubernetes) > Add driver log live UI > -- > > Key: SPARK-44214 > URL: https://issues.apache.org/jira/browse/SPARK-44214 > Project: Spark > Issue Type: Improvement > Components: Spark Core, Web UI >Affects Versions: 3.5.0 >Reporter: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44898) Upgrade `gcs-connector` to 2.2.17
[ https://issues.apache.org/jira/browse/SPARK-44898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-44898: - Assignee: Dongjoon Hyun > Upgrade `gcs-connector` to 2.2.17 > - > > Key: SPARK-44898 > URL: https://issues.apache.org/jira/browse/SPARK-44898 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44898) Upgrade `gcs-connector` to 2.2.17
[ https://issues.apache.org/jira/browse/SPARK-44898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-44898. --- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42588 [https://github.com/apache/spark/pull/42588] > Upgrade `gcs-connector` to 2.2.17 > - > > Key: SPARK-44898 > URL: https://issues.apache.org/jira/browse/SPARK-44898 > Project: Spark > Issue Type: Improvement > Components: Build >Affects Versions: 4.0.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44899) Refine the docstring of `DataFrame.collect`
[ https://issues.apache.org/jira/browse/SPARK-44899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Allison Wang updated SPARK-44899: - Summary: Refine the docstring of `DataFrame.collect` (was: Refine the docstring of `DataFrame.collect()`) > Refine the docstring of `DataFrame.collect` > --- > > Key: SPARK-44899 > URL: https://issues.apache.org/jira/browse/SPARK-44899 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Allison Wang >Priority: Major > > Make the docstring of DataFrame.collect() better and add more examples. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44899) Refine the docstring of `DataFrame.collect()`
Allison Wang created SPARK-44899: Summary: Refine the docstring of `DataFrame.collect()` Key: SPARK-44899 URL: https://issues.apache.org/jira/browse/SPARK-44899 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 3.5.0, 4.0.0 Reporter: Allison Wang Make the docstring of DataFrame.collect() better and add more examples. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44818) Fix race for pending interrupt issued before taskThread is initialized
[ https://issues.apache.org/jira/browse/SPARK-44818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen resolved SPARK-44818. Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42504 [https://github.com/apache/spark/pull/42504] > Fix race for pending interrupt issued before taskThread is initialized > -- > > Key: SPARK-44818 > URL: https://issues.apache.org/jira/browse/SPARK-44818 > Project: Spark > Issue Type: Task > Components: Spark Core, Structured Streaming >Affects Versions: 3.5.1 >Reporter: Anish Shrigondekar >Assignee: Anish Shrigondekar >Priority: Major > Fix For: 4.0.0 > > > Fix race for pending interrupt issued before taskThread is initialized -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44818) Fix race for pending interrupt issued before taskThread is initialized
[ https://issues.apache.org/jira/browse/SPARK-44818?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Josh Rosen reassigned SPARK-44818: -- Assignee: Anish Shrigondekar > Fix race for pending interrupt issued before taskThread is initialized > -- > > Key: SPARK-44818 > URL: https://issues.apache.org/jira/browse/SPARK-44818 > Project: Spark > Issue Type: Task > Components: Spark Core, Structured Streaming >Affects Versions: 3.5.1 >Reporter: Anish Shrigondekar >Assignee: Anish Shrigondekar >Priority: Major > > Fix race for pending interrupt issued before taskThread is initialized -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44460) Pass user auth credential to Python workers for foreachBatch and listener
[ https://issues.apache.org/jira/browse/SPARK-44460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Raghu Angadi resolved SPARK-44460. -- Resolution: Won't Fix Not an issue in Apache Spark. > Pass user auth credential to Python workers for foreachBatch and listener > - > > Key: SPARK-44460 > URL: https://issues.apache.org/jira/browse/SPARK-44460 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.4.1 >Reporter: Raghu Angadi >Priority: Major > > No user specific credentials are sent to Python worker that runs user > functions like foreachBatch() and streaming listener. > We might need to pass in these. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44460) Pass user auth credential to Python workers for foreachBatch and listener
[ https://issues.apache.org/jira/browse/SPARK-44460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757087#comment-17757087 ] Wei Liu commented on SPARK-44460: - [~rangadi] This seems to be a Databricks internal issue. See the updates in SC-138245 > Pass user auth credential to Python workers for foreachBatch and listener > - > > Key: SPARK-44460 > URL: https://issues.apache.org/jira/browse/SPARK-44460 > Project: Spark > Issue Type: Task > Components: Connect, Structured Streaming >Affects Versions: 3.4.1 >Reporter: Raghu Angadi >Priority: Major > > No user specific credentials are sent to Python worker that runs user > functions like foreachBatch() and streaming listener. > We might need to pass in these. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44898) Upgrade `gcs-connector` to 2.2.17
Dongjoon Hyun created SPARK-44898: - Summary: Upgrade `gcs-connector` to 2.2.17 Key: SPARK-44898 URL: https://issues.apache.org/jira/browse/SPARK-44898 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: Dongjoon Hyun -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44884) Spark doesn't create SUCCESS file when external path is passed
[ https://issues.apache.org/jira/browse/SPARK-44884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757052#comment-17757052 ] Dipayan Dev commented on SPARK-44884: - There is no reason to disable this feature in Spark 3.3.0. We have a lot of downstream applications that are dependent on the _SUCCESS file, and this feature change wasn't mentioned anywhere in the release. Is there any workaround for this, or any way I can contribute? [~ste...@apache.org] > Spark doesn't create SUCCESS file when external path is passed > -- > > Key: SPARK-44884 > URL: https://issues.apache.org/jira/browse/SPARK-44884 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Dipayan Dev >Priority: Critical > Attachments: image-2023-08-20-18-08-38-531.png, > image-2023-08-20-18-46-53-342.png > > > The issue is not happening in Spark 2.x (I am using 2.4.0), but only in 3.3.0 > Code to reproduce the issue. > > {code:java} > scala> spark.conf.set("spark.sql.orc.char.enabled", true) > scala> val DF = Seq(("test1", 123)).toDF("name", "num") > scala> DF.write.option("path", > "gs://test_dd123/").mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("test_schema.table_name") > 23/08/20 12:31:43 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, > since hive.security.authorization.manager is set to instance of > HiveAuthorizerFactory. {code} > The above code succeeds and creates the External Hive table, but {*}there is > no SUCCESS file generated{*}. The same code when running spark 2.4.0, > generating a SUCCESS file. > Adding the content of the bucket after table creation > > !image-2023-08-20-18-08-38-531.png|width=453,height=162!
> > But when I don’t pass the external path as following, the SUCCESS file is > generated > {code:java} > scala> > DF.write.mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("us_wm_supply_chain_rcv_pre_prod.test_tb1") > {code} > !image-2023-08-20-18-46-53-342.png|width=465,height=166! > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44884) Spark doesn't create SUCCESS file when external path is passed
[ https://issues.apache.org/jira/browse/SPARK-44884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757050#comment-17757050 ] Dipayan Dev commented on SPARK-44884: - [~ste...@apache.org], I have set that also, but there is still no _SUCCESS file when we pass an external path. I am not using any custom committer; it's the default Hadoop-mapreduce one. Can you please point me to the code? {code:java} spark.conf.set("spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs", true) {code} > Spark doesn't create SUCCESS file when external path is passed > -- > > Key: SPARK-44884 > URL: https://issues.apache.org/jira/browse/SPARK-44884 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Dipayan Dev >Priority: Critical > Attachments: image-2023-08-20-18-08-38-531.png, > image-2023-08-20-18-46-53-342.png > > > The issue is not happening in Spark 2.x (I am using 2.4.0), but only in 3.3.0 > Code to reproduce the issue. > > {code:java} > scala> spark.conf.set("spark.sql.orc.char.enabled", true) > scala> val DF = Seq(("test1", 123)).toDF("name", "num") > scala> DF.write.option("path", > "gs://test_dd123/").mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("test_schema.table_name") > 23/08/20 12:31:43 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, > since hive.security.authorization.manager is set to instance of > HiveAuthorizerFactory. {code} > The above code succeeds and creates the External Hive table, but {*}there is > no SUCCESS file generated{*}. The same code when running spark 2.4.0, > generating a SUCCESS file. > Adding the content of the bucket after table creation > > !image-2023-08-20-18-08-38-531.png|width=453,height=162!
> > But when I don’t pass the external path as following, the SUCCESS file is > generated > {code:java} > scala> > DF.write.mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("us_wm_supply_chain_rcv_pre_prod.test_tb1") > {code} > !image-2023-08-20-18-46-53-342.png|width=465,height=166! > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44884) Spark doesn't create SUCCESS file when external path is passed
[ https://issues.apache.org/jira/browse/SPARK-44884?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17757036#comment-17757036 ] Steve Loughran commented on SPARK-44884: this is created in the committer; for the hadoop-mapreduce committers, "mapreduce.fileoutputcommitter.marksuccessfuljobs" is the flag to enable it; if it is not being created, then it'll be down to how saveAsTable commits work > Spark doesn't create SUCCESS file when external path is passed > -- > > Key: SPARK-44884 > URL: https://issues.apache.org/jira/browse/SPARK-44884 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Dipayan Dev >Priority: Critical > Attachments: image-2023-08-20-18-08-38-531.png, > image-2023-08-20-18-46-53-342.png > > > The issue is not happening in Spark 2.x (I am using 2.4.0), but only in 3.3.0 > Code to reproduce the issue. > > {code:java} > scala> spark.conf.set("spark.sql.orc.char.enabled", true) > scala> val DF = Seq(("test1", 123)).toDF("name", "num") > scala> DF.write.option("path", > "gs://test_dd123/").mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("test_schema.table_name") > 23/08/20 12:31:43 WARN SessionState: METASTORE_FILTER_HOOK will be ignored, > since hive.security.authorization.manager is set to instance of > HiveAuthorizerFactory. {code} > The above code succeeds and creates the External Hive table, but {*}there is > no SUCCESS file generated{*}. The same code when running spark 2.4.0, > generating a SUCCESS file. > Adding the content of the bucket after table creation > > !image-2023-08-20-18-08-38-531.png|width=453,height=162! > > But when I don’t pass the external path as following, the SUCCESS file is > generated > {code:java} > scala> > DF.write.mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("us_wm_supply_chain_rcv_pre_prod.test_tb1") > {code} > !image-2023-08-20-18-46-53-342.png|width=465,height=166!
> -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
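As a note on the flag discussed above: in Spark it can be set cluster-wide through the spark.hadoop.* passthrough prefix, for example in spark-defaults.conf. This is a hedged sketch assuming the default Hadoop FileOutputCommitter is in use; a custom committer may ignore the property entirely.

```properties
# Ask the Hadoop MapReduce-style committer to write a _SUCCESS marker on job commit.
# Has no effect if a custom committer that ignores this property is configured.
spark.hadoop.mapreduce.fileoutputcommitter.marksuccessfuljobs  true
```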
[jira] [Created] (SPARK-44897) Local Property Propagation to Subquery Broadcast Exec
Michael Chen created SPARK-44897: Summary: Local Property Propagation to Subquery Broadcast Exec Key: SPARK-44897 URL: https://issues.apache.org/jira/browse/SPARK-44897 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 3.4.0 Reporter: Michael Chen https://issues.apache.org/jira/browse/SPARK-32748 was opened and then, I believe, mistakenly reverted to address this issue. The claim was that local property propagation from SubqueryBroadcastExec to the dynamic pruning thread is not necessary because the properties will be propagated by the broadcast threads anyway. However, in a scenario where the dynamic pruning thread is the first to initialize the broadcast relation future, the local properties will not be propagated correctly, because the local properties being propagated to the broadcast threads would already be incorrect. I do not have a good way of reproducing this consistently, because generally the SubqueryBroadcastExec is not the first to initialize the broadcast relation future, but by adding a Thread.sleep(1) into the doPrepare method of SubqueryBroadcastExec, the following test always fails.
{code:java}
withSQLConf(StaticSQLConf.SUBQUERY_BROADCAST_MAX_THREAD_THRESHOLD.key -> "1") {
  withTable("a", "b") {
    val confKey = "spark.sql.y"
    val confValue1 = UUID.randomUUID().toString()
    val confValue2 = UUID.randomUUID().toString()
    Seq((confValue1, "1")).toDF("key", "value")
      .write
      .format("parquet")
      .partitionBy("key")
      .mode("overwrite")
      .saveAsTable("a")
    val df1 = spark.table("a")

    def generateBroadcastDataFrame(confKey: String, confValue: String): Dataset[String] = {
      val df = spark.range(1).mapPartitions { _ =>
        Iterator(TaskContext.get.getLocalProperty(confKey))
      }.filter($"value".contains(confValue)).as("c")
      df.hint("broadcast")
    }

    // set local property and assert
    val df2 = generateBroadcastDataFrame(confKey, confValue1)
    spark.sparkContext.setLocalProperty(confKey, confValue1)
    val checkDF = df1.join(df2).where($"a.key" === $"c.value").select($"a.key", $"c.value")
    val checks = checkDF.collect()
    assert(checks.forall(_.toSeq == Seq(confValue1, confValue1)))

    // change local property and re-assert
    Seq((confValue2, "1")).toDF("key", "value")
      .write
      .format("parquet")
      .partitionBy("key")
      .mode("overwrite")
      .saveAsTable("b")
    val df3 = spark.table("b")
    val df4 = generateBroadcastDataFrame(confKey, confValue2)
    spark.sparkContext.setLocalProperty(confKey, confValue2)
    val checks2DF = df3.join(df4).where($"b.key" === $"c.value").select($"b.key", $"c.value")
    val checks2 = checks2DF.collect()
    assert(checks2.forall(_.toSeq == Seq(confValue2, confValue2)))
    assert(checks2.nonEmpty)
  }
}
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
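The race described in SPARK-44897 is not specific to Spark. The following pure-Python analogy (not Spark code; all names are illustrative) shows why a thread-pool task must capture the submitting thread's local properties at submit time, rather than reading them later from inside the worker:

```python
import threading
from concurrent.futures import ThreadPoolExecutor

# Thread-local "local properties", standing in for SparkContext local properties.
local_props = threading.local()

def submit_with_props(executor, fn):
    # Capture the *submitting* thread's property value now, at submit time.
    # If the worker read local_props directly instead, it would observe
    # whatever value happened to be current when it ran -- the race at issue.
    captured = getattr(local_props, "value", None)

    def wrapper():
        local_props.value = captured  # install the captured value in the worker
        return fn()

    return executor.submit(wrapper)

def read_prop():
    return getattr(local_props, "value", None)

with ThreadPoolExecutor(max_workers=1) as pool:
    local_props.value = "v1"
    f1 = submit_with_props(pool, read_prop)
    local_props.value = "v2"  # mutated before the first task may even have run
    f2 = submit_with_props(pool, read_prop)
    results = (f1.result(), f2.result())
```

Each task sees the value that was current when it was submitted, even though the submitting thread changed the property in between, which is the behavior correct local-property propagation should give.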
[jira] [Commented] (SPARK-43288) DataSourceV2: CREATE TABLE LIKE
[ https://issues.apache.org/jira/browse/SPARK-43288?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756860#comment-17756860 ] Ignite TC Bot commented on SPARK-43288: --- User 'Hisoka-X' has created a pull request for this issue: https://github.com/apache/spark/pull/42586 > DataSourceV2: CREATE TABLE LIKE > --- > > Key: SPARK-43288 > URL: https://issues.apache.org/jira/browse/SPARK-43288 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.4.0 >Reporter: John Zhuge >Priority: Major > > Support CREATE TABLE LIKE in DSv2. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44896) Consider adding information os_prio, cpu, elapsed, tid, nid, etc., from the jstack tool
Kent Yao created SPARK-44896: Summary: Consider adding information os_prio, cpu, elapsed, tid, nid, etc., from the jstack tool Key: SPARK-44896 URL: https://issues.apache.org/jira/browse/SPARK-44896 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44894) Upgrade tink to 1.10
Yang Jie created SPARK-44894: Summary: Upgrade tink to 1.10 Key: SPARK-44894 URL: https://issues.apache.org/jira/browse/SPARK-44894 Project: Spark Issue Type: Improvement Components: Build Affects Versions: 4.0.0 Reporter: Yang Jie -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44895) Considering 'daemon', 'priority' from higher JDKs for ThreadStackTrace class
Kent Yao created SPARK-44895: Summary: Considering 'daemon', 'priority' from higher JDKs for ThreadStackTrace class Key: SPARK-44895 URL: https://issues.apache.org/jira/browse/SPARK-44895 Project: Spark Issue Type: Sub-task Components: Web UI Affects Versions: 4.0.0 Reporter: Kent Yao
{code:java}
jshell> var t = java.lang.management.ManagementFactory.getThreadMXBean()
t ==> com.sun.management.internal.HotSpotThreadImpl@7daf6ecc

jshell> var tt = t.dumpAllThreads(true, true)
tt ==> ThreadInfo[10] { "main" prio=5 Id=1 RUNNABLE
at ... k$NonfairSync@27fa135a }

jshell> for (java.lang.management.ThreadInfo t1: tt) { System.out.println(t1.toString()); }
"main" prio=5 Id=1 RUNNABLE
    at java.management@20.0.1/sun.management.ThreadImpl.dumpThreads0(Native Method)
    at java.management@20.0.1/sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:540)
    at java.management@20.0.1/sun.management.ThreadImpl.dumpAllThreads(ThreadImpl.java:527)
    at REPL.$JShell$12.do_it$Aux($JShell$12.java:7)
    at REPL.$JShell$12.do_it$($JShell$12.java:11)
    at java.base@20.0.1/java.lang.invoke.DirectMethodHandle$Holder.invokeStatic(DirectMethodHandle$Holder)
    at java.base@20.0.1/java.lang.invoke.LambdaForm$MH/0x007001008c00.invoke(LambdaForm$MH)
    at java.base@20.0.1/java.lang.invoke.Invokers$Holder.invokeExact_MT(Invokers$Holder)
    ...

"Reference Handler" daemon prio=10 Id=8 RUNNABLE
    at java.base@20.0.1/java.lang.ref.Reference.waitForReferencePendingList(Native Method)
    at java.base@20.0.1/java.lang.ref.Reference.processPendingReferences(Reference.java:246)
    at java.base@20.0.1/java.lang.ref.Reference$ReferenceHandler.run(Reference.java:208)
{code}
The `daemon prio=10` information is not available from ThreadInfo on JDK 8. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44883) Spark insertInto with location GCS bucket root causes NPE
[ https://issues.apache.org/jira/browse/SPARK-44883?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Steve Loughran resolved SPARK-44883. Resolution: Duplicate > Spark insertInto with location GCS bucket root causes NPE > - > > Key: SPARK-44883 > URL: https://issues.apache.org/jira/browse/SPARK-44883 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.3.0 >Reporter: Dipayan Dev >Priority: Minor > > In our Organisation, we are using GCS bucket root location to point to our > Hive table. Dataproc's latest 2.1 uses *Spark* *3.3.0* and this needs to be > fixed. > Spark Scala code to reproduce this issue > {noformat} > val DF = Seq(("test1", 123)).toDF("name", "num") > DF.write.option("path", > "gs://test_dd123/").mode(SaveMode.Overwrite).partitionBy("num").format("orc").saveAsTable("schema_name.table_name") > val DF1 = Seq(("test2", 125)).toDF("name", "num") > DF.write.mode(SaveMode.Overwrite).format("orc").insertInto("schema_name.table_name") > java.lang.NullPointerException > at org.apache.hadoop.fs.Path.<init>(Path.java:141) > at org.apache.hadoop.fs.Path.<init>(Path.java:120) > at org.apache.hadoop.fs.Path.suffix(Path.java:441) > at > org.apache.spark.sql.execution.datasources.InsertIntoHadoopFsRelationCommand.$anonfun$getCustomPartitionLocations$1(InsertIntoHadoopFsRelationCommand.scala:254) > {noformat} > Looks like the issue is coming from Hadoop Path. > {noformat} > scala> import org.apache.hadoop.fs.Path > import org.apache.hadoop.fs.Path > scala> val path: Path = new Path("gs://test_dd123/") > path: org.apache.hadoop.fs.Path = gs://test_dd123/ > scala> path.suffix("/num=123") > java.lang.NullPointerException > at org.apache.hadoop.fs.Path.<init>(Path.java:150) > at org.apache.hadoop.fs.Path.<init>(Path.java:129) > at org.apache.hadoop.fs.Path.suffix(Path.java:450){noformat} > Path.suffix throws an NPE when writing into a GCS bucket root.
> -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44893) ThreadInfo improvements for monitoring APIs
Kent Yao created SPARK-44893: Summary: ThreadInfo improvements for monitoring APIs Key: SPARK-44893 URL: https://issues.apache.org/jira/browse/SPARK-44893 Project: Spark Issue Type: Umbrella Components: Spark Core, Web UI Affects Versions: 4.0.0 Reporter: Kent Yao -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44891) Enable Doctests of `rand`, `randn` and `log`
[ https://issues.apache.org/jira/browse/SPARK-44891?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756827#comment-17756827 ] Hudson commented on SPARK-44891: User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/42584 > Enable Doctests of `rand`, `randn` and `log` > > > Key: SPARK-44891 > URL: https://issues.apache.org/jira/browse/SPARK-44891 > Project: Spark > Issue Type: Sub-task > Components: Connect, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44892) Add official image Dockerfile for Spark 3.3.3
Yuming Wang created SPARK-44892: --- Summary: Add official image Dockerfile for Spark 3.3.3 Key: SPARK-44892 URL: https://issues.apache.org/jira/browse/SPARK-44892 Project: Spark Issue Type: Sub-task Components: Spark Docker Affects Versions: 3.3.3 Reporter: Yuming Wang -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44891) Enable Doctests of `rand`, `randn` and `log`
Ruifeng Zheng created SPARK-44891: - Summary: Enable Doctests of `rand`, `randn` and `log` Key: SPARK-44891 URL: https://issues.apache.org/jira/browse/SPARK-44891 Project: Spark Issue Type: Sub-task Components: Connect, PySpark Affects Versions: 3.5.0, 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44889) Fix docstring of `monotonically_increasing_id`
[ https://issues.apache.org/jira/browse/SPARK-44889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng resolved SPARK-44889. --- Fix Version/s: 3.5.0 4.0.0 Resolution: Fixed Issue resolved by pull request 42582 [https://github.com/apache/spark/pull/42582] > Fix docstring of `monotonically_increasing_id` > -- > > Key: SPARK-44889 > URL: https://issues.apache.org/jira/browse/SPARK-44889 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > Fix For: 3.5.0, 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44889) Fix docstring of `monotonically_increasing_id`
[ https://issues.apache.org/jira/browse/SPARK-44889?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ruifeng Zheng reassigned SPARK-44889: - Assignee: Ruifeng Zheng > Fix docstring of `monotonically_increasing_id` > -- > > Key: SPARK-44889 > URL: https://issues.apache.org/jira/browse/SPARK-44889 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Ruifeng Zheng >Assignee: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-44888) Need to update the golden files of SQLQueryTestSuite for Java 21.
[ https://issues.apache.org/jira/browse/SPARK-44888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie reassigned SPARK-44888: Assignee: Yang Jie > Need to update the golden files of SQLQueryTestSuite for Java 21. > - > > Key: SPARK-44888 > URL: https://issues.apache.org/jira/browse/SPARK-44888 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-44888) Need to update the golden files of SQLQueryTestSuite for Java 21.
[ https://issues.apache.org/jira/browse/SPARK-44888?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yang Jie resolved SPARK-44888. -- Fix Version/s: 4.0.0 Resolution: Fixed Issue resolved by pull request 42580 [https://github.com/apache/spark/pull/42580] > Need to update the golden files of SQLQueryTestSuite for Java 21. > - > > Key: SPARK-44888 > URL: https://issues.apache.org/jira/browse/SPARK-44888 > Project: Spark > Issue Type: Bug > Components: SQL, Tests >Affects Versions: 4.0.0 >Reporter: Yang Jie >Assignee: Yang Jie >Priority: Major > Fix For: 4.0.0 > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44840) array_insert() gives wrong results for negative index
[ https://issues.apache.org/jira/browse/SPARK-44840?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756783#comment-17756783 ] Aparna Garg commented on SPARK-44840: - User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/42564 > array_insert() gives wrong results for negative index > --- > > Key: SPARK-44840 > URL: https://issues.apache.org/jira/browse/SPARK-44840 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 3.4.0 >Reporter: Serge Rielau >Assignee: Max Gekk >Priority: Major > > Unlike in Snowflake, we decided that array_insert() is 1-based. > This means 1 is the first element in an array and -1 is the last. > This matches the behavior of functions such as substr() and element_at(). > > {code:java} > > SELECT array_insert(array('a', 'b', 'c'), 1, 'z'); > ["z","a","b","c"] > > SELECT array_insert(array('a', 'b', 'c'), 0, 'z'); > Error > > SELECT array_insert(array('a', 'b', 'c'), -1, 'z'); > ["a","b","c","z"] > > SELECT array_insert(array('a', 'b', 'c'), 5, 'z'); > ["a","b","c",NULL,"z"] > > SELECT array_insert(array('a', 'b', 'c'), -5, 'z'); > ["z",NULL,"a","b","c"] > > SELECT array_insert(array('a', 'b', 'c'), 2, cast(NULL AS STRING)); > ["a",NULL,"b","c"] > {code} > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
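The intended 1-based contract shown in the SQL examples above can be captured in a small reference model (a sketch of the documented semantics, not Spark's implementation): index 0 is rejected, out-of-range indexes pad with NULL, and -1 inserts at the end.

```python
def array_insert(arr, idx, elem):
    """Toy model of Spark's 1-based array_insert semantics."""
    if idx == 0:
        raise ValueError("array_insert index is 1-based; 0 is invalid")
    if idx > 0:
        # Pad on the right so position idx exists, then insert before it.
        padded = arr + [None] * max(0, idx - 1 - len(arr))
        return padded[:idx - 1] + [elem] + padded[idx - 1:]
    # Negative index: -1 appends at the end; pad on the left when out of range.
    k = -idx
    padded = [None] * max(0, k - 1 - len(arr)) + arr
    pos = len(padded) - (k - 1)
    return padded[:pos] + [elem] + padded[pos:]

array_insert(['a', 'b', 'c'], -1, 'z')   # ['a', 'b', 'c', 'z']
array_insert(['a', 'b', 'c'], -5, 'z')   # ['z', None, 'a', 'b', 'c']
```

This model agrees with all six examples in the issue description, which is the behavior the negative-index fix is expected to restore.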
[jira] [Commented] (SPARK-44889) Fix docstring of `monotonically_increasing_id`
[ https://issues.apache.org/jira/browse/SPARK-44889?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756778#comment-17756778 ] Aparna Garg commented on SPARK-44889: - User 'zhengruifeng' has created a pull request for this issue: https://github.com/apache/spark/pull/42582 > Fix docstring of `monotonically_increasing_id` > -- > > Key: SPARK-44889 > URL: https://issues.apache.org/jira/browse/SPARK-44889 > Project: Spark > Issue Type: Sub-task > Components: Documentation, PySpark >Affects Versions: 3.5.0, 4.0.0 >Reporter: Ruifeng Zheng >Priority: Major > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-44890) Miswritten remarks in pom file
[ https://issues.apache.org/jira/browse/SPARK-44890?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17756746#comment-17756746 ] chenyu commented on SPARK-44890: I have submitted a patch: https://github.com/apache/spark/pull/42583 > Miswritten remarks in pom file > -- > > Key: SPARK-44890 > URL: https://issues.apache.org/jira/browse/SPARK-44890 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.1 >Reporter: chenyu >Priority: Minor > Attachments: screenshot-1.png > > > A comment in the pom file uses the misspelling 'dont update', which hurts > readability. It should follow the same writing style as the rest of the file. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44890) Miswritten remarks in pom file
[ https://issues.apache.org/jira/browse/SPARK-44890?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] chenyu updated SPARK-44890: --- Attachment: screenshot-1.png > Miswritten remarks in pom file > -- > > Key: SPARK-44890 > URL: https://issues.apache.org/jira/browse/SPARK-44890 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.4.1 >Reporter: chenyu >Priority: Minor > Attachments: screenshot-1.png > > > A comment in the pom file uses the misspelling 'dont update', which hurts > readability. It should follow the same writing style as the rest of the file. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44890) Miswritten remarks in pom file
chenyu created SPARK-44890: -- Summary: Miswritten remarks in pom file Key: SPARK-44890 URL: https://issues.apache.org/jira/browse/SPARK-44890 Project: Spark Issue Type: Bug Components: Build Affects Versions: 3.4.1 Reporter: chenyu A comment in the pom file uses the misspelling 'dont update', which hurts readability. It should follow the same writing style as the rest of the file. -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-44813) The JIRA Python misses our assignee when it searches user again
[ https://issues.apache.org/jira/browse/SPARK-44813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44813: Fix Version/s: 3.3.4 (was: 3.3.3) > The JIRA Python misses our assignee when it searches user again > --- > > Key: SPARK-44813 > URL: https://issues.apache.org/jira/browse/SPARK-44813 > Project: Spark > Issue Type: Bug > Components: Project Infra >Affects Versions: 4.0.0 >Reporter: Kent Yao >Assignee: Kent Yao >Priority: Major > Fix For: 3.4.2, 3.5.0, 4.0.0, 3.3.4 > >
{code:java}
>>> assignee = asf_jira.user("yao")
>>> "SPARK-44801"
'SPARK-44801'
>>> asf_jira.assign_issue(issue.key, assignee.name)
response text = {"errorMessages":[],"errors":{"assignee":"User 'airhot' cannot be assigned issues."}}
{code}
-- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
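The error above suggests the user search returned a fuzzy match ('airhot') rather than the intended account, and the script assigned the first hit. A defensive pattern (a sketch with a hypothetical helper; `candidates` stands for whatever the search call returns, as objects or dicts carrying a `name` field) is to require an exact name match before assigning:

```python
def pick_assignee(candidates, wanted_name):
    """Return the candidate whose name exactly equals wanted_name, else None.

    A fuzzy user search may rank an unrelated account first, so never
    take the first hit blindly.
    """
    for user in candidates:
        name = user["name"] if isinstance(user, dict) else user.name
        if name == wanted_name:
            return user
    return None
```

The caller can then skip the assignment (and report the miss) when this returns None, instead of assigning the issue to the wrong user.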
[jira] [Updated] (SPARK-44857) Fix getBaseURI error in Spark Worker LogPage UI buttons
[ https://issues.apache.org/jira/browse/SPARK-44857?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yuming Wang updated SPARK-44857: Fix Version/s: 3.3.4 (was: 3.3.3) > Fix getBaseURI error in Spark Worker LogPage UI buttons > --- > > Key: SPARK-44857 > URL: https://issues.apache.org/jira/browse/SPARK-44857 > Project: Spark > Issue Type: Bug > Components: Spark Core, Web UI >Affects Versions: 3.2.0, 3.2.4, 3.3.2, 3.4.1, 3.5.0 >Reporter: Dongjoon Hyun >Assignee: Dongjoon Hyun >Priority: Major > Fix For: 3.4.2, 3.5.0, 4.0.0, 3.3.4 > > Attachments: Screenshot 2023-08-17 at 2.38.45 PM.png > > -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-44889) Fix docstring of `monotonically_increasing_id`
Ruifeng Zheng created SPARK-44889: - Summary: Fix docstring of `monotonically_increasing_id` Key: SPARK-44889 URL: https://issues.apache.org/jira/browse/SPARK-44889 Project: Spark Issue Type: Sub-task Components: Documentation, PySpark Affects Versions: 3.5.0, 4.0.0 Reporter: Ruifeng Zheng -- This message was sent by Atlassian Jira (v8.20.10#820010) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org