[jira] [Resolved] (SPARK-47289) Allow extensions to log extended information in explain plan

2024-04-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-47289.
-
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45488
[https://github.com/apache/spark/pull/45488]

> Allow extensions to log extended information in explain plan
> 
>
> Key: SPARK-47289
> URL: https://issues.apache.org/jira/browse/SPARK-47289
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> With session extensions, Spark planning can be extended to apply additional 
> rules and modify the execution plan. If an extension replaces a node in the 
> plan, the new node will be displayed in the plan. However, it is sometimes 
> useful for an extension to provide extended information to the end user to 
> explain its impact. For instance, an extension may automatically enable or 
> disable a feature it provides, and can surface that decision in the plan.
> The proposal is to optionally turn on extended plan information from 
> extensions. Extensions can add additional planning information via a new 
> interface that internally uses a new TreeNodeTag, say 'explainPlan'.






[jira] [Assigned] (SPARK-47289) Allow extensions to log extended information in explain plan

2024-04-04 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-47289:
---

Assignee: Parth Chandra

> Allow extensions to log extended information in explain plan
> 
>
> Key: SPARK-47289
> URL: https://issues.apache.org/jira/browse/SPARK-47289
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Parth Chandra
>Assignee: Parth Chandra
>Priority: Major
>  Labels: pull-request-available
>
> With session extensions, Spark planning can be extended to apply additional 
> rules and modify the execution plan. If an extension replaces a node in the 
> plan, the new node will be displayed in the plan. However, it is sometimes 
> useful for an extension to provide extended information to the end user to 
> explain its impact. For instance, an extension may automatically enable or 
> disable a feature it provides, and can surface that decision in the plan.
> The proposal is to optionally turn on extended plan information from 
> extensions. Extensions can add additional planning information via a new 
> interface that internally uses a new TreeNodeTag, say 'explainPlan'.






[jira] [Updated] (SPARK-47718) .sql() does not recognize watermark defined upstream

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47718?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47718:
---
Labels: pull-request-available  (was: )

> .sql() does not recognize watermark defined upstream
> 
>
> Key: SPARK-47718
> URL: https://issues.apache.org/jira/browse/SPARK-47718
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 3.5.1
>Reporter: Chloe He
>Priority: Blocker
>  Labels: pull-request-available
>
> I have a data pipeline set up in such a way that it reads data from a Kafka 
> source, does some transformation on the data using pyspark, then writes the 
> output into a sink (Kafka, Redis, etc).
>  
> My entire pipeline is written in SQL, so I wish to use the .sql() method to 
> execute SQL on my streaming source directly.
>  
> However, I'm running into the issue where my watermark is not being 
> recognized by the downstream query via the .sql() method.
>  
> ```
> Python 3.11.8 | packaged by conda-forge | (main, Feb 16 2024, 20:49:36) 
> [Clang 16.0.6 ] on darwin
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import pyspark
> >>> print(pyspark.__version__)
> 3.5.1
> >>> from pyspark.sql import SparkSession
> >>>
> >>> session = SparkSession.builder \
> ...     .config("spark.jars.packages", 
> "org.apache.spark:spark-sql-kafka-0-10_2.12:3.5.1")\
> ...     .getOrCreate()
> >>> from pyspark.sql.functions import col, from_json
> >>> from pyspark.sql.types import StructField, StructType, TimestampType, 
> >>> LongType, DoubleType, IntegerType
> >>> schema = StructType(
> ...     [
> ...         StructField('createTime', TimestampType(), True),
> ...         StructField('orderId', LongType(), True),
> ...         StructField('payAmount', DoubleType(), True),
> ...         StructField('payPlatform', IntegerType(), True),
> ...         StructField('provinceId', IntegerType(), True),
> ...     ])
> >>>
> >>> streaming_df = session.readStream\
> ...     .format("kafka")\
> ...     .option("kafka.bootstrap.servers", "localhost:9092")\
> ...     .option("subscribe", "payment_msg")\
> ...     .option("startingOffsets","earliest")\
> ...     .load()\
> ...     .select(from_json(col("value").cast("string"), 
> schema).alias("parsed_value"))\
> ...     .select("parsed_value.*")\
> ...     .withWatermark("createTime", "10 seconds")
> >>>
> >>> streaming_df.createOrReplaceTempView("streaming_df")
> >>> session.sql("""
> ... SELECT
> ...     window.start, window.end, provinceId, sum(payAmount) as totalPayAmount
> ...     FROM streaming_df
> ...     GROUP BY provinceId, window('createTime', '1 hour', '30 minutes')
> ...     ORDER BY window.start
> ... """)\
> ...   .writeStream\
> ...   .format("kafka") \
> ...   .option("checkpointLocation", "checkpoint") \
> ...   .option("kafka.bootstrap.servers", "localhost:9092") \
> ...   .option("topic", "sink") \
> ...   .start()
> ```
>  
> This throws the following exception:
> ```
> pyspark.errors.exceptions.captured.AnalysisException: Append output mode not 
> supported when there are streaming aggregations on streaming 
> DataFrames/DataSets without watermark; line 6 pos 4;
> ```
>  
>  
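A possible workaround while this is open, sketched below under the assumption
that the same `streaming_df` (with its watermark) is in scope: express the
aggregation with the DataFrame API, where the upstream watermark is honored.
The console sink stands in for Kafka (a Kafka sink would additionally need the
rows serialized into a `value` column), and ORDER BY is dropped because
streaming append mode does not support sorting.

```
from pyspark.sql.functions import col, window, sum as sum_

# Same windowed aggregation as the SQL above, expressed on the DataFrame
# directly so the watermark set by withWatermark() is picked up.
agg_df = (
    streaming_df
    .groupBy(col("provinceId"), window(col("createTime"), "1 hour", "30 minutes"))
    .agg(sum_(col("payAmount")).alias("totalPayAmount"))
    .select(col("window.start"), col("window.end"), "provinceId", "totalPayAmount")
)

# Console sink stands in for Kafka here, purely for the sketch.
query = (
    agg_df.writeStream
    .format("console")
    .option("checkpointLocation", "checkpoint")
    .start()
)
```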






[jira] [Updated] (SPARK-47735) Make pyspark.testing.connectutils compatible with pyspark-connect

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47735?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47735:
---
Labels: pull-request-available  (was: )

> Make pyspark.testing.connectutils compatible with pyspark-connect
> -
>
> Key: SPARK-47735
> URL: https://issues.apache.org/jira/browse/SPARK-47735
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Created] (SPARK-47735) Make pyspark.testing.connectutils compatible with pyspark-connect

2024-04-04 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47735:


 Summary: Make pyspark.testing.connectutils compatible with 
pyspark-connect
 Key: SPARK-47735
 URL: https://issues.apache.org/jira/browse/SPARK-47735
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Resolved] (SPARK-47734) Fix flaky pyspark.sql.dataframe.DataFrame.writeStream doctest by stopping streaming query

2024-04-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47734.
--
Fix Version/s: 4.0.0
   3.5.2
   Resolution: Fixed

Fixed in https://github.com/apache/spark/pull/45885

> Fix flaky pyspark.sql.dataframe.DataFrame.writeStream doctest by stopping 
> streaming query
> -
>
> Key: SPARK-47734
> URL: https://issues.apache.org/jira/browse/SPARK-47734
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0, 3.5.2
>
>
> https://issues.apache.org/jira/browse/SPARK-47199 didn't fix the flakiness in 
> the pyspark.sql.dataframe.DataFrame.writeStream doctest: the problem is not a 
> collision between tests but rather that the test starts a background thread 
> to write to a directory and then deletes that directory from the main test 
> thread, which is inherently race-prone.
> The fix is simple: stop the streaming query in the doctest itself, similar to 
> other streaming doctest examples.
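The pattern of that fix, as a minimal sketch (not the doctest's literal code,
and assuming a `spark` session is available):

```
import tempfile

# Sketch of the fix pattern: stop the streaming query before the temporary
# directory it writes to is cleaned up, so no background thread races the
# deletion.
sdf = spark.readStream.format("rate").load()
with tempfile.TemporaryDirectory() as d:
    query = (
        sdf.writeStream
        .format("parquet")
        .option("checkpointLocation", d + "/checkpoint")
        .start(d + "/output")
    )
    try:
        query.processAllAvailable()  # deterministic progress instead of sleeping
    finally:
        query.stop()                 # stop before the directory is deleted
```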






[jira] [Updated] (SPARK-47733) Add operational metrics for TWS operators

2024-04-04 Thread Jing Zhan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jing Zhan updated SPARK-47733:
--
Description: 
Add metrics to improve observability for the newly added TransformWithState 
operator and for some changes we've made to RocksDB.
Proposed metrics to add:
 * on the RocksDB StateStore side, we will add the following:
 ** number of external column families
 ** number of internal column families
 * on the operator side, we will add the following:
 ** number of state variables
 ** count of state variables by type
 ** output mode
 ** timeout mode
 ** timers registered in the batch
 ** timers expired in the batch
 ** whether initial state is enabled
 ** number of state variables removed in the batch

> Add operational metrics for TWS operators
> -
>
> Key: SPARK-47733
> URL: https://issues.apache.org/jira/browse/SPARK-47733
> Project: Spark
>  Issue Type: Task
>  Components: Structured Streaming
>Affects Versions: 4.0.0
>Reporter: Jing Zhan
>Priority: Major
>  Labels: pull-request-available
>
> Add metrics to improve observability for the newly added TransformWithState 
> operator and for some changes we've made to RocksDB.
> Proposed metrics to add:
>  * on the RocksDB StateStore side, we will add the following:
>  ** number of external column families
>  ** number of internal column families
>  * on the operator side, we will add the following:
>  ** number of state variables
>  ** count of state variables by type
>  ** output mode
>  ** timeout mode
>  ** timers registered in the batch
>  ** timers expired in the batch
>  ** whether initial state is enabled
>  ** number of state variables removed in the batch
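For context, such metrics surface to users through the existing
StreamingQueryProgress fields; the sketch below assumes the proposed names
would land in the stateOperators customMetrics map (the specific keys are this
ticket's proposals, not a shipped API), with `query` a running
transformWithState streaming query:

```
# stateOperators and customMetrics are existing StreamingQueryProgress fields;
# the metric names proposed above would presumably appear in customMetrics
# once implemented (an assumption, not a shipped API).
progress = query.lastProgress
for op in (progress or {}).get("stateOperators", []):
    print(op.get("operatorName"), op.get("customMetrics", {}))
```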






[jira] [Updated] (SPARK-47734) Fix flaky pyspark.sql.dataframe.DataFrame.writeStream doctest by stopping streaming query

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47734?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47734:
---
Labels: pull-request-available  (was: )

> Fix flaky pyspark.sql.dataframe.DataFrame.writeStream doctest by stopping 
> streaming query
> -
>
> Key: SPARK-47734
> URL: https://issues.apache.org/jira/browse/SPARK-47734
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Josh Rosen
>Assignee: Josh Rosen
>Priority: Major
>  Labels: pull-request-available
>
> https://issues.apache.org/jira/browse/SPARK-47199 didn't fix the flakiness in 
> the pyspark.sql.dataframe.DataFrame.writeStream doctest: the problem is not a 
> collision between tests but rather that the test starts a background thread 
> to write to a directory and then deletes that directory from the main test 
> thread, which is inherently race-prone.
> The fix is simple: stop the streaming query in the doctest itself, similar to 
> other streaming doctest examples.






[jira] [Created] (SPARK-47734) Fix flaky pyspark.sql.dataframe.DataFrame.writeStream doctest by stopping streaming query

2024-04-04 Thread Josh Rosen (Jira)
Josh Rosen created SPARK-47734:
--

 Summary: Fix flaky pyspark.sql.dataframe.DataFrame.writeStream 
doctest by stopping streaming query
 Key: SPARK-47734
 URL: https://issues.apache.org/jira/browse/SPARK-47734
 Project: Spark
  Issue Type: Improvement
  Components: PySpark, Tests
Affects Versions: 4.0.0
Reporter: Josh Rosen
Assignee: Josh Rosen


https://issues.apache.org/jira/browse/SPARK-47199 didn't fix the flakiness in 
the pyspark.sql.dataframe.DataFrame.writeStream doctest: the problem is not a 
collision between tests but rather that the test starts a background thread to 
write to a directory and then deletes that directory from the main test 
thread, which is inherently race-prone.

The fix is simple: stop the streaming query in the doctest itself, similar to 
other streaming doctest examples.






[jira] [Updated] (SPARK-47592) Connector module: Migrate logError with variables to structured logging framework

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47592?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47592:
---
Labels: pull-request-available  (was: )

> Connector module: Migrate logError with variables to structured logging 
> framework
> -
>
> Key: SPARK-47592
> URL: https://issues.apache.org/jira/browse/SPARK-47592
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Resolved] (SPARK-47598) MLLib: Migrate logError with variables to structured logging framework

2024-04-04 Thread Gengliang Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47598?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gengliang Wang resolved SPARK-47598.

Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45837
[https://github.com/apache/spark/pull/45837]

> MLLib: Migrate logError with variables to structured logging framework
> --
>
> Key: SPARK-47598
> URL: https://issues.apache.org/jira/browse/SPARK-47598
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Assignee: BingKun Pan
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Updated] (SPARK-26875) Add an option on FileStreamSource to include modified files

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-26875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-26875:
---
Labels: pull-request-available  (was: )

> Add an option on FileStreamSource to include modified files 
> 
>
> Key: SPARK-26875
> URL: https://issues.apache.org/jira/browse/SPARK-26875
> Project: Spark
>  Issue Type: New Feature
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Mike Dias
>Priority: Minor
>  Labels: pull-request-available
>
> The current behavior checks only the filename to determine whether a file 
> should be processed. I propose adding an option to also check whether the 
> file's timestamp is greater than the last time it was processed, as an 
> indication that it has been modified and has different content.
> This is useful when the source producer occasionally overwrites files with 
> new content.
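For context, a minimal sketch of the source today, whose exactly-once tracking
is keyed by filename; all options shown already exist, the path is a
placeholder, and the timestamp check is only what this ticket proposes:

```
# Today's behavior: the file stream source remembers seen files by name, so a
# file overwritten in place with the same name is not re-processed. The
# proposed option would additionally compare the file's modification time.
sdf = (
    spark.readStream
    .format("csv")
    .schema("id INT, value STRING")     # streaming file sources need a schema
    .option("maxFilesPerTrigger", 100)  # existing throttling option
    .load("/path/to/input")             # placeholder path
)
```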






[jira] [Updated] (SPARK-47731) Fix the 2b+ rows in a single rowgroup for row_index in Parquet reader

2024-04-04 Thread Thang Long Vu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47731?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Thang Long Vu updated SPARK-47731:
--
Description: 
Spark's Parquet reader has a bug where a file containing more than 2 billion 
rows in a single row group overflows the `Integer` range. This prevents Delta 
Parquet readers from exposing the row_index field as a metadata field.

It would be great to have this fixed so that more than 2 billion rows in a 
single row group are supported, and so that the row_index field can safely be 
used in the Delta Parquet readers for any functionality that depends on it.

Link to the comment in the code: 
https://github.com/delta-io/delta/blob/e3a481bd6c42a4f91686377d78ec9d9c934e27ee/spark/src/main/scala/org/apache/spark/sql/delta/DeltaParquetFileFormat.scala#L200

  was:
Spark's Parquet reader has a bug where a file containing more than 2 billion 
rows in a single row group overflows the `Integer` range. This prevents Delta 
Parquet readers from exposing the row_index field as a metadata field.

It would be great to have this fixed so that more than 2 billion rows in a 
single row group are supported, and so that the row_index field can safely be 
used in the Delta Parquet readers for any functionality that depends on it.


> Fix the 2b+ rows in a single rowgroup for row_index in Parquet reader
> -
>
> Key: SPARK-47731
> URL: https://issues.apache.org/jira/browse/SPARK-47731
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.5.0, 4.0.0
>Reporter: Thang Long Vu
>Priority: Major
>
> Spark's Parquet reader has a bug where a file containing more than 2 billion 
> rows in a single row group overflows the `Integer` range. This prevents 
> Delta Parquet readers from exposing the row_index field as a metadata field.
> It would be great to have this fixed so that more than 2 billion rows in a 
> single row group are supported, and so that the row_index field can safely 
> be used in the Delta Parquet readers for any functionality that depends on 
> it.
> Link to the comment in the code: 
> https://github.com/delta-io/delta/blob/e3a481bd6c42a4f91686377d78ec9d9c934e27ee/spark/src/main/scala/org/apache/spark/sql/delta/DeltaParquetFileFormat.scala#L200
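For reference, the metadata field in question can be seen through the hidden
`_metadata` struct of the Parquet file source (available in recent Spark
versions; the path below is a placeholder):

```
# Per-file row index exposed by the Parquet file source's _metadata struct;
# the overflow this ticket describes occurs when a single row group holds
# more rows than fit in an Integer.
df = spark.read.parquet("/path/to/table")
df.select("*", "_metadata.row_index").show(5)
```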






[jira] [Created] (SPARK-47731) Fix the 2b+ rows in a single rowgroup for row_index in Parquet reader

2024-04-04 Thread Thang Long Vu (Jira)
Thang Long Vu created SPARK-47731:
-

 Summary: Fix the 2b+ rows in a single rowgroup for row_index in 
Parquet reader
 Key: SPARK-47731
 URL: https://issues.apache.org/jira/browse/SPARK-47731
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.5.0, 4.0.0
Reporter: Thang Long Vu


Spark's Parquet reader has a bug where a file containing more than 2 billion 
rows in a single row group overflows the `Integer` range. This prevents Delta 
Parquet readers from exposing the row_index field as a metadata field.

It would be great to have this fixed so that more than 2 billion rows in a 
single row group are supported, and so that the row_index field can safely be 
used in the Delta Parquet readers for any functionality that depends on it.






[jira] [Created] (SPARK-47730) Support APP_ID and EXECUTOR_ID placeholder in labels

2024-04-04 Thread Xi Chen (Jira)
Xi Chen created SPARK-47730:
---

 Summary: Support APP_ID and EXECUTOR_ID placeholder in labels
 Key: SPARK-47730
 URL: https://issues.apache.org/jira/browse/SPARK-47730
 Project: Spark
  Issue Type: Improvement
  Components: k8s
Affects Versions: 3.5.1
Reporter: Xi Chen









[jira] [Resolved] (SPARK-47728) Document G1 Concurrent GC metrics

2024-04-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47728.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45874
[https://github.com/apache/spark/pull/45874]

> Document G1 Concurrent GC metrics
> -
>
> Key: SPARK-47728
> URL: https://issues.apache.org/jira/browse/SPARK-47728
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> This is to document G1 Concurrent GC metrics introduced with 
> https://issues.apache.org/jira/browse/SPARK-44162






[jira] [Assigned] (SPARK-47728) Document G1 Concurrent GC metrics

2024-04-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-47728:
-

Assignee: Luca Canali

> Document G1 Concurrent GC metrics
> -
>
> Key: SPARK-47728
> URL: https://issues.apache.org/jira/browse/SPARK-47728
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Luca Canali
>Assignee: Luca Canali
>Priority: Minor
>  Labels: pull-request-available
>
> This is to document G1 Concurrent GC metrics introduced with 
> https://issues.apache.org/jira/browse/SPARK-44162






[jira] [Resolved] (SPARK-47729) Get the proper default port for pyspark-connect testcases

2024-04-04 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47729?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-47729.
---
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45875
[https://github.com/apache/spark/pull/45875]

> Get the proper default port for pyspark-connect testcases
> -
>
> Key: SPARK-47729
> URL: https://issues.apache.org/jira/browse/SPARK-47729
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, Tests
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Minor
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>







[jira] [Resolved] (SPARK-47565) PySpark workers dying in daemon mode idle queue fail query

2024-04-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-47565.
--
Fix Version/s: 4.0.0
   Resolution: Fixed

Issue resolved by pull request 45635
[https://github.com/apache/spark/pull/45635]

> PySpark workers dying in daemon mode idle queue fail query
> --
>
> Key: SPARK-47565
> URL: https://issues.apache.org/jira/browse/SPARK-47565
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.2, 3.5.1, 3.3.4
>Reporter: Sebastian Hillig
>Assignee: Nikita Awasthi
>Priority: Major
>  Labels: pull-request-available
> Fix For: 4.0.0
>
>
> PySpark workers may die after entering the idle queue in 
> `PythonWorkerFactory`. This may happen because of code that runs in the 
> process or because of external factors.
> When drawn from the warm pool, such a worker will result in an I/O exception 
> on the first read/write.






[jira] [Assigned] (SPARK-47565) PySpark workers dying in daemon mode idle queue fail query

2024-04-04 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47565?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-47565:


Assignee: Nikita Awasthi

> PySpark workers dying in daemon mode idle queue fail query
> --
>
> Key: SPARK-47565
> URL: https://issues.apache.org/jira/browse/SPARK-47565
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.4.2, 3.5.1, 3.3.4
>Reporter: Sebastian Hillig
>Assignee: Nikita Awasthi
>Priority: Major
>  Labels: pull-request-available
>
> PySpark workers may die after entering the idle queue in 
> `PythonWorkerFactory`. This may happen because of code that runs in the 
> process or because of external factors.
> When drawn from the warm pool, such a worker will result in an I/O exception 
> on the first read/write.






[jira] [Assigned] (SPARK-47694) Make max message size configurable on client side

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47694:
--

Assignee: Martin Grund  (was: Apache Spark)

> Make max message size configurable on client side
> -
>
> Key: SPARK-47694
> URL: https://issues.apache.org/jira/browse/SPARK-47694
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Robert Dillitz
>Assignee: Martin Grund
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.3
>
>
> Follow-up to SPARK-42816: Make the limit configurable on the client side.






[jira] [Assigned] (SPARK-47694) Make max message size configurable on client side

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47694:
--

Assignee: Apache Spark  (was: Martin Grund)

> Make max message size configurable on client side
> -
>
> Key: SPARK-47694
> URL: https://issues.apache.org/jira/browse/SPARK-47694
> Project: Spark
>  Issue Type: Sub-task
>  Components: Connect
>Affects Versions: 3.4.1
>Reporter: Robert Dillitz
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
> Fix For: 3.4.3
>
>
> Follow-up to SPARK-42816: Make the limit configurable on the client side.






[jira] [Assigned] (SPARK-47359) StringTranslate (all collations)

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47359:
--

Assignee: (was: Apache Spark)

> StringTranslate (all collations)
> 
>
> Key: SPARK-47359
> URL: https://issues.apache.org/jira/browse/SPARK-47359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringTranslate* built-in string function 
> in Spark. First confirm the expected behaviour for this function when given 
> collated strings, and then move on to implementation and testing. One way to 
> go about this is to consider using _StringSearch_, an efficient ICU service 
> for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use cases 
> and implementation of similar functions within other open-source DBMSs, such 
> as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringTranslate* function 
> so it supports all collation types currently supported in Spark. To 
> understand what changes were introduced in order to enable full collation 
> support for other existing functions in Spark, take a look at the Spark PRs 
> and Jira tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and the 
> [Collator|http://example.com/] class, as well as _StringSearch_, using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] 
> and the [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. 
> Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].
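As background on the _StringSearch_ service mentioned above, a small
illustration of collation-aware matching, shown here through the PyICU
bindings purely for illustration (an assumption; Spark itself would use the
ICU4J equivalent on the JVM):

```
# Collation-aware substring search with ICU's StringSearch (via PyICU, an
# assumption for illustration; Spark would use the ICU4J equivalent).
from icu import Collator, Locale, StringSearch

collator = Collator.createInstance(Locale.getRoot())
collator.setStrength(Collator.PRIMARY)  # ignore case/accent differences,
                                        # like a case-insensitive collation
search = StringSearch("spark", "Apache SPARK", collator)
print(search.first())  # 7 under primary strength; StringSearch.DONE otherwise
```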






[jira] [Assigned] (SPARK-47359) StringTranslate (all collations)

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47359:
--

Assignee: Apache Spark

> StringTranslate (all collations)
> 
>
> Key: SPARK-47359
> URL: https://issues.apache.org/jira/browse/SPARK-47359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringTranslate* built-in string function 
> in Spark. First confirm the expected behaviour for this function when given 
> collated strings, and then move on to implementation and testing. One way to 
> go about this is to consider using _StringSearch_, an efficient ICU service 
> for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use cases 
> and implementation of similar functions within other open-source DBMSs, such 
> as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringTranslate* function 
> so it supports all collation types currently supported in Spark. To 
> understand what changes were introduced in order to enable full collation 
> support for other existing functions in Spark, take a look at the Spark PRs 
> and Jira tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and the 
> [Collator|http://example.com/] class, as well as _StringSearch_, using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] 
> and the [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. 
> Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Assigned] (SPARK-47359) StringTranslate (all collations)

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47359:
--

Assignee: (was: Apache Spark)

> StringTranslate (all collations)
> 
>
> Key: SPARK-47359
> URL: https://issues.apache.org/jira/browse/SPARK-47359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringTranslate* built-in string function 
> in Spark. First confirm the expected behaviour for this function when given 
> collated strings, and then move on to implementation and testing. One way to 
> go about this is to consider using _StringSearch_, an efficient ICU service 
> for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use cases 
> and implementation of similar functions within other open-source DBMSs, such 
> as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringTranslate* function 
> so it supports all collation types currently supported in Spark. To 
> understand what changes were introduced in order to enable full collation 
> support for other existing functions in Spark, take a look at the Spark PRs 
> and Jira tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and the 
> [Collator|http://example.com/] class, as well as _StringSearch_, using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] 
> and the [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. 
> Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Assigned] (SPARK-47567) StringLocate

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47567:
--

Assignee: Apache Spark

> StringLocate
> 
>
> Key: SPARK-47567
> URL: https://issues.apache.org/jira/browse/SPARK-47567
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Milan Dankovic
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringLocate* built-in string function in 
> Spark. First confirm the expected behaviour for this function when given 
> collated strings, and then move on to implementation and testing. One way to 
> go about this is to consider using _StringSearch_, an efficient ICU service 
> for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use cases 
> and implementation of similar functions within other open-source DBMSs, such 
> as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringLocate* function so 
> that it supports all collation types currently supported in Spark. To 
> understand what changes were introduced in order to enable full collation 
> support for other existing functions in Spark, take a look at the Spark PRs 
> and Jira tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and the 
> [Collator|http://example.com/] class, as well as _StringSearch_, using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] 
> and the [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. 
> Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Assigned] (SPARK-47567) StringLocate

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47567:
--

Assignee: Apache Spark

> StringLocate
> 
>
> Key: SPARK-47567
> URL: https://issues.apache.org/jira/browse/SPARK-47567
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Milan Dankovic
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringLocate* built-in string function in 
> Spark. First confirm the expected behaviour for this function when given 
> collated strings, and then move on to implementation and testing. One way to 
> go about this is to consider using _StringSearch_, an efficient ICU service 
> for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use cases 
> and implementation of similar functions within other open-source DBMSs, such 
> as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringLocate* function so 
> that it supports all collation types currently supported in Spark. To 
> understand what changes were introduced in order to enable full collation 
> support for other existing functions in Spark, take a look at the Spark PRs 
> and Jira tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and the 
> [Collator|http://example.com/] class, as well as _StringSearch_, using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] 
> and the [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. 
> Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Assigned] (SPARK-47567) StringLocate

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47567:
--

Assignee: (was: Apache Spark)

> StringLocate
> 
>
> Key: SPARK-47567
> URL: https://issues.apache.org/jira/browse/SPARK-47567
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Milan Dankovic
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringLocate* built-in string function in 
> Spark. First confirm the expected behaviour for this function when given 
> collated strings, and then move on to implementation and testing. One way to 
> go about this is to consider using _StringSearch_, an efficient ICU service 
> for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use cases 
> and implementation of similar functions within other open-source DBMSs, such 
> as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringLocate* function so 
> that it supports all collation types currently supported in Spark. To 
> understand what changes were introduced in order to enable full collation 
> support for other existing functions in Spark, take a look at the Spark PRs 
> and Jira tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and the 
> [Collator|http://example.com/] class, as well as _StringSearch_, using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] 
> and the [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. 
> Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Assigned] (SPARK-47359) StringTranslate (all collations)

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47359?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47359:
--

Assignee: Apache Spark

> StringTranslate (all collations)
> 
>
> Key: SPARK-47359
> URL: https://issues.apache.org/jira/browse/SPARK-47359
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Uroš Bojanić
>Assignee: Apache Spark
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringTranslate* built-in string function 
> in Spark. First confirm the expected behaviour for this function when given 
> collated strings, and then move on to implementation and testing. One way to 
> go about this is to consider using _StringSearch_, an efficient ICU service 
> for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use cases 
> and implementation of similar functions within other open-source DBMSs, such 
> as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringTranslate* function 
> so it supports all collation types currently supported in Spark. To 
> understand what changes were introduced in order to enable full collation 
> support for other existing functions in Spark, take a look at the Spark PRs 
> and Jira tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and the 
> [Collator|http://example.com/] class, as well as _StringSearch_, using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] 
> and the [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. 
> Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Assigned] (SPARK-47567) StringLocate

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47567?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot reassigned SPARK-47567:
--

Assignee: (was: Apache Spark)

> StringLocate
> 
>
> Key: SPARK-47567
> URL: https://issues.apache.org/jira/browse/SPARK-47567
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 4.0.0
>Reporter: Milan Dankovic
>Priority: Major
>  Labels: pull-request-available
>
> Enable collation support for the *StringLocate* built-in string function in 
> Spark. First confirm the expected behaviour for this function when given 
> collated strings, and then move on to implementation and testing. One way to 
> go about this is to consider using _StringSearch_, an efficient ICU service 
> for string matching. Implement the corresponding unit tests 
> (CollationStringExpressionsSuite) and E2E tests (CollationSuite) to reflect 
> how this function should be used with collation in SparkSQL, and feel free to 
> use your chosen Spark SQL editor to experiment with the existing functions to 
> learn more about how they work. In addition, look into the possible use cases 
> and implementation of similar functions within other open-source DBMSs, such 
> as [PostgreSQL|https://www.postgresql.org/docs/].
>  
> The goal for this Jira ticket is to implement the *StringLocate* function so 
> that it supports all collation types currently supported in Spark. To 
> understand what changes were introduced in order to enable full collation 
> support for other existing functions in Spark, take a look at the Spark PRs 
> and Jira tickets for completed tasks in this parent (for example: Contains, 
> StartsWith, EndsWith).
>  
> Read more about ICU [Collation Concepts|http://example.com/] and the 
> [Collator|http://example.com/] class, as well as _StringSearch_, using the 
> [ICU user 
> guide|https://unicode-org.github.io/icu/userguide/collation/string-search.html] 
> and the [ICU 
> docs|https://unicode-org.github.io/icu-docs/apidoc/released/icu4j/com/ibm/icu/text/StringSearch.html]. 
> Also, refer to the Unicode Technical Standard for string 
> [searching|https://www.unicode.org/reports/tr10/#Searching] and 
> [collation|https://www.unicode.org/reports/tr35/tr35-collation.html#Collation_Type_Fallback].






[jira] [Updated] (SPARK-47593) Connector module: Migrate logWarn with variables to structured logging framework

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47593?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47593:
---
Labels: pull-request-available  (was: )

> Connector module: Migrate logWarn with variables to structured logging 
> framework
> 
>
> Key: SPARK-47593
> URL: https://issues.apache.org/jira/browse/SPARK-47593
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47586) Hive module: Migrate logError with variables to structured logging framework

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47586?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47586:
---
Labels: pull-request-available  (was: )

> Hive module: Migrate logError with variables to structured logging framework
> 
>
> Key: SPARK-47586
> URL: https://issues.apache.org/jira/browse/SPARK-47586
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 4.0.0
>Reporter: Gengliang Wang
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47727) Make SparkConf root-level for both SparkSession and SparkContext

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47727?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47727:
---
Labels: pull-request-available  (was: )

> Make SparkConf root-level for both SparkSession and SparkContext
> --
>
> Key: SPARK-47727
> URL: https://issues.apache.org/jira/browse/SPARK-47727
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 4.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>  Labels: pull-request-available
>







[jira] [Updated] (SPARK-47728) Document G1 Concurrent GC metrics

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47728?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47728:
---
Labels: pull-request-available  (was: )

> Document G1 Concurrent GC metrics
> -
>
> Key: SPARK-47728
> URL: https://issues.apache.org/jira/browse/SPARK-47728
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 4.0.0
>Reporter: Luca Canali
>Priority: Minor
>  Labels: pull-request-available
>
> This is to document G1 Concurrent GC metrics introduced with 
> https://issues.apache.org/jira/browse/SPARK-44162






[jira] [Created] (SPARK-47728) Document G1 Concurrent GC metrics

2024-04-04 Thread Luca Canali (Jira)
Luca Canali created SPARK-47728:
---

 Summary: Document G1 Concurrent GC metrics
 Key: SPARK-47728
 URL: https://issues.apache.org/jira/browse/SPARK-47728
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 4.0.0
Reporter: Luca Canali


This is to document G1 Concurrent GC metrics introduced with 
https://issues.apache.org/jira/browse/SPARK-44162






[jira] [Created] (SPARK-47727) Make SparkConf root-level for both SparkSession and SparkContext

2024-04-04 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-47727:


 Summary: Make SparkConf root-level for both SparkSession and SparkContext
 Key: SPARK-47727
 URL: https://issues.apache.org/jira/browse/SPARK-47727
 Project: Spark
  Issue Type: Sub-task
  Components: PySpark
Affects Versions: 4.0.0
Reporter: Hyukjin Kwon









[jira] [Updated] (SPARK-47726) Document push-based shuffle metrics

2024-04-04 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-47726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-47726:
---
Labels: pull-request-available  (was: )

> Document push-based shuffle metrics
> ---
>
> Key: SPARK-47726
> URL: https://issues.apache.org/jira/browse/SPARK-47726
> Project: Spark
>  Issue Type: Documentation
>  Components: Documentation
>Affects Versions: 3.4.2, 4.0.0, 3.5.1
>Reporter: Luca Canali
>Priority: Minor
>  Labels: pull-request-available
>
> This is to add documentation for the metrics related to push-based shuffle. 
> It's a follow-up documentation ticket from: 
> https://issues.apache.org/jira/browse/SPARK-36620
> Related to this, note also: https://issues.apache.org/jira/browse/SPARK-42203






[jira] [Created] (SPARK-47726) Document push-based shuffle metrics

2024-04-04 Thread Luca Canali (Jira)
Luca Canali created SPARK-47726:
---

 Summary: Document push-based shuffle metrics
 Key: SPARK-47726
 URL: https://issues.apache.org/jira/browse/SPARK-47726
 Project: Spark
  Issue Type: Documentation
  Components: Documentation
Affects Versions: 3.5.1, 3.4.2, 4.0.0
Reporter: Luca Canali


This is to add documentation for the metrics related to push-based shuffle. 
It's a follow-up documentation ticket from: 
https://issues.apache.org/jira/browse/SPARK-36620

Related to this, note also: https://issues.apache.org/jira/browse/SPARK-42203


