[jira] [Resolved] (SPARK-46991) Replace IllegalArgumentException by SparkIllegalArgumentException in catalyst
[ https://issues.apache.org/jira/browse/SPARK-46991?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-46991.
-----------------------------------
    Resolution: Fixed

Issue resolved by pull request 45033
[https://github.com/apache/spark/pull/45033]

> Replace IllegalArgumentException by SparkIllegalArgumentException in catalyst
> ------------------------------------------------------------------------------
>
> Key: SPARK-46991
> URL: https://issues.apache.org/jira/browse/SPARK-46991
> Project: Spark
> Issue Type: Sub-task
> Components: SQL
> Affects Versions: 4.0.0
> Reporter: Max Gekk
> Assignee: Max Gekk
> Priority: Major
> Labels: pull-request-available
> Fix For: 4.0.0
>
> Replace all IllegalArgumentException by SparkIllegalArgumentException in the
> Catalyst code base, and introduce new legacy error classes with the
> _LEGACY_ERROR_TEMP_ prefix.

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
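As a rough illustration of the migration this issue describes, the sketch below shows the general before/after pattern used inside the Spark code base. The method name, message, and the legacy error class number are hypothetical placeholders, not values taken from the actual pull request.

{code:java}
import org.apache.spark.SparkIllegalArgumentException

// Before: a plain JVM exception with an ad-hoc, untracked message.
def checkPrecision(precision: Int): Unit = {
  if (precision <= 0) {
    throw new IllegalArgumentException(s"Invalid precision: $precision")
  }
}

// After: a SparkIllegalArgumentException bound to a legacy error class, so the
// message text lives in the centralized error-class definitions instead of the
// call site. "_LEGACY_ERROR_TEMP_XXXX" is a placeholder for a real registered class.
def checkPrecisionMigrated(precision: Int): Unit = {
  if (precision <= 0) {
    throw new SparkIllegalArgumentException(
      errorClass = "_LEGACY_ERROR_TEMP_XXXX",
      messageParameters = Map("precision" -> precision.toString))
  }
}
{code}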
[jira] [Resolved] (SPARK-47022) Fix `connect/client/jvm` to have explicit `commons-lang3` test dependency
[ https://issues.apache.org/jira/browse/SPARK-47022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47022.
-----------------------------------
    Fix Version/s: 3.5.1
       Resolution: Fixed

Issue resolved by pull request 45081
[https://github.com/apache/spark/pull/45081]

> Fix `connect/client/jvm` to have explicit `commons-lang3` test dependency
> --------------------------------------------------------------------------
>
> Key: SPARK-47022
> URL: https://issues.apache.org/jira/browse/SPARK-47022
> Project: Spark
> Issue Type: Bug
> Components: Connect, Tests
> Affects Versions: 3.5.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.5.1
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-47022) Fix `connect/client/jvm` to have explicit `commons-lang3` test dependency
[ https://issues.apache.org/jira/browse/SPARK-47022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun reassigned SPARK-47022:
-------------------------------------
    Assignee: Dongjoon Hyun

> Fix `connect/client/jvm` to have explicit `commons-lang3` test dependency
> --------------------------------------------------------------------------
>
> Key: SPARK-47022
> URL: https://issues.apache.org/jira/browse/SPARK-47022
> Project: Spark
> Issue Type: Bug
> Components: Connect, Tests
> Affects Versions: 3.5.0
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-47022) Fix `connect/client/jvm` to have explicit `commons-lang3` test dependency
[ https://issues.apache.org/jira/browse/SPARK-47022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

ASF GitHub Bot updated SPARK-47022:
-----------------------------------
    Labels: pull-request-available  (was: )

> Fix `connect/client/jvm` to have explicit `commons-lang3` test dependency
> --------------------------------------------------------------------------
>
> Key: SPARK-47022
> URL: https://issues.apache.org/jira/browse/SPARK-47022
> Project: Spark
> Issue Type: Bug
> Components: Connect, Tests
> Affects Versions: 3.5.0
> Reporter: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-47022) Fix `connect/client/jvm` to have explicit `commons-lang3` test dependency
Dongjoon Hyun created SPARK-47022:
-------------------------------------

    Summary: Fix `connect/client/jvm` to have explicit `commons-lang3` test dependency
    Key: SPARK-47022
    URL: https://issues.apache.org/jira/browse/SPARK-47022
    Project: Spark
    Issue Type: Bug
    Components: Connect, Tests
    Affects Versions: 3.5.0
    Reporter: Dongjoon Hyun

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-47021) Fix `kvstore` module to have explicit `commons-lang3` test dependency
[ https://issues.apache.org/jira/browse/SPARK-47021?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Dongjoon Hyun resolved SPARK-47021.
-----------------------------------
    Fix Version/s: 3.4.3
                   3.5.1
                   4.0.0
       Resolution: Fixed

Issue resolved by pull request 45080
[https://github.com/apache/spark/pull/45080]

> Fix `kvstore` module to have explicit `commons-lang3` test dependency
> ----------------------------------------------------------------------
>
> Key: SPARK-47021
> URL: https://issues.apache.org/jira/browse/SPARK-47021
> Project: Spark
> Issue Type: Bug
> Components: Build, Tests
> Affects Versions: 3.3.0, 3.4.2, 3.5.0, 3.3.4
> Reporter: Dongjoon Hyun
> Assignee: Dongjoon Hyun
> Priority: Major
> Labels: pull-request-available
> Fix For: 3.4.3, 3.5.1, 4.0.0
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
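Both this issue and SPARK-47022 above are the same kind of fix: a module whose tests use `commons-lang3` now declares that dependency explicitly with test scope instead of relying on it transitively. Spark's actual build declares this in the module's Maven pom.xml; purely as an illustrative sketch, an equivalent sbt-style declaration might look like the following (the version value is a placeholder and would follow the parent build's dependency management):

{code:java}
// Hypothetical sbt-style equivalent; Spark itself declares this in the module's
// pom.xml with <scope>test</scope>. The version below is a placeholder.
val commonsLang3Version = "3.14.0" // placeholder; keep in sync with the parent build
libraryDependencies += "org.apache.commons" % "commons-lang3" % commonsLang3Version % Test
{code}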
[jira] [Resolved] (SPARK-47019) AQE dynamic cache partitioning causes SortMergeJoin to result in data loss
[ https://issues.apache.org/jira/browse/SPARK-47019?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Ridvan Appa Bugis resolved SPARK-47019.
---------------------------------------
    Fix Version/s: 3.5.1
       Resolution: Fixed

> AQE dynamic cache partitioning causes SortMergeJoin to result in data loss
> ---------------------------------------------------------------------------
>
> Key: SPARK-47019
> URL: https://issues.apache.org/jira/browse/SPARK-47019
> Project: Spark
> Issue Type: Bug
> Components: Optimizer, Spark Core
> Affects Versions: 3.5.0
> Environment: Tested in 3.5.0
> Reproduced on, so far:
> * kubernetes deployment
> * docker cluster deployment
> Local Cluster:
> * master
> * worker1 (2/2G)
> * worker2 (1/1G)
> Reporter: Ridvan Appa Bugis
> Priority: Blocker
> Labels: DAG, caching, correctness, data-loss, dynamic_allocation, inconsistency, partitioning
> Fix For: 3.5.1
>
> Attachments: Screenshot 2024-02-07 at 20.09.44.png, Screenshot 2024-02-07 at 20.10.07.png, eventLogs-app-20240207175940-0023.zip, testdata.zip
>
> It seems we have encountered an issue with Spark AQE's dynamic cache partitioning which causes incorrect *count* output values and data loss.
> A similar issue could not be found, so I am creating this ticket to raise awareness.
>
> Preconditions:
> - Set up a cluster as per the environment specification
> - Prepare test data (or data large enough to trigger a read by both executors)
> Steps to reproduce:
> - Read parent
> - Self-join parent
> - Cache + materialize parent
> - Join parent with child
>
> Performing a self-join over a parentDF, then caching + materialising the DF, and then joining it with a childDF results in an *incorrect* count value and {*}missing data{*}.
>
> Performing a *repartition* seems to fix the issue, most probably due to the rearrangement of the underlying partitions and a statistics update.
>
> This behaviour is observed on a multi-worker cluster with a job running 2 executors (1 per worker), when the data file is large enough to be read by both executors.
> Not reproducible in local mode.
>
> Circumvention:
> So far, this can be alleviated by disabling _spark.sql.optimizer.canChangeCachedPlanOutputPartitioning_ or by performing a repartition, but that is not a fix of the root cause.
>
> This issue is dangerous considering that the data loss occurs silently and, in the absence of proper checks, can lead to wrong behaviour/results down the line. So we have labeled it as a blocker.
>
> There seems to be a file-size threshold after which data loss is observed (possibly implying that it happens when both executors start reading the data file).
>
> Minimal example:
> {code:java}
> // Read parent
> val parentData = session.read.format("avro").load("/data/shared/test/parent")
> // Self join parent and cache + materialize
> val parent = parentData.join(parentData, Seq("PID")).cache()
> parent.count()
> // Read child
> val child = session.read.format("avro").load("/data/shared/test/child")
> // Basic join
> val resultBasic = child.join(
>   parent,
>   parent("PID") === child("PARENT_ID")
> )
> // Count: 16479 (Wrong)
> println(s"Count no repartition: ${resultBasic.count()}")
> // Repartition parent join
> val resultRepartition = child.join(
>   parent.repartition(),
>   parent("PID") === child("PARENT_ID")
> )
> // Count: 50094 (Correct)
> println(s"Count with repartition: ${resultRepartition.count()}") {code}
>
> Invalid count-only DAG:
> !Screenshot 2024-02-07 at 20.10.07.png|width=519,height=853!
> Valid repartition DAG:
> !Screenshot 2024-02-07 at 20.09.44.png|width=368,height=1219!
>
> Spark submit for this job:
> {code:java}
> spark-submit \
>   --class ExampleApp \
>   --packages org.apache.spark:spark-avro_2.12:3.5.0 \
>   --deploy-mode cluster \
>   --master spark://spark-master:6066 \
>   --conf spark.sql.autoBroadcastJoinThreshold=-1 \
>   --conf spark.cores.max=3 \
>   --driver-cores 1 \
>   --driver-memory 1g \
>   --executor-cores 1 \
>   --executor-memory 1g \
>   /path/to/test.jar
> {code}
> The cluster should be set up as follows (worker1(m+e), worker2(e)) so as to split the executors across two workers.
> I have prepared a simple GitHub repository which contains the compilable example above.
> [https://github.com/ridvanappabugis/spark-3.5-issue]
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
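For reference, the workaround described in the circumvention section can be applied when constructing the session. Below is a minimal sketch; the application name is illustrative, and the config key is the one named in the issue description.

{code:java}
import org.apache.spark.sql.SparkSession

// Workaround from the description: stop AQE from changing the output partitioning
// of cached plans. This avoids the wrong counts but does not fix the root cause.
val session = SparkSession.builder()
  .appName("ExampleApp")
  .config("spark.sql.optimizer.canChangeCachedPlanOutputPartitioning", "false")
  .getOrCreate()

// Alternative from the description: repartition the cached side before joining it, e.g.
// val parent = parentData.join(parentData, Seq("PID")).repartition().cache()
{code}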
[jira] [Commented] (SPARK-47019) AQE dynamic cache partitioning causes SortMergeJoin to result in data loss
[ https://issues.apache.org/jira/browse/SPARK-47019?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17816393#comment-17816393 ]

Ridvan Appa Bugis commented on SPARK-47019:
-------------------------------------------
[~bersprockets] thanks for the quick investigation. I will be resolving this issue in that case.
Looking forward to the 3.5.1 bump!

> AQE dynamic cache partitioning causes SortMergeJoin to result in data loss
> ---------------------------------------------------------------------------
>
> Key: SPARK-47019
> URL: https://issues.apache.org/jira/browse/SPARK-47019
> Project: Spark
> Issue Type: Bug
> Components: Optimizer, Spark Core
> Affects Versions: 3.5.0
> Environment: Tested in 3.5.0
> Reproduced on, so far:
> * kubernetes deployment
> * docker cluster deployment
> Local Cluster:
> * master
> * worker1 (2/2G)
> * worker2 (1/1G)
> Reporter: Ridvan Appa Bugis
> Priority: Blocker
> Labels: DAG, caching, correctness, data-loss, dynamic_allocation, inconsistency, partitioning
> Attachments: Screenshot 2024-02-07 at 20.09.44.png, Screenshot 2024-02-07 at 20.10.07.png, eventLogs-app-20240207175940-0023.zip, testdata.zip
>
> It seems we have encountered an issue with Spark AQE's dynamic cache partitioning which causes incorrect *count* output values and data loss.
> A similar issue could not be found, so I am creating this ticket to raise awareness.
>
> Preconditions:
> - Set up a cluster as per the environment specification
> - Prepare test data (or data large enough to trigger a read by both executors)
> Steps to reproduce:
> - Read parent
> - Self-join parent
> - Cache + materialize parent
> - Join parent with child
>
> Performing a self-join over a parentDF, then caching + materialising the DF, and then joining it with a childDF results in an *incorrect* count value and {*}missing data{*}.
>
> Performing a *repartition* seems to fix the issue, most probably due to the rearrangement of the underlying partitions and a statistics update.
>
> This behaviour is observed on a multi-worker cluster with a job running 2 executors (1 per worker), when the data file is large enough to be read by both executors.
> Not reproducible in local mode.
>
> Circumvention:
> So far, this can be alleviated by disabling _spark.sql.optimizer.canChangeCachedPlanOutputPartitioning_ or by performing a repartition, but that is not a fix of the root cause.
>
> This issue is dangerous considering that the data loss occurs silently and, in the absence of proper checks, can lead to wrong behaviour/results down the line. So we have labeled it as a blocker.
>
> There seems to be a file-size threshold after which data loss is observed (possibly implying that it happens when both executors start reading the data file).
>
> Minimal example:
> {code:java}
> // Read parent
> val parentData = session.read.format("avro").load("/data/shared/test/parent")
> // Self join parent and cache + materialize
> val parent = parentData.join(parentData, Seq("PID")).cache()
> parent.count()
> // Read child
> val child = session.read.format("avro").load("/data/shared/test/child")
> // Basic join
> val resultBasic = child.join(
>   parent,
>   parent("PID") === child("PARENT_ID")
> )
> // Count: 16479 (Wrong)
> println(s"Count no repartition: ${resultBasic.count()}")
> // Repartition parent join
> val resultRepartition = child.join(
>   parent.repartition(),
>   parent("PID") === child("PARENT_ID")
> )
> // Count: 50094 (Correct)
> println(s"Count with repartition: ${resultRepartition.count()}") {code}
>
> Invalid count-only DAG:
> !Screenshot 2024-02-07 at 20.10.07.png|width=519,height=853!
> Valid repartition DAG:
> !Screenshot 2024-02-07 at 20.09.44.png|width=368,height=1219!
>
> Spark submit for this job:
> {code:java}
> spark-submit \
>   --class ExampleApp \
>   --packages org.apache.spark:spark-avro_2.12:3.5.0 \
>   --deploy-mode cluster \
>   --master spark://spark-master:6066 \
>   --conf spark.sql.autoBroadcastJoinThreshold=-1 \
>   --conf spark.cores.max=3 \
>   --driver-cores 1 \
>   --driver-memory 1g \
>   --executor-cores 1 \
>   --executor-memory 1g \
>   /path/to/test.jar
> {code}
> The cluster should be set up as follows (worker1(m+e), worker2(e)) so as to split the executors across two workers.
> I have prepared a simple GitHub repository which contains the compilable example above.
> [https://github.com/ridvanappabugis/spark-3.5-issue]
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org