[jira] [Updated] (SPARK-42439) Job description in v2 FileWrites can have the wrong committer

2023-09-12 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated SPARK-42439:
---
Labels: bug pull-request-available  (was: bug)

> Job description in v2 FileWrites can have the wrong committer
> -------------------------------------------------------------
>
> Key: SPARK-42439
> URL: https://issues.apache.org/jira/browse/SPARK-42439
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.3.1
>Reporter: Lorenzo Martini
>Priority: Minor
>  Labels: bug, pull-request-available
>
> There is a difference in behavior between v1 and v2 writes in the order of
> events when configuring the file writer and the committer.
> v1:
>  # writer.prepareWrite()
>  # committer.setupJob()
> v2:
>  # committer.setupJob()
>  # writer.prepareWrite()
>  
> This happens because the `prepareWrite()` call (the one that performs
> `job.setOutputFormatClass(classOf[ParquetOutputFormat[Row]])`) runs as part of
> `createWriteJobDescription`, which is a `lazy val` inside `toBatch` and is
> therefore only evaluated after `committer.setupJob`, at the end of `toBatch`.
> This causes issues when evaluating the committer, as some configuration may
> still be missing; for example, the aforementioned output format class is not
> yet set, so the committer is set up for a generic write instead of a Parquet
> write.
>  
> The fix is simple: make the `createWriteJobDescription` call non-lazy.
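
The evaluation-order problem described above can be sketched in plain Scala. This is a minimal illustration only, not Spark's actual classes; the object, method, and event names are hypothetical stand-ins for `FileWrite.toBatch`, `prepareWrite`, and `committer.setupJob`:

```scala
// Minimal sketch of the lazy-val ordering bug (hypothetical names, not Spark code).
object LazyOrdering {
  // Records the order in which the writer and committer steps actually run.
  val events = scala.collection.mutable.ArrayBuffer.empty[String]

  def prepareWrite(): Unit = events += "prepareWrite" // e.g. sets the output format class
  def setupJob(): Unit     = events += "setupJob"     // committer reads job configuration

  // Buggy shape: the description is a `lazy val`, so its side effects
  // (prepareWrite) are deferred until first access, after setupJob.
  def toBatchLazy(): Unit = {
    lazy val description = { prepareWrite(); "desc" } // nothing runs yet
    setupJob()                                        // committer configured first (the bug)
    description                                       // prepareWrite only runs here
  }

  // Fixed shape: a plain `val` evaluates eagerly, so prepareWrite
  // runs before the committer is set up.
  def toBatchEager(): Unit = {
    val description = { prepareWrite(); "desc" }      // runs immediately
    setupJob()                                        // committer sees the configured job
    description
  }
}
```

Calling `toBatchLazy()` records `setupJob` before `prepareWrite`, while `toBatchEager()` records them in the intended order, which is the essence of making the job-description creation non-lazy.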



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-42439) Job description in v2 FileWrites can have the wrong committer

2023-02-17 Thread Lorenzo Martini (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lorenzo Martini updated SPARK-42439:

Issue Type: Bug  (was: Improvement)







[jira] [Updated] (SPARK-42439) Job description in v2 FileWrites can have the wrong committer

2023-02-14 Thread Lorenzo Martini (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-42439?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lorenzo Martini updated SPARK-42439:

Labels: bug  (was: )



