[jira] [Assigned] (SPARK-48241) CSV parsing failure with char/varchar type columns

Wenchen Fan (Jira) Mon, 13 May 2024 22:08:05 -0700


     [ https://issues.apache.org/jira/browse/SPARK-48241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Wenchen Fan reassigned SPARK-48241:
-----------------------------------

    Assignee: Jiayi Liu

> CSV parsing failure with char/varchar type columns
> --------------------------------------------------
>
>                 Key: SPARK-48241
>                 URL: https://issues.apache.org/jira/browse/SPARK-48241
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 3.5.1
>            Reporter: Jiayi Liu
>            Assignee: Jiayi Liu
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 4.0.0
>
>
> Selecting from a CSV table that contains char/varchar columns fails with
> the following error:
> {code:java}
> java.lang.IllegalArgumentException: requirement failed: requiredSchema (struct<id:int,name:string>) should be the subset of dataSchema (struct<id:int,name:string>).
>     at scala.Predef$.require(Predef.scala:281)
>     at org.apache.spark.sql.catalyst.csv.UnivocityParser.<init>(UnivocityParser.scala:56)
>     at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.$anonfun$buildReader$2(CSVFileFormat.scala:127)
>     at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:155)
>     at org.apache.spark.sql.execution.datasources.FileFormat$$anon$1.apply(FileFormat.scala:140)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.org$apache$spark$sql$execution$datasources$FileScanRDD$$anon$$readCurrentFile(FileScanRDD.scala:231)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:293)
>     at org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:125)
> {code}
> The error occurs because the StringType fields in the dataSchema and
> requiredSchema passed to UnivocityParser are not equal. The StructField in
> dataSchema carries metadata that the corresponding field in requiredSchema
> lacks, so the subset check fails even though both schemas print identically
> in the message. The fix is to retain the metadata when resolving the schema.
>  
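
The mismatch described in the issue can be illustrated with a small, dependency-free Python sketch. It is a simplified model, not Spark code: the Field class, the describe and is_subset helpers, and the "original_type" metadata key are all hypothetical names chosen for illustration. It assumes, as the description implies, that the subset check compares fields by full equality (name, type, and metadata), while the error message renders only names and types.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Field:
    """Simplified stand-in for a schema field (not Spark's StructField)."""
    name: str
    data_type: str
    metadata: Dict[str, str] = field(default_factory=dict)


def describe(schema: List[Field]) -> str:
    """Render names and types only, so metadata is invisible in the text."""
    return "struct<" + ",".join(f"{f.name}:{f.data_type}" for f in schema) + ">"


def is_subset(required: List[Field], data: List[Field]) -> bool:
    """A field matches only if name, type, and metadata are all equal."""
    return all(f in data for f in required)


# The metadata key and value are illustrative assumptions.
data_schema = [
    Field("id", "int"),
    Field("name", "string", {"original_type": "varchar(10)"}),
]
# Same columns, but the metadata on "name" was dropped during resolution.
required_schema = [Field("id", "int"), Field("name", "string")]

same_text = describe(required_schema) == describe(data_schema)
subset_before_fix = is_subset(required_schema, data_schema)

# Retaining the metadata on the required field makes the check pass.
fixed_required = [
    Field("id", "int"),
    Field("name", "string", {"original_type": "varchar(10)"}),
]
subset_after_fix = is_subset(fixed_required, data_schema)

print(describe(data_schema))
print(same_text, subset_before_fix, subset_after_fix)
```

In this model the two schemas render to the same text while the subset check fails, which matches the confusing error message above; carrying the metadata through to the required schema restores equality.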



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
