Max Gekk created SPARK-46890:
--------------------------------

             Summary: CSV fails on a column with default and without enforcing schema
                 Key: SPARK-46890
                 URL: https://issues.apache.org/jira/browse/SPARK-46890
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 4.0.0
            Reporter: Max Gekk
            Assignee: Max Gekk


When we create a table using CSV on an existing file with a header, queries on the table fail if:
- a column has a default value, and
- enforceSchema is false, so the CSV header is taken into account.

The example below shows the issue:

{code:sql}
CREATE TABLE IF NOT EXISTS products (
  product_id INT,
  name STRING,
  price FLOAT default 0.0,
  quantity INT default 0
)
USING CSV
OPTIONS (
  header 'true',
  inferSchema 'false',
  enforceSchema 'false',
  path '/Users/maximgekk/tmp/products.csv'
);
{code}

The CSV file products.csv:
{code}
product_id,name,price,quantity
1,Apple,0.50,100
2,Banana,0.25,200
3,Orange,0.75,50
{code}
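
To make the report easier to reproduce, the sample file can be generated with a short script. This is a minimal sketch: the helper name `write_products_csv` and the temporary directory are assumptions for illustration, and the `path` option in the CREATE TABLE statement would need to point at whatever location is printed.

```python
import csv
import os
import tempfile

# Header and rows copied from the report. The header line comes first
# because the table is declared with header 'true'.
HEADER = ["product_id", "name", "price", "quantity"]
ROWS = [
    ["1", "Apple", "0.50", "100"],
    ["2", "Banana", "0.25", "200"],
    ["3", "Orange", "0.75", "50"],
]


def write_products_csv(path):
    """Write the sample file from the report to `path` (hypothetical helper)."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f, lineterminator="\n")
        writer.writerow(HEADER)
        writer.writerows(ROWS)
    return path


# Assumption: a temporary directory stands in for the reporter's local path.
csv_path = write_products_csv(os.path.join(tempfile.mkdtemp(), "products.csv"))
print(csv_path)
```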

The query fails:

{code:sql}
spark-sql (default)> SELECT price FROM products;
24/01/28 11:43:09 ERROR Executor: Exception in task 0.0 in stage 8.0 (TID 6)
java.lang.IllegalArgumentException: Number of column in CSV header is not equal to number of fields in the schema:
 Header length: 4, schema size: 1
CSV file: file:///Users/maximgekk/tmp/products.csv
{code}
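
The numbers in the error (header length 4, schema size 1) are consistent with the header check receiving only the selected column rather than the full table schema. That is a reading of the message, not a confirmed diagnosis. The sketch below is a pure-Python illustration of such a length check; `check_header` and the pruned schema are hypothetical and do not reproduce Spark's actual code.

```python
def check_header(header, schema_fields, enforce_schema):
    """Hypothetical stand-in for a header-vs-schema length check."""
    if enforce_schema:
        # With enforceSchema set to true the header is ignored, so nothing is compared.
        return
    if len(header) != len(schema_fields):
        raise ValueError(
            f"Header length: {len(header)}, schema size: {len(schema_fields)}"
        )


header = ["product_id", "name", "price", "quantity"]
full_schema = ["product_id", "name", "price", "quantity"]
# Assumption: only the column named in SELECT price reaches the check.
pruned_schema = ["price"]

# The full schema matches the header, so this call returns without error.
check_header(header, full_schema, enforce_schema=False)

message = None
try:
    check_header(header, pruned_schema, enforce_schema=False)
except ValueError as e:
    message = str(e)
    print(message)  # prints "Header length: 4, schema size: 1"
```

Under this assumption, the same pruned schema passes when `enforce_schema=True`, because the comparison is skipped entirely.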





