MaxGekk commented on PR #44939:
URL: https://github.com/apache/spark/pull/44939#issuecomment-1919608098

   @dtenedor Could you remove the first part of the test:
   ```scala
  test("SPARK-46890: CSV fails on a column with default and without enforcing schema") {
       withTable("Products") {
         spark.sql(
           s"""
              |CREATE TABLE IF NOT EXISTS Products (
              |  product_id INT,
              |  name STRING,
              |  price FLOAT default 0.0,
              |  quantity INT default 0
              |)
              |USING CSV
              |OPTIONS (
              |  header 'true',
              |  inferSchema 'false',
              |  enforceSchema 'false',
              |  path "${testFile(productsFile)}"
              |)
          """.stripMargin)
         checkAnswer(
           sql("SELECT price FROM Products"),
           Seq(
             Row(0.50),
             Row(0.25),
             Row(0.75)))
       }
     }
   ```
   Set a breakpoint inside the main constructor of `CSVHeaderChecker`:
   <img width="952" alt="Screenshot 2024-01-31 at 20 43 42" src="https://github.com/apache/spark/assets/1580697/47f0e377-7ab7-4d89-b68c-b240b81ae036">
   You should see where the breakpoint is hit from:
   
   <img width="967" alt="Screenshot 2024-01-31 at 20 45 49" src="https://github.com/apache/spark/assets/1580697/09bfe2cf-a3ae-473a-9c93-35bb54a0f924">
   
   `CSVFileFormat` is the V1 implementation. It seems we fall back from the V2 to the V1 datasource implementation for some reason.
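
   One likely explanation for the fallback, as a hedged guess: Spark keeps a config, `spark.sql.sources.useV1SourceList`, listing the built-in sources that should use the V1 code path, and `csv` is in that list by default. A minimal sketch to inspect and override it (the `SparkSession` setup here is illustrative, not from the PR):
   ```scala
   import org.apache.spark.sql.SparkSession

   val spark = SparkSession.builder()
     .master("local[1]")
     .getOrCreate()

   // Shows which sources fall back to V1; by default this includes "csv".
   println(spark.conf.get("spark.sql.sources.useV1SourceList"))

   // Clearing "csv" from the list should route reads through the V2 CSV source,
   // which would avoid hitting the V1 CSVFileFormat breakpoint above.
   spark.conf.set("spark.sql.sources.useV1SourceList", "")
   ```
   If the breakpoint is no longer hit after clearing the list, that would confirm the fallback is driven by this config rather than by the query shape.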


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

