AbinayaJayaprakasam commented on PR #53221: URL: https://github.com/apache/spark/pull/53221#issuecomment-3577528867
**What problem does this solve**

Parquet files with `TIMESTAMP(NANOS,false)` columns exist and are completely unreadable in Spark. SPARK-40819 only fixed `TIMESTAMP(NANOS,true)`, behind a config flag, and no workaround exists for users.

**Testing procedure**

Step 1: Generated a test Parquet file:

<img width="449" height="510" alt="image" src="https://github.com/user-attachments/assets/6a3bfd48-6455-42dc-b7e0-5fc80d7ac171" />

Step 2: Read it with PySpark:

<img width="667" height="441" alt="image" src="https://github.com/user-attachments/assets/64b7da58-dfbb-40ec-adaf-7a3802c64986" />

Step 3: Before the fix:

<img width="452" height="315" alt="image" src="https://github.com/user-attachments/assets/156d6542-0b4a-4e99-a87f-895abc39ddc1" />

Step 4: After the fix:

<img width="452" height="346" alt="image" src="https://github.com/user-attachments/assets/ae105c0c-5acc-4587-b218-ad6875a96674" />

**Test coverage**

- Updated the existing test in `ParquetSchemaSuite`: changed the expectation from "error" to "success with `LongType`".

**Behavior matrix**

| Scenario | Before | After | Breaking? |
|---|---|---|---|
| NANOS + `nanosAsLong=true` | `LongType` | `LongType` | No |
| NANOS + `nanosAsLong=false` | Error | `LongType` | **No** (fix!) |
| MICROS/MILLIS timestamps | `TimestampType` | `TimestampType` | No |

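After the change, such a column comes back as `LongType`, holding the raw INT64 value. In Parquet, a `TIMESTAMP(NANOS, ...)` value is stored as nanoseconds since the Unix epoch, and `isAdjustedToUTC=false` means it is a local (wall-clock) time rather than a UTC instant. As a minimal sketch, assuming the conversion happens in plain Python on collected values, such a value can be turned back into a naive `datetime`. The helper `nanos_to_naive_datetime` is hypothetical and is not part of this PR or of Spark.

```python
# Sketch: convert a raw TIMESTAMP(NANOS,false) INT64 value (read as LongType)
# into a naive wall-clock datetime. The helper name is hypothetical.
import datetime

_EPOCH = datetime.datetime(1970, 1, 1)  # naive: isAdjustedToUTC=false has no zone


def nanos_to_naive_datetime(nanos: int) -> tuple[datetime.datetime, int]:
    """Return (naive datetime at microsecond precision, leftover nanoseconds)."""
    # datetime only holds microseconds, so split off the sub-microsecond part.
    # divmod floors, so negative (pre-1970) values also round-trip correctly.
    micros, leftover_ns = divmod(nanos, 1_000)
    return _EPOCH + datetime.timedelta(microseconds=micros), leftover_ns
```

For example, `nanos_to_naive_datetime(1_700_000_000_123_456_789)` gives `datetime(2023, 11, 14, 22, 13, 20, 123456)` with 789 nanoseconds left over. Returning the remainder separately avoids silently dropping the precision that `LongType` preserves.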