Re: [PR] [SPARK-45424][SQL] Fix TimestampFormatter return optional parse results when only prefix match [spark]

via GitHub Fri, 06 Oct 2023 08:09:41 -0700


andygrove commented on PR #43245:
URL: https://github.com/apache/spark/pull/43245#issuecomment-1750858019


   Thanks @Hisoka-X. I tested this out, but the behavior still differs from 
3.4.0. I don't think we can verify a match just by checking the input against 
the length of the format, because some formats have optional components like 
`[.SSS][XXX]`, so valid inputs can have different lengths.
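   
   To illustrate why a length check is fragile, here is a minimal sketch using 
plain `java.time` rather than Spark's `TimestampFormatter`. The object name 
`OptionalSectionsDemo` and the helper `parsesFully` are hypothetical names made 
up for this example, and this is not necessarily how the PR or Spark should 
implement the check. It only shows that one pattern with optional sections 
accepts inputs of several lengths, and that checking how much input the parser 
consumed is one way to reject prefix-only matches.
   
   ```scala
   import java.text.ParsePosition
   import java.time.format.DateTimeFormatter

   object OptionalSectionsDemo {
     // Same pattern as in the issue: fraction and zone offset are optional.
     private val formatter =
       DateTimeFormatter.ofPattern("yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]")

     // Hypothetical helper: true only when the whole input is consumed,
     // i.e. the pattern did not match just a prefix of the string.
     def parsesFully(s: String): Boolean = {
       val pos = new ParsePosition(0)
       // parseUnresolved returns null on a parse error instead of throwing.
       val parsed = formatter.parseUnresolved(s, pos)
       parsed != null && pos.getErrorIndex < 0 && pos.getIndex == s.length
     }

     def main(args: Array[String]): Unit = {
       val inputs = Seq(
         "2884-06-24T02:45:51",           // no optional parts
         "2884-06-24T02:45:51.138",       // with fraction
         "2884-06-24T02:45:51.138+01:00", // with fraction and offset
         "2884-06-24T02:45:51.138 extra"  // trailing text after a valid prefix
       )
       inputs.foreach { s =>
         println(s"length=${s.length} parsesFully=${parsesFully(s)} input=$s")
       }
     }
   }
   ```
   
   The first three inputs have different lengths but are all valid for the 
same format, which is why comparing against a single expected length cannot 
distinguish them from a prefix-only match like the last one.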
   
   Using the same test file from the issue:
   
   ```csv
   2884-06-24T02:45:51.138
   2884-06-24T02:45:51.138
   2884-06-24T02:45:51.138
   ```
   
   ## 3.4.0
   
   Infers column as `timestamp`.
   
   ```
   scala> val df = spark.read.option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]").option("inferSchema", true).csv("/tmp/timestamps.csv")
   df: org.apache.spark.sql.DataFrame = [_c0: timestamp]
   ```
   
   ## This PR
   
   Infers column as `string`.
   
   ```
   scala> val df = spark.read.option("timestampFormat", "yyyy-MM-dd'T'HH:mm:ss[.SSS][XXX]").option("inferSchema", true).csv("/tmp/timestamps.csv")
   val df: org.apache.spark.sql.DataFrame = [_c0: string]
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

