[GitHub] [spark] sadikovi commented on a diff in pull request #37327: [SPARK-39904][SQL] Rename inferDate to prefersDate and clarify semantics of the option in CSV data source

GitBox Tue, 02 Aug 2022 16:10:54 -0700


sadikovi commented on code in PR #37327:
URL: https://github.com/apache/spark/pull/37327#discussion_r936096730



##########
docs/sql-data-sources-csv.md:
##########
@@ -109,9 +109,9 @@ Data source options of CSV can be set via:
     <td>read</td>
   </tr>
   <tr>
-    <td><code>inferDate</code></td> 
+    <td><code>prefersDate</code></td>
     <td>false</td>
-    <td>Whether or not to infer columns that satisfy the <code>dateFormat</code> option as <code>Date</code>. Requires <code>inferSchema</code> to be <code>true</code>. When <code>false</code>, columns with dates will be inferred as <code>String</code> (or as <code>Timestamp</code> if it fits the <code>timestampFormat</code>).</td>
+    <td>Attempts to infer string columns as <code>Date</code> if the values satisfy <code>dateFormat</code> option and failed to be parsed by the respective formatter during schema inference (<code>inferSchema</code>). When used in conjunction with a user-provided schema, attempts to parse timestamp columns as dates using <code>dateFormat</code> if they fail to conform to <code>timestampFormat</code>, the parsed values will be cast to timestamp type afterwards.</td>

Review Comment:
   I would still prefer to keep the change for user-provided schema. The 
behaviour differs between schema inference and a user-provided schema: during 
inference, string columns are converted to dates, whereas with a provided 
schema they are kept as timestamp columns.
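
To make the two code paths concrete, here is a toy model in plain Python. It is only a sketch of the semantics described in the doc text, not Spark's implementation: the function names, the `strptime` formats standing in for `dateFormat` and `timestampFormat`, and the order of the checks are all assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical stand-ins for the CSV options; Spark uses its own pattern syntax.
DATE_FORMAT = "%Y-%m-%d"
TIMESTAMP_FORMAT = "%Y-%m-%dT%H:%M:%S"


def _try_parse(value, fmt):
    """Return the parsed datetime, or None if the value does not match fmt."""
    try:
        return datetime.strptime(value, fmt)
    except ValueError:
        return None


def infer_type(values, prefers_date):
    """Toy schema inference for a single column of strings."""
    if prefers_date and all(_try_parse(v, DATE_FORMAT) is not None for v in values):
        return "date"
    if all(_try_parse(v, TIMESTAMP_FORMAT) is not None for v in values):
        return "timestamp"
    return "string"


def parse_timestamp(value, prefers_date):
    """Toy parser for a column that the user declared as timestamp."""
    parsed = _try_parse(value, TIMESTAMP_FORMAT)
    if parsed is None and prefers_date:
        # Fall back to the date format; the result stays a timestamp (midnight).
        parsed = _try_parse(value, DATE_FORMAT)
    return parsed
```

In this model, `infer_type(["2022-08-02"], prefers_date=True)` yields a date column, while `parse_timestamp("2022-08-02", prefers_date=True)` yields a timestamp at midnight, so the column type stays timestamp when the schema is provided by the user.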



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


