MaxGekk commented on code in PR #56248:
URL: https://github.com/apache/spark/pull/56248#discussion_r3387615991


##########
docs/sql-data-sources-json.md:
##########
@@ -201,6 +201,12 @@ Data source options of JSON can be set via:
     <td>Sets the string that indicates a timestamp without timezone format. Custom date formats follow the formats at <a href="https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html">Datetime Patterns</a>. This applies to timestamp without timezone type, note that zone-offset and time-zone components are not supported when writing or reading this data type.</td>
     <td>read/write</td>
   </tr>
+  <tr>
+    <td><code>inferTimestamp</code></td>
+    <td><code>false</code></td>
+    <td>Allows inferring of <code>TimestampType</code> and <code>TimestampNTZType</code> from strings that match the timestamp patterns defined by the <code>timestampFormat</code> and <code>timestampNTZFormat</code> options respectively. JSON built-in functions ignore this option.</td>

Review Comment:
   The sentence "JSON built-in functions ignore this option." is incorrect — 
`schema_of_json` does respect `inferTimestamp`.
   
   The migration guide (`sql-migration-guide.md:320`) is explicit: _"JSON 
datasource and JSON function `schema_of_json` infer TimestampType from string 
values … Set the JSON option `inferTimestamp` to `true` to enable such type 
inference."_
   
   The code path confirms it: `SchemaOfJsonEvaluator` creates `new JSONOptions(options, "UTC")` ([`JsonExpressionEvalUtils.scala:209`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/json/JsonExpressionEvalUtils.scala#L209)) from the user-supplied options, then calls `jsonInferSchema.inferField(parser)` (line 215 of the same file), which checks `options.inferTimestamp` ([`JsonInferSchema.scala:179`](https://github.com/apache/spark/blob/master/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/json/JsonInferSchema.scala#L179)) to produce `TimestampType`/`TimestampNTZType`.
   
   The other options with this disclaimer (`multiLine`, `encoding`, `lineSep`) 
control file-level I/O, which is genuinely irrelevant for a string-based 
function. `inferTimestamp` controls type inference logic, which 
`schema_of_json` does perform.
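To make the distinction concrete, here is a toy Python model of the string branch described above. It is not Spark code: `toy_schema_of_json`, `infer_string_type`, and the strptime-style `DEFAULT_TS_FORMAT` are hypothetical stand-ins, and the sketch omits `TimestampNTZType`, nested types, and Spark's Datetime Patterns. It only shows the two points at issue: the timestamp check runs solely when `inferTimestamp` is `"true"`, and a single-string input gives file-level options such as `multiLine` or `lineSep` nothing to act on.

```python
import json
from datetime import datetime
from typing import Optional

# Hypothetical stand-in for a timestampFormat, written in Python strptime
# syntax rather than Spark's Datetime Patterns.
DEFAULT_TS_FORMAT = "%Y-%m-%dT%H:%M:%S"


def infer_string_type(value: str, options: dict) -> str:
    """Toy model of the string branch of JSON type inference.

    A timestamp is tried only when inferTimestamp is "true"; with the
    default ("false") every string value stays StringType.
    """
    if options.get("inferTimestamp", "false").lower() != "true":
        return "StringType"
    fmt = options.get("timestampFormat", DEFAULT_TS_FORMAT)
    try:
        datetime.strptime(value, fmt)
        return "TimestampType"
    except ValueError:
        return "StringType"


def toy_schema_of_json(json_str: str, options: Optional[dict] = None) -> dict:
    """Infer a flat {field: type} mapping for a single JSON object string.

    The input is already one record, so file-level options such as
    multiLine, encoding, or lineSep are never consulted.
    """
    options = options or {}
    schema = {}
    for key, value in json.loads(json_str).items():
        if isinstance(value, str):
            schema[key] = infer_string_type(value, options)
        elif isinstance(value, bool):  # check bool before int: bool subclasses int
            schema[key] = "BooleanType"
        elif isinstance(value, int):
            schema[key] = "LongType"
        elif isinstance(value, float):
            schema[key] = "DoubleType"
        else:
            schema[key] = "NotModeled"  # nulls, objects, arrays are out of scope
    return schema


doc = '{"ts": "2024-01-01T12:30:00", "name": "spark"}'
print(toy_schema_of_json(doc))  # {'ts': 'StringType', 'name': 'StringType'}
print(toy_schema_of_json(doc, {"inferTimestamp": "true", "multiLine": "true"}))
# {'ts': 'TimestampType', 'name': 'StringType'}
```

Against a real build, the corresponding check would compare `schema_of_json('{"ts":"2024-01-01T12:30:00"}')` with the same call passing `map('inferTimestamp', 'true')` as the options argument.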
   
   Note also that the Scaladoc for `inferTimestamp` in `JSONOptions.scala` 
(lines 200–203) does not include this sentence.
   
   Suggestion: remove the sentence.
   ```suggestion
       <td>Allows inferring of <code>TimestampType</code> and <code>TimestampNTZType</code> from strings that match the timestamp patterns defined by the <code>timestampFormat</code> and <code>timestampNTZFormat</code> options respectively.</td>
   ```


