GitHub user HyukjinKwon commented on the issue:
https://github.com/apache/spark/pull/14509
1. I think this issue is related to the question in
https://issues.apache.org/jira/browse/SPARK-15458. It seems we want to disable
the inference there and encourage users to give the schema explicitly in this case.
2. If it is not a text data source, I think we should avoid schema
inference because of the problems listed in the issue above. If it is a text
source (`isTextSource`), I think we will allow schema inference even if
`spark.sql.streaming.schemaInference` is disabled. I think the comment means it
does not throw an exception but sets the inferred schema.
3. If my understanding is correct, `inferSchema` in CSV is different from both
`spark.sql.streaming.schemaInference` and the case when `userSpecifiedSchema`
is not given.
For CSV, we can use the header directly as column names, so we have three
choices.
- Schema is given and `inferSchema` is `false`: don't infer the schema and
don't use the header
- Schema is not given and `inferSchema` is `false`: use the header as column
names but don't infer the schema
- Schema is not given and `inferSchema` is `true`: use the header as column
names and also infer the schema
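The rules in points 2 and 3 can be summarized as a small decision table. The sketch below is a hypothetical pure-Python model, not Spark code: `streaming_schema_source` and `csv_column_plan` are illustrative names, not Spark APIs. It assumes that a user-specified schema always takes precedence over `inferSchema`, and that CSV columns are read as strings when inference is off.

```python
from typing import Tuple


def streaming_schema_source(user_schema_given: bool,
                            is_text_source: bool,
                            schema_inference_enabled: bool) -> str:
    """Point 2 (hypothetical model): where a streaming file source gets its schema."""
    if user_schema_given:
        return "user"
    # Text sources are exempt: inference is allowed even when
    # spark.sql.streaming.schemaInference is disabled.
    if is_text_source or schema_inference_enabled:
        return "inferred"
    # Other sources without a schema are rejected instead of silently inferring.
    raise ValueError("schema must be given explicitly for this streaming source")


def csv_column_plan(schema_given: bool, infer_schema: bool) -> Tuple[str, str]:
    """Point 3 (hypothetical model): (source of column names, source of column types)."""
    if schema_given:
        # Assumption: a user schema wins regardless of inferSchema.
        return ("user schema", "user schema")
    if infer_schema:
        return ("header", "inferred")
    # No inference: header supplies names, every column is read as a string.
    return ("header", "all strings")


if __name__ == "__main__":
    for given, infer in [(True, False), (False, False), (False, True)]:
        print(given, infer, csv_column_plan(given, infer))
```

Modeling the cases as plain functions keeps the three CSV choices and the streaming exception for text sources visible side by side, which makes it easier to check that no combination is left unhandled.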