[GitHub] spark issue #14509: [SPARK-16924][SQL] - Support option("inferSchema", true)...

HyukjinKwon Mon, 08 Aug 2016 01:17:35 -0700

Github user HyukjinKwon commented on the issue:

    https://github.com/apache/spark/pull/14509
  
    1. I think this issue is related to the question in 
https://issues.apache.org/jira/browse/SPARK-15458. It seems we want to disable 
the inference and encourage giving the schema explicitly in this case.
    
    2. If it is not a text data source, I think we should avoid the schema 
inference due to some of the problems listed in the issue above. If 
`isTextSource` is true, I think we allow the schema inference even if 
`spark.sql.streaming.schemaInference` is disabled. I think the comment means it 
does not throw an exception but sets the inferred schema.
    
    3. If my understanding is correct, `inferSchema` in CSV is different from 
both `spark.sql.streaming.schemaInference` and the case when 
`userSpecifiedSchema` is not given.
    
    For CSV, we can directly use header as column names, so we have three 
choices.
    
    - Schema is given and `inferSchema` is `false`: Don't infer schema and 
don't use header
    - Schema is not given and `inferSchema` is `false`: Use header as column 
names but don't infer schema
    - Schema is not given and `inferSchema` is `true`: Use header as column 
names and also infer schema
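
    The three cases above can be sketched roughly as below. This is only an 
illustration in plain Python, not Spark's implementation: `resolve_csv_schema`, 
`_infer_type`, and the type names are hypothetical, and letting a given schema 
take precedence regardless of `inferSchema` is my assumption.

```python
from typing import List, Optional, Sequence, Tuple

Schema = List[Tuple[str, str]]  # (column name, type name); hypothetical shape


def _infer_type(values: Sequence[str]) -> str:
    """Pick the narrowest of int, double, string that parses every value."""
    for type_name, parse in (("int", int), ("double", float)):
        try:
            for v in values:
                parse(v)
            return type_name
        except ValueError:
            continue
    return "string"


def resolve_csv_schema(
    header: Sequence[str],
    rows: Sequence[Sequence[str]],
    user_schema: Optional[Schema] = None,
    infer_schema: bool = False,
) -> Schema:
    # Case 1: schema is given -> don't infer and don't use the header.
    # (Assumption: the given schema wins even if infer_schema is True.)
    if user_schema is not None:
        return list(user_schema)
    # Case 2: no schema, no inference -> header names, every column a string.
    # With no data rows there is nothing to infer from, so fall back here too.
    if not infer_schema or not rows:
        return [(name, "string") for name in header]
    # Case 3: no schema, inference on -> header names plus inferred types.
    columns = list(zip(*rows))
    return [(name, _infer_type(col)) for name, col in zip(header, columns)]


header = ["id", "price", "name"]
rows = [["1", "9.5", "a"], ["2", "3", "b"]]
print(resolve_csv_schema(header, rows, user_schema=[("x", "long")]))
print(resolve_csv_schema(header, rows))
print(resolve_csv_schema(header, rows, infer_schema=True))
```

    The point is that the header supplies column names in both of the last two 
cases, and `inferSchema` only decides whether the column types are inferred.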





