[GitHub] spark issue #14509: [SPARK-16924][SQL] - Support option("inferSchema", true)...

xwu0226 Mon, 08 Aug 2016 11:48:11 -0700

Github user xwu0226 commented on the issue:

    https://github.com/apache/spark/pull/14509
  
    I see. So the purpose of `spark.sql.streaming.schemaInference` is to allow or disallow Spark SQL inferring the schema of a streaming file data source. I was a bit confused here. Many thanks!
    For the CSV file format, I also tried the following on a CSV file whose header line contains the column names:
    ```
    scala> spark.conf.set("spark.sql.streaming.schemaInference", true)
    scala> val in = spark.readStream.format("csv").load("/Users/xinwu/spark-test/data/csv1")
    in: org.apache.spark.sql.DataFrame = [_c0: string, _c1: string]
    scala> val in = spark.readStream.format("csv").option("header", true).load("/Users/xinwu/spark-test/data/csv1")
    in: org.apache.spark.sql.DataFrame = [signal: string, flash: string]
    scala> val in = spark.readStream.format("csv").option("header", true).option("inferSchema", true).load("/Users/xinwu/spark-test/data/csv1")
    in: org.apache.spark.sql.DataFrame = [signal: string, flash: int]
    ```
    The output looks as expected with the code on the master branch, so I don't think anything is wrong here. I had simply misunderstood the purpose of `spark.sql.streaming.schemaInference`. Unless you think differently, I will close this PR.


