[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-18 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14118 Merged to master/2.0 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so,

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14118 Merged build finished. Test PASSed.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65466/ Test PASSed.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14118 **[Test build #65466 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65466/consoleFull)** for PR 14118 at commit [`365cbfb`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-15 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14118 **[Test build #65466 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65466/consoleFull)** for PR 14118 at commit [`365cbfb`](https://github.com/apache/spark/commit/3

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-15 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14118 @lw-lin could you address @srowen's comments. Otherwise this is good to go.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65343/ Test PASSed.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14118 Merged build finished. Test PASSed.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14118 **[Test build #65343 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65343/consoleFull)** for PR 14118 at commit [`d5357f9`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14118 **[Test build #65343 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65343/consoleFull)** for PR 14118 at commit [`d5357f9`](https://github.com/apache/spark/commit/d

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 @HyukjinKwon thanks for the information! @srowen yea I still think this is good to go.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 Jenkins retest this please

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14118 I support this PR. But just to make sure, I'd like to bring a reference. It seems at least `na.strings` option in `read.csv` in R does as proposed here, ```r bt <- "A,B,C,D

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-12 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14118 @lw-lin just checking that you think this is still good to go? @HyukjinKwon do you have an opinion on the current state?

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14118 Merged build finished. Test PASSed.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65042/ Test PASSed.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14118 **[Test build #65042 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65042/consoleFull)** for PR 14118 at commit [`d5357f9`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-07 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14118 **[Test build #65042 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65042/consoleFull)** for PR 14118 at commit [`d5357f9`](https://github.com/apache/spark/commit/d

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-07 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/14118 retest this please

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64576/ Test PASSed.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14118 Merged build finished. Test PASSed.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14118 **[Test build #64576 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64576/consoleFull)** for PR 14118 at commit [`d5357f9`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-29 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14118 **[Test build #64576 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64576/consoleFull)** for PR 14118 at commit [`d5357f9`](https://github.com/apache/spark/commit/d

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-29 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 Jenkins retest this please

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-29 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 > What if I am writing explicitly an empty string out? Does it become just 1,,2? Yes. It becomes `1,,2` in 2.0, and the same `1,,2` with this patch -- no behavior changes. > Can you
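The write-side behavior described in this reply can be modeled with a small Python sketch. This is an illustration only, not Spark's writer: `write_row` is a hypothetical helper, and it assumes that the default `nullValue` is the empty string and that fields are joined without quoting.

```python
def write_row(values, null_value=""):
    """Model of a CSV writer: emit null_value for None, str(v) otherwise.

    Hypothetical helper for illustration; it does no quoting or escaping.
    """
    return ",".join(null_value if v is None else str(v) for v in values)


# An explicit empty string and a null are written identically under the
# assumed default nullValue of "".
empty_string_line = write_row([1, "", 2])
null_line = write_row([1, None, 2])

# With a distinct marker the two cases stay distinguishable on disk.
marked_line = write_row([1, None, 2], null_value="null")
```

Under these assumptions both `empty_string_line` and `null_line` come out as `1,,2`, which is why the two cases cannot be told apart when the file is read back unless a distinct `nullValue` is configured.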

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14118 I believe we can change the default value of `nullValue` to `'\u'.toString` in order to express that no value is `null`. I remember this matches neither an empty string nor any other string alth

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14118 @rxin Please let me share why I thought it looks good to me, in case it is helpful. Yes, but we should set `nullValue` for writing `null`. So, I think, setting `""` for `nullVa

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-19 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14118 What if I am writing explicitly an empty string out? Does it become just 1,,2?

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-18 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 @rxin yes all empty (e.g. zero sized string) values become null values once they are read back. E.g. given `test.csv`: ``` 1,,3, ``` `spark.read.csv("test.csv").show()` produc
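The read-side behavior in this reply, where zero-sized fields come back as nulls, can be approximated outside Spark with Python's standard `csv` module. This is a rough model rather than Spark's reader: `read_row` is a hypothetical helper, every field is treated as a string, and the default `null_value` of `""` is an assumption.

```python
import csv
import io


def read_row(line, null_value=""):
    """Parse one CSV line and map fields equal to null_value to None.

    Hypothetical helper for illustration; all fields are kept as strings.
    """
    fields = next(csv.reader(io.StringIO(line)))
    return [None if field == null_value else field for field in fields]


# The line from test.csv in the message above: the second and fourth
# fields are zero-sized.
parsed = read_row("1,,3,")
```

Here `parsed` is `['1', None, '3', None]`: the two empty fields become `None`, the Python stand-in for `null`.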

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14118 Merged build finished. Test PASSed.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-18 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64040/ Test PASSed.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14118 **[Test build #64040 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64040/consoleFull)** for PR 14118 at commit [`74b4dd8`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14118 Also LGTM other than that major question.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-18 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14118 With this change, do all empty (e.g. zero sized string) values become null values once they are read back?

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-18 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14118 **[Test build #64040 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64040/consoleFull)** for PR 14118 at commit [`74b4dd8`](https://github.com/apache/spark/commit/7

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14118 This change looks good to me - I don't see any other reasons that `null` should not be read for `Boolean`, `TimestampType`, `DateType` and `StringType` inconsistently with other types.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-18 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/14118 CC @HyukjinKwon -- WDYT?

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63713/ Test PASSed.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14118 Merged build finished. Test PASSed.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14118 **[Test build #63713 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63713/consoleFull)** for PR 14118 at commit [`f58e33d`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-12 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14118 **[Test build #63713 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63713/consoleFull)** for PR 14118 at commit [`f58e33d`](https://github.com/apache/spark/commit/f

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-12 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 Jenkins retest this please

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-12 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 This is ready for review. To summarize, this patch casts user-specified `nullValue`s to `null`s for all supported types including the string type: - this fixes the problem where null date
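The summary above, casting the user-specified `nullValue` to `null` for every supported type including strings, can be sketched as a type-agnostic check that runs before any type-specific parsing. This is a minimal Python model, not the patch itself: `cast_field`, the type names, and the parsing rules are all hypothetical.

```python
from datetime import date


def cast_field(datum, data_type, null_value=""):
    """Return None when the raw field equals null_value, for every type.

    Hypothetical model: the null check comes first, so strings, dates and
    booleans are handled the same way as numeric types.
    """
    if datum == null_value:
        return None
    if data_type == "int":
        return int(datum)
    if data_type == "date":
        return date.fromisoformat(datum)
    if data_type == "bool":
        return datum.lower() == "true"
    return datum  # string: returned unchanged


# One raw row read with nullValue set to "null".
row = ["1", "null", "null", "null"]
types = ["int", "string", "date", "bool"]
parsed = [cast_field(v, t, null_value="null") for v, t in zip(row, types)]
```

In this model `parsed` is `[1, None, None, None]`. Putting the check ahead of the per-type branches is what makes the string, date and boolean columns behave consistently with the numeric ones.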

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-12 Thread devmanhinton
Github user devmanhinton commented on the issue: https://github.com/apache/spark/pull/14118 Just as a +1 would at least like the option to have `""` autocast to `null` when read in from csv. Helpful for me in production given UDFs skip function application when input is `null` but not

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63493/ Test PASSed.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14118 Merged build finished. Test PASSed.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14118 **[Test build #63493 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63493/consoleFull)** for PR 14118 at commit [`f58e33d`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14118 **[Test build #63493 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63493/consoleFull)** for PR 14118 at commit [`f58e33d`](https://github.com/apache/spark/commit/f

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14118 BTW, this problem exists in the external CSV data source as well. The root cause of https://github.com/databricks/spark-csv/issues/370 is this issue and also if my understanding is correct, the

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread djk121
Github user djk121 commented on the issue: https://github.com/apache/spark/pull/14118 I'm doing this: val dataframe = sparkSession.read .format("com.databricks.spark.csv") .option("header", "true") .option("nullValue", "null") .schema

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14118 You can specify "com.databricks.spark.csv" as the source. On Fri, Aug 5, 2016 at 11:58 PM, djk121 wrote: > Is there a way to fall back to the old databricks csv library in spark 2

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread djk121
Github user djk121 commented on the issue: https://github.com/apache/spark/pull/14118 Is there a way to fall back to the old databricks csv library in spark 2.0 to work around this? Round-tripping worked there with .option("nullValue", "null"), but I don't see a way to get round-tripp

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14118 This change looks reasonable to me.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 @falaki could you take a look at the latest update: [[bf01cea] StringType should also respect `nullValue`](https://github.com/apache/spark/pull/14118/commits/bf01cea8273f00386ceef6459f8b8fe2c169e12a)

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14118 Merged build finished. Test PASSed.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/14118 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63260/ Test PASSed.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14118 **[Test build #63260 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63260/consoleFull)** for PR 14118 at commit [`bf01cea`](https://github.com/apache/spark/commit/

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-04 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/14118 **[Test build #63260 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63260/consoleFull)** for PR 14118 at commit [`bf01cea`](https://github.com/apache/spark/commit/b

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-04 Thread falaki
Github user falaki commented on the issue: https://github.com/apache/spark/pull/14118 @lw-lin thanks a lot for the clear summary. After seeing some use cases, I think it is better to apply nullValue to all types, including `StringType`. `treatEmptyValuesAsNulls` seems a special ca

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-01 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 Here are some findings as I dug a little: 1. Since https://github.com/databricks/spark-csv/pull/102 (Jul 2015), we would cast `""` as `null` for all types other than strings. For strings, `""

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-07-30 Thread deanchen
Github user deanchen commented on the issue: https://github.com/apache/spark/pull/14118 Would be great to get a resolution to this. We're running into issues in production attempting to parse CSVs with nullable dates. Personally prefer option b for our use case.

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-07-10 Thread lw-lin
Github user lw-lin commented on the issue: https://github.com/apache/spark/pull/14118 I think @HyukjinKwon has made a good point: it's kind of strange null strings can be written out, but can not be read back as nulls. So for `StringType`: nulls writ

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-07-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/14118 IMHO, handling `StringType` at least lets users handle `null`s in a round trip of writing and reading. CSV writes `null` according to `nullValue` [here](https://github.com/apache/spark/blob/38cf8

[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-07-10 Thread rxin
Github user rxin commented on the issue: https://github.com/apache/spark/pull/14118 Thanks for the information. I'm still confused. From an end-user perspective, do we need to handle StringType there?