[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-18 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14118
  
Merged to master/2.0


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-15 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65466/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #65466 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65466/consoleFull)**
 for PR 14118 at commit 
[`365cbfb`](https://github.com/apache/spark/commit/365cbfb02b58bc1992a635118ffba6b4e371cb06).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-15 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #65466 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65466/consoleFull)**
 for PR 14118 at commit 
[`365cbfb`](https://github.com/apache/spark/commit/365cbfb02b58bc1992a635118ffba6b4e371cb06).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-15 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/14118
  
@lw-lin could you address @srowen's comments. Otherwise this is good to go.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65343/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #65343 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65343/consoleFull)**
 for PR 14118 at commit 
[`d5357f9`](https://github.com/apache/spark/commit/d5357f9d784cc277d58fd896738a87a7aff7ba70).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #65343 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65343/consoleFull)**
 for PR 14118 at commit 
[`d5357f9`](https://github.com/apache/spark/commit/d5357f9d784cc277d58fd896738a87a7aff7ba70).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14118
  
@HyukjinKwon thanks for the information!

@srowen yea I still think this is good to go.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-13 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14118
  
Jenkins retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-12 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14118
  
I support this PR. But just to make sure, I'd like to bring a reference.

It seems at least `na.strings` option in `read.csv` in R does as proposed 
here,

```r
bt <- "A,B,C,D
10,20,NaN
30,,40
40,30,20
,NA,20"

b<- read.csv(text=bt,check.names = FALSE, na.strings=c("NA","NaN", " "))
b
```

prints

```
   A  B  C  D   
   
1 10 20 NA NA   
   
2 30 NA 40 NA   
   
3 40 30 20 NA   
   
4 NA NA 20 NA
```




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-12 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14118
  
@lw-lin just checking that you think this is still good to go? @HyukjinKwon 
do you have an opinion on the current state?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/65042/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #65042 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65042/consoleFull)**
 for PR 14118 at commit 
[`d5357f9`](https://github.com/apache/spark/commit/d5357f9d784cc277d58fd896738a87a7aff7ba70).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-07 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #65042 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/65042/consoleFull)**
 for PR 14118 at commit 
[`d5357f9`](https://github.com/apache/spark/commit/d5357f9d784cc277d58fd896738a87a7aff7ba70).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-09-07 Thread hvanhovell
Github user hvanhovell commented on the issue:

https://github.com/apache/spark/pull/14118
  
retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64576/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #64576 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64576/consoleFull)**
 for PR 14118 at commit 
[`d5357f9`](https://github.com/apache/spark/commit/d5357f9d784cc277d58fd896738a87a7aff7ba70).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-29 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #64576 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64576/consoleFull)**
 for PR 14118 at commit 
[`d5357f9`](https://github.com/apache/spark/commit/d5357f9d784cc277d58fd896738a87a7aff7ba70).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-29 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14118
  
Jenkins retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-29 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14118
  
> What if I am writing explicitly an empty string out? Does it become just 
1,,2?

Yes. It becomes `1,,2` in 2.0, and the same `1,,2` with this patch -- no 
behavior changes.

> Can you also clarify whether this is behavior changing, or something else?

This patch behaves differently from 2.0 when reading `1,,2` back: (given 
`nullValue` the default value: empty string ""), `1,,2` would be read back as 
`1,,2` in 2.0, but would be read back as `1,[null],[null],2` with this patch.

@rxin ~





---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14118
  
I believe we can change the default vale of `nullValue` to 
`'\u'.toString` in order to express any value is not `null`. I remember 
this matches with no empty string nor any other string although I should 
research if this is a right way to express this.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-19 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14118
  
@rxin Please let me leave my though why I thought it looks good to me in 
case it is helpful.

Yes, but we should set `nullValue` for writing `null`. So, I think, setting 
`""` for `nullValue` means treating `""` as null.

For example, if we have the dataframe as below:

```
+--+
| a|
+--+
|   abc|
|  null|
+--+
``` 

with `nullValue` set to `"abc"`, this will writes

```
abc
abc
```

Here, we ended up with no diff between `null` and `abc`. but since users 
set `nullValue` to `abc` for output, users would understand this behaviour.

I mean.. there is no expression for actual `null` so we are explicitly 
giving the representation for this and so, I thought it is okay even if we can 
differentiate `nullValue` from actual `null`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-19 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14118
  
What if I am writing explicitly an empty string out? Does it become just 
1,,2?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-19 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14118
  
@rxin yes all empty (e.g. zero sized string) values become null values once 
they are read back.

E.g. given `test.csv`:
```
1,,3,
```
`spark.read.csv("test.csv").show()` produces:
```
+---++---++
|_c0| _c1|_c2| _c3|
+---++---++
|  1|null|  3|null|
+---++---++
```


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/64040/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-19 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #64040 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64040/consoleFull)**
 for PR 14118 at commit 
[`74b4dd8`](https://github.com/apache/spark/commit/74b4dd8ff2f79faaf9df50c5a54e6298234137e7).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-18 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14118
  
Also LGTM other than that major question.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-18 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14118
  
With this change, do all empty (e.g. zero sized string) values become null 
values once they are read back?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-18 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #64040 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/64040/consoleFull)**
 for PR 14118 at commit 
[`74b4dd8`](https://github.com/apache/spark/commit/74b4dd8ff2f79faaf9df50c5a54e6298234137e7).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-18 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14118
  
This change looks good to me - I don't see any other reasons that `null` 
should not be read for `Boolean`, `TimestampType`, `DateType` and `StringType` 
inconsistently with other types.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-18 Thread srowen
Github user srowen commented on the issue:

https://github.com/apache/spark/pull/14118
  
CC @HyukjinKwon -- WDYT?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63713/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #63713 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63713/consoleFull)**
 for PR 14118 at commit 
[`f58e33d`](https://github.com/apache/spark/commit/f58e33d2c0815c9fd4bae2552609d8b2f3098e13).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-12 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #63713 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63713/consoleFull)**
 for PR 14118 at commit 
[`f58e33d`](https://github.com/apache/spark/commit/f58e33d2c0815c9fd4bae2552609d8b2f3098e13).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-12 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14118
  
Jenkins retest this please


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-12 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14118
  
This is ready for review.

To summarize, this patch casts user-specified `nullValue`s to `null`s for 
all supported types including the string type:
- this fixes the problem where null dates, null timestamps, etc were not 
cast to `null` correctly;
- this also casts `nullValue` to `null` for string type, as per @falaki 
@HyukjinKwon and many people's comments. But please note, this is a behavior 
change from 2.0.0.

@rxin @falaki could you take another look?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-12 Thread devmanhinton
Github user devmanhinton commented on the issue:

https://github.com/apache/spark/pull/14118
  
Just as a +1 would at least like the option to have `""` autocast to `null` 
when read in from csv. Helpful for me in production given UDFs skip function 
application when input is `null` but not in the case of an empty string. 


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63493/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #63493 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63493/consoleFull)**
 for PR 14118 at commit 
[`f58e33d`](https://github.com/apache/spark/commit/f58e33d2c0815c9fd4bae2552609d8b2f3098e13).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-09 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #63493 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63493/consoleFull)**
 for PR 14118 at commit 
[`f58e33d`](https://github.com/apache/spark/commit/f58e33d2c0815c9fd4bae2552609d8b2f3098e13).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14118
  
BTW, this problem exists in the external CSV data source as well. The root 
cause of https://github.com/databricks/spark-csv/issues/370 is this issue and 
also if my understanding is correct, the external CSV data source would not 
work in Spark 2.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread djk121
Github user djk121 commented on the issue:

https://github.com/apache/spark/pull/14118
  
I'm doing this:

val dataframe = sparkSession.read
   .format("com.databricks.spark.csv")
   .option("header", "true")
   .option("nullValue", "null")
   .schema(schema)
   .load(csvPath)

I then take that dataframe and attempt to write it out to parquet like so:

dataframe.write
 .mode(SaveMode.Overwrite)
 .option("compression", "snappy")
 .parquet(outputPath)

When the parquet writes go, I get the same traceback as above.  I can see 
from that traceback that it's org.apache.spark.sql.execution.datasources.csv, 
so for whatever reason, com.databricks.spark.csv isn't being used.  Do I need 
to do something different to force it to be used?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14118
  
You can specify "com.databricks.spark.csv" as the source.


On Fri, Aug 5, 2016 at 11:58 PM, djk121  wrote:

> Is there a way to fall back to the old databricks csv library in spark 2.0
> to work around this? Round-tripping worked there with .option("nullValue",
> "null"), but I don't see a way to get round-tripping working with any 
combo
> of options in 2.0.
>
> —
> You are receiving this because you were mentioned.
> Reply to this email directly, view it on GitHub
> , or 
mute
> the thread
> 

> .
>



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread djk121
Github user djk121 commented on the issue:

https://github.com/apache/spark/pull/14118
  
Is there a way to fall back to the old databricks csv library in spark 2.0 
to work around this? Round-tripping worked there with .option("nullValue", 
"null"), but I don't see a way to get round-tripping working with any combo of 
options in 2.0.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14118
  
This change looks reasonable to me.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14118
  
@falaki could you take a look at the lasted update: [[bf01cea] StringType 
should also respect 
`nullValue`](https://github.com/apache/spark/pull/14118/commits/bf01cea8273f00386ceef6459f8b8fe2c169e12a)?
 Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/63260/
Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue:

https://github.com/apache/spark/pull/14118
  
Merged build finished. Test PASSed.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #63260 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63260/consoleFull)**
 for PR 14118 at commit 
[`bf01cea`](https://github.com/apache/spark/commit/bf01cea8273f00386ceef6459f8b8fe2c169e12a).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-05 Thread SparkQA
Github user SparkQA commented on the issue:

https://github.com/apache/spark/pull/14118
  
**[Test build #63260 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/63260/consoleFull)**
 for PR 14118 at commit 
[`bf01cea`](https://github.com/apache/spark/commit/bf01cea8273f00386ceef6459f8b8fe2c169e12a).


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-04 Thread falaki
Github user falaki commented on the issue:

https://github.com/apache/spark/pull/14118
  
@lw-lin thanks a lot for the clear summary. 
After seeing some use cases, I think it is better to apply nullValue to all 
types, including `StringType`. `treatEmptyValuesAsNulls` seems a special case 
of `nullValue = ""` and adds to confusion of all these options.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-08-01 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14118
  
Here are some findings as I dug a little:

1. Since https://github.com/databricks/spark-csv/pull/102(Jul, 2015), we 
would cast `""` as `null` for all types other than strings. For strings, `""` 
would still be `""`;

2. Then we had added `treatEmptyValuesAsNulls` in 
https://github.com/databricks/spark-csv/pull/147(Sep, 2015), after which, `""` 
would be `null` when `treatEmptyValuesAsNulls == true` and would be still `""` 
otherwise;

3. Then we had added `nullValue` in 
https://github.com/databricks/spark-csv/pull/224(Dec, 2015), so people could 
specify some string like `"MISSING"` other than the default `""` to represent 
null values.

Then after the above 1.2.3., we have the following, which seems reasonable 
and is backward-compatible:




(default) when nullVale == ""
when nullValue == "MISSING"


(default) when treatEmptyValuesAsNulls == 
false
"" would cast to ""
"" would cast to ""


when treatEmptyValuesAsNulls == true
"" would cast to null
"" would cast to ""



However we don't have this `treatEmptyValuesAsNulls` in Spark 2.0. @falaki 
would it be OK with you if I add it back?


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-07-30 Thread deanchen
Github user deanchen commented on the issue:

https://github.com/apache/spark/pull/14118
  
Would be great to get a resolution to this. We're running into issues in 
production attempting to parse csv's with nullable dates. Personally prefer 
option b for our use case.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-07-10 Thread lw-lin
Github user lw-lin commented on the issue:

https://github.com/apache/spark/pull/14118
  
I think @HyukjinKwon has made a good point: it's kind of strange null 
strings can be written out, but can not be read back as nulls.

So for `StringType`:



nulls write & read
consistent with 1.6?


option (a)
null strings can be written out,but can 
NOT be read back as nulls
yes


option (b)
null strings can be written out, and can be 
read back as nulls
NO



@HyukjinKwon and I are somewhat inclined to option(b) because it sounds 
reasonable to end users. @rxin would you make a final decision? Thanks!


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-07-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue:

https://github.com/apache/spark/pull/14118
  
IMHO, handling `StringType` at least lets users handling `null`s in 
roundtrip in writing and reading. CSV writes `null` according to `nullValue` 
[here](https://github.com/apache/spark/blob/38cf8f2a50068f80350740ac28e31c8accd20634/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVParser.scala#L78)
 but this can't read them back if it does not handle `StringType`.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark issue #14118: [SPARK-16462][SPARK-16460][SPARK-15144][SQL] Make CSV ca...

2016-07-10 Thread rxin
Github user rxin commented on the issue:

https://github.com/apache/spark/pull/14118
  
Thanks for the information. I'm still confused. From an end-user 
perspective, do we need to handle StringType there?



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org