[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/11947


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-216006137
  
OK I'm going to merge this in master and manually update the commit message.






[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-216003501
  
LGTM. (For the documentation, we should not forget to mention that `nullValue`
takes priority over other options such as `nanValue` when the same value is
given for both.)
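
The precedence described above can be sketched in plain Scala. This is a minimal illustration, not the code from the patch: the object `NullBeforeNaN` and its `castDouble` helper are hypothetical, and `None` stands in for a SQL null.

```scala
object NullBeforeNaN {
  // Hypothetical helper: the null token is checked first, so it wins
  // whenever the same string is configured for both options.
  def castDouble(
      datum: String,
      nullable: Boolean,
      nullValue: String,
      nanValue: String): Option[Double] =
    if (nullable && datum == nullValue) None          // null takes priority
    else if (datum == nanValue) Some(Double.NaN)      // only reached if not null
    else Some(datum.toDouble)
}
```

With both options set to the same token, a nullable column yields null and only a non-nullable column falls through to NaN.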





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215984253
  
@HyukjinKwon would be great if you can review this. Thanks.






[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215984080
  
@falaki can you update the pr description?






[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread koertkuipers
Github user koertkuipers commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215979899
  
please also provide a way for strings to be converted to null upon reading





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215947150
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57423/
Test PASSed.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215947147
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215947097
  
**[Test build #57423 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57423/consoleFull)**
 for PR 11947 at commit 
[`6facd26`](https://github.com/apache/spark/commit/6facd262f897883499e0fb46a4304e4b7c5c0c05).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215943795
  
LGTM pending tests.






[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215943744
  
**[Test build #57423 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57423/consoleFull)**
 for PR 11947 at commit 
[`6facd26`](https://github.com/apache/spark/commit/6facd262f897883499e0fb46a4304e4b7c5c0c05).





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread falaki
Github user falaki commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215943595
  
@rxin done.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215940817
  
@falaki sorry this no longer merges cleanly. Do you mind bringing it up to 
date?





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215930852
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215930854
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/57394/
Test PASSed.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215930751
  
**[Test build #57394 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57394/consoleFull)**
 for PR 11947 at commit 
[`698b4b4`](https://github.com/apache/spark/commit/698b4b41baa1ebd5d66ea6242bcb39bcd0887f8b).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-29 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215926089
  
**[Test build #57394 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57394/consoleFull)**
 for PR 11947 at commit 
[`698b4b4`](https://github.com/apache/spark/commit/698b4b41baa1ebd5d66ea6242bcb39bcd0887f8b).





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-29 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215924124
  
As discussed offline, we should just have a single option for setting null, 
another for nan, another for inf and negative inf. Basically just 4.
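
A minimal sketch of what "just 4" options could look like when read from a user-supplied options map. The key names `positiveInf` and `negativeInf`, the default tokens, and the `SpecialTokens` object are assumptions made for illustration; only `nullValue` and `nanValue` are named elsewhere in this thread.

```scala
object SpecialTokens {
  // One token per special value, shared by all column types.
  final case class Tokens(
      nullValue: String,
      nanValue: String,
      positiveInf: String,
      negativeInf: String)

  // Key names and defaults below are assumed, not taken from the merged patch.
  def fromOptions(options: Map[String, String]): Tokens = Tokens(
    nullValue = options.getOrElse("nullValue", ""),
    nanValue = options.getOrElse("nanValue", "NaN"),
    positiveInf = options.getOrElse("positiveInf", "Inf"),
    negativeInf = options.getOrElse("negativeInf", "-Inf"))
}
```

Compared with the per-type fields in the earlier revision of the diff (`floatNaNValue`, `doubleNaNValue`, and so on), a single set of four tokens keeps the option surface small.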







[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-27 Thread koertkuipers
Github user koertkuipers commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215196735
  
i personally would have been happy with a single null value shared by all
datatypes.

and the usage of that single value should be consistent across reading and
writing: when that value is encountered during reading it becomes null
(except perhaps for double/float columns, where it could become NaN), and when
writing, a null value gets written out as this value.

for example, when dealing with text files dumped from hive this value is
typically "\N" across all columns and datatypes. when i read this sort of data
i simply want every "\N" to become null, and when writing out data that needs
to be compatible with hive i would like to write out nulls across all columns
as "\N".

for cascading/scalding this value is typically "" (the empty string). so
again i would want all empty values to be converted to nulls when reading, and
when writing i would want every null to be written out as the empty string.

thanks  
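
The symmetric behaviour requested above can be sketched as a pair of helpers that share one token. This is an illustration of the idea only; `NullTokenRoundTrip`, `readField` and `writeField` are hypothetical names, and `None` stands in for null.

```scala
object NullTokenRoundTrip {
  // Reading: the configured token becomes None (standing in for null).
  def readField(raw: String, nullToken: String): Option[String] =
    if (raw == nullToken) None else Some(raw)

  // Writing: None is rendered as the very same token.
  def writeField(value: Option[String], nullToken: String): String =
    value.getOrElse(nullToken)
}
```

Using the Hive-style token `"\N"` or the empty string on both sides makes a read followed by a write reproduce the original text. The trade-off is that a genuine string equal to the token can no longer be told apart from null.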





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-27 Thread koertkuipers
Github user koertkuipers commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215194241
  
do these settings roundtrip correctly? say i set doubleNaNValue to "XY", 
and i create a dataframe with a Double.NaN in it, does it get written out 
correctly as XY, and then XY gets read back in correctly as Double.NaN?
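
The round-trip property being asked about can be stated as a small check. The thread does not say whether the writer in this PR honors the option, so the sketch below only shows what "roundtrips correctly" would mean; `NaNRoundTrip` and its two helpers are hypothetical.

```scala
object NaNRoundTrip {
  // Writing: NaN is rendered as the configured token.
  def writeDouble(value: Double, nanToken: String): String =
    if (value.isNaN) nanToken else value.toString

  // Reading: the same token is mapped back to NaN.
  def readDouble(raw: String, nanToken: String): Double =
    if (raw == nanToken) Double.NaN else raw.toDouble
}
```

With the token `"XY"` from the question, `writeDouble(Double.NaN, "XY")` produces `XY`, and reading `XY` back with the same token gives NaN again. NaN has to be tested with `isNaN`, since it never compares equal to itself.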





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-27 Thread koertkuipers
Github user koertkuipers commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-215192562
  
hello!
why is there no stringNullValue?
basically i want a column of type string to read in all empty strings
as nulls. this is what the old option "treatEmptyStringsAsNulls" used to do.
it's the natural complement to writing out nulls as empty strings (without
this, data does not roundtrip).
thanks





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-208662536
  
Test PASSed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/9/
Test PASSed.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-208662534
  
Merged build finished. Test PASSed.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-208662375
  
**[Test build #9 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/9/consoleFull)**
 for PR 11947 at commit 
[`161a3eb`](https://github.com/apache/spark/commit/161a3ebeb9201d68c97e771def2a77b994e3b217).
 * This patch passes all tests.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-208638423
  
**[Test build #9 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/9/consoleFull)**
 for PR 11947 at commit 
[`161a3eb`](https://github.com/apache/spark/commit/161a3ebeb9201d68c97e771def2a77b994e3b217).





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-206023267
  
**[Test build #55033 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55033/consoleFull)**
 for PR 11947 at commit 
[`124873b`](https://github.com/apache/spark/commit/124873bd469b827ef8de11931001ba1186157dbb).
 * This patch **fails to build**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-206023276
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55033/
Test FAILed.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-206023274
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-206020403
  
**[Test build #55033 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55033/consoleFull)**
 for PR 11947 at commit 
[`124873b`](https://github.com/apache/spark/commit/124873bd469b827ef8de11931001ba1186157dbb).





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-205955742
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/55010/
Test FAILed.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-205955671
  
**[Test build #55010 has 
finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55010/consoleFull)**
 for PR 11947 at commit 
[`180a900`](https://github.com/apache/spark/commit/180a9000af49f46ad4d6e0e4b424309c46f3bfa6).
 * This patch **fails Spark unit tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-205955740
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-205951008
  
**[Test build #55010 has 
started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55010/consoleFull)**
 for PR 11947 at commit 
[`180a900`](https://github.com/apache/spark/commit/180a9000af49f46ad4d6e0e4b424309c46f3bfa6).





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread falaki
Github user falaki commented on a diff in the pull request:

https://github.com/apache/spark/pull/11947#discussion_r58596297
  
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala ---
@@ -177,35 +177,57 @@ private[csv] object CSVTypeCast {
       datum: String,
       castType: DataType,
       nullable: Boolean = true,
-      nullValue: String = ""): Any = {
+      params: CSVOptions = CSVOptions()): Any = {
 
-    if (datum == nullValue && nullable && (!castType.isInstanceOf[StringType])) {
-      null
-    } else {
-      castType match {
-        case _: ByteType => datum.toByte
-        case _: ShortType => datum.toShort
-        case _: IntegerType => datum.toInt
-        case _: LongType => datum.toLong
-        case _: FloatType => Try(datum.toFloat)
-          .getOrElse(NumberFormat.getInstance(Locale.getDefault).parse(datum).floatValue())
-        case _: DoubleType => Try(datum.toDouble)
-          .getOrElse(NumberFormat.getInstance(Locale.getDefault).parse(datum).doubleValue())
-        case _: BooleanType => datum.toBoolean
-        case dt: DecimalType =>
+    castType match {
+      case _: ByteType => if (datum == params.byteNullValue && nullable) null else datum.toByte
+      case _: ShortType => if (datum == params.shortNullValue && nullable) null else datum.toShort
+      case _: IntegerType => if (datum == params.integerNullValue && nullable) null else datum.toInt
+      case _: LongType => if (datum == params.longNullValue && nullable) null else datum.toLong
+      case _: FloatType =>
+        if (datum == params.floatNullValue && nullable) {
+          null
+        } else if (datum == params.floatNaNValue) {
+          Float.NaN
+        } else if (datum == params.floatNegativeInf) {
+          Float.NegativeInfinity
+        } else if (datum == params.floatPositiveInf) {
+          Float.PositiveInfinity
+        } else {
+          Try(datum.toFloat)
+            .getOrElse(NumberFormat.getInstance(Locale.getDefault).parse(datum).floatValue())
+        }
+      case _: DoubleType =>
+        if (datum == params.doubleNullValue && nullable) {
+          null
+        } else if (datum == params.doubleNaNValue) {
+          Double.NaN
+        } else if (datum == params.doubleNegativeInf) {
+          Double.NegativeInfinity
+        } else if (datum == params.doublePositiveInf) {
+          Double.PositiveInfinity
+        } else {
+          Try(datum.toDouble)
--- End diff --

I think in this case, in a private and unexposed method, this seems OK.
There are many other instances of it in `CSVInferSchema`.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/11947#discussion_r57813530
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala
 ---
@@ -177,35 +177,57 @@ private[csv] object CSVTypeCast {
   datum: String,
   castType: DataType,
   nullable: Boolean = true,
-  nullValue: String = ""): Any = {
+  params: CSVOptions = CSVOptions()): Any = {
 
-if (datum == nullValue && nullable && (!castType.isInstanceOf[StringType])) {
-  null
-} else {
-  castType match {
-case _: ByteType => datum.toByte
-case _: ShortType => datum.toShort
-case _: IntegerType => datum.toInt
-case _: LongType => datum.toLong
-case _: FloatType => Try(datum.toFloat)
-  .getOrElse(NumberFormat.getInstance(Locale.getDefault).parse(datum).floatValue())
-case _: DoubleType => Try(datum.toDouble)
-  .getOrElse(NumberFormat.getInstance(Locale.getDefault).parse(datum).doubleValue())
-case _: BooleanType => datum.toBoolean
-case dt: DecimalType =>
+castType match {
+  case _: ByteType => if (datum == params.byteNullValue && nullable) null else datum.toByte
+  case _: ShortType => if (datum == params.shortNullValue && nullable) null else datum.toShort
+  case _: IntegerType => if (datum == params.integerNullValue && nullable) null else datum.toInt
+  case _: LongType => if (datum == params.longNullValue && nullable) null else datum.toLong
+  case _: FloatType =>
+if (datum == params.floatNullValue && nullable) {
+  null
+} else if (datum == params.floatNaNValue) {
+  Float.NaN
+} else if (datum == params.floatNegativeInf) {
+  Float.NegativeInfinity
+} else if (datum == params.floatPositiveInf) {
+  Float.PositiveInfinity
+} else {
+  Try(datum.toFloat)
+    .getOrElse(NumberFormat.getInstance(Locale.getDefault).parse(datum).floatValue())
+}
+  case _: DoubleType =>
+if (datum == params.doubleNullValue && nullable) {
+  null
+} else if (datum == params.doubleNaNValue) {
+  Double.NaN
+} else if (datum == params.doubleNegativeInf) {
+  Double.NegativeInfinity
+} else if (datum == params.doublePositiveInf) {
+  Double.PositiveInfinity
+} else {
+  Try(datum.toDouble)
--- End diff --

(Also, it looks like use of the `Try` API is discouraged; see 
[scala-style-guide#exception](https://github.com/databricks/scala-style-guide#exception).)
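
For illustration, the same fallback can be written without `Try` by catching only the exception that a failed plain parse raises. This is a minimal sketch rather than the PR's code: `ParseWithoutTry` and `parseDouble` are hypothetical names, and the `locale` parameter is an assumption added so the fallback can be exercised deterministically.

```scala
import java.text.NumberFormat
import java.util.Locale

object ParseWithoutTry {
  // Hypothetical helper (not part of the PR): try the plain parse first and
  // fall back to the locale-aware NumberFormat only on NumberFormatException.
  def parseDouble(datum: String, locale: Locale = Locale.getDefault): Double =
    try {
      datum.toDouble
    } catch {
      case _: NumberFormatException =>
        // NumberFormat.parse throws java.text.ParseException if this fails too.
        NumberFormat.getInstance(locale).parse(datum).doubleValue()
    }
}
```

Unlike `Try(...).getOrElse(...)`, which falls back on any non-fatal exception, this version falls back only when the plain parse fails with `NumberFormatException`.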





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-29 Thread cloud-fan
Github user cloud-fan commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-202843052
  
I'm not sure how complicated the use case will be, but having so many 
options really scares me...

If we decide to do it, I think we should also add these options to JSON, to 
keep the two data sources consistent.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/11947#discussion_r57708347
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVTypeCastSuite.scala
 ---
@@ -27,6 +27,8 @@ import org.apache.spark.unsafe.types.UTF8String
 
 class CSVTypeCastSuite extends SparkFunSuite {
 
+  private def isNull(v: Any) = assert(v == null)
--- End diff --

nit: `isNull` looks like something that returns a boolean; how about 
`assertNull`?
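
A minimal sketch of the suggested rename, for illustration only. `NullAssertions` is a hypothetical wrapper object, and the sketch uses the standard-library `assert` rather than the ScalaTest one available inside `SparkFunSuite`.

```scala
object NullAssertions {
  // The name and the explicit Unit return type both signal that this helper
  // asserts, rather than returning a Boolean the caller has to check.
  def assertNull(v: Any): Unit =
    assert(v == null, s"expected null but got $v")
}
```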





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-202711498
  
I found that both `NaN` and `Infinity` are handled in the JSON data source; 
that was fixed in this PR: 
https://github.com/apache/spark/commit/7a9dcbc91d55dbc0cbf4812319bde65f4509b467.

cc @yhuai for reviewing.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-202682552
  
As for the code, overall it looks good to me. However, I am not used to, and 
do not have a lot of experience with, `NaN`, `Inf` or `-Inf`. If these values 
can be written differently in many cases, I think the options are reasonable. 

Nevertheless, I am a bit doubtful about having a separate `null` option for 
each type.






[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/11947#discussion_r57657765
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVTypeCastSuite.scala
 ---
@@ -64,17 +66,21 @@ class CSVTypeCastSuite extends SparkFunSuite {
   }
 
   test("Nullable types are handled") {
-assert(CSVTypeCast.castTo("", IntegerType, nullable = true) == null)
+assert(CSVTypeCast.castTo("", IntegerType, nullable = true, CSVOptions()) == null)
--- End diff --

I just noticed that the fourth argument (`params`) has a default value of 
`CSVOptions()` in `CSVTypeCast.castTo()`.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/11947#discussion_r57656879
  
--- Diff: 
sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala
 ---
@@ -478,4 +479,34 @@ class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils {
 
 verifyCars(cars, withHeader = false, checkTypes = false)
   }
+
+  test("nulls, NaNs and Infinity values can be parsed") {
+val numbers = sqlContext
+  .read
+  .format("csv")
+  .schema(StructType(List(
+StructField("int", IntegerType, true),
+StructField("long", LongType, true),
+StructField("float", FloatType, true),
+StructField("double", DoubleType, true)
+  )))
+  .options(Map(
+"header" -> "true",
+"mode" -> "DROPMALFORMED",
+"integerNullValue" -> "--",
+"longNullValue" -> "++",
+"floatNullValue" -> "null",
+"doubleNullValue" -> "NULL",
+"floatNaNValue" -> "FNAN",
+"doubleNaNValue" -> "DNAN",
+"floatNegativeInf" -> "-FINF",
+"floatPositiveInf" -> "FINF",
+"doublePositiveInf" -> "DINF",
+"doubleNegativeInf" -> "-DINF"))
+  .load(testFile(numbersFile))
+
+assert(numbers.count() == 8)
+
+
--- End diff --

Maybe remove those two consecutive blank lines?





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/11947#discussion_r57656806
  
--- Diff: 
sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala
 ---
@@ -101,3 +125,14 @@ private[sql] class CSVOptions(
 
   val rowSeparator = "\n"
 }
+
+object CSVOptions {
+
+  /** Used for convenient construction in unit tests */
+  def apply(): CSVOptions = new CSVOptions(Map.empty)
--- End diff --

Personally, I am a bit hesitant about this `CSVOptions` companion object if 
it is only used in unit tests.

I'd just use `new CSVOptions(Map("key" -> "value"))` or `new 
CSVOptions(Map.empty)` in tests.
Otherwise, if this object is required for some reason, I'd define it in the 
tests, or just add a function in the tests for convenient construction.
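
A minimal sketch of the test-side alternative, under stated assumptions: `CSVOptionsStandIn` is a hypothetical stand-in for the real `CSVOptions` (which is `private[sql]` in Spark), `CSVTestUtils` and its methods are hypothetical names, and the `"NaN"` default is assumed purely for illustration.

```scala
// Hypothetical stand-in for CSVOptions; only the Map-taking constructor and
// one option are modelled. The "NaN" default is an assumption.
class CSVOptionsStandIn(parameters: Map[String, String]) {
  val floatNaNValue: String = parameters.getOrElse("floatNaNValue", "NaN")
}

// Would live in the test sources only, so the main code needs no companion
// object that exists purely for test convenience.
object CSVTestUtils {
  def defaultOptions: CSVOptionsStandIn = new CSVOptionsStandIn(Map.empty)

  def optionsWith(kv: (String, String)*): CSVOptionsStandIn =
    new CSVOptionsStandIn(kv.toMap)
}
```

A test would then call `CSVTestUtils.defaultOptions` or `CSVTestUtils.optionsWith("floatNaNValue" -> "FNAN")` instead of relying on an `apply()` defined next to the production class.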





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-28 Thread falaki
Github user falaki commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-202570253
  
@cloud-fan would you take a look at this if you have time?





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-28 Thread falaki
Github user falaki commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-202502231
  
ping @HyukjinKwon and @rxin 





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-201080503
  
**[Test build #54113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54113/consoleFull)** for PR 11947 at commit [`93ac6bb`](https://github.com/apache/spark/commit/93ac6bb3eb63efb775b48af090a37a6cbe4f30c4).
 * This patch **fails Scala style tests**.
 * This patch merges cleanly.
 * This patch adds no public classes.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-201080508
  
Test FAILed.
Refer to this link for build results (access rights to CI server needed): 
https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/54113/
Test FAILed.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-201080505
  
Merged build finished. Test FAILed.





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-24 Thread SparkQA
Github user SparkQA commented on the pull request:

https://github.com/apache/spark/pull/11947#issuecomment-201080156
  
**[Test build #54113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54113/consoleFull)** for PR 11947 at commit [`93ac6bb`](https://github.com/apache/spark/commit/93ac6bb3eb63efb775b48af090a37a6cbe4f30c4).





[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-24 Thread falaki
GitHub user falaki opened a pull request:

https://github.com/apache/spark/pull/11947

[SPARK-14143] Options for parsing NaNs, Infinity and nulls for numeric types

## What changes were proposed in this pull request?

1. Adds the following options for parsing type-specific nulls to the CSV data source:
* byteNullValue
* integerNullValue
* shortNullValue
* longNullValue
* floatNullValue
* doubleNullValue
* decimalNullValue

2. Adds the following options for parsing NaNs:
* floatNaNValue
* doubleNaNValue

3. Adds the following options for parsing infinity:
* floatNegativeInf
* floatPositiveInf
* doubleNegativeInf
* doublePositiveInf
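
The precedence among these options can be sketched as follows. This is a minimal, self-contained illustration modelled on the `FloatType` branch of the diff, not the PR's actual code: `FloatOptions`, `FloatCastSketch` and `castFloat` are hypothetical names, the default strings are assumptions, and the locale-aware `NumberFormat` fallback is left out for brevity.

```scala
// Hypothetical, trimmed-down options holder; the default strings are assumptions.
final case class FloatOptions(
    floatNullValue: String = "",
    floatNaNValue: String = "NaN",
    floatNegativeInf: String = "-Inf",
    floatPositiveInf: String = "Inf")

object FloatCastSketch {
  // Same order of checks as the FloatType branch in the diff:
  // null first, then NaN, then the two infinities, then a plain parse.
  def castFloat(datum: String, nullable: Boolean, opts: FloatOptions): Any =
    if (datum == opts.floatNullValue && nullable) null
    else if (datum == opts.floatNaNValue) Float.NaN
    else if (datum == opts.floatNegativeInf) Float.NegativeInfinity
    else if (datum == opts.floatPositiveInf) Float.PositiveInfinity
    else datum.toFloat
}
```

Because the null check runs first, a string configured as both the null value and the NaN value is read as null for nullable fields.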


## How was this patch tested?
`CSVTypeCast.castTo` is unit tested and an end-to-end test is added to 
`CSVSuite`.



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/falaki/spark SPARK-14143

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/11947.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #11947


commit 93ac6bb3eb63efb775b48af090a37a6cbe4f30c4
Author: Hossein 
Date:   2016-03-24T23:31:38Z

Added support for null, NaN and Inf options for numeric types



