[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/11947 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-216006137 OK I'm going to merge this in master and manually update the commit message.

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-216003501 LGTM. (Maybe we should not forget, for documentation, that `nullValue` has higher priority than other options such as `nanValue` if the same value is given as

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215984253 @HyukjinKwon would be great if you can review this. Thanks.

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215984080 @falaki can you update the pr description?

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread koertkuipers
Github user koertkuipers commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215979899 please also provide a way for strings to be converted to null upon reading

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215947150 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215947147 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215947097 **[Test build #57423 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57423/consoleFull)** for PR 11947 at commit

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215943795 LGTM pending tests.

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215943744 **[Test build #57423 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57423/consoleFull)** for PR 11947 at commit

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread falaki
Github user falaki commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215943595 @rxin done.

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-30 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215940817 @falaki sorry this no longer merges cleanly. Do you mind bringing it up to date?

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215930852 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-29 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215930854 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215930751 **[Test build #57394 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57394/consoleFull)** for PR 11947 at commit

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-29 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215926089 **[Test build #57394 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/57394/consoleFull)** for PR 11947 at commit

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-29 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215924124 As discussed offline, we should just have a single option for setting null, another for nan, another for inf and negative inf. Basically just 4.
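
The four-option design described in this comment can be illustrated with a small, library-free sketch. It is not Spark's implementation: `cast_double` and `DEFAULT_OPTIONS` are hypothetical names, the default sentinel strings are assumptions, and only `nullValue` and `nanValue` are named elsewhere in this thread (`positiveInf` and `negativeInf` are assumed names for the two infinity options). The null check is placed first to mirror the remark in the thread that `nullValue` takes priority when the same string is given for more than one option.

```python
import math
from typing import Optional

# Hypothetical defaults for illustration only; not taken from the pull request.
DEFAULT_OPTIONS = {
    "nullValue": "",
    "nanValue": "NaN",
    "positiveInf": "Inf",
    "negativeInf": "-Inf",
}


def cast_double(token: str, options: dict = DEFAULT_OPTIONS) -> Optional[float]:
    """Convert one CSV token to a float, honouring the four sentinel options."""
    # The null check runs first, so nullValue wins if it collides with another option.
    if token == options["nullValue"]:
        return None
    if token == options["nanValue"]:
        return math.nan
    if token == options["positiveInf"]:
        return math.inf
    if token == options["negativeInf"]:
        return -math.inf
    # Anything else is parsed as an ordinary floating-point literal.
    return float(token)
```

With these assumed defaults, an empty token becomes `None`, while `"NaN"`, `"Inf"` and `"-Inf"` map to the corresponding float values. If both `nullValue` and `nanValue` are set to the same string, say `"NA"`, the token `"NA"` is read as null rather than NaN.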

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-27 Thread koertkuipers
Github user koertkuipers commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215196735 i personally would have been happy with a simple single value for nulls for all datatypes. and the usage of that single value should be consistent across

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-27 Thread koertkuipers
Github user koertkuipers commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215194241 do these settings roundtrip correctly? say i set doubleNaNValue to "XY", and i create a dataframe with a Double.NaN in it, does it get written out correctly as

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-27 Thread koertkuipers
Github user koertkuipers commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-215192562 hello! why is there no stringNullValue? basically i want for a column with type string to read in all empty strings as nulls. this is what the old option

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-208662536 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-208662534 Merged build finished. Test PASSed.

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-208662375 **[Test build #9 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/9/consoleFull)** for PR 11947 at commit

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-208638423 **[Test build #9 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/9/consoleFull)** for PR 11947 at commit

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-206023267 **[Test build #55033 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55033/consoleFull)** for PR 11947 at commit

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-206023276 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-206023274 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-206020403 **[Test build #55033 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55033/consoleFull)** for PR 11947 at commit

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-205955742 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-205955671 **[Test build #55010 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55010/consoleFull)** for PR 11947 at commit

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-205955740 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-205951008 **[Test build #55010 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/55010/consoleFull)** for PR 11947 at commit

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-04-05 Thread falaki
Github user falaki commented on a diff in the pull request: https://github.com/apache/spark/pull/11947#discussion_r58596297 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -177,35 +177,57 @@ private[csv] object

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-29 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11947#discussion_r57813530 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVInferSchema.scala --- @@ -177,35 +177,57 @@ private[csv] object

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-29 Thread cloud-fan
Github user cloud-fan commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-202843052 I'm not sure how complicated the use case will be, but it really scares me with so many options... If we decide to do it, I think we should also add these

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-29 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/11947#discussion_r57708347 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVTypeCastSuite.scala --- @@ -27,6 +27,8 @@ import

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-202711498 I found both `NaN` and `Infinity` are handled in JSON data source and it was fixed in this PR,

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-28 Thread HyukjinKwon
Github user HyukjinKwon commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-202682552 As for the code, overall, it looks good to me. However, I am not used to, and do not have a lot of experience, dealing with `NaN`, `Inf` or `-Inf`. If the values can be

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11947#discussion_r57657765 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVTypeCastSuite.scala --- @@ -64,17 +66,21 @@ class CSVTypeCastSuite

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11947#discussion_r57656879 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala --- @@ -478,4 +479,34 @@ class CSVSuite extends

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-28 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/11947#discussion_r57656806 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVOptions.scala --- @@ -101,3 +125,14 @@ private[sql] class

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-28 Thread falaki
Github user falaki commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-202570253 @cloud-fan would you take a look at this if you have time?

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-28 Thread falaki
Github user falaki commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-202502231 ping @HyukjinKwon and @rxin

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-201080503 **[Test build #54113 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54113/consoleFull)** for PR 11947 at commit

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-201080508 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-24 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-201080505 Merged build finished. Test FAILed.

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-24 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/11947#issuecomment-201080156 **[Test build #54113 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/54113/consoleFull)** for PR 11947 at commit

[GitHub] spark pull request: [SPARK-14143] Options for parsing NaNs, Infini...

2016-03-24 Thread falaki
GitHub user falaki opened a pull request: https://github.com/apache/spark/pull/11947 [SPARK-14143] Options for parsing NaNs, Infinity and nulls for numeric types ## What changes were proposed in this pull request? 1. Adds the following options for parsing type-specific nulls to