[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-23 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20648 I think we should at least update the documentation for this behavior of the CSV reader.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20648 _To me_, I have been roughly thinking that we had better match it to R's read.csv and explicitly document this. I believe this is a good reference that our CSV reader has resembled so far. BTW,

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-23 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20648 > Yup, +1 for starting this by disallowing, but to my knowledge R's read.csv allows the length of tokens to be shorter than its schema, putting nulls (or NA) into missing fields, as a valid

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-23 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20648 @HyukjinKwon @cloud-fan Thanks for the comment! Yes, I agree we need to keep the CSV behavior. I will check how much we can clean up with it.
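Editorial sketch (not part of the original thread): the CSV behavior being discussed, where a row with fewer tokens than the schema gets nulls in the missing fields, can be illustrated in plain Python. The helper name `pad_tokens_to_schema` is hypothetical and this is not Spark's implementation. Raising an error for rows with too many tokens is purely an assumption of this sketch, since the thread does not say how that case is handled.

```python
from typing import List, Optional, Sequence


def pad_tokens_to_schema(
    tokens: Sequence[str], schema: Sequence[str]
) -> List[Optional[str]]:
    """Hypothetical helper: pad a short CSV row with None up to the schema length.

    This mirrors the "nulls (or NA) into missing fields" behavior described
    in the thread. It is an illustration only, not Spark code.
    """
    if len(tokens) > len(schema):
        # Assumption of this sketch: reject rows that are longer than the schema.
        raise ValueError(
            f"row has {len(tokens)} tokens but schema has {len(schema)} fields"
        )
    # Keep the parsed tokens and fill the remaining fields with None (null).
    return list(tokens) + [None] * (len(schema) - len(tokens))


schema = ["id", "name", "age"]
print(pad_tokens_to_schema(["1", "alice"], schema))  # ['1', 'alice', None]
```

A row that already matches the schema length is returned unchanged, so the helper only affects short rows.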

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-23 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20648 > allows the length of tokens to be shorter than its schema, putting nulls (or NA) into missing fields Actually, I also recall that this is a valid case for CSV, and I remember that we did

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20648 +1 for disallowing it anyway if it was Wenchen's opinion too. Please go ahead. Will help double check anyway.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20648 Yup, +1 for starting this by disallowing, but to my knowledge R's read.csv allows the length of tokens to be shorter than its schema, putting nulls (or NA) into missing fields, as a valid

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-23 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20648 @HyukjinKwon According to the documentation of `DataFrameReader.csv`, the behavior of the CSV reader isn't consistent with what is documented. ``` `PERMISSIVE` : sets other fields to `null` when it meets a

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20648 Yup, it's unsupported in JSON but CSV supports it. Do you mean to disallow it in CSV too, or simply clean up the JSON code path?

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-23 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20648 I'll close this PR and create another PR to refactor the JSON parser and related code. Thanks @cloud-fan and @HyukjinKwon.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-23 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20648 According to an offline discussion with @cloud-fan, partial results are not supported at all now. We should refactor the code to make that clear and reduce confusion.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-23 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20648 How about we start this by disallowing partial results entirely, documenting the behaviour and matching the behaviour to R's `read.csv(...)` in the case of CSV (in terms of which case is an error

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-23 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/20648 I think we do have an intention to return partial results, but there is no strict definition for it, and it seems there is no public documentation, so it's kind of a new feature. Since this is a

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-22 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20648 Yes, thanks @HyukjinKwon for checking the behavior. If we look at the code of the JSON parser, we will find many places that expect partial results to be available. For

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-22 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20648 I was just double checking the current status for both CSV and JSON: Seems CSV fills up the partial results with an exception (which is caught by permissive mode with the corrupt record

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-22 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20648 From the code, it looks like there is an intention to return partial results when failing to parse the documents. This patch makes the parser return the partial results. But this should be considered a behavior change,

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Merged build finished. Test PASSed.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87606/ Test PASSed.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20648 **[Test build #87606 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87606/testReport)** for PR 20648 at commit

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20648 **[Test build #87606 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87606/testReport)** for PR 20648 at commit

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/1001/

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Merged build finished. Test PASSed.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20648 retest this please.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20648 `FileBasedDataSourceSuite` is still flaky.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87603/ Test FAILed.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Merged build finished. Test FAILed.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20648 **[Test build #87603 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87603/testReport)** for PR 20648 at commit

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20648 **[Test build #87603 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87603/testReport)** for PR 20648 at commit

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Merged build finished. Test PASSed.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/999/

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20648 retest this please.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87600/ Test FAILed.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Merged build finished. Test FAILed.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20648 **[Test build #87600 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87600/testReport)** for PR 20648 at commit

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/997/

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20648 **[Test build #87600 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87600/testReport)** for PR 20648 at commit

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Merged build finished. Test PASSed.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20648 retest this please.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Merged build finished. Test FAILed.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87586/ Test FAILed.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20648 **[Test build #87586 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87586/testReport)** for PR 20648 at commit

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/987/

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Merged build finished. Test PASSed.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20648 **[Test build #87586 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87586/testReport)** for PR 20648 at commit

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20648 Will check this one by tomorrow.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Merged build finished. Test FAILed.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/87581/ Test FAILed.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20648 **[Test build #87581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87581/testReport)** for PR 20648 at commit

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20648 **[Test build #87581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/87581/testReport)** for PR 20648 at commit

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/testing-k8s-prb-make-spark-distribution/983/

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20648 Merged build finished. Test PASSed.

[GitHub] spark issue #20648: [SPARK-23448][SQL] JSON parser should return partial row...

2018-02-21 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/20648 cc @HyukjinKwon Can you check whether this behavior makes sense to you?