[GitHub] spark issue #22234: [SPARK-25241][SQL] Configurable empty values when readin...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22234 @mmolimar, let's leave this closed since the newer one is open BTW. You will be credited as a primary author of #22367 anyway. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22234 @gatorsmile @HyukjinKwon Please take a look at #22367
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22234 @MaxGekk Could you take this PR over? I think we need to merge this into Spark 2.4. Users can restore the previous behavior through the new option `emptyValue`, if needed. Please also update the migration guide to describe the behavior change and explain how to set `emptyValue`.
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22234 Oh no, I mean we fixed a bug.
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22234 > cc @MaxGekk for a followup @HyukjinKwon Do you mean updating the migration guide in master and probably in Spark 2.4? I don't think this should be considered a bug, because the current and previous versions of Spark can read saved CSV files correctly. Yes, for now empty strings are saved as `""` and `null`s as nothing, but this is meant to distinguish empty strings from `null`s on read. The produced CSV files are valid, and any mature CSV library can read them.
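The convention described in this comment can be illustrated with a small toy model. This is only a sketch, not Spark's CSV reader or writer: `encode_row`, `decode_row` and the `empty_value` parameter are hypothetical names chosen for illustration (the last one loosely mirrors the `emptyValue` option discussed in this thread), and the model assumes field values contain no commas, quotes or newlines.

```python
from typing import List, Optional


def encode_row(values: List[Optional[str]]) -> str:
    """Write None as nothing and the empty string as a quoted empty field."""
    fields = []
    for value in values:
        if value is None:
            fields.append("")      # null -> nothing between the delimiters
        elif value == "":
            fields.append('""')    # empty string -> quoted empty field
        else:
            fields.append(value)   # assumption: no commas/quotes in values
    return ",".join(fields)


def decode_row(line: str, empty_value: Optional[str] = "") -> List[Optional[str]]:
    """Read a row back; a quoted empty field becomes `empty_value`."""
    values: List[Optional[str]] = []
    for field in line.split(","):
        if field == "":
            values.append(None)         # nothing -> null
        elif field == '""':
            values.append(empty_value)  # caller decides what "" means
        else:
            values.append(field)
    return values


line = encode_row(["a", "", None])
print(line)
print(decode_row(line))
print(decode_row(line, empty_value=None))
```

With the default `empty_value`, the empty string and `None` survive a round trip as distinct values. Passing `empty_value=None` collapses both into `None`, which is the kind of ambiguity the thread is debating.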
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22234 This is quite a corner case (see the cases elaborated in the JIRA [SPARK-17916](https://issues.apache.org/jira/browse/SPARK-17916)), and it is ambiguous whether to treat this as a bug or a proper behaviour change; however, I don't object if this is worth mentioning. cc @MaxGekk for a followup
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22234 Have we documented the behavior changes in the migration guide? If not, can we do it?
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22234 From my understanding, yea. The problem here sounds like ambiguity in empty strings, since they can be interpreted either as empty strings or as `null`. To me, this is actually rather a bug, since we can't distinguish them and don't respect the empty value. If empty strings are written, they should be read as empty strings. This PR proposes the ability to explicitly set the empty value to work around the behaviour change.
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/22234 Did we introduce any behavior change in https://github.com/apache/spark/pull/21273? Does this PR resolve it?
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22234 Seems okay, but I or someone else should take a closer look before getting this in.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22234 Merged build finished. Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22234 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95274/ Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22234 **[Test build #95274 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95274/testReport)** for PR 22234 at commit [`0bcdb2a`](https://github.com/apache/spark/commit/0bcdb2a7f2299add11fd78a551027572f80f1ae7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22234 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95271/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22234 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22234 **[Test build #95271 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95271/testReport)** for PR 22234 at commit [`bb28db9`](https://github.com/apache/spark/commit/bb28db976fad9316f68a74da4955d08c5b7abaf2). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22234 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95270/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22234 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22234 **[Test build #95270 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95270/testReport)** for PR 22234 at commit [`3d3f178`](https://github.com/apache/spark/commit/3d3f178a55a8fdb4630916252866a68a98ae17cd). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22234 **[Test build #95274 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95274/testReport)** for PR 22234 at commit [`0bcdb2a`](https://github.com/apache/spark/commit/0bcdb2a7f2299add11fd78a551027572f80f1ae7).
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22234 **[Test build #95271 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95271/testReport)** for PR 22234 at commit [`bb28db9`](https://github.com/apache/spark/commit/bb28db976fad9316f68a74da4955d08c5b7abaf2).
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22234 **[Test build #95270 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95270/testReport)** for PR 22234 at commit [`3d3f178`](https://github.com/apache/spark/commit/3d3f178a55a8fdb4630916252866a68a98ae17cd).
Github user mmolimar commented on the issue: https://github.com/apache/spark/pull/22234 @MaxGekk I added what you suggested as well.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22234 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/95259/ Test PASSed.
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22234 Merged build finished. Test PASSed.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22234 **[Test build #95259 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95259/testReport)** for PR 22234 at commit [`8b51800`](https://github.com/apache/spark/commit/8b5180021d246ab2fdf0824c01b9f180136837ce). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes.
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/22234 **[Test build #95259 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/95259/testReport)** for PR 22234 at commit [`8b51800`](https://github.com/apache/spark/commit/8b5180021d246ab2fdf0824c01b9f180136837ce).
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/22234 ok to test
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/22234 Should the new option be taken into account here: https://github.com/apache/spark/blob/b461acb2d90b734393c27fe7b359e2f2d297b8d4/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/csv/CSVDataSource.scala#L94 ?
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/22234 Can one of the admins verify this patch?