[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20959 LGTM Merged to master. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89967/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89967 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89967/testReport)** for PR 20959 at commit [`d4d9d65`](https://github.com/apache/spark/commit/d4d9d65ce28c4176c085449564c8e5f8ec0b3ff7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89967 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89967/testReport)** for PR 20959 at commit [`d4d9d65`](https://github.com/apache/spark/commit/d4d9d65ce28c4176c085449564c8e5f8ec0b3ff7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20959 retest this please
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89964/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89964 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89964/testReport)** for PR 20959 at commit [`d4d9d65`](https://github.com/apache/spark/commit/d4d9d65ce28c4176c085449564c8e5f8ec0b3ff7). * This patch **fails due to an unknown error code, -9**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89964 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89964/testReport)** for PR 20959 at commit [`d4d9d65`](https://github.com/apache/spark/commit/d4d9d65ce28c4176c085449564c8e5f8ec0b3ff7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20959 retest this please
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89879/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89879 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89879/testReport)** for PR 20959 at commit [`d4d9d65`](https://github.com/apache/spark/commit/d4d9d65ce28c4176c085449564c8e5f8ec0b3ff7). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89879 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89879/testReport)** for PR 20959 at commit [`d4d9d65`](https://github.com/apache/spark/commit/d4d9d65ce28c4176c085449564c8e5f8ec0b3ff7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89796/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89796 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89796/testReport)** for PR 20959 at commit [`b2c552c`](https://github.com/apache/spark/commit/b2c552c3aef2c6361e669af107330f721398a0bc). * This patch passes all tests. * This patch merges cleanly. * This patch adds the following public classes _(experimental)_: * `class CSVSuite extends QueryTest with SharedSQLContext with SQLTestUtils with TestCsvData ` --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89796 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89796/testReport)** for PR 20959 at commit [`b2c552c`](https://github.com/apache/spark/commit/b2c552c3aef2c6361e669af107330f721398a0bc). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89677/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89677 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89677/testReport)** for PR 20959 at commit [`0737bf7`](https://github.com/apache/spark/commit/0737bf7717f6b1f253c9d78013065e7147279607). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89677 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89677/testReport)** for PR 20959 at commit [`0737bf7`](https://github.com/apache/spark/commit/0737bf7717f6b1f253c9d78013065e7147279607). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89664/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89664 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89664/testReport)** for PR 20959 at commit [`257b363`](https://github.com/apache/spark/commit/257b3638ae0db7051dd25affcaf8967a5a29db5d). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89664 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89664/testReport)** for PR 20959 at commit [`257b363`](https://github.com/apache/spark/commit/257b3638ae0db7051dd25affcaf8967a5a29db5d). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20959 @gatorsmile @HyukjinKwon @sujithjay May I ask you to look at the PR again?
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89239/ Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89239 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89239/testReport)** for PR 20959 at commit [`d12c2e2`](https://github.com/apache/spark/commit/d12c2e221b446596e2322a4d86ee0fd55a09b6ba). * This patch passes all tests. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89241/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89241 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89241/testReport)** for PR 20959 at commit [`a37bf3b`](https://github.com/apache/spark/commit/a37bf3bc4034f587768903df304e5642a68f87c8). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89241 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89241/testReport)** for PR 20959 at commit [`a37bf3b`](https://github.com/apache/spark/commit/a37bf3bc4034f587768903df304e5642a68f87c8). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89239 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89239/testReport)** for PR 20959 at commit [`d12c2e2`](https://github.com/apache/spark/commit/d12c2e221b446596e2322a4d86ee0fd55a09b6ba). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89220/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89220 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89220/testReport)** for PR 20959 at commit [`1427f73`](https://github.com/apache/spark/commit/1427f73e13b5809ac9622bf5fbd80cb178544864). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89220 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89220/testReport)** for PR 20959 at commit [`1427f73`](https://github.com/apache/spark/commit/1427f73e13b5809ac9622bf5fbd80cb178544864). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89217/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89217 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89217/testReport)** for PR 20959 at commit [`9f26bb7`](https://github.com/apache/spark/commit/9f26bb78a5be28eb38c3fd178f4f88b66b8917ff). * This patch **fails Python style tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89217 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89217/testReport)** for PR 20959 at commit [`9f26bb7`](https://github.com/apache/spark/commit/9f26bb78a5be28eb38c3fd178f4f88b66b8917ff). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89208/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89208 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89208/testReport)** for PR 20959 at commit [`3bceb3a`](https://github.com/apache/spark/commit/3bceb3a66064778be2bf568d3ad6382152fe9bd7). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89208 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89208/testReport)** for PR 20959 at commit [`3bceb3a`](https://github.com/apache/spark/commit/3bceb3a66064778be2bf568d3ad6382152fe9bd7). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89154/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89154 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89154/testReport)** for PR 20959 at commit [`d4e815e`](https://github.com/apache/spark/commit/d4e815e8564e4c81933540896a0d85f4c8225ea2). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/89155/ Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89155 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89155/testReport)** for PR 20959 at commit [`d584cfe`](https://github.com/apache/spark/commit/d584cfe19135c1e742b1ae1f2873b827a32c0d7a). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89155 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89155/testReport)** for PR 20959 at commit [`d584cfe`](https://github.com/apache/spark/commit/d584cfe19135c1e742b1ae1f2873b827a32c0d7a). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #89154 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/89154/testReport)** for PR 20959 at commit [`d4e815e`](https://github.com/apache/spark/commit/d4e815e8564e4c81933540896a0d85f4c8225ea2). --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20959 Let's also address @gatorsmile's comments and mine from https://github.com/apache/spark/pull/20963 here. Seems fine otherwise.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/20959 @MaxGekk Thanks for working on this!
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20959 Sure, I will review this and go ahead.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user rxin commented on the issue: https://github.com/apache/spark/pull/20959 I'm good with having this option given the data @MaxGekk posted. (I haven't reviewed the code; somebody else should do that before merging.) `val sampledSchema = spark.read.option("inferSchema", true).csv(ds.sample(false, 0.7)).schema` is a bit clunky compared with an option that applies to all the sources.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20959 For usability, the workaround I suggested above is more flexible. For example, we can apply different operations (e.g., a filter) on the schema inference path, and it takes only a few lines. Schema inference is discouraged in production. I believe that, for example, just taking 100 records and using the inferred schema makes more sense. I am not against this option; I am only saying that I don't feel strongly about it, for the reasons above.
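The "take 100 records and use the schema" idea from this comment could look roughly like the sketch below. It is only an illustration under stated assumptions, not code from the PR: the object name `InferFromFirstRecords`, the helper names, and the default path `data/people.csv` are hypothetical, and it assumes the first `n` lines returned by `limit` include the header, which holds for a small single file but is not guaranteed for inputs split across many partitions.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.types.StructType

// Hypothetical helper object, written for illustration only.
object InferFromFirstRecords {

  // Infers a CSV schema from at most the first `n` lines of the input
  // (the header line counts as one of them).
  def inferFromHead(spark: SparkSession, path: String, n: Int): StructType = {
    val head = spark.read.textFile(path).limit(n) // Dataset[String] of raw CSV lines
    spark.read
      .option("header", true)
      .option("inferSchema", true)
      .csv(head)
      .schema
  }

  // Reads the full input with an explicit schema, so no inference pass runs here.
  def readWithSchema(spark: SparkSession, path: String, schema: StructType): DataFrame =
    spark.read.option("header", true).schema(schema).csv(path)

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("infer-from-first-records")
      .getOrCreate()
    // Hypothetical default path; pass a real CSV path as the first argument.
    val path = args.headOption.getOrElse("data/people.csv")
    val schema = inferFromHead(spark, path, 100)
    readWithSchema(spark, path, schema).printSchema()
    spark.stop()
  }
}
```

The trade-off is the one debated in this thread: a fixed prefix is cheap to read, but it can miss types that only show up later in the data, whereas random sampling looks at rows from the whole input.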
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user MaxGekk commented on the issue: https://github.com/apache/spark/pull/20959

@rxin I ran an experiment on JSON files, but the numbers for CSV are almost the same. For example, inferring the schema for a 50GB JSON file:

```
scala> spark.read.option("samplingRatio", 0.1).json("test.json")
```
took 1.7 minutes, while
```
scala> spark.read.option("samplingRatio", 1.0).json("test.json")
```
took 21.9 minutes.

I looked in a profiler at where Spark spends time during schema inference for the 50GB JSON file. At least on my laptop, 75% goes to JSON parsing and 18% to disk IO. Of course, the numbers will be different on a cluster if the files are read from S3 over the network. In any case, the samplingRatio option gives us an opportunity to balance CPU load against network/disk IO.

@HyukjinKwon The question is not about the workaround, it is about usability:
1. For interactive queries, a user doesn't have to write the boilerplate code if the option exists.
2. If the code is used inside a library, developers don't have to check special cases, such as "if it is JSON, use the samplingRatio option; otherwise, do the sampling manually".

Additionally, the behavior behind the option could be improved in the future. For example, it could require fewer file reads during sampling. It would be easier to do that with the option.
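For the CSV side, usage of the option discussed here might look like the following sketch. It assumes a Spark build that includes this PR (so that the CSV reader honors `samplingRatio`). The object name `SamplingRatioCsv`, the helper `inferCsvSchema`, and the default path `data/test.csv` are hypothetical, and the `require` guard is a choice made for this sketch rather than Spark's own validation.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.StructType

// Hypothetical helper object, written for illustration only.
object SamplingRatioCsv {

  // Infers the CSV schema while asking Spark to sample roughly `ratio`
  // of the input rows instead of using every row.
  def inferCsvSchema(spark: SparkSession, path: String, ratio: Double): StructType = {
    // Guard chosen for this sketch; it is not taken from Spark's sources.
    require(ratio > 0.0 && ratio <= 1.0, "ratio is expected to be in (0, 1]")
    spark.read
      .option("header", true)
      .option("inferSchema", true)
      .option("samplingRatio", ratio)
      .csv(path)
      .schema
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("sampling-ratio-csv")
      .getOrCreate()
    // Hypothetical default path; pass a real CSV path as the first argument.
    val path = args.headOption.getOrElse("data/test.csv")
    println(inferCsvSchema(spark, path, 0.1).treeString)
    spark.stop()
  }
}
```

A lower ratio means fewer rows are used for inference, so a column whose wider type appears only in a few rows may be inferred too narrowly. The timings quoted above are from the author's laptop, and the sketch makes no claim about how they carry over to other setups.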
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/20959

_If I remember this correctly_, there was a discussion about it a long time ago, and @rxin was not sure how much it improves performance and whether it's worth it, which I ended up agreeing with. @rxin, did I recall this correctly?

There's a workaround for it, BTW:

```scala
// In spark-shell, where `spark` and its implicits are already in scope.
val ds = Seq("a", "b", "c", "d").toDS
// Infer the schema from a 70% sample (without replacement) of the CSV lines.
val sampledSchema = spark.read.option("inferSchema", true).csv(ds.sample(false, 0.7)).schema
// Read the full dataset with the sampled schema; no further inference is needed.
spark.read.schema(sampledSchema).csv(ds)
```
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88827/ Test FAILed.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test FAILed.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #88827 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88827/testReport)** for PR 20959 at commit [`b6a7cc8`](https://github.com/apache/spark/commit/b6a7cc852bb7f27a94eb268f4f3e5e35f97538bd). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #88827 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88827/testReport)** for PR 20959 at commit [`b6a7cc8`](https://github.com/apache/spark/commit/b6a7cc852bb7f27a94eb268f4f3e5e35f97538bd).
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88818/ Test FAILed.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test FAILed.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #88818 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88818/testReport)** for PR 20959 at commit [`91c57cf`](https://github.com/apache/spark/commit/91c57cf898d7f62ee3c9acff63607fcb102bd317). * This patch **fails Spark unit tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #88818 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88818/testReport)** for PR 20959 at commit [`91c57cf`](https://github.com/apache/spark/commit/91c57cf898d7f62ee3c9acff63607fcb102bd317).
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #88817 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88817/testReport)** for PR 20959 at commit [`ba12fca`](https://github.com/apache/spark/commit/ba12fca8514a2cb620224e859e8a8b5cc208bf31). * This patch **fails Scala style tests**. * This patch merges cleanly. * This patch adds no public classes.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Merged build finished. Test FAILed.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/88817/ Test FAILed.
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/20959 **[Test build #88817 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/88817/testReport)** for PR 20959 at commit [`ba12fca`](https://github.com/apache/spark/commit/ba12fca8514a2cb620224e859e8a8b5cc208bf31).
[GitHub] spark issue #20959: [SPARK-23846][SQL] The samplingRatio option for CSV data...
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/20959 Can one of the admins verify this patch?