[GitHub] spark pull request #23014: [MINOR][SQL] Add disable bucketedRead workaround ...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/23014

---

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23014#discussion_r232893546

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java ---
@@ -101,10 +101,11 @@ private void throwUnsupportedException(int requiredCapacity, Throwable cause) {
     String message = "Cannot reserve additional contiguous bytes in the vectorized reader (" +
       (requiredCapacity >= 0 ? "requested " + requiredCapacity + " bytes" : "integer overflow") +
       "). As a workaround, you can reduce the vectorized reader batch size, or disable the " +
-      "vectorized reader. For parquet file format, refer to " +
+      "vectorized reader, or disable " + SQLConf.BUCKETING_ENABLED().key() + " if you read " +
+      "from bucket table. For Parquet file format, refer to " +
       SQLConf.PARQUET_VECTORIZED_READER_BATCH_SIZE().key() + " (default " +
       SQLConf.PARQUET_VECTORIZED_READER_BATCH_SIZE().defaultValueString() +
-      ") and " + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + "; for orc file format, " +
+      ") and " + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + "; for Orc file format, " +
--- End diff --

`Orc` is `ORC` BTW :-).
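For reference, the workarounds enumerated in this error message map to SQL session configuration. A sketch, assuming the config keys resolved by the `SQLConf` entries above (`spark.sql.sources.bucketing.enabled`, `spark.sql.parquet.columnarReaderBatchSize`, `spark.sql.parquet.enableVectorizedReader`, and their ORC counterparts); verify the keys against your Spark version:

```sql
-- Workaround 1: reduce the vectorized reader batch size
SET spark.sql.parquet.columnarReaderBatchSize=1024;
SET spark.sql.orc.columnarReaderBatchSize=1024;

-- Workaround 2: disable the vectorized reader entirely
SET spark.sql.parquet.enableVectorizedReader=false;
SET spark.sql.orc.enableVectorizedReader=false;

-- Workaround 3 (added by this PR's message): when reading from a
-- bucketed table, disable bucketed reads (SQLConf.BUCKETING_ENABLED)
SET spark.sql.sources.bucketing.enabled=false;
```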
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/23014#discussion_r232885260

--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java ---
@@ -101,7 +101,8 @@ private void throwUnsupportedException(int requiredCapacity, Throwable cause) {
     String message = "Cannot reserve additional contiguous bytes in the vectorized reader (" +
       (requiredCapacity >= 0 ? "requested " + requiredCapacity + " bytes" : "integer overflow") +
       "). As a workaround, you can reduce the vectorized reader batch size, or disable the " +
-      "vectorized reader. For parquet file format, refer to " +
+      "vectorized reader, or disable " + SQLConf.BUCKETING_ENABLED().key() + " if you read " +
+      "from bucket table. For parquet file format, refer to " +
--- End diff --

parquet -> Parquet
GitHub user wangyum opened a pull request: https://github.com/apache/spark/pull/23014

[MINOR][SQL] Add disable bucketedRead workaround when throw RuntimeException

## What changes were proposed in this pull request?

Reading from a bucketed table (about 1.7 GB per bucket file) throws a `RuntimeException`:

![image](https://user-images.githubusercontent.com/5399861/48346889-8041ce00-e6b7-11e8-83b0-ead83fb15821.png)

Default (bucketed read enabled):

![image](https://user-images.githubusercontent.com/5399861/48347084-2c83b480-e6b8-11e8-913a-9cafc043e9e4.png)

Bucketed read disabled:

![image](https://user-images.githubusercontent.com/5399861/48347099-3a393a00-e6b8-11e8-94af-cb814e1ba277.png)

## How was this patch tested?

Manual tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/wangyum/spark anotherWorkaround

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/23014.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #23014

commit a41551efd667f3ed6c30b0a2b262818e37d00884
Author: Yuming Wang
Date: 2018-11-12T12:06:35Z

    Add new workaround
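To illustrate the scenario the PR describes, a minimal Spark SQL sketch of a bucketed table and the disable-bucketing workaround; the table and column names here are hypothetical, not from the PR:

```sql
-- Hypothetical bucketed table; with large bucket files, the vectorized
-- reader may fail to reserve enough contiguous bytes for a whole bucket
CREATE TABLE events (id BIGINT, payload STRING)
USING parquet
CLUSTERED BY (id) INTO 8 BUCKETS;

-- The workaround this PR surfaces in the error message: fall back to a
-- non-bucketed scan so input splits are no longer one-per-bucket-file
SET spark.sql.sources.bucketing.enabled=false;
SELECT COUNT(*) FROM events;
```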