[GitHub] spark pull request #23014: [MINOR][SQL] Add disable bucketedRead workaround ...

2018-11-14 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/23014


---

-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] spark pull request #23014: [MINOR][SQL] Add disable bucketedRead workaround ...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23014#discussion_r232893546
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java ---
@@ -101,10 +101,11 @@ private void throwUnsupportedException(int requiredCapacity, Throwable cause) {
 String message = "Cannot reserve additional contiguous bytes in the vectorized reader (" +
 (requiredCapacity >= 0 ? "requested " + requiredCapacity + " bytes" : "integer overflow") +
 "). As a workaround, you can reduce the vectorized reader batch size, or disable the " +
-"vectorized reader. For parquet file format, refer to " +
+"vectorized reader, or disable " + SQLConf.BUCKETING_ENABLED().key() + " if you read " +
+"from bucket table. For Parquet file format, refer to " +
 SQLConf.PARQUET_VECTORIZED_READER_BATCH_SIZE().key() +
 " (default " + SQLConf.PARQUET_VECTORIZED_READER_BATCH_SIZE().defaultValueString() +
-") and " + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + "; for orc file format, " +
+") and " + SQLConf.PARQUET_VECTORIZED_READER_ENABLED().key() + "; for Orc file format, " +
--- End diff --

`Orc` is `ORC` BTW :-).





[GitHub] spark pull request #23014: [MINOR][SQL] Add disable bucketedRead workaround ...

2018-11-12 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request:

https://github.com/apache/spark/pull/23014#discussion_r232885260
  
--- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/WritableColumnVector.java ---
@@ -101,7 +101,8 @@ private void throwUnsupportedException(int requiredCapacity, Throwable cause) {
 String message = "Cannot reserve additional contiguous bytes in the vectorized reader (" +
 (requiredCapacity >= 0 ? "requested " + requiredCapacity + " bytes" : "integer overflow") +
 "). As a workaround, you can reduce the vectorized reader batch size, or disable the " +
-"vectorized reader. For parquet file format, refer to " +
+"vectorized reader, or disable " + SQLConf.BUCKETING_ENABLED().key() + " if you read " +
+"from bucket table. For parquet file format, refer to " +
--- End diff --

parquet -> Parquet





[GitHub] spark pull request #23014: [MINOR][SQL] Add disable bucketedRead workaround ...

2018-11-12 Thread wangyum
GitHub user wangyum opened a pull request:

https://github.com/apache/spark/pull/23014

[MINOR][SQL] Add disable bucketedRead workaround when throw RuntimeException

## What changes were proposed in this pull request?
Reading from a bucketed table (about 1.7 GB per bucket file) throws a `RuntimeException`:

![image](https://user-images.githubusercontent.com/5399861/48346889-8041ce00-e6b7-11e8-83b0-ead83fb15821.png)

Default (bucketed read enabled):

![image](https://user-images.githubusercontent.com/5399861/48347084-2c83b480-e6b8-11e8-913a-9cafc043e9e4.png)

Bucketed read disabled:

![image](https://user-images.githubusercontent.com/5399861/48347099-3a393a00-e6b8-11e8-94af-cb814e1ba277.png)
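For readers who want to try the workaround, here is a minimal sketch in plain Java of how the settings named in the error message could be passed on the command line. The class `BucketedReadWorkaround` and its methods are hypothetical helpers, not part of Spark, and the batch size of 1024 is only an illustrative value. The two key strings are assumed to be the ones behind `SQLConf.BUCKETING_ENABLED` and `SQLConf.PARQUET_VECTORIZED_READER_BATCH_SIZE`; check them against your Spark version before relying on them.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical helper, not part of Spark: it only assembles config strings.
public class BucketedReadWorkaround {

    // Collects the workaround settings in insertion order.
    public static Map<String, String> workaroundConf(boolean disableBucketing, int parquetBatchSize) {
        Map<String, String> conf = new LinkedHashMap<>();
        if (disableBucketing) {
            // Assumed key behind SQLConf.BUCKETING_ENABLED.
            conf.put("spark.sql.sources.bucketing.enabled", "false");
        }
        // Assumed key behind SQLConf.PARQUET_VECTORIZED_READER_BATCH_SIZE.
        conf.put("spark.sql.parquet.columnarReaderBatchSize", Integer.toString(parquetBatchSize));
        return conf;
    }

    // Renders the settings as "--conf key=value" arguments for spark-submit or spark-sql.
    public static String toSubmitArgs(Map<String, String> conf) {
        return conf.entrySet().stream()
            .map(e -> "--conf " + e.getKey() + "=" + e.getValue())
            .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // 1024 is an illustrative batch size, smaller than the default.
        System.out.println(toSubmitArgs(workaroundConf(true, 1024)));
    }
}
```

Disabling bucketing makes Spark plan the scan as if the table were not bucketed, so it trades away bucket-aware optimizations for the ability to read the large files.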


## How was this patch tested?

Manual tests.


You can merge this pull request into a Git repository by running:

$ git pull https://github.com/wangyum/spark anotherWorkaround

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/23014.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #23014


commit a41551efd667f3ed6c30b0a2b262818e37d00884
Author: Yuming Wang 
Date:   2018-11-12T12:06:35Z

Add new workaround



