This is an automated email from the ASF dual-hosted git repository. jackylk pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/carbondata.git
The following commit(s) were added to refs/heads/master by this push: new eff2977 [CARBONDATA-3519]Made optimizations in write step to avoid unnecessary memory blk allocation/free eff2977 is described below commit eff297769bc0513cdf4e7c3d2fd3f57c5ca60df1 Author: Venu Reddy <venugopalred...@huawei.com> AuthorDate: Fri Dec 20 18:27:34 2019 +0530 [CARBONDATA-3519]Made optimizations in write step to avoid unnecessary memory blk allocation/free Issue-1: Context: For a string column with local dictionary enabled, a column page of UnsafeFixLengthColumnPage with datatype DataTypes.BYTE_ARRAY is created for encodedPage along with regular actualPage of UnsafeVarLengthColumnPage. We have capacity field in the UnsafeFixLengthColumnPage. And this field indicates the capacity of allocated memoryBlock for the page. ensureMemory() method gets called while adding rows to check if totalLength + requestSize > capacity to allocate a new memoryBlock. If there is no room to add the next row, allocates a new block, copy the old context(prev rows) and free the old memoryBlock. Problem: While, UnsafeFixLengthColumnPage with with datatype DataTypes.BYTE_ARRAY is created for encodedPage, we have not assigned the capacity field with allocated memory block size. Hence, for each add row to tablePage, ensureMemory() check always fails, allocates a new column page memoryBlock, copy the old context(prev rows) and free the old memoryBlock. This allocation of new memoryBlock and free of old memoryBlock happens for each row row addition for the string columns with local dictionary. Solution: We can set the capacity to allocated size. Issue-2: Context: InVarLengthColumnPageBase, we have a rowOffset column page of UnsafeFixLengthColumnPage of datatype INT to maintain the data offset to each row of variable length columns. This rowOffset page allocates to be size of page. Problem: If we have 10 rows in the page, we need 11 rows for its rowOffset page. Because we always keep 0 as offset to 1st row. So an additional row is required for rowOffset page. Otherwise, ensureMemory() check always fails for the last row(10th row in this case) of data and allocates a new rowOffset page memoryBlock, copy the old context(prev rows) and free the old memoryBlock. This can happen for the string columns with local dictionary, direct dictionary columns, global disctionary columns. Solution: We can allocate for 1 additional row of rowOffset page. This closes #3524 --- .../carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java | 4 ++-- .../carbondata/core/datastore/page/VarLengthColumnPageBase.java | 2 +- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java b/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java index 13c382f..4ef5e5d 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/page/UnsafeFixLengthColumnPage.java @@ -92,8 +92,8 @@ public class UnsafeFixLengthColumnPage extends ColumnPage { this.eachRowSize = eachRowSize; totalLength = 0; if (columnPageEncoderMeta.getStoreDataType() == DataTypes.BYTE_ARRAY) { - memoryBlock = - UnsafeMemoryManager.allocateMemoryWithRetry(taskId, (long) pageSize * eachRowSize); + capacity = pageSize * eachRowSize; + memoryBlock = UnsafeMemoryManager.allocateMemoryWithRetry(taskId, capacity); baseAddress = memoryBlock.getBaseObject(); baseOffset = memoryBlock.getBaseOffset(); } diff --git a/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java b/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java index 1b1f8f6..01f1d55 100644 --- a/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java +++ b/core/src/main/java/org/apache/carbondata/core/datastore/page/VarLengthColumnPageBase.java @@ -73,7 +73,7 @@ public abstract class VarLengthColumnPageBase extends ColumnPage { try { rowOffset = ColumnPage.newPage( new ColumnPageEncoderMeta(spec, DataTypes.INT, columnPageEncoderMeta.getCompressorName()), - pageSize); + pageSize + 1); } catch (MemoryException e) { throw new RuntimeException(e); }