[jira] [Updated] (HIVE-23034) Arrow serializer should not keep the reference of arrow offset and validity buffers

2020-03-18 Thread mahesh kumar behera (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

mahesh kumar behera updated HIVE-23034:
---
Resolution: Fixed
Status: Resolved  (was: Patch Available)

[^HIVE-23034.01.patch] committed to master. Thanks [~ShubhamChaurasia] for 
fixing it and [~thejas] for review.

> Arrow serializer should not keep the reference of arrow offset and validity 
> buffers
> ---
>
> Key: HIVE-23034
> URL: https://issues.apache.org/jira/browse/HIVE-23034
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Serializers/Deserializers
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23034.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, a part of writeList() method in arrow serializer is implemented 
> like - 
> {code:java}
> final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer();
> int nextOffset = 0;
> for (int rowIndex = 0; rowIndex < size; rowIndex++) {
>   int selectedIndex = rowIndex;
>   if (vectorizedRowBatch.selectedInUse) {
> selectedIndex = vectorizedRowBatch.selected[rowIndex];
>   }
>   if (hiveVector.isNull[selectedIndex]) {
> offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
>   } else {
> offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
> nextOffset += (int) hiveVector.lengths[selectedIndex];
> arrowVector.setNotNull(rowIndex);
>   }
> }
> offsetBuffer.setInt(size * OFFSET_WIDTH, nextOffset);
> {code}
> 1) Here we obtain a reference to {{final ArrowBuf offsetBuffer = 
> arrowVector.getOffsetBuffer();}} and keep updating the arrow vector and 
> offset vector. 
> Problem - 
> {{arrowVector.setNotNull(rowIndex)}} keeps checking the index and reallocates 
> the offset and validity buffers when a threshold is crossed, updates the 
> references internally and also releases the old buffers (which decrements the 
> buffer reference count). Now the reference which we obtained in 1) becomes 
> obsolete. Furthermore if try to read or write old buffer, we see - 
> {code:java}
> Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0
>   at 
> io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1413)
>   at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:131)
>   at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:162)
>   at io.netty.buffer.ArrowBuf.setInt(ArrowBuf.java:656)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:432)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:352)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:288)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:419)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:205)
> {code}
>  
> Solution - 
> This can be fixed by getting the buffers each time ( 
> {{arrowVector.getOffsetBuffer()}} ) we want to update them. 
> In our internal tests, this is very frequently seen on arrow 0.8.0 but not on 
> 0.10.0 but should be handled the same way for 0.10.0 too as it does the same 
> thing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23034) Arrow serializer should not keep the reference of arrow offset and validity buffers

2020-03-17 Thread Shubham Chaurasia (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shubham Chaurasia updated HIVE-23034:
-
Attachment: HIVE-23034.01.patch
Status: Patch Available  (was: Open)

> Arrow serializer should not keep the reference of arrow offset and validity 
> buffers
> ---
>
> Key: HIVE-23034
> URL: https://issues.apache.org/jira/browse/HIVE-23034
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Serializers/Deserializers
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
> Attachments: HIVE-23034.01.patch
>
>  Time Spent: 10m
>  Remaining Estimate: 0h
>
> Currently, a part of writeList() method in arrow serializer is implemented 
> like - 
> {code:java}
> final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer();
> int nextOffset = 0;
> for (int rowIndex = 0; rowIndex < size; rowIndex++) {
>   int selectedIndex = rowIndex;
>   if (vectorizedRowBatch.selectedInUse) {
> selectedIndex = vectorizedRowBatch.selected[rowIndex];
>   }
>   if (hiveVector.isNull[selectedIndex]) {
> offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
>   } else {
> offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
> nextOffset += (int) hiveVector.lengths[selectedIndex];
> arrowVector.setNotNull(rowIndex);
>   }
> }
> offsetBuffer.setInt(size * OFFSET_WIDTH, nextOffset);
> {code}
> 1) Here we obtain a reference to {{final ArrowBuf offsetBuffer = 
> arrowVector.getOffsetBuffer();}} and keep updating the arrow vector and 
> offset vector. 
> Problem - 
> {{arrowVector.setNotNull(rowIndex)}} keeps checking the index and reallocates 
> the offset and validity buffers when a threshold is crossed, updates the 
> references internally and also releases the old buffers (which decrements the 
> buffer reference count). Now the reference which we obtained in 1) becomes 
> obsolete. Furthermore if try to read or write old buffer, we see - 
> {code:java}
> Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0
>   at 
> io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1413)
>   at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:131)
>   at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:162)
>   at io.netty.buffer.ArrowBuf.setInt(ArrowBuf.java:656)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:432)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:352)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:288)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:419)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:205)
> {code}
>  
> Solution - 
> This can be fixed by getting the buffers each time ( 
> {{arrowVector.getOffsetBuffer()}} ) we want to update them. 
> In our internal tests, this is very frequently seen on arrow 0.8.0 but not on 
> 0.10.0 but should be handled the same way for 0.10.0 too as it does the same 
> thing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Updated] (HIVE-23034) Arrow serializer should not keep the reference of arrow offset and validity buffers

2020-03-17 Thread ASF GitHub Bot (Jira)


 [ 
https://issues.apache.org/jira/browse/HIVE-23034?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ASF GitHub Bot updated HIVE-23034:
--
Labels: pull-request-available  (was: )

> Arrow serializer should not keep the reference of arrow offset and validity 
> buffers
> ---
>
> Key: HIVE-23034
> URL: https://issues.apache.org/jira/browse/HIVE-23034
> Project: Hive
>  Issue Type: Bug
>  Components: llap, Serializers/Deserializers
>Reporter: Shubham Chaurasia
>Assignee: Shubham Chaurasia
>Priority: Major
>  Labels: pull-request-available
>
> Currently, a part of writeList() method in arrow serializer is implemented 
> like - 
> {code:java}
> final ArrowBuf offsetBuffer = arrowVector.getOffsetBuffer();
> int nextOffset = 0;
> for (int rowIndex = 0; rowIndex < size; rowIndex++) {
>   int selectedIndex = rowIndex;
>   if (vectorizedRowBatch.selectedInUse) {
> selectedIndex = vectorizedRowBatch.selected[rowIndex];
>   }
>   if (hiveVector.isNull[selectedIndex]) {
> offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
>   } else {
> offsetBuffer.setInt(rowIndex * OFFSET_WIDTH, nextOffset);
> nextOffset += (int) hiveVector.lengths[selectedIndex];
> arrowVector.setNotNull(rowIndex);
>   }
> }
> offsetBuffer.setInt(size * OFFSET_WIDTH, nextOffset);
> {code}
> 1) Here we obtain a reference to {{final ArrowBuf offsetBuffer = 
> arrowVector.getOffsetBuffer();}} and keep updating the arrow vector and 
> offset vector. 
> Problem - 
> {{arrowVector.setNotNull(rowIndex)}} keeps checking the index and reallocates 
> the offset and validity buffers when a threshold is crossed, updates the 
> references internally and also releases the old buffers (which decrements the 
> buffer reference count). Now the reference which we obtained in 1) becomes 
> obsolete. Furthermore if try to read or write old buffer, we see - 
> {code:java}
> Caused by: io.netty.util.IllegalReferenceCountException: refCnt: 0
>   at 
> io.netty.buffer.AbstractByteBuf.ensureAccessible(AbstractByteBuf.java:1413)
>   at io.netty.buffer.ArrowBuf.checkIndexD(ArrowBuf.java:131)
>   at io.netty.buffer.ArrowBuf.chk(ArrowBuf.java:162)
>   at io.netty.buffer.ArrowBuf.setInt(ArrowBuf.java:656)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:432)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeStruct(Serializer.java:352)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:288)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.writeList(Serializer.java:419)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.write(Serializer.java:285)
>   at 
> org.apache.hadoop.hive.ql.io.arrow.Serializer.serializeBatch(Serializer.java:205)
> {code}
>  
> Solution - 
> This can be fixed by getting the buffers each time ( 
> {{arrowVector.getOffsetBuffer()}} ) we want to update them. 
> In our internal tests, this is very frequently seen on arrow 0.8.0 but not on 
> 0.10.0 but should be handled the same way for 0.10.0 too as it does the same 
> thing.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)