garyanaplan opened a new pull request #443:
URL: https://github.com/apache/arrow-rs/pull/443


   # Which issue does this PR close?
   
   Closes #349.
   
   # Rationale for this change
    
   When writing BOOLEAN data, writing more than 2048 rows will overflow
   the hard-coded 256-byte buffer set for the bit-writer in the
   PlainEncoder (256 bytes hold 2048 one-bit values). Once this occurs,
   further attempts to write to the encoder fail because capacity is
   exceeded, but the errors are silently ignored.
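
To make the failure mode concrete, here is a minimal sketch of a fixed-capacity bit writer. `FixedBitWriter`, `put_bool`, and `first_failing_row` are hypothetical names used for illustration only; they are not the parquet crate's API. The sketch assumes one bit per BOOLEAN value and, unlike the behavior described above, surfaces the overflow as a `Result` instead of dropping it.

```rust
/// Size of the fixed buffer, matching the 256 bytes mentioned above.
const CAPACITY_BYTES: usize = 256;

/// Hypothetical stand-in for a fixed-capacity bit writer (not the crate's type).
struct FixedBitWriter {
    buffer: [u8; CAPACITY_BYTES],
    bit_len: usize, // number of bits written so far
}

impl FixedBitWriter {
    fn new() -> Self {
        Self { buffer: [0u8; CAPACITY_BYTES], bit_len: 0 }
    }

    /// Writes one boolean as a single bit, or reports that the buffer is full.
    fn put_bool(&mut self, value: bool) -> Result<(), String> {
        if self.bit_len >= CAPACITY_BYTES * 8 {
            return Err(format!("bit writer full after {} values", self.bit_len));
        }
        if value {
            // Least-significant bit first within each byte.
            self.buffer[self.bit_len / 8] |= 1 << (self.bit_len % 8);
        }
        self.bit_len += 1;
        Ok(())
    }

    /// Bytes that contain at least one written bit.
    fn as_bytes(&self) -> &[u8] {
        &self.buffer[..(self.bit_len + 7) / 8]
    }
}

/// Returns the zero-based index of the first row that cannot be written, if any.
fn first_failing_row(rows: usize) -> Option<usize> {
    let mut writer = FixedBitWriter::new();
    (0..rows).find(|&row| writer.put_bool(row % 2 == 0).is_err())
}
```

With this sketch, `first_failing_row(2048)` finds no failure, while any larger row count fails at index 2048, which is the 2049th row.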
   
   This fix improves the error detection and reporting at the point of
   encoding and modifies the logic for bit_writing (BOOLEANS). The
   bit_writer is initially allocated 256 bytes (as at present), then each
   time the capacity is exceeded the capacity is incremented by another
   256 bytes.
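
The growth strategy described above can be sketched roughly as follows. This illustrates the idea and is not the code in this PR: `GrowableBitWriter`, `CHUNK_BYTES`, and `write_bools` are hypothetical names, and the sketch assumes one bit per value, stored least-significant bit first.

```rust
/// Growth increment, matching the 256-byte step described above.
const CHUNK_BYTES: usize = 256;

/// Hypothetical bit writer that grows in fixed-size steps (not the crate's type).
struct GrowableBitWriter {
    buffer: Vec<u8>,
    bit_len: usize, // number of bits written so far
}

impl GrowableBitWriter {
    /// Starts with a single 256-byte chunk, as at present.
    fn new() -> Self {
        Self { buffer: vec![0u8; CHUNK_BYTES], bit_len: 0 }
    }

    fn capacity_bytes(&self) -> usize {
        self.buffer.len()
    }

    /// Writes one boolean as a single bit, extending the buffer when it is full.
    fn put_bool(&mut self, value: bool) {
        if self.bit_len == self.buffer.len() * 8 {
            // Capacity exceeded: add another zeroed 256-byte chunk.
            let new_len = self.buffer.len() + CHUNK_BYTES;
            self.buffer.resize(new_len, 0);
        }
        if value {
            self.buffer[self.bit_len / 8] |= 1 << (self.bit_len % 8);
        }
        self.bit_len += 1;
    }

    /// Number of bytes that contain at least one written bit.
    fn bytes_used(&self) -> usize {
        (self.bit_len + 7) / 8
    }

    fn as_bytes(&self) -> &[u8] {
        &self.buffer[..self.bytes_used()]
    }
}

/// Convenience helper: packs a slice of booleans into a new writer.
fn write_bools(values: &[bool]) -> GrowableBitWriter {
    let mut writer = GrowableBitWriter::new();
    for &v in values {
        writer.put_bool(v);
    }
    writer
}
```

Writing 2049 values with this sketch triggers exactly one growth step, leaving a 512-byte buffer of which 257 bytes are in use. Capacity grows linearly with the row count, roughly one byte per eight rows rounded up to the next 256-byte step.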
   
   This certainly resolves the current problem, but it's not exactly a
   great fix because the capacity of the bit_writer could now grow
   substantially.
   
   Other data types seem to have a more sophisticated mechanism for writing
   data which doesn't involve growing or having a fixed-size buffer. It
   would be desirable to make the BOOLEAN type use this same mechanism if
   possible, but that level of change is more intrusive and probably
   requires greater knowledge of the implementation than I possess.
   
   # What changes are included in this PR?
   
   (see above)
   
   # Are there any user-facing changes?
   
   No, although users may now encounter the encoding error that was
   previously silently ignored.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

