[GitHub] [arrow] velvia commented on a change in pull request #4815: [DISCUSS] Add strawman proposal for sparseness and data integrity

GitBox Fri, 24 Apr 2020 14:38:15 -0700


velvia commented on a change in pull request #4815:
URL: https://github.com/apache/arrow/pull/4815#discussion_r414877852




##########
File path: format/Message.fbs
##########
@@ -21,10 +21,69 @@ include "Tensor.fbs";
 
 namespace org.apache.arrow.flatbuf;
 
+/// ------------------------------------------------------
+/// Buffer encoding schemes.
+/// -------------------------------------------------------
+
+/// Encoding for buffers representing integer as offsets from a reference value.
+/// This encoding uses less bits then the logical type indicates.
+/// It saves space when all values in the buffer can be represented with a
+/// small bit width (e.g. if all values in an int64 column are between -128
+/// and 127, then a bit-width of 8 can be be used) offset from the
+/// reference value.
+table FrameOfReferenceIntEncoding {
+  /// The value that all values in the buffer are relative to.
+  reference_value: long = 0;

Review comment:
       Depending on the size of your batch, a sloped representation would 
result in far smaller arrays, since the delta from a sloped line is typically 
much smaller than the delta from a single reference value and can fit into 
fewer bits.  You still get O(1) access to any element: just compute ax + b 
and add the stored delta.  This assumes the data is actually increasing, of 
course; otherwise a stepwise (constant) reference is fine.
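
To make the comparison concrete, here is a rough sketch in Python. It is only an illustration, not part of the proposed format: the names `encode_sloped`, `decode_at`, and `bits_needed` are hypothetical, and fitting the line through the first and last values is just one simple way to pick a and b.

```python
def encode_sloped(values):
    """Encode ints as residuals from the line slope * i + intercept.

    Assumption for this sketch: the line goes through the first and
    last values, with the slope rounded down to an integer.
    """
    n = len(values)
    if n == 0:
        return 0, 0, []
    intercept = values[0]
    slope = (values[-1] - values[0]) // (n - 1) if n > 1 else 0
    residuals = [v - (slope * i + intercept) for i, v in enumerate(values)]
    return slope, intercept, residuals


def decode_at(slope, intercept, residuals, i):
    """O(1) random access: evaluate a*x + b, then add the stored delta."""
    return slope * i + intercept + residuals[i]


def bits_needed(deltas):
    """Smallest signed (two's-complement) width that holds every delta."""
    lo, hi = min(deltas), max(deltas)
    neg_bits = (-lo - 1).bit_length() if lo < 0 else 0
    return max(hi.bit_length(), neg_bits) + 1


# Steadily increasing data, e.g. timestamps or row ids.
values = [1000, 1010, 1021, 1029, 1040, 1051]

# Plain frame of reference: deltas from one constant reference value.
constant_deltas = [v - values[0] for v in values]

# Sloped frame of reference: deltas from the fitted line.
slope, intercept, residuals = encode_sloped(values)

decoded = [decode_at(slope, intercept, residuals, i) for i in range(len(values))]
```

With this input the sloped residuals stay within a couple of units of zero, while the constant-reference deltas grow with the length of the batch. That is why the gap widens as batches get larger.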




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
