Hi Tewfik,
It would be good to step back a bit and explain what your data is, and
what the consumer is going to do with it.
Regards
Antoine.
On Fri, 14 Feb 2020 15:08:57 -0800
Tewfik Zeghmi wrote:
> Hi Micah,
>
> The primary language is Python. I'm hoping that the overhead of
>
hi Micah and Tewfik,
The functionality is exposed in Python, see e.g.
https://github.com/apache/arrow/blob/apache-arrow-0.16.0/python/pyarrow/tests/test_ipc.py#L685
As Micah said, very small batches aren't necessarily optimized for
compactness (for example buffers are padded to multiples of 8).
I should note, it isn't necessarily just the extra metadata. For single
row values, there is also an overhead for padding requirements. You should
be able to measure this by looking at the size of the buffer you are using
before writing any batches to the stream (I believe the schema is written
e
Hi Micah,
The primary language is Python. I'm hoping that the overhead of
metadata is small compared to the schema information.
thank you!
On Fri, Feb 14, 2020 at 3:07 PM Micah Kornfield
wrote:
> Hi Tewfik,
> What language? It is possible to serialize them separately but the right
Hi Tewfik,
What language? It is possible to serialize them separately but the right
hooks might not be exposed in all languages.
There is still going to be a higher overhead for single row values in Arrow
compared to Avro due to metadata requirements.
Thanks,
Micah
On Fri, Feb 14, 2020 at 1:33
Hi,
I have a use case of creating a feature store to serve low latency traffic.
Given a key, we need the ability to save and read a feature vector in a low
latency Key Value store. Serializing an Arrow table with one row takes
1344 bytes, while the same singular row serialized with Avro without