Hello,

I've posted a couple of things asking about Arrow over the last few weeks
and I've come across another thing that I'm hoping you can help me
understand better.

I have a workload that writes a lot of data at once with a single
timestamp, eg. 10k values, that each has an ID attached.

Since all of them have the same timestamp, should I be using metadata to
say that they all share the same timestamp, or should the timestamp be part
of the record? What I'm concerned about is the amount of space complexity
this may incur unnecessarily. My understanding is that Arrow's requirement
is that access is always O(1), so I was wondering if a run length encoding
might be possible to be used in a situation like this?

Intuitively it feels wrong to use metadata to store a timestamp, but that
made me wonder, what are typical uses of Arrow metadata, or could you share
some example in the wild?

Thank you for your help!

Best,
Frederic

Reply via email to