Hi guys,

I have a question about how best to solve this with Protocol Buffers.
I want to build file-based storage for a large amount of data,
so storage space is a (or even the) limiting factor.

I want to store a lot of values for one key, and I have the following:

About 2000 measurands.
Each measurand has an ID (int32) and a value. The values have different types:
About 1200 of them: float
About 700 of them: uint32
About 50 of them: uint64
About 50 of them: string

I want to store about 600 values for each measurand (id) in one file.
Those files are then compressed using XZ.

The type of a measurand's value does not change.
The values tend to stay the same for one ID: most of the time, all 600
values are identical (uint*/string types) or very close to each other.

The IDs are (more or less) consecutive, and they are small numbers
(ranging from 0 up to about 5000).
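Protocol Buffers encode int32/uint32 fields as base-128 varints with 7 payload bits per byte, so a value below 128 takes one byte and a value below 16384 takes two (a negative int32 takes ten, which is why sint32 exists). A quick check of what that means for the ID range above, using a hypothetical helper `varint_len` (it counts only the value, not the field's tag byte):

```python
def varint_len(n: int) -> int:
    """Bytes needed to store a non-negative integer as a protobuf varint."""
    if n < 0:
        raise ValueError("negative values should be zigzag-encoded (sint32/sint64) first")
    length = 1
    while n >= 0x80:  # every byte carries 7 bits of payload
        n >>= 7
        length += 1
    return length


# IDs as described above: roughly 0 .. 5000
one_byte = sum(1 for i in range(5001) if varint_len(i) == 1)
two_bytes = sum(1 for i in range(5001) if varint_len(i) == 2)
print(one_byte, two_bytes)  # 128 4873
```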

I have several ideas now:

Simple approach:
Array of:
ID + Array of values
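For reference, the simple approach maps onto the wire format roughly as follows. This is a hand-rolled Python sketch of the documented encoding for a hypothetical proto3 schema (a `Measurand` message with `id = 1` and a packed `repeated uint32 values = 2`, wrapped in a `File` message with `repeated Measurand measurands = 1`). It is not output from the protobuf library, and it only covers the uint32 case; repeated floats would be packed as fixed 4-byte values instead of varints.

```python
# Hypothetical proto3 schema this sketch follows:
#   message Measurand { uint32 id = 1; repeated uint32 values = 2; }  // packed by default in proto3
#   message File      { repeated Measurand measurands = 1; }

VARINT, LEN = 0, 2  # protobuf wire types


def encode_varint(n: int) -> bytes:
    """Base-128 varint, least significant 7-bit group first (non-negative n only)."""
    out = bytearray()
    while True:
        byte = n & 0x7F
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def tag(field_number: int, wire_type: int) -> bytes:
    return encode_varint((field_number << 3) | wire_type)


def encode_measurand(mid: int, values: list[int]) -> bytes:
    out = bytearray()
    if mid:  # proto3 does not write scalar fields that hold the default value 0
        out += tag(1, VARINT) + encode_varint(mid)
    if values:
        packed = b"".join(encode_varint(v) for v in values)
        out += tag(2, LEN) + encode_varint(len(packed)) + packed
    return bytes(out)


def encode_file(measurands: dict[int, list[int]]) -> bytes:
    out = bytearray()
    for mid, values in sorted(measurands.items()):
        body = encode_measurand(mid, values)
        out += tag(1, LEN) + encode_varint(len(body)) + body  # embedded message
    return bytes(out)


print(encode_measurand(1, [5, 5]).hex())  # 080112020505
```

Per measurand, the framing (tags, lengths, the ID) costs only a few bytes, which is small next to 600 values; what dominates is how the values themselves encode.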

Plain structure:
IDs in Array(s) + values in arrays of arrays
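If every measurand really has the same number of values, the arrays of arrays in the plain structure can be flattened: one sorted ID list plus one flat value array per type, where the values of the i-th ID sit at a fixed offset. A minimal Python sketch for the float group, assuming exactly 600 values per ID (`VALUES_PER_ID`, `to_columns` and `values_for` are hypothetical names):

```python
from array import array
from bisect import bisect_left

VALUES_PER_ID = 600  # assumption: every measurand has exactly this many values


def to_columns(measurands: dict[int, list[float]]) -> tuple[list[int], array]:
    """Split {id: values} into a sorted ID list and one flat float32 array."""
    ids = sorted(measurands)
    flat = array("f")  # 4-byte floats, like a packed `repeated float` field
    for mid in ids:
        values = measurands[mid]
        if len(values) != VALUES_PER_ID:
            raise ValueError(f"id {mid} has {len(values)} values, expected {VALUES_PER_ID}")
        flat.extend(values)
    return ids, flat


def values_for(ids: list[int], flat: array, mid: int) -> array:
    """Slice out the values of one ID using its position in the sorted ID list."""
    i = bisect_left(ids, mid)
    if i == len(ids) or ids[i] != mid:
        raise KeyError(mid)
    return flat[i * VALUES_PER_ID:(i + 1) * VALUES_PER_ID]


ids, flat = to_columns({3: [1.5] * 600, 1: [2.0] * 600})
print(ids, len(flat))  # [1, 3] 1200
```

Grouping by type is needed anyway, because a protobuf repeated field holds a single type; it also keeps similar bytes next to each other, which tends to help XZ.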

Optimized IDs:
Base ID + relative offset in Array

But: the IDs are already very small, so there is probably no advantage?
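To put a number on that question: delta-coding sorted IDs (the first ID as the base, then the gap to the previous one) turns consecutive IDs into a run of 1s, each of which is a one-byte varint. A Python sketch with hypothetical helper names, assuming the IDs are stored sorted:

```python
from itertools import accumulate


def varint_len(n: int) -> int:
    """Bytes needed for a non-negative integer as a protobuf varint."""
    length = 1
    while n >= 0x80:
        n >>= 7
        length += 1
    return length


def delta_encode(ids: list[int]) -> list[int]:
    """Base ID followed by the gaps between neighbouring (sorted) IDs."""
    if any(b < a for a, b in zip(ids, ids[1:])):
        raise ValueError("IDs must be sorted")
    return ids[:1] + [b - a for a, b in zip(ids, ids[1:])]


def delta_decode(deltas: list[int]) -> list[int]:
    return list(accumulate(deltas))  # running sum restores the original IDs


ids = list(range(2000))  # hypothetical: 2000 consecutive IDs
plain = sum(varint_len(i) for i in ids)
delta = sum(varint_len(d) for d in delta_encode(ids))
print(plain, delta)  # 3872 2000
```

That is roughly 1.9 KB per file before compression, against 2000 × 600 = 1.2 million values, so the saving on IDs is probably negligible.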

Optimized Values:
Base value + relative offset in Array

But: this is probably not very helpful, since XZ compression should already
compress these duplicates.
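For the integer types, "base value + relative offset" is delta coding again, but the offsets can be negative. Plain varints handle negative numbers badly, which is why protobuf's sint32/sint64 types use ZigZag encoding (0, -1, 1, -2 map to 0, 1, 2, 3). A Python sketch (hypothetical helper names) of delta plus ZigZag for one measurand's uint32/uint64 series; floats are not covered, since subtracting floats is not lossless in general:

```python
def zigzag(n: int) -> int:
    """Signed -> unsigned, as protobuf sint32/sint64 do: 0,-1,1,-2 -> 0,1,2,3."""
    return 2 * n if n >= 0 else -2 * n - 1


def unzigzag(z: int) -> int:
    return z // 2 if z % 2 == 0 else -(z + 1) // 2


def encode_values(values: list[int]) -> tuple[int, list[int]]:
    """Return (base, zigzagged deltas); values must be non-empty."""
    base = values[0]
    return base, [zigzag(b - a) for a, b in zip(values, values[1:])]


def decode_values(base: int, zz: list[int]) -> list[int]:
    out = [base]
    for z in zz:
        out.append(out[-1] + unzigzag(z))
    return out


print(encode_values([1000, 1000, 1001, 999]))  # (1000, [0, 2, 3])
```

With mostly identical values the deltas become long runs of zeros, which XZ should compress well either way, so whether this step still pays off after compression has to be measured.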

Optimized Values 2:
Base value + relative offset in Array
+ Ordering not by ID but instead by value.
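One way to read "ordering by value" is to sort the measurands by their base value, so neighbouring bases are close and the base column can be delta-coded with non-negative gaps (no ZigZag needed). The price is that the IDs are no longer consecutive and have to be stored in that permuted order. A Python sketch of that interpretation, with hypothetical names:

```python
from itertools import accumulate


def order_by_value(bases: dict[int, int]) -> tuple[list[int], list[int]]:
    """Sort measurands by base value; return (IDs in that order, delta-coded bases)."""
    ordered = sorted(bases.items(), key=lambda kv: (kv[1], kv[0]))  # ties broken by ID
    ids = [mid for mid, _ in ordered]
    vals = [v for _, v in ordered]
    deltas = vals[:1] + [b - a for a, b in zip(vals, vals[1:])]  # all >= 0 after sorting
    return ids, deltas


def restore(ids: list[int], deltas: list[int]) -> dict[int, int]:
    return dict(zip(ids, accumulate(deltas)))


ids, deltas = order_by_value({0: 500, 1: 7, 2: 500, 3: 9})
print(ids, deltas)  # [1, 3, 0, 2] [7, 2, 491, 0]
```

Whether the smaller value gaps outweigh the extra bytes for the permuted IDs is something only a measurement on real data can tell.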

I will implement all these approaches and try them out. But does
anybody have hints or tips on how I could solve this more efficiently?
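Since the outcome depends on the real data, a small harness that XZ-compresses each candidate layout and compares sizes may be the quickest way to decide; Python's standard `lzma` module writes the .xz container. The sketch below also tries, for the floats, a lossless XOR-with-previous transform on the IEEE 754 bit patterns (the idea used in Facebook's Gorilla time-series compression), swapped in here as a float analogue of "base value + offset". The sample series is made up for illustration, and the helper names are hypothetical:

```python
import lzma
import struct


def xz_size(data: bytes) -> int:
    """Size of `data` after XZ compression at the strongest preset."""
    return len(lzma.compress(data, format=lzma.FORMAT_XZ, preset=9 | lzma.PRESET_EXTREME))


def float32_bits(values: list[float]) -> list[int]:
    """Reinterpret each value as float32 and return its raw 32-bit pattern."""
    n = len(values)
    return list(struct.unpack(f"<{n}I", struct.pack(f"<{n}f", *values)))


def xor_with_previous(bits: list[int]) -> list[int]:
    """Keep the first pattern; replace each later one by its XOR with the previous."""
    return bits[:1] + [b ^ a for a, b in zip(bits, bits[1:])]


def undo_xor(xored: list[int]) -> list[int]:
    out = xored[:1]
    for x in xored[1:]:
        out.append(out[-1] ^ x)
    return out


# Made-up sample: 600 float readings that stay very close to each other.
values = [20.0 + 0.01 * (i % 5) for i in range(600)]
bits = float32_bits(values)

raw = struct.pack("<600I", *bits)  # same bytes as packing the floats directly
xored = struct.pack("<600I", *xor_with_previous(bits))
print("raw:", xz_size(raw), "xor:", xz_size(xored))
```

The same `xz_size` call can be pointed at the serialized bytes of each candidate approach to compare them on real files.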



You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.