Hi,
I have an Avro schema for storing linear models from machine learning.
The relevant part looks like this:
{
  "type": "record",
  "name": "LinearModel",
  "fields": [
    {"name": "weights", "type": {"type": "array", "items": "double"}}
  ]
}
I understand that the actual serialized form of this should be rather
efficient. What worries me is how the Java (specific) API for the
weights plays out:
public class LinearModel {
    ...
    public GenericArray<Double> weights;
    ...
}
This means that I have to wrap each and every double in my double[]
into a Double object and add it to the GenericArray, right?
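
To make the concern concrete, here is a minimal sketch of the copy I mean. It uses java.util.ArrayList<Double> as a stand-in for GenericArray<Double> (an assumption, so that the snippet runs without Avro on the classpath), and the class and method names are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

public class BoxingCopy {

    // Copies a primitive double[] into a list of boxed Double objects.
    // ArrayList is only a stand-in for GenericArray<Double> here.
    public static List<Double> box(double[] weights) {
        List<Double> boxed = new ArrayList<>(weights.length);
        for (double w : weights) {
            boxed.add(w); // autoboxing: one Double object per element
        }
        return boxed;
    }

    public static void main(String[] args) {
        double[] weights = {0.5, -1.25, 2.0};
        List<Double> boxed = box(weights);
        System.out.println(boxed.size()); // prints 3
    }
}
```

Each boxed element carries an object header plus a reference in the list, on top of the 8 bytes of payload, so the copy ends up larger than the double[] it was made from.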
The trouble is that the double[] I intend to store may well be sized
to max out the available memory of the machine, so I don't really have
room for a more-than-lifesize copy of the data.
Is there a way to "stream" the doubles into the output without holding
a copy in memory? Or is there another way to encode a double[] in a
schema?
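
For illustration, this is roughly the kind of streaming I have in mind. It is only a sketch that hand-rolls the binary layout as I read it from the Avro specification (an array is a block count written as a zigzag varint long, then the items, then a zero count; a double is 8 little-endian bytes). It does not use Avro's own classes, covers only the array field rather than the whole record, and the class and method names are hypothetical:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

public class StreamingDoubleArray {

    // Avro "long": zigzag encoding, then base-128 variable-length bytes.
    public static void writeLong(OutputStream out, long n) throws IOException {
        long z = (n << 1) ^ (n >> 63);
        while ((z & ~0x7FL) != 0) {
            out.write((int) ((z & 0x7F) | 0x80));
            z >>>= 7;
        }
        out.write((int) z);
    }

    // Avro "double": the IEEE 754 bits written as 8 little-endian bytes.
    public static void writeDouble(OutputStream out, double d) throws IOException {
        long bits = Double.doubleToLongBits(d);
        for (int i = 0; i < 8; i++) {
            out.write((int) (bits >>> (8 * i)) & 0xFF);
        }
    }

    // Writes the array as a single block straight from the double[],
    // without boxing and without building an intermediate collection.
    public static void writeDoubleArray(OutputStream out, double[] weights)
            throws IOException {
        if (weights.length > 0) {
            writeLong(out, weights.length); // item count of the block
            for (double w : weights) {
                writeDouble(out, w);
            }
        }
        writeLong(out, 0); // a zero count marks the end of the array
    }

    public static void main(String[] args) throws IOException {
        // A ByteArrayOutputStream keeps the demo self-contained; for real
        // data this would be a buffered file or socket stream instead.
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        writeDoubleArray(out, new double[] {1.0, -2.5, 3.25});
        System.out.println(out.size()); // prints 26: 1 + 3 * 8 + 1 bytes
    }
}
```

If the library already exposes something equivalent to this on its encoder, that would be exactly the pointer I am looking for.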
Thanks for any pointers,
Markus