I am still not convinced of the value of this implementation,
particularly considering the advances made since 1.3 in memory
allocators and garbage collection.
The side effects of this proposal are many, and some are non-obvious:
implicitly moving young generation data into the older generation,
which puts much more pressure on the gc; fragmentation of memory
blocks, which adds further memory pressure; replicating quite a bit of
garbage collection functionality; the possibility of ref counting
bugs; etc.
If the assumption is that the current working set of bags/tuples does
not need to be spilled, and anything else can be, then in the worst
case this will pretty much degenerate to the current impl.
A much simpler way to gain benefits would be to handle primitives
as ... primitives, and not through the Java wrapper classes for them.
It should be possible to write schema-aware tuples which store the
primitives declared in the schema directly and so take a fraction of
the memory required today (for an int, 4 bytes + a null_check boolean +
offset mapping, instead of the 24/32 bytes it currently takes, etc).
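
To make the idea concrete, here is a rough sketch of a tuple for a
schema whose fields are all ints. The class and method names are made
up for illustration and are not existing Pig code; a real version would
have to fit the Tuple interface and handle mixed field types. It keeps
values in an int[] and tracks nulls with one bit per field, rather than
holding an Integer object per field.

```java
import java.util.BitSet;

// Hypothetical schema-aware tuple: assumes every field is declared int.
final class IntTuple {
    private final int[] values; // 4 bytes per field, no Integer wrappers
    private final BitSet nulls; // one bit per field; set means null

    IntTuple(int fieldCount) {
        values = new int[fieldCount];
        nulls = new BitSet(fieldCount);
        nulls.set(0, fieldCount); // every field starts out null
    }

    int size() {
        return values.length;
    }

    boolean isNull(int i) {
        checkIndex(i);
        return nulls.get(i);
    }

    void setInt(int i, int v) {
        checkIndex(i);
        values[i] = v;
        nulls.clear(i);
    }

    void setNull(int i) {
        checkIndex(i);
        values[i] = 0;
        nulls.set(i);
    }

    int getInt(int i) {
        checkIndex(i);
        if (nulls.get(i)) {
            throw new IllegalStateException("field " + i + " is null");
        }
        return values[i];
    }

    private void checkIndex(int i) {
        if (i < 0 || i >= values.length) {
            throw new IndexOutOfBoundsException("field " + i);
        }
    }
}
```

The null bit replaces the null_check boolean mentioned above; for mixed
schemas the offset mapping would pick which primitive array a field
lives in.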
Regards,
Mridul
Alan Gates wrote:
http://wiki.apache.org/pig/PigMemory
Alan.