Alan Gates wrote:
On May 19, 2009, at 10:30 PM, Mridul Muralidharan wrote:
I am still not very convinced about the value of this
implementation - particularly considering the advances made since
1.3 in memory allocators and garbage collection.
My fundamental concern is not with the slowness of garbage
The side effects of this proposal are many, and sometimes non-obvious.
Like implicitly moving young generation data into ol
We definitely do not want to follow the current design of keeping
chararrays and bytearrays as separate objects. It is that overhead of
an object for each field that we are trying to avoid.
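To make the per-field object overhead concrete, here is a minimal Java sketch contrasting the two layouts. It is only an illustration under assumptions: the class and method names (PackedFieldsSketch, asObjects, asPacked, readName) are hypothetical and are not Pig's Tuple API or the proposed TupleBuffer design. The first layout allocates one heap object per field; the second serializes all fields into one backing buffer.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; not Pig's Tuple or the proposed TupleBuffer API.
final class PackedFieldsSketch {

    // Object-per-field layout: every value is a separate heap object
    // that the garbage collector has to track individually.
    static Object[] asObjects(int id, long count, String name) {
        return new Object[] { Integer.valueOf(id), Long.valueOf(count), name };
    }

    // Packed layout: all three fields are serialized into one buffer, so the
    // tuple costs one allocation regardless of how many fields it holds.
    // Assumed encoding: int id, long count, int length, then UTF-8 bytes.
    static ByteBuffer asPacked(int id, long count, String name) {
        byte[] utf8 = name.getBytes(StandardCharsets.UTF_8);
        ByteBuffer buf = ByteBuffer.allocate(
                Integer.BYTES + Long.BYTES + Integer.BYTES + utf8.length);
        buf.putInt(id).putLong(count).putInt(utf8.length).put(utf8);
        buf.flip(); // position 0, limit at the end of the written data
        return buf;
    }

    // Reads the string field back out of the packed layout.
    static String readName(ByteBuffer buf) {
        int lengthOffset = Integer.BYTES + Long.BYTES;
        int length = buf.getInt(lengthOffset);
        byte[] utf8 = new byte[length];
        ByteBuffer view = buf.duplicate(); // leave the caller's position untouched
        view.position(lengthOffset + Integer.BYTES);
        view.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }
}
```

The trade-off in this sketch is that reading a string field from the packed form has to decode it on access, while the object form hands back an existing reference.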
The reason for constraining a tuple to store its data in one
TupleBuffer is to limit the size of the
If you have a small number of long-lived large objects and a large number of
small ephemeral objects, then the Java collector should be in pig-heaven (as
it were). The long-lived objects will take no time to collect and the
ephemeral objects won't be around to collect by the time the full GC
happen
The claims in the paper I was interested in were not issues like non-
blocking I/O etc. The claim that is of interest to pig is that a
memory allocation and garbage collection scheme that is beyond the
control of the programmer is a bad fit for a large data processing
system. This is a fun
With the constraint that all scalar values in a tuple must fit into a single
buffer, the values will always have to be copied whenever a tuple's contents
need to be copied to a new tuple after a relational operation.
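The copying concern can be sketched as follows. This is an assumption-laden illustration, not Pig code: ProjectionCopySketch and its methods are hypothetical, and the packed tuple is simplified to fixed-width long fields. With one object per field, a projection can reuse the existing field references; with a single buffer per tuple, the selected bytes have to be copied into the new tuple's buffer.

```java
import java.nio.ByteBuffer;

// Hypothetical illustration only; names and the fixed-width layout are assumptions.
final class ProjectionCopySketch {

    // Object-per-field tuple: the projected tuple shares the field objects.
    static Object[] projectObjects(Object[] fields, int... keep) {
        Object[] out = new Object[keep.length];
        for (int i = 0; i < keep.length; i++) {
            out[i] = fields[keep[i]]; // reference copy only
        }
        return out;
    }

    // Builds a single-buffer tuple holding fixed-width long fields.
    static byte[] pack(long... values) {
        ByteBuffer buf = ByteBuffer.allocate(values.length * Long.BYTES);
        for (long v : values) {
            buf.putLong(v);
        }
        return buf.array();
    }

    // Single-buffer tuple: the projected tuple needs its own buffer,
    // so every kept field's bytes are copied.
    static byte[] projectPacked(byte[] tuple, int... keep) {
        byte[] out = new byte[keep.length * Long.BYTES];
        for (int i = 0; i < keep.length; i++) {
            System.arraycopy(tuple, keep[i] * Long.BYTES, out, i * Long.BYTES, Long.BYTES);
        }
        return out;
    }

    // Reads the field at the given index from a packed tuple.
    static long readLong(byte[] tuple, int index) {
        return ByteBuffer.wrap(tuple).getLong(index * Long.BYTES);
    }
}
```

For eight-byte numeric fields the byte copy is about the same size as a reference copy, so the difference matters mainly for large variable-length values.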
The overhead of copying is not large for numeric types compared to the
existing imp
That Telegraph dataflow paper is pretty long in the tooth. Certainly
several of their claims have little force any more (lack of non-blocking
I/O, poor thread performance, no unmap, very expensive synchronization for
uncontested locks). It is worth noting that they did all of their tests on the 1.3
JVM
http://wiki.apache.org/pig/PigMemory
Alan.