Re: A proposal for changing pig's memory management

2009-06-01 Thread Mridul Muralidharan
Alan Gates wrote: On May 19, 2009, at 10:30 PM, Mridul Muralidharan wrote: I am still not very convinced about the value of this implementation - particularly considering the advances made since 1.3 in memory allocators and garbage collection. My fundamental concern is not with the slo

Re: A proposal for changing pig's memory management

2009-05-20 Thread Alan Gates
On May 19, 2009, at 10:30 PM, Mridul Muralidharan wrote: I am still not very convinced about the value of this implementation - particularly considering the advances made since 1.3 in memory allocators and garbage collection. My fundamental concern is not with the slowness of garbage

Re: A proposal for changing pig's memory management

2009-05-19 Thread Mridul Muralidharan
I am still not very convinced about the value of this implementation - particularly considering the advances made since 1.3 in memory allocators and garbage collection. The side effects of this proposal are many, and sometimes non-obvious. Like implicitly moving young generation data into ol

Re: A proposal for changing pig's memory management

2009-05-19 Thread Alan Gates
We definitely do not want to follow the current design of keeping chararrays and bytearrays as separate objects. It is that overhead of an object for each field that we are trying to avoid. The reason for constraining a tuple to store its data in one TupleBuffer is to limit the size of the

Re: A proposal for changing pig's memory management

2009-05-19 Thread Ted Dunning
If you have a small number of long-lived large objects and a large number of small ephemeral objects then the Java collector should be in pig-heaven (as it were). The long-lived objects will take no time to collect and the ephemeral objects won't be around to collect by the time the full GC happen

Re: A proposal for changing pig's memory management

2009-05-19 Thread Alan Gates
The claims in the paper I was interested in were not issues like non-blocking I/O etc. The claim that is of interest to pig is that a memory allocation and garbage collection scheme that is beyond the control of the programmer is a bad fit for a large data processing system. This is a fun

Re: A proposal for changing pig's memory management

2009-05-15 Thread Thejas Nair
With a constraint that all scalar values in a tuple should fit into a single buffer, the values will always have to be copied whenever a tuple's contents need to be copied to a new tuple after a relational operation. The overhead of copying is not large for numeric types compared to the existing imp

Re: A proposal for changing pig's memory management

2009-05-14 Thread Ted Dunning
That Telegraph dataflow paper is pretty long in the tooth. Certainly several of their claims have little force any more (lack of non-blocking I/O, poor thread performance, no unmap, very expensive synchronization for uncontested locks). It is worth noting that they did all of their tests on the 1.3 JVM

A proposal for changing pig's memory management

2009-05-14 Thread Alan Gates
http://wiki.apache.org/pig/PigMemory Alan.