Re: Avoiding OutOfMemory Errors by limiting data in pipeline

2008-05-12 Thread Peter Hunsberger
On Sun, May 11, 2008 at 6:59 PM, Joerg Heinicke [EMAIL PROTECTED] wrote:
 On 09.05.2008 09:41, Peter Hunsberger wrote:


 
  I haven't looked at the code here, but couldn't you just introduce a
  second getOutputStream( int bufferSize ) method where the current
  interface method continues with the current default logic if it is
  used?
 

  getOutputStream() actually already takes an int parameter, the flush buffer
 size.

Yeah, I saw that...

 Whether to add another getOutputStream() method or modify the existing
 one is not really a difference IMO. Environment is a kind of internal
 interface (or SPI, as it has been called lately, isn't it?). This means
 there should be only very few implementations besides the one we provide, if
 any (Forrest, Lenya, CLI environment?). And in Cocoon we would change all
 usages of the single-parameter method to the one with 2 parameters. So
 whoever provides such an Environment implementation has to adapt his
 implementation in a meaningful way anyway (an empty implementation returning
 null, throwing NotSupportedException or the like would not work). So it's the
 same effort for them whether we add a new method or change the existing one on
 the interface.

I don't see that; can't we continue the existing behaviour for those
who don't change?
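A backward-compatible overload along those lines might look like the following sketch. The method and class names are modelled on the discussion; the 8K default and the delegation in the base class are assumptions, not Cocoon's actual code:

```java
import java.io.IOException;
import java.io.OutputStream;

// Sketch: keep the existing one-argument method working and add an
// overload. Implementations extending the abstract base class keep the
// current default behaviour without any change on their side.
interface Environment {
    OutputStream getOutputStream(int flushBufferSize) throws IOException;
    OutputStream getOutputStream(int initialBufferSize, int flushBufferSize) throws IOException;
}

abstract class AbstractEnvironment implements Environment {
    static final int DEFAULT_INITIAL_BUFFER_SIZE = 8192; // assumed default

    // Existing callers of the one-argument method fall through to the
    // two-argument one with the default initial size.
    public OutputStream getOutputStream(int flushBufferSize) throws IOException {
        return getOutputStream(DEFAULT_INITIAL_BUFFER_SIZE, flushBufferSize);
    }
}
```

Existing Environment implementations that extend the base class would then only need to implement the new two-argument method.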

  IMO the decision should be made purely from a design perspective. Should a
 configuration parameter be passed around as a method parameter even though it
 is static throughout the whole lifecycle of the Environment instance? In a
 perfect world I'd say no :)

That makes sense.  I guess the question in that case is: are there any
use cases where people would use such a parameter as non-static?

 Which leaves the question of how to inject the parameter.
 One place is on instantiation (e.g. CocoonServlet.getEnvironment(..) in 2.1,
 RequestProcessor.getEnvironment(..) in 2.2), which leaves us with the web.xml
 init parameter (or analogous alternatives for other environments) as
 described.

  Another option I found is to set up the environment (i.e. inject the
 parameter) while setting up the pipeline. AbstractProcessingPipeline is the
 place where we have access to the current flush buffer size parameter and
 call getOutputStream(..) on the environment. It has a method
 setupPipeline(Environment). Why not inject the parameter(s) there? Due to
 its lifecycle, changing a property of the environment should not cause any
 problem since it's a one-time usage object - no threading problems or
 anything like that.


Seems reasonable.
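Roughly, the injection idea quoted above could be sketched like this. All names here are assumed for illustration (a hypothetical setter-based variant of the Environment SPI), not the actual Cocoon code:

```java
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical variant: the buffer sizes are injected once during
// pipeline setup instead of being passed on every getOutputStream(..) call.
interface Environment {
    void setBufferSizes(int initialBufferSize, int flushBufferSize);
    OutputStream getOutputStream() throws IOException;
}

abstract class AbstractProcessingPipeline {
    // Values as configured on the pipeline (defaults assumed here).
    protected int initialBufferSize = 8192;
    protected int outputBufferSize = 1024 * 1024;

    protected void setupPipeline(Environment environment) {
        // The Environment is a one-time-use, per-request object, so
        // mutating it during setup poses no threading problem.
        environment.setBufferSizes(initialBufferSize, outputBufferSize);
    }
}
```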

[snip/]

-- 
Peter Hunsberger


Re: Avoiding OutOfMemory Errors by limiting data in pipeline

2008-05-11 Thread Joerg Heinicke

On 09.05.2008 09:41, Peter Hunsberger wrote:


I think this is rather hard to do. The place where we instantiate the
BufferedOutputStreams (both java.io and o.a.c.util) is
AbstractEnvironment.getOutputStream(int bufferSize). So in order to pass a
second buffer size argument to the BufferedOutputStream constructor we need
to have it available there. One option would be to add it to
getOutputStream() - which is an interface change and not really nice.


I haven't looked at the code here, but couldn't you just introduce a
second getOutputStream( int bufferSize ) method where the current
interface method continues with the current default logic if it is
used?


getOutputStream() actually already takes an int parameter, the flush 
buffer size. Whether to add another getOutputStream() method or modify 
the existing one is not really a difference IMO. Environment is a 
kind of internal interface (or SPI, as it has been called lately, 
isn't it?). This means there should be only very few implementations 
besides the one we provide, if any (Forrest, Lenya, CLI environment?). 
And in Cocoon we would change all usages of the single-parameter 
method to the one with 2 parameters. So whoever provides such an 
Environment implementation has to adapt his implementation in a 
meaningful way anyway (an empty implementation returning null, throwing 
NotSupportedException or the like would not work). So it's the same effort 
for them whether we add a new method or change the existing one on the 
interface.


IMO the decision should be made purely from a design perspective. Should 
a configuration parameter be passed around as a method parameter even 
though it is static throughout the whole lifecycle of the Environment 
instance? In a perfect world I'd say no :) Which leaves the question of 
how to inject the parameter. One place is on instantiation (e.g. 
CocoonServlet.getEnvironment(..) in 2.1, 
RequestProcessor.getEnvironment(..) in 2.2), which leaves us with the 
web.xml init parameter (or analogous alternatives for other 
environments) as described.


Another option I found is to set up the environment (i.e. inject the 
parameter) while setting up the pipeline. AbstractProcessingPipeline is 
the place where we have access to the current flush buffer size 
parameter and call getOutputStream(..) on the environment. It has a 
method setupPipeline(Environment). Why not inject the parameter(s) 
there? Due to its lifecycle, changing a property of the environment 
should not cause any problem since it's a one-time usage object - no 
threading problems or anything like that.


I'm just curious what the original reason was to pass the parameter 
along rather than injecting it? Maybe there is a flaw in my thoughts :) 
Whoever knows the code: are my statements correct, and what do you think 
about the approach of injecting the parameters rather than passing them 
along? Second, if it is a valid approach, which way should we go?


1) Don't provide a separate configuration option for initial buffer size.
2) Pass both parameters to getOutputStream(..).
3) Leave the current flush buffer size as parameter to 
getOutputStream(..) but inject the other one

a) from web.xml.
b) from pipeline configuration.
4) Inject both buffer sizes, possibly reactivating/reintroducing 
getOutputStream() without any parameter and deprecating the other one.


Many questions, yet another one: What do you think? :)

Joerg


Re: Avoiding OutOfMemory Errors by limiting data in pipeline

2008-05-09 Thread Peter Hunsberger
On Fri, May 9, 2008 at 12:08 AM, Joerg Heinicke [EMAIL PROTECTED] wrote:
 On 08.05.2008 11:53, Bruce Atherton wrote:

[snip/]

 I think this is rather hard to do. The place where we instantiate the
 BufferedOutputStreams (both java.io and o.a.c.util) is
 AbstractEnvironment.getOutputStream(int bufferSize). So in order to pass a
 second buffer size argument to the BufferedOutputStream constructor we need
 to have it available there. One option would be to add it to
 getOutputStream() - which is an interface change and not really nice.

I haven't looked at the code here, but couldn't you just introduce a
second getOutputStream( int bufferSize ) method where the current
interface method continues with the current default logic if it is
used?


 The second option would be to pass it to the Environment instance. Since
 environments can be wrapped it again needs an interface change (but just
 adding a method, which is much better). And you have to look at where
 environments are instantiated, e.g. HttpServletEnvironment in CocoonServlet.
 From what I see from a quick look, the only potential way to provide a
 configuration would be as a servlet init parameter. That makes it two
 different places to configure these two different buffer sizes - not very
 intuitive.

Yuck.

-- 
Peter Hunsberger


Re: Avoiding OutOfMemory Errors by limiting data in pipeline

2008-05-08 Thread Bruce Atherton
My only comment is that I think it would be good to allow the initial 
buffer size to be configurable. If you know the bulk of your responses 
are greater than 32K, then performing the ramp-up from 8K every time 
would be a waste of resources. For another web site, if most responses 
were smaller than 6K then an 8K buffer would be perfect. Allowing 
someone to tweak that based on their situation seems useful to me.


Not critical though, if it is hard to do. Allowing the buffer to scale 
is the important thing.


Joerg Heinicke wrote:

On 27.04.2008 23:43, Joerg Heinicke wrote:

2. Does the full amount of the buffer automatically get allocated 
for each request, or does it grow gradually based on the xml stream 
size?


I have a lot of steps in the pipeline, so I am worried about the 
impact of creating too many buffers even if they are relatively 
small. A 1 Meg buffer might be too much if it is created for every 
element of every pipeline for every request.


That's a very good question - with a negative answer: A buffer of 
that particular size is created initially. That's why I want to bring 
this issue up on dev again: With my changes for COCOON-2168 [1] it's 
now not only a problem for applications with over-sized downloads but 
potentially for everyone relying on Cocoon's default configuration. 
One idea would be to change our BufferedOutputStream implementation 
to take 2 parameters: one for the initial buffer size and one for the 
flush size. The flush threshold would be the configurable 
outputBufferSize; the initial buffer size does not need to be 
configurable, I think.


What do others think?


No interest or no objections? :)

Joerg




Re: Avoiding OutOfMemory Errors by limiting data in pipeline

2008-05-08 Thread Antonio Gallardo

Hi Joerg,

I am +1.

One question, what are supposed to be the default values for both 
parameters?


Best Regards,

Antonio Gallardo.

Joerg Heinicke wrote:

On 27.04.2008 23:43, Joerg Heinicke wrote:

2. Does the full amount of the buffer automatically get allocated 
for each request, or does it grow gradually based on the xml stream 
size?


I have a lot of steps in the pipeline, so I am worried about the 
impact of creating too many buffers even if they are relatively 
small. A 1 Meg buffer might be too much if it is created for every 
element of every pipeline for every request.


That's a very good question - with a negative answer: A buffer of 
that particular size is created initially. That's why I want to bring 
this issue up on dev again: With my changes for COCOON-2168 [1] it's 
now not only a problem for applications with over-sized downloads but 
potentially for everyone relying on Cocoon's default configuration. 
One idea would be to change our BufferedOutputStream implementation 
to take 2 parameters: one for the initial buffer size and one for the 
flush size. The flush threshold would be the configurable 
outputBufferSize; the initial buffer size does not need to be 
configurable, I think.


What do others think?


No interest or no objections? :)

Joerg




Re: Avoiding OutOfMemory Errors by limiting data in pipeline

2008-05-08 Thread Joerg Heinicke

On 08.05.2008 11:53, Bruce Atherton wrote:

My only comment is that I think it would be good to allow the initial 
buffer size to be configurable. If you know the bulk of your responses 
are greater than 32K, then performing the ramp-up from 8K every time 
would be a waste of resources. For another web site, if most responses 
were smaller than 6K then an 8K buffer would be perfect. Allowing 
someone to tweak that based on their situation seems useful to me.


Not critical though, if it is hard to do. Allowing the buffer to scale 
is the important thing.


I think this is rather hard to do. The place where we instantiate the 
BufferedOutputStreams (both java.io and o.a.c.util) is 
AbstractEnvironment.getOutputStream(int bufferSize). So in order to pass 
a second buffer size argument to the BufferedOutputStream constructor we 
need to have it available there. One option would be to add it to 
getOutputStream() - which is an interface change and not really nice.


The second option would be to pass it to the Environment instance. Since 
environments can be wrapped it again needs an interface change (but just 
adding a method, which is much better). And you have to look at where 
environments are instantiated, e.g. HttpServletEnvironment in 
CocoonServlet. From what I see from a quick look, the only potential way 
to provide a configuration would be as a servlet init parameter. That 
makes it two different places to configure these two different buffer 
sizes - not very intuitive.


Joerg


Re: Avoiding OutOfMemory Errors by limiting data in pipeline

2008-05-08 Thread Joerg Heinicke

On 08.05.2008 12:16, Antonio Gallardo wrote:

One question, what are supposed to be the default values for both 
parameters?


For the initial buffer size I thought of 8K, maybe 16K. It should be a 
reasonable size that's not overly large (i.e. unnecessarily reserved 
memory) for most of the resources.


For the flush buffer size we already talked about 1 MB as default value 
[1]. This size should almost never be hit.


Joerg

[1] http://marc.info/?t=12047341133&r=1&w=4


Re: Avoiding OutOfMemory Errors by limiting data in pipeline

2008-05-07 Thread Joerg Heinicke

On 27.04.2008 23:43, Joerg Heinicke wrote:

2. Does the full amount of the buffer automatically get allocated for 
each request, or does it grow gradually based on the xml stream size?


I have a lot of steps in the pipeline, so I am worried about the 
impact of creating too many buffers even if they are relatively small. 
A 1 Meg buffer might be too much if it is created for every element of 
every pipeline for every request.


That's a very good question - with a negative answer: A buffer of that 
particular size is created initially. That's why I want to bring this 
issue up on dev again: With my changes for COCOON-2168 [1] it's now not 
only a problem for applications with over-sized downloads but 
potentially for everyone relying on Cocoon's default configuration. One 
idea would be to change our BufferedOutputStream implementation to take 
2 parameters: one for the initial buffer size and one for the flush 
size. The flush threshold would be the configurable outputBufferSize; the 
initial buffer size does not need to be configurable, I think.


What do others think?


No interest or no objections? :)

Joerg


Re: Avoiding OutOfMemory Errors by limiting data in pipeline

2008-04-27 Thread Joerg Heinicke

On 24.04.2008 16:08, Bruce Atherton wrote:
Thanks for the response. About setting the buffer size, this looks like 
it could be what I am looking for. A few questions:


1. Do I have to set the buffer size on each transformer and the 
serializer as well as the generator? What about setting it on the pipeline?


It is on the pipeline and only there. You can set it on the map:pipe 
element in the map:components section, so that it is applied to each 
pipeline of that type. Or on any individual map:pipeline element in the 
map:pipelines section.
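For reference, a per-pipeline setting might look like the following sitemap fragment. This is a sketch only: the parameter name outputBufferSize is the one discussed in this thread, while the surrounding match/generate/serialize elements are purely illustrative:

```xml
<!-- Sketch: overriding the buffer size for one pipeline via map:parameter. -->
<map:pipelines>
  <map:pipeline>
    <map:parameter name="outputBufferSize" value="1048576"/>
    <map:match pattern="report">
      <map:generate src="report.xml"/>
      <map:serialize type="xml"/>
    </map:match>
  </map:pipeline>
</map:pipelines>
```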


2. Does the full amount of the buffer automatically get allocated for 
each request, or does it grow gradually based on the xml stream size?


I have a lot of steps in the pipeline, so I am worried about the impact 
of creating too many buffers even if they are relatively small. A 1 Meg 
buffer might be too much if it is created for every element of every 
pipeline for every request.


That's a very good question - with a negative answer: A buffer of that 
particular size is created initially. That's why I want to bring this 
issue up on dev again: With my changes for COCOON-2168 [1] it's now not 
only a problem for applications with over-sized downloads but 
potentially for everyone relying on Cocoon's default configuration. One 
idea would be to change our BufferedOutputStream implementation to take 
2 parameters: one for the initial buffer size and one for the flush 
size. The flush threshold would be the configurable outputBufferSize; the 
initial buffer size does not need to be configurable, I think.


What do others think?
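The two-parameter idea could be sketched as follows. This is an illustrative implementation only, not the actual o.a.c.util class; the class name, the growth strategy (a backing buffer that starts small and grows on demand), and the flush-on-threshold behaviour are assumptions based on this thread:

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch of a buffered stream with two sizes: the backing buffer starts
// at initialSize and grows gradually with the response; once the buffered
// data reaches flushSize, everything is written through to the target.
class GrowingBufferedOutputStream extends OutputStream {
    private final OutputStream target;
    private final int flushSize;
    private final ByteArrayOutputStream buffer;

    GrowingBufferedOutputStream(OutputStream target, int initialSize, int flushSize) {
        this.target = target;
        this.flushSize = flushSize;
        this.buffer = new ByteArrayOutputStream(initialSize); // grows as needed
    }

    @Override
    public void write(int b) throws IOException {
        buffer.write(b);
        if (buffer.size() >= flushSize) {
            flush(); // threshold reached: push buffered bytes downstream
        }
    }

    @Override
    public void flush() throws IOException {
        buffer.writeTo(target);
        buffer.reset();
        target.flush();
    }

    @Override
    public void close() throws IOException {
        flush();
        target.close();
    }
}
```

With an 8K initial size and a 1 MB flush threshold, a typical small response never allocates more than the 8K it actually grows into, while an over-sized response is flushed instead of accumulating in memory.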

On an unrelated note, is there some way to configure caching so that 
nothing is cached that is larger than a certain size? I'm worried that 
this might be a caching issue rather than a buffer issue.


Not that I'm aware of. Why do you think it's caching? Caching is at 
least configurable in terms of number of cache entries, and I also think 
in terms of max cache size. But beyond a certain cache size the cache 
entries are written to disk anyway, so it's unlikely to result in a 
memory issue.


How do you read the object graph from the heap dump?

To tell you the truth, I'm not sure. This is the hierarchy generated by 
the Heap Analyzer tool from IBM, and is from a heap dump on an AIX box 
running the IBM JRE. My guess as to the Object referencing the 
ComponentsSelector is that the ArrayList is not generified, so the 
analyzer doesn't know the actual type of the Object being referenced. 
What the object actually is would depend on what 
CachingProcessorPipeline put into the ArrayList. That is just a guess, 
though. And I have no explanation for the link between 
FOM_Cocoon$CallContext and ConcreteCallProcessor. Perhaps things were 
different in the 2.1.9 release?


No serious changes since 2.1.9, which is rev 392241 [2].

Joerg

[1] https://issues.apache.org/jira/browse/COCOON-2168
[2] 
http://svn.apache.org/viewvc/cocoon/branches/BRANCH_2_1_X/src/java/org/apache/cocoon/components/flow/javascript/fom/FOM_Cocoon.java?view=log