Re: Avoiding OutOfMemory Errors by limiting data in pipeline
On Sun, May 11, 2008 at 6:59 PM, Joerg Heinicke [EMAIL PROTECTED] wrote:

> On 09.05.2008 09:41, Peter Hunsberger wrote:
>
> > I haven't looked at the code here, but couldn't you just introduce a
> > second getOutputStream( int bufferSize ) method where the current
> > interface method continues with the current default logic if it is used?
>
> getOutputStream() actually already takes an int parameter, the flush
> buffer size.

Yeah, I saw that...

> Whether to add another getOutputStream() method or modify the existing
> one is not really a difference IMO. Environment is a kind of internal
> interface (or SPI, as it has been called lately, isn't it?). This means
> there should be only very few implementations besides the one we
> provide, if any at all (Forrest, Lenya, CLI environment?). And in Cocoon
> we would change all usages of the single-parameter method to the one
> with 2 parameters. So whoever provides such an Environment
> implementation has to adapt his implementation in a meaningful way
> anyway (an empty implementation returning null, throwing
> NotSupportedException or whatever would not work). So it's the same
> effort for them whether we add a new method or change the existing one
> on the interface.

I don't see that, you can continue the existing behaviour for those who
don't change?

> IMO the decision should be made purely from a design perspective. Should
> a configuration parameter be passed around as a method parameter even
> though it is static through the whole lifecycle of the Environment
> instance? In a perfect world I'd say no :)

That makes sense. I guess the question in that case is: are there any use
cases where people could use such a parameter as non-static?

> Which leaves the question how to inject the parameter. One place is on
> instantiation (e.g. CocoonServlet.getEnvironment(..) in 2.1,
> RequestProcessor.getEnvironment(..) in 2.2), which leaves us with the
> web.xml init parameter (or analogous alternatives for other
> environments) as described. Another option I found is to set up the
> environment (i.e. inject the parameter) while setting up the pipeline.
> AbstractProcessingPipeline is the place where we have access to the
> current flush buffer size parameter and call getOutputStream(..) on the
> environment. It has a method setupPipeline(Environment). Why not inject
> the parameter(s) here? Due to its lifecycle, changing a property of the
> environment should not cause any problem, since it's a one-time-use
> object - no threading problems or anything like that.

Seems reasonable.

[snip/]

--
Peter Hunsberger
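The injection approach Joerg describes could be sketched as below. All names here are illustrative, not the actual Cocoon 2.1/2.2 API: the hypothetical setter setOutputBufferSizes(..) stands in for whatever method would be added to the Environment interface, and the stub environment only records what was injected.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch of injecting the buffer sizes into the single-use Environment
// during pipeline setup, instead of passing them on every
// getOutputStream(..) call. Names are hypothetical.
interface Environment {
    void setOutputBufferSizes(int initialBufferSize, int flushBufferSize);
    OutputStream getOutputStream() throws IOException;
}

class StubEnvironment implements Environment {
    int initialBufferSize;
    int flushBufferSize;

    public void setOutputBufferSizes(int initial, int flush) {
        this.initialBufferSize = initial;
        this.flushBufferSize = flush;
    }

    public OutputStream getOutputStream() {
        // A real environment would wrap the servlet response stream here.
        return new ByteArrayOutputStream(initialBufferSize);
    }
}

abstract class AbstractProcessingPipeline {
    protected int outputBufferSize = 1024 * 1024; // from pipeline configuration

    // Called once per request. Since an Environment is a one-time-use
    // object, mutating it here raises no threading concerns.
    protected void setupPipeline(Environment env) {
        env.setOutputBufferSizes(8 * 1024, this.outputBufferSize);
    }
}
```

The point of the design is that configuration flows in one direction, once per request, rather than riding along on every method call.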
Re: Avoiding OutOfMemory Errors by limiting data in pipeline
On 09.05.2008 09:41, Peter Hunsberger wrote:

> > I think this is rather hard to do. The place where we instantiate the
> > BufferedOutputStreams (both java.io and o.a.c.util) is
> > AbstractEnvironment.getOutputStream(int bufferSize). So in order to
> > pass a second buffer size argument to the BufferedOutputStream
> > constructor we need to have it available there. One option would be to
> > add it to getOutputStream() - which is an interface change and not
> > really nice.
>
> I haven't looked at the code here, but couldn't you just introduce a
> second getOutputStream( int bufferSize ) method where the current
> interface method continues with the current default logic if it is used?

getOutputStream() actually already takes an int parameter, the flush buffer
size. Whether to add another getOutputStream() method or modify the
existing one is not really a difference IMO. Environment is a kind of
internal interface (or SPI, as it has been called lately, isn't it?). This
means there should be only very few implementations besides the one we
provide, if any at all (Forrest, Lenya, CLI environment?). And in Cocoon we
would change all usages of the single-parameter method to the one with 2
parameters. So whoever provides such an Environment implementation has to
adapt his implementation in a meaningful way anyway (an empty
implementation returning null, throwing NotSupportedException or whatever
would not work). So it's the same effort for them whether we add a new
method or change the existing one on the interface.

IMO the decision should be made purely from a design perspective. Should a
configuration parameter be passed around as a method parameter even though
it is static through the whole lifecycle of the Environment instance? In a
perfect world I'd say no :)

Which leaves the question how to inject the parameter. One place is on
instantiation (e.g. CocoonServlet.getEnvironment(..) in 2.1,
RequestProcessor.getEnvironment(..) in 2.2), which leaves us with the
web.xml init parameter (or analogous alternatives for other environments)
as described. Another option I found is to set up the environment (i.e.
inject the parameter) while setting up the pipeline.
AbstractProcessingPipeline is the place where we have access to the current
flush buffer size parameter and call getOutputStream(..) on the
environment. It has a method setupPipeline(Environment). Why not inject the
parameter(s) here? Due to its lifecycle, changing a property of the
environment should not cause any problem, since it's a one-time-use object
- no threading problems or anything like that. I'm just curious what the
original reason was to pass the parameter along rather than injecting it?
Maybe there is a flaw in my thoughts :)

Whoever knows the code: are my statements correct, and what do you think
about the approach of injecting the parameters rather than passing them
along? Second, if it is a valid approach, which way to go?

1) Don't provide a separate configuration option for the initial buffer
   size.
2) Pass both parameters to getOutputStream(..).
3) Leave the current flush buffer size as a parameter to getOutputStream(..)
   but inject the other one
   a) from web.xml.
   b) from the pipeline configuration.
4) Inject both buffer sizes, eventually reactivating/reintroducing
   getOutputStream() without any parameter and deprecating the other one.

Many questions, yet another one: What do you think? :)

Joerg
Re: Avoiding OutOfMemory Errors by limiting data in pipeline
On Fri, May 9, 2008 at 12:08 AM, Joerg Heinicke [EMAIL PROTECTED] wrote:

> On 08.05.2008 11:53, Bruce Atherton wrote:
>
> > [snip/]
>
> I think this is rather hard to do. The place where we instantiate the
> BufferedOutputStreams (both java.io and o.a.c.util) is
> AbstractEnvironment.getOutputStream(int bufferSize). So in order to pass
> a second buffer size argument to the BufferedOutputStream constructor we
> need to have it available there. One option would be to add it to
> getOutputStream() - which is an interface change and not really nice.

I haven't looked at the code here, but couldn't you just introduce a second
getOutputStream( int bufferSize ) method where the current interface method
continues with the current default logic if it is used?

> The second option would be to pass it to the Environment instance. Since
> environments can be wrapped, it again needs an interface change (but
> just adding a method, which is much better). And you have to look where
> environments are instantiated, e.g. HttpServletEnvironment in
> CocoonServlet. From what I see from a quick look, the only potential way
> to provide a configuration would be as a servlet init parameter. That
> makes it two different places to configure these two different buffer
> sizes - not very intuitive.

Yuck.

--
Peter Hunsberger
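For concreteness, the servlet-init-parameter route Joerg mentions would look roughly like this in web.xml. The param-name here is hypothetical; no such parameter exists in Cocoon, it is only meant to show why this would live in a different place than the pipeline's outputBufferSize configuration.

```xml
<servlet>
  <servlet-name>Cocoon</servlet-name>
  <servlet-class>org.apache.cocoon.servlet.CocoonServlet</servlet-class>
  <!-- hypothetical parameter name, for illustration only -->
  <init-param>
    <param-name>initial-buffer-size</param-name>
    <param-value>8192</param-value>
  </init-param>
</servlet>
```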
Re: Avoiding OutOfMemory Errors by limiting data in pipeline
My only comment is that I think it would be good to allow the initial
buffer size to be configurable. If you know the bulk of your responses are
greater than 32K, then performing the ramp-up from 8K every time would be a
waste of resources. For another web site, if most responses were smaller
than 6K, then an 8K buffer would be perfect. Allowing someone to tweak that
based on their situation seems useful to me. Not critical though, if it is
hard to do. Allowing the buffer to scale is the important thing.

Joerg Heinicke wrote:

> On 27.04.2008 23:43, Joerg Heinicke wrote:
>
> > 2. Does the full amount of the buffer automatically get allocated for
> > each request, or does it grow gradually based on the xml stream size?
> > I have a lot of steps in the pipeline, so I am worried about the
> > impact of creating too many buffers even if they are relatively small.
> > A 1 Meg buffer might be too much if it is created for every element of
> > every pipeline for every request.
>
> That's a very good question - with a negative answer: A buffer of that
> particular size is created initially. That's why I want to bring this
> issue up on dev again: With my changes for COCOON-2168 [1] it's now not
> only a problem for applications with over-sized downloads but
> potentially for everyone relying on Cocoon's default configuration.
>
> One idea would be to change our BufferedOutputStream implementation to
> take 2 parameters: one for the initial buffer size and one for the flush
> size. The flush threshold would be the configurable outputBufferSize;
> the initial buffer size does not need to be configurable, I think.
>
> What do others think?
Re: Avoiding OutOfMemory Errors by limiting data in pipeline
Hi Joerg,

I am +1. One question: what are supposed to be the default values for both
parameters?

Best Regards,

Antonio Gallardo

Joerg Heinicke wrote:

> On 27.04.2008 23:43, Joerg Heinicke wrote:
>
> > 2. Does the full amount of the buffer automatically get allocated for
> > each request, or does it grow gradually based on the xml stream size?
> > I have a lot of steps in the pipeline, so I am worried about the
> > impact of creating too many buffers even if they are relatively small.
> > A 1 Meg buffer might be too much if it is created for every element of
> > every pipeline for every request.
>
> That's a very good question - with a negative answer: A buffer of that
> particular size is created initially. That's why I want to bring this
> issue up on dev again: With my changes for COCOON-2168 [1] it's now not
> only a problem for applications with over-sized downloads but
> potentially for everyone relying on Cocoon's default configuration.
>
> One idea would be to change our BufferedOutputStream implementation to
> take 2 parameters: one for the initial buffer size and one for the flush
> size. The flush threshold would be the configurable outputBufferSize;
> the initial buffer size does not need to be configurable, I think.
>
> What do others think?
Re: Avoiding OutOfMemory Errors by limiting data in pipeline
On 08.05.2008 11:53, Bruce Atherton wrote:

> My only comment is that I think it would be good to allow the initial
> buffer size to be configurable. If you know the bulk of your responses
> are greater than 32K, then performing the ramp-up from 8K every time
> would be a waste of resources. For another web site, if most responses
> were smaller than 6K, then an 8K buffer would be perfect. Allowing
> someone to tweak that based on their situation seems useful to me. Not
> critical though, if it is hard to do. Allowing the buffer to scale is
> the important thing.

I think this is rather hard to do. The place where we instantiate the
BufferedOutputStreams (both java.io and o.a.c.util) is
AbstractEnvironment.getOutputStream(int bufferSize). So in order to pass a
second buffer size argument to the BufferedOutputStream constructor we need
to have it available there. One option would be to add it to
getOutputStream() - which is an interface change and not really nice.

The second option would be to pass it to the Environment instance. Since
environments can be wrapped, it again needs an interface change (but just
adding a method, which is much better). And you have to look where
environments are instantiated, e.g. HttpServletEnvironment in
CocoonServlet. From what I see from a quick look, the only potential way to
provide a configuration would be as a servlet init parameter. That makes it
two different places to configure these two different buffer sizes - not
very intuitive.

Joerg
Re: Avoiding OutOfMemory Errors by limiting data in pipeline
On 08.05.2008 12:16, Antonio Gallardo wrote:

> One question: what are supposed to be the default values for both
> parameters?

For the initial buffer size I thought of 8K, maybe 16K. It should be a
reasonable size that's not overly large (i.e. unnecessarily reserved
memory) for most of the resources. For the flush buffer size we already
talked about 1 MB as the default value [1]. This size should nearly never
be hit.

Joerg

[1] http://marc.info/?t=12047341133r=1w=4
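These defaults keep the worst case cheap. Assuming the buffer grows by doubling (a typical strategy for such buffers, though not confirmed as what the o.a.c.util implementation would do), even a response that fills the whole 1 MB flush threshold costs only a handful of array copies on the way up from 8K:

```java
public class BufferGrowth {

    /** Number of doublings needed to grow from initial size to the flush threshold. */
    static int doublingsToReach(int initial, int threshold) {
        int size = initial;
        int n = 0;
        while (size < threshold) {
            size *= 2;
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        // With the proposed defaults: 8K initial, 1 MB flush threshold.
        System.out.println(doublingsToReach(8 * 1024, 1024 * 1024)); // prints 7
    }
}
```

So small responses never pay for a 1 MB allocation, and large ones reach the cap after only 7 resizes.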
Re: Avoiding OutOfMemory Errors by limiting data in pipeline
On 27.04.2008 23:43, Joerg Heinicke wrote:

> > 2. Does the full amount of the buffer automatically get allocated for
> > each request, or does it grow gradually based on the xml stream size?
> > I have a lot of steps in the pipeline, so I am worried about the
> > impact of creating too many buffers even if they are relatively small.
> > A 1 Meg buffer might be too much if it is created for every element of
> > every pipeline for every request.
>
> That's a very good question - with a negative answer: A buffer of that
> particular size is created initially. That's why I want to bring this
> issue up on dev again: With my changes for COCOON-2168 [1] it's now not
> only a problem for applications with over-sized downloads but
> potentially for everyone relying on Cocoon's default configuration.
>
> One idea would be to change our BufferedOutputStream implementation to
> take 2 parameters: one for the initial buffer size and one for the flush
> size. The flush threshold would be the configurable outputBufferSize;
> the initial buffer size does not need to be configurable, I think.
>
> What do others think?

No interest or no objections? :)

Joerg
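The two-parameter BufferedOutputStream being proposed could look roughly like the sketch below. Class and parameter names are illustrative, not the actual o.a.c.util API: the buffer starts at initialBufferSize, doubles on demand, and spills to the wrapped stream only once it has grown to flushBufferSize.

```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

/** Sketch of a buffer with a small initial size and a flush threshold. */
public class GrowableBufferedOutputStream extends FilterOutputStream {

    private byte[] buffer;
    private int count;
    private final int flushBufferSize;

    public GrowableBufferedOutputStream(OutputStream out,
                                        int initialBufferSize,
                                        int flushBufferSize) {
        super(out);
        this.buffer = new byte[initialBufferSize];
        this.flushBufferSize = flushBufferSize;
    }

    @Override
    public void write(int b) throws IOException {
        if (count == buffer.length) {
            if (buffer.length >= flushBufferSize) {
                // Threshold reached: spill to the wrapped stream.
                flushBuffer();
            } else {
                // Double the buffer, but never beyond the flush threshold.
                int newSize = Math.min(buffer.length * 2, flushBufferSize);
                byte[] grown = new byte[newSize];
                System.arraycopy(buffer, 0, grown, 0, count);
                buffer = grown;
            }
        }
        buffer[count++] = (byte) b;
    }

    private void flushBuffer() throws IOException {
        if (count > 0) {
            out.write(buffer, 0, count);
            count = 0;
        }
    }

    @Override
    public void flush() throws IOException {
        flushBuffer();
        out.flush();
    }
}
```

With this shape, a response smaller than the initial size allocates only one small array, while over-sized responses are streamed out in flush-threshold-sized chunks instead of accumulating in memory.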
Re: Avoiding OutOfMemory Errors by limiting data in pipeline
On 24.04.2008 16:08, Bruce Atherton wrote:

> Thanks for the response. About setting the buffer size, this looks like
> it could be what I am looking for. A few questions:
>
> 1. Do I have to set the buffer size on each transformer and the
> serializer as well as the generator? What about setting it on the
> pipeline?

It is on the pipeline and only there. You can set it on the map:pipe
element in the map:components section, so that it is applied to each
pipeline of that type. Or on any individual map:pipeline element in the
map:pipelines section.

> 2. Does the full amount of the buffer automatically get allocated for
> each request, or does it grow gradually based on the xml stream size? I
> have a lot of steps in the pipeline, so I am worried about the impact of
> creating too many buffers even if they are relatively small. A 1 Meg
> buffer might be too much if it is created for every element of every
> pipeline for every request.

That's a very good question - with a negative answer: A buffer of that
particular size is created initially. That's why I want to bring this issue
up on dev again: With my changes for COCOON-2168 [1] it's now not only a
problem for applications with over-sized downloads but potentially for
everyone relying on Cocoon's default configuration.

One idea would be to change our BufferedOutputStream implementation to take
2 parameters: one for the initial buffer size and one for the flush size.
The flush threshold would be the configurable outputBufferSize; the initial
buffer size does not need to be configurable, I think.

What do others think?

> On an unrelated note, is there some way to configure caching so that
> nothing is cached that is larger than a certain size? I'm worried that
> this might be a caching issue rather than a buffer issue.

Not that I'm aware of. Why do you think it's caching? Caching is at least
configurable in terms of the number of cache entries, and I think also in
terms of the max cache size. But beyond a certain size the cache entries
are written to disk anyway, so it's unlikely to result in a memory issue.

> > How do you read the object graph from the heap dump?
>
> To tell you the truth, I'm not sure. This is the hierarchy generated by
> the Heap Analyzer tool from IBM, and is from a heap dump on an AIX box
> running the IBM JRE. My guess as to the Object referencing the
> ComponentsSelector is that the ArrayList is not generified, so the
> analyzer doesn't know the actual type of the Object being referenced.
> What the object actually is would depend on what CachingProcessorPipeline
> put into the ArrayList. That is just a guess, though. And I have no
> explanation for the link between FOM_Cocoon$CallContext and
> ConcreteCallProcessor. Perhaps things were different in the 2.1.9
> release?

No serious changes since 2.1.9, which is rev 392241 [2].

Joerg

[1] https://issues.apache.org/jira/browse/COCOON-2168
[2] http://svn.apache.org/viewvc/cocoon/branches/BRANCH_2_1_X/src/java/org/apache/cocoon/components/flow/javascript/fom/FOM_Cocoon.java?view=log
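The two configuration places Joerg describes would look roughly like the sitemap fragments below. This is a sketch only: the pipe implementation class and exact element nesting may differ between Cocoon versions, so check your sitemap against your release's documentation.

```xml
<!-- Per pipe type, in the map:components section: -->
<map:components>
  <map:pipes default="caching">
    <map:pipe name="caching"
              src="org.apache.cocoon.components.pipeline.impl.CachingProcessingPipeline">
      <parameter name="outputBufferSize" value="1048576"/>
    </map:pipe>
  </map:pipes>
</map:components>

<!-- Or per individual pipeline, in the map:pipelines section: -->
<map:pipelines>
  <map:pipeline>
    <map:parameter name="outputBufferSize" value="1048576"/>
    <!-- generator / transformer / serializer matchers go here -->
  </map:pipeline>
</map:pipelines>
```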