RE: Caching output from xpathdirectory generator

Ard Schrijvers Tue, 10 Apr 2007 22:51:54 -0700

> 
> On 10.04.2007 17:49, Grzegorz Kossakowski wrote:
> 
> > Even though all components in your pipeline are cacheable, you are
> > right: the generator is called each time you change the sort order.
> > It is that way because a "caching" pipeline gives "all or nothing"
> > caching. This means that if some component in the pipeline must be
> > executed again, all components of the pipeline are executed again.
> 
> Are you sure?


Within one pipeline, and if all components are CacheableProcessingComponent(s),
yes.

> I have always been confident of component-wise caching starting at
> the beginning of the pipeline. If something prevents using the cached
> output of transformer 3, but everything up to that one is OK, the
> cached output of transformer 2 is used.

Indeed.

> The sample seems to prove that wrong.

Not true.

> Probably it just never made a difference for us, because when there was
> something uncacheable or uncached, it was the generator anyway.

Hmm, not really. What about the SQL transformer or the JX transformer (unless
you add the cache key and validity yourself)?

> 
> But what about the following change then?
> 
> <map:pipeline type="caching">
>     <map:match pattern="cars">
>        <map:generate type="xpathdirectory" src="{fergus:release}/cars">
>           <map:parameter name="depth" value="2"/>
>           <map:parameter name="exclude" value=".DS_Store"/>
>           <map:parameter name="xpath" value="/car/colour|/car/model|/car/year"/>
>        </map:generate>
>        <map:serialize type="xml"/>
>     </map:match>
> 
>     <map:match type="regexp" pattern="cars(.?sortby=(\w+))?">
>        <map:generate src="cocoon:/cars"/>
>        <map:transform type="xslt" src="xslt/carsummary1.xslt">
>           <map:parameter name="sortby" value="{2}"/>
>           <map:parameter name="sortprefix" value="sortby"/>
>           <map:parameter name="sortorder" value="sortorder"/>
>        </map:transform>
>        <map:serialize type="html"/>
>     </map:match>
> </map:pipeline>
> 
> Does this end in a cached internal pipeline?

Yes, of course! This ends up as two entries in the Cocoon cache for one call.
The cache key for the first pipeline would look something like this (very
roughly):

PK_G-xpathdirectory-"somelocation";depth=2;exclude=.DS_Store;xpath=/car/colour|/car/model|/car/year

and the second one like this, where pipelinehash is some sort of hash of the
first pipeline's key:

PK_G-file-cocoon:/cars?pipelinehash=-302669620116689610_T-xslt-file:"xsl-location";sortby=desc;....etc
(followed by the serializer part of the key)
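To make the relation between these two keys more concrete, here is a rough Java sketch. The key format, the helper names pipelineKey and cocoonSourceKey, and the use of String.hashCode() as the "pipelinehash" are all assumptions for illustration; Cocoon's real key classes look different. The point is only that the outer key embeds a hash of the inner key, so a different sortby changes the outer key while the inner key stays the same.

```java
import java.util.List;

// Hypothetical model of how a caching pipeline key is assembled from its
// component keys. The real Cocoon key format differs in detail.
final class KeySketch {
    // Builds a rough "PK_G-..._T-..._S-..." style key from the component keys.
    static String pipelineKey(String generatorKey, List<String> transformerKeys, String serializerKey) {
        StringBuilder sb = new StringBuilder("PK_G-").append(generatorKey);
        for (String t : transformerKeys) {
            sb.append("_T-").append(t);
        }
        return sb.append("_S-").append(serializerKey).toString();
    }

    // Assumption: a "cocoon:" source is identified by a hash of the internal
    // pipeline's key (String.hashCode() stands in for the real hash).
    static String cocoonSourceKey(String uri, String internalPipelineKey) {
        return "file-" + uri + "?pipelinehash=" + internalPipelineKey.hashCode();
    }

    public static void main(String[] args) {
        String inner = pipelineKey("xpathdirectory-cars;depth=2", List.of(), "xml");
        String asc = pipelineKey(cocoonSourceKey("cocoon:/cars", inner),
                List.of("xslt-carsummary1.xslt;sortby=asc"), "html");
        String desc = pipelineKey(cocoonSourceKey("cocoon:/cars", inner),
                List.of("xslt-carsummary1.xslt;sortby=desc"), "html");
        System.out.println(inner);            // PK_G-xpathdirectory-cars;depth=2_S-xml
        System.out.println(asc.equals(desc)); // false: only the outer keys differ
    }
}
```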

(Reading and understanding the cache keys on your status page can *really*
help you find bugs and caching imperfections. For example, add a parameter to
an XSLT transformer: <map:parameter name="dummy" value="AAAAAAAAA"/>. If you
run the pipeline with that XSL and you see the dummy parameter in a cache key,
the XSL output gets cached. If not, you have a caching problem before the XSL.)

When a second call now comes in with the sort order reversed, getKey() is
invoked on all components. Since sortby is different, the "total" cache key
cannot be found in the cache. However (I haven't checked this in the code, but
I know it works like this), for <map:generate src="cocoon:/cars"/> it does find
a cache key, so it gets the cached response back from the cache. Only the XSLT
transformation has to be done again.
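A toy Java simulation of the two setups may make this easier to see. It is not Cocoon code: the cache is a plain HashMap, the key strings are made up, and validity checks are ignored. It only counts how often the generator and the XSLT step run for two calls with different sort orders, once with everything in one caching pipeline ("all or nothing", as described above) and once with the cocoon:/cars internal pipeline.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Simplified simulation, not Cocoon code: every pipeline stores one entry
// under its full key, and validities are ignored.
final class SubpipelineCacheSketch {
    private final Map<String, String> cache = new HashMap<>();
    int generatorRuns = 0;
    int xsltRuns = 0;

    // Returns the cached value for key, computing and storing it on a miss.
    // (Explicit get/put instead of computeIfAbsent, because the nested lookup
    // in splitPipelines would modify the map from inside computeIfAbsent.)
    private String cached(String key, Supplier<String> compute) {
        String hit = cache.get(key);
        if (hit != null) {
            return hit;
        }
        String value = compute.get();
        cache.put(key, value);
        return value;
    }

    private String scanDirectory() {  // stands in for the xpathdirectory generator
        generatorRuns++;
        return "<cars/>";
    }

    private String applyXslt(String xml, String sortBy) {  // stands in for the XSLT step
        xsltRuns++;
        return "<html sortby=\"" + sortBy + "\">" + xml + "</html>";
    }

    // One pipeline: a single key that includes sortby, so a new sort order reruns everything.
    String singlePipeline(String sortBy) {
        return cached("G-xpathdirectory_T-xslt;sortby=" + sortBy,
                () -> applyXslt(scanDirectory(), sortBy));
    }

    // Two pipelines: the internal "cars" pipeline has its own entry, independent of sortby.
    String splitPipelines(String sortBy) {
        return cached("G-cocoon:/cars_T-xslt;sortby=" + sortBy,
                () -> applyXslt(cached("G-xpathdirectory", this::scanDirectory), sortBy));
    }

    public static void main(String[] args) {
        SubpipelineCacheSketch single = new SubpipelineCacheSketch();
        single.singlePipeline("asc");
        single.singlePipeline("desc");
        SubpipelineCacheSketch split = new SubpipelineCacheSketch();
        split.splitPipelines("asc");
        split.splitPipelines("desc");
        System.out.println("single: generator=" + single.generatorRuns + ", xslt=" + single.xsltRuns);
        System.out.println("split: generator=" + split.generatorRuns + ", xslt=" + split.xsltRuns);
        // prints "single: generator=2, xslt=2" and "split: generator=1, xslt=2"
    }
}
```

With two calls, the single pipeline runs the generator twice, while the split setup runs it once and only repeats the XSLT step.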

So, you might conclude that it is extremely important to know *when* to call a
subpipeline and when not to. Calling a subpipeline might pollute your cache, or
it might add an extremely important "intermediate" cached response (like this
one).

@Joerg: you are thinking of a pipeline in which not all components are
cacheable. In that case, Cocoon caches up to the first non-cacheable component.
So, in the original single-pipeline setup, if Fergus replaced the XSL
transformer with a custom one that does not implement
CacheableProcessingComponent, the complete result of the pipeline would never
be cached, only the output up to the transformer. The transformer would then
always have to run, while the generator would run only once (as long as its
validity holds). A call with a different sort order would thus be much faster
in the single-pipeline setup if the transformer were uncacheable... what a
paradox :-)
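The "cache up to the first non-cacheable component" rule boils down to finding the longest cacheable prefix of the pipeline. A minimal Java sketch, assuming a hypothetical Component class whose cacheable flag stands in for "implements CacheableProcessingComponent":

```java
import java.util.List;

// Hypothetical model: a pipeline is a list of components, each cacheable or not.
// Following the behavior described above, output can only be cached up to the
// first non-cacheable component.
final class CacheablePrefixSketch {
    static final class Component {
        final String name;
        final boolean cacheable; // stands in for "implements CacheableProcessingComponent"

        Component(String name, boolean cacheable) {
            this.name = name;
            this.cacheable = cacheable;
        }
    }

    // Number of leading components whose combined output can be cached.
    static int cacheablePrefixLength(List<Component> pipeline) {
        int n = 0;
        for (Component c : pipeline) {
            if (!c.cacheable) {
                break;
            }
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        List<Component> allCacheable = List.of(
                new Component("xpathdirectory", true),
                new Component("xslt", true),
                new Component("html", true));
        List<Component> customTransformer = List.of(
                new Component("xpathdirectory", true),
                new Component("custom-sort", false), // hypothetical uncacheable transformer
                new Component("html", true));
        System.out.println(cacheablePrefixLength(allCacheable));      // 3: whole result cached
        System.out.println(cacheablePrefixLength(customTransformer)); // 1: only generator output cached
    }
}
```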

Anyway @Fergus:

Making two separate pipelines solves the problem. Be aware that the
xpathdirectory generator uses an aggregated timestamp validity of all the
sources it depends on, so a change on the filesystem to any one of those
sources invalidates your generator. If this happens frequently and performance
matters, I would do one of the following:
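The aggregated validity can be pictured as a snapshot of the last-modified time of every file the generator read; the result stays valid only while all of them are unchanged. The class below is a hypothetical illustration and does not use Excalibur's SourceValidity API. Comparing whole maps also means that adding or removing a file invalidates the snapshot, which seems the right behavior for a directory generator.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of an aggregated timestamp validity (not the Excalibur
// API): valid only while every source keeps the timestamp it had at snapshot time.
final class AggregateValiditySketch {
    private final Map<String, Long> snapshot;

    AggregateValiditySketch(Map<String, Long> lastModified) {
        this.snapshot = new HashMap<>(lastModified); // copy, so later changes are detected
    }

    // True only if the same set of files exists and none of them has a new timestamp.
    boolean isValid(Map<String, Long> current) {
        return snapshot.equals(current);
    }

    public static void main(String[] args) {
        Map<String, Long> files = new HashMap<>();
        files.put("cars/a.xml", 1000L);
        files.put("cars/b.xml", 2000L);
        AggregateValiditySketch validity = new AggregateValiditySketch(files);
        System.out.println(validity.isValid(files)); // true
        files.put("cars/b.xml", 2500L);              // one file changed on disk
        System.out.println(validity.isValid(files)); // false: the whole generator output is invalid
    }
}
```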

1) wrap it with a caching source and regenerate asynchronously when it is no
longer valid
2) implement your own generator that uses a compound composite, which can
easily be much more efficient than the "all or nothing" xpathdirectory
generator (though it might be hard work the first time)
3) use the Excalibur file monitor to update in the background when a source
changes
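To illustrate the idea behind option 2, here is a hypothetical per-source cache in Java: each file's extracted fragment is stored with its own timestamp, so a change to one file only re-extracts that file. This is my own sketch of the general technique, not how xpathdirectory or any Cocoon class works. The extract function stands in for the XPath extraction, and a real generator would also have to deal with validities and concurrency.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Function;

// Hypothetical per-source cache illustrating option 2; not Cocoon code.
// Each file's fragment is cached with its own timestamp, so a change to one
// file re-extracts only that file instead of the whole directory.
final class PerSourceCacheSketch {
    private static final class Cached {
        final long lastModified;
        final String fragment;

        Cached(long lastModified, String fragment) {
            this.lastModified = lastModified;
            this.fragment = fragment;
        }
    }

    private final Map<String, Cached> entries = new HashMap<>();
    int parses = 0; // how many times a file was (re)extracted

    // Rebuilds the aggregated document, re-extracting only new or changed files.
    String build(Map<String, Long> lastModified, Function<String, String> extract) {
        entries.keySet().retainAll(lastModified.keySet()); // forget deleted files
        StringBuilder out = new StringBuilder("<cars>");
        // TreeMap gives a stable, sorted file order in the output
        for (Map.Entry<String, Long> file : new TreeMap<>(lastModified).entrySet()) {
            Cached c = entries.get(file.getKey());
            if (c == null || c.lastModified != file.getValue()) {
                parses++;
                c = new Cached(file.getValue(), extract.apply(file.getKey()));
                entries.put(file.getKey(), c);
            }
            out.append(c.fragment);
        }
        return out.append("</cars>").toString();
    }

    public static void main(String[] args) {
        PerSourceCacheSketch cache = new PerSourceCacheSketch();
        Map<String, Long> files = new HashMap<>();
        files.put("a.xml", 1L);
        files.put("b.xml", 1L);
        Function<String, String> extract = name -> "<car src='" + name + "'/>"; // stand-in extraction
        cache.build(files, extract);
        files.put("b.xml", 2L); // only b.xml changes
        System.out.println(cache.build(files, extract)); // <cars><car src='a.xml'/><car src='b.xml'/></cars>
        System.out.println("parses: " + cache.parses);   // parses: 3 (a.xml once, b.xml twice)
    }
}
```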

Regards Ard

> 
> Joerg
> 