David,

yes, the stylesheets do get compiled into internal dynamic trees
already; in this respect, there's not much work to be done. To sketch
one possible way I think the caching could work:

The interface would contain functions, say, SablotCache and
SablotUncache; both would get a URL as the argument. There would be a
list of URLs to be kept in cache at the moment. These two functions
would manipulate the list.

When a document is requested, the cache would be checked for it first;
when an output document is produced, it would be added to the cache if
its URL is on the cache list. If it already exists in the cache, it
would be replaced.
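
To make the idea concrete, here is a rough sketch of the bookkeeping
described above. It is not Sablotron code: the class name DocumentCache
and its methods are hypothetical, documents are represented as plain
strings standing in for the internal trees, and dropping the stored copy
on uncache is my assumption rather than something settled.

```cpp
#include <map>
#include <optional>
#include <set>
#include <string>

// Hypothetical per-processor cache; NOT part of Sablotron's actual API.
// Strings stand in for the compiled internal document trees.
class DocumentCache {
public:
    // Analogue of the proposed SablotCache: put a URL on the cache list.
    void cache(const std::string& url) { cacheList_.insert(url); }

    // Analogue of the proposed SablotUncache: take the URL off the list.
    // Assumption: any stored copy is dropped at the same time.
    void uncache(const std::string& url) {
        cacheList_.erase(url);
        documents_.erase(url);
    }

    // When a document is requested, check the cache first.
    std::optional<std::string> lookup(const std::string& url) const {
        auto it = documents_.find(url);
        if (it == documents_.end()) return std::nullopt;
        return it->second;
    }

    // When a document has been parsed or produced, store it only if its
    // URL is on the cache list; an existing entry is replaced.
    // Returns true if the document was stored.
    bool store(const std::string& url, const std::string& doc) {
        if (cacheList_.count(url) == 0) return false;
        documents_[url] = doc;
        return true;
    }

private:
    std::set<std::string> cacheList_;                // URLs to keep in cache
    std::map<std::string, std::string> documents_;   // URL -> cached document
};
```

Each Processor instance would own one such object, so nothing is shared
between instances or threads.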

The advantages of this approach are that 1) it allows output documents
to be cached, and 2) there's no need to introduce caching analogues of
SablotRunProcessor etc.

For some technical reasons, it would be easier at present to make the
cache specific to a given instance of the Processor, which is what I
suggest.

Does this make sense? Any problems?

Tom


David Hedbor wrote:
> 
> Kaiserovi <[EMAIL PROTECTED]> writes:
> 
> > Andreas,
> >
> > this issue has appeared several times on the list and I agree that it
> > would improve the performance a lot in many cases. We simply haven't
> > reached a satisfactory conclusion as to how to deal with the cache. One
> > problem is that Sablotron may often be just a component of a larger
> > system - wouldn't it be reasonable to provide hooks to let the system do
> > the caching? If not, should the cache be processor-instance-specific or
> > global? Should caching be transparent for the user or rather performed
> > on demand?
> 
> My idea on how the caching would work is having a globally-usable
> "object" - ie some stylesheet compile mechanism. ie:
> 
> StyleSheet s = compile_stylesheet(...);
> 
> Then you could use that stylesheet (which is the internal binary
> representation) from any thread / processor since it would be used in
> a read-only fashion. With a construction like this, the main part of
> the caching is in the hand of the user (ie they have to keep track of
> what stylesheet is which and if the cache is up-to-date). It would
> give greater flexibility and would probably be easier to implement
> than a full-scale caching (using a hash of the stylesheet data or what
> not).
> 
> I personally think this level of control / complexity is rather good.
> 
> > I feel that it would be best to discuss these questions on this list
> > first before we start to implement the cache. Any ideas will be most
> > appreciated.
> >
> > Tom Kaiser
> >
> > Andreas Jung wrote:
> > >
> > > Dear all,
> > >
> > > when we are talking about Sablotron performance we should
> > > take a look at the most important feature for getting more
> > > performance - that is caching. We need a possibility
> > > to parse a XSLT sheet only once when processing several
> > > XML files with the same sheet. We are using XSLT sheets
> > > with a size of about 350 KB and they take a very long
> > > time to reparse - especially when you need to process
> > > about 5000 files day by day with the same style sheet !
> > >
> > > Cheers
> > > Andreas
> 
> --
> [ Below is a random fortune, which is unrelated to the above message. ]
> Down with categorical imperative!


