Tom,
on caching parsed trees ...
> This does NOT require publishing the Tree object which would be bad
> indeed. The caller doesn't need to access the trees directly, so passing
> a void* (and casting it to Tree* in Sablotron) would do. On the other
> hand, the internal cache gives better control: in case the caller passes
> a bad handle, we could report an error rather than crash. Does this
> advantage pay off?
I wouldn't consider this a strong argument - obviously, the more control you
get over something, the more you can screw up. It would naturally be the
caller's responsibility to handle parsed trees correctly, and his shame
otherwise. And passing a bad tree to Sablotron doesn't necessarily have to
crash it - some simple checks on the first few bytes of a passed tree might
detect 99% of problems.
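To illustrate the kind of check I have in mind, here is a minimal sketch in C.
It assumes every parsed tree starts with a fixed tag; ParsedTree, TREE_MAGIC
and the tree_* functions are made-up names for illustration, not anything in
Sablotron. Note that it can only catch handles that point to readable memory
of at least four bytes; a completely wild pointer can still crash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag stored at the start of every parsed tree. */
#define TREE_MAGIC 0x5AB10C7EUL

/* Stand-in for the internal Tree class; only the header matters here. */
typedef struct {
    uint32_t magic;      /* equals TREE_MAGIC while the tree is alive */
    int      node_count;
} ParsedTree;

typedef enum { TREE_OK = 0, TREE_BAD_HANDLE = 1 } TreeStatus;

void tree_init(ParsedTree *t, int node_count) {
    assert(t != NULL);
    t->magic = TREE_MAGIC;
    t->node_count = node_count;
}

/* Clear the tag on disposal so that stale handles are caught later. */
void tree_dispose(ParsedTree *t) {
    assert(t != NULL);
    t->magic = 0;
}

/* Validate an opaque handle before casting it to a tree. */
TreeStatus tree_check(const void *handle) {
    uint32_t magic;
    if (handle == NULL) return TREE_BAD_HANDLE;
    memcpy(&magic, handle, sizeof magic);  /* read the first four bytes */
    return magic == TREE_MAGIC ? TREE_OK : TREE_BAD_HANDLE;
}
```

The processor would call tree_check() on every void* it receives and report
an error instead of casting blindly.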
> 2) Extra URI scheme. Say there's a URI scheme 'parsed:' for access to
> the pre-parsed trees. A disadvantage of this is that it can hardly be
> used inside a stylesheet (in a <xsl:include> or document()). Note that
> these includes are arguably among the main candidates for parsing.
After rethinking this, a "parsed:" URI scheme doesn't make any sense - these
URIs would be written statically inside XSL files, and you can never tell
that e.g. a particular included document will always be available parsed
(definitely not the first time). Handing over parsed trees is an issue of
the caller/Sablotron interface, so the information about parsed trees should
be contained in the parameters and return values of Sablotron calls. E.g. an
alternative SablotProcess call could be provided, where for every current
parameter there would be a flag stating whether the parameter (be it the XSL
template, the main XML data or a named argument) is already parsed or not.
Sablotron would then return a structure with parsed versions of all trees
involved. Memory management would obviously be tough (e.g. the caller would
have to dispose of all trees he doesn't want to store), but worse, it doesn't
answer the issue of parsed import/include/document() trees. Hmmm.
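A rough sketch of the "flag per parameter" idea, in C. Everything here is
hypothetical: ProcInput, prepare_inputs() and the dummy parse_document() are
invented for illustration and are not the actual SablotProcess interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one input handed to the processor. */
typedef struct {
    const char *uri;     /* where the document comes from */
    int         parsed;  /* nonzero: 'tree' already holds a parsed tree */
    void       *tree;    /* opaque parsed tree, passed in and handed back */
} ProcInput;

/* Hypothetical stand-in for the real parser: returns a dummy tree. */
static void *parse_document(const char *uri) {
    static int dummy;
    (void)uri;
    return &dummy;
}

/* Parse only the inputs not flagged as parsed and return how many were
   parsed. Afterwards every input carries a tree, which the caller may
   store for the next call or dispose of. */
int prepare_inputs(ProcInput *inputs, size_t count) {
    int parsed_now = 0;
    assert(inputs != NULL || count == 0);
    for (size_t i = 0; i < count; i++) {
        if (!inputs[i].parsed) {
            inputs[i].tree = parse_document(inputs[i].uri);
            inputs[i].parsed = 1;
            parsed_now++;
        }
    }
    return parsed_now;
}
```

This shows the weak spot as well: the array only describes documents the
caller names himself, so trees pulled in by include/import/document() never
appear in it.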
> 3) IDEA. What about the following? Let's say Sablotron does keep a cache
> itself, but the user says explicitly what to put in and when to dispose
> of something. (So it's not much of a cache really.) This works fine with
> the includes and does not waste memory on unnecessary trees. The API
> functions for this needn't be as complicated as Honza suggests. You
> could for instance pass a list of URIs to be cached. If the result URI
> is on the list, no serialization is done and you can simply pass the URI
> on the next call to Sablotron which will find it in the cache.
Providing "management" access to Sablotron's own cache through a set of
parallel functions is an interesting idea. You would probably need functions
listing the cache contents, memory sizes and access statistics, and a delete
function, in addition to "proactive" caching instructions (a list of URLs to
cache etc.), so that the caller may intelligently prune Sablotron's cache in
addition to his own if memory is short. This actually leads to an ideal
solution, where both Sablotron and the caller share a common caching module
which provides full cache management and statistics functions. This way both
can make their own decisions about what to put in the cache and what to
remove, and it doesn't spoil Sablotron's interface. This ideal solution,
however, assumes too much about the caller - namely that he chose to use the
same caching module. It still makes sense, while not optimal, if the caller
doesn't use the same cache, but uses the same module to control Sablotron's
cache in addition to its own (more code, but the same functionality).
Does anyone know of such a C/C++ cache library?
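To make the list of functions concrete, here is a small sketch in C of what
such a management interface might look like. It is a toy fixed-size table
that tracks only URIs, sizes and hit counts; TreeCache and all the cache_*
names are hypothetical, not an existing library.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CACHE_SLOTS 8
#define URI_MAX 128

typedef struct {
    char     uri[URI_MAX];
    size_t   bytes;   /* memory held by the parsed tree */
    unsigned hits;    /* lookups that found this entry */
    int      used;
} CacheEntry;

typedef struct { CacheEntry slot[CACHE_SLOTS]; } TreeCache;

void cache_init(TreeCache *c) { memset(c, 0, sizeof *c); }

static CacheEntry *find(TreeCache *c, const char *uri) {
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (c->slot[i].used && strcmp(c->slot[i].uri, uri) == 0)
            return &c->slot[i];
    return NULL;
}

/* "Proactive" caching: returns 0 on success, -1 if full or URI too long. */
int cache_put(TreeCache *c, const char *uri, size_t bytes) {
    assert(c != NULL && uri != NULL);
    if (strlen(uri) >= URI_MAX) return -1;
    CacheEntry *e = find(c, uri);
    if (e == NULL) {
        for (int i = 0; i < CACHE_SLOTS && e == NULL; i++)
            if (!c->slot[i].used) e = &c->slot[i];
        if (e == NULL) return -1;
        strcpy(e->uri, uri);
        e->used = 1;
        e->hits = 0;
    }
    e->bytes = bytes;
    return 0;
}

/* Lookup that records an access; returns 1 on a hit, 0 on a miss. */
int cache_lookup(TreeCache *c, const char *uri) {
    CacheEntry *e = find(c, uri);
    if (e == NULL) return 0;
    e->hits++;
    return 1;
}

/* Explicit disposal; returns 1 if an entry was removed. */
int cache_delete(TreeCache *c, const char *uri) {
    CacheEntry *e = find(c, uri);
    if (e == NULL) return 0;
    e->used = 0;
    return 1;
}

/* Statistics: total memory held by all cached trees. */
size_t cache_total_bytes(const TreeCache *c) {
    size_t total = 0;
    for (int i = 0; i < CACHE_SLOTS; i++)
        if (c->slot[i].used) total += c->slot[i].bytes;
    return total;
}

/* Pruning candidate: URI of the least-used entry, NULL if cache is empty. */
const char *cache_least_used(const TreeCache *c) {
    const CacheEntry *best = NULL;
    for (int i = 0; i < CACHE_SLOTS; i++) {
        const CacheEntry *e = &c->slot[i];
        if (e->used && (best == NULL || e->hits < best->hits)) best = e;
    }
    return best ? best->uri : NULL;
}
```

A caller short on memory could loop on cache_least_used() and cache_delete()
until cache_total_bytes() drops below its limit, without knowing how the
entries got there.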
> One issue to be addressed is caching the named buffers
> ('arg:something'). Here, the URI is sort of temporary (the same URI may
> refer to something completely different on the next call) which can
> introduce conflicts between the contents of the cache and the input from
> the caller. Do you get my meaning?
arg: URIs are indeed not good cache handles, unless the caller explicitly
treats them as such. Perhaps each named argument should carry a flag saying
whether to cache it or not - if yes, the caller would assume responsibility
for the uniqueness of the name.
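As a sketch of that flag, in C: NamedArg and uri_is_cacheable() are
hypothetical names, and the URI form 'arg:something' is taken as written
above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical named buffer with a per-argument caching flag. */
typedef struct {
    const char *name;      /* referenced from URIs as arg:name */
    const char *contents;
    int         cache_ok;  /* nonzero: caller guarantees the name is unique */
} NamedArg;

/* Decide whether a URI may be used as a cache key. Ordinary URIs always
   may; arg: URIs only when the caller has flagged the buffer. */
int uri_is_cacheable(const char *uri, const NamedArg *args, size_t nargs) {
    assert(uri != NULL);
    if (strncmp(uri, "arg:", 4) != 0) return 1;
    for (size_t i = 0; i < nargs; i++)
        if (strcmp(args[i].name, uri + 4) == 0)
            return args[i].cache_ok != 0;
    return 0;  /* unknown buffer: never cache */
}
```

Unflagged buffers would then always be re-parsed, so a reused name can never
collide with stale cache contents.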
I see the caller's control over caching of parsed import/include/document()
XSL documents as the more difficult problem here - the caller will likely
have no idea what documents might be referenced from the XSL template he's
handing over to Sablotron. My suggestion above provides some solution to
this situation: Sablotron would simply make its own decision to put such a
document into the common cache (e.g. in case the "parent" XSL was cached),
and the caller, while not specifically aware of this cached object, could at
least prune it based on name/type/number of cache accesses or time of last
cache access. Still, this is not exactly "full control" by the caller.
Hmmm ...
Honza
>
> 4) Steve's extensions. In what way would you define the extension? Would
> it be OK to pass any tag in a particular namespace, with attributes, to
> a handler you register? That would be easy. Passing the subtree is
> obviously harder.
>
> Tom