On 8/16/06, Tim Hannigan <[EMAIL PROTECTED]> wrote:
Solprovider - I've managed to use the aggregate files piece (http://solprovider.com/lenya/aggregatefiles) that you wrote months back and it's great.
Thanks. It is nice to have one's work appreciated.
[Summary: We tag documents and dynamically produce a report of documents with specific tags. Producing the report is processing-intensive and we would like to use caching to improve performance. Lenya-1.2.4.]
I'm looking for a way to cache the dump pipeline, so that we don't have to aggregate the entire site each time a newsAggregator doctype is accessed. I've looked into a few options and would like some advice on which to go forward with.

One technique would be to set a timed cache on just that pipeline; however, it looks like the expires cache (per the Cocoon docs) is only available as of Cocoon 2.1.9, and my IT dept is set on using Cocoon 2.1.7 for now (I'm also not even sure that 2.1.9 is compatible with Lenya).

A second technique I've considered would be to use the File Generator (which uses the cache very nicely) to bring in an XML site dump as a file. This file would have to be generated outside of this pipeline, and would be a precondition for the newsAggregator pipeline executing. This leads me to consider two sub-options:

i) Run a scheduled process that would call the $pubname/siteAggregator.xml URL, then take the XML output and write it to a file in the publication's work directory. I'm not entirely sure how Cocoon's scheduler works, but I suppose I could have a shell script on a cron job that does a curl. I'd love to do it internally in Cocoon if I could. [See the sketch below.]

ii) Somehow leverage Lucene's site dump and use that instead. I haven't used Lucene yet, so I'm not really sure how to use it in this context. Am I correct to assume that Lucene has a cron job that generates a dump on a prescribed timeline?
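[A minimal sketch of sub-option (i), assuming a local Lenya instance at localhost:8080 and a publication named "mypub"; the URL and work-directory path are placeholders, not the poster's actual setup:]

    # Every hour, fetch the aggregated site dump and write it into the
    # publication's work directory, where the File Generator can read it.
    0 * * * * curl -s -o /usr/local/lenya/webapp/lenya/pubs/mypub/work/sitedump.xml http://localhost:8080/lenya/mypub/siteAggregator.xml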
This breaks into three functions:

1. Cache the results.
2. Use the cache if it exists.
3. Delete the cache on a schedule.

The first two functions are built into publication-sitemap.xmap in Lenya-1.2. The feature was disabled in 1.2.4 by adding "disabled" to the match of the pipeline. Another example is at:

http://solprovider.com/lenya/cache

That version also handles not caching pages when the visitor is logged in, or when there is a query string. None of the expanded functionality matters in your case, but it shows the important lines from the standard publication-sitemap.xmap. See the "Check Cache" and "Create Cache" commented sections. The map:read part is easy. The SourceWritingTransformer is more complicated, and is documented at:

http://cocoon.apache.org/2.1/userdocs/sourcewriting-transformer.html

You may want to change the cache directory. You may need custom addSourceTags.xsl and removeSourceTags.xsl. Or maybe it will just work. [See the pipeline sketch below.]

---

#3 may require thought. I have not used Lenya's Scheduler; my few attempts did not work, and I did not put much effort into them. Maybe someone else can assist with it. #3 can be solved easily with a cron job that just deletes the files from the cache (assuming you are using a real operating system). That should take about 30 seconds for a shell programmer. [See the crontab sketch below.] If the files are deleted, then the "Check Cache" code fails, and "Create Cache" is called.

solprovider
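[A minimal sketch of the "Check Cache" / "Create Cache" pieces, adapted to the newsAggregator case. This is not the exact publication-sitemap.xmap code: the cache path under the publication's work directory, the stylesheet locations, and the cocoon:/ call to siteAggregator.xml are assumptions; "write-source" is Cocoon's standard name for the SourceWritingTransformer.]

    <map:match pattern="*/newsAggregator.xml">
      <!-- Check Cache: if a cached copy exists, serve it and stop here. -->
      <map:act type="resource-exists"
               src="context://lenya/pubs/{1}/work/cache/newsAggregator.xml">
        <map:read src="context://lenya/pubs/{../1}/work/cache/newsAggregator.xml"
                  mime-type="text/xml"/>
      </map:act>
      <!-- Create Cache: build the report, write a copy to disk, then
           strip the source-writing instructions from the response. -->
      <map:generate src="cocoon:/{1}/siteAggregator.xml"/>
      <map:transform src="lenya/xslt/addSourceTags.xsl">
        <map:parameter name="file"
                       value="context://lenya/pubs/{1}/work/cache/newsAggregator.xml"/>
      </map:transform>
      <map:transform type="write-source"/>
      <map:transform src="lenya/xslt/removeSourceTags.xsl"/>
      <map:serialize type="xml"/>
    </map:match>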
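[And a sketch of the cron job for #3, with a placeholder cache path; clearing the directory forces the next request to rebuild the file:]

    # Remove the cached reports at 2:00 each night. The next request
    # fails the "Check Cache" test and runs "Create Cache" again.
    0 2 * * * rm -f /usr/local/lenya/webapp/lenya/pubs/mypub/work/cache/*.xml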