let me know if you'd rather have me post these questions to the list
It is much better to post them to the list. You get the benefit of other people's eyes: sometimes this will mean a better solution, sometimes a faster response, and sometimes it'll be me anyway. Furthermore, it means the issues appear in the archives.
Of course, this last point is less important if you are able to write up this info as a doc.
If not, here are my questions:
Well, here's my answer. (I've CC'd the users list for the above reasons, and I've also reset the reply-to header to the user list.)
[Also note that, now I've had time to think about it, this is a little different from what I said on Skype.]
- I want to process a legacy html file named mybad.html (just one for a start). In order to catch it I created the second pipeline in sitemap.xmap in my project directory.
<map:pipeline>
  <map:match pattern="mybad.html">
    <map:generate src="cocoon:/**.html"/>
    <map:transform type="log">
      <map:parameter name="logfile" value="mydebug.log"/>
      <map:parameter name="append" value="false"/>
    </map:transform>
    <map:serialize type="xml"/>
  </map:match>
</map:pipeline>
What I'm unclear about is
a) What is the correct pattern to use in the cocoon:/-pseudo-protocol.
It's exactly the same as any other URL, so you have "cocoon:" to indicate the protocol followed by the path to the resource you refer to.
The number of slashes after the protocol is significant. A single slash (e.g. "cocoon:/myfile.xml") means the request is resolved only in the current sitemap file. Two slashes (e.g. "cocoon://myfile.xml") mean the lookup starts from the root sitemap.
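As a quick illustration of the two forms (myfile.xml is just the placeholder name from the examples above, not a file Forrest ships):

```xml
<!-- Single slash: resolved against the current sitemap only -->
<map:generate src="cocoon:/myfile.xml"/>

<!-- Double slash: resolution starts from the root sitemap -->
<map:generate src="cocoon://myfile.xml"/>
```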
How or where can I find the correct reference if I'm dealing with Forrest defaults like .html.
It's just the name of the file that you want. In this case the correct element is:
<map:generate src="cocoon:/mybad.html"/>
But beware: if your HTML is not valid XML, this will fail. To get around this, use the JTidy generator; see forrest.xmap and the Cocoon docs for examples.
b) Am I correct that referencing another pipeline with cocoon: will divert the output of that pipeline to become the generator (or input) of my pipeline?
Yes, although the terminology is not quite right. The generator makes a request for the indicated resource and pipes the result into the pipeline.
If that is correct, how does matching fit into all of this or, putting it differently, which way does the data actually travel:
E.g. if my matcher handles requests for "mybad.html", is this what actually happens:
1. The broader selector **.html somewhere deep down in Forrest
processes the file mybad.html as it would any other html-file,
converts it to xhtml via jtidy.
2. The matcher for mybad.html comes into play and diverts the
xml-stream into my pipeline IF the requested file is mybad.html.
If not, the first pipeline delivers the xml-stream straight to
Forrest default processing.
No.
Only one pipeline will operate: the first one found to match the request. The exception to this is when a pipeline is called internally using the cocoon: protocol. That is, a single request can execute multiple pipelines that were triggered by the cocoon: protocol, but only one that uses any other protocol.
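A rough sketch of that rule, not taken from any real Forrest sitemap: the patterns and the internal "cocoon:/...xml" requests below are purely illustrative, and it assumes that matchers for the .xml requests exist elsewhere in the same sitemap.

```xml
<map:pipeline>
  <!-- A request for mybad.html is handled here and goes no further down -->
  <map:match pattern="mybad.html">
    <!-- Internal call: this runs a second pipeline via the cocoon: protocol -->
    <map:generate src="cocoon:/mybad.xml"/>
    <map:serialize type="html"/>
  </map:match>

  <!-- Never reached for mybad.html, because the matcher above already handled it -->
  <map:match pattern="**.html">
    <map:generate src="cocoon:/{1}.xml"/>
    <map:serialize type="html"/>
  </map:match>
</map:pipeline>
```

Here the external request triggers exactly one matcher, while the generator inside it is free to pull in the output of another pipeline.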
I think what you are trying to do is this:
<map:match pattern="mybad.xml">
  <map:generate src="cocoon:/mybad.html" type="html"/>
  <!-- removed log stuff, see below -->
  <map:transform src="..."/>
  <map:transform src="{forrest:stylesheets}/html2document.xsl" />
  <map:transform type="idgen" />
  <map:serialize type="xml"/>
</map:match>
Note that your matcher is for XML, not for HTML. When the request is made, Forrest looks for a match on the HTML request (in Forrest's sitemap.xmap), which in turn makes a request for "cocoon://mybad.xml". That request is matched by the above in your project sitemap. There you do whatever transformation you need to strip old navigation and the like, and serialise as XML. Forrest then processes this as normal (i.e. skins it).
Note that this is based on the HTML processing found in forrest.xmap; I've just added a transformation step for you to manipulate your legacy HTML.
c) Using the log-transformer component that I found in my book
<map:transform type="log">
  <map:parameter name="logfile" value="mydebug.log"/>
  <map:parameter name="append" value="false"/>
</map:transform>
I was trying to log the intermediate result into a file for debugging purposes. Unfortunately, when calling my pipeline I got the error: 'Type 'log' does not exist for 'map:transform' at file:...'
You need to define the component in your sitemap. See the <map:components> element in the Cocoon docs.
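As a minimal sketch, assuming the stock Cocoon 2.1 LogTransformer class (please check the class name against your Cocoon version), the declaration near the top of your project sitemap might look something like this:

```xml
<map:components>
  <map:transformers>
    <!-- The name "log" is what type="log" in map:transform refers to -->
    <map:transformer name="log"
                     src="org.apache.cocoon.transformation.LogTransformer"/>
  </map:transformers>
</map:components>
```

With that in place, the type="log" transformer from your book should resolve.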
Ross
