There are examples of regexp matchers in the core sitemap. I'm pretty poor with regular expressions myself, so if you don't know what to put in the pattern, ask here; I'm sure someone can tell you how to match:

**.html but not (**/menu-*.html or **/body-*.html or **/tabs-*.html)

(I think they are the only ones you need to avoid).


So this would be something like ^(?!tab-|menu-|body-).*\.html$ and ^.*/(?!tab-|menu-|body-).*\.html$ respectively.
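
A minimal sketch of the lookahead approach, run against java.util.regex rather than jakarta-regexp, so it only checks the pattern logic and not what Cocoon accepts. The class name, the helper method and the sample paths are made up for illustration. The pattern is a variant that folds both expressions into one and uses [^/]* so that the lookahead applies to the file name itself; that is an assumption about the intent.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Cocoon.
class LookaheadDemo {
    // Optional directory part, then a file name that must not start
    // with tab-, menu- or body-, then the .html extension.
    static final Pattern CONTENT_PAGE =
        Pattern.compile("^(?:.*/)?(?!tab-|menu-|body-)[^/]*\\.html$");

    static boolean isContentPage(String path) {
        return CONTENT_PAGE.matcher(path).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "index.html", "docs/intro.html",
            "menu-main.html", "docs/body-intro.html", "a/b/tab-one.html"
        };
        for (String path : samples) {
            System.out.println(path + " -> " + isContentPage(path));
        }
    }
}
```

With this pattern index.html and docs/intro.html match, while menu-main.html, docs/body-intro.html and a/b/tab-one.html do not.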

Unfortunately, jakarta-regexp (which is used inside Cocoon) doesn't seem to support negative lookahead (?!...) and gives me a 'RESyntaxException: Syntax error: Missing operand to closure'.

This has already been reported on the regexp mailing list (see: http://permalink.gmane.org/gmane.comp.jakarta.regexp.user/168).

Too bad, because jakarta-oro does support Perl5 regexps.

I'll go hunting for a supported regexp and will report back later.


Since I promised an update:

A working regular expression (without negative lookahead) is the following:

^(([^t^m^b].*)|((t[^a].*)|(ta[^b].*)|(tab[^\-].*))|((m[^e].*)|(me[^n].*)|(men[^u].*)|(menu[^\-].*))|((b[^o].*)|(bo[^d].*)|(bod[^y].*)|(body[^\-].*)))\.html$

But then again jakarta-regexp leaves me standing in the cold with:

java.lang.StackOverflowError
at org.apache.regexp.RE.matchNodes(Unknown Source)
at org.apache.regexp.RE.matchNodes(Unknown Source)
...
at org.apache.cocoon.matching.AbstractRegexpMatcher.preparedMatch(AbstractRegexpMatcher.java:86)

Again jakarta-oro matches this without problems.
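
To check the expression outside Cocoon, here is a small sketch using java.util.regex. That engine is an assumption chosen for illustration; it is neither jakarta-regexp nor jakarta-oro, so it says nothing about the StackOverflowError above. The class name and the sample file names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical test harness, not part of Cocoon.
class LookaheadFreeDemo {
    // The expression from this post, copied as is (backslashes doubled for Java).
    static final Pattern NO_LOOKAHEAD = Pattern.compile(
        "^(([^t^m^b].*)"
        + "|((t[^a].*)|(ta[^b].*)|(tab[^\\-].*))"
        + "|((m[^e].*)|(me[^n].*)|(men[^u].*)|(menu[^\\-].*))"
        + "|((b[^o].*)|(bo[^d].*)|(bod[^y].*)|(body[^\\-].*)))\\.html$");

    static boolean matches(String name) {
        return NO_LOOKAHEAD.matcher(name).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "index.html", "table.html", "media.html",
            "tab-one.html", "menu-main.html", "body-x.html", "t.html"
        };
        for (String name : samples) {
            System.out.println(name + " -> " + matches(name));
        }
    }
}
```

Under java.util.regex, index.html, table.html and media.html match, while tab-one.html, menu-main.html and body-x.html do not. Two caveats show up as well: a very short name such as t.html is rejected, because t[^a] consumes the dot and nothing is left for \.html, and [^t^m^b] also excludes a literal ^, so [^tmb] would be the tighter spelling.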

*sigh*

Torsten