Simpler is better! After a few broken keyboards, I came across the Cocoon HTMLTransformer... and it felt like "I saw an angel"! :)
If you want to download and transform a wide variety of web pages (URLs with ? and &, pages with framesets, or even unclosed <img> tags (!)), you can do this:

--- a sources.xml file:

<escaped-html xmlns:i="http://apache.org/cocoon/include/1.0">
  <i:include parse="text" src="http://www.adress"/>
</escaped-html>

--- in sitemap.xmap:

* inside <map:components> / <map:transformers default="xslt">, ADD:

  <map:transformer name="html" logger="sitemap.transformer.html"
                   src="org.apache.cocoon.transformation.HTMLTransformer">
    <!-- Tidy configuration file -->
    <jtidy-config>fallback://lenya/modules/fckeditor/config/jtidy.properties</jtidy-config>
  </map:transformer>

* inside <map:pipelines> / <map:pipeline type="noncaching"> / <map:match pattern="XXXXXX">, ADD:

  <map:generate src="test/sources.xml"/>
  <map:transform type="include"/>
  <map:transform type="html">
    <map:parameter name="tags" value="escaped-html"/>
  </map:transform>

The include transformer pulls each page in as plain text inside <escaped-html>, and the html transformer then runs JTidy on that text and replaces it with well-formed markup.

And now... back to work for my boss! :p)

Have a good weekend

On Fri, 13 Mar 2009 11:32:12 +0100, Florent André wrote:
> Hi Lenya friends,
>
> On Thu, 20 Nov 2008 22:10:05 +0100, Andreas Hartmann wrote:
>> Hi André,
>>
>> Florent André wrote:
>>> Thanks for this pointer!
>>>
>>> The HTMLGenerator works like a charm!
>>>
>>> But when I try to call this HTMLGenerator from an XInclude... it doesn't
>>> work! :(
>>
>> Does it work with the IncludeTransformer?
>>
>> http://cocoon.apache.org/2.1/apidocs/org/apache/cocoon/transformation/IncludeTransformer.html
>>
>> -- Andreas
>>
>
> Thanks Andreas, it works with the include transformer... but only for
> "simple" web addresses (without ? and &).
>
> I solved the problem of the ? with a "bidouille" (~= trick):
> -------- prepareinclude.xsl:
> * replace the ? with /post--parameter/ using a regex
> * create <include src="cocoon://module/webagent/retrivepipe/www/without/http/post--parameter/parameters"/>
> --------- webagent's sitemap.xmap:
> * <map:match pattern="retrivepipe/**/post--parameter/**">
> *   <map:generate src="http://{1}?{2}" type="html"/> <!-- call to the HTMLGenerator -->
> *   ...
> * </map:match>
>
>
> But I can't find any solution for the &:
> - this character is translated into &amp; in my XSLT, and the HTMLGenerator
>   doesn't do the &amp; ==> & conversion...
>
> Do you have a suggestion?
>
>
> Have a good day
>
>
>
>>>
>>> I tried:
>>> <xi:include href="cocoon:/retrive/web/adress/without/http://"/>
>>> and
>>> <xi:include href="cocoon://retrive/web/adress/without/http://"/>
>>>
>>> But neither of them works.
>>>
>>> The log4j log says:
>>> * java.io.FileNotFoundException:
>>> * xIncluded resource not found: file:///
>>>
>>> The XInclude seems to look for a file rather than a pipeline...
>>>
>>> Thanks for any ideas.
>>>
>>> Notes:
>>> -- this XInclude is built by an XSL called from the module's sitemap
>>>
>>> -- in the module's sitemap, I have a pipeline with this match, but it
>>> isn't called:
>>> <!-- pattern = retrive/adress/web/without/http -->
>>> <map:match pattern="retrive/**">
>>>   <map:generate src="http://{1}" type="html"/>
>>>   <map:serialize type="xml"/>
>>> </map:match>
>>>
>>>
>>>
>>> On Thu, 20 Nov 2008 11:30:05 +0100, Andreas Hartmann wrote:
>>>> Hi André,
>>>>
>>>> Florent André wrote:
>>>>> I would like to parse HTML pages downloaded locally (via
>>>>> <xi:include parse="text">).
>>>> I'm afraid this approach will only cause a lot of headaches. I'd rather
>>>> recommend using the HTMLGenerator [1] to parse the files. In your
>>>> XInclude statement you can just call the HTMLGenerator pipeline using
>>>> the cocoon:/ protocol.
>>>>
>>>> [1] http://cocoon.apache.org/2.1/userdocs/html-generator.html
>>>>
>>>> HTH,
>>>>
>>>> -- Andreas
>>>>
>>>>> After the download, <xi:include> gives me an "escaped" HTML file.
>>>>>
>>>>> I removed the <!DOCTYPE ...> with a regex, but now the unescape
>>>>> transformer throws this error:
>>>>> "Caused by: org.xml.sax.SAXParseException: The entity "nbsp" was
>>>>> referenced, but not declared."
>>>>>
>>>>> I found this on the internet: "To allow the use of &nbsp; in your
>>>>> stylesheet, you have to declare it first:
>>>>> <!DOCTYPE xsl:stylesheet [<!ENTITY nbsp "&#160;">]>"
>>>>>
>>>>> How can I add this declaration in the Java unescape transformer?
>>>>>
>>>>> I think I could remove all &nbsp; with a regex, but I would like to
>>>>> understand better how the Java transformer works.
>>>>>
>>>>> Thanks and have a good day.
>>>>>
>>>>> Florent
>>>>
>>>> --
>>>> Andreas Hartmann, CTO
>>>> BeCompany GmbH
>>>> http://www.becompany.ch
>>>> Tel.: +41 (0) 43 818 57 01
>>>>
>>>>
>>>> ---------------------------------------------------------------------
>>>> To unsubscribe, e-mail: [email protected]
>>>> For additional commands, e-mail: [email protected]
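On the & question left open in the thread: with the IncludeTransformer + HTMLTransformer setup from the first message, a URL with a query string can go straight into the src attribute, provided each & is written as &amp;. That is ordinary XML attribute escaping: the XML parser hands the transformer a plain &. A minimal sketch of test/sources.xml under that assumption; the URL is a made-up placeholder, the namespace is the one the Cocoon 2.1 IncludeTransformer uses, and parse="text" is copied from the first message rather than checked against every Cocoon version.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical test/sources.xml; the URL below is a placeholder.
     Inside an XML attribute each "&" of the query string must be written as "&amp;";
     the XML parser turns it back into "&" before the include transformer fetches the page. -->
<escaped-html xmlns:i="http://apache.org/cocoon/include/1.0">
  <i:include parse="text" src="http://www.example.com/page.php?id=42&amp;lang=en"/>
</escaped-html>
```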
