On 08 Mar 2004, Stephen Farquhar wrote:
> I am trying to get my head around XSL, but the process is slow, having
> no background in either XML or regexps.
>
> I want to extract the daily political cartoon out of this page:
> http://search.csmonitor.com/commentary/index.html
>
Use the extract-images template and simply add this URL inclusion
pattern: .*cartoon.*

  <site>
    <name>CSM | Today's Cartoon</name>
    <uri>http://search.csmonitor.com/commentary/index.html</uri>
    <uriPatterns>
      <include>.*cartoon.*</include>
    </uriPatterns>
    <images includeAltText="no">
      <embedded alternate="yes" bpp="8" maxHeight="160" maxWidth="160"/>
    </images>
    <transform uriPattern=".*">
      <xslt href="extract-images.xsl"/>
    </transform>
  </site>

Alternatively, this should also fetch what you want:

  <site>
    <name>CSM | Today's Cartoon</name>
    <uri>http://search.csmonitor.com/commentary/index.html</uri>
    <images includeAltText="no">
      <embedded alternate="yes" bpp="8" maxHeight="160" maxWidth="160"/>
    </images>
    <transform uriPattern=".*">
      <xslt>
        <xsl:stylesheet version="1.0"
            xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
          <xsl:import href="jpluck.xsl"/>
          <xsl:template match="/">
            <html>
              <head>
                <xsl:copy-of select="//head/title"/>
              </head>
              <body>
                <xsl:copy-of select="/html/body/table[3]//tr[2]/td[6]/table//tr/td/div/span/b"/>
                <xsl:copy-of select="/html/body/table[3]//tr[2]/td[6]/table//tr/td/div[2]/img"/>
              </body>
            </html>
          </xsl:template>
        </xsl:stylesheet>
      </xslt>
    </transform>
  </site>

In either case, you may need to adjust the image attributes (bpp,
maxHeight, maxWidth) to suit your device.

s h e h u
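
A note on the second variant: its XPath expressions select by position (table[3], tr[2], td[6]), so they will stop matching if the page layout changes. A minimal sketch of a less position-dependent stylesheet follows. It assumes the cartoon image's src attribute contains the string "cartoon", which is only a guess suggested by the .*cartoon.* pattern and has not been checked against the page. Note that contains() is case-sensitive.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:import href="jpluck.xsl"/>
  <xsl:template match="/">
    <html>
      <head>
        <xsl:copy-of select="//head/title"/>
      </head>
      <body>
        <!-- Assumption: the cartoon image's src contains "cartoon".
             Selects every such img anywhere in the document,
             regardless of which table cell it sits in. -->
        <xsl:copy-of select="//img[contains(@src, 'cartoon')]"/>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>
```

This would go inside the <xslt> element of the second <site> definition in place of the stylesheet shown there. Unlike that version, it does not copy the caption text, only the image.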

