On 08 Mar 2004, Stephen Farquhar wrote:

> I am trying to get my head around XSL, but the process is slow, having
> no background in either XML or regexps.
>
> I want to extract the daily political cartoon out of this page:
> http://search.csmonitor.com/commentary/index.html
>

Use the extract-images template and add this URL inclusion pattern, which
matches any URI containing "cartoon":
.*cartoon.*

<site>
<name>CSM | Today's Cartoon</name>
<uri>http://search.csmonitor.com/commentary/index.html</uri>
<uriPatterns>
<include>.*cartoon.*</include>
</uriPatterns>
<images includeAltText="no">
<embedded alternate="yes" bpp="8" maxHeight="160" maxWidth="160"/>
</images>
<transform uriPattern=".*">
<xslt href="extract-images.xsl"/>
</transform>
</site>

Alternatively, this should also fetch what you want:

<site>
<name>CSM | Today's Cartoon</name>
<uri>http://search.csmonitor.com/commentary/index.html</uri>
<images includeAltText="no">
<embedded alternate="yes" bpp="8" maxHeight="160" maxWidth="160"/>
</images>
<transform uriPattern=".*">
<xslt>
<xsl:stylesheet version="1.0" 
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:import href="jpluck.xsl"/>
<xsl:template match="/">
<html>
<head>
<xsl:copy-of select="//head/title"/>
</head>
<body>
<xsl:copy-of 
select="/html/body/table[3]//tr[2]/td[6]/table//tr/td/div/span/b"/>
<xsl:copy-of 
select="/html/body/table[3]//tr[2]/td[6]/table//tr/td/div[2]/img"/>
</body>
</html>
</xsl:template>
</xsl:stylesheet>
</xslt>
</transform>
</site>

In either case, you may need to modify the image attributes to suit your 
device.
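
As a rough illustration only, the images element for a device with a larger,
higher-colour screen might look something like the sketch below. The values
bpp="16" and 320 for maxHeight/maxWidth are assumptions chosen for the
example, not settings taken from any particular device, so check what your
device and viewer actually support before using them:

```xml
<!-- Sketch: same element and attributes as in the site definitions above -->
<!-- bpp and max dimensions are illustrative assumptions; adjust to your device -->
<images includeAltText="no">
<embedded alternate="yes" bpp="16" maxHeight="320" maxWidth="320"/>
</images>
```

This block would replace the existing images element inside whichever site
definition you use; the rest of the definition stays the same.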

s h e h u
_______________________________________________
plucker-list mailing list
[EMAIL PROTECTED]
http://lists.rubberchicken.org/mailman/listinfo/plucker-list
