RE: Greetings, and question
On 4 Apr 2002 at 19:36, Vadim Gritsenko wrote:

> Test it, then continue from there to the next step:
>
> <map:match pattern="yahoo">
>   <map:generate type="html" src="http://www.yahoo.com">
>     <map:parameter name="xpath" value="/html"/>
>   </map:generate>
>   <map:serialize type="xml"/>
> </map:match>
>
> Test it, see the result, go further. Say, use xpath value="/html/body/table".

Ah. I should have thought of changing the serializer to xml... That showed me where the source document still had problems, even after being tidied, which in turn let me fine-tune the xpath expression. I have it working now, somewhat anyway. (I suspect the page may simply be too complex to pull out *exactly* what I want, but I'm getting closer at least.)

Thanks.

--
Brent Eades, Almonte, Ontario
http://www.almonte.com
http://www.bankofcanada.ca

---------------------------------------------------------------------
Please check that your question has not already been answered in the
FAQ before posting. http://xml.apache.org/cocoon/faqs.html

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Greetings, and question
Hello all,

I'm just getting up to speed on Cocoon, and finding it all quite fascinating. No doubt I'll have many more questions in the coming months.

Anyway, today's question is on xpath. I'm trying to customize the HTML Generator 'scraper' example to extract bits from a given HTML page. I have a rudimentary knowledge of xpath syntax, but apparently not enough, because I'm stuck. Here's a sample of the HTML to be scraped:

---
<table width=100% border=0>
<bis t=pr f=p020326.htm>
<tr>
<td nowrap align=right valign=top>26 Mar 2002&nbsp;</td>
<td valign=top>Financial Stability Forum holds its seventh meeting (<a href=p020326.htm>Read</a>)</td>
</tr>
</bis>
<bis t=pr f=p020318.htm>
<tr>
<td nowrap align=right valign=top>[snip]</td>
</tr>
</bis>
Etc.
---

The <bis>... stuff is used by another, non-XML process, but it seemed to me it should be a no-brainer to write an xpath argument that would pull out the elements between <bis> and </bis> and transform them. However, it isn't. Can anyone point me in the right general direction here?

--
Brent Eades, Almonte, Ontario
http://www.almonte.com
http://www.bankofcanada.ca
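For illustration only, here is a sketch of how such an xpath argument could be passed to the HTML generator, using the same map:parameter approach Vadim shows in this thread. The pattern name bank-news and the src URL are hypothetical placeholders, and the expression //bis assumes the <bis> elements survive tidying; serializing to XML first is the easiest way to check.

```xml
<!-- Sketch only: pattern name and src URL are hypothetical placeholders. -->
<map:match pattern="bank-news">
  <map:generate type="html" src="http://www.example.com/news.htm">
    <!-- Assumption: select every bis element anywhere in the tidied page -->
    <map:parameter name="xpath" value="//bis"/>
  </map:generate>
  <!-- Serialize as XML so the selected nodes can be inspected directly -->
  <map:serialize type="xml"/>
</map:match>
```

If the expression matches nothing, the tidied document probably does not contain the elements in the form expected; narrowing step by step from /html, as Vadim suggests, helps locate them.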
RE: Greetings, and question
Brent,

if you want to process the nodeset enclosed in the <bis> element, you could just try this:

<xsl:template match="/">
  <xsl:element name="page">
    <xsl:apply-templates/>
  </xsl:element>
</xsl:template>

<xsl:template match="bis">
  <xsl:element name="biselement">
    <xsl:copy-of select="."/>
  </xsl:element>
</xsl:template>

But this is plain XSLT matching, nothing to do with XPATH.

BTW, I've noticed your HTML is NOT XML-compliant, which will cause problems with XSLT: mind!

Best regards,

--
Luca Morandini
GIS Consultant
[EMAIL PROTECTED]
http://utenti.tripod.it/lmorandini/index.html

-----Original Message-----
From: Brent Eades [mailto:[EMAIL PROTECTED]]
Sent: Thursday, April 04, 2002 8:15 PM
To: [EMAIL PROTECTED]
Subject: Greetings, and question
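Luca's two templates can be dropped into a complete stylesheet along these lines. This is a sketch, not his exact code: it adds the xsl:stylesheet wrapper, uses literal result elements (equivalent to xsl:element with a fixed name), and applies templates only to //bis, so that text outside the <bis> elements is not copied through by XSLT's built-in templates. It assumes the input is already well-formed XML, i.e. the HTML has been tidied.

```xml
<?xml version="1.0"?>
<!-- Sketch: a complete stylesheet built around Luca's two templates.
     Output element names "page" and "biselement" come from his message. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Root: emit one page element and process only the bis elements,
       so stray text elsewhere in the document is not output -->
  <xsl:template match="/">
    <page>
      <xsl:apply-templates select="//bis"/>
    </page>
  </xsl:template>

  <!-- Wrap a deep copy of each bis element (attributes and children) -->
  <xsl:template match="bis">
    <biselement>
      <xsl:copy-of select="."/>
    </biselement>
  </xsl:template>

</xsl:stylesheet>
```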
RE: Greetings, and question
On 4 Apr 2002 at 21:43, Luca Morandini wrote:

> <xsl:template match="bis">
>   <xsl:element name="biselement">
>     <xsl:copy-of select="."/>
>   </xsl:element>
> </xsl:template>
>
> But this is plain XSLT matching, nothing to do with XPATH.
>
> BTW, I've noticed your HTML is NOT XML-compliant, which will cause problems with XSLT: mind!

OK, I think I follow your drift. As for the HTML... yes, it's pretty rough in that respect. Not my code, mind you :)

Thanks, I'll try that route.

--
Brent Eades, Almonte, Ontario
http://www.almonte.com
http://www.bankofcanada.ca
RE: Greetings, and question
> From: Brent Eades [mailto:[EMAIL PROTECTED]]
>
> On 4 Apr 2002 at 21:43, Luca Morandini wrote:
> > BTW, I've noticed your HTML is NOT XML-compliant, which will cause problems with XSLT: mind!

The HTML generator, with the help of JTidy, will fix this.

Brent: start simple:

<map:match pattern="yahoo">
  <map:generate type="html" src="http://www.yahoo.com"/>
  <map:serialize type="xml"/>
</map:match>

Test it, then continue from there to the next step:

<map:match pattern="yahoo">
  <map:generate type="html" src="http://www.yahoo.com">
    <map:parameter name="xpath" value="/html"/>
  </map:generate>
  <map:serialize type="xml"/>
</map:match>

Test it, see the result, go further. Say, use xpath value="/html/body/table".

Vadim
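Putting the two suggestions in this thread together, a pipeline might tidy the page with the HTML generator and hand the result to a stylesheet such as the one Luca outlined. A hedged sketch: the pattern name, src URL and stylesheet path stylesheets/bis.xsl are hypothetical, and map:transform is assumed to use the sitemap's default XSLT transformer. Here the stylesheet does the selecting, so the generator needs no xpath parameter; one can be added, as Vadim shows, to trim the input first.

```xml
<!-- Sketch only: pattern, src URL and stylesheet path are hypothetical. -->
<map:match pattern="bis-news">
  <!-- Tidy the remote HTML into XML (JTidy does the cleanup) -->
  <map:generate type="html" src="http://www.example.com/news.htm"/>
  <!-- Default transformer (assumed XSLT) applies the bis-selecting stylesheet -->
  <map:transform src="stylesheets/bis.xsl"/>
  <!-- Keep type="xml" while debugging; switch to html once the output looks right -->
  <map:serialize type="xml"/>
</map:match>
```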