RE: Greetings, and question

2002-04-05 Thread Brent Eades

On 4 Apr 2002 at 19:36, Vadim Gritsenko wrote:

> Test it, and continue from there to the next step:
>
> <map:match pattern="yahoo">
>  <map:generate type="html" src="http://www.yahoo.com">
>   <map:parameter name="xpath" value="/html"/>
>  </map:generate>
>  <map:serialize type="xml"/>
> </map:match>
>
> Test it, see the result, go further. Say, use xpath
> value="/html/body/table".

Ah. Should have thought of changing the serializer to xml... this 
showed me where there were problems in the source document, even 
after being tidied. Which in turn allowed me to fine-tune the xpath 
statement. I have it working now, somewhat anyway. (I suspect the 
page may simply be too complex to pull out *exactly* what I want, but 
I'm getting closer at least.)
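Below is a sketch of one way to tune the xpath parameter further. An attribute predicate makes the generator pass on only the table of interest, rather than every table under /html/body. This is not Brent's actual setup. The pattern name and source URL are hypothetical placeholders. It also assumes the wanted rows live in the table marked width="100%" in the sample HTML, and that jTidy leaves that attribute unchanged.

```xml
<!-- Hypothetical pattern name and source URL; adjust to the real page. -->
<map:match pattern="bis-debug">
 <map:generate type="html" src="http://www.example.com/news.html">
  <!-- Assumption: the wanted rows sit in the table with width="100%".
       The predicate keeps only matching tables, wherever they are nested. -->
  <map:parameter name="xpath" value="//table[@width='100%']"/>
 </map:generate>
 <!-- Keep the xml serializer while tuning, to inspect what jTidy produced. -->
 <map:serialize type="xml"/>
</map:match>
```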

Thanks.

-
Brent Eades, Almonte, Ontario
 http://www.almonte.com
 http://www.bankofcanada.ca


-
Please check that your question has not already been answered in the
FAQ before posting. http://xml.apache.org/cocoon/faqs.html

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]




Greetings, and question

2002-04-04 Thread Brent Eades

Hello all, just getting up to speed on Cocoon, and finding it all 
quite fascinating. No doubt I'll have many more questions in the 
coming months.

Anyway, today's question is on xpath. I'm trying to customize the 
HTML Generator 'scraper' example to extract bits from a given HTML 
page. Now, I have rudimentary knowledge of xpath syntax, but not 
enough I guess because I'm stuck.

Here's a sample of the HTML to be scraped:

---
<table width="100%" border="0">
<bis t="pr" f="p020326.htm">
<tr>
<td nowrap align="right" valign="top">
26 Mar 2002 &nbsp;
</td>
<td valign="top">
Financial Stability Forum holds its seventh meeting
(<a href="p020326.htm">Read</a>)
</td>
</tr>
</bis>

<bis t="pr" f="p020318.htm">
<tr>
<td nowrap align="right" valign="top">
[snip]
</td>
</tr>
</bis>
Etc.
--

The <bis>...</bis> stuff is used by another, non-XML process, but it seemed
to me it should be a no-brainer to write an xpath argument that would
pull out whatever sits between <bis> and </bis> and transform it.

However, it isn't. Can anyone point me in the right general direction 
here?
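Here is a minimal sketch of the kind of xpath argument described above, written as an HTML generator parameter in the style used elsewhere in this thread. The pattern name and URL are hypothetical. The sketch assumes jTidy keeps the non-HTML <bis> elements in the tree it builds. Tidy may instead report or discard unrecognized tags, or move them around because <bis> wraps <tr> inside <table>. It is therefore worth serializing as XML first to confirm what actually survives.

```xml
<!-- Sketch only: pattern name and src URL are hypothetical placeholders. -->
<map:match pattern="bis">
 <map:generate type="html" src="http://www.example.com/news.html">
  <!-- Select every bis element, wherever it sits in the tidied tree. -->
  <map:parameter name="xpath" value="//bis"/>
 </map:generate>
 <map:serialize type="xml"/>
</map:match>
```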

-
Brent Eades, Almonte, Ontario
 http://www.almonte.com
 http://www.bankofcanada.ca






RE: Greetings, and question

2002-04-04 Thread Luca Morandini

Brent,

if you want to process the nodeset enclosed in the <bis> element, you could
just try this:

<xsl:template match="/">
 <xsl:element name="page">
  <xsl:apply-templates/>
 </xsl:element>
</xsl:template>

<xsl:template match="bis">
 <xsl:element name="biselement">
  <xsl:copy-of select="."/>
 </xsl:element>
</xsl:template>

But this is plain XSLT matching, nothing to do with XPATH.
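Luca's two templates must sit inside a complete stylesheet before an XSLT processor can use them. A minimal XSLT 1.0 sketch follows. The empty text() template is an addition and does not appear in his message. The sketch also assumes the generated elements carry no namespace. If they do, match="bis" will not fire.

```xml
<?xml version="1.0"?>
<!-- Sketch: Luca's two templates wrapped in a complete XSLT 1.0 stylesheet.
     The text() template at the end is an addition, not part of his message. -->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

 <!-- Wrap the whole result in a single page element. -->
 <xsl:template match="/">
  <xsl:element name="page">
   <xsl:apply-templates/>
  </xsl:element>
 </xsl:template>

 <!-- Copy each bis element, with its rows, into a biselement wrapper. -->
 <xsl:template match="bis">
  <xsl:element name="biselement">
   <xsl:copy-of select="."/>
  </xsl:element>
 </xsl:template>

 <!-- Without this, the built-in rules would copy stray text found
      outside the bis elements into the output. -->
 <xsl:template match="text()"/>

</xsl:stylesheet>
```

The empty text() rule does not strip the dates and headlines inside each <bis>. xsl:copy-of copies that subtree directly rather than through template rules. The rule only silences the built-in behaviour that would otherwise copy text lying outside any <bis>.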

BTW, I've noticed your HTML is NOT XML-compliant, which will cause problems
for XSLT: mind that!

Best regards,

-
   Luca Morandini
   GIS Consultant
  [EMAIL PROTECTED]
http://utenti.tripod.it/lmorandini/index.html
-


> -----Original Message-----
> From: Brent Eades [mailto:[EMAIL PROTECTED]]
> Sent: Thursday, April 04, 2002 8:15 PM
> To: [EMAIL PROTECTED]
> Subject: Greetings, and question






RE: Greetings, and question

2002-04-04 Thread Brent Eades

On 4 Apr 2002 at 21:43, Luca Morandini wrote:

> <xsl:template match="bis">
>  <xsl:element name="biselement">
>   <xsl:copy-of select="."/>
>  </xsl:element>
> </xsl:template>
>
> But this is plain XSLT matching, nothing to do with XPATH.
>
> BTW, I've noticed your HTML is NOT XML-compliant, which will cause
> problems for XSLT: mind that!

OK, I think I follow your drift. As for the HTML... yes, it's pretty 
rough in that respect. Not my code, mind you :)

Thanks, I'll try that route.

-
Brent Eades, Almonte, Ontario
 http://www.almonte.com
 http://www.bankofcanada.ca






RE: Greetings, and question

2002-04-04 Thread Vadim Gritsenko

> From: Brent Eades [mailto:[EMAIL PROTECTED]]
>
> On 4 Apr 2002 at 21:43, Luca Morandini wrote:
>
> > <xsl:template match="bis">
> >  <xsl:element name="biselement">
> >   <xsl:copy-of select="."/>
> >  </xsl:element>
> > </xsl:template>
> >
> > But this is plain XSLT matching, nothing to do with XPATH.
> >
> > BTW, I've noticed your HTML is NOT XML-compliant, which will cause
> > problems for XSLT: mind that!

The HTML generator, with the help of jTidy, will fix this.

Brent:

Start with something simple:

<map:match pattern="yahoo">
 <map:generate type="html" src="http://www.yahoo.com"/>
 <map:serialize type="xml"/>
</map:match>

Test it, and continue from there to the next step:

<map:match pattern="yahoo">
 <map:generate type="html" src="http://www.yahoo.com">
  <map:parameter name="xpath" value="/html"/>
 </map:generate>
 <map:serialize type="xml"/>
</map:match>

Test it, see the result, go further. Say, use xpath
value="/html/body/table".
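The next sketch takes these steps one stage further. It hands the narrowed tree to a stylesheet, such as one built from Luca's bis templates, and serializes the result as HTML. The pattern name, source URL and stylesheet path are hypothetical. With no type attribute, map:transform falls back to the sitemap's default transformer, which is assumed here to be the XSLT one.

```xml
<!-- Sketch only: pattern, src and stylesheet path are hypothetical. -->
<map:match pattern="bis-page">
 <map:generate type="html" src="http://www.example.com/news.html">
  <!-- Narrow the tidied page to the tables directly under body. -->
  <map:parameter name="xpath" value="/html/body/table"/>
 </map:generate>
 <!-- Apply an XSLT stylesheet, e.g. one containing Luca's bis templates. -->
 <map:transform src="stylesheets/bis2page.xsl"/>
 <!-- Switch from xml to html once the selection looks right. -->
 <map:serialize type="html"/>
</map:match>
```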

Vadim
 