Bill Janssen said:

> > I think the best thing to do is to try to avoid using multiple patterns,
> > esp. patterns that may show up inside the text.
> 
> The reason I'm writing it as 
>   StoryEnd: (<p>-<p>|<p>-=<p>|<!-- TextEnd -->)
> is that there are several different story page formats being used
> here, and I need to deal with all of the possibilities.  So I don't
> see how I can avoid this.  And none of these are patterns that show up
> in the text, so I'm not quite sure what you're suggesting?
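
For illustration, here is how one alternation like the quoted StoryEnd pattern can cover several page formats at once. This is a Python translation written only for this note. Sitescooper itself is Perl, and its own matching code may differ. `STORY_END_MARKERS` and `story_body` are hypothetical names, and the three markers are assumed to be literal strings.

```python
import re

# Hypothetical translation of the quoted StoryEnd alternation.
# Assumption: each marker is a literal string, so it is escaped
# before the alternatives are joined with "|".
STORY_END_MARKERS = ["<p>-<p>", "<p>-=<p>", "<!-- TextEnd -->"]
STORY_END = re.compile("|".join(re.escape(m) for m in STORY_END_MARKERS))


def story_body(page: str) -> str:
    """Return the text before the first StoryEnd marker, or the whole page."""
    match = STORY_END.search(page)
    return page[: match.start()] if match else page


if __name__ == "__main__":
    print(story_body("Story text.<p>-=<p>Footer"))        # prints: Story text.
    print(story_body("Other story.<!-- TextEnd -->Ads"))  # prints: Other story.
```

Because `search` returns the leftmost match of any alternative, the page format does not need to be known in advance. Whichever end marker appears first closes the story.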

None of them turn up in the text?  In that case I misunderstood what was
happening -- so I'll take another look... (time passes)

OK, found it: it was a bug in the HTML::Parser Perl module. It was not
reporting the last piece of text on the line unless that text ended in a
newline, and the Yahoo Health stories did not end in one.
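
To make the failure mode concrete, here is a minimal Python sketch of a text collector that, like the behaviour described above, only reports text once a newline arrives. `LineBufferedText` is hypothetical: it is not HTML::Parser's code, nor the workaround that was checked in. The fix shown, flushing any still-buffered text when input ends, is one generic remedy; appending a trailing newline before parsing would be another.

```python
class LineBufferedText:
    """Hypothetical collector that reports text one complete line at a time.

    An unterminated final line stays buffered until close() is called,
    which mimics losing the last piece of text when there is no newline.
    """

    def __init__(self) -> None:
        self._pending = ""
        self.lines = []  # complete lines reported so far

    def feed(self, chunk: str) -> None:
        self._pending += chunk
        # Report every complete line; keep the unterminated tail buffered.
        *complete, self._pending = self._pending.split("\n")
        self.lines.extend(complete)

    def close(self) -> None:
        # Workaround: flush whatever is still buffered at end of input.
        if self._pending:
            self.lines.append(self._pending)
            self._pending = ""


if __name__ == "__main__":
    collector = LineBufferedText()
    collector.feed("First line\nLast line, no newline")
    print(collector.lines)  # prints: ['First line']
    collector.close()
    print(collector.lines)  # prints: ['First line', 'Last line, no newline']
```

Without the final `close()` call, the story's last line would simply never be reported, which matches the symptom seen with the Yahoo Health pages.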

There's a workaround now checked in, so if you try the development package
(building and uploading now) it should work OK.

--j.
_______________________________________________
Sitescooper-talk mailing list
http://lists.sourceforge.net/mailman/listinfo/sitescooper-talk
