Bill Janssen said:

> > I think the best thing to do is to try to avoid using multiple patterns,
> > esp. patterns that may show up inside the text.
>
> The reason I'm writing it as
>     StoryEnd: (<p>-<p>|<p>-=<p>|<!-- TextEnd -->)
> is that there are several different story page formats being used
> here, and I need to deal with all of the possibilities.  So I don't
> see how I can avoid this.  And none of these are patterns that show up
> in the text, so I'm not quite sure what you're suggesting?

None of them turn up in the text?  In that case I misunderstood what
was happening -- so I'll take another look...

(time passes)

OK -- found it -- it was a bug in the HTML::Parser perl module -- it
was not reporting the last piece of text on the line unless it ended
in a newline, which the Yahoo Health stories did not.  There's a
workaround now checked in, so if you try the development package
(building and uploading now) it should work OK.

--j.
