Bill Janssen said:

> I'm currently using a StoryEnd pattern of (<p>-<p>|<p>-=<p>|<!-- TextEnd -->)
> on a story that contains the following text at the end:
> 
>   these hip protectors under your clothing. After you show, they can
>   see that under your normal trousers pockets there is something.''<p>-=<p>On
>   the Web:<p>World Health Organization about hip fractures:
> 
> What I *get*, in the scooped story, is
> 
>   these hip protectors under your clothing. After you show, they can
>   see that under your normal trousers pockets there is
>   </body></html>
>   </body></html>
> 
> So: what happened to the "something.''" part of the story?  By the
> way, this happens reliably to all the stories scooped in this way.

This shouldn't be happening -- but it is.  It seems to be something that
Perl's regular expression code is doing.  I haven't found a way to avoid
it, apart from writing the StoryEnd pattern so that it looks for a single
end marker in the page rather than several alternatives. :(

I think the best thing to do is to avoid combining multiple patterns,
especially ones that may also show up inside the story text.
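
For comparison, here is a rough sketch of what I'd expect the cut to look
like. It is written in Python rather than Perl, and it is not sitescooper's
actual code: the cut_story helper and the variable names are made up for
illustration, and the sample text is just the excerpt quoted above. It
shows that, on this text, the three-way alternation and the single
<p>-=<p> pattern should both leave "something.''" in the story.

```python
import re

# Sample text from the report above, joined into one string.
page = (
    "these hip protectors under your clothing. After you show, they can\n"
    "see that under your normal trousers pockets there is something.''<p>-=<p>On\n"
    "the Web:<p>World Health Organization about hip fractures:\n"
)

# The alternation from the report, and a narrower single-marker pattern.
multi_end = re.compile(r"(<p>-<p>|<p>-=<p>|<!-- TextEnd -->)")
single_end = re.compile(r"<p>-=<p>")


def cut_story(text, end_pattern):
    """Hypothetical helper: return the text before the first match of
    end_pattern, or the whole text if the pattern never matches."""
    match = end_pattern.search(text)
    return text if match is None else text[:match.start()]


story_multi = cut_story(page, multi_end)
story_single = cut_story(page, single_end)

# Both cuts should stop right before the end marker.
print(story_multi.endswith("something.''"))
print(story_multi == story_single)
```

If the scooped output differs from this, the single-marker form is the one
to try first in the site file.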

> And:  why are there two </body> tags in the output?

Yep -- that's now fixed.  I've just checked in a stack of fixes for MHTML
mode, including a fix for the bug you reported at the weekend.

--j.
_______________________________________________
Sitescooper-talk mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/mailman/listinfo/sitescooper-talk
