I'm currently using a StoryEnd pattern of (<p>-<p>|<p>-=<p>|<!-- TextEnd -->)
on a story that contains the following text at the end:

  these hip protectors under your clothing. After you show, they can
  see that under your normal trousers pockets there is something.''<p>-=<p>On
  the Web:<p>World Health Organization about hip fractures:

What I *get*, in the scooped story, is

  these hip protectors under your clothing. After you show, they can
  see that under your normal trousers pockets there is


  </body></html>


  </body></html>

So: what happened to the "something.''" part of the story?  By the
way, this happens reliably to all the stories scooped in this way.

And:  why are there two </body> tags in the output?
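
For what it's worth, the pattern can be checked in isolation. The
following is only a minimal sketch in Python, not sitescooper's own
Perl code: the helper name cut_at_story_end is made up for
illustration, the sample string is the story tail quoted above joined
onto one line, and it assumes StoryEnd simply cuts the text at the
first match. How sitescooper actually applies StoryEnd may differ.

```python
import re

# StoryEnd pattern copied verbatim from the site file.
STORY_END = re.compile(r"(<p>-<p>|<p>-=<p>|<!-- TextEnd -->)")

# Tail of the story as quoted above, joined onto one line.
story_tail = (
    "see that under your normal trousers pockets there is something.''"
    "<p>-=<p>On the Web:<p>World Health Organization about hip fractures:"
)


def cut_at_story_end(text):
    """Return the text before the first StoryEnd match (hypothetical helper)."""
    match = STORY_END.search(text)
    return text if match is None else text[:match.start()]


kept = cut_at_story_end(story_tail)
print(kept)
```

Under that assumption the kept text still ends with "something.''",
which suggests the missing word is lost somewhere other than in the
pattern match itself.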

Bill


PS:  Here's the site file I was working on:

URL: http://health.yahoo.com/health/ap/
Name: Yahoo Health
Description: health news from Yahoo
AuthorName: Bill Janssen
AuthorEmail: [EMAIL PROTECTED]
Levels: 2
StoryURL: http://dailynews.yahoo.com/.*
ContentsStart: </b>\n</TD> </tr>\n</table>
ContentsEnd: </table>.*Important Disclaimers
StoryToPrintableSub: s|http://dailynews.yahoo.com/h/(.*)|http://dailynews.yahoo.com/htx/$1|
StoryStart: <!-- YNEWS:STORY -->
StoryEnd: (<p>-<p>|<p>-=<p>|<!-- TextEnd -->)

I dumped it with 3.0.1, using the command line:

sitescooper.pl -dump -mhtml -noheaders -nofooters -refresh -site \
/import/sitescooper/local-sites/yahoo_health.site -filename \
YahooHealth