Sitescooper (3.0.1) seems to be inappropriately removing internal
navigation links.  For example, try scooping with this site file:

  URL:  http://www.niddk.nih.gov/health/digest/pubs/bleeding/bleeding.htm
  Levels: 1

I used these parameters:

  sitescooper.pl -dump -html -refresh -noheaders -nofooters -site /tmp/foo.site >/tmp/foo.html

You wind up with an HTML file that starts with a nice list of anchor links:

  <UL>
    <LI><A href="#0__HASH__causes">What Causes Bleeding in the Digestive Tract?</a>
    <LI><A href="#0__HASH__recognized">How Is Bleeding in the Digestive Tract Recognized?</a>
    <LI><A href="#0__HASH__diagnosed">How Is Bleeding in the Digestive Tract Diagnosed?</a>
    <LI><A href="#0__HASH__treated">How Is Bleeding in the Digestive Tract Treated?</a>
  </UL>

(I trust you'll all excuse the grisly example material -- I'm building
e-books of all the articles on this site :-).

But the <A NAME=""> anchors referred to are not in the scooped HTML.
How can I get them back?!
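
To see exactly which anchors went missing, one option is to compare the
"#..." link targets in the dumped file against the NAME/ID anchors that
actually survive.  The following is only a rough sketch in Python, not part
of sitescooper; the names AnchorAudit and missing_anchors are made up for
illustration, and the sample string is a cut-down stand-in for /tmp/foo.html.

```python
from html.parser import HTMLParser


class AnchorAudit(HTMLParser):
    """Collect same-page link targets and the anchors actually defined."""

    def __init__(self):
        super().__init__()
        self.targets = set()   # fragments referenced by <a href="#...">
        self.defined = set()   # values of <a name="..."> or any id="..."

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        for key, value in attrs:
            if not value:
                continue
            if tag == "a" and key == "href":
                if value.startswith("#") and len(value) > 1:
                    self.targets.add(value[1:])
            elif key == "id" or (tag == "a" and key == "name"):
                self.defined.add(value)


def missing_anchors(html_text):
    """Return the sorted fragment names that are linked to but never defined."""
    audit = AnchorAudit()
    audit.feed(html_text)
    return sorted(audit.targets - audit.defined)


# Hypothetical sample: a link to an anchor that is not defined anywhere.
sample = '<UL><LI><A href="#0__HASH__causes">Causes</a></UL><P>Text</P>'
print(missing_anchors(sample))
```

To check a real dump, read /tmp/foo.html into a string and pass it to
missing_anchors(); every name it returns is a link whose anchor was dropped.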

More generally, sitescooper seems to strip too many anchors and links.  For
example, if the site file listed above is changed to

  URL:  http://www.niddk.nih.gov/health/digest/pubs/bleeding/bleeding.htm
  Levels: 1
  StoryPostProcess: {
    s|^|<A HREF="http://www.niddk.nih.gov/">National Digestive Diseases Information Clearinghouse</a>|s;
  }

I'd expect to find a link in the generated HTML.  I *want* that link
there!  But no!  Instead, sitescooper converts it to underlined text --
*bad* Sitescooper!
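
For what it's worth, the substitution on its own should do the right thing:
without the /m modifier, ^ matches only at the very start of the text, so the
link is prepended exactly once.  Below is a small Python analogue of that
regex (an illustration only, not sitescooper code; prepend_link is a made-up
name, and it assumes the story text arrives as a single string).  If this
assumption holds, the link is presumably being stripped at some later stage.

```python
import re

LINK = ('<A HREF="http://www.niddk.nih.gov/">National Digestive Diseases '
        'Information Clearinghouse</a>')


def prepend_link(story):
    """Mimic s|^|LINK|s: insert LINK once, at the start of the whole text."""
    # Without re.MULTILINE, ^ matches only at the beginning of the string.
    # A function is used as the replacement so LINK is inserted literally.
    return re.sub(r"^", lambda match: LINK, story, count=1)


# Hypothetical two-line story body.
result = prepend_link("<P>First paragraph</P>\n<P>Second paragraph</P>")
print(result.startswith(LINK), result.count("<A HREF"))
```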

Bill



_______________________________________________
Sitescooper-talk mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/sitescooper-talk
