Sitescooper (3.0.1) seems to be inappropriately removing internal
navigation links. For example, try scooping with this site file:

    URL: http://www.niddk.nih.gov/health/digest/pubs/bleeding/bleeding.htm
    Levels: 1

I used these parameters:

    sitescooper.pl -dump -html -refresh -noheaders -nofooters -site /tmp/foo.site >/tmp/foo.html

You wind up with an HTML file that starts with a nice dictionary of
anchors:

    <UL>
    <LI><A href="#0__HASH__causes">What Causes Bleeding in the Digestive Tract?</a>
    <LI><A href="#0__HASH__recognized">How Is Bleeding in the Digestive Tract Recognized?</a>
    <LI><A href="#0__HASH__diagnosed">How Is Bleeding in the Digestive Tract Diagnosed?</a>
    <LI><A href="#0__HASH__treated">How Is Bleeding in the Digestive Tract Treated?</a>
    </UL>

(I trust you'll all excuse the grisly example material; I'm building
e-books of all the articles on this site :-).

But the <A NAME=""> anchors referred to are not in the scooped HTML.
How can I get them back?!

Sitescooper generally takes out too much in the way of anchors. For
example, if the site file listed above is changed to

    URL: http://www.niddk.nih.gov/health/digest/pubs/bleeding/bleeding.htm
    Levels: 1
    StoryPostProcess: {
      s|^|<A HREF="http://www.niddk.nih.gov/">National Digestive Diseases Information Clearinghouse</a>|s;
    }

I'd expect to find a link in the generated HTML. I *want* that link
there! But no! Instead, sitescooper converts that to an underline.
*Bad* Sitescooper!

Bill

_______________________________________________
Sitescooper-talk mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/sitescooper-talk
