Phew! Getting through the backlog of bug reports at last.
Firstly, Kennis K and Mike Krell -- the bug cases you reported seem to be
working OK in the current dev version as far as I can tell.
Now, on to this one...
Bill Janssen said:
> Sitescooper (3.0.1) seems to be inappropriately removing internal
> navigation links. For example, try scooping with this site file:
>
> URL: http://www.niddk.nih.gov/health/digest/pubs/bleeding/bleeding.htm
> Levels: 1
> You wind up with an HTML file that starts with a nice dictionary of anchors:
>
> <UL>
> <LI><A href="#0__HASH__causes">What Causes Bleeding in the Digestive Tract?</a>
> <LI><A href="#0__HASH__recognized">How Is Bleeding in the Digestive Tract
> Recognized?</a>
> <LI><A href="#0__HASH__diagnosed">How Is Bleeding in the Digestive Tract
> Diagnosed?</a>
> <LI><A href="#0__HASH__treated">How Is Bleeding in the Digestive Tract Treated?
> </a>
> </UL>
>
> (I trust you'll all excuse the grisly example material -- I'm building
> e-books of all the articles on this site :-).
>
> But the <A NAME=""> anchors referred to are not in the scooped HTML.
> How can I get them back?!
I've spotted this -- what's up is that the A NAME anchors are inside
WIDTH=20% table items, which sitescooper strips as "sidebar tables".
Add
UseTableSmarts: 0
to the site file, and hey presto, it works OK!
BTW, in fact it may be worth adding "TableRender: flatten" to flatten out
the tables (although iSilo and Plucker more-or-less ignore table tags
anyway).
(I'm considering making TableRender: flatten the default anyway at this
stage.)
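For reference, here is roughly what Bill's site file would look like with both of those settings added. This is an untested sketch: the first two lines are from his report, and the TableRender line is optional.

```
URL: http://www.niddk.nih.gov/health/digest/pubs/bleeding/bleeding.htm
Levels: 1
UseTableSmarts: 0
TableRender: flatten
```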
Next:
> Sitescooper generally takes out too much in the way of anchors. For
> example, if the site file listed above is changed to
>
> URL: http://www.niddk.nih.gov/health/digest/pubs/bleeding/bleeding.htm
> Levels: 1
> StoryPostProcess: {
> s|^|<A HREF="http://www.niddk.nih.gov/">National Digestive Diseases Information
> Clearinghouse</a>|s;
> }
>
> I'd expect to find a link in the generated HTML. I *want* that link
> there! But no! Instead, sitescooper converts that to an underline --
> *bad* Sitescooper!
Good point.
Here's the trick -- use "href_external" instead of "href" in the A tag and it'll work.
A little undocumented sitescooper magic there ;)
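Applied to the site file quoted above, that would look something like the following. This is an untested sketch: I've put the substitution on one line and written the attribute in lowercase, since I'm not assuming the match is case-insensitive.

```
URL: http://www.niddk.nih.gov/health/digest/pubs/bleeding/bleeding.htm
Levels: 1
StoryPostProcess: {
  s|^|<A href_external="http://www.niddk.nih.gov/">National Digestive Diseases Information Clearinghouse</a>|s;
}
```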
--j.