Phew! Getting through some back bug reports at last.

Firstly, Kennis K and Mike Krell -- the bug cases you reported seem to be
working OK in the current dev version as far as I can tell.

Now, on to this one...

Bill Janssen said:

> Sitescooper (3.0.1) seems to be inappropriately removing internal
> navigation links.  For example, try scooping with this site file:
> 
>   URL: http://www.niddk.nih.gov/health/digest/pubs/bleeding/bleeding.htm
>   Levels: 1

> You wind up with an HTML file that starts with a nice dictionary of anchors:
> 
>   <UL>
>   <LI><A href="#0__HASH__causes">What Causes Bleeding in the Digestive Tract?</a>
>   <LI><A href="#0__HASH__recognized">How Is Bleeding in the Digestive Tract Recognized?</a>
>   <LI><A href="#0__HASH__diagnosed">How Is Bleeding in the Digestive Tract Diagnosed?</a>
>   <LI><A href="#0__HASH__treated">How Is Bleeding in the Digestive Tract Treated?</a>
>   </UL>
> 
> (I trust you'll all excuse the grisly example material -- I'm building
> e-books of all the articles on this site :-).
> 
> But the <A NAME=""> anchors referred to are not in the scooped HTML.
> How can I get them back?!

I've spotted this -- what's up is that the A NAME anchors are inside
WIDTH=20% table items, which sitescooper strips as "sidebar tables".
Add

  UseTableSmarts: 0

to the site file, and hey presto, it works OK!
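
For illustration, the site file from Bill's report with that line added
might look like the following. This is only a sketch: I'm assuming the URL
and Levels lines stay exactly as he posted them and that the position of
the new line within the file does not matter.

  URL: http://www.niddk.nih.gov/health/digest/pubs/bleeding/bleeding.htm
  Levels: 1
  UseTableSmarts: 0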

BTW, it may also be worth adding "TableRender: flatten" to flatten out
the tables (although iSilo and Plucker more-or-less ignore table tags
anyway).

(I'm considering making TableRender: flatten the default anyway at this
stage.)

Next:

> Sitescooper generally takes out too much in the way of anchors.  For
> example, if the site file listed above is changed to
> 
>   URL: http://www.niddk.nih.gov/health/digest/pubs/bleeding/bleeding.htm
>   Levels: 1
>   StoryPostProcess: {
>     s|^|<A HREF="http://www.niddk.nih.gov/">National Digestive Diseases Information Clearinghouse</a>|s;
>   }
> 
> I'd expect to find a link in the generated HTML.  I *want* that link
> there!  But no!  Instead, sitescooper converts that to an underline --
> *bad* Sitescooper!

Good point.

Here's the trick -- use "href_external" instead of "href" and it'll work.
A little undocumented sitescooper magic there ;)
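
As a sketch, Bill's StoryPostProcess block with that change might read as
follows. I'm assuming here that the attribute name is simply swapped in
place and that the rest of the substitution is left as he wrote it.

  StoryPostProcess: {
    s|^|<A href_external="http://www.niddk.nih.gov/">National Digestive Diseases Information Clearinghouse</a>|s;
  }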

--j.

_______________________________________________
Sitescooper-talk mailing list
[EMAIL PROTECTED]
http://lists.sourceforge.net/lists/listinfo/sitescooper-talk
