On Wed, 21 Mar 2001, Justin Mason wrote:

> 
> Jarl Friis said:
> 
> > In the URLProcessor.pm there are some lines
> >   if ($url =~ /[\/\&\?](\S{5,20}?)$/) {
> >     $self->{to_string_name} = $1;
> >   } else {
> >     $self->{to_string_name} = $url;
> >   }
> > they seem to give me problems with a story link like
> > '/shareware/index.html', which in the example above means that the
> > to_string_name becomes 'index.html'. That seems to conflict with the
> > contents (level 1) URL, so the run stops because the queue seems empty.
> > Does anyone know what to do about it?
> 
> Hmm...
> 
> This shouldn't happen, as the queue is actually keyed numerically.
> But anyway, I've checked in a fix for it so that sitescooper never
> uses index.html/cgi/shtml/whatever as the key...

The site is www.ing.dk. The story that caused the error is gone; I'll let
you know if I encounter (and notice) it again. I have tried
to reproduce the error on
http://www.diku.dk/students/jarl/ing.dk.html
but I haven't succeeded. I even copied the bad story and the story I lost
after it, but now everything seems to work :-?

The included site file uses the bug URL, with the real URL commented out.

The site is in Danish, so you may not understand it. The second-to-last
story has the title "Sharewarehjælp til Basic-programmering" and links to
http://www.ing.dk/shareware/index.html, and that caused a problem. The actual
problem was that I missed the last story. I am mad at myself that I didn't
save the debug info from the runs, sorry :-)
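
Since the debug output is gone, here is a minimal sketch for checking what
the quoted regex captures for a given link. It is only an illustration:
to_string_name() is a hypothetical stand-alone helper that copies the regex
from the quoted lines, not the sitescooper code itself, and the second URL is
made up for comparison.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of sitescooper): applies the regex quoted
# from URLProcessor.pm and returns the name it would produce.
sub to_string_name {
    my ($url) = @_;
    # A '/', '&' or '?' followed by 5 to 20 non-space characters that run
    # to the end of the URL. \S also matches '/', and the regex engine
    # prefers the leftmost delimiter from which a match is possible.
    if ($url =~ /[\/\&\?](\S{5,20}?)$/) {
        return $1;
    }
    return $url;
}

# The first link is the one from this thread; the second is an assumed
# example with a longer directory name.
for my $url ('/shareware/index.html', '/sharewarehjaelp/index.html') {
    print "$url -> ", to_string_name($url), "\n";
}
# prints:
# /shareware/index.html -> shareware/index.html
# /sharewarehjaelp/index.html -> index.html
```

In this sketch 'shareware/index.html' is exactly 20 characters, so the whole
tail is captured. The bare 'index.html' only comes out when the text after
the previous delimiter is longer than 20 characters, which may be one reason
the error is hard to reproduce.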

Thanks for an excellent program.

I guess I'll send you some Danish news site files soon.

Jarl
# De Studerendes Vandreklub Kalender
# Author: Jarl Friis <[EMAIL PROTECTED]>

URL:            http://www.diku.dk/students/jarl/ing.dk.html
#URL:           www.ing.dk
ImageURL:       1
Name:           Ingeniøren
Levels:         2

AuthorName:     Jarl Friis
AuthorEmail:    [EMAIL PROTECTED]

Active: 1
ContentsIncludeStartPattern:    1
ContentsIncludeEndPattern:      1

ContentsStart:  <!-- Indholde Start -->

#This will include ShortNews
ContentsEnd:    </TD></TR></TABLE> <BR>

#This will NOT include the ShortNews:
#ContentsEnd:   <TR><TD COLSPAN="2"><IMG SRC="/ress/ramme/d.gif" WIDTH="2" HEIGHT="3" ALT=""></TD></TR></TABLE>

StoryStart:     <!-- .BeginEditable "trumpet" -->
StoryEnd:       <!-- .BeginEditable "hojre_spalte_nede_bund" -->

#StoryURL:       http://www.ing.dk.*
StorySkipURL:   mailto:.*

TableRender:    flatten
#StoryLifetime: 0
#ContentsCacheable: 0
#StoryCacheable:    0
#ContentsDiff: 0
#StoryDiff: 0

#ContentsHTMLPreProcess:        {
#        $_ =~ s,<p>,,sgi;
#        }

#StoryHTMLPreProcess:   {
#        $_ =~ s,<p>,,sgi;
#        }

#StoryPostProcess:      {
#        $_ =~ s,(<p>)+,,gsi;
#        }
