Re: crawl problems (a bug/patch)

Earl Cahill Thu, 20 Oct 2005 13:25:34 -0700


--- Sébastien LE CALLONNEC <[EMAIL PROTECTED]> wrote:


> Hi Earl,
> 
> Please, see my responses below.
> 
> 
> --- Earl Cahill <[EMAIL PROTECTED]> wrote:
> 
> 
> As you probably saw in the OutlinkExtractor class, the links are
> extracted with a regexp.  I'm no expert in the matter, but that will
> certainly answer your questions below...
> 
> > So, three open questions
> > 
> > 1.  Why doesn't my link (<a href=/sitemap.html>browse</a>) get parsed?
> 
> Because it doesn't match the aforementioned regexp.
> 
> > 2.  Why does my style get followed?
> 
> Because it matches the regexp.
> 
> > 3.  Where do I look for a list of all the failed links?
> 
> I don't think there is one.
> 
> I have just created the issue in JIRA:
> http://issues.apache.org/jira/browse/NUTCH-119
> 
> 
> Regards,
> Sébastien.
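
The answers to questions 1 and 2 above can be illustrated with a minimal sketch. It assumes a pattern that only recognises absolute URLs carrying a scheme; this is NOT the regexp used in OutlinkExtractor, only a stand-in to show how a regexp-based extractor could skip a relative, unquoted href while still picking up a URL that appears inside a style rule. The class name LinkRegexSketch, the extract helper, and the example.com stylesheet URL are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkRegexSketch {
    // Hypothetical pattern (an assumption, not Nutch's actual regexp):
    // it matches only absolute URLs that start with a scheme and "://".
    private static final Pattern ABSOLUTE_URL =
        Pattern.compile("[A-Za-z][A-Za-z0-9+.-]*://[^\\s\"'<>()]+");

    // Returns every substring of the text that the pattern matches.
    public static List<String> extract(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = ABSOLUTE_URL.matcher(text);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        // Relative, unquoted href: there is no scheme, so nothing matches.
        System.out.println(extract("<a href=/sitemap.html>browse</a>"));
        // Absolute URL inside a (made-up) style rule: it matches anyway.
        System.out.println(extract(
            "<style>body { background: url(http://example.com/bg.png) }</style>"));
    }
}
```

Under this assumed pattern the first call finds no link and the second finds the style URL, because the pattern looks at raw text and knows nothing about HTML tags or attributes.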
