Re: crawl problems (a bug/patch)

Sébastien LE CALLONNEC Thu, 20 Oct 2005 12:33:17 -0700

Hi Earl,

Please see my responses below.

--- Earl Cahill <[EMAIL PROTECTED]> wrote:

As you probably saw in the OutlinkExtractor class, the links are
extracted with a regular expression.  I'm no expert on the matter, but
that almost certainly explains the behaviour you ask about below.

> So, three open questions
> 
> 1.  Why doesn't my link (<a
> href=/sitemap.html>browse</a>) get parsed?

Because it doesn't match the aforementioned regexp.

> 2.  Why does my style get followed?

Because it matches the regexp.
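
To illustrate both answers, here is a minimal sketch.  Note that the
pattern below is NOT the one used in OutlinkExtractor; it is a
hypothetical, simplified regexp that only recognizes absolute URLs, and
the class name, the style snippet and the example.com address are made
up for the illustration.  Assuming the real pattern behaves similarly,
a relative, unquoted href has no scheme and is skipped, while an
absolute URL inside a style block is picked up even though it is not a
hyperlink.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class OutlinkRegexSketch {
    // Hypothetical pattern (an assumption, not the Nutch one):
    // matches only absolute URLs that start with a scheme.
    private static final Pattern ABSOLUTE_URL =
        Pattern.compile("(?:https?|ftp)://[^\\s\"'<>()]+");

    // Returns every substring of the text that looks like an absolute URL,
    // regardless of which tag or attribute it appears in.
    public static List<String> extract(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = ABSOLUTE_URL.matcher(text);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        // The link from question 1: relative and unquoted, so no scheme.
        String anchor = "<a href=/sitemap.html>browse</a>";
        // A made-up style block containing an absolute URL.
        String style = "<style>@import url(http://example.com/site.css);</style>";

        System.out.println(extract(anchor)); // prints []
        System.out.println(extract(style));  // prints [http://example.com/site.css]
    }
}
```

A pattern applied to the raw text has no notion of tags or attributes,
which is why it can both miss real links and follow things that are not
links at all.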

> 3.  Where do I look for a list of all the failed
> links?

I don't think there is one.

I have just created the issue in JIRA:
http://issues.apache.org/jira/browse/NUTCH-119

Regards,
Sébastien.

