Hi Earl,

Please see my responses below.
--- Earl Cahill <[EMAIL PROTECTED]> wrote:

As you probably saw in the OutlinkExtractor class, the links are extracted with a regexp. I'm no expert in the matter, but that will certainly answer your questions below.

> So, three open questions
>
> 1. Why doesn't my link (<a href=/sitemap.html>browse</a>) get parsed?

Because it doesn't match the aforementioned regexp.

> 2. Why does my style get followed?

Because it matches the regexp.

> 3. Where do I look for a list of all the failed links?

I don't think there is one. I have just created an issue in JIRA:
http://issues.apache.org/jira/browse/NUTCH-119

Regards,
Sébastien.
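
P.S. To illustrate the kind of mismatch I mean, here is a minimal sketch. The pattern below is a hypothetical one I made up for the example; it is not copied from OutlinkExtractor, and the stylesheet URL is an assumed example too. It only shows how a regexp that looks for absolute URLs (scheme://...) would skip a relative link such as /sitemap.html while still picking up an absolute stylesheet URL.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LinkRegexSketch {
    // Hypothetical pattern (an assumption, NOT the one used by Nutch):
    // a scheme, then "://", then anything up to whitespace, a quote or an angle bracket.
    private static final Pattern ABSOLUTE_URL =
        Pattern.compile("[A-Za-z][A-Za-z0-9+.-]*://[^\\s\"'<>]+");

    // Returns every substring of the text that matches the pattern.
    static List<String> extract(String text) {
        List<String> links = new ArrayList<>();
        Matcher m = ABSOLUTE_URL.matcher(text);
        while (m.find()) {
            links.add(m.group());
        }
        return links;
    }

    public static void main(String[] args) {
        // Relative link with no scheme: nothing matches, so it is silently dropped.
        System.out.println(extract("<a href=/sitemap.html>browse</a>"));
        // Absolute stylesheet URL (assumed example): it matches, so it would be followed.
        System.out.println(extract(
            "<link rel=\"stylesheet\" href=\"http://example.com/style.css\">"));
    }
}
```

If the real regexp behaves along these lines, a text-based extractor has no notion of which tag a URL came from, which would explain both the missed link and the followed style.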
