[ http://issues.apache.org/jira/browse/NUTCH-299?page=comments#action_12414643 ]
Stefan Neufeind commented on NUTCH-299:
---
Could you briefly explain what it does? Extract meta-data and index the comment
as content of that page? Or does it also follow
[ http://issues.apache.org/jira/browse/NUTCH-298?page=comments#action_12414647 ]
Stefan Neufeind commented on NUTCH-298:
---
Is the description-line of this bug correct? I've been indexing pages without
robots.txt, and I just checked that those hosts
[ http://issues.apache.org/jira/browse/NUTCH-299?page=comments#action_12414648 ]
Hasan Diwan commented on NUTCH-299:
---
Extracts and indexes meta-data. Doesn't follow the URL to the tracker. I would
add that if I have the time, or maybe someone else can.
[ http://issues.apache.org/jira/browse/NUTCH-298?page=all ]
Stefan Groschupf updated NUTCH-298:
---
Summary: if a 404 for a robots.txt is returned a NPE is thrown (was: if a
404 for a robots.txt is returned no page is fetched at all from the host)
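The revised summary says the real symptom is an NPE when robots.txt returns 404, not a host that is never fetched. Below is a minimal sketch of the usual remedy: the robots.txt lookup always returns a rule set, and a 404 produces an empty (allow-all) one. This is not Nutch's RobotRulesParser code. The class and method names are hypothetical, the parsing ignores User-agent groups, and handling non-4xx errors conservatively is an assumption.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public class RobotsFetchSketch {

    /** Hypothetical rule set: path prefixes that must not be fetched. */
    static final class RobotRules {
        private final List<String> disallowed;

        RobotRules(List<String> disallowed) {
            this.disallowed = disallowed;
        }

        /** No rules at all: every path is allowed. */
        static RobotRules allowAll() {
            return new RobotRules(Collections.<String>emptyList());
        }

        /** "/" is a prefix of every path, so nothing is allowed. */
        static RobotRules disallowAll() {
            return new RobotRules(Collections.singletonList("/"));
        }

        boolean isAllowed(String path) {
            for (String prefix : disallowed) {
                if (path.startsWith(prefix)) {
                    return false;
                }
            }
            return true;
        }
    }

    /**
     * Hypothetical helper: maps the HTTP status and body of /robots.txt to rules.
     * It never returns null, so a missing robots.txt cannot cause an NPE later.
     */
    static RobotRules rulesFor(int status, String body) {
        if (status >= 400 && status < 500) {
            // 404 and other client errors: treat the host as having no robots.txt.
            return RobotRules.allowAll();
        }
        if (status != 200) {
            // Assumption: server errors and unexpected codes are handled conservatively.
            return RobotRules.disallowAll();
        }
        if (body == null) {
            return RobotRules.allowAll();
        }
        List<String> disallowed = new ArrayList<>();
        for (String line : body.split("\r?\n")) {
            String trimmed = line.trim();
            // Simplification: User-agent groups are ignored; every Disallow line applies.
            if (trimmed.regionMatches(true, 0, "Disallow:", 0, 9)) {
                String path = trimmed.substring(9).trim();
                if (!path.isEmpty()) {
                    disallowed.add(path);
                }
            }
        }
        return new RobotRules(disallowed);
    }
}
```

With this shape, rulesFor(404, null).isAllowed("/index.html") is simply true. The caller gets no null to dereference, and pages on a host without robots.txt still get fetched.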
[ http://issues.apache.org/jira/browse/NUTCH-294?page=comments#action_12414653 ]
Stefan Neufeind commented on NUTCH-294:
---
I'm not sure. On a quick run I wasn't able to get the
clustering-carrot2-plugin to work, though I thought I simply need to
Stefan Groschupf wrote:
an interesting tool:
http://tool.motoricerca.info/spam-detector/
Do you have good/bad experience with that tool? The idea to have
something like this as a nutch-module (dropping pages or ranking them
very low) might come up :-)
From the FAQ I read that the author is a
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ]
Chris A. Mattmann resolved NUTCH-258:
---
Resolution: Won't Fix
The use of LOG.severe in the fetcher indicates an unrecoverable error: thus,
this issue is not a bug, and in fact
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ]
Chris A. Mattmann closed NUTCH-258:
---
Won't fix: issue describes intended behavior of system (fetcher component).
Once Nutch logs a SEVERE log item, Nutch fails forevermore
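The issue title describes a sticky failure: after one SEVERE entry, the fetcher stops and never recovers. Below is a minimal sketch of the pattern that produces this, assuming a process-wide flag that the fetch loop checks and that nothing ever resets. It is not Nutch's actual fetcher or logging code. SevereFlagSketch, logSevere, severeSeen and fetchAll are hypothetical names, and an empty URL stands in for an "unrecoverable error".

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

public class SevereFlagSketch {
    // Hypothetical process-wide flag: set on the first SEVERE entry, never cleared.
    private static final AtomicBoolean severeLogged = new AtomicBoolean(false);

    static void logSevere(String message) {
        System.err.println("SEVERE: " + message);
        severeLogged.set(true);
    }

    static boolean severeSeen() {
        return severeLogged.get();
    }

    /** Hypothetical fetch loop: returns how many URLs were "fetched" before stopping. */
    static int fetchAll(List<String> urls) {
        int fetched = 0;
        for (String url : urls) {
            if (severeSeen()) {
                break; // any earlier SEVERE entry halts all further fetching
            }
            if (url.isEmpty()) {
                logSevere("unrecoverable error on empty URL");
                continue;
            }
            fetched++;
        }
        return fetched;
    }
}
```

Because the flag is static and never reset, a later call such as fetchAll(List.of("c")) returns 0 even though its input is valid. That is the "fails forevermore" behavior the resolution describes as intended.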
The idea to have
something like this as a nutch-module (dropping pages or ranking them
very low) might come up :-)
This will be a very long way.
I have collected some thoughts and a list of web-spam-related papers on my
blog.
http://www.find23.net/Web-Site/blog/521BA1CD-14C4-4E84-A072-
Hi,
What exactly does this plugin do? I haven't seen it mentioned and the
README.txt doesn't really describe it.
Thanks,
Otis
----- Original Message -----
From: [EMAIL PROTECTED]
To: nutch-commits@lucene.apache.org
Sent: Sunday, June 4, 2006 3:44:23 PM
Subject: [Nutch-cvs] svn commit: r411594