[jira] Commented: (NUTCH-299) Bittorrent Parser

2006-06-04 Thread Stefan Neufeind (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-299?page=comments#action_12414643 ] Stefan Neufeind commented on NUTCH-299: --- Could you briefly explain what it does? Extract meta-data and index the comment as content of that page? Or does it also follow

[jira] Commented: (NUTCH-298) if a 404 for a robots.txt is returned no page is fetched at all from the host

2006-06-04 Thread Stefan Neufeind (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-298?page=comments#action_12414647 ] Stefan Neufeind commented on NUTCH-298: --- Is the description-line of this bug correct? I've been indexing pages without robots.txt, and I just checked that those hosts

[jira] Commented: (NUTCH-299) Bittorrent Parser

2006-06-04 Thread Hasan Diwan (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-299?page=comments#action_12414648 ] Hasan Diwan commented on NUTCH-299: --- Extracts and indexes meta-data. Doesn't follow the URL to the tracker. I would add that if I have the time, or maybe someone else can.

[jira] Updated: (NUTCH-298) if a 404 for a robots.txt is returned a NPE is thrown

2006-06-04 Thread Stefan Groschupf (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-298?page=all ] Stefan Groschupf updated NUTCH-298: --- Summary: if a 404 for a robots.txt is returned a NPE is thrown (was: if a 404 for a robots.txt is returned no page is fetched at all from the host)

[jira] Commented: (NUTCH-294) Topic-maps of related searchwords

2006-06-04 Thread Stefan Neufeind (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-294?page=comments#action_12414653 ] Stefan Neufeind commented on NUTCH-294: --- I'm not sure. On a quick run I wasn't able to get the clustering-carrot2-plugin to work - though I thought I simply need to

Re: search engine spam detector

2006-06-04 Thread Stefan Neufeind
Stefan Groschupf wrote: an interesting tool: http://tool.motoricerca.info/spam-detector/ Do you have good/bad experience with that tool? The idea to have something like this as a nutch-module (dropping pages or ranking them very low) might come up :-) From the FAQ I read that the author is a

[jira] Resolved: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-04 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ] Chris A. Mattmann resolved NUTCH-258: - Resolution: Won't Fix The use of LOG.severe in the fetcher indicates an unrecoverable error: thus, this issue is not a bug, and in fact

[jira] Closed: (NUTCH-258) Once Nutch logs a SEVERE log item, Nutch fails forevermore

2006-06-04 Thread Chris A. Mattmann (JIRA)
[ http://issues.apache.org/jira/browse/NUTCH-258?page=all ] Chris A. Mattmann closed NUTCH-258: --- Won't fix: issue describes intended behavior of system (fetcher component). Once Nutch logs a SEVERE log item, Nutch fails forevermore

Re: search engine spam detector

2006-06-04 Thread Stefan Groschupf
The idea to have something like this as a nutch-module (dropping pages or ranking them very low) might come up :-) This will be a very long way. I collect some thoughts and a list of web spam related papers in my blog. http://www.find23.net/Web-Site/blog/521BA1CD-14C4-4E84-A072-

Re: [Nutch-cvs] svn commit: r411594 - /lucene/nutch/trunk/contrib/web2/plugins/build.xml

2006-06-04 Thread ogjunk-nutch
Hi, What exactly does this plugin do? I haven't seen it mentioned and the README.txt doesn't really describe it. Thanks, Otis - Original Message From: [EMAIL PROTECTED] To: nutch-commits@lucene.apache.org Sent: Sunday, June 4, 2006 3:44:23 PM Subject: [Nutch-cvs] svn commit: r411594