Fetcher2 Reduce Phase Question

2008-04-11 Thread Sandeep Tata
Hi Folks, I was just wondering what computation really happens in the reduce phase for Fetcher2? I know that it is implemented as a MapRunnable -- but I see no explicit reducer being set for the job. Is the identity reducer being used? Why can't we simply use job.setNumReduceTasks(0)? Wouldn't

Re: Fetcher2 Reduce Phase Question

2008-04-11 Thread Andrzej Bialecki
Sandeep Tata wrote: "Hi Folks, I was just wondering what computation really happens in the reduce phase for Fetcher2?" If Fetcher was running in parsing mode, then in the reduce phase Outlinks are separated from Parse output and stored in crawl_parse, and other data in parse_text and pars
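
The separation described in the reply above can be illustrated with a small stand-alone sketch. This is not Nutch or Hadoop code: the class ReduceRoutingSketch, the Kind enum, the Tagged wrapper, and the route method are all hypothetical names invented for illustration, and only the two directory names that appear in full in the reply (crawl_parse and parse_text) are used. It only shows the general idea of why a job that writes mixed record types may still need a reduce-side step: values for one URL arrive together and are routed to different outputs by type.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of reduce-side routing by record type; not Nutch code. */
public class ReduceRoutingSketch {

    /** Assumed record kinds emitted by a fetcher running in parsing mode. */
    enum Kind { OUTLINKS, PARSE_TEXT }

    /** A value tagged with its kind; stands in for a wrapped map output value. */
    static final class Tagged {
        final Kind kind;
        final String payload;

        Tagged(Kind kind, String payload) {
            this.kind = kind;
            this.payload = payload;
        }
    }

    /**
     * Routes all values collected for one URL to a named output.
     * Outlinks go to "crawl_parse"; everything else goes to "parse_text".
     */
    static Map<String, List<String>> route(String url, List<Tagged> values) {
        Map<String, List<String>> outputs = new LinkedHashMap<>();
        outputs.put("crawl_parse", new ArrayList<>());
        outputs.put("parse_text", new ArrayList<>());
        for (Tagged v : values) {
            String dir = (v.kind == Kind.OUTLINKS) ? "crawl_parse" : "parse_text";
            outputs.get(dir).add(url + "\t" + v.payload);
        }
        return outputs;
    }

    public static void main(String[] args) {
        List<Tagged> values = Arrays.asList(
            new Tagged(Kind.PARSE_TEXT, "body text of the page"),
            new Tagged(Kind.OUTLINKS, "http://example.org/next"));
        // Print each simulated output directory with the records routed to it.
        route("http://example.org/", values)
            .forEach((dir, records) -> System.out.println(dir + " -> " + records));
    }
}
```

With zero reduce tasks, map output would be written directly, so a routing step like this would have to happen elsewhere; whether that is the actual reason Fetcher2 keeps a reduce phase is not settled by the truncated reply.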

Keywords in documents

2008-04-11 Thread Amit Kumar Verma
Hi all! Is there a way to extract keywords from a document that can then be fed into LuceneSummarizer to obtain a summary of the text? Is there any semantic summarizer incorporated in Nutch? Amit Kumar Verma, Infosys Technologies Limited

Re: Keywords in documents

2008-04-11 Thread ogjunk-nutch
Hi Amit, There is no semantic summarizer (What exactly would it do? Can you provide an example?). There is a more or less "standard" snippet/highlighter - a lot like what you see on Google's search results, for example. Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch - O

[jira] Created: (NUTCH-628) Host database to keep track of host-level information

2008-04-11 Thread Otis Gospodnetic (JIRA)
Host database to keep track of host-level information
Key: NUTCH-628
URL: https://issues.apache.org/jira/browse/NUTCH-628
Project: Nutch
Issue Type: New Feature
Components: fetc