Hi Folks,
I was just wondering what computation really happens in the reduce
phase for Fetcher2?
I know that it is implemented as a MapRunnable -- but I see no
explicit reducer being set for the job. Is the identity reducer being
used? Why can't we simply use job.setNumReduceTasks(0)?
Wouldn't
Sandeep Tata wrote:
> Hi Folks,
> I was just wondering what computation really happens in the reduce
> phase for Fetcher2?
If the Fetcher is running in parsing mode, then in the reduce phase
outlinks are separated from the Parse output and stored in crawl_parse, and
other data in parse_text and pars
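
As a rough illustration of the separation described above, here is a minimal sketch in plain Java. It is not Nutch's actual output code: the Entry class, the "kind" labels and the "other" bucket are hypothetical names made up for this example, and only the crawl_parse and parse_text directory names come from the message above.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ParseOutputRouter {

    // Hypothetical stand-in for one reduce-side value; not a Nutch class.
    static final class Entry {
        final String url;
        final String kind;     // "outlink" or "text" in this sketch
        final String payload;

        Entry(String url, String kind, String payload) {
            this.url = url;
            this.kind = kind;
            this.payload = payload;
        }
    }

    // Choose the output directory for an entry based on its kind.
    static String targetFor(Entry e) {
        if ("outlink".equals(e.kind)) {
            return "crawl_parse";
        }
        if ("text".equals(e.kind)) {
            return "parse_text";
        }
        return "other"; // assumption: anything else goes to a catch-all bucket
    }

    // Group entries by target directory, preserving first-seen order.
    static Map<String, List<String>> route(List<Entry> entries) {
        Map<String, List<String>> out = new LinkedHashMap<>();
        for (Entry e : entries) {
            out.computeIfAbsent(targetFor(e), k -> new ArrayList<>())
               .add(e.url + "\t" + e.payload);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Entry> entries = new ArrayList<>();
        entries.add(new Entry("http://example.com/", "outlink", "http://example.com/a"));
        entries.add(new Entry("http://example.com/", "text", "page text"));
        System.out.println(route(entries));
    }
}
```

The point of the sketch is only that one reduce pass can fan a mixed stream of values out into several named outputs, which is why the job cannot simply run with zero reduce tasks when parsing is enabled.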
Hi All!
Is there a way to extract keywords from a document that can then be
fed into LuceneSummarizer to obtain a summary of the text?
Is there any semantic summarizer incorporated in Nutch?
Amit Kumar Verma
Infosys Technologies Limited
Hi Amit,
There is no semantic summarizer (what exactly would it do? Can you provide an
example?).
There is a more or less "standard" snippet/highlighter - a lot like what you
see on Google's search results, for example.
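
To make the snippet/highlighter idea concrete, here is a minimal sketch in plain Java. It is not Nutch's or Lucene's summarizer: the SnippetSketch class, the snippet method, the context parameter and the <b> markup are all assumptions chosen for illustration. It finds the first case-insensitive match of a query term and returns it with a few characters of context on each side.

```java
import java.util.Locale;

public class SnippetSketch {

    // Returns up to `context` characters on either side of the first
    // case-insensitive match of `term`, with the match wrapped in <b> tags.
    // Assumes lowercasing does not change string length (true for ASCII).
    static String snippet(String text, String term, int context) {
        int hit = text.toLowerCase(Locale.ROOT)
                      .indexOf(term.toLowerCase(Locale.ROOT));
        if (term.isEmpty() || hit < 0) {
            // No usable match: fall back to the start of the text.
            return text.substring(0, Math.min(text.length(), 2 * context));
        }
        int end = hit + term.length();
        int from = Math.max(0, hit - context);
        int to = Math.min(text.length(), end + context);

        StringBuilder sb = new StringBuilder();
        if (from > 0) {
            sb.append("...");          // text was cut on the left
        }
        sb.append(text, from, hit);
        sb.append("<b>").append(text, hit, end).append("</b>");
        sb.append(text, end, to);
        if (to < text.length()) {
            sb.append("...");          // text was cut on the right
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String doc = "Nutch is a web crawler built on Lucene";
        System.out.println(snippet(doc, "crawler", 6));
    }
}
```

A real highlighter would work on analyzed tokens and score several candidate fragments; this sketch only shows the basic "match plus surrounding context" shape of a search-result snippet.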
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
- O
Host database to keep track of host-level information
-
Key: NUTCH-628
URL: https://issues.apache.org/jira/browse/NUTCH-628
Project: Nutch
Issue Type: New Feature
Components: fetc