Hugo Pinto wrote:
Hello,
I am using Nutch for mirroring, rather than crawling and indexing.
I need direct access to the cached data in my Nutch index, but I am
unable to find an easy way to get it.
I browsed the documentation (wiki, javadocs) and skimmed the code, but
found no straightforward
Hello,
I am looking for a way to search multiple indexes from one webapp
and found some code. I can always make one webapp = one website, but
what if it grows?
Is it possible to make this code work:
in search.jsp
/*
Comment this original line of code and use code below.
Hi,
to enable the language identifier:
in the conf/nutch-site.xml file, add the plugin language-identifier in the
<name>plugin.includes</name>
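As a rough sketch, the entry in conf/nutch-site.xml might look like the fragment below. The plugin list in the value is only an illustrative assumption: keep whatever list your installation already has and append |language-identifier to it.

```xml
<property>
  <name>plugin.includes</name>
  <!-- Assumed example value: keep your existing plugin list and append language-identifier -->
  <value>protocol-http|urlfilter-regex|parse-(text|html)|index-(basic|anchor)|query-(basic|site|url)|language-identifier</value>
</property>
```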
Date: Thu, 5 Nov 2009 01:32:40 -0800
From: saurabhsuman...@rediff.com
To: nutch-user@lucene.apache.org
Subject: How to enable nutch language Identifier
i tried this once but before i knew it my log file was approaching a gig
within an hour or so!
I suggest maybe turning the debug logs on for hadoop before you do the
next crawl... you can do this by editing log4j.properties
and changing the rootLogger from INFO to DEBUG
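A minimal sketch of that change in log4j.properties, assuming the root logger line currently reads INFO followed by an appender name (DRFA here is an assumption; leave whatever appender name your file already has):

```properties
# before (assumed): log4j.rootLogger=INFO,DRFA
log4j.rootLogger=DEBUG,DRFA
```

DEBUG output is very verbose, so the log file can grow quickly; it is probably worth switching back to INFO once the crawl has been checked.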
On Thu, Nov 5, 2009 at
hi there,
we tried a few things around this; one suggestion was to run it on a
local machine; so i pulled one of our decent servers and got to work...
but surprisingly we got the same error on a local machine!
so it seems the hardware (VPS/Local) wasn't the culprit.. probably the
data, or the