Re: Direct Access to Cached Data

2009-11-05 Thread Andrzej Bialecki
Hugo Pinto wrote: Hello, I am using Nutch for mirroring rather than for crawling and indexing. I need direct access to the cached data in my Nutch index, but I cannot find an easy way to get at it. I browsed the documentation (wiki, javadocs) and skimmed the code, but found no straightforward

Multiple index from webapp

2009-11-05 Thread Bartosz Gadzimski
Hello, I am looking for a way to search multiple indexes from one webapp, and I found some code. I could always run one webapp per website, but what happens when the number of sites grows? Is it possible to make this code work? In search.jsp: /* Comment this original line of code and use code below.

RE: How to enable nutch language Identifier

2009-11-05 Thread BELLINI ADAM
Hi, to enable the language identifier, add language-identifier to the value of the <name>plugin.includes</name> property in conf/nutch-site.xml. Date: Thu, 5 Nov 2009 01:32:40 -0800 From: saurabhsuman...@rediff.com To: nutch-user@lucene.apache.org Subject: How to enable nutch language Identifier
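The reply above boils down to appending one plugin name to the plugin.includes regex, whose value is plugin directory names joined with `|`. A minimal Python sketch of that edit, assuming nutch-site.xml already overrides plugin.includes (if it does not, copy the property from nutch-default.xml first; the sketch deliberately does not guess the default value). The function name `enable_plugin` is hypothetical, and the substring check for an existing entry is only approximate.

```python
import xml.etree.ElementTree as ET


def enable_plugin(site_xml: str, plugin: str = "language-identifier") -> str:
    """Return site_xml with `plugin` appended to the plugin.includes value.

    Hypothetical helper: assumes plugin.includes is already overridden in
    nutch-site.xml. Note that ElementTree drops XML comments and the XML
    declaration, so keep a backup of the original file.
    """
    root = ET.fromstring(site_xml)
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name is not None and name.strip() == "plugin.includes":
            value_el = prop.find("value")
            if value_el is None:
                raise ValueError("plugin.includes has no <value> element")
            current = (value_el.text or "").strip()
            # The value is a regex of plugin names joined with '|';
            # a plain substring test is an approximate "already enabled" check.
            if plugin not in current:
                value_el.text = f"{current}|{plugin}" if current else plugin
            return ET.tostring(root, encoding="unicode")
    raise KeyError("plugin.includes not overridden; copy it from nutch-default.xml first")
```

Running it twice leaves the value unchanged, so it is safe to apply to a file that may already list the plugin.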

Re: MergeSegments - map reduce thread death

2009-11-05 Thread fadzi
I tried this once, but before I knew it my log file was approaching a gigabyte within an hour or so! I suggest turning on Hadoop debug logging before your next crawl. You can do this by editing log4j.properties and changing the rootLogger level from INFO to DEBUG. On Thu, Nov 5, 2009 at
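The log4j change described above can be scripted so it is easy to apply before a crawl and revert afterwards, given how quickly the debug log grows. A minimal sketch, assuming the file contains a line of the form `log4j.rootLogger=INFO, ...` with the level as the first token; if your setup sets the level indirectly through a variable, this hypothetical `enable_debug` helper raises an error instead of guessing, and you should edit that variable by hand.

```python
import re

# Matches a log4j.rootLogger line whose level (first token) is INFO.
_ROOT_INFO = re.compile(r"^(\s*log4j\.rootLogger\s*=\s*)INFO\b", re.MULTILINE)


def enable_debug(props: str) -> str:
    """Switch the root logger level from INFO to DEBUG, keeping its appenders."""
    new, count = _ROOT_INFO.subn(r"\1DEBUG", props)
    if count == 0:
        # Assumption: the level may be set some other way (e.g. via a variable).
        raise ValueError("no 'log4j.rootLogger=INFO...' line found")
    return new
```

Only the level token is rewritten; the appender list after the comma is left intact.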

Re: MergeSegments - map reduce thread death

2009-11-05 Thread fadzi
Hi there, we tried a few things around this. One suggestion was to run it on a local machine, so I pulled one of our decent servers and got to work... but surprisingly we got the same error on the local machine! So it seems the hardware (VPS vs. local) wasn't the culprit; probably the data, or the