Malaga-fi Finnish plugin for Nutch

2010-04-12 Thread Hannu Väisänen
Malaga-fi is a Nutch plugin for indexing documents written in Finnish. Malaga-fi analyses words morphologically, converts them to their base form (the form you find in dictionaries) and indexes the base forms, so that you can find all inflections of a word by searching for just its base form. To use an

Re: Nutch and EC2

2010-04-12 Thread Stefano Cherchi
Hi Yves, I'm going to start some tests of Nutch+Solr on EC2 in a couple of days, so I will be able to give you some feedback on it soon. I'm actually a little concerned about computing speed, rather than RAM or disk space, because I've experienced a consistent lack of performance in

Opinion crawling

2010-04-12 Thread NareshG
Hi, I am a newbie to Nutch. As part of learning, I have done some basic things in Nutch, like intranet crawling and internet crawling, and tried the plugin example, etc. Actually, our main objective is to do opinion crawling. That is, we need to crawl only HTML pages which contain opinions, i.e. user reviews

Re: Nutch and EC2

2010-04-12 Thread Kevin Conor
My experience on EC2 has been that the RAM and disk space are overkill, while the computing speed is lacking. I had been running my crawler on a 1 GB Slicehost slice, and when I moved it over to a medium high-CPU instance on EC2 (~2x the cost), the generate and update steps took 50% longer. Right