Malaga-fi is a Nutch plugin for indexing documents written in Finnish.
It analyses words morphologically, converts them to their base form
(the form you find in dictionaries) and indexes the base forms, so that
a search for the base form finds all inflections of the word.
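
To illustrate the idea of base-form indexing, here is a minimal sketch in Python. It does not use Malaga-fi or any Nutch API: the `BASE_FORMS` table is a hypothetical, hand-written stand-in for real morphological analysis, and the function names and sample documents are made up for this example.

```python
from collections import defaultdict

# Hypothetical lookup table standing in for a real morphological analyser.
# It maps a few inflected Finnish word forms to their dictionary form.
BASE_FORMS = {
    "talossa": "talo",
    "talon": "talo",
    "taloissa": "talo",
    "kissan": "kissa",
    "kissat": "kissa",
}


def base_form(word):
    """Return the base form of a word, or the lowercased word if unknown."""
    w = word.lower()
    return BASE_FORMS.get(w, w)


def build_index(docs):
    """Build an inverted index keyed by base form instead of surface form."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[base_form(token)].add(doc_id)
    return index


def search(index, query):
    """Normalise the query the same way as the documents, then look it up."""
    return sorted(index.get(base_form(query), set()))


# Made-up sample documents containing different inflections.
docs = {
    1: "Kissa on talossa",
    2: "Talon katto",
    3: "Kissat nukkuvat",
}
index = build_index(docs)
print(search(index, "talo"))   # documents containing any listed form of "talo"
print(search(index, "kissa"))  # documents containing any listed form of "kissa"
```

Because both the documents and the query are reduced to the same base form, a search for "talo" also matches documents that only contain "talossa" or "talon".
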
To use an

Hi Yves,
I'm going to start some tests of Nutch+Solr on EC2 in a couple of days, so I
will be able to give you some feedback on it soon.
I'm actually a little concerned about computing speed, rather than RAM or disk
space, because I've experienced a consistent lack of performance in

Hi,
I am a newbie to Nutch. As part of learning it I have done some basic things
like intranet crawling and internet crawling, and I have tried the plugin
example. Our main objective is to do opinion crawling.
That is, we need to crawl only HTML pages that contain opinions, i.e. user
reviews

My experience on EC2 has been that the RAM and disk space are overkill,
while the computing speed is lacking. I had been running my crawler on a
1GB Slicehost slice, and when I moved it over to a medium high-CPU instance
on EC2 (~2x the cost), the generate and update steps took 50% longer. Right