Hello,

I am going to perform some manipulations on extracted text, represented as
an array of strings, and I need some advice. I need to retrieve the strings,
store them (some strings can be repeated several times in a file), sort them,
calculate statistics, store a sorted subset in another file, etc.
Which class is best suited for this?
ArrayFile
MapFile
SequenceFile - I can sort by LongWritable; I tried to sort by String,
without success
SetFile

What is MapReduce? Could you please provide an overview?

I can't use Lucene because I don't want to analyze or tokenize the strings.
Also, I don't want to reinvent the wheel, especially for a distributed NFS;
I want to use its power.

Thanks,
Fuad
