Hello,

I am going to perform some manipulations on extracted text, represented as an array of strings, and I need some advice. I need to retrieve the strings, store them (some strings can be repeated several times in a file), sort them, calculate statistics, store a sorted subset in another file, etc. Which class is best suited for this?

ArrayFile
MapFile
SequenceFile (I can sort by LongWritable; I tried to sort by String, unsuccessfully)
SetFile

What is MapReduce? Could you please provide an overview? I can't use Lucene because I don't want to analyze or tokenize the strings. Also, I don't want to reinvent the wheel, especially for a distributed NFS; I want to use this power.

Thanks,
Fuad
