Yes, you can write a custom Writable class that defines and serialises your required data structure. If you have Hadoop: The Definitive Guide, check out the "Serialization" section in the "Hadoop I/O" chapter.
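Here is a minimal sketch of such a value type, holding the count plus the list of document IDs. NgramStats is a hypothetical name. To keep it compilable without Hadoop on the classpath, it declares a local interface with the same two methods as org.apache.hadoop.io.Writable. In a real job you would implement that Hadoop interface instead and could use Text as the n-gram key.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInput;
import java.io.DataInputStream;
import java.io.DataOutput;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class NgramStatsDemo {

    // Stand-in for org.apache.hadoop.io.Writable (same two methods) so this sketch compiles without Hadoop.
    interface Writable {
        void write(DataOutput out) throws IOException;
        void readFields(DataInput in) throws IOException;
    }

    // Hypothetical value type: an n-gram's count plus the IDs of the documents containing it.
    static class NgramStats implements Writable {
        private int count;
        private final List<String> docIds = new ArrayList<>();

        // Hadoop creates Writables reflectively, so a no-arg constructor is needed.
        NgramStats() {}

        NgramStats(int count, List<String> docIds) {
            this.count = count;
            this.docIds.addAll(docIds);
        }

        int getCount() { return count; }
        List<String> getDocIds() { return docIds; }

        @Override
        public void write(DataOutput out) throws IOException {
            out.writeInt(count);
            out.writeInt(docIds.size()); // length prefix so readFields knows how many IDs follow
            for (String id : docIds) {
                out.writeUTF(id);
            }
        }

        @Override
        public void readFields(DataInput in) throws IOException {
            // Hadoop reuses Writable instances, so clear any previous state before reading.
            docIds.clear();
            count = in.readInt();
            int n = in.readInt();
            for (int i = 0; i < n; i++) {
                docIds.add(in.readUTF());
            }
        }
    }

    public static void main(String[] args) throws IOException {
        NgramStats original = new NgramStats(3, List.of("doc1", "doc7"));

        // Round-trip through bytes, as Hadoop would between map and reduce.
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        NgramStats copy = new NgramStats();
        copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(copy.getCount() + " " + copy.getDocIds());
    }
}
```

The fields must be read back in exactly the order they were written. The list size is written first because DataInput gives no other way to know where the list ends.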
On Tue, Apr 8, 2014 at 9:16 PM, Natalia Connolly <natalia.v.conno...@gmail.com> wrote:
> Dear All,
>
> I was wondering if the following is possible using MapReduce.
>
> I would like to create a job that loops over a bunch of documents,
> tokenizes them into ngrams, and stores the ngrams and not only the counts of
> ngrams but also _which_ document(s) had this particular ngram. In other
> words, the key would be the ngram but the value would be an integer (the
> count) _and_ an array of document id's.
>
> Is this something that can be done? Any pointers would be appreciated.
>
> I am using Java, btw.
>
> Thank you,
>
> Natalia Connolly

--
Harsh J