Hello,

I am working on a Hadoop-based Solr indexing system. We use Hadoop because we 
need to prepare the data (compute values and add them to the Solr documents).

For a full index, I read in the records and output a MapWritable containing 
all the fields I want to index. Other Hadoop jobs then take this output as 
their input and contribute new computed fields to each document at reduce 
time.

This feels wrong because every mapper reads in the full document, even when 
it may need only one or two fields from the MapWritable to compute its field.

Is it possible in Hadoop to request a slice of a MapWritable, and would I even 
want to? Or is there perhaps a better way to structure this?
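
To make "slice" concrete, here is a rough sketch of the projection I have in 
mind. It uses plain Java collections so it runs without Hadoop on the 
classpath; FieldSlice and slice() are hypothetical names of my own, not Hadoop 
APIs. Since MapWritable implements java.util.Map, I assume the same filtering 
could be applied to it, but note that this only trims a document after it has 
already been read and deserialized, which is exactly the cost I would like to 
avoid.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

// Hypothetical helper (not part of Hadoop): keeps only the requested fields.
final class FieldSlice {

    // Returns a new map holding just the entries whose keys are in 'wanted'.
    // Iterates over the document so the original field order is preserved.
    static <K, V> Map<K, V> slice(Map<K, V> document, Set<K> wanted) {
        Map<K, V> out = new LinkedHashMap<>();
        for (Map.Entry<K, V> field : document.entrySet()) {
            if (wanted.contains(field.getKey())) {
                out.put(field.getKey(), field.getValue());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Made-up document; the field names are for illustration only.
        Map<String, String> doc = new LinkedHashMap<>();
        doc.put("id", "1");
        doc.put("title", "example");
        doc.put("price", "9.99");

        // A downstream job that only needs two of the fields.
        System.out.println(slice(doc, Set.of("id", "price")));
    }
}
```

If something like this ran once in an earlier job, each downstream job could 
read a smaller per-job input instead of the full document, at the price of 
writing extra intermediate files.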


Thanks for any help,
Darren
