Rong-en Fan wrote:
I have two questions regarding MapFile in Hadoop/HDFS. First, when using
MapFileOutputFormat as the reducer's output, is there any way to change
the index interval (i.e., to call setIndexInterval() on the
output MapFile)?
Not at present. It would probably be good to change MapFile to get this
value from the Configuration. A static method could be added,
MapFile#setIndexInterval(Configuration conf, int interval), that sets
io.mapfile.index.interval, and the MapFile constructor could read this
property from the Configuration. One could then use the static method
to set this on jobs.
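
Roughly, the pattern would look like the sketch below. This is only an illustration of the idea, not a patch against MapFile: the class name IndexIntervalSketch is made up, java.util.Properties stands in for Hadoop's Configuration so the example runs on its own, and the fallback of 128 is an assumed default for the sketch.

```java
import java.util.Properties;

// Hypothetical stand-in for the proposed change; this is NOT Hadoop's MapFile code.
// java.util.Properties plays the role of Hadoop's Configuration here.
final class IndexIntervalSketch {
    // Property name taken from the proposal in this thread.
    static final String INDEX_INTERVAL_KEY = "io.mapfile.index.interval";
    // Assumed default for this sketch.
    static final int DEFAULT_INDEX_INTERVAL = 128;

    private final int indexInterval;

    // The "constructor reads the property" half of the proposal.
    IndexIntervalSketch(Properties conf) {
        this.indexInterval = Integer.parseInt(
            conf.getProperty(INDEX_INTERVAL_KEY, String.valueOf(DEFAULT_INDEX_INTERVAL)));
    }

    // The "static setter" half: callers set the interval on the job's configuration.
    static void setIndexInterval(Properties conf, int interval) {
        conf.setProperty(INDEX_INTERVAL_KEY, Integer.toString(interval));
    }

    int getIndexInterval() {
        return indexInterval;
    }

    public static void main(String[] args) {
        Properties conf = new Properties();
        setIndexInterval(conf, 16);
        // A writer constructed from this configuration picks up the interval.
        System.out.println(new IndexIntervalSketch(conf).getIndexInterval());
    }
}
```

The point of going through the configuration is that MapFileOutputFormat creates the writer on the user's behalf, so the job configuration is the only place a caller can reach.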
If you need this, please file an issue in Jira. If possible, include a
patch too.
http://wiki.apache.org/hadoop/HowToContribute
Second, is it possible to tell the position in the data file for a given
key, assuming the index interval is 1 and the number of keys is small?
One could read the index file explicitly. It's just a SequenceFile,
listing keys and positions in the data file. But why would you set
the index interval to 1? And why do you need to know the position?
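
To make the trade-off concrete, here is a toy in-memory model of such an index. It uses no Hadoop classes: SparseIndexSketch and its methods are hypothetical names, and the keys and offsets are made up. It only illustrates that with an interval of 1 every key has its own entry, while a larger interval gives just a nearby position to start scanning from.

```java
import java.util.Map;
import java.util.TreeMap;

// Toy model of a sparse key-to-position index; it does not read a real MapFile.
final class SparseIndexSketch {
    // Index entries: key -> byte offset of that record in the (imaginary) data file.
    private final TreeMap<String, Long> index = new TreeMap<>();

    // Builds the index from sorted keys and their offsets, keeping every interval-th entry.
    SparseIndexSketch(String[] sortedKeys, long[] positions, int interval) {
        if (interval < 1) {
            throw new IllegalArgumentException("interval must be >= 1");
        }
        for (int i = 0; i < sortedKeys.length; i += interval) {
            index.put(sortedKeys[i], positions[i]);
        }
    }

    // Exact position if the key itself is indexed, otherwise -1.
    long exactPosition(String key) {
        Long pos = index.get(key);
        return pos == null ? -1L : pos;
    }

    // Position to start scanning from: the last indexed key <= the requested key, or -1 if none.
    long seekPosition(String key) {
        Map.Entry<String, Long> entry = index.floorEntry(key);
        return entry == null ? -1L : entry.getValue();
    }

    // Number of index entries, which is what grows as the interval shrinks.
    int size() {
        return index.size();
    }
}
```

With an interval of 1 the index holds as many entries as the data file has keys, which is the cost behind the question above.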
Doug