I believe each mapper makes its own copy, since each one reads in the data to be loaded into the dbm.
This needs to be optimized at some point (ideally we should be putting the dbm in the distributed cache).

________________________________________
From: Gang Luo [lgpub...@yahoo.com.cn]
Sent: Tuesday, August 10, 2010 3:04 PM
To: hive-dev@hadoop.apache.org
Subject: how jdbm is used in map join

Hi all,
Hive uses JDBM for the replicated table in map join. When multiple map tasks are running on the same node, will multiple copies of the JDBM file be generated, or will all the map tasks share the same copy? If it is the latter, which mapper generates the file, and how are the other mappers synchronized?

Thanks,
-Gang