I believe each mapper makes its own copy, since each one reads in the data to be
loaded into the DBM.

This needs to be optimized at some point (ideally we should be putting the DBM
file in the distributed cache).
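
To make the difference concrete, here is a rough sketch in plain Java. It is
not Hive or Hadoop code: MapJoinTableSketch, buildTable, perMapperCopy,
sharedCopy and nodeCache are all made-up names, and "mappers" are modeled as
callers inside one process. Real map tasks usually run in separate JVMs, so
actual sharing would have to happen at the file level (for example through the
distributed cache), not through an in-memory map like this one.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Illustrative only: plain Java, no Hive or Hadoop classes involved.
class MapJoinTableSketch {
    // Counts how many times the small table was materialized.
    static final AtomicInteger builds = new AtomicInteger();

    // Hypothetical node-level cache, keyed by table name.
    static final Map<String, Map<String, String>> nodeCache =
            new ConcurrentHashMap<>();

    // Stand-in for reading the small table and loading it into a DBM file.
    static Map<String, String> buildTable(String tableName) {
        builds.incrementAndGet();
        Map<String, String> table = new HashMap<>();
        table.put("k1", "v1");
        table.put("k2", "v2");
        return table;
    }

    // Behavior described above: every mapper loads its own copy.
    static Map<String, String> perMapperCopy(String tableName) {
        return buildTable(tableName);
    }

    // Possible alternative: the first caller builds the table and later
    // callers reuse it. computeIfAbsent on ConcurrentHashMap is atomic,
    // so concurrent callers do not build the same table twice.
    static Map<String, String> sharedCopy(String tableName) {
        return nodeCache.computeIfAbsent(tableName,
                MapJoinTableSketch::buildTable);
    }
}
```

With three mappers on a node, perMapperCopy would build the table three times,
while sharedCopy would build it once and hand the same copy to the other two.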
________________________________________
From: Gang Luo [lgpub...@yahoo.com.cn]
Sent: Tuesday, August 10, 2010 3:04 PM
To: hive-dev@hadoop.apache.org
Subject: how jdbm is used in map join

Hi all,
Hive uses JDBM for the replicated table in a map join. When multiple map tasks
are running on the same node, will multiple copies of the JDBM file be
generated, or will all the map tasks share the same copy? If it is the latter,
which mapper generates the file, and how are the other mappers synchronized?

Thanks,
-Gang



