Hi Yaron,

Sqoop uses a similar implementation. You can get some details there.
Replies inline.

• (more general question) Are there many use-cases for using DBInputFormat? Do most Hadoop jobs take their input from files or DBs?

> From my limited experience, most MR jobs have their data in HDFS.
> DBInputFormat is useful for getting data out of an RDBMS into Hadoop;
> the Sqoop implementation is an example.

• Since all mappers open a connection to the same DBS, one cannot use hundreds of mappers. Is there a solution to this problem?

> The number of mappers shouldn't be more than the number of connections
> allowed for that DB.

Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Yaron Gonen <[email protected]>
Date: Tue, 11 Sep 2012 15:41:26
To: <[email protected]>
Reply-To: [email protected]
Subject: Some general questions about DBInputFormat

Hi,
After reviewing the class's (not very complicated) code, I have some questions I hope someone can answer:

- (more general question) Are there many use-cases for using DBInputFormat? Do most Hadoop jobs take their input from files or DBs?
- What happens when the database is updated during the mappers' data retrieval phase? Is there a way to lock the database before the data retrieval phase and release it afterwards?
- Since all mappers open a connection to the same DBS, one cannot use hundreds of mappers. Is there a solution to this problem?

Thanks,
Yaron
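
On the connection-limit point in the reply above, here is a minimal sketch of the idea. It is not Hadoop or Sqoop code. The class `SplitPlanner`, its `plan` method, and the `maxDbConnections` parameter are hypothetical names made up for illustration. It caps the mapper count at the number of connections the database allows, then divides the rows into contiguous ranges, one per mapper, roughly the way a LIMIT/OFFSET-based input format would page through a table. The LIMIT/OFFSET clause assumes MySQL- or PostgreSQL-style syntax.

```java
import java.util.ArrayList;
import java.util.List;

public class SplitPlanner {

    /** One mapper's share of the table: rows [start, start + length). */
    public static final class RowRange {
        public final long start;
        public final long length;

        RowRange(long start, long length) {
            this.start = start;
            this.length = length;
        }

        /** Paging clause a mapper could append to its SELECT (assumed syntax). */
        public String pagingClause() {
            return "LIMIT " + length + " OFFSET " + start;
        }
    }

    /**
     * Caps the mapper count at the DB connection limit and splits
     * rowCount rows into one contiguous range per mapper.
     */
    public static List<RowRange> plan(long rowCount, int requestedMappers,
                                      int maxDbConnections) {
        if (rowCount < 0 || requestedMappers < 1 || maxDbConnections < 1) {
            throw new IllegalArgumentException("invalid split parameters");
        }
        // Never open more connections than the database permits.
        int mappers = Math.min(requestedMappers, maxDbConnections);
        // Never use more mappers than there are rows (but at least one).
        if (rowCount < mappers) {
            mappers = (int) Math.max(1, rowCount);
        }
        long chunk = rowCount / mappers;
        List<RowRange> ranges = new ArrayList<>();
        for (int i = 0; i < mappers; i++) {
            long start = i * chunk;
            // The last range absorbs the remainder of the integer division.
            long end = (i == mappers - 1) ? rowCount : start + chunk;
            ranges.add(new RowRange(start, end - start));
        }
        return ranges;
    }

    public static void main(String[] args) {
        // 300 mappers requested, but the DB is assumed to allow only 4 connections.
        for (RowRange r : plan(1000, 300, 4)) {
            System.out.println(r.pagingClause());
        }
    }
}
```

In a real job the cap would be applied through the job's configured number of map tasks rather than a helper like this; the sketch only shows the arithmetic.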
