Hi Yaron

Sqoop uses a similar implementation, so its source is a good place to look for details.

Replies inline
• (more general question) Are there many use-cases for using DBInputFormat? Do 
most Hadoop jobs take their input from files or DBs?

> From my limited experience, most MR jobs read their data from HDFS. DBInputFormat is
> useful for getting data out of an RDBMS and into Hadoop; Sqoop's implementation is an example.


• Since all mappers open a connection to the same DBS, one cannot use hundreds 
of mappers. Is there a solution to this problem? 

> The number of mappers shouldn't exceed the number of concurrent connections
> that the database allows.
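
To make the point above concrete, here is a minimal sketch in plain Java (no Hadoop dependency). It assumes the input is divided into contiguous row ranges, one per mapper, with the last range absorbing the remainder, which is roughly how I recall DBInputFormat splitting its input. The class MapperSplits, its methods, and the numbers in main are hypothetical and only for illustration. In a real job you would cap the map task count in the job configuration, or use Sqoop's --num-mappers option.

```java
import java.util.ArrayList;
import java.util.List;

public class MapperSplits {

    /** Half-open row range [start, end) that a single mapper would read. */
    public static final class Split {
        public final long start;
        public final long end;

        Split(long start, long end) {
            this.start = start;
            this.end = end;
        }

        @Override
        public String toString() {
            return "[" + start + ", " + end + ")";
        }
    }

    /** Cap the requested mapper count at the database's connection budget. */
    public static int cappedMappers(int requested, int maxConnections) {
        if (requested < 1 || maxConnections < 1) {
            throw new IllegalArgumentException("both arguments must be >= 1");
        }
        return Math.min(requested, maxConnections);
    }

    /** Divide rowCount rows into one contiguous range per mapper. */
    public static List<Split> splits(long rowCount, int mappers) {
        if (rowCount < 0 || mappers < 1) {
            throw new IllegalArgumentException("rowCount >= 0 and mappers >= 1 required");
        }
        List<Split> out = new ArrayList<>();
        long chunk = rowCount / mappers;
        for (int i = 0; i < mappers; i++) {
            long start = i * chunk;
            // The last mapper takes whatever is left after integer division.
            long end = (i + 1 == mappers) ? rowCount : start + chunk;
            out.add(new Split(start, end));
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: 200 mappers requested, the DB allows 4 connections.
        int mappers = cappedMappers(200, 4);
        for (Split s : splits(1000, mappers)) {
            System.out.println(s);
        }
    }
}
```

Each mapper still holds one connection for the duration of its range, so the cap only limits concurrency; it does not reduce the total load on the database.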



Regards
Bejoy KS

Sent from handheld, please excuse typos.

-----Original Message-----
From: Yaron Gonen <[email protected]>
Date: Tue, 11 Sep 2012 15:41:26 
To: <[email protected]>
Reply-To: [email protected]
Subject: Some general questions about DBInputFormat

Hi,
After reviewing the class's (not very complicated) code, I have some
questions I hope someone can answer:

   - (more general question) Are there many use-cases for using
   DBInputFormat? Do most Hadoop jobs take their input from files or DBs?
   - What happens when the database is updated during the mappers' data
   retrieval phase? Is there a way to lock the database before the data
   retrieval phase and release it afterwards?
   - Since all mappers open a connection to the same DBS, one cannot use
   hundreds of mappers. Is there a solution to this problem?

Thanks,
Yaron
