Yes, these are JDBC calls made by the MR framework when inserting data into a SQL table. We read data from HBase, process it in some way and then insert it into a DBMS. The generated INSERT INTO statements are accumulated in a batch and executed by the MR framework via JDBC.
I am currently looking into writing my own DBOutputFormat descendant that can execute the JDBC batch more often, to avoid possible heap space problems when processing a large number of HBase rows.

Regards,
Thomas

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: Monday, 26 September 2011 18:40
To: [email protected]
Subject: Re: DBOutputFormat - JDBC batch size?

On Mon, Sep 26, 2011 at 5:35 AM, Steinmaurer Thomas <[email protected]> wrote:
> Is there a property to configure the executeBatch and commit interval
> somewhere?
>

Are these JDBC methods? What JDBC driver are you playing with? I'm not sure I understand your setup -- how do JDBC calls end up as HTable invocations?

There is a config you can set for the server that will limit how much is returned size-wise: "hbase.client.scanner.max.result.size". Otherwise, you need to play w/ the HTable batch sizes and calls to flush.

St.Ack
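A rough sketch of the "execute the JDBC batch more often" idea, using only standard java.sql calls. As far as I know, the stock DBOutputFormat record writer calls addBatch() per record and only runs executeBatch()/commit() when the task closes, so every pending INSERT stays buffered until then. The class below, PeriodicBatchWriter, and its batchSize parameter are hypothetical names of my own, not part of Hadoop; how you wire it into a DBOutputFormat subclass depends on your Hadoop version.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/**
 * Hypothetical helper, not part of Hadoop: queues rows with addBatch() and
 * runs executeBatch() + commit() every batchSize rows instead of once at the end.
 */
public class PeriodicBatchWriter implements AutoCloseable {
    private final Connection connection;
    private final PreparedStatement statement;
    private final int batchSize;
    private int pending = 0;    // rows queued since the last executeBatch()
    private int flushCount = 0; // number of executeBatch()/commit() rounds so far

    public PeriodicBatchWriter(Connection connection, PreparedStatement statement, int batchSize)
            throws SQLException {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.connection = connection;
        this.statement = statement;
        this.batchSize = batchSize;
        // Commit manually so that each flushed batch becomes one transaction.
        connection.setAutoCommit(false);
    }

    /** Queue the row whose parameters are currently bound on the statement. */
    public void addRow() throws SQLException {
        statement.addBatch();
        pending++;
        if (pending >= batchSize) {
            flush();
        }
    }

    /** Send all queued rows to the database and commit them. */
    public void flush() throws SQLException {
        if (pending == 0) {
            return;
        }
        statement.executeBatch();
        connection.commit();
        pending = 0;
        flushCount++;
    }

    public int getFlushCount() {
        return flushCount;
    }

    /** Flush the remaining rows and close the statement; the caller still owns the connection. */
    @Override
    public void close() throws SQLException {
        try {
            flush();
        } finally {
            statement.close();
        }
    }
}
```

In a custom record writer, write(key, value) would bind the row's parameters (for a DBWritable key, key.write(statement)) and then call addRow(); the writer's close() would call close() on the helper. With batchSize = 1000, at most 1000 INSERTs are ever held by the driver, at the cost of the task's output no longer being committed atomically.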
