Yes, these are JDBC calls made by the MR framework when inserting data
into a SQL table. We read data from HBase, process it in some way, and
then insert the result into a DBMS. The generated INSERT INTO statements
are accumulated in a batch and executed by the MR framework via JDBC.

I am currently looking into writing my own DBOutputFormat descendant
that executes the JDBC batch more often, to avoid possible heap space
problems when processing a large number of HBase rows.
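
To make the idea concrete, here is a rough sketch of the flushing logic I
have in mind. It is plain JDBC only and not wired into Hadoop yet; the
class name IntervalBatchWriter and the flushInterval parameter are my own
placeholders, not existing DBOutputFormat settings. A custom RecordWriter
could call addRow() after binding each record and flush() once on close.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/**
 * Hypothetical helper (not part of Hadoop): executes and commits the JDBC
 * batch every flushInterval rows instead of once at the end of the task.
 */
final class IntervalBatchWriter {
    private final Connection connection;
    private final PreparedStatement statement;
    private final int flushInterval;
    private int pending = 0;

    IntervalBatchWriter(Connection connection, PreparedStatement statement,
                        int flushInterval) {
        if (flushInterval < 1) {
            throw new IllegalArgumentException("flushInterval must be >= 1");
        }
        this.connection = connection;
        this.statement = statement;
        this.flushInterval = flushInterval;
    }

    /** Queues the row currently bound to the statement; flushes when full. */
    void addRow() throws SQLException {
        statement.addBatch();
        pending++;
        if (pending >= flushInterval) {
            flush();
        }
    }

    /** Sends any queued rows to the database and commits them. */
    void flush() throws SQLException {
        if (pending == 0) {
            return; // nothing queued, avoid an empty round trip
        }
        statement.executeBatch();
        connection.commit(); // assumes auto-commit is switched off
        pending = 0;
    }

    /** Number of rows queued since the last flush. */
    int pending() {
        return pending;
    }
}
```

One trade-off to be aware of: committing per interval means a failed task
can leave already committed rows in the table, so a retried task may insert
duplicates unless the inserts are idempotent.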

Regards,
Thomas

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of
Stack
Sent: Monday, 26 September 2011 18:40
To: [email protected]
Subject: Re: DBOutputFormat - JDBC batch size?

On Mon, Sep 26, 2011 at 5:35 AM, Steinmaurer Thomas
<[email protected]> wrote:
> Is there a property to configure the executeBatch and commit interval 
> somewhere?
>

Are these JDBC methods?  What JDBC Driver are you playing with?  I'm not
sure I understand your setup -- how JDBC calls end up as HTable
invocations.

There is a config. you can set for the server that will limit how much
is returned size-wise: "hbase.client.scanner.max.result.size".
Otherwise, you need to play w/ the HTable batch sizes and calls to
flush.

St.Ack
