Can you not force the driver to flush more often? You can ask the objects that HBase returns for their size and keep a running total. When the total tips over your max size threshold, force a flush of the batch on the JDBC driver.
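Roughly what I have in mind, as a minimal sketch only. The class name SizeBoundedBatch and its methods are made up for illustration and are not part of Hadoop, HBase, or JDBC. It assumes you call rowAdded() after each PreparedStatement.addBatch(), passing whatever size estimate you get from the HBase objects, and that the flush action wraps executeBatch() (plus a commit, if you want one).

```java
/**
 * Hypothetical helper: tracks the accumulated size of rows queued in a
 * JDBC batch and runs a flush action once a byte threshold is reached.
 */
class SizeBoundedBatch {
    private final long maxBytes;
    // Assumed to wrap something like stmt.executeBatch() and conn.commit().
    private final Runnable flushAction;
    private long pendingBytes = 0;
    private int pendingRows = 0;

    SizeBoundedBatch(long maxBytes, Runnable flushAction) {
        this.maxBytes = maxBytes;
        this.flushAction = flushAction;
    }

    /** Call once per row, right after adding it to the JDBC batch. */
    void rowAdded(long rowBytes) {
        pendingBytes += rowBytes;
        pendingRows++;
        if (pendingBytes >= maxBytes) {
            flush();
        }
    }

    /** Runs the flush action if anything is pending; call again on close. */
    void flush() {
        if (pendingRows == 0) {
            return; // nothing queued, so skip the round trip
        }
        flushAction.run();
        pendingBytes = 0;
        pendingRows = 0;
    }
}
```

In a custom DBOutputFormat-style record writer, write() would do the addBatch() and then rowAdded(), and close() would call flush() one last time so the tail of the batch is not lost.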
St.Ack

On Mon, Sep 26, 2011 at 11:02 PM, Steinmaurer Thomas <[email protected]> wrote:
> Yes, these are JDBC calls done in the MR framework when inserting data
> into a SQL table. We read data from HBase, process it in some way, and
> then insert it into a DBMS. The generated INSERT INTO calls are
> accumulated in a batch and executed by the MR framework via JDBC.
>
> I am currently looking into writing my own DBOutputFormat descendant
> that can execute the JDBC batch more often, to avoid possible heap
> space problems when processing a large number of HBase rows.
>
> Regards,
> Thomas
>
> -----Original Message-----
> From: [email protected] [mailto:[email protected]] On Behalf Of
> Stack
> Sent: Monday, 26 September 2011 18:40
> To: [email protected]
> Subject: Re: DBOutputFormat - JDBC batch size?
>
> On Mon, Sep 26, 2011 at 5:35 AM, Steinmaurer Thomas
> <[email protected]> wrote:
>> Is there a property to configure the executeBatch and commit interval
>> somewhere?
>>
>
> Are these JDBC methods? What JDBC Driver are you playing with? I'm not
> sure I understand your setup -- how JDBC calls end up as HTable
> invocations.
>
> There is a config. you can set for the server that will limit how much
> is returned size-wise: "hbase.client.scanner.max.result.size".
> Otherwise, you need to play w/ the HTable batch sizes and calls to
> flush.
>
> St.Ack
>
