On Fri, Jul 23, 2010 at 10:18 AM, Andrew Nguyen <[email protected]> wrote:
>
> The jython page on the wiki was extremely useful. I actually had never used
> jython before but am a big fan of python for getting stuff up quickly so it
> seemed to be a natural progression. Having said that, I am looking at
> importing a ton of rows (not sure how much but hundreds of millions to
> billions). Are there any good examples on doing this as efficiently as
> possible? And, how does jython compare to a pure Java approach?
>

There is an old blog post of Ryan's from back when he was doing all he could
to not sully his paws with dirty java:
http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html
It's an old post. Jython may have come on since then.

> Currently, I have a for loop just calling table.put(p) repeatedly. I also
> have WAL disabled, autoflush set to false, and increased the buffer.
> Anything else I should consider?
>

You are on the right track. You might want to move to java but do the timing
first.

There is also
http://hbase.apache.org/docs/r0.20.5/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#bulk
which has been buggy up to now, though it should be working at this point.
It's good if you are doing single-columnfamily only imports. Usually you can
see an order-of-magnitude improvement in speed when bulk inserting.

This bulk load facility got redone completely in TRUNK, and for sure it works
now. It's super fancy; you can even bulk load into a running table; read more
here: http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html

St.Ack

> Thanks!
>
> --Andrew
>
> --
> Andrew Nguyen
> [email protected]
>
> The information contained in this electronic message and any attachments to
> this message are intended for the exclusive use of the addressee(s) and may
> contain confidential or privileged information. Any unauthorized review,
> dissemination, distribution, or copying of this communication is prohibited.
> If you are not the intended recipient, please notify the sender immediately
> by reply e-mail, and destroy all copies of this message and any attachments
> from your files.
>
> On Jul 23, 2010, at 10:05 AM, Stack wrote:
>
>> This is just our noisy client talking about the caching of region
>> locations out on the cluster (You are at DEBUG level). Turn off DEBUG
>> in client if you'd rather not see the messages -- see the FAQ for how
>> -- or just ignore. When they turn WARN or ERROR, start paying
>> attention.
>>
>> Did the jython page up on wiki help?
>> Yours,
>> St.Ack
>>
>> On Fri, Jul 23, 2010 at 9:58 AM, Andrew Nguyen
>> <[email protected]> wrote:
>>> Hello all,
>>>
>>> I am running a job from jython that is importing time series data into
>>> HBase. I started to see the following messages and wanted to dive deeper
>>> to find out if they are true errors or just debug messages:
>>>
>>> 10/07/23 09:51:07 DEBUG client.HConnectionManager$TableServers: Reloading
>>> region subset,a40506-2016/07/23-20:33:30.296,1279902520534 location because
>>> regionserver didn't accept updates; tries=0 of max=10, waiting=1000ms
>>> 10/07/23 09:51:08 DEBUG client.HConnectionManager$TableServers: Cached
>>> location for .META.,,1 is 10.10.11.3:60020
>>> 10/07/23 09:51:08 DEBUG client.HConnectionManager$TableServers:
>>> locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000
>>> because: No server address listed in .META. for region
>>> subset,a40506-2016/07/24-07:00:35.528,1279903897169
>>> 10/07/23 09:51:09 DEBUG client.HConnectionManager$TableServers: Cached
>>> location for subset,a40506-2016/07/24-07:00:35.528,1279903897169 is
>>> 10.10.11.2:60020
>>>
>>> I did some searches on google and this seems to point at the potential lack
>>> of memory. Currently, HBase is setup with a heap of 2G for each slave, and
>>> there are 6 slaves. Each slave has a total of 8G of RAM installed. If you
>>> guys have any guidance on what other settings I should look for, please let
>>> me know.
>>>
>>> Thanks!
>>>
>>> --Andrew
>>>
>>> --
>>> Andrew Nguyen
>>> [email protected]
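
The advice in this thread (buffer the puts rather than sending them one at a time, and time the loop before deciding to port to Java) can be sketched in plain Python. This is only an illustration under stated assumptions: `BufferedTable` and `timed_import` are hypothetical names, the "table" is an in-memory list, and nothing here is the HBase client API. It only shows how a client-side buffer cuts the number of round trips and how to measure the loop.

```python
import time


class BufferedTable(object):
    """Hypothetical stand-in for a client-side write buffer; not the HBase API."""

    def __init__(self, send_batch, buffer_limit):
        self.send_batch = send_batch      # callable that ships a list of rows
        self.buffer_limit = buffer_limit  # rows held before an automatic flush
        self.buffer = []
        self.round_trips = 0

    def put(self, row):
        # Queue the row locally; only ship when the buffer is full.
        self.buffer.append(row)
        if len(self.buffer) >= self.buffer_limit:
            self.flush()

    def flush(self):
        # Ship everything queued so far as one batch (one simulated round trip).
        if self.buffer:
            self.send_batch(self.buffer)
            self.round_trips += 1
            self.buffer = []


def timed_import(rows, buffer_limit):
    """Load rows through a BufferedTable; return (rows stored, round trips, seconds)."""
    stored = []  # in-memory stand-in for the remote table
    table = BufferedTable(stored.extend, buffer_limit)
    start = time.time()
    for row in rows:
        table.put(row)
    table.flush()  # ship whatever is left in the buffer
    elapsed = time.time() - start
    return len(stored), table.round_trips, elapsed
```

Calling `timed_import(range(100000), 1)` and then `timed_import(range(100000), 1000)` gives a rough feel for how the buffer size changes the round-trip count. Against a real cluster the same timing wrapper would go around the actual `table.put(p)` loop, and the numbers from that run, not from this sketch, are the ones to compare with a Java version.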
