On Fri, Jul 23, 2010 at 10:18 AM, Andrew Nguyen
<[email protected]> wrote:
>
> The jython page on the wiki was extremely useful.  I actually had never used 
> jython before but am a big fan of python for getting stuff up quickly so it 
> seemed to be a natural progression.  Having said that, I am looking at 
> importing a ton of rows (not sure how many, but hundreds of millions to 
> billions).  Are there any good examples on doing this as efficiently as 
> possible?  And, how does jython compare to a pure Java approach?
>

There is an old blog post of Ryan's from back when he was doing all he
could to not sully his paws with dirty java:
http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html
It's an old post.  Jython may have improved since then.

> Currently, I have a for loop just calling table.put(p) repeatedly.  I also 
> have WAL disabled, autoflush set to false, and increased the buffer.  
> Anything else I should consider?
>

You are on the right track.  You might want to move to Java, but time
the Jython import first.
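
A minimal sketch of how that timing might be taken, so the same number
can be compared between a Jython run and a Java run.  The names
time_import and rows_per_second are hypothetical, and the put/flush
callables are stand-ins: in a real script they would presumably wrap
table.put(p) and table.flushCommits(), which is an assumption about
the loader and is not shown in this thread.

```python
import time


def time_import(rows, put, flush, batch_size=10000):
    """Feed every row to put(), call flush() every batch_size rows,
    and return (row_count, elapsed_seconds)."""
    start = time.time()
    count = 0
    for row in rows:
        put(row)
        count += 1
        if count % batch_size == 0:
            flush()
    flush()  # push whatever is still sitting in the buffer
    return count, time.time() - start


def rows_per_second(count, elapsed):
    """Throughput figure to compare across runs."""
    if elapsed <= 0:
        return float("inf")
    return count / elapsed


if __name__ == "__main__":
    # Stand-in sink (hypothetical): a list plays the role of the table.
    buffered, stored = [], []

    def fake_put(row):
        buffered.append(row)

    def fake_flush():
        stored.extend(buffered)
        del buffered[:]

    n, secs = time_import(range(100000), fake_put, fake_flush)
    print("%d rows in %.3fs (%.0f rows/s)" % (n, secs, rows_per_second(n, secs)))
```

Timing a fixed sample (say a few million rows) with identical settings
in both versions keeps the comparison fair.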

There is also 
http://hbase.apache.org/docs/r0.20.5/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#bulk
which has been buggy until recently, though it should be working now.
It's good if you are doing single-column-family imports only.  Bulk
inserting usually gives an order-of-magnitude improvement in speed.
This bulk load facility got redone completely in TRUNK, and for sure
it works now.  It's super fancy; you can even bulk load into a running
table.  Read more here:
http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html

St.Ack

> Thanks!
>
> --Andrew
>
> --
> Andrew Nguyen
> [email protected]
>
> The information contained in this electronic message and any attachments to 
> this message are intended for the exclusive use of the addressee(s) and may 
> contain confidential or privileged information.  Any unauthorized review, 
> dissemination, distribution, or copying of this communication is prohibited.  
> If you are not the intended recipient, please notify the sender immediately 
> by reply e-mail, and destroy all copies of this message and any attachments 
> from your files.
>
>
>
>
>
>
> On Jul 23, 2010, at 10:05 AM, Stack wrote:
>
>> This is just our noisy client talking about the caching of region
>> locations out on the cluster (You are at DEBUG level).  Turn off DEBUG
>> in client if you'd rather not see the messages -- see the FAQ for how
>> -- or just ignore.  When they turn WARN or ERROR, start paying
>> attention.
>>
>> Did the jython page up on the wiki help?
>> Yours,
>> St.Ack
>>
>> On Fri, Jul 23, 2010 at 9:58 AM, Andrew Nguyen
>> <[email protected]> wrote:
>>> Hello all,
>>>
>>> I am running a job from jython that is importing time series data into 
>>> HBase.  I started to see the following messages and wanted to dive deeper 
>>> to find out if they are true errors or just debug messages:
>>>
>>> 10/07/23 09:51:07 DEBUG client.HConnectionManager$TableServers: Reloading 
>>> region subset,a40506-2016/07/23-20:33:30.296,1279902520534 location because 
>>> regionserver didn't accept updates; tries=0 of max=10, waiting=1000ms
>>> 10/07/23 09:51:08 DEBUG client.HConnectionManager$TableServers: Cached 
>>> location for .META.,,1 is 10.10.11.3:60020
>>> 10/07/23 09:51:08 DEBUG client.HConnectionManager$TableServers: 
>>> locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 
>>> because: No server address listed in .META. for region 
>>> subset,a40506-2016/07/24-07:00:35.528,1279903897169
>>> 10/07/23 09:51:09 DEBUG client.HConnectionManager$TableServers: Cached 
>>> location for subset,a40506-2016/07/24-07:00:35.528,1279903897169 is 
>>> 10.10.11.2:60020
>>>
>>> I did some searches on Google and this seems to point at a potential lack 
>>> of memory.  Currently, HBase is set up with a heap of 2G for each slave, and 
>>> there are 6 slaves.  Each slave has a total of 8G of RAM installed.  If you 
>>> guys have any guidance on what other settings I should look for, please let 
>>> me know.
>>>
>>> Thanks!
>>>
>>> --Andrew
>>>
>>> --
>>> Andrew Nguyen
>>> [email protected]
>>>
>
>
