Re: Running an jython import job

Andrew Nguyen Fri, 23 Jul 2010 14:25:47 -0700

Thanks for the info.  I actually used that blog post as a starting point for my 
work with jython.


I will also take a look at the bulk loading you referenced below.  We are 
currently only doing single-cf imports.

--Andrew

--
Andrew Nguyen
[email protected]

The information contained in this electronic message and any attachments to 
this message are intended for the exclusive use of the addressee(s) and may 
contain confidential or privileged information.  Any unauthorized review, 
dissemination, distribution, or copying of this communication is prohibited.  
If you are not the intended recipient, please notify the sender immediately by 
reply e-mail, and destroy all copies of this message and any attachments from 
your files.

On Jul 23, 2010, at 10:31 AM, Stack wrote:

> On Fri, Jul 23, 2010 at 10:18 AM, Andrew Nguyen
> <[email protected]> wrote:
>> 
>> The jython page on the wiki was extremely useful.  I actually had never used 
>> jython before but am a big fan of python for getting stuff up quickly so it 
>> seemed to be a natural progression.  Having said that, I am looking at 
>> importing a ton of rows (not sure how many but hundreds of millions to 
>> billions).  Are there any good examples on doing this as efficiently as 
>> possible?  And, how does jython compare to a pure Java approach?
>> 
> 
> There is an old blog post of Ryan's from back when he was doing all he
> could to not sully his paws with dirty java:
> http://ryantwopointoh.blogspot.com/2009/01/performance-of-hbase-importing.html
> It's an old post.  Jython may have come on since then.
> 
>> Currently, I have a for loop just calling table.put(p) repeatedly.  I also 
>> have WAL disabled, autoflush set to false, and increased the buffer.  
>> Anything else I should consider?
>> 
> 
> You are on the right track.  You might want to move to java but do the
> timing first.
> 
> There is also 
> http://hbase.apache.org/docs/r0.20.5/api/org/apache/hadoop/hbase/mapreduce/package-summary.html#bulk
> which has been buggy up to this point, though it should be working now.
> It's good if you are doing single-columnfamily only imports.  Usually you
> can see order-of-magnitude improvement in speeds bulk inserting.  This
> bulk load facility got redone completely in TRUNK, and for sure it
> works now.  It's super fancy; you can even bulk load into a running
> table; read more here:
> http://hbase.apache.org/docs/r0.89.20100621/bulk-loads.html
> 
> St.Ack
> 
>> Thanks!
>> 
>> --Andrew
>> 
>> --
>> Andrew Nguyen
>> [email protected]
>> 
>> On Jul 23, 2010, at 10:05 AM, Stack wrote:
>> 
>>> This is just our noisy client talking about the caching of region
>>> locations out on the cluster (You are at DEBUG level).  Turn off DEBUG
>>> in client if you'd rather not see the messages -- see the FAQ for how
>>> -- or just ignore.  When they turn WARN or ERROR, start paying
>>> attention.
>>> 
>>> Did the jython page up on the wiki help?
>>> Yours,
>>> St.Ack
>>> 
>>> On Fri, Jul 23, 2010 at 9:58 AM, Andrew Nguyen
>>> <[email protected]> wrote:
>>>> Hello all,
>>>> 
>>>> I am running a job from jython that is importing time series data into 
>>>> HBase.  I started to see the following messages and wanted to dive deeper 
>>>> to find out if they are true errors or just debug messages:
>>>> 
>>>> 10/07/23 09:51:07 DEBUG client.HConnectionManager$TableServers: Reloading 
>>>> region subset,a40506-2016/07/23-20:33:30.296,1279902520534 location 
>>>> because regionserver didn't accept updates; tries=0 of max=10, 
>>>> waiting=1000ms
>>>> 10/07/23 09:51:08 DEBUG client.HConnectionManager$TableServers: Cached 
>>>> location for .META.,,1 is 10.10.11.3:60020
>>>> 10/07/23 09:51:08 DEBUG client.HConnectionManager$TableServers: 
>>>> locateRegionInMeta attempt 0 of 10 failed; retrying after sleep of 1000 
>>>> because: No server address listed in .META. for region 
>>>> subset,a40506-2016/07/24-07:00:35.528,1279903897169
>>>> 10/07/23 09:51:09 DEBUG client.HConnectionManager$TableServers: Cached 
>>>> location for subset,a40506-2016/07/24-07:00:35.528,1279903897169 is 
>>>> 10.10.11.2:60020
>>>> 
>>>> I did some searches on google and this seems to point at the potential 
>>>> lack of memory.  Currently, HBase is set up with a heap of 2G for each 
>>>> slave, and there are 6 slaves.  Each slave has a total of 8G of RAM 
>>>> installed.  If you guys have any guidance on what other settings I should 
>>>> look for, please let me know.
>>>> 
>>>> Thanks!
>>>> 
>>>> --Andrew
>>>> 
>>>> --
>>>> Andrew Nguyen
>>>> [email protected]
>>>> 
>> 
>>
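
For reference, the client-side settings mentioned in this thread (autoflush off, a larger write buffer, a loop of put calls) boil down to batching writes so that many rows go out per round trip.  A minimal plain-Python sketch of that idea follows.  BufferedWriter, send_batch, and buffer_bytes are hypothetical names used for illustration only; this is not the HBase client API, and the byte accounting is a simplifying assumption.

```python
class BufferedWriter(object):
    """Hypothetical stand-in for a client with autoflush disabled."""

    def __init__(self, send_batch, buffer_bytes):
        self.send_batch = send_batch      # callable that ships a list of rows
        self.buffer_bytes = buffer_bytes  # flush threshold, in bytes
        self.pending = []                 # rows waiting to be sent
        self.pending_bytes = 0            # rough size of the waiting rows

    def put(self, row_key, value):
        # Queue the row locally instead of sending it right away.
        self.pending.append((row_key, value))
        self.pending_bytes += len(row_key) + len(value)
        # Ship everything in one batch once the buffer is full.
        if self.pending_bytes >= self.buffer_bytes:
            self.flush()

    def flush(self):
        # Send whatever is buffered, then start a fresh buffer.
        if self.pending:
            self.send_batch(self.pending)
            self.pending = []
            self.pending_bytes = 0


# Illustration: record each batch in a list instead of sending it anywhere.
batches = []
writer = BufferedWriter(batches.append, buffer_bytes=64)
for i in range(10):
    writer.put("row-%04d" % i, "v" * 8)  # 8-byte key + 8-byte value
writer.flush()  # ship the rows still buffered at the end of the loop
```

With a 64-byte threshold and 16 bytes per row, the ten puts go out as three batches rather than ten separate sends.  The final explicit flush matters: without it the last partial batch would never be sent.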
