Re: Slow MR data load to table

Bradford Stephens Tue, 21 Dec 2010 15:20:33 -0800

Yes, good point. Swappiness is set to 60; I suppose I should set it to 0?
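This is a sketch of how swappiness is usually inspected and changed on a Linux host. It assumes root access on each node. `vm.swappiness` and `/etc/sysctl.conf` are standard Linux. Treat 0 as the value being tried here, not a tuned recommendation for this cluster.

```shell
# Show the current value (60 is the usual kernel default).
cat /proc/sys/vm/swappiness

# Change it on the running kernel (needs root; lost on reboot).
sudo sysctl -w vm.swappiness=0

# Persist it across reboots.
echo 'vm.swappiness = 0' | sudo tee -a /etc/sysctl.conf
```

Lars's point covers all machines, so this would have to be applied on every node, not just one.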


On Tue, Dec 21, 2010 at 5:58 AM, Lars George wrote:
> Hi Bradford,
>
> I heard of this recently, and one of the things that bit the person
> in question in the butt was swapping. Could you check that all
> machines are positively healthy and not swapping, etc., just to rule
> out the (not so) obvious stuff.
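Here is a minimal read-only check for ruling swapping in or out, assuming Linux hosts. It reads `SwapTotal` and `SwapFree` from `/proc/meminfo` and prints how much swap is in use. The function name `swap_used_kib` is only illustrative.

```shell
#!/bin/sh
# Print the swap currently in use on this host, in KiB, from /proc/meminfo.
# Linux only; a host with no swap configured reports 0.
swap_used_kib() {
  awk '/^SwapTotal:/ { total = $2 }
       /^SwapFree:/  { free = $2 }
       END { print total - free }' /proc/meminfo
}

used=$(swap_used_kib)
echo "swap in use: ${used} KiB"
```

A snapshot only shows that swap was used at some point. To see whether a node is actively swapping during the load, watch the `si` and `so` columns of `vmstat 5`. Nonzero values there mean pages are moving to or from swap.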
>
> Lars
>
> On Mon, Dec 20, 2010 at 8:22 PM, Bradford Stephens wrote:
>> Aaaand, LZO is not enabled.
>>
>> On Mon, Dec 20, 2010 at 8:22 PM, Bradford Stephens wrote:
>>> FYI, here is the hbase-site: http://pastebin.com/z9aqy3dQ
>>>
>>> Also, in hbase-env:
>>>
>>> export HBASE_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode"
>>>
>>> Hrm, that seems suboptimal...
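Incremental CMS (`-XX:+CMSIncrementalMode`) was designed for machines with one or two processors, so on 8-core c1.xlarge boxes it is a plausible suspect. Below is a hedged sketch of an `hbase-env.sh` line without it, with GC logging added to check for long pauses. Several parts are assumptions to tune, not values from this thread:

- the occupancy fraction of 70;
- the log path;
- the `-XX:+PrintGC*` / `-Xloggc` flags, which apply to the Java 6-era HotSpot JVMs of this period. Newer JVMs replaced them with `-Xlog:gc`.

```shell
# hbase-env.sh (sketch): drop incremental CMS, start CMS collections a bit earlier,
# and log GC activity so long pauses show up.
# Assumptions: occupancy fraction 70 and the log path are illustrative values.
export HBASE_OPTS="-XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:/tmp/hbase-gc.log"
```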
>>>
>>> On Mon, Dec 20, 2010 at 7:55 PM, Bradford Stephens wrote:
>>>> Greetings HBase Homies,
>>>>
>>>> I'm running the 0.89 dev release (though I had this problem in 0.20.6 as
>>>> well), trying to load 10 x 8.5 CSV files from HDFS into an empty
>>>> HBase table.
>>>>
>>>> Loads are pretty slow: about 85,000 records/minute/node. Based on past
>>>> experience I'd expect at least 5x that. The cluster has 5 RegionServers
>>>> on AWS, c1.xlarge instances (7 GB RAM x 8 "cores"). Occasionally map
>>>> tasks die with "Failed to report status for 601 seconds. Killing!".
>>>> The WAL is disabled.
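Putting the numbers above together gives a rough sense of scale. This is a back-of-envelope sketch with two assumptions:

- "maybe 1k" per record means 1024 bytes;
- all 5 RegionServer nodes run at the quoted per-node rate.

Shell integer arithmetic truncates, so the figures are floors.

```shell
#!/bin/sh
# Back-of-envelope load rate from the figures in this post.
# Assumption: "maybe 1k" per record is taken as 1024 bytes.
records_per_min_per_node=85000
nodes=5
bytes_per_record=1024
hoped_speedup=5   # "at least 5x faster"

cluster_records_per_min=$((records_per_min_per_node * nodes))
node_kib_per_sec=$((records_per_min_per_node * bytes_per_record / 60 / 1024))
cluster_kib_per_sec=$((cluster_records_per_min * bytes_per_record / 60 / 1024))
target_records_per_min_per_node=$((records_per_min_per_node * hoped_speedup))

echo "cluster records/min:        $cluster_records_per_min"
echo "per-node KiB/s:             $node_kib_per_sec"
echo "cluster KiB/s:              $cluster_kib_per_sec"
echo "5x target records/min/node: $target_records_per_min_per_node"
```

That works out to roughly 1.4 MiB/s per node and about 6.9 MiB/s for the whole cluster. Separately, the "601 seconds" in the kill message lines up with Hadoop's default `mapred.task.timeout` of 600000 ms (10 minutes). A task that neither reads an input, writes an output, nor updates its status string for that long is killed.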
>>>>
>>>> What's odd is, I could have sworn it used to be *much* faster last
>>>> week. I don't remember the code changing. Could it be environmental?
>>>> top isn't displaying anything interesting.
>>>>
>>>> The schema is pretty simple. Each record is maybe 1 KB:
>>>> id_set:id, id_set:mid, id_set:aguid, id_set:sid
>>>> metadata:seq, metadata:rdu, metadata:deploytype, metadata:ver, metadata:type
>>>> event:event
>>>> data_set:ts, data_set:data, data_set:geo
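For anyone reproducing this, the schema above is a table with four column families. Here is a sketch of creating it through the HBase shell, assuming a hypothetical table name `events`. The qualifiers (`id`, `mid`, `seq`, ...) need no declaration, since HBase creates them on write. The LZO variant is commented out because LZO is not enabled on this cluster. It only works once the LZO codec is installed on every RegionServer.

```shell
# Create the table non-interactively (table name "events" is hypothetical).
hbase shell <<'EOF'
create 'events', 'id_set', 'metadata', 'event', 'data_set'
EOF

# With LZO installed on every RegionServer, the families could be declared compressed instead:
# hbase shell <<'EOF'
# create 'events', {NAME => 'id_set', COMPRESSION => 'LZO'}, {NAME => 'metadata', COMPRESSION => 'LZO'}, {NAME => 'event', COMPRESSION => 'LZO'}, {NAME => 'data_set', COMPRESSION => 'LZO'}
# EOF
```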
>>>>
>>>> The code is simple (I didn't write it):
>>>> (Main): http://pastebin.com/vmPgeqNj
>>>> (Mapper): http://pastebin.com/T2BQjs0k
>>>>
>>>> The logs are quite boring:
>>>> HMaster: http://pastebin.com/zvyvNc3k
>>>> RegionServer: http://pastebin.com/QvJ4J7Ps
>>>>
>>>>
>>>> Any ideas?
>>>>
>>>> --
>>>> Bradford Stephens,
>>>> Founder, Drawn to Scale
>>>> drawntoscalehq.com
>>>> 727.697.7528
>>>>
>>>> http://www.drawntoscalehq.com --  The intuitive, cloud-scale data
>>>> solution. Process, store, query, search, and serve all your data.
>>>>
>>>> http://www.roadtofailure.com -- The Fringes of Scalability, Social
>>>> Media, and Computer Science
>>>>
>>>
>>
>