Re: Hbase scans taking a lot of time

Jean-Marc Spaggiari Fri, 25 Jan 2013 10:06:30 -0800

Hi Vibhav,

Do you really need 13 different column families? Can't you find a way
to bundle them into 1 or 2 CFs at most? Maybe by prefixing the column
names?
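
Here is a minimal Python sketch of the prefixing idea, assuming it means moving every old column family into a single family and keeping the old family name as a prefix on each qualifier. The merged family name `b"d"`, the `b":"` separator, and the sample names `cf01`, `cf07`, and `price` are hypothetical. The code also assumes that old family names never contain the separator. It is pure data reshaping, not HBase client code.

```python
from typing import Dict, Tuple

# Hypothetical names: the thread does not say what the merged family or the
# separator should be. Assumes old family names never contain SEP.
MERGED_FAMILY = b"d"
SEP = b":"

Cell = Tuple[bytes, bytes]  # (family, qualifier)


def to_merged(family: bytes, qualifier: bytes) -> Cell:
    """Move an old (family, qualifier) into one family, prefixing the qualifier."""
    return MERGED_FAMILY, family + SEP + qualifier


def from_merged(qualifier: bytes) -> Cell:
    """Recover the old (family, qualifier); splits on the first SEP only."""
    family, _, rest = qualifier.partition(SEP)
    return family, rest


def regroup_row(row: Dict[Cell, bytes]) -> Dict[Cell, bytes]:
    """Rewrite one row's cells from the many-family layout to the single-family one."""
    return {to_merged(fam, qual): value for (fam, qual), value in row.items()}


old_row = {(b"cf01", b"price"): b"10", (b"cf07", b"price"): b"12"}
new_row = regroup_row(old_row)
# Both cells now live in family b"d" as b"cf01:price" and b"cf07:price".
```

Qualifiers are kept sorted within a family, so the prefix keeps each old group contiguous inside the row. The trade-off is that a scan needing only one old group now reads through the whole merged family.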


That might help...

JM
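
Luke's point, quoted below, is that HBase keeps each column family in its own set of files. The following toy Python model is not HBase's actual (Java) scanner, but it shows one consequence of that layout. Producing rows in key order means merging one sorted source per file per scanned family, so the number of sources grows with the number of families. The family names, file counts, and row counts are made up for illustration. The model also ignores memstores, block caching, and compactions.

```python
import heapq
from typing import Dict, Iterator, List, Tuple

# Toy model only: a cell is (row_key, family, qualifier, value), and each
# "file" is a list of cells already sorted in that order.
Cell = Tuple[str, str, str, str]
Stores = Dict[str, List[List[Cell]]]  # family name -> that family's files


def open_scan(stores: Stores, families: List[str]) -> Tuple[int, Iterator[Cell]]:
    """Merge every file of every requested family into one row-ordered stream.

    Returns how many sorted sources the merge must juggle, plus the merged cells.
    """
    sources = [f for fam in families for f in stores.get(fam, [])]
    return len(sources), heapq.merge(*sources)


def make_stores(num_families: int, files_per_family: int, rows: int) -> Stores:
    """Build a synthetic table: each file of each family holds a cell for every row."""
    stores: Stores = {}
    for i in range(num_families):
        fam = f"cf{i:02d}"
        stores[fam] = [
            [(f"row{r:04d}", fam, f"q{j}", "v") for r in range(rows)]
            for j in range(files_per_family)
        ]
    return stores


wide = make_stores(num_families=13, files_per_family=3, rows=1000)
wide_sources, wide_stream = open_scan(wide, sorted(wide))
narrow = make_stores(num_families=1, files_per_family=3, rows=1000)
narrow_sources, narrow_stream = open_scan(narrow, sorted(narrow))
print(wide_sources, narrow_sources)  # 39 3
```

With 13 families, every step of the scan has to choose among 39 sorted inputs instead of 3. That growth is the cost of making each column family a physical locality group.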

2013/1/25, Vibhav Mundra <[email protected]>:
> The number of column families I have is 13, which I guess is okay?
>
> -Vibhav
>
>
> On Fri, Jan 25, 2013 at 11:01 PM, Luke Lu <[email protected]> wrote:
>
>> You'll have this problem if you have a large number of column families
>> being scanned/populated at the same time. Make sure the data you
>> scan/populate frequently are in the same column family (you can have many
>> columns in a column family). Unlike BigTable/Hypertable, which have the
>> concept of locality/access groups, HBase always stores column families in
>> separate files, essentially making a column family not only a logical
>> grouping mechanism but also a physical locality group.
>>
>>
>> On Fri, Jan 25, 2013 at 1:10 AM, Vibhav Mundra <[email protected]> wrote:
>>
>> > I am facing a very strange problem with HBase.
>> >
>> > This is what I did:
>> > a) Created a table using pre-partitioned splits.
>> > b) The column families are compressed with LZO.
>> > c) With the above configuration I am able to load 2 million rows
>> >    per minute into HBase.
>> > d) I have created a table with roughly 300 million rows, which took
>> >    about 3 hours to populate; the data size is 25 GB.
>> >
>> > e) But when I query the data, the performance is very bad.
>> >    Basically this is what I am seeing: high CPU, no disk I/O, and
>> >    network I/O at a rate of 6-7 MB/s.
>> >
>> >
>> > Because of this, scanning the table using Hive takes ages:
>> > around 24 hours for the whole table. Any idea how to debug
>> > this?
>> >
>> >
>> > -Vibhav
>> >
>>
>
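
The figures in Vibhav's original message can be converted into throughputs with simple arithmetic. The snippet below restates only the numbers from that message: 300 million rows, 2 million rows per minute during the load, 25 GB of data, and a 24-hour scan. The one added assumption is the conversion 1 GB = 1024 MB.

```python
# Figures from the original message; everything else is plain arithmetic.
ROWS = 300_000_000
LOAD_ROWS_PER_MIN = 2_000_000
SCAN_HOURS = 24
DATA_GB = 25  # assumption below: 1 GB = 1024 MB

load_rows_per_s = LOAD_ROWS_PER_MIN / 60              # ~33,333 rows/s
expected_load_hours = ROWS / LOAD_ROWS_PER_MIN / 60   # 2.5 h, in line with "about 3 hours"
scan_seconds = SCAN_HOURS * 3600                      # 86,400 s
scan_rows_per_s = ROWS / scan_seconds                 # ~3,472 rows/s
scan_mb_per_s = DATA_GB * 1024 / scan_seconds         # ~0.30 MB/s of stored data
slowdown = load_rows_per_s / scan_rows_per_s          # reads ~9.6x slower than writes

print(f"load: {load_rows_per_s:,.0f} rows/s, {expected_load_hours:.1f} h for all rows")
print(f"scan: {scan_rows_per_s:,.0f} rows/s, {scan_mb_per_s:.2f} MB/s of stored data")
print(f"scan is {slowdown:.1f}x slower than the load")
```

According to these numbers, the scan reads rows roughly ten times more slowly than they were written, and it consumes well under 1 MB/s of stored data. That gap is what the column-family advice in the replies above is aimed at.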
