Re: High CPU Utilization by meta region

2016-12-05 Thread Timothy Brown
We discovered that we were running the canary on all of our RegionServers.
The canaries were reading from meta and driving up the CPU usage. When we
stopped the canary, CPU utilization on the RS that hosted only the meta region
looked like the usage on the other RegionServers.
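
For anyone hitting the same symptom: a per-client tally of the RPC-level TRACE output is one way to see who is reading meta. The following is only a minimal sketch in Python (not part of HBase). It assumes each log line contains the client's IPv4 address, optionally followed by ":port"; the real TRACE line layout is not shown in this thread, so the pattern and the function name tally_clients are illustrative and would need adjusting to the actual logs.

```python
import re
from collections import Counter

# Assumption: the client address shows up in each line as an IPv4 address,
# optionally followed by ":port". Adjust the pattern to the real log layout.
IPV4_RE = re.compile(r"\b(\d{1,3}(?:\.\d{1,3}){3})(?::\d+)?\b")


def tally_clients(lines):
    """Count log lines per client address (first IPv4 address on each line)."""
    counts = Counter()
    for line in lines:
        match = IPV4_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__":
    # Made-up lines for illustration only; these are not real HBase log lines.
    sample = [
        "made-up line from 10.0.0.1:5000",
        "made-up line from 10.0.0.2:5001",
        "made-up line from 10.0.0.1:5002",
    ]
    for address, count in tally_clients(sample).most_common():
        print(address, count)
```

With a canary running on every RegionServer, one would presumably see the requests spread across all RegionServer addresses rather than dominated by a single client.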

On Thu, Dec 1, 2016 at 8:43 PM, Stack  wrote:

> On Thu, Dec 1, 2016 at 3:48 PM, Timothy Brown  wrote:
> >
> > >
> > >
> > If you turn on RPC-level TRACE logging for a minute or so, anything about
> > > the client addresses that seems interesting?
> >
> >
> > Nothing seemed interesting to me but you may have a different opinion.
> > Here's the logs http://pastebin.com/FE8qVNH4
> >
> >
> Yeah. Nothing in there. There is no TRACE logging in there.
>
> St.Ack
>


Re: High CPU Utilization by meta region

2016-12-01 Thread Stack
On Thu, Dec 1, 2016 at 3:48 PM, Timothy Brown  wrote:
>
> >
> >
> If you turn on RPC-level TRACE logging for a minute or so, anything about
> > the client addresses that seems interesting?
>
>
> Nothing seemed interesting to me but you may have a different opinion.
> Here's the logs http://pastebin.com/FE8qVNH4
>
>
Yeah. Nothing in there. There is no TRACE logging in there.

St.Ack


Re: High CPU Utilization by meta region

2016-12-01 Thread Timothy Brown
Thanks for the help. I've added some more responses inline.

On Tue, Nov 29, 2016 at 9:51 PM, Stack  wrote:

> On Mon, Nov 28, 2016 at 10:25 AM, Timothy Brown 
> wrote:
>
> > Responses inlined.
> >
> > ...
>
> > > >
> > > > What is the difference when you compare servers? More requests? More
> > i/o?
> > > Thread dump the metadata server and let us see a link in here? (What
> you
> > > attached below is cut-off... just as it is getting to the good part).
> > >
> > >
> > > There are more requests to the server containing meta. The network in
> > bytes are greater for the meta regionserver than the others but the
> network
> > out bytes are less.
> >
> > Here's a dropbox link to the output:
> > https://dl.dropboxusercontent.com/u/54494127/thread_dump.txt
> > I apologize for the cliffhanger.
> >
> >
> The in bytes are < the out bytes on the hbase:meta server? Or compared to
> other servers? Queries are usually smaller than response and in hbase:meta
> case, I'd think that we'd be mostly querying/reading with out much bigger
> than in.
>
The bytes out was compared to the other servers.

>
> Anything else running on this machine besides Master?

No

>
>
> If you turn on RPC-level TRACE logging for a minute or so, anything about
> the client addresses that seems interesting?


Nothing seemed interesting to me but you may have a different opinion.
Here's the logs http://pastebin.com/FE8qVNH4

>
>
> Looking at the thread dump (thanks), you have 1k handlers running?
>
> Thread 1037 (B.defaultRpcServer.handler=999,queue=99,port=60020):
>
> They are all idle in this thread dump (Same for the readers).
>
> I've found that having handlers == # of cpus seems to do the best when
> mostly a random read workload. If lots of writes, good to have a few
> extras in case one gets occupied but 1k is a little OTT. Any particular
> reason for this many handlers? Would suggest trying way less. Might help w/
> CPU. 1k is a lot.
>

This looks like a config that was changed in the past that we wanted to
revisit. I'll try decreasing it and let you know the results.
Hopefully this is the culprit since our servers only have 4 CPUs.
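
A quick way to check how many handlers a RegionServer is actually running is to count the handler thread names in a saved thread dump. The following is a minimal sketch in Python (not part of HBase), assuming handler threads are named like the line Stack quoted, "B.defaultRpcServer.handler=999,queue=99,port=60020". The function names are made up for illustration.

```python
import re

# Matches handler thread names as they appear in the dump quoted above, e.g.
# "Thread 1037 (B.defaultRpcServer.handler=999,queue=99,port=60020):"
HANDLER_RE = re.compile(r"defaultRpcServer\.handler=(\d+),queue=(\d+),port=(\d+)")


def summarize_handlers(dump_text):
    """Return (distinct handler ids, distinct queue ids) found in a thread dump."""
    handlers = set()
    queues = set()
    for match in HANDLER_RE.finditer(dump_text):
        handlers.add(int(match.group(1)))
        queues.add(int(match.group(2)))
    return len(handlers), len(queues)


def handlers_per_cpu(handler_count, cpu_count):
    """Handlers per CPU; Stack's rule of thumb for random reads is about 1."""
    return handler_count / cpu_count


if __name__ == "__main__":
    # 1k handlers on a 4-CPU server, the figures mentioned in this thread.
    print(handlers_per_cpu(1000, 4))
```

The handler count itself is normally set with hbase.regionserver.handler.count in hbase-site.xml.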

>
> G1GC? (See HBASE-17072 CPU usage starts to climb up to 90-100% when using
> G1GC; purge ThreadLocal usage)
>
We're not using G1GC.

>
> >
> > >
> > > > Here's some more info about our cluster:
> > > > HBase version 1.2
> > > >
> > >
> > > Which 1.2?
> > >
> > > 1.2.0 which is bundled with CDH 5.8.0
> >
> > >
> > >
> > > > Number of regions: 72
> > > > Number of tables: 97
> > > >
> > >
> > > On whole cluster? (Can't have more tables than regions...)
> > >
> > >
> > > An error on my part, I meant to put 72 region servers.
> >
> >
> > >
> > > > Approx. requests per second to meta region server: 3k
> > > >
> >
>
> That is not much. If all cached should be able to do way more than that.
>
That's what I was thinking, but we still get over 70% CPU usage on that
region server when it only hosts the meta region. We're running it on an
AWS d2.xlarge instance.

>
>
> > >
> > > Can you see who is hitting the meta region most? (Enable rpc-level TRACE
> > > logging on the server hosting meta for a minute or so and see where the
> > > requests are coming in from).
> > >
> > > What is your cache hit rate? Can you get it higher?
> > >
> > > Cache hit rate is above 99%. We see very little disk reads.
> >
> >
> > > Is there much writing going on against meta? Or is cluster stable
> regards
> > > region movement/creation?
> > >
> > > Writing is very infrequent. The cluster is stable with regards to
> region
> > movement and creation.
> >
> > >
> > >
> > > > Approx. requests per second to entire HBase cluster: 90k
> > > >
> > > > Additional info:
> > > >
> > > >
> > > > From Storefile Metrics:
> > > > Stores Num: 1
> > > > Storefiles: 1
> > > > Storefile Size: 30m
> > > > Uncompressed Storefile Size: 30m
> >
>
> Super small.
>
> St.Ack
>
>
>
>
> > > > Index Size: 459k
> > > >
> > > >
> > > This from meta table? That is very small.
> > >
> > > Yes this is from the meta table.
> >
> >
> > >
> > > >
> > > > I/O for the region server with only meta on it:
> > > > 48M bytes in
> > > >
> > >
> > >
> > > What's all the writing about?
> > >
> > > I'm not sure. According to the AWS dashboard there are no disk writes
> at
> > that time.
> >
> > >
> > >
> > > > 5.9B bytes out
> > > >
> > > >
> > > This is disk or network? If network, is that 5.9 bytes?
> > >
> > > This is network and that's 5.9 billion bytes. (I'm using the AWS
> > > dashboard for this)
> >
> >
> > > Thanks Tim,
> > > S
> > >
> > >
> > >
> > > > I used the debug dump on the region server's UI but it was too large
> > > > for paste bin so here's a portion of it:
> http://pastebin.com/nkYhEceE
> > > >
> > > >
> > > > Thanks for the help,
> > > >
> > > > Tim
> > > >
> > >
> >
>


Re: High CPU Utilization by meta region

2016-11-29 Thread Stack
On Mon, Nov 28, 2016 at 10:25 AM, Timothy Brown  wrote:

> Responses inlined.
>
> ...

> > >
> > > What is the difference when you compare servers? More requests? More
> i/o?
> > Thread dump the metadata server and let us see a link in here? (What you
> > attached below is cut-off... just as it is getting to the good part).
> >
> >
> > There are more requests to the server containing meta. The network in
> bytes are greater for the meta regionserver than the others but the network
> out bytes are less.
>
> Here's a dropbox link to the output:
> https://dl.dropboxusercontent.com/u/54494127/thread_dump.txt
> I apologize for the cliffhanger.
>
>
The in bytes are < the out bytes on the hbase:meta server? Or compared to
other servers? Queries are usually smaller than response and in hbase:meta
case, I'd think that we'd be mostly querying/reading with out much bigger
than in.

Anything else running on this machine besides Master?

If you turn on RPC-level TRACE logging for a minute or so, anything about
the client addresses that seems interesting?

Looking at the thread dump (thanks), you have 1k handlers running?

Thread 1037 (B.defaultRpcServer.handler=999,queue=99,port=60020):

They are all idle in this thread dump (Same for the readers).

I've found that having handlers == # of cpus seems to do the best when
mostly a random read workload. If lots of writes, good to have a few
extras in case one gets occupied but 1k is a little OTT. Any particular
reason for this many handlers? Would suggest trying way less. Might help w/
CPU. 1k is a lot.

G1GC? (See HBASE-17072 CPU usage starts to climb up to 90-100% when using
G1GC; purge ThreadLocal usage)


>
> >
> > > Here's some more info about our cluster:
> > > HBase version 1.2
> > >
> >
> > Which 1.2?
> >
> > 1.2.0 which is bundled with CDH 5.8.0
>
> >
> >
> > > Number of regions: 72
> > > Number of tables: 97
> > >
> >
> > On whole cluster? (Can't have more tables than regions...)
> >
> >
> > An error on my part, I meant to put 72 region servers.
>
>
> >
> > > Approx. requests per second to meta region server: 3k
> > >
>

That is not much. If all cached should be able to do way more than that.



> >
> > Can you see who is hitting the meta region most? (Enable rpc-level TRACE
> > logging on the server hosting meta for a minute or so and see where the
> > requests are coming in from).
> >
> > What is your cache hit rate? Can you get it higher?
> >
> > Cache hit rate is above 99%. We see very little disk reads.
>
>
> > Is there much writing going on against meta? Or is cluster stable regards
> > region movement/creation?
> >
> > Writing is very infrequent. The cluster is stable with regards to region
> movement and creation.
>
> >
> >
> > > Approx. requests per second to entire HBase cluster: 90k
> > >
> > > Additional info:
> > >
> > >
> > > From Storefile Metrics:
> > > Stores Num: 1
> > > Storefiles: 1
> > > Storefile Size: 30m
> > > Uncompressed Storefile Size: 30m
>

Super small.

St.Ack




> > > Index Size: 459k
> > >
> > >
> > This from meta table? That is very small.
> >
> > Yes this is from the meta table.
>
>
> >
> > >
> > > I/O for the region server with only meta on it:
> > > 48M bytes in
> > >
> >
> >
> > What's all the writing about?
> >
> > I'm not sure. According to the AWS dashboard there are no disk writes at
> that time.
>
> >
> >
> > > 5.9B bytes out
> > >
> > >
> > This is disk or network? If network, is that 5.9 bytes?
> >
> > This is network and that's 5.9 billion bytes. (I'm using the AWS dashboard
> > for this)
>
>
> > Thanks Tim,
> > S
> >
> >
> >
> > > I used the debug dump on the region server's UI but it was too large
> > > for paste bin so here's a portion of it: http://pastebin.com/nkYhEceE
> > >
> > >
> > > Thanks for the help,
> > >
> > > Tim
> > >
> >
>


Re: High CPU Utilization by meta region

2016-11-28 Thread Timothy Brown
Responses inlined.

On Mon, Nov 28, 2016 at 12:45 PM, Stack  wrote:

> On Sun, Nov 27, 2016 at 6:53 PM, Timothy Brown 
> wrote:
>
> > Hi Everyone,
> >
> > I apologize for starting an additional thread about this but I wasn't
> > subscribed to the users mailing list when I sent the original and can't
> > figure out how to respond to the original :(
> >
> > Original Message:
> >
> > We are seeing about 80% CPU utilization on the Region Server that solely
> > serves the meta table while other region servers typically have under 50%
> > CPU utilization. Is this expected?
> >
> What is the difference when you compare servers? More requests? More i/o?
> Thread dump the metadata server and let us see a link in here? (What you
> attached below is cut-off... just as it is getting to the good part).
>

There are more requests to the server containing meta. Network bytes in are
higher for the meta regionserver than for the others, but network bytes out
are lower.

Here's a dropbox link to the output:
https://dl.dropboxusercontent.com/u/54494127/thread_dump.txt
I apologize for the cliffhanger.


>
> > Here's some more info about our cluster:
> > HBase version 1.2
> >
>
> Which 1.2?
>
1.2.0, which is bundled with CDH 5.8.0.

>
>
> > Number of regions: 72
> > Number of tables: 97
> >
>
> On whole cluster? (Can't have more tables than regions...)
>
>
An error on my part; I meant to put 72 region servers.


>
> > Approx. requests per second to meta region server: 3k
> >
>
> Can you see who is hitting the meta region most? (Enable rpc-level TRACE
> logging on the server hosting meta for a minute or so and see where the
> requests are coming in from).
>
> What is your cache hit rate? Can you get it higher?
>
Cache hit rate is above 99%. We see very few disk reads.


> Is there much writing going on against meta? Or is cluster stable regards
> region movement/creation?
>
Writing is very infrequent. The cluster is stable with regards to region
movement and creation.

>
>
> > Approx. requests per second to entire HBase cluster: 90k
> >
> > Additional info:
> >
> >
> > From Storefile Metrics:
> > Stores Num: 1
> > Storefiles: 1
> > Storefile Size: 30m
> > Uncompressed Storefile Size: 30m
> > Index Size: 459k
> >
> >
> This from meta table? That is very small.
>
Yes, this is from the meta table.


>
> >
> > I/O for the region server with only meta on it:
> > 48M bytes in
> >
>
>
> What's all the writing about?
>
I'm not sure. According to the AWS dashboard there are no disk writes at
that time.

>
>
> > 5.9B bytes out
> >
> >
> This is disk or network? If network, is that 5.9 bytes?
>
This is network, and that's 5.9 billion bytes. (I'm using the AWS dashboard
for this.)
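
To put the numbers quoted in this thread side by side, here is a minimal sketch of the arithmetic in Python. It assumes the 48M bytes in and 5.9B bytes out are both network counters from the AWS dashboard over the same time window (the window itself is not stated in the thread), and the function names are made up for illustration.

```python
def out_in_ratio(bytes_out, bytes_in):
    """How many bytes leave the server for each byte that arrives."""
    return bytes_out / bytes_in


def request_share(part_rps, total_rps):
    """Fraction of cluster-wide requests that land on one server."""
    return part_rps / total_rps


if __name__ == "__main__":
    # Figures from this thread: 5.9B bytes out, 48M bytes in,
    # 3k requests/s to meta, 90k requests/s cluster-wide.
    print(round(out_in_ratio(5.9e9, 48e6), 1))
    print(round(request_share(3000, 90000) * 100, 1))
```

Under those assumptions the meta server sends out roughly 120 times more than it receives while handling only a few percent of the cluster's requests, which fits a read-dominated workload against meta.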


> Thanks Tim,
> S
>
>
>
> > I used the debug dump on the region server's UI but it was too large
> > for paste bin so here's a portion of it: http://pastebin.com/nkYhEceE
> >
> >
> > Thanks for the help,
> >
> > Tim
> >
>


Re: High CPU Utilization by meta region

2016-11-28 Thread Stack
On Sun, Nov 27, 2016 at 6:53 PM, Timothy Brown  wrote:

> Hi Everyone,
>
> I apologize for starting an additional thread about this but I wasn't
> subscribed to the users mailing list when I sent the original and can't
> figure out how to respond to the original :(
>
> Original Message:
>
> We are seeing about 80% CPU utilization on the Region Server that solely
> serves the meta table while other region servers typically have under 50%
> CPU utilization. Is this expected?
>
> What is the difference when you compare servers? More requests? More i/o?
Thread dump the metadata server and let us see a link in here? (What you
attached below is cut-off... just as it is getting to the good part).



> Here's some more info about our cluster:
> HBase version 1.2
>

Which 1.2?



> Number of regions: 72
> Number of tables: 97
>

On whole cluster? (Can't have more tables than regions...)



> Approx. requests per second to meta region server: 3k
>

Can you see who is hitting the meta region most? (Enable rpc-level TRACE
logging on the server hosting meta for a minute or so and see where the
requests are coming in from).

What is your cache hit rate? Can you get it higher?

Is there much writing going on against meta? Or is cluster stable regards
region movement/creation?



> Approx. requests per second to entire HBase cluster: 90k
>
> Additional info:
>
>
> From Storefile Metrics:
> Stores Num: 1
> Storefiles: 1
> Storefile Size: 30m
> Uncompressed Storefile Size: 30m
> Index Size: 459k
>
>
This from meta table? That is very small.


>
> I/O for the region server with only meta on it:
> 48M bytes in
>


What's all the writing about?



> 5.9B bytes out
>
>
This is disk or network? If network, is that 5.9 bytes?

Thanks Tim,
S



> I used the debug dump on the region server's UI but it was too large
> for paste bin so here's a portion of it: http://pastebin.com/nkYhEceE
>
>
> Thanks for the help,
>
> Tim
>


Re: High CPU Utilization by meta region

2016-11-28 Thread Timothy Brown
Hi Ted,

The region server hosting hbase:meta only has the meta region on it so it
has 1 region while other region servers can have more than 100 regions on
them.
I didn't notice anything interesting in the logs in my opinion. Is there
anything in particular I should watch out for?
The hbase:meta table was major compacted yesterday and we're still
experiencing the issue.

Thanks for the quick response,
Tim


On Mon, Nov 28, 2016 at 5:45 AM, Ted Yu  wrote:

> Does the region server hosting hbase:meta have roughly the same number of
> regions as the other servers ?
> Did you find anything interesting in the server log (where hbase:meta is
> hosted) ?
> Have you tried major compacting the hbase:meta table ?
> In 1.2, DEFAULT_HBASE_META_VERSIONS is still 10. See HBASE-16832
>
>
> On Sunday, November 27, 2016 6:53 PM, Timothy Brown <
> t...@siftscience.com> wrote:
>
>
>  Hi Everyone,
>
> I apologize for starting an additional thread about this but I wasn't
> subscribed to the users mailing list when I sent the original and can't
> figure out how to respond to the original :(
>
> Original Message:
>
> We are seeing about 80% CPU utilization on the Region Server that solely
> serves the meta table while other region servers typically have under 50%
> CPU utilization. Is this expected?
>
> Here's some more info about our cluster:
> HBase version 1.2
> Number of regions: 72
> Number of tables: 97
> Approx. requests per second to meta region server: 3k
> Approx. requests per second to entire HBase cluster: 90k
>
> Additional info:
>
>
> From Storefile Metrics:
> Stores Num: 1
> Storefiles: 1
> Storefile Size: 30m
> Uncompressed Storefile Size: 30m
> Index Size: 459k
>
>
> I/O for the region server with only meta on it:
> 48M bytes in
> 5.9B bytes out
>
> I used the debug dump on the region server's UI but it was too large
> for paste bin so here's a portion of it: http://pastebin.com/nkYhEceE
>
>
> Thanks for the help,
>
> Tim
>
>
>
>


Re: High CPU Utilization by meta region

2016-11-28 Thread Ted Yu
Does the region server hosting hbase:meta have roughly the same number of 
regions as the other servers ?
Did you find anything interesting in the server log (where hbase:meta is 
hosted) ?
Have you tried major compacting the hbase:meta table ?
In 1.2, DEFAULT_HBASE_META_VERSIONS is still 10. See HBASE-16832
 

On Sunday, November 27, 2016 6:53 PM, Timothy Brown  
wrote:
 

 Hi Everyone,

I apologize for starting an additional thread about this but I wasn't
subscribed to the users mailing list when I sent the original and can't
figure out how to respond to the original :(

Original Message:

We are seeing about 80% CPU utilization on the Region Server that solely
serves the meta table while other region servers typically have under 50%
CPU utilization. Is this expected?

Here's some more info about our cluster:
HBase version 1.2
Number of regions: 72
Number of tables: 97
Approx. requests per second to meta region server: 3k
Approx. requests per second to entire HBase cluster: 90k

Additional info:


From Storefile Metrics:
Stores Num: 1
Storefiles: 1
Storefile Size: 30m
Uncompressed Storefile Size: 30m
Index Size: 459k


I/O for the region server with only meta on it:
48M bytes in
5.9B bytes out

I used the debug dump on the region server's UI but it was too large
for paste bin so here's a portion of it: http://pastebin.com/nkYhEceE


Thanks for the help,

Tim


   

Re: High CPU utilization by meta region

2016-11-22 Thread Jean-Marc Spaggiari
To add to what Stack asked, do you have the metrics for your META vs the
other regions? Is the meta hot-spotted, which might create an increase in
the CPU usage? Not just the requests per second, but also the number of
calls. Does the META have way more? Or almost the same? Or less?

thanks,

JMS


2016-11-22 0:04 GMT-05:00 Stack :

> Can we see configs -- encodings? -- and a thread dump?  Any I/O? If you
> look in HDFS, many files under hbase:meta? Is it big? When was last time it
> major compacted?
>
> Thanks,
> S
>
> On Mon, Nov 21, 2016 at 5:50 PM, Timothy Brown 
> wrote:
>
> > Hi,
> >
> > We are seeing about 80% CPU utilization on the Region Server that solely
> > serves the meta table while other region servers typically have under 50%
> > CPU utilization. Is this expected?
> >
> > Here's some more info about our cluster:
> > HBase version 1.2
> > Number of regions: 72
> > Number of tables: 97
> > Approx. requests per second to meta region server: 3k
> > Approx. requests per second to entire HBase cluster: 90k
> >
> > Let me know what other information would be useful.
> >
> > Thanks for the help,
> > Tim
> >
>


Re: High CPU utilization by meta region

2016-11-21 Thread Stack
Can we see configs -- encodings? -- and a thread dump?  Any I/O? If you
look in HDFS, many files under hbase:meta? Is it big? When was last time it
major compacted?

Thanks,
S

On Mon, Nov 21, 2016 at 5:50 PM, Timothy Brown  wrote:

> Hi,
>
> We are seeing about 80% CPU utilization on the Region Server that solely
> serves the meta table while other region servers typically have under 50%
> CPU utilization. Is this expected?
>
> Here's some more info about our cluster:
> HBase version 1.2
> Number of regions: 72
> Number of tables: 97
> Approx. requests per second to meta region server: 3k
> Approx. requests per second to entire HBase cluster: 90k
>
> Let me know what other information would be useful.
>
> Thanks for the help,
> Tim
>