Thanks for the help. I've added some more responses inline.

On Tue, Nov 29, 2016 at 9:51 PM, Stack <st...@duboce.net> wrote:

> On Mon, Nov 28, 2016 at 10:25 AM, Timothy Brown <t...@siftscience.com>
> wrote:
>
> > Responses inlined.
> >
> > ...
>
> > > >
> > > > What is the difference when you compare servers? More requests? More
> > i/o?
> > > Thread dump the metadata server and let us see a link in here? (What
> you
> > > attached below is cut-off... just as it is getting to the good part).
> > >
> > >
> > There are more requests to the server containing meta. The network bytes
> > in are greater for the meta regionserver than for the others, but the
> > network bytes out are less.
> >
> > Here's a dropbox link to the output:
> > https://dl.dropboxusercontent.com/u/54494127/thread_dump.txt
> > I apologize for the cliffhanger.
> >
> >
> The in bytes are < the out bytes on the hbase:meta server? Or compared to
> other servers? Queries are usually smaller than responses, and in the
> hbase:meta case I'd think we'd be mostly querying/reading, with out much
> bigger than in.
>
The bytes out were compared to the other servers, not to the bytes in on
the meta server itself.

>
> Anything else running on this machine besides Master?

No

>
>
> If you turn on RPC-level TRACE logging for a minute or so, anything about
> the client addresses that seems interesting?


Nothing seemed interesting to me, but you may have a different opinion.
Here are the logs: http://pastebin.com/FE8qVNH4
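In case it helps anyone reproduce this, the TRACE logging can be enabled
with something along these lines (a sketch only; I'm assuming the 1.2
RpcServer logger name, and the level can also be toggled temporarily from
the Log Level page on the regionserver web UI instead of editing the file):

    # log4j.properties on the regionserver hosting hbase:meta (assumed logger name)
    # Revert to INFO after a minute or two; TRACE is very chatty at 3k req/s.
    log4j.logger.org.apache.hadoop.hbase.ipc.RpcServer=TRACE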

>
>
> Looking at the thread dump (thanks), you have 1k handlers running?
>
> Thread 1037 (B.defaultRpcServer.handler=999,queue=99,port=60020):
>
> They are all idle in this thread dump (Same for the readers).
>
> I've found that having handlers == # of cpus seems to do the best when
> mostly a random read workload.... If lots of writes, good to have a few
> extras in case one gets occupied but 1k is a little OTT. Any particular
> reason for this many handlers? Would suggest trying way less. Might help w/
> CPU. 1k is a lot.
>

This is a config that was changed in the past and that we've been meaning to
revisit. I'll try decreasing it and let you know the results. Hopefully this
is the culprit, since our servers only have 4 CPUs.
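
For reference, this is roughly the change I plan to try in hbase-site.xml on
the regionservers (a sketch; I'm assuming the standard
hbase.regionserver.handler.count property and picking a value a little above
our 4 cores, per your suggestion):

    <!-- hbase-site.xml; I believe the 1.2 default is 30, and we had raised it to 1000 -->
    <property>
      <name>hbase.regionserver.handler.count</name>
      <value>8</value> <!-- a small cushion over our 4 CPUs -->
    </property>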

>
> G1GC? (See HBASE-17072, "CPU usage starts to climb up to 90-100% when using
> G1GC; purge ThreadLocal usage")
>
We're not using G1GC.

>
> >
> > >
> > > > Here's some more info about our cluster:
> > > > HBase version 1.2
> > > >
> > >
> > > Which 1.2?
> > >
> > > 1.2.0 which is bundled with CDH 5.8.0
> >
> > >
> > >
> > > > Number of regions: 72
> > > > Number of tables: 97
> > > >
> > >
> > > On whole cluster? (Can't have more tables than regions...)
> > >
> > >
> > > An error on my part; I meant 72 region servers.
> >
> >
> > >
> > > > Approx. requests per second to meta region server: 3k
> > > >
> >
>
> That is not much. If it's all cached, you should be able to do way more
> than that.
>
That's what I was thinking, but we still see over 70% CPU usage on that
region server even though it only hosts the meta region. We're running it on
an AWS d2.xlarge instance.
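
One thing we're also double-checking on our side is whether every client
process reuses a single Connection, since (as far as I understand it) the
client caches region locations per Connection and should only go back to
hbase:meta on the first lookup for a region or when a cached location goes
stale. A minimal sketch of the pattern we expect in our clients (the table
and row names below are made up):

    // HBase 1.x client API; "events" and "some-row" are hypothetical names.
    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Result;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class MetaCacheCheck {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            // One long-lived Connection per process: region locations looked up
            // via hbase:meta are cached on this Connection, so repeated requests
            // should not generate additional meta traffic.
            try (Connection connection = ConnectionFactory.createConnection(conf);
                 Table table = connection.getTable(TableName.valueOf("events"))) {
                Result result = table.get(new Get(Bytes.toBytes("some-row")));
                System.out.println("row found: " + !result.isEmpty());
                // ...many more requests through the same Connection / Table.
            }
        }
    }

If a client were instead creating a new Connection per request, every one of
those would do its own meta lookup, which would look a lot like what we're
seeing.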

>
>
> > >
> > > Can you see who is hitting the meta region most? (Enable rpc-level TRACE
> > > logging on the server hosting meta for a minute or so and see where the
> > > requests are coming in from).
> > >
> > > What is your cache hit rate? Can you get it higher?
> > >
> > > Cache hit rate is above 99%. We see very few disk reads.
> >
> >
> > > Is there much writing going on against meta? Or is cluster stable
> > > regards region movement/creation?
> > >
> > > Writing is very infrequent. The cluster is stable with regards to
> > > region movement and creation.
> >
> > >
> > >
> > > > Approx. requests per second to entire HBase cluster: 90k
> > > >
> > > > Additional info:
> > > >
> > > >
> > > > From Storefile Metrics:
> > > > Stores Num: 1
> > > > Storefiles: 1
> > > > Storefile Size: 30m
> > > > Uncompressed Storefile Size: 30m
> >
>
> Super small.
>
> St.Ack
>
>
>
>
> > > > Index Size: 459k
> > > >
> > > >
> > > This from meta table? That is very small.
> > >
> > > Yes this is from the meta table.
> >
> >
> > >
> > > >
> > > > I/O for the region server with only meta on it:
> > > > 48M bytes in
> > > >
> > >
> > >
> > > What's all the writing about?
> > >
> > > I'm not sure. According to the AWS dashboard there are no disk writes
> > > at that time.
> >
> > >
> > >
> > > > 5.9B bytes out
> > > >
> > > >
> > > This is disk or network? If network, is that 5.9 bytes?
> > >
> > > This is network, and that's 5.9 billion bytes. (I'm using the AWS
> > > dashboard for this.)
> >
> >
> > > Thanks Tim,
> > > S
> > >
> > >
> > >
> > > > I used the debug dump on the region server's UI but it was too large
> > > > for pastebin, so here's a portion of it: http://pastebin.com/nkYhEceE
> > > >
> > > >
> > > > Thanks for the help,
> > > >
> > > > Tim
> > > >
> > >
> >
>
