Re: Creating HBase table with presplits

2016-12-13 Thread Sachin Jain
Thanks Saad!!

This is exactly what I had planned to implement, i.e. to map an unknown
keyspace to a known keyspace by using a hash algorithm like MD5, and then
split the table. Thanks once again!!
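
For anyone landing on this thread later, here is a minimal sketch of that
salting idea in Java; the 1024-bucket count, the zero-padded prefix width,
and the class/method names are illustrative assumptions, not code from this
thread:

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class SaltedKeys {

  private static final int BUCKETS = 1024;

  // Prepend a fixed-width bucket prefix derived from an MD5 hash of the key,
  // so writes spread evenly across the presplit regions.
  public static String saltedKey(String originalKey) throws NoSuchAlgorithmException {
    byte[] digest = MessageDigest.getInstance("MD5")
        .digest(originalKey.getBytes(StandardCharsets.UTF_8));
    // First two digest bytes give a value in [0, 65536); map it into [0, 1024).
    int bucket = (((digest[0] & 0xFF) << 8) | (digest[1] & 0xFF)) % BUCKETS;
    return String.format("%04d-%s", bucket, originalKey);
  }

  // Split keys "0001" .. "1023"; the first region implicitly starts at "".
  public static byte[][] splitKeys() {
    byte[][] splits = new byte[BUCKETS - 1][];
    for (int i = 1; i < BUCKETS; i++) {
      splits[i - 1] = String.format("%04d", i).getBytes(StandardCharsets.UTF_8);
    }
    return splits;
  }
}

The array from splitKeys() can be handed to Admin#createTable(HTableDescriptor,
byte[][]) so the table starts with one region per bucket; zero-padding the
prefix keeps the numeric bucket order and the lexicographic row-key order in
sync, which is what the region boundaries rely on.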


On Fri, Dec 2, 2016 at 7:18 PM, Saad Mufti  wrote:

> Forgot to mention: in the above example you would presplit into 1024
> regions, with start keys from "" to "1023".
>
> Cheers.
>
> 
> Saad
>
>
> On Fri, Dec 2, 2016 at 8:47 AM, Saad Mufti  wrote:
>
> > One way to do this without knowing your data (you still need some idea of
> > the size of the keyspace) is to prepend a fixed numeric prefix from a
> > suitable range based on a good hash like MD5. For example, let us say you
> > can predict your data will fit in about 1024 regions. You can decide to
> > prepend a prefix from 0 to 1023 to all your keys based on a suitable hash.
> >
> > The pros:
> >
> > 1. you get to pre-split without knowing your keyspace
> > 2. very hard, if not impossible, for unknown data providers to send you
> > data in some order that generates hotspots (unless of course the same key
> > is repeated over and over; you still have to watch out for that)
> >
> > The cons:
> >
> > 1. lose the ability to do scan in "natural" sorted order of your keyspace
> > as that order is not preserved anymore in HBase
> > 2. if you miscalculate your keyspace size by a lot, you are stuck with the
> > hash function and range you selected even if you later get more regions,
> > unless you're willing to do a complete migration to a new table
> >
> > Hope the above helps.
> >
> > 
> > Saad
> >
> >
> > On Tue, Nov 29, 2016 at 4:28 AM, Sachin Jain 
> > wrote:
> >
> >> Thanks Dave for your suggestions!
> >> Will let you know if I find some approach to tackle this situation.
> >>
> >> Regards
> >>
> >> On Mon, Nov 28, 2016 at 9:05 PM, Dave Latham 
> wrote:
> >>
> >> > If you truly have no way to predict anything about the distribution of
> >> > your data across the row key space, then you are correct that there is
> >> > no way to presplit your regions in an effective way.  Either you need
> >> > to make some starting guess, such as a small number of uniform splits,
> >> > or wait until you have some information about what the data will look
> >> > like.
> >> >
> >> > Dave
> >> >
> >> > On Mon, Nov 28, 2016 at 12:42 AM, Sachin Jain <
> sachinjain...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > I was going through the pre-splitting a table article [0] and it is
> >> > > mentioned that it is generally best practice to presplit your table.
> >> > > But don't we need to know the data in advance in order to presplit it?
> >> > >
> >> > > Question: What should be the best practice when we don't know what
> >> > > data is going to be inserted into HBase? Essentially I don't know the
> >> > > key range, so if I specify wrong splits, then either the first or the
> >> > > last split can be a hot region in my system.
> >> > >
> >> > > [0]: https://hbase.apache.org/book.html#rowkey.regionsplits
> >> > >
> >> > > Thanks
> >> > > -Sachin
> >> > >
> >> >
> >>
> >
> >
>


Re: [ANNOUNCE] New HBase Committer Josh Elser

2016-12-13 Thread Rohan Pednekar
Congratulations Josh!

Best,
Rohan

> On Dec 12, 2016, at 2:48 PM, Enis Söztutar  wrote:
> 
> Congrats Josh!
> 
> Enis
> 
> On Mon, Dec 12, 2016 at 11:39 AM, Esteban Gutierrez 
> wrote:
> 
>> Congrats and welcome, Josh!
>> 
>> esteban.
>> 
>> 
>> --
>> Cloudera, Inc.
>> 
>> 
>> On Sun, Dec 11, 2016 at 10:17 PM, Yu Li  wrote:
>> 
>>> Congratulations and welcome!
>>> 
>>> Best Regards,
>>> Yu
>>> 
>>> On 12 December 2016 at 12:47, Mikhail Antonov 
>>> wrote:
>>> 
 Congratulations Josh!
 
 -Mikhail
 
 On Sun, Dec 11, 2016 at 5:20 PM, 张铎  wrote:
 
> Congratulations!
> 
> 2016-12-12 9:03 GMT+08:00 Jerry He :
> 
>> Congratulations , Josh!
>> 
>> Good work on the PQS too.
>> 
>> Jerry
>> 
>> On Sun, Dec 11, 2016 at 12:14 PM, Josh Elser 
 wrote:
>> 
>>> Thanks, all. I'm looking forward to continuing to work with you
>>> all!
>>> 
>>> 
>>> Nick Dimiduk wrote:
>>> 
 On behalf of the Apache HBase PMC, I am pleased to announce that
 Josh
 Elser
 has accepted the PMC's invitation to become a committer on the
> project.
>> We
 appreciate all of Josh's generous contributions thus far and
>> look
>> forward
 to his continued involvement.
 
 Allow me to be the first to congratulate and welcome Josh into
>> his
 new
 role!
 
 
>> 
> 
 
 
 
 --
 Thanks,
 Michael Antonov
 
>>> 
>> 



Re: Hot Region Server With No Hot Region

2016-12-13 Thread Stack
On Tue, Dec 13, 2016 at 12:47 PM, Saad Mufti  wrote:

> Thanks everyone for the feedback. We tracked this down to a bad design
> using dynamic columns: there were a few (very few) rows that accumulated
> up to 200,000 dynamic columns. Any activity that caused us to try to read
> one of these rows resulted in a hot region server.
>
> Follow-up question: we are now in the process of cleaning up those rows as
> identified, but some are so big that trying to read them in the cleanup
> process kills it with out-of-memory exceptions. Is there any way to
> identify rows with too many columns without actually reading them all?
>
>
Can you upgrade and then read with partials enabled?

How are you doing your cleaning?

(In the past I've heard of folks narrowing down the culprit storefiles and
then offline rewriting hfiles with a variant on ./hbase/bin/hbase --config
~/conf_hbase org.apache.hadoop.hbase.io.hfile.HFile)
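
A minimal client-side sketch of what "read with partials enabled" can look
like, assuming a client and server recent enough to support partial results
(1.1+); the chunk size, row handling, and class name below are illustrative,
not anything from Saad's setup:

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowReader {

  // Read a single very wide row in bounded-size chunks instead of one huge Result.
  public static long countCells(Connection conn, String tableName, byte[] row)
      throws Exception {
    Scan scan = new Scan(row, Bytes.add(row, new byte[] { 0 }));  // just this one row
    scan.setAllowPartialResults(true);       // let the server return the row in pieces
    scan.setMaxResultSize(2 * 1024 * 1024);  // ~2 MB per RPC instead of the whole row
    long cells = 0;
    try (Table table = conn.getTable(TableName.valueOf(tableName));
         ResultScanner scanner = table.getScanner(scan)) {
      for (Result partial : scanner) {
        cells += partial.rawCells().length;  // process or build Deletes per chunk here
      }
    }
    return cells;
  }
}

With partial results, one logical row comes back as several Result objects, so
each chunk can be counted or cleaned up without ever materializing all 200,000
columns on the client at once.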

St.Ack







> Thanks.
>
> 
> Saad
>
>
> On Sat, Dec 3, 2016 at 3:20 PM, Ted Yu  wrote:
>
> > I took a look at the stack trace.
> >
> > Region server log would give us more detail on the frequency and duration
> > of compactions.
> >
> > Cheers
> >
> > On Sat, Dec 3, 2016 at 7:39 AM, Jeremy Carroll 
> > wrote:
> >
> > > I would check compaction, investigate throttling if it's causing high
> > CPU.
> > >
> > > On Sat, Dec 3, 2016 at 6:20 AM Saad Mufti 
> wrote:
> > >
> > > > No.
> > > >
> > > > 
> > > > Saad
> > > >
> > > >
> > > > On Fri, Dec 2, 2016 at 3:27 PM, Ted Yu 
> > wrote:
> > > >
> > > > > Somehow I couldn't access the pastebin (I am in China now).
> > > > > Did the region server showing the hotspot host meta?
> > > > > Thanks
> > > > >
> > > > > On Friday, December 2, 2016 11:53 AM, Saad Mufti <
> > > > saad.mu...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >
> > > > >  We're in AWS with D2.4xLarge instances. Each instance has 12
> > > independent
> > > > > spindles/disks from what I can tell.
> > > > >
> > > > > We have charted get_rate and mutate_rate by host and
> > > > >
> > > > > a) mutate_rate shows no real outliers
> > > > > b) read_rate shows the overall rate on the "hotspot" region server
> > is a
> > > > bit
> > > > > higher than every other server, not severely but enough that it is
> a
> > > bit
> > > > > noticeable. But when we chart get_rate on that server by region, no
> > one
> > > > > region stands out.
> > > > >
> > > > > get_rate chart by host:
> > > > >
> > > > > https://snag.gy/hmoiDw.jpg
> > > > >
> > > > > mutate_rate chart by host:
> > > > >
> > > > > https://snag.gy/jitdMN.jpg
> > > > >
> > > > > 
> > > > > Saad
> > > > >
> > > > >
> > > > > 
> > > > > Saad
> > > > >
> > > > >
> > > > > On Fri, Dec 2, 2016 at 2:34 PM, John Leach <
> jle...@splicemachine.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Here is what I see...
> > > > > >
> > > > > >
> > > > > > * Short Compaction Running on Heap
> > > > > > "regionserver/ip-10-99-181-146.aolp-prd.us-east-1.ec2.
> > > > > > aolcloud.net/10.99.181.146:60020-shortCompactions-1480229281547"
> -
> > > > > Thread
> > > > > > t@242
> > > > > >java.lang.Thread.State: RUNNABLE
> > > > > >at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.
> > > > > > compressSingleKeyValue(FastDiffDeltaEncoder.java:270)
> > > > > >at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.
> > > > > > internalEncode(FastDiffDeltaEncoder.java:245)
> > > > > >at org.apache.hadoop.hbase.io.encoding.
> > BufferedDataBlockEncoder.
> > > > > > encode(BufferedDataBlockEncoder.java:987)
> > > > > >at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.
> > > > > > encode(FastDiffDeltaEncoder.java:58)
> > > > > >at org.apache.hadoop.hbase.io
> > > > .hfile.HFileDataBlockEncoderImpl.encode(
> > > > > > HFileDataBlockEncoderImpl.java:97)
> > > > > >at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.write(
> > > > > > HFileBlock.java:866)
> > > > > >at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(
> > > > > > HFileWriterV2.java:270)
> > > > > >at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(
> > > > > > HFileWriterV3.java:87)
> > > > > >at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.
> > > > > > append(StoreFile.java:949)
> > > > > >at org.apache.hadoop.hbase.regionserver.compactions.
> > > > > > Compactor.performCompaction(Compactor.java:282)
> > > > > >at org.apache.hadoop.hbase.regionserver.compactions.
> > > > > > DefaultCompactor.compact(DefaultCompactor.java:105)
> > > > > >at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$
> > > > > > DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
> > > > > >at org.apache.hadoop.hbase.regionserver.HStore.compact(
> > > > > > HStore.java:1233)
> > > > > >at org.apache.hadoop.hbase.regionserver.HRegion.compact(
> > > > > > 

Re: Hot Region Server With No Hot Region

2016-12-13 Thread Ted Yu
I was looking at CellCounter but it doesn't provide what you are looking
for.

Maybe we can enhance it such that, given a threshold on the number of
qualifiers in a row (say 100,000), it outputs the rows which have at least
that many qualifiers.
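
Until something like that exists, a rough client-side sketch of the same idea:
scan key-only, in small batches, and report rows whose cell count crosses a
threshold. It still has to read the keys (though not the values), and the
table name and numbers are placeholders:

import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.filter.KeyOnlyFilter;
import org.apache.hadoop.hbase.util.Bytes;

public class WideRowFinder {

  public static void findWideRows(Connection conn, String tableName, long threshold)
      throws Exception {
    Scan scan = new Scan();
    scan.setFilter(new KeyOnlyFilter());  // strip values server-side, keep only keys
    scan.setBatch(1000);                  // at most 1000 cells per Result, so wide rows
                                          // arrive as a series of Results
    scan.setCaching(10);                  // keep each RPC response small
    try (Table table = conn.getTable(TableName.valueOf(tableName));
         ResultScanner scanner = table.getScanner(scan)) {
      byte[] currentRow = null;
      long count = 0;
      for (Result r : scanner) {
        if (currentRow == null || !Bytes.equals(currentRow, r.getRow())) {
          if (currentRow != null && count >= threshold) {
            System.out.println(Bytes.toStringBinary(currentRow) + " : " + count + " cells");
          }
          currentRow = r.getRow();
          count = 0;
        }
        count += r.rawCells().length;
      }
      if (currentRow != null && count >= threshold) {
        System.out.println(Bytes.toStringBinary(currentRow) + " : " + count + " cells");
      }
    }
  }
}

Because setBatch caps cells per Result, a 200,000-column row shows up as many
small Results with the same row key, and the loop just keeps accumulating
until the key changes.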

On Tue, Dec 13, 2016 at 12:47 PM, Saad Mufti  wrote:

> Thanks everyone for the feedback. We tracked this down to a bad design
> using dynamic columns: there were a few (very few) rows that accumulated
> up to 200,000 dynamic columns. Any activity that caused us to try to read
> one of these rows resulted in a hot region server.
>
> Follow-up question: we are now in the process of cleaning up those rows as
> identified, but some are so big that trying to read them in the cleanup
> process kills it with out-of-memory exceptions. Is there any way to
> identify rows with too many columns without actually reading them all?
>
> Thanks.
>
> 
> Saad
>
>
> On Sat, Dec 3, 2016 at 3:20 PM, Ted Yu  wrote:
>
> > I took a look at the stack trace.
> >
> > Region server log would give us more detail on the frequency and duration
> > of compactions.
> >
> > Cheers
> >
> > On Sat, Dec 3, 2016 at 7:39 AM, Jeremy Carroll 
> > wrote:
> >
> > > I would check compaction, investigate throttling if it's causing high
> > CPU.
> > >
> > > On Sat, Dec 3, 2016 at 6:20 AM Saad Mufti 
> wrote:
> > >
> > > > No.
> > > >
> > > > 
> > > > Saad
> > > >
> > > >
> > > > On Fri, Dec 2, 2016 at 3:27 PM, Ted Yu 
> > wrote:
> > > >
> > > > > Somehow I couldn't access the pastebin (I am in China now).
> > > > > Did the region server showing the hotspot host meta?
> > > > > Thanks
> > > > >
> > > > > On Friday, December 2, 2016 11:53 AM, Saad Mufti <
> > > > saad.mu...@gmail.com>
> > > > > wrote:
> > > > >
> > > > >
> > > > >  We're in AWS with D2.4xLarge instances. Each instance has 12
> > > independent
> > > > > spindles/disks from what I can tell.
> > > > >
> > > > > We have charted get_rate and mutate_rate by host and
> > > > >
> > > > > a) mutate_rate shows no real outliers
> > > > > b) read_rate shows the overall rate on the "hotspot" region server
> > is a
> > > > bit
> > > > > higher than every other server, not severely but enough that it is
> a
> > > bit
> > > > > noticeable. But when we chart get_rate on that server by region, no
> > one
> > > > > region stands out.
> > > > >
> > > > > get_rate chart by host:
> > > > >
> > > > > https://snag.gy/hmoiDw.jpg
> > > > >
> > > > > mutate_rate chart by host:
> > > > >
> > > > > https://snag.gy/jitdMN.jpg
> > > > >
> > > > > 
> > > > > Saad
> > > > >
> > > > >
> > > > > 
> > > > > Saad
> > > > >
> > > > >
> > > > > On Fri, Dec 2, 2016 at 2:34 PM, John Leach <
> jle...@splicemachine.com
> > >
> > > > > wrote:
> > > > >
> > > > > > Here is what I see...
> > > > > >
> > > > > >
> > > > > > * Short Compaction Running on Heap
> > > > > > "regionserver/ip-10-99-181-146.aolp-prd.us-east-1.ec2.
> > > > > > aolcloud.net/10.99.181.146:60020-shortCompactions-1480229281547"
> -
> > > > > Thread
> > > > > > t@242
> > > > > >java.lang.Thread.State: RUNNABLE
> > > > > >at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.
> > > > > > compressSingleKeyValue(FastDiffDeltaEncoder.java:270)
> > > > > >at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.
> > > > > > internalEncode(FastDiffDeltaEncoder.java:245)
> > > > > >at org.apache.hadoop.hbase.io.encoding.
> > BufferedDataBlockEncoder.
> > > > > > encode(BufferedDataBlockEncoder.java:987)
> > > > > >at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.
> > > > > > encode(FastDiffDeltaEncoder.java:58)
> > > > > >at org.apache.hadoop.hbase.io
> > > > .hfile.HFileDataBlockEncoderImpl.encode(
> > > > > > HFileDataBlockEncoderImpl.java:97)
> > > > > >at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.write(
> > > > > > HFileBlock.java:866)
> > > > > >at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(
> > > > > > HFileWriterV2.java:270)
> > > > > >at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(
> > > > > > HFileWriterV3.java:87)
> > > > > >at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.
> > > > > > append(StoreFile.java:949)
> > > > > >at org.apache.hadoop.hbase.regionserver.compactions.
> > > > > > Compactor.performCompaction(Compactor.java:282)
> > > > > >at org.apache.hadoop.hbase.regionserver.compactions.
> > > > > > DefaultCompactor.compact(DefaultCompactor.java:105)
> > > > > >at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$
> > > > > > DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
> > > > > >at org.apache.hadoop.hbase.regionserver.HStore.compact(
> > > > > > HStore.java:1233)
> > > > > >at org.apache.hadoop.hbase.regionserver.HRegion.compact(
> > > > > > HRegion.java:1770)
> > > > > >at 

Re: Hot Region Server With No Hot Region

2016-12-13 Thread Saad Mufti
Thanks everyone for the feedback. We tracked this down to a bad design
using dynamic columns: there were a few (very few) rows that accumulated
up to 200,000 dynamic columns. Any activity that caused us to try to read
one of these rows resulted in a hot region server.

Follow-up question: we are now in the process of cleaning up those rows as
identified, but some are so big that trying to read them in the cleanup
process kills it with out-of-memory exceptions. Is there any way to
identify rows with too many columns without actually reading them all?

Thanks.


Saad


On Sat, Dec 3, 2016 at 3:20 PM, Ted Yu  wrote:

> I took a look at the stack trace.
>
> Region server log would give us more detail on the frequency and duration
> of compactions.
>
> Cheers
>
> On Sat, Dec 3, 2016 at 7:39 AM, Jeremy Carroll 
> wrote:
>
> > I would check compaction, investigate throttling if it's causing high
> CPU.
> >
> > On Sat, Dec 3, 2016 at 6:20 AM Saad Mufti  wrote:
> >
> > > No.
> > >
> > > 
> > > Saad
> > >
> > >
> > > On Fri, Dec 2, 2016 at 3:27 PM, Ted Yu 
> wrote:
> > >
> > > > Somehow I couldn't access the pastebin (I am in China now).
> > > > Did the region server showing the hotspot host meta?
> > > > Thanks
> > > >
> > > > On Friday, December 2, 2016 11:53 AM, Saad Mufti <
> > > saad.mu...@gmail.com>
> > > > wrote:
> > > >
> > > >
> > > >  We're in AWS with D2.4xLarge instances. Each instance has 12
> > independent
> > > > spindles/disks from what I can tell.
> > > >
> > > > We have charted get_rate and mutate_rate by host and
> > > >
> > > > a) mutate_rate shows no real outliers
> > > > b) read_rate shows the overall rate on the "hotspot" region server
> is a
> > > bit
> > > > higher than every other server, not severely but enough that it is a
> > bit
> > > > noticeable. But when we chart get_rate on that server by region, no
> one
> > > > region stands out.
> > > >
> > > > get_rate chart by host:
> > > >
> > > > https://snag.gy/hmoiDw.jpg
> > > >
> > > > mutate_rate chart by host:
> > > >
> > > > https://snag.gy/jitdMN.jpg
> > > >
> > > > 
> > > > Saad
> > > >
> > > >
> > > > 
> > > > Saad
> > > >
> > > >
> > > > On Fri, Dec 2, 2016 at 2:34 PM, John Leach  >
> > > > wrote:
> > > >
> > > > > Here is what I see...
> > > > >
> > > > >
> > > > > * Short Compaction Running on Heap
> > > > > "regionserver/ip-10-99-181-146.aolp-prd.us-east-1.ec2.
> > > > > aolcloud.net/10.99.181.146:60020-shortCompactions-1480229281547" -
> > > > Thread
> > > > > t@242
> > > > >java.lang.Thread.State: RUNNABLE
> > > > >at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.
> > > > > compressSingleKeyValue(FastDiffDeltaEncoder.java:270)
> > > > >at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.
> > > > > internalEncode(FastDiffDeltaEncoder.java:245)
> > > > >at org.apache.hadoop.hbase.io.encoding.
> BufferedDataBlockEncoder.
> > > > > encode(BufferedDataBlockEncoder.java:987)
> > > > >at org.apache.hadoop.hbase.io.encoding.FastDiffDeltaEncoder.
> > > > > encode(FastDiffDeltaEncoder.java:58)
> > > > >at org.apache.hadoop.hbase.io
> > > .hfile.HFileDataBlockEncoderImpl.encode(
> > > > > HFileDataBlockEncoderImpl.java:97)
> > > > >at org.apache.hadoop.hbase.io.hfile.HFileBlock$Writer.write(
> > > > > HFileBlock.java:866)
> > > > >at org.apache.hadoop.hbase.io.hfile.HFileWriterV2.append(
> > > > > HFileWriterV2.java:270)
> > > > >at org.apache.hadoop.hbase.io.hfile.HFileWriterV3.append(
> > > > > HFileWriterV3.java:87)
> > > > >at org.apache.hadoop.hbase.regionserver.StoreFile$Writer.
> > > > > append(StoreFile.java:949)
> > > > >at org.apache.hadoop.hbase.regionserver.compactions.
> > > > > Compactor.performCompaction(Compactor.java:282)
> > > > >at org.apache.hadoop.hbase.regionserver.compactions.
> > > > > DefaultCompactor.compact(DefaultCompactor.java:105)
> > > > >at org.apache.hadoop.hbase.regionserver.DefaultStoreEngine$
> > > > > DefaultCompactionContext.compact(DefaultStoreEngine.java:124)
> > > > >at org.apache.hadoop.hbase.regionserver.HStore.compact(
> > > > > HStore.java:1233)
> > > > >at org.apache.hadoop.hbase.regionserver.HRegion.compact(
> > > > > HRegion.java:1770)
> > > > >at org.apache.hadoop.hbase.regionserver.CompactSplitThread$
> > > > > CompactionRunner.run(CompactSplitThread.java:520)
> > > > >at java.util.concurrent.ThreadPoolExecutor.runWorker(
> > > > > ThreadPoolExecutor.java:1142)
> > > > >at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> > > > > ThreadPoolExecutor.java:617)
> > > > >at java.lang.Thread.run(Thread.java:745)
> > > > >
> > > > >
> > > > > * WAL Syncs waiting…  ALL 5
> > > > > "sync.0" - Thread t@202
> > > > >java.lang.Thread.State: TIMED_WAITING
> > > > >at java.lang.Object.wait(Native Method)
> > > > >- waiting on 

Re: RegionServers repeatedly getting killed with GC Pause and Zookeeper timeout

2016-12-13 Thread Stack
On Tue, Dec 13, 2016 at 12:13 AM, Sandeep Reddy 
wrote:

> This week we are also facing the same problem.
>
> For at least 4 to 5 months we haven't changed any HBase configuration.
>
> All of a sudden we started seeing this pattern where regionservers are
> getting killed due to GC pauses & later zookeeper timeouts.
>
>
What changed? More reading/writing? Fewer machines in the cluster?




> We are using 5 GB for HBase heap & 6 GB for bucket cache.
>
>
Why this particular setup?

Have you tried giving 11G to HBase instead? Onheap is more performant if
most of your workload fits in RAM (does it?). Otherwise bucket cache is
better ... Can you give HBase more heap?

What versions are you running?

The posted log is reporting on a server dying. As per Duo Zhang, have you
tried correlating GC logs w/ events in the RegionServer log?

Have you tried any tuning/diagnosis at all?

St.Ack



>
> Following is the log from one of the regionserver:
>
> 2016-12-12 17:38:59,142 WARN  [regionserver60020.periodicFlusher]
> util.Sleeper: We slept 30938ms instead of 1ms, this is likely due to a
> long garbage collecting pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2016-12-12 17:40:25,076 INFO  [SplitLogWorker-HOST30,60020,
> 1481556258263-SendThread(HOSTM5:4181)] zookeeper.ClientCnxn: Socket
> connection established to HOSTM5/192.168.190.179:4181, initiating session
> 2016-12-12 17:38:59,142 WARN  [regionserver60020] util.Sleeper: We slept
> 19044ms instead of 3000ms, this is likely due to a long garbage collecting
> pause and it's usually bad, see http://hbase.apache.org/book.
> html#trouble.rs.runtime.zkexpired
> 2016-12-12 17:38:54,384 WARN  [regionserver60020.compactionChecker]
> util.Sleeper: We slept 23805ms instead of 1ms, this is likely due to a
> long garbage collecting pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2016-12-12 17:38:16,281 INFO  [regionserver60020-SendThread(HOSTM3:4181)]
> zookeeper.ClientCnxn: Socket connection established to HOSTM3/
> 192.168.167.7:4181, initiating session
> 2016-12-12 17:40:25,091 INFO  [regionserver60020-SendThread(HOSTM1:4181)]
> zookeeper.ClientCnxn: Socket connection established to HOSTM1/
> 192.168.178.226:4181, initiating session
> 2016-12-12 17:40:25,093 INFO  [regionserver60020-SendThread(HOSTM3:4181)]
> zookeeper.ClientCnxn: Client session timed out, have not heard from server
> in 128812ms for sessionid 0x558f30318e204de, closing socket connection and
> attempting reconnect
> 2016-12-12 17:40:25,093 INFO  [regionserver60020-SendThread(HOSTM2:4181)]
> zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session
> 0x558f30318e204df has expired, closing socket connection
> 2016-12-12 17:40:25,093 INFO  [SplitLogWorker-HOST30,60020,
> 1481556258263-SendThread(HOSTM5:4181)] zookeeper.ClientCnxn: Unable to
> reconnect to ZooKeeper service, session 0x458f30318de051d has expired,
> closing socket connection
> 2016-12-12 17:40:25,089 INFO  [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner 42156 lease expired on region
> PostsAnalysis-2016-11-5,exam,1480313370104.4e37b0f96946a104474a8edbba4f87
> fd.
> 2016-12-12 17:40:25,193 INFO  [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner 42155 lease expired on region
> PostsAnalysis-2016-11-4,exam,1480313365296.4c80cf384fcdc7bfb7c83f625f936c
> fe.
> 2016-12-12 17:40:25,194 FATAL [regionserver60020]
> regionserver.HRegionServer: ABORTING region server
> HOST30,60020,1481556258263: org.apache.hadoop.hbase.YouAreDeadException:
> Server REPORT rejected; currently processing HOST30,60020,1481556258263 as
> dead server
> at org.apache.hadoop.hbase.master.ServerManager.
> checkIsDead(ServerManager.java:370)
> at org.apache.hadoop.hbase.master.ServerManager.
> regionServerReport(ServerManager.java:275)
> at org.apache.hadoop.hbase.master.HMaster.
> regionServerReport(HMaster.java:1339)
> at org.apache.hadoop.hbase.protobuf.generated.
> RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(
> RegionServerStatusProtos.java:7912)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(
> FifoRpcScheduler.java:74)
> at java.util.concurrent.Executors$RunnableAdapter.
> call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
> org.apache.hadoop.hbase.YouAreDeadException: 
> org.apache.hadoop.hbase.YouAreDeadException:
> Server REPORT rejected; currently processing 

Re: [ANNOUNCE] New HBase Committer Josh Elser

2016-12-13 Thread Phil Yang
Congratulations!

Thanks,
Phil


2016-12-13 12:56 GMT+08:00 ramkrishna vasudevan <
ramkrishna.s.vasude...@gmail.com>:

> Congratulations Josh !!!
>
> Regards
> Ram
>
> On Tue, Dec 13, 2016 at 4:18 AM, Enis Söztutar  wrote:
>
> > Congrats Josh!
> >
> > Enis
> >
> > On Mon, Dec 12, 2016 at 11:39 AM, Esteban Gutierrez <
> este...@cloudera.com>
> > wrote:
> >
> > > Congrats and welcome, Josh!
> > >
> > > esteban.
> > >
> > >
> > > --
> > > Cloudera, Inc.
> > >
> > >
> > > On Sun, Dec 11, 2016 at 10:17 PM, Yu Li  wrote:
> > >
> > > > Congratulations and welcome!
> > > >
> > > > Best Regards,
> > > > Yu
> > > >
> > > > On 12 December 2016 at 12:47, Mikhail Antonov 
> > > > wrote:
> > > >
> > > > > Congratulations Josh!
> > > > >
> > > > > -Mikhail
> > > > >
> > > > > On Sun, Dec 11, 2016 at 5:20 PM, 张铎  wrote:
> > > > >
> > > > > > Congratulations!
> > > > > >
> > > > > > 2016-12-12 9:03 GMT+08:00 Jerry He :
> > > > > >
> > > > > > > Congratulations , Josh!
> > > > > > >
> > > > > > > Good work on the PQS too.
> > > > > > >
> > > > > > > Jerry
> > > > > > >
> > > > > > > On Sun, Dec 11, 2016 at 12:14 PM, Josh Elser <
> els...@apache.org>
> > > > > wrote:
> > > > > > >
> > > > > > > > Thanks, all. I'm looking forward to continuing to work with
> you
> > > > all!
> > > > > > > >
> > > > > > > >
> > > > > > > > Nick Dimiduk wrote:
> > > > > > > >
> > > > > > > >> On behalf of the Apache HBase PMC, I am pleased to announce
> > that
> > > > > Josh
> > > > > > > >> Elser
> > > > > > > >> has accepted the PMC's invitation to become a committer on
> the
> > > > > > project.
> > > > > > > We
> > > > > > > >> appreciate all of Josh's generous contributions thus far and
> > > look
> > > > > > > forward
> > > > > > > >> to his continued involvement.
> > > > > > > >>
> > > > > > > >> Allow me to be the first to congratulate and welcome Josh
> into
> > > his
> > > > > new
> > > > > > > >> role!
> > > > > > > >>
> > > > > > > >>
> > > > > > >
> > > > > >
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > Thanks,
> > > > > Michael Antonov
> > > > >
> > > >
> > >
> >
>


Graph Analytics on HBase with HGraphDB and Giraph

2016-12-13 Thread Robert Yokota
Hi,

In case anyone is interested in analyzing graphs in HBase with Apache
Giraph, this might be helpful:

https://yokota.blog/2016/12/13/graph-analytics-on-hbase-with-hgraphdb-and-giraph/


Re: RegionServers repeatedly getting killed with GC Pause and Zookeeper timeout

2016-12-13 Thread 张铎
Can you post the GC logs around 2016-12-12 17:38? It seems that your RS runs
STW Full GCs from time to time.

As Stack suggested above, you'd better get a heap dump and find out which
objects occupy the heap space.

And what's your on-heap block cache config and memstore config?

Thanks.

2016-12-13 16:13 GMT+08:00 Sandeep Reddy :

> This week we are also facing the same problem.
>
> For at least 4 to 5 months we haven't changed any HBase configuration.
>
> All of a sudden we started seeing this pattern where regionservers are
> getting killed due to GC pauses & later zookeeper timeouts.
>
> We are using 5 GB for HBase heap & 6 GB for bucket cache.
>
>
> Following is the log from one of the regionserver:
>
> 2016-12-12 17:38:59,142 WARN  [regionserver60020.periodicFlusher]
> util.Sleeper: We slept 30938ms instead of 1ms, this is likely due to a
> long garbage collecting pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2016-12-12 17:40:25,076 INFO  [SplitLogWorker-HOST30,60020,
> 1481556258263-SendThread(HOSTM5:4181)] zookeeper.ClientCnxn: Socket
> connection established to HOSTM5/192.168.190.179:4181, initiating session
> 2016-12-12 17:38:59,142 WARN  [regionserver60020] util.Sleeper: We slept
> 19044ms instead of 3000ms, this is likely due to a long garbage collecting
> pause and it's usually bad, see http://hbase.apache.org/book.
> html#trouble.rs.runtime.zkexpired
> 2016-12-12 17:38:54,384 WARN  [regionserver60020.compactionChecker]
> util.Sleeper: We slept 23805ms instead of 1ms, this is likely due to a
> long garbage collecting pause and it's usually bad, see
> http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
> 2016-12-12 17:38:16,281 INFO  [regionserver60020-SendThread(HOSTM3:4181)]
> zookeeper.ClientCnxn: Socket connection established to HOSTM3/
> 192.168.167.7:4181, initiating session
> 2016-12-12 17:40:25,091 INFO  [regionserver60020-SendThread(HOSTM1:4181)]
> zookeeper.ClientCnxn: Socket connection established to HOSTM1/
> 192.168.178.226:4181, initiating session
> 2016-12-12 17:40:25,093 INFO  [regionserver60020-SendThread(HOSTM3:4181)]
> zookeeper.ClientCnxn: Client session timed out, have not heard from server
> in 128812ms for sessionid 0x558f30318e204de, closing socket connection and
> attempting reconnect
> 2016-12-12 17:40:25,093 INFO  [regionserver60020-SendThread(HOSTM2:4181)]
> zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session
> 0x558f30318e204df has expired, closing socket connection
> 2016-12-12 17:40:25,093 INFO  [SplitLogWorker-HOST30,60020,
> 1481556258263-SendThread(HOSTM5:4181)] zookeeper.ClientCnxn: Unable to
> reconnect to ZooKeeper service, session 0x458f30318de051d has expired,
> closing socket connection
> 2016-12-12 17:40:25,089 INFO  [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner 42156 lease expired on region
> PostsAnalysis-2016-11-5,exam,1480313370104.4e37b0f96946a104474a8edbba4f87
> fd.
> 2016-12-12 17:40:25,193 INFO  [regionserver60020.leaseChecker]
> regionserver.HRegionServer: Scanner 42155 lease expired on region
> PostsAnalysis-2016-11-4,exam,1480313365296.4c80cf384fcdc7bfb7c83f625f936c
> fe.
> 2016-12-12 17:40:25,194 FATAL [regionserver60020]
> regionserver.HRegionServer: ABORTING region server
> HOST30,60020,1481556258263: org.apache.hadoop.hbase.YouAreDeadException:
> Server REPORT rejected; currently processing HOST30,60020,1481556258263 as
> dead server
> at org.apache.hadoop.hbase.master.ServerManager.
> checkIsDead(ServerManager.java:370)
> at org.apache.hadoop.hbase.master.ServerManager.
> regionServerReport(ServerManager.java:275)
> at org.apache.hadoop.hbase.master.HMaster.
> regionServerReport(HMaster.java:1339)
> at org.apache.hadoop.hbase.protobuf.generated.
> RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(
> RegionServerStatusProtos.java:7912)
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
> at org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(
> FifoRpcScheduler.java:74)
> at java.util.concurrent.Executors$RunnableAdapter.
> call(Executors.java:471)
> at java.util.concurrent.FutureTask.run(FutureTask.java:262)
> at java.util.concurrent.ThreadPoolExecutor.runWorker(
> ThreadPoolExecutor.java:1145)
> at java.util.concurrent.ThreadPoolExecutor$Worker.run(
> ThreadPoolExecutor.java:615)
> at java.lang.Thread.run(Thread.java:745)
>
> org.apache.hadoop.hbase.YouAreDeadException: 
> org.apache.hadoop.hbase.YouAreDeadException:
> Server REPORT rejected; currently processing HOST30,60020,1481556258263 as
> dead server
> at org.apache.hadoop.hbase.master.ServerManager.
> checkIsDead(ServerManager.java:370)
> at org.apache.hadoop.hbase.master.ServerManager.
> regionServerReport(ServerManager.java:275)

Re: RegionServers repeatedly getting killed with GC Pause and Zookeeper timeout

2016-12-13 Thread Sandeep Reddy
This week we are also facing the same problem.

For at least 4 to 5 months we haven't changed any HBase configuration.

All of a sudden we started seeing this pattern where regionservers are
getting killed due to GC pauses & later zookeeper timeouts.

We are using 5 GB for HBase heap & 6 GB for bucket cache.


Following is the log from one of the regionserver:

2016-12-12 17:38:59,142 WARN  [regionserver60020.periodicFlusher] util.Sleeper: 
We slept 30938ms instead of 1ms, this is likely due to a long garbage 
collecting pause and it's usually bad, see 
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2016-12-12 17:40:25,076 INFO  
[SplitLogWorker-HOST30,60020,1481556258263-SendThread(HOSTM5:4181)] 
zookeeper.ClientCnxn: Socket connection established to 
HOSTM5/192.168.190.179:4181, initiating session
2016-12-12 17:38:59,142 WARN  [regionserver60020] util.Sleeper: We slept 
19044ms instead of 3000ms, this is likely due to a long garbage collecting 
pause and it's usually bad, see 
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2016-12-12 17:38:54,384 WARN  [regionserver60020.compactionChecker] 
util.Sleeper: We slept 23805ms instead of 1ms, this is likely due to a long 
garbage collecting pause and it's usually bad, see 
http://hbase.apache.org/book.html#trouble.rs.runtime.zkexpired
2016-12-12 17:38:16,281 INFO  [regionserver60020-SendThread(HOSTM3:4181)] 
zookeeper.ClientCnxn: Socket connection established to 
HOSTM3/192.168.167.7:4181, initiating session
2016-12-12 17:40:25,091 INFO  [regionserver60020-SendThread(HOSTM1:4181)] 
zookeeper.ClientCnxn: Socket connection established to 
HOSTM1/192.168.178.226:4181, initiating session
2016-12-12 17:40:25,093 INFO  [regionserver60020-SendThread(HOSTM3:4181)] 
zookeeper.ClientCnxn: Client session timed out, have not heard from server in 
128812ms for sessionid 0x558f30318e204de, closing socket connection and 
attempting reconnect
2016-12-12 17:40:25,093 INFO  [regionserver60020-SendThread(HOSTM2:4181)] 
zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 
0x558f30318e204df has expired, closing socket connection
2016-12-12 17:40:25,093 INFO  
[SplitLogWorker-HOST30,60020,1481556258263-SendThread(HOSTM5:4181)] 
zookeeper.ClientCnxn: Unable to reconnect to ZooKeeper service, session 
0x458f30318de051d has expired, closing socket connection
2016-12-12 17:40:25,089 INFO  [regionserver60020.leaseChecker] 
regionserver.HRegionServer: Scanner 42156 lease expired on region 
PostsAnalysis-2016-11-5,exam,1480313370104.4e37b0f96946a104474a8edbba4f87fd.
2016-12-12 17:40:25,193 INFO  [regionserver60020.leaseChecker] 
regionserver.HRegionServer: Scanner 42155 lease expired on region 
PostsAnalysis-2016-11-4,exam,1480313365296.4c80cf384fcdc7bfb7c83f625f936cfe.
2016-12-12 17:40:25,194 FATAL [regionserver60020] regionserver.HRegionServer: 
ABORTING region server HOST30,60020,1481556258263: 
org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently 
processing HOST30,60020,1481556258263 as dead server
at 
org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:370)
at 
org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:275)
at 
org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:1339)
at 
org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:7912)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
at 
org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:74)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:745)

org.apache.hadoop.hbase.YouAreDeadException: 
org.apache.hadoop.hbase.YouAreDeadException: Server REPORT rejected; currently 
processing HOST30,60020,1481556258263 as dead server
at 
org.apache.hadoop.hbase.master.ServerManager.checkIsDead(ServerManager.java:370)
at 
org.apache.hadoop.hbase.master.ServerManager.regionServerReport(ServerManager.java:275)
at 
org.apache.hadoop.hbase.master.HMaster.regionServerReport(HMaster.java:1339)
at 
org.apache.hadoop.hbase.protobuf.generated.RegionServerStatusProtos$RegionServerStatusService$2.callBlockingMethod(RegionServerStatusProtos.java:7912)
at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2027)
at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:108)
at 
org.apache.hadoop.hbase.ipc.FifoRpcScheduler$1.run(FifoRpcScheduler.java:74)