Re: [ANNOUNCE] New HBase committer Phil Yang

2016-11-29 Thread Heng Chen
Congratulations!!

2016-11-30 8:32 GMT+08:00 Stephen Jiang :
> Congratulations, Phil!
>
> On Tue, Nov 29, 2016 at 2:42 PM, Andrew Purtell  wrote:
>
>> Congratulations and welcome, Phil!
>>
>>
>> On Tue, Nov 29, 2016 at 1:49 AM, Duo Zhang  wrote:
>>
>> > On behalf of the Apache HBase PMC, I am pleased to announce that Phil
>> Yang
>> > has accepted the PMC's invitation to become a committer on the project.
>> We
>> > appreciate all of Phil's generous contributions thus far and look forward
>> > to his continued involvement.
>> >
>> > Congratulations and welcome, Phil!
>> >
>>
>>
>>
>> --
>> Best regards,
>>
>>- Andy
>>
>> Problems worthy of attack prove their worth by hitting back. - Piet Hein
>> (via Tom White)
>>


Re: Use experience and performance data of offheap from Alibaba online cluster

2016-11-19 Thread Heng Chen
The performance looks great!

2016-11-19 18:03 GMT+08:00 Ted Yu :
> Opening a JIRA would be fine.
> This makes it easier for people to obtain the patch(es).
>
> Cheers
>
>> On Nov 18, 2016, at 11:35 PM, Anoop John  wrote:
>>
>> Because of some compatibility issues, we decided that this will be done
>> in 2.0 only.  Yes, as Andy said, it would be great to share the 1.x
>> backported patches.  Is it a mega patch at your end?  Or issue-by-issue
>> patches?  The latter would be best.  Please share the patches somewhere,
>> along with a list of the issues backported. I can help with verifying the
>> issues so as to make sure we don't miss any...
>>
>> -Anoop-
>>
>>> On Sat, Nov 19, 2016 at 12:32 AM, Enis Söztutar  wrote:
>>> Thanks for sharing this. Great work.
>>>
>>> I don't see any reason why we cannot backport to branch-1.
>>>
>>> Enis
>>>
>>> On Fri, Nov 18, 2016 at 9:37 AM, Andrew Purtell 
>>> wrote:
>>>
 Yes, please, the patches will be useful to the community even if we decide
 not to backport into an official 1.x release.


>> On Nov 18, 2016, at 12:25 PM, Bryan Beaudreault <
> bbeaudrea...@hubspot.com> wrote:
>
> Is the backported patch available anywhere? Not seeing it on the
 referenced
> JIRA. If it ends up not getting officially backported to branch-1 due to
> 2.0 being around the corner, some of us who build our own deploys may want
> to integrate it into our builds. Thanks! These numbers look great.
>
>> On Fri, Nov 18, 2016 at 12:20 PM Anoop John 
 wrote:
>>
>> Hi Yu Li
>>  Good to see that the off-heap work helps you.  The perf
>> numbers look great.  So this is a comparison of on-heap L1 cache vs
>> off-heap L2 cache (HBASE-11425 enabled).   So for 2.0 we should make the
>> L2 off-heap cache ON by default, I believe.  Will raise a JIRA so we can
>> discuss under that.   L2 off-heap cache for data blocks and L1 cache for
>> index blocks seems the right choice.
>>
>> Thanks for the backport and the help in testing the feature.  You were
>> able to find some corner-case bugs and helped the community fix them.
>> Thanks go to your whole team.
>>
>> -Anoop-
>>
>>
>>> On Fri, Nov 18, 2016 at 10:14 PM, Yu Li  wrote:
>>>
>>> Sorry guys, let me retry the inline images:
>>>
>>> Performance w/o offheap:
>>>
>>>
>>> Performance w/ offheap:
>>>
>>>
>>> Peak Get QPS of one single RS during Singles' Day (11/11):
>>>
>>>
>>>
>>> And I attach the files in case the inline images still do not work:
>>>
>>> Performance_without_offheap.png
>>> <https://drive.google.com/file/d/0B017Q40_F5uwbWEzUGktYVIya3JkcXVjRkFvVGNtM0VxWC1n/view?usp=drive_web>
>>>
>>> Performance_with_offheap.png
>>> <https://drive.google.com/file/d/0B017Q40_F5uweGR2cnJEU0M1MWwtRFJ5YkxUeFVrcUdPc2ww/view?usp=drive_web>
>>>
>>> Peak_Get_QPS_of_Single_RS.png
>>> <https://drive.google.com/file/d/0B017Q40_F5uwQ2FkR2k0ZmEtRVNGSFp5RUxHM3F6bHpNYnJz/view?usp=drive_web>
>>>
>>>
>>>
>>> Best Regards,
>>> Yu
>>>
 On 18 November 2016 at 19:29, Ted Yu  wrote:

 Yu:
 With positive results, more hbase users would be asking for the backport
 of the offheap read path patches.

 Do you think you or your coworkers have the bandwidth to publish a
 backport for branch-1?

 Thanks

> On Nov 18, 2016, at 12:11 AM, Yu Li  wrote:
>
> Dear all,
>
> We have backported the read path offheap work (HBASE-11425) to our customized
 hbase-1.1.2 (thanks @Anoop for the help/support) and run it online for more
 than a month, and would like to share our experience, for what it's worth
 (smile).
>
> Generally speaking, we gained a better and more stable
 throughput/performance with offheap, and below are some details:
> 1. QPS become more stable with offheap
>
> Performance w/o offheap:
>
>
>
> Performance w/ offheap:
>
>
>
> These data come from our online A/B test cluster (450 physical
 machines, each with 256G memory + 64 cores) running real-world
 workloads. It shows that with offheap we gain a more stable throughput
 as well as better performance.
>
> Not showing fully online data here because online we published the
 version with both offheap and NettyRpcServer together, so there is no
 standalone comparison data for offheap.
>
> 2. Full GC frequency and 
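
For readers who want to try the setup Anoop describes above (off-heap L2 cache for data blocks, on-heap L1 for index blocks), a minimal sketch of the relevant settings follows. The keys are the standard bucket-cache options from the HBase reference guide; the sizes are placeholder assumptions to tune for your own machines, and in practice they belong in the regionserver's hbase-site.xml rather than client code:

{code: title=sketch}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class OffheapCacheSettings {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Back the L2 block cache with off-heap memory (the HBASE-11425 read path).
    conf.set("hbase.bucketcache.ioengine", "offheap");
    // L2 cache size in MB -- placeholder value, size it to your boxes.
    conf.setInt("hbase.bucketcache.size", 8192);
    // Keep a smaller on-heap L1 cache; index/bloom blocks stay here.
    conf.setFloat("hfile.block.cache.size", 0.2f);
  }
}
{code}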

Re: [ANNOUNCE] Stephen Yuan Jiang joins Apache HBase PMC

2016-10-16 Thread Heng Chen
Congrats!  :)

2016-10-16 8:19 GMT+08:00 Jerry He :
> Congratulations, Stephen.
>
> Jerry
>
> On Fri, Oct 14, 2016 at 12:56 PM, Dima Spivak  wrote:
>
>> Congrats, Stephen!
>>
>> -Dima
>>
>> On Fri, Oct 14, 2016 at 11:27 AM, Enis Söztutar  wrote:
>>
>> > On behalf of the Apache HBase PMC, I am happy to announce that Stephen
>> has
>> > accepted our invitation to become a PMC member of the Apache HBase
>> project.
>> >
>> > Stephen has been working on HBase for a couple of years, and has already
>> > been a committer for more than a year. Apart from his contributions to
>> > proc v2, hbck, and other areas, he is also helping with the 2.0 release,
>> > which is the most important milestone for the project this year.
>> >
>> > Welcome to the PMC Stephen,
>> > Enis
>> >
>>


Re: Increased response time of hbase calls

2016-09-22 Thread Heng Chen
Not sure which version of community HBase "hbase m7" corresponds to.

Is your batch load job some kind of bulk load, or does it just call the
HTable API to dump data into HBase?


2016-09-22 14:30 GMT+08:00 Dima Spivak :
> Hey Deepak,
>
> Assuming I understand your question, I think you'd be better served
> reaching out to MapR directly. Our community isn't involved in M7 so the
> average user (or dev) wouldn't know about the ins and outs of that
> offering.
>
> On Wednesday, September 21, 2016, Deepak Khandelwal <
> dkhandelwal@gmail.com> wrote:
>
>> Hi all
>>
>> I am facing an issue while accessing data from an hbase m7 table which has
>> about 50 million records.
>>
>> In a single API request, we make 3 calls to hbase m7:
>> 1. A single multi-get to fetch about 30 records
>> 2. A single multi-put to update about 500 records
>> 3. A single multi-get to fetch about 15 records
>>
>> We consistently get the response in less than 200 milliseconds for approx
>> 99% of calls. We have a TPS of about 200 with 8 VMs.
>> But we see an issue every day between 4pm and 6pm, when API response time
>> increases significantly from 200ms to 7-8 sec. This happens because we have
>> a daily batch load that runs between 4 and 6pm and puts multiple entries
>> into the same hbase table.
>>
>> We are trying to understand why response time increases when the batch
>> load runs. We cannot change the time of the batch job. Is there anything we
>> could do to resolve this issue? Any help or pointers would be much
>> appreciated. Thanks
>>
>
>
> --
> -Dima


Re: [ANNOUNCE] Duo Zhang (张铎) joins the Apache HBase PMC

2016-09-06 Thread Heng Chen
OH!  congrats,  DUO!

2016-09-07 12:26 GMT+08:00 Stack :

> On behalf of the Apache HBase PMC I am pleased to announce that 张铎
> has accepted our invitation to become a PMC member on the Apache
> HBase project. Duo has healthy notions on where the project should be
> headed and over the last year and more has been working furiously to take
> us there.
>
> Please join me in welcoming Duo to the HBase PMC!
>
> One of us!
> St.Ack
>


Re: hbase get big table problem

2016-06-22 Thread Heng Chen
8000/200 = 40, so if your table is balanced enough, each RS will serve 40
requests per second; that is OK for an RS.   Have you tried setting -Xmn
smaller to reduce the young generation?

2016-06-22 16:12 GMT+08:00 jinhong lu <lujinho...@gmail.com>:

> 400 regions, 8000 qps for the whole table. hbase 1.0, and heap
> -Xmx32G, -Xms32G, -Xmn4G
>
> Thanks,
> lujinhong
>
> > On 2016-06-22, at 15:53, Heng Chen <heng.chen.1...@gmail.com> wrote:
> >
> > How many regions do you have for the table?  8000 qps for one RS or for
> > the whole table?  What's your java heap size now?  And what's your hbase
> > version?
> >
> >
> > 2016-06-22 12:39 GMT+08:00 jinhong lu <lujinho...@gmail.com>:
> >
> >> I got a cluster of 200 regionservers, and one of the tables is about 3T
> >> with 5 billion rows. Is it possible to get about 8000 Gets per second
> >> (about 100,000 rows)?
> >>
> >> I found young GC occurs every few seconds, and each GC costs about
> >> 1 second. If I set -Xmn bigger, GC occurs every few minutes, but
> >> each GC costs more time.
> >>
> >> Any suggestion? Thanks.
> >>
> >>
> >>
> >> =
> >> Thanks,
> >> lujinhong
> >>
> >>
>
>


Re: after server restart - getting exception - java.io.IOException: Timed out waiting for lock for row

2016-06-22 Thread Heng Chen
Which thread holds the row lock? Could you dump the stack with 'jstack -l
<pid>'?

2016-06-22 16:14 GMT+08:00 vishnu rao <jaihind...@gmail.com>:

> hi Heng.
>
> 2016-06-22 08:13:42,256 WARN
> [B.defaultRpcServer.handler=32,queue=2,port=16020] regionserver.HRegion:
> Failed getting lock in batch put,
> row=\x01\xD6\xFD\xC9\xDC\xE4\x08\xC4\x0D\xBESM\xC2\x82\x14Z
>
> java.io.IOException: Timed out waiting for lock for row:
> \x01\xD6\xFD\xC9\xDC\xE4\x08\xC4\x0D\xBESM\xC2\x82\x14Z
>
> at
>
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5051)
>
> at
>
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2944)
>
> at
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2801)
>
> at
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2743)
>
> at
>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:692)
>
> at
>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:654)
>
> at
>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2031)
>
> at
>
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
>
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>
> at java.lang.Thread.run(Thread.java:745)
>
> On Wed, Jun 22, 2016 at 3:50 PM, Heng Chen <heng.chen.1...@gmail.com>
> wrote:
>
> > Could you paste the whole jstack and the related RS log?   It seems the
> > row write lock was occupied by some thread.  Need more information to find it.
> >
> > 2016-06-22 13:48 GMT+08:00 vishnu rao <jaihind...@gmail.com>:
> >
> > > need some help. this has happened for 2 of my servers
> > > -
> > >
> > > *[B.defaultRpcServer.handler=2,queue=2,port=16020]
> regionserver.HRegion:
> > > Failed getting lock in batch put,
> > > row=a\xF7\x1D\xCBdR\xBC\xEC_\x18D>\xA2\xD0\x95\xFF*
> > >
> > > *java.io.IOException: Timed out waiting for lock for row:
> > > a\xF7\x1D\xCBdR\xBC\xEC_\x18D>\xA2\xD0\x95\xFF*
> > >
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5051)
> > >
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2944)
> > >
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2801)
> > >
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2743)
> > >
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:692)
> > >
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:654)
> > >
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2031)
> > >
> > > at
> > >
> > >
> >
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
> > >
> > > at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
> > >
> > > at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
> > >
> > > at
> > >
> >
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> > >
> > > at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> > >
> > > at java.lang.Thread.run(Thread.java:745)
> > >
> > > --
> > > with regards,
> > > ch Vishnu
> > > mash213.wordpress.com
> > > doodle-vishnu.blogspot.in
> > >
> >
>
>
>
> --
> with regards,
> ch Vishnu
> mash213.wordpress.com
> doodle-vishnu.blogspot.in
>


Re: hbase get big table problem

2016-06-22 Thread Heng Chen
How many regions do you have for the table?  8000 qps for one RS or for the
whole table?  What's your java heap size now?  And what's your hbase
version?


2016-06-22 12:39 GMT+08:00 jinhong lu :

> I got a cluster of 200 regionservers, and one of the tables is about 3T with
> 5 billion rows. Is it possible to get about 8000 Gets per second (about
> 100,000 rows)?
>
> I found young GC occurs every few seconds, and each GC costs about
> 1 second. If I set -Xmn bigger, GC occurs every few minutes, but
> each GC costs more time.
>
> Any suggestion? Thanks.
>
>
>
> =
> Thanks,
> lujinhong
>
>


Re: after server restart - getting exception - java.io.IOException: Timed out waiting for lock for row

2016-06-22 Thread Heng Chen
Could you paste the whole jstack and the related RS log?   It seems the row
write lock was occupied by some thread.  Need more information to find it.

2016-06-22 13:48 GMT+08:00 vishnu rao :

> need some help. this has happened for 2 of my servers
> -
>
> *[B.defaultRpcServer.handler=2,queue=2,port=16020]  regionserver.HRegion:
> Failed getting lock in batch put,
> row=a\xF7\x1D\xCBdR\xBC\xEC_\x18D>\xA2\xD0\x95\xFF*
>
> *java.io.IOException: Timed out waiting for lock for row:
> a\xF7\x1D\xCBdR\xBC\xEC_\x18D>\xA2\xD0\x95\xFF*
>
> at
>
> org.apache.hadoop.hbase.regionserver.HRegion.getRowLockInternal(HRegion.java:5051)
>
> at
>
> org.apache.hadoop.hbase.regionserver.HRegion.doMiniBatchMutation(HRegion.java:2944)
>
> at
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2801)
>
> at
> org.apache.hadoop.hbase.regionserver.HRegion.batchMutate(HRegion.java:2743)
>
> at
>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doBatchOp(RSRpcServices.java:692)
>
> at
>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.doNonAtomicRegionMutation(RSRpcServices.java:654)
>
> at
>
> org.apache.hadoop.hbase.regionserver.RSRpcServices.multi(RSRpcServices.java:2031)
>
> at
>
> org.apache.hadoop.hbase.protobuf.generated.ClientProtos$ClientService$2.callBlockingMethod(ClientProtos.java:32213)
>
> at org.apache.hadoop.hbase.ipc.RpcServer.call(RpcServer.java:2114)
>
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:101)
>
> at
> org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
>
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
>
> at java.lang.Thread.run(Thread.java:745)
>
> --
> with regards,
> ch Vishnu
> mash213.wordpress.com
> doodle-vishnu.blogspot.in
>


Re: Re: maybe waste on blockCache

2016-06-19 Thread Heng Chen
If the BlockCache is on, blocks will be cached in memory. Please see the
documentation about the block cache: http://hbase.apache.org/book.html#block.cache

2016-06-17 8:57 GMT+08:00 WangYQ <wangyongqiang0...@163.com>:

>
>
> I set all user tables with blockCache on, but set the IN_MEMORY conf to
> false
>
>
>
>
>
>
>
>
> At 2016-06-16 18:18:44, "Heng Chen" <heng.chen.1...@gmail.com> wrote:
> >bq. if we do not set any user tables IN_MEMORY to true, then the whole
> >hbase just need to cache hbase:meta data to in_memory LruBlockCache.
> >
> >You set blockcache to be false for other tables?
> >
> >2016-06-16 16:21 GMT+08:00 WangYQ <wangyongqiang0...@163.com>:
> >
> >> In hbase 0.98.10, if we use LruBlockCache and set the regionserver's
> >> max heap to 10G, then by default the size of the in_memory priority
> >> portion of the LruBlockCache is:
> >> 10G * 0.4 * 0.25 = 1G
> >>
> >>
> >> 0.4: hfile.block.cache.size
> >> 0.25: hbase.lru.blockcache.memory.percentage
> >>
> >>
> >> If we do not set IN_MEMORY to true on any user table, then the whole
> >> hbase cluster only needs to cache hbase:meta data in the in_memory
> >> LruBlockCache. hbase:meta does not split, so only one regionserver
> >> needs to cache it, and there is some waste in the blockCache on all
> >> the others.
> >>
> >>
> >> I think only the regionserver that opens hbase:meta needs an in_memory
> >> LruBlockCache of a certain size; the other regionservers could set
> >> hbase.lru.blockcache.memory.percentage to 0 and not allocate an
> >> in_memory LruBlockCache at all.
>


Re: maybe waste on blockCache

2016-06-16 Thread Heng Chen
bq. if we do not set any user tables IN_MEMORY to true, then the whole
hbase just need to cache hbase:meta data to in_memory LruBlockCache.

You set blockcache to be false for other tables?

2016-06-16 16:21 GMT+08:00 WangYQ :

> In hbase 0.98.10, if we use LruBlockCache and set the regionserver's max heap
> to 10G, then by default the size of the in_memory priority portion of the
> LruBlockCache is:
> 10G * 0.4 * 0.25 = 1G
>
>
> 0.4: hfile.block.cache.size
> 0.25: hbase.lru.blockcache.memory.percentage
>
>
> If we do not set IN_MEMORY to true on any user table, then the whole hbase
> cluster only needs to cache hbase:meta data in the in_memory LruBlockCache.
> hbase:meta does not split, so only one regionserver needs to cache it, and
> there is some waste in the blockCache on all the others.
>
>
> I think only the regionserver that opens hbase:meta needs an in_memory
> LruBlockCache of a certain size; the other regionservers could set
> hbase.lru.blockcache.memory.percentage to 0 and not allocate an in_memory
> LruBlockCache at all.
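
A worked version of the arithmetic above, using the default values quoted in the thread (nothing here is read from a live cluster):

{code: title=sketch}
public class InMemoryCacheMath {
  public static void main(String[] args) {
    double maxHeapGb = 10.0;      // regionserver max heap, as in the example
    double blockCachePct = 0.4;   // hfile.block.cache.size default
    double inMemoryPct = 0.25;    // hbase.lru.blockcache.memory.percentage default
    // Heap reserved for IN_MEMORY-priority blocks such as hbase:meta:
    double inMemoryGb = maxHeapGb * blockCachePct * inMemoryPct;
    System.out.println(inMemoryGb + " GB");  // prints 1.0 GB
  }
}
{code}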


Re: Is there any way to release rpc handler when all handlers are occupied by big request.

2016-06-14 Thread Heng Chen
My HBase version is 0.98.17.

My key point is: should we supply a tool that can kill a big request, just
like MySQL can kill a slow query?

2016-06-15 1:50 GMT+08:00 Esteban Gutierrez <este...@cloudera.com>:

> Hi Heng,
>
> That sounds like an issue from older versions of HBase. Can you please
> give us a few more details, like which version of HBase you are
> using, and a stack dump from the RS?
>
> cheers,
> esteban.
>
>
> --
> Cloudera, Inc.
>
>
> On Mon, Jun 13, 2016 at 8:35 PM, Heng Chen <heng.chen.1...@gmail.com>
> wrote:
>
> > Currently, we find that sometimes our RS handlers are occupied by some big
> > request. For example, when handlers read the same big block from hdfs
> > simultaneously, all handlers will wait except one; that handler reads the
> > block from hdfs and puts it in the cache, and the other handlers then read
> > the block from the cache.  At this time, the RS falls into a stuck mode,
> > and I have to restart the RS to recover.
> >
> >
> > Is there any way I can release the handlers and reject such requests?
> >
>


Is there any way to release rpc handler when all handlers are occupied by big request.

2016-06-13 Thread Heng Chen
Currently, we find that sometimes our RS handlers are occupied by some big
request. For example, when handlers read the same big block from hdfs
simultaneously, all handlers will wait except one; that handler reads the
block from hdfs and puts it in the cache, and the other handlers then read
the block from the cache.  At this time, the RS falls into a stuck mode, and
I have to restart the RS to recover.


Is there any way I can release the handlers and reject such requests?
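
As far as I know no such kill tool ships with 0.98, but one client-side mitigation is to bound how long a caller will wait, so requests against a stuck RS at least fail fast. A minimal sketch with placeholder timeout values; note this frees the client, not the occupied server-side handler:

{code: title=sketch}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;

public class ClientTimeoutSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    // Fail a single RPC after 10s instead of waiting indefinitely.
    conf.setInt("hbase.rpc.timeout", 10000);
    // Cap the whole operation, retries included, at 30s.
    conf.setInt("hbase.client.operation.timeout", 30000);
    HTable table = new HTable(conf, "my_table");  // placeholder table name
    table.close();
  }
}
{code}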


Re: open store and open storeFile use the same conf

2016-06-03 Thread Heng Chen
They have different default values, and according to the contract of
HSTORE_OPEN_AND_CLOSE_THREADS_MAX, it should be OK. It just represents the
maximum number of threads in the pool.

/**
 * The default number for the max number of threads used for opening and
 * closing stores or store files in parallel
 */
public static final int DEFAULT_HSTORE_OPEN_AND_CLOSE_THREADS_MAX = 1;


2016-06-03 14:15 GMT+08:00 WangYQ :

> In hbase 0.98.10, class HRegion, lines 1277 to 1286:
> there are two methods, "getStoreOpenAndCloseThread" and
> "getStoreFileOpenAndCloseThreadPool". getStoreOpenAndCloseThread gets
> the thread pool size for opening/closing Stores, and
> getStoreFileOpenAndCloseThreadPool gets the pool size for opening/closing
> storeFiles, but they use the same conf: "HSTORE_OPEN_AND_CLOSE_THREADS_MAX".
>
>
> There should be no relation between the Store count and the storeFile
> count, so we should use different confs for the two methods.
>
>


Re: Memstore blocking

2016-06-02 Thread Heng Chen
Something wrong in the snappy library?

Have you tried not using compression?

2016-06-03 11:13 GMT+08:00 吴国泉wgq :

> HI STACK:
>
>1.   The log is very large,so I pick some of it. But it seems not
> provide valuable info.Here is the region named
> qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be  can’t flush.
>
>   When the flush Thread works well, The log is like this:
>   2016-05-24 12:38:27,071 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 16681
> 2016-05-24 12:38:37,071 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 8684
> 2016-05-24 12:38:43,753 INFO  [MemStoreFlusher.2] regionserver.HRegion:
> Started memstore flush for
> qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., current region
> memstore size 305.3 K
> 2016-05-24 12:38:43,753 WARN  [MemStoreFlusher.2] wal.FSHLog: Couldn't
> find oldest seqNum for the region we are about to flush:
> [dd8f92e3c161a8534b30ab17c28ae8be]
> 2016-05-24 12:38:43,816 INFO  [MemStoreFlusher.2]
> regionserver.DefaultStoreFlusher: Flushed, sequenceid=54259, memsize=305.3
> K, hasBloomFilter=true, into tmp file
> hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/.tmp/fddbb05945354d5cbdae4afd24e5bb9d
> 2016-05-24 12:38:43,822 DEBUG [MemStoreFlusher.2]
> regionserver.HRegionFileSystem: Committing store file
> hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/.tmp/fddbb05945354d5cbdae4afd24e5bb9d
> as
> hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/t/fddbb05945354d5cbdae4afd24e5bb9d
> 2016-05-24 12:38:43,837 INFO  [MemStoreFlusher.2] regionserver.HStore:
> Added
> hdfs://cluster-tc-qtrace:8020/hbase/tc_qtrace/data/default/qtrace/dd8f92e3c161a8534b30ab17c28ae8be/t/fddbb05945354d5cbdae4afd24e5bb9d,
> entries=108, sequenceid=54259, filesize=68.3 K
> 2016-05-24 12:38:43,837 INFO  [MemStoreFlusher.2] regionserver.HRegion:
> Finished memstore flush of ~305.3 K/312664, currentsize=0/0 for region
> qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be. in 84ms,
> sequenceid=54259, compaction requested=true
>
>   And when the flush thread does not work well, the log just always
> shows:
> 2016-05-25 14:57:02,588 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 18068
> 2016-05-25 14:57:12,587 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 13165
> 2016-05-25 14:57:20,656 DEBUG [MemStoreFlusher.36] regionserver.HRegion:
> NOT flushing memstore for region
> qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true,
> writesEnabled=true
> 2016-05-25 14:57:22,587 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 5526
> 2016-05-25 14:57:28,113 DEBUG [MemStoreFlusher.34] regionserver.HRegion:
> NOT flushing memstore for region
> qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true,
> writesEnabled=true
> 2016-05-25 14:57:32,587 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 8178
> 2016-05-25 14:57:40,767 DEBUG [MemStoreFlusher.9] regionserver.HRegion:
> NOT flushing memstore for region
> qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be., flushing=true,
> writesEnabled=true
> 2016-05-25 14:57:42,587 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 22068
> 2016-05-25 14:57:52,587 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 5492
> 2016-05-25 14:58:02,587 INFO  [regionserver60020.periodicFlusher]
> regionserver.HRegionServer: regionserver60020.periodicFlusher requesting
> flush for region qtrace,,1458012479440.dd8f92e3c161a8534b30ab17c28ae8be.
> after a delay of 10472
> 2016-05-25 14:58:04,655 DEBUG [MemStoreFlusher.23] regionserver.HRegion:
> NOT flushing memstore for region
> 

Re: region stuck in failed close state

2016-05-30 Thread Heng Chen
@Ted, the log I pasted is at INFO level; I changed it to DEBUG
level yesterday. If this happens again, I will upload the debug-level log.

2016-05-30 21:57 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:

> There is debug log in HRegion#replayWALFlushStartMarker :
>
>   LOG.debug(getRegionInfo().getEncodedName() + " : "
>
>   + " Prepared flush with seqId:" +
> flush.getFlushSequenceNumber());
>
> ...
>
> LOG.debug(getRegionInfo().getEncodedName() + " : "
>
>   + " Prepared empty flush with seqId:" +
> flush.getFlushSequenceNumber());
>
> I searched for them in the log you attached to HBASE-15900 but didn't find
> any occurrence.
>
> FYI
>
> On Mon, May 30, 2016 at 2:59 AM, Heng Chen <heng.chen.1...@gmail.com>
> wrote:
>
> > I found something useful.
> >
> > When we do region.close, if there is a compaction or flush in
> > progress, close will wait for the compaction or flush to finish.
> >
> > {code: title=HRegion.java}
> >
> > @Override
> > public void waitForFlushesAndCompactions() {
> >   synchronized (writestate) {
> > if (this.writestate.readOnly) {
> >   // we should not wait for replayed flushed if we are read only
> > (for example in case the
> >   // region is a secondary replica).
> >   return;
> > }
> > boolean interrupted = false;
> > try {
> >   while (writestate.compacting > 0 || writestate.flushing) {
> > LOG.debug("waiting for " + writestate.compacting + " compactions"
> >   + (writestate.flushing ? " & cache flush" : "") + " to
> > complete for region " + this);
> > try {
> >   writestate.wait();
> > } catch (InterruptedException iex) {
> >   // essentially ignore and propagate the interrupt back up
> >   LOG.warn("Interrupted while waiting");
> >   interrupted = true;
> > }
> >   }
> > } finally {
> >   if (interrupted) {
> > Thread.currentThread().interrupt();
> >   }
> > }
> >   }
> > }
> >
> > {code}
> >
> > And writestate.flushing will be set to be true in two place:
> >
> > HRegion.flushCache and HRegion.replayWALFlushStartMarker
> >
> > {code: title=HRegion.flushCache}
> >
> > synchronized (writestate) {
> >   if (!writestate.flushing && writestate.writesEnabled) {
> > this.writestate.flushing = true;
> >   } else {
> > **
> >   }
> > }
> >
> > {code}
> >
> > {code: title=HRegion.replayWALFlushStartMarker}
> >
> > synchronized (writestate) {
> >   try {
> > **
> > if (!writestate.flushing) {
> >
> > this.writestate.flushing = true;
> > *...*
> >
> > * }*
> >
> > {code}
> >
> >
> > Notice that in HRegion.replayWALFlushStartMarker, we do not check
> > writestate.writesEnabled before setting writestate.flushing to true.
> >
> > So if region.close wakes up from writestate.wait but the lock is acquired
> > by HRegion.replayWALFlushStartMarker, flushing will be set to true
> > again, and region.close will be stuck in writestate.wait forever.
> >
> >
> > Can this happen in real execution?
> >
> >
> > 2016-05-27 10:44 GMT+08:00 Heng Chen <heng.chen.1...@gmail.com>:
> >
> > > Thanks guys, yesterday I restarted the related RS and the failed-close
> > > region reopened successfully. But today, another region has fallen into
> > > this state.
> > >
> > > I pasted the related RS's jstack information. This time the failed-close
> > > region is 9368190b3ba46238534b6307702aabae
> > >
> > > 2016-05-26 21:50 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
> > >
> > >> Heng:
> > >> Can you pastebin the complete stack trace for the region server ?
> > >>
> > >> Snippet from region server log may also provide more clue.
> > >>
> > >> Thanks
> > >>
> > >> On Wed, May 25, 2016 at 9:48 PM, Heng Chen <heng.chen.1...@gmail.com>
> > >> wrote:
> > >>
> > > > On the master web UI, I can see region
> > > > (c371fb20c372b8edbf54735409ab5c4a)
> > > > always in the failed-close state, so the balancer cannot run.
> > >> >
> > >>

Re: region stuck in failed close state

2016-05-30 Thread Heng Chen
I found something useful.

When we do region.close, if there is a compaction or flush in progress,
close will wait for the compaction or flush to finish.

{code: title=HRegion.java}

@Override
public void waitForFlushesAndCompactions() {
  synchronized (writestate) {
if (this.writestate.readOnly) {
  // we should not wait for replayed flushed if we are read only
(for example in case the
  // region is a secondary replica).
  return;
}
boolean interrupted = false;
try {
  while (writestate.compacting > 0 || writestate.flushing) {
LOG.debug("waiting for " + writestate.compacting + " compactions"
  + (writestate.flushing ? " & cache flush" : "") + " to
complete for region " + this);
try {
  writestate.wait();
} catch (InterruptedException iex) {
  // essentially ignore and propagate the interrupt back up
  LOG.warn("Interrupted while waiting");
  interrupted = true;
}
  }
} finally {
  if (interrupted) {
Thread.currentThread().interrupt();
  }
}
  }
}

{code}

And writestate.flushing will be set to be true in two place:

HRegion.flushCache and HRegion.replayWALFlushStartMarker

{code: title=HRegion.flushCache}

synchronized (writestate) {
  if (!writestate.flushing && writestate.writesEnabled) {
this.writestate.flushing = true;
  } else {
**
  }
}

{code}

{code: title=HRegion.replayWALFlushStartMarker}

synchronized (writestate) {
  try {
**
if (!writestate.flushing) {

this.writestate.flushing = true;
*...*

* }*

{code}


Notice that in HRegion.replayWALFlushStartMarker, we do not check
writestate.writesEnabled before setting writestate.flushing to true.

So if region.close wakes up from writestate.wait but the lock is acquired by
HRegion.replayWALFlushStartMarker, flushing will be set to true
again, and region.close will be stuck in writestate.wait forever.


Can this happen in real execution?
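
If this race is real, the natural guard would be for HRegion.replayWALFlushStartMarker to honor writestate.writesEnabled the same way HRegion.flushCache does. A sketch of the idea only, derived from the snippets above, not a patch against any particular revision:

{code: title=sketch}
synchronized (writestate) {
  // Mirror flushCache's check: a closing region has writesEnabled == false,
  // so marker replay can no longer re-mark it as flushing.
  if (!writestate.flushing && writestate.writesEnabled) {
    this.writestate.flushing = true;
    // ... replay the flush start marker ...
  } else {
    // flush already in progress, or region is closing: skip the replay
  }
}
{code}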


2016-05-27 10:44 GMT+08:00 Heng Chen <heng.chen.1...@gmail.com>:

> Thanks guys, yesterday I restarted the related RS and the failed-close
> region reopened successfully. But today, another region has fallen into this state.
>
> I pasted the related RS's jstack information. This time the failed-close
> region is 9368190b3ba46238534b6307702aabae
>
> 2016-05-26 21:50 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:
>
>> Heng:
>> Can you pastebin the complete stack trace for the region server ?
>>
>> Snippet from region server log may also provide more clue.
>>
>> Thanks
>>
>> On Wed, May 25, 2016 at 9:48 PM, Heng Chen <heng.chen.1...@gmail.com>
>> wrote:
>>
>> > On the master web UI, I can see region (c371fb20c372b8edbf54735409ab5c4a)
>> > always in the failed-close state, so the balancer cannot run.
>> >
>> >
>> > I checked the region on the RS, and found these logs about the region:
>> >
>> > 2016-05-26 12:42:10,490 INFO  [MemStoreFlusher.1]
>> > regionserver.MemStoreFlusher: Waited 90447ms on a compaction to clean up
>> > 'too many store files'; waited long enough... proceeding with flush of
>> >
>> >
>> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
>> > 2016-05-26 12:42:20,043 INFO
>> >  [dx-pipe-regionserver4-online,16020,1464166626969_ChoreService_1]
>> > regionserver.HRegionServer:
>> > dx-pipe-regionserver4-online,16020,1464166626969-MemstoreFlusherChore
>> > requesting flush for region
>> >
>> >
>> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
>> > after a delay of 20753
>> > 2016-05-26 12:42:30,043 INFO
>> >  [dx-pipe-regionserver4-online,16020,1464166626969_ChoreService_1]
>> > regionserver.HRegionServer:
>> > dx-pipe-regionserver4-online,16020,1464166626969-MemstoreFlusherChore
>> > requesting flush for region
>> >
>> >
>> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
>> > after a delay of 7057
>> >
>> >
>> > relate jstack information like below:
>> >
>> > Thread 12403 (RS_CLOSE_REGION-dx-pipe-regionserver4-online:16020-2):
>> >   State: WAITING
>> >   Blocked count: 1
>> >   Waited count: 2
>> >   Waiting on
>> > org.apache.hadoop.hbase.regionserver.HRegion$WriteState@1390594c
>> >   Stack:
>> > java.lang.Object.wait(Native Method)
>> > java.lang.Object.wait(Object.java:502)
>> >
>> >
>> org.apache.hadoop.hbase.regionserver.HRegion.waitFo

Re: region stuck in failed close state

2016-05-26 Thread Heng Chen
And there is another question about the failed-close state: does it mean the
region in this state can still be read and written normally?

2016-05-26 12:48 GMT+08:00 Heng Chen <heng.chen.1...@gmail.com>:

>
> On the master web UI, I can see region (c371fb20c372b8edbf54735409ab5c4a)
> always in the failed-close state, so the balancer cannot run.
>
>
> I checked the region on the RS, and found these logs about the region:
>
> 2016-05-26 12:42:10,490 INFO  [MemStoreFlusher.1]
> regionserver.MemStoreFlusher: Waited 90447ms on a compaction to clean up
> 'too many store files'; waited long enough... proceeding with flush of
> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
> 2016-05-26 12:42:20,043 INFO
>  [dx-pipe-regionserver4-online,16020,1464166626969_ChoreService_1]
> regionserver.HRegionServer:
> dx-pipe-regionserver4-online,16020,1464166626969-MemstoreFlusherChore
> requesting flush for region
> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
> after a delay of 20753
> 2016-05-26 12:42:30,043 INFO
>  [dx-pipe-regionserver4-online,16020,1464166626969_ChoreService_1]
> regionserver.HRegionServer:
> dx-pipe-regionserver4-online,16020,1464166626969-MemstoreFlusherChore
> requesting flush for region
> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
> after a delay of 7057
>
>
> relate jstack information like below:
>
> Thread 12403 (RS_CLOSE_REGION-dx-pipe-regionserver4-online:16020-2):
>   State: WAITING
>   Blocked count: 1
>   Waited count: 2
>   Waiting on org.apache.hadoop.hbase.regionserver.HRegion$WriteState@1390594c
>   Stack:
> java.lang.Object.wait(Native Method)
> java.lang.Object.wait(Object.java:502)
> 
> org.apache.hadoop.hbase.regionserver.HRegion.waitForFlushesAndCompactions(HRegion.java:1512)
> org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1371)
> org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1336)
> 
> org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> java.lang.Thread.run(Thread.java:745)
>
>
> Our HBase cluster version is 1.1.1. I tried to compact this region; the
> compaction is stuck at 89.58% progress:
>
> frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
>  85860221 85860221
> 89.58%
>
>


region stuck in failed close state

2016-05-25 Thread Heng Chen
On the master web UI, I can see region (c371fb20c372b8edbf54735409ab5c4a)
always in the failed-close state, so the balancer cannot run.


I checked the region on the RS, and found these logs about the region:

2016-05-26 12:42:10,490 INFO  [MemStoreFlusher.1]
regionserver.MemStoreFlusher: Waited 90447ms on a compaction to clean up
'too many store files'; waited long enough... proceeding with flush of
frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
2016-05-26 12:42:20,043 INFO
 [dx-pipe-regionserver4-online,16020,1464166626969_ChoreService_1]
regionserver.HRegionServer:
dx-pipe-regionserver4-online,16020,1464166626969-MemstoreFlusherChore
requesting flush for region
frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
after a delay of 20753
2016-05-26 12:42:30,043 INFO
 [dx-pipe-regionserver4-online,16020,1464166626969_ChoreService_1]
regionserver.HRegionServer:
dx-pipe-regionserver4-online,16020,1464166626969-MemstoreFlusherChore
requesting flush for region
frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
after a delay of 7057


relate jstack information like below:

Thread 12403 (RS_CLOSE_REGION-dx-pipe-regionserver4-online:16020-2):
  State: WAITING
  Blocked count: 1
  Waited count: 2
  Waiting on org.apache.hadoop.hbase.regionserver.HRegion$WriteState@1390594c
  Stack:
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:502)

org.apache.hadoop.hbase.regionserver.HRegion.waitForFlushesAndCompactions(HRegion.java:1512)
org.apache.hadoop.hbase.regionserver.HRegion.doClose(HRegion.java:1371)
org.apache.hadoop.hbase.regionserver.HRegion.close(HRegion.java:1336)

org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:138)
org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)

java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)

java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)


Our HBase cluster version is 1.1.1. I tried to compact this region; the
compaction is stuck at 89.58% progress:

frog_stastic,\xFC\xAD\xD4\x07_{211}_1460209650596,1464149036644.c371fb20c372b8edbf54735409ab5c4a.
85860221 85860221
89.58%


Re: hbase rowkey design

2016-05-16 Thread Heng Chen
In my company, we calculate UV/PV offline in batch, and update every day.

If you do it online, url + timestamp could be the rowkey.
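
A minimal sketch of that layout with the 0.98-era client API. The column family and qualifier are placeholder assumptions, and a real design would likely hash or truncate long URLs:

{code: title=sketch}
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class UrlTimeRowkey {
  public static void main(String[] args) {
    // Rowkey = url bytes + 8-byte timestamp, so a scan over a url prefix
    // returns that page's hits in time order -- what the PV/UV queries need.
    byte[] rowkey = Bytes.add(
        Bytes.toBytes("http://www.aaa.com?a=b&c=d&e=f"),  // url from the example log
        Bytes.toBytes(1463387380L));                      // request timestamp
    Put put = new Put(rowkey);
    put.add(Bytes.toBytes("d"), Bytes.toBytes("visitid"), Bytes.toBytes(1L));
  }
}
{code}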



2016-05-16 18:13 GMT+08:00 齐忠 <cente...@gmail.com>:

> Yes, like Google Analytics.
>
> 2016-05-16 17:48 GMT+08:00 Heng Chen <heng.chen.1...@gmail.com>:
> > You want to calculate UV/PV online?
> >
> > 2016-05-16 16:46 GMT+08:00 齐忠 <cente...@gmail.com>:
> >
> >> I have a very large log (50T per day).
> >>
> >> My log events are as follows:
> >>
> >> url,visitid,requesttime
> >>
> >> http://www.aaa.com?a=b&c=d&e=f, 1, 1463387380
> >> http://www.aaa.com?a=b&c=d&e=fa, 1, 1463387280
> >> http://www.aaa.com?a=b&c=d&e=fa, 2, 1463387280
> >> http://www.aaa.com?a=b&c=d&e=fab, 2, 1463387280
> >> http://www.aaa.com?a=b&c=d&e=f, 1, 1463387380
> >>
> >>
> >> When a user enters part of the url, return the
> >> uv (UniqueVisitor) and pv (PageView).
> >>
> >> for example
> >>
> >> input: e=f*
> >>
> >> output: uv=2,pv=5,
> >>
> >> input: e=fa
> >>
> >> output:uv=2,pv=3
> >>
> >> How to design rowkey?
> >>
> >> Thanks.
> >>
>
>
>
> --
> cente...@gmail.com|齐忠
>


Re: hbase rowkey design

2016-05-16 Thread Heng Chen
You want to calculate UV/PV online?

2016-05-16 16:46 GMT+08:00 齐忠 :

> I have a very large log (50T per day).
>
> My log events are as follows:
>
> url,visitid,requesttime
>
> http://www.aaa.com?a=b&c=d&e=f, 1, 1463387380
> http://www.aaa.com?a=b&c=d&e=fa, 1, 1463387280
> http://www.aaa.com?a=b&c=d&e=fa, 2, 1463387280
> http://www.aaa.com?a=b&c=d&e=fab, 2, 1463387280
> http://www.aaa.com?a=b&c=d&e=f, 1, 1463387380
>
>
> When a user enters part of the url, return the
> uv (UniqueVisitor) and pv (PageView).
>
> for example
>
> input: e=f*
>
> output: uv=2,pv=5,
>
> input: e=fa
>
> output:uv=2,pv=3
>
> How to design rowkey?
>
> Thanks.
>


Re: New blog post: HDFS HSM and HBase by Jingcheng Du and Wei Zhou

2016-04-25 Thread Heng Chen
That's great!  We are ready to use SSD to improve read performance now.

2016-04-23 8:25 GMT+08:00 Stack :

> It is well worth the read. It goes deep so is a bit long and I had to cut
> it up to do Apache Blog sized bits. Start reading here:
> https://blogs.apache.org/hbase/entry/hdfs_hsm_and_hbase_part
>
> St.Ack
>


Re: "Some problems after upgrade from 0.98.6 to 0.98.17"

2016-04-07 Thread Heng Chen
bq. Any other symptom that you observed ?

And in the web UI, there are always some split regions in one table. The
number of split regions never seems to change after the upgrade.

2016-04-07 21:37 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:

> The 'No StoreFiles for' is logged at DEBUG level.
>
> Was 9151f75eaa7d00a81e5001f4744b8b6a among the regions which didn't finish
> split ?
>
> Can you pastebin more of the master log during this period ?
>
> Any other symptom that you observed ?
>
> Cheers
>
> On Thu, Apr 7, 2016 at 12:59 AM, Heng Chen <heng.chen.1...@gmail.com>
> wrote:
>
> > This is log about one region i grep from master log
> >
> > 2016-04-07 12:20:53,984 INFO  [AM.ZK.Worker-pool2-t145]
> > master.RegionStates: Transition null to {9151f75eaa7d00a81e5001f4744b8b6a
> > state=SPLITTING_NEW, ts=1460002853984,
> > server=dx-ape-regionserver40-online,60020,1459998494013}
> > 2016-04-07 12:20:55,326 INFO  [AM.ZK.Worker-pool2-t147]
> > master.RegionStates: Transition {9151f75eaa7d00a81e5001f4744b8b6a
> > state=SPLITTING_NEW, ts=1460002855326,
> > server=dx-ape-regionserver40-online,60020,1459998494013} to
> > {9151f75eaa7d00a81e5001f4744b8b6a state=OPEN, ts=1460002855326,
> > server=dx-ape-regionserver40-online,60020,1459998494013}
> > 2016-04-07 12:20:55,326 INFO  [AM.ZK.Worker-pool2-t147]
> > master.RegionStates: Onlined 9151f75eaa7d00a81e5001f4744b8b6a on
> > dx-ape-regionserver40-online,60020,1459998494013
> > 2016-04-07 12:20:55,328 INFO  [AM.ZK.Worker-pool2-t147]
> > master.AssignmentManager: Handled SPLIT event;
> >
> >
> parent=apolo_pdf,\x00\x00\x00\x00\x01\x94\xC0\xA8,1457693428562.3410ea47a97d0aefb12ec62e8e89b605.,
> > daughter
> >
> >
> a=apolo_pdf,\x00\x00\x00\x00\x01\x94\xC0\xA8,1460002853961.9151f75eaa7d00a81e5001f4744b8b6a.,
> > daughter
> >
> >
> b=apolo_pdf,\x00\x00\x00\x00\x01\xA2\x0D\x96,1460002853961.a7d6d735ccbf47e0a9d3016b8fef181a.,
> > on dx-ape-regionserver40-online,60020,1459998494013
> > 2016-04-07 12:21:44,083 DEBUG
> > [dx-ape-hmaster1-online,6,1459998573658-ClusterStatusChore]
> > regionserver.StoreFileInfo: reference
> >
> >
> 'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/16b7f857eb6741a5bcaaa5516034929f.3410ea47a97d0aefb12ec62e8e89b605'
> > to region=3410ea47a97d0aefb12ec62e8e89b605
> > hfile=16b7f857eb6741a5bcaaa5516034929f
> > 2016-04-07 12:21:44,123 DEBUG
> > [dx-ape-hmaster1-online,6,1459998573658-ClusterStatusChore]
> > regionserver.StoreFileInfo: reference
> >
> >
> 'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/1a3f77a7588a4ad38d34ed97f6c095be.3410ea47a97d0aefb12ec62e8e89b605'
> > to region=3410ea47a97d0aefb12ec62e8e89b605
> > hfile=1a3f77a7588a4ad38d34ed97f6c095be
> > 2016-04-07 12:21:44,132 DEBUG
> > [dx-ape-hmaster1-online,6,1459998573658-ClusterStatusChore]
> > regionserver.StoreFileInfo: reference
> >
> >
> 'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/544552889a4c4de99d52814b3c229c30.3410ea47a97d0aefb12ec62e8e89b605'
> > to region=3410ea47a97d0aefb12ec62e8e89b605
> > hfile=544552889a4c4de99d52814b3c229c30
> > 2016-04-07 12:21:44,138 DEBUG
> > [dx-ape-hmaster1-online,6,1459998573658-ClusterStatusChore]
> > regionserver.StoreFileInfo: reference
> >
> >
> 'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/fcf658954ff84e998eb71ee6477c2ebe.3410ea47a97d0aefb12ec62e8e89b605'
> > to region=3410ea47a97d0aefb12ec62e8e89b605
> > hfile=fcf658954ff84e998eb71ee6477c2ebe
> > 2016-04-07 12:21:44,138 DEBUG
> > [dx-ape-hmaster1-online,6,1459998573658-ClusterStatusChore]
> > regionserver.StoreFileInfo: reference
> >
> >
> 'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/16b7f857eb6741a5bcaaa5516034929f.3410ea47a97d0aefb12ec62e8e89b605'
> > to region=3410ea47a97d0aefb12ec62e8e89b605
> > hfile=16b7f857eb6741a5bcaaa5516034929f
> > 2016-04-07 12:21:44,177 DEBUG
> > [dx-ape-hmaster1-online,6,1459998573658-ClusterStatusChore]
> > regionserver.StoreFileInfo: reference
> >
> >
> 'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/1a3f77a7588a4ad38d34ed97f6c095be.3410ea47a97d0aefb12ec62e8e89b605'
> > to region=3410ea47a97d0aefb12ec62e8e89b605
> > hfile=1a3f77a7588a4ad38d34ed97f6c095be
> > 2016-04-07 12:21:44,179 DEBUG
> > [dx-ape-hmaster1-online,6,1459998573658-ClusterStatusChore]
> > regionserver.StoreFileInfo: r

Re: "Some problems after upgrade from 0.98.6 to 0.98.17"

2016-04-07 Thread Heng Chen
.3410ea47a97d0aefb12ec62e8e89b605'
to region=3410ea47a97d0aefb12ec62e8e89b605
hfile=1a3f77a7588a4ad38d34ed97f6c095be
2016-04-07 12:23:54,261 DEBUG [region-location-4]
regionserver.StoreFileInfo: reference
'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/544552889a4c4de99d52814b3c229c30.3410ea47a97d0aefb12ec62e8e89b605'
to region=3410ea47a97d0aefb12ec62e8e89b605
hfile=544552889a4c4de99d52814b3c229c30
2016-04-07 12:23:54,302 DEBUG [region-location-4]
regionserver.StoreFileInfo: reference
'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/fcf658954ff84e998eb71ee6477c2ebe.3410ea47a97d0aefb12ec62e8e89b605'
to region=3410ea47a97d0aefb12ec62e8e89b605
hfile=fcf658954ff84e998eb71ee6477c2ebe
2016-04-07 12:23:54,302 DEBUG [region-location-4]
regionserver.StoreFileInfo: reference
'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/16b7f857eb6741a5bcaaa5516034929f.3410ea47a97d0aefb12ec62e8e89b605'
to region=3410ea47a97d0aefb12ec62e8e89b605
hfile=16b7f857eb6741a5bcaaa5516034929f
2016-04-07 12:23:54,341 DEBUG [region-location-4]
regionserver.StoreFileInfo: reference
'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/1a3f77a7588a4ad38d34ed97f6c095be.3410ea47a97d0aefb12ec62e8e89b605'
to region=3410ea47a97d0aefb12ec62e8e89b605
hfile=1a3f77a7588a4ad38d34ed97f6c095be
2016-04-07 12:23:54,382 DEBUG [region-location-4]
regionserver.StoreFileInfo: reference
'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/544552889a4c4de99d52814b3c229c30.3410ea47a97d0aefb12ec62e8e89b605'
to region=3410ea47a97d0aefb12ec62e8e89b605
hfile=544552889a4c4de99d52814b3c229c30
2016-04-07 12:23:54,383 DEBUG [region-location-4]
regionserver.StoreFileInfo: reference
'hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/d/fcf658954ff84e998eb71ee6477c2ebe.3410ea47a97d0aefb12ec62e8e89b605'
to region=3410ea47a97d0aefb12ec62e8e89b605
hfile=fcf658954ff84e998eb71ee6477c2ebe
2016-04-07 12:23:54,386 DEBUG [region-location-4]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/m
2016-04-07 12:25:54,033 DEBUG [region-location-4]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://hdfs-master:8020/hbase/data/default/apolo_pdf/9151f75eaa7d00a81e5001f4744b8b6a/m

2016-04-07 15:50 GMT+08:00 Heng Chen <heng.chen.1...@gmail.com>:

> Hi guys,
>
>I upgraded our cluster recently; after the upgrade, I found some weird
> problems:
>
>
> In Master.log, there are a lot of log lines like the below:
>
> 2016-04-07 11:57:00,597 DEBUG [region-location-0]
> regionserver.HRegionFileSystem: No StoreFiles for:
> hdfs://common-cluster:8020/hbase/data/default/gallery_tutor_user_resource/23a62ee5b1d64bd8f57f4e31c383e343/m
> 2016-04-07 11:57:00,597 DEBUG [region-location-3]
> regionserver.HRegionFileSystem: No StoreFiles for:
> hdfs://common-cluster:8020/hbase/data/default/gallery_solar_resource/72c36fc12fcc95c66167c2199d8bcc36/m
> 2016-04-07 11:57:02,297 DEBUG [region-location-4]
> regionserver.HRegionFileSystem: No StoreFiles for:
> hdfs://common-cluster:8020/hbase/data/default/gallery_solar_user_image/c2712ea3fbb58eaf5cb7ded2b40c1df5/d
> 2016-04-07 11:57:02,337 DEBUG [region-location-4]
> regionserver.HRegionFileSystem: No StoreFiles for:
> hdfs://common-cluster:8020/hbase/data/default/gallery_solar_user_image/c2712ea3fbb58eaf5cb7ded2b40c1df5/m
> 2016-04-07 11:57:58,335 DEBUG [region-location-3]
> regionserver.HRegionFileSystem: No StoreFiles for:
> hdfs://common-cluster:8020/hbase/data/default/gallery_tarzan_user_resource/f75f5acbd22266ea596d0f1ceb0130f9/d
> 2016-04-07 11:57:58,335 DEBUG [region-location-3]
> regionserver.HRegionFileSystem: No StoreFiles for:
> hdfs://common-cluster:8020/hbase/data/default/gallery_tarzan_user_resource/f75f5acbd22266ea596d0f1ceb0130f9/m
> 2016-04-07 11:57:58,826 DEBUG [region-location-3]
> regionserver.HRegionFileSystem: No StoreFiles for:
> hdfs://common-cluster:8020/hbase/data/default/gallery_apolo_user_resource/75dc4eb5143d28a25c287918c368f6c5/d
> 2016-04-07 11:57:58,866 DEBUG [region-location-0]
> regionserver.HRegionFileSystem: No StoreFiles for:
> hdfs://common-cluster:8020/hbase/data/default/gallery_apolo_resource/08cffb3598c6f4446a68aabf53b1e498/d
> 2016-04-07 11:57:58,867 DEBUG [region-location-3]
> regionserver.HRegionFileSystem: No StoreFiles for:
> hdfs://common-cluster:8020/hbase/data/default/gallery_apolo_user_resource/75dc4eb5143d28a25c287918c368f6c5/m
> 2016-04-07 11:57:58,906 DEBUG [region-location-0]
> regionserver.HRegionFileSystem: No StoreFiles for:
> hdfs://common-cluster:8020/hbase/data/default/gallery_apolo_resource/08cffb3598c6f4446a68aabf53b1e498/m
> 2016-04-07 11:57:59,504 DEBUG [region-location-0]
> 

"Some problems after upgrade from 0.98.6 to 0.98.17"

2016-04-07 Thread Heng Chen
Hi guys,

   I upgraded our cluster recently; after the upgrade, I found some weird
problems:


In Master.log, there are a lot of log lines like the below:

2016-04-07 11:57:00,597 DEBUG [region-location-0]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://common-cluster:8020/hbase/data/default/gallery_tutor_user_resource/23a62ee5b1d64bd8f57f4e31c383e343/m
2016-04-07 11:57:00,597 DEBUG [region-location-3]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://common-cluster:8020/hbase/data/default/gallery_solar_resource/72c36fc12fcc95c66167c2199d8bcc36/m
2016-04-07 11:57:02,297 DEBUG [region-location-4]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://common-cluster:8020/hbase/data/default/gallery_solar_user_image/c2712ea3fbb58eaf5cb7ded2b40c1df5/d
2016-04-07 11:57:02,337 DEBUG [region-location-4]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://common-cluster:8020/hbase/data/default/gallery_solar_user_image/c2712ea3fbb58eaf5cb7ded2b40c1df5/m
2016-04-07 11:57:58,335 DEBUG [region-location-3]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://common-cluster:8020/hbase/data/default/gallery_tarzan_user_resource/f75f5acbd22266ea596d0f1ceb0130f9/d
2016-04-07 11:57:58,335 DEBUG [region-location-3]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://common-cluster:8020/hbase/data/default/gallery_tarzan_user_resource/f75f5acbd22266ea596d0f1ceb0130f9/m
2016-04-07 11:57:58,826 DEBUG [region-location-3]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://common-cluster:8020/hbase/data/default/gallery_apolo_user_resource/75dc4eb5143d28a25c287918c368f6c5/d
2016-04-07 11:57:58,866 DEBUG [region-location-0]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://common-cluster:8020/hbase/data/default/gallery_apolo_resource/08cffb3598c6f4446a68aabf53b1e498/d
2016-04-07 11:57:58,867 DEBUG [region-location-3]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://common-cluster:8020/hbase/data/default/gallery_apolo_user_resource/75dc4eb5143d28a25c287918c368f6c5/m
2016-04-07 11:57:58,906 DEBUG [region-location-0]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://common-cluster:8020/hbase/data/default/gallery_apolo_resource/08cffb3598c6f4446a68aabf53b1e498/m
2016-04-07 11:57:59,504 DEBUG [region-location-0]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://common-cluster:8020/hbase/data/default/gallery_solar_user_resource/2c9272fcde86a28b0cf6b479ef0b3745/d
2016-04-07 11:57:59,544 DEBUG [region-location-0]
regionserver.HRegionFileSystem: No StoreFiles for:
hdfs://common-cluster:8020/hbase/data/default/gallery_solar_user_resource/2c9272fcde86a28b0cf6b479ef0b3745/m



And in the web UI, there are always some split regions in one table. The
number of split regions never seems to change after the upgrade.


I am not sure of the reason for this problem; have you met this in practice?


Re: Hbase schema for time based analytics

2016-03-22 Thread Heng Chen
OpenTSDB +1 :)

2016-03-23 11:49 GMT+08:00 Wojciech Indyk :

> Hi Prem!
> Look at OpenTSDB http://opentsdb.net/
> --
> Kind regards/ Pozdrawiam,
> Wojciech Indyk
> http://datacentric.pl
>
>
> 2016-03-07 11:26 GMT+01:00 Prem Yadav :
> > Hi,
> > we have a use case where we need to get the data for a day/week/month.
> >
> > We need additional filters for the data so it will be like "select where
> > data between  and  where filter= > column=value>
> >
> > This is telemetry type data and is heavily written. About 2+TB per day.
> >
> > How should we go about the schema?
> >
> > Also, can someone point me to a good schema design tutorial?
> >
> > Thanks
>


Re: why Hbase only split regions in one RegionServer

2016-03-15 Thread Heng Chen
bq. the table I created by default having only one region

Why not pre-split the table into more regions when you create it? A sketch follows.
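
For example, a 0.98-era sketch of pre-splitting at create time; the table name matches the thread, while the family name and region count are placeholder assumptions:

{code: title=sketch}
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class PresplitTable {
  public static void main(String[] args) throws Exception {
    HBaseAdmin admin = new HBaseAdmin(HBaseConfiguration.create());
    HTableDescriptor desc = new HTableDescriptor(TableName.valueOf("user_info"));
    desc.addFamily(new HColumnDescriptor("d"));  // placeholder family name
    // 10 regions spread evenly over the byte keyspace, so writes land on
    // all 10 regionservers from the start instead of one hot server.
    admin.createTable(desc, new byte[] {0x00}, new byte[] {(byte) 0xFF}, 10);
    admin.close();
  }
}
{code}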

2016-03-16 11:38 GMT+08:00 Ted Yu :

> When one region is split into two, both daughter regions are opened on the
> same server where the parent region was opened.
>
> Can you provide a bit more information:
>
> release of hbase
> whether balancer was turned on - you can inspect master log to see if
> balancer was on
>
> Consider pastebinning portion of master log.
>
> Thanks
>
> On Tue, Mar 15, 2016 at 4:43 PM, jackwang  wrote:
>
> > I was writing 300GiB of data to my HBase table user_info; the table I
> > created had only one region by default. While the writing was going on, I
> > saw one region become two regions, and later on it became 8 regions. But
> > my confusion is that the 8 regions were all kept on the same RegionServer.
> >
> > Why didn't HBase distribute the regions to different RegionServers? BTW, I
> > had 10 physical RegionServers in my HBase cluster, and the region size I
> > set is 20GiB. Thanks!
> >
> >
> >
> > --
> > View this message in context:
> >
> http://apache-hbase.679495.n3.nabble.com/why-Hbase-only-split-regions-in-one-RegionServer-tp4078497.html
> > Sent from the HBase User mailing list archive at Nabble.com.
> >
>


Re: HBase poor write performance

2016-03-08 Thread Heng Chen
What is your HLog file count during the test?   Is it always the max number
(IIRC, the default is 34)?

How many DNs are in your hdfs?

2016-03-09 1:31 GMT+08:00 Frank Luo :

> 0.98
>
> "Light" means not enough to trigger compacts during actively write.
>
> -Original Message-
> From: saint@gmail.com [mailto:saint@gmail.com] On Behalf Of Stack
> Sent: Tuesday, March 08, 2016 11:29 AM
> To: Hbase-User
> Subject: Re: HBase poor write performance
>
> On Tue, Mar 8, 2016 at 8:49 AM, Frank Luo  wrote:
>
> > Akmal,
> >
> > We have been suffering from this issue for two years now without a good
> > solution. From what I learned, it is not really a good idea to do
> > heavy online hbase puts. The first thing you encounter will be
> > performance problems caused by compactions, no matter how you tune
> > parameters. Then later on you will see job failures because of hbase
> > operation timeouts and/or region server crashes.
> >
> > Light writes, heavy reads are generally OK.
> >
> >
> What version are you running Frank?
>
> Yes, bulk load is >>> than Puts via API but I'd be interested in what
> 'light' means for you.
>
> Thanks,
> St.Ack
>
>
>
> > For heavy puts, the best practice is to prepare tables offline, then
> > turn it on for reads.
> >
> > If online heavy puts not avoidable, you might get the best out of it
> > if you manage compact/split by yourself. Meaning when # of files per
> > region reaches certain number, stops writing, performs  compacts and
> > splits with large regions; then resume writing.
> >
> > I hope it helps.
> >
> > Frank Luo
> >
> > From: Akmal Abbasov [mailto:akmal.abba...@icloud.com]
> > Sent: Tuesday, March 08, 2016 10:29 AM
> > To: user@hbase.apache.org
> > Subject: HBase poor write performance
> >
> > Hi,
> > I'm testing HBase to choose the right hardware configurations for a
> > heavy write use case. I'm testing using YCSB.
> > The cluster consist of 2 masters, and 5 regionservers(4 cores, 14GB
> > ram, 4x512GB SSD).
> > I've created a new table in HBase, presplit it to 50 regions. I'm
> > running
> > 3 clients each running 50 threads, to insert data.
> > I'm using the default HBase settings. After running few tests, I can
> > see that the cluster is underutilized, in fact memory usage is around
> 30%.
> > The main problem I see for now is compactions, compactionQueueLength
> > is growing very fast, and compaction process is always running.
> > I found that there are hbase.regionserver.thread.compaction.small and
> > hbase.regionserver.thread.compaction.large but couldn't find
> > information regarding their default values.
> > I am also planing to increase the regions number and the memstore size
> > to increase utilization of the cluster and performance.
> > Which other settings should be tuned to improve both utilization and
> > performance?
> > Thank you.
> >
> >
> > I'm using HBase 0.98.7 and regionserver heap size is 7GB.
> >
> >
> > Regards, Akmal
> >
> > This email and any attachments transmitted with it are intended for
> > use by the intended recipient(s) only. If you have received this email
> > in error, please notify the sender immediately and then delete it. If
> > you are not the intended recipient, you must not keep, use, disclose,
> > copy or distribute this email without the author’s prior permission.
> > We take precautions to minimize the risk of transmitting software
> > viruses, but we advise you to perform your own virus checks on any
> > attachment to this message. We cannot accept liability for any loss or
> > damage caused by software viruses. The information contained in this
> > communication may be confidential and may be subject to the
> attorney-client privilege.
> >
> This email and any attachments transmitted with it are intended for use by
> the intended recipient(s) only. If you have received this email in error,
> please notify the sender immediately and then delete it. If you are not the
> intended recipient, you must not keep, use, disclose, copy or distribute
> this email without the author’s prior permission. We take precautions to
> minimize the risk of transmitting software viruses, but we advise you to
> perform your own virus checks on any attachment to this message. We cannot
> accept liability for any loss or damage caused by software viruses. The
> information contained in this communication may be confidential and may be
> subject to the attorney-client privilege.
>


Re: Some problems in one accident on my production cluster

2016-02-24 Thread Heng Chen
Thanks stack and ted for your help.

After checking the code, I think the reason is that the RS sent a split
request with the parent region and two daughter regions, and then the RS
crashed.

The master updated the two daughter regions to the SPLIT_NEW state and put
them in regionsInTransition, which is stored in the master's memory.

And in 0.98.11 and earlier, serverOffline does not handle this situation
when a region is in the SPLIT_NEW state, so we had to restart the master.

As ted said, HBASE-12958 has fixed it.

As for "set_quota" command, it was introduced after 1.1,  i will upgrade my
cluster.
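
For reference, a hedged sketch of what a per-table request throttle looks
like with the 1.1+ Java client (the table name and the 1000 req/sec limit
are made-up values; hbase.quota.enabled must be set to true on the master):

  import java.util.concurrent.TimeUnit;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Admin;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.quotas.QuotaSettingsFactory;
  import org.apache.hadoop.hbase.quotas.ThrottleType;

  // Throttle one hot table so its traffic cannot starve the other tables.
  static void throttleTable() throws Exception {
    try (Connection conn = ConnectionFactory.createConnection();
         Admin admin = conn.getAdmin()) {
      admin.setQuota(QuotaSettingsFactory.throttleTable(
          TableName.valueOf("heavy_table"), ThrottleType.REQUEST_NUMBER,
          1000, TimeUnit.SECONDS));
    }
  }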

Thanks guys for your help.



2016-02-25 11:41 GMT+08:00 Stack <st...@duboce.net>:

> On Wed, Feb 24, 2016 at 3:31 PM, Heng Chen <heng.chen.1...@gmail.com>
> wrote:
>
> > The story is I run one MR job on my production cluster (0.98.6),   it
> needs
> > to scan one table during map procedure.
> >
> > Because of the heavy load from the job,  all my RS crashed due to OOM.
> >
> >
> Really big rows? If so, can you narrow your scan or ask for partial rows
> (IIRC, you can do this in 0.98.x) or move up on to hbase 1.1+ where
> scanning does 'chunking'?
>
>
> > After i restart all RS,  i found one problem.
> >
> > All regions were reopened on one RS,
>
>
>
> ... the others took a while to check in? That's the usual reason one RS gets a
> bunch of regions.
>
>
>
> > and balancer could not run because of
> > two regions were in transition.   The cluster got in stuck a long time
> > until i restarted master.
> >
> > 1.  why this happened?
> >
> > Would need logs. I see you posted some later. Good to go to the server
> that was doing the split and look in log around the time of split fail.
>
>
> > 2.  If cluster has a lots of regions, after all RS crash,  how to restart
> > the cluster.  If restart RS one by one, it means OOM may happen because
> one
> > RS has to hold all regions and it will cost a long time.
> >
> >
> Best to restart cluster in this case (after figuring why others took a
> while to check in... look at their logs around startup time to see why they
> dally)
>
>
> > 3.  Is it possible to make each table with some requests quotas,  it
> means
> > when one table is requested heavily, it has no impact to other tables on
> > cluster.
> >
> >
> Not sure what the state of this is in 0.98. Maybe someone closer to 0.98
> knows.
>
> St.Ack
>
>
>
> >
> > Thanks
> >
>


Re: Some problems in one accident on my production cluster

2016-02-24 Thread Heng Chen
Thanks @ted, your suggestions about 2 and 3 are what I need!

2016-02-25 10:39 GMT+08:00 Heng Chen <heng.chen.1...@gmail.com>:

> I pick up some logs in master.log about one region
> "ad283942aff2bba6c0b94ff98a904d1a"
>
>
> 2016-02-24 16:24:35,610 INFO  [AM.ZK.Worker-pool2-t3491]
> master.RegionStates: Transition null to {ad283942aff2bba6c0b94ff98a904d1a
> state=SPLITTING_NEW, ts=1456302275610,
> server=dx-common-regionserver1-online,60020,1456302268068}
> 2016-02-24 16:25:40,472 WARN
>  [MASTER_SERVER_OPERATIONS-dx-common-hmaster1-online:6-0]
> master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected
> {ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW, ts=1456302275610,
> server=dx-common-regionserver1-online,60020,1456302268068}
> 2016-02-24 16:34:24,769 DEBUG
> [dx-common-hmaster1-online,6,1433937470611-BalancerChore]
> master.HMaster: Not running balancer because 2 region(s) in transition:
> {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
> state=SPLITTING_NEW, ts=1456302275610,
> server=dx-common-regionserver1-online,60020,1456302268068},
> ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
> state=SPLITTING_NEW...
> 2016-02-24 16:39:24,768 DEBUG
> [dx-common-hmaster1-online,6,1433937470611-BalancerChore]
> master.HMaster: Not running balancer because 2 region(s) in transition:
> {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
> state=SPLITTING_NEW, ts=1456302275610,
> server=dx-common-regionserver1-online,60020,1456302268068},
> ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
> state=SPLITTING_NEW...
> 2016-02-24 16:44:24,768 DEBUG
> [dx-common-hmaster1-online,6,1433937470611-BalancerChore]
> master.HMaster: Not running balancer because 2 region(s) in transition:
> {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
> state=SPLITTING_NEW, ts=1456302275610,
> server=dx-common-regionserver1-online,60020,1456302268068},
> ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
> state=SPLITTING_NEW...
> 2016-02-24 16:45:37,749 DEBUG [FifoRpcScheduler.handler1-thread-10]
> master.HMaster: Not running balancer because 2 region(s) in transition:
> {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
> state=SPLITTING_NEW, ts=1456302275610,
> server=dx-common-regionserver1-online,60020,1456302268068},
> ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
> state=SPLITTING_NEW...
> 2016-02-24 16:49:24,769 DEBUG
> [dx-common-hmaster1-online,6,1433937470611-BalancerChore]
> master.HMaster: Not running balancer because 2 region(s) in transition:
> {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
> state=SPLITTING_NEW, ts=1456302275610,
> server=dx-common-regionserver1-online,60020,1456302268068},
> ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
> state=SPLITTING_NEW...
> 2016-02-24 16:54:24,768 DEBUG
> [dx-common-hmaster1-online,6,1433937470611-BalancerChore]
> master.HMaster: Not running balancer because 2 region(s) in transition:
> {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
> state=SPLITTING_NEW, ts=1456302275610,
> server=dx-common-regionserver1-online,60020,1456302268068},
> ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
> state=SPLITTING_NEW...
> 2016-02-24 16:59:24,768 DEBUG
> [dx-common-hmaster1-online,6,1433937470611-BalancerChore]
> master.HMaster: Not running balancer because 2 region(s) in transition:
> {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
> state=SPLITTING_NEW, ts=1456302275610,
> server=dx-common-regionserver1-online,60020,1456302268068},
> ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
> state=SPLITTING_NEW...
> 2016-02-24 17:04:24,769 DEBUG
> [dx-common-hmaster1-online,6,1433937470611-BalancerChore]
> master.HMaster: Not running balancer because 2 region(s) in transition:
> {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
> state=SPLITTING_NEW, ts=1456302275610,
> server=dx-common-regionserver1-online,60020,1456302268068},
> ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
> state=SPLITTING_NEW...
> 2016-02-24 17:09:24,768 DEBUG
> [dx-common-hmaster1-online,6,1433937470611-BalancerChore]
> master.HMaster: Not running balancer because 2 region(s) in transition:
> {ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
> state=SPLITTING_NEW, ts=1456302275610,
> server=dx-common-regionserver1-online,60020,1456302268068},
> ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
> state=SPLITTING_NEW...
>
>
>
>
>
> 2016-02-25 10:05 GMT+08:00 

Re: Some problems in one accident on my production cluster

2016-02-24 Thread Heng Chen
I picked up some logs from master.log about one region
"ad283942aff2bba6c0b94ff98a904d1a"


2016-02-24 16:24:35,610 INFO  [AM.ZK.Worker-pool2-t3491]
master.RegionStates: Transition null to {ad283942aff2bba6c0b94ff98a904d1a
state=SPLITTING_NEW, ts=1456302275610,
server=dx-common-regionserver1-online,60020,1456302268068}
2016-02-24 16:25:40,472 WARN
 [MASTER_SERVER_OPERATIONS-dx-common-hmaster1-online:6-0]
master.RegionStates: THIS SHOULD NOT HAPPEN: unexpected
{ad283942aff2bba6c0b94ff98a904d1a state=SPLITTING_NEW, ts=1456302275610,
server=dx-common-regionserver1-online,60020,1456302268068}
2016-02-24 16:34:24,769 DEBUG
[dx-common-hmaster1-online,6,1433937470611-BalancerChore]
master.HMaster: Not running balancer because 2 region(s) in transition:
{ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
state=SPLITTING_NEW, ts=1456302275610,
server=dx-common-regionserver1-online,60020,1456302268068},
ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
state=SPLITTING_NEW...
2016-02-24 16:39:24,768 DEBUG
[dx-common-hmaster1-online,6,1433937470611-BalancerChore]
master.HMaster: Not running balancer because 2 region(s) in transition:
{ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
state=SPLITTING_NEW, ts=1456302275610,
server=dx-common-regionserver1-online,60020,1456302268068},
ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
state=SPLITTING_NEW...
2016-02-24 16:44:24,768 DEBUG
[dx-common-hmaster1-online,6,1433937470611-BalancerChore]
master.HMaster: Not running balancer because 2 region(s) in transition:
{ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
state=SPLITTING_NEW, ts=1456302275610,
server=dx-common-regionserver1-online,60020,1456302268068},
ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
state=SPLITTING_NEW...
2016-02-24 16:45:37,749 DEBUG [FifoRpcScheduler.handler1-thread-10]
master.HMaster: Not running balancer because 2 region(s) in transition:
{ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
state=SPLITTING_NEW, ts=1456302275610,
server=dx-common-regionserver1-online,60020,1456302268068},
ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
state=SPLITTING_NEW...
2016-02-24 16:49:24,769 DEBUG
[dx-common-hmaster1-online,6,1433937470611-BalancerChore]
master.HMaster: Not running balancer because 2 region(s) in transition:
{ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
state=SPLITTING_NEW, ts=1456302275610,
server=dx-common-regionserver1-online,60020,1456302268068},
ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
state=SPLITTING_NEW...
2016-02-24 16:54:24,768 DEBUG
[dx-common-hmaster1-online,6,1433937470611-BalancerChore]
master.HMaster: Not running balancer because 2 region(s) in transition:
{ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
state=SPLITTING_NEW, ts=1456302275610,
server=dx-common-regionserver1-online,60020,1456302268068},
ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
state=SPLITTING_NEW...
2016-02-24 16:59:24,768 DEBUG
[dx-common-hmaster1-online,6,1433937470611-BalancerChore]
master.HMaster: Not running balancer because 2 region(s) in transition:
{ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
state=SPLITTING_NEW, ts=1456302275610,
server=dx-common-regionserver1-online,60020,1456302268068},
ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
state=SPLITTING_NEW...
2016-02-24 17:04:24,769 DEBUG
[dx-common-hmaster1-online,6,1433937470611-BalancerChore]
master.HMaster: Not running balancer because 2 region(s) in transition:
{ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
state=SPLITTING_NEW, ts=1456302275610,
server=dx-common-regionserver1-online,60020,1456302268068},
ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
state=SPLITTING_NEW...
2016-02-24 17:09:24,768 DEBUG
[dx-common-hmaster1-online,6,1433937470611-BalancerChore]
master.HMaster: Not running balancer because 2 region(s) in transition:
{ad283942aff2bba6c0b94ff98a904d1a={ad283942aff2bba6c0b94ff98a904d1a
state=SPLITTING_NEW, ts=1456302275610,
server=dx-common-regionserver1-online,60020,1456302268068},
ab07d6fbcef39be032ba11ca6ba252ef={ab07d6fbcef39be032ba11ca6ba252ef
state=SPLITTING_NEW...





2016-02-25 10:05 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:

> bq. two regions were in transition
>
> Can you pastebin related server logs w.r.t. these two regions so that we
> can have more clue ?
>
> For #2, please see http://hbase.apache.org/book.html#big.cluster.config
>
> For #3, please see
>
> http://hbase.apache.org/book.html#_running_multiple_workloads_on_a_single_cluster
>
> On Wed, Feb 24, 2016 at 3:31 PM, Heng Chen <heng.chen.1...@gmail.com>
> wrote:
>
> > The story is I run one MR job on my production cluster (0.98.6),   it
> needs
> >

Some problems in one accident on my production cluster

2016-02-24 Thread Heng Chen
The story is that I ran one MR job on my production cluster (0.98.6); it
needs to scan one table during the map procedure.

Because of the heavy load from the job, all my RS crashed due to OOM.

After I restarted all the RS, I found one problem.

All regions were reopened on one RS, and the balancer could not run because
two regions were in transition.  The cluster got stuck for a long time
until I restarted the master.

1.  Why did this happen?

2.  If the cluster has a lot of regions, how should we restart the cluster
after all the RS crash?  If we restart the RS one by one, OOM may happen,
because one RS has to hold all the regions, and it will take a long time.

3.  Is it possible to give each table a request quota, so that when one
table is requested heavily, it has no impact on the other tables in the
cluster?


Thanks


Re: org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish

2016-02-14 Thread Heng Chen
I am not sure whether the "upsert batch size" in Phoenix equals the HBase
client's batch put size or not.

But as the log shows, it seems there are 2000 actions sent to HBase at one
time.

2016-02-15 11:38 GMT+08:00 anil gupta <anilgupt...@gmail.com>:

> My phoenix upsert batch size is 50. You mean to say that 50 is also a lot?
>
> However, AsyncProcess is complaining about 2000 actions.
>
> I tried with upsert batch size of 5 also. But it didnt help.
>
> On Sun, Feb 14, 2016 at 7:37 PM, anil gupta <anilgupt...@gmail.com> wrote:
>
> > My phoenix upsert batch size is 50. You mean to say that 50 is also a
> lot?
> >
> > However, AsyncProcess is complaining about 2000 actions.
> >
> > I tried with upsert batch size of 5 also. But it didnt help.
> >
> >
> > On Sun, Feb 14, 2016 at 6:43 PM, Heng Chen <heng.chen.1...@gmail.com>
> > wrote:
> >
> >> 2016-02-14 12:34:23,593 INFO [main]
> >> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000
> >> actions to finish
> >>
> >> It means your writes are too many,  please decrease the batch size of
> your
> >> puts,  and balance your requests on each RS.
> >>
> >> 2016-02-15 4:53 GMT+08:00 anil gupta <anilgupt...@gmail.com>:
> >>
> >> > After a while we also get this error:
> >> > 2016-02-14 12:45:10,515 WARN [main]
> >> > org.apache.phoenix.execute.MutationState: Swallowing exception and
> >> > retrying after clearing meta cache on connection.
> >> > java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached index
> >> > metadata.  ERROR 2008 (INT10): ERROR 2008 (INT10): Unable to find
> >> > cached index metadata.  key=-594230549321118802
> >> > region=BI.SALES,,1455470578449.44e39179789041b5a8c03316730260e7. Index
> >> > update failed
> >> >
> >> > We have already set:
> >> >
> >> >
> >>
> phoenix.coprocessor.maxServerCacheTimeToLiveMs18
> >> >
> >> > Upset batch size is 50. Write are quite frequent so the cache would
> >> > not timeout in 18ms
> >> >
> >> >
> >> > On Sun, Feb 14, 2016 at 12:44 PM, anil gupta <anilgupt...@gmail.com>
> >> > wrote:
> >> >
> >> > > Hi,
> >> > >
> >> > > We are using phoenix4.4, hbase 1.1(hdp2.3.4).
> >> > > I have a MR job that is using PhoenixOutputFormat. My job keeps on
> >> > failing
> >> > > due to following error:
> >> > >
> >> > > 2016-02-14 12:29:43,182 INFO [main]
> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000
> >> actions
> >> > to finish
> >> > > 2016-02-14 12:29:53,197 INFO [main]
> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000
> >> actions
> >> > to finish
> >> > > 2016-02-14 12:30:03,212 INFO [main]
> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000
> >> actions
> >> > to finish
> >> > > 2016-02-14 12:30:13,225 INFO [main]
> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000
> >> actions
> >> > to finish
> >> > > 2016-02-14 12:30:23,239 INFO [main]
> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000
> >> actions
> >> > to finish
> >> > > 2016-02-14 12:30:33,253 INFO [main]
> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000
> >> actions
> >> > to finish
> >> > > 2016-02-14 12:30:43,266 INFO [main]
> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000
> >> actions
> >> > to finish
> >> > > 2016-02-14 12:30:53,279 INFO [main]
> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000
> >> actions
> >> > to finish
> >> > > 2016-02-14 12:31:03,293 INFO [main]
> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000
> >> actions
> >> > to finish
> >> > > 2016-02-14 12:31:13,305 INFO [main]
> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000
> >> actions
> >> > to finish
> >> > > 2016-02-14 12:31:23,318 INFO [main]
> >> > org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000
> >> actions
>

Re: org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000 actions to finish

2016-02-14 Thread Heng Chen
2016-02-14 12:34:23,593 INFO [main]
org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000
actions to finish

It means you are writing too much at once; please decrease the batch size
of your puts, and balance your requests across the RSes.
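
A minimal sketch of capping the batch size on the plain HBase client API
(Phoenix's upsert batching is configured separately; the BI.SALES table
name comes from this thread, while the 500-put cap is illustrative):

  import java.util.ArrayList;
  import java.util.List;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.client.Table;

  // Flush puts in small synchronous batches instead of letting thousands
  // queue up; smaller batches keep "waiting for N actions" pauses short.
  static final int BATCH_SIZE = 500;

  static void write(Connection conn, List<Put> puts) throws Exception {
    try (Table table = conn.getTable(TableName.valueOf("BI.SALES"))) {
      List<Put> buffer = new ArrayList<>(BATCH_SIZE);
      for (Put p : puts) {
        buffer.add(p);
        if (buffer.size() >= BATCH_SIZE) {
          table.put(buffer);  // blocks until this batch is persisted
          buffer.clear();
        }
      }
      if (!buffer.isEmpty()) {
        table.put(buffer);
      }
    }
  }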

2016-02-15 4:53 GMT+08:00 anil gupta :

> After a while we also get this error:
> 2016-02-14 12:45:10,515 WARN [main]
> org.apache.phoenix.execute.MutationState: Swallowing exception and
> retrying after clearing meta cache on connection.
> java.sql.SQLException: ERROR 2008 (INT10): Unable to find cached index
> metadata.  ERROR 2008 (INT10): ERROR 2008 (INT10): Unable to find
> cached index metadata.  key=-594230549321118802
> region=BI.SALES,,1455470578449.44e39179789041b5a8c03316730260e7. Index
> update failed
>
> We have already set:
>
> phoenix.coprocessor.maxServerCacheTimeToLiveMs18
>
> Upset batch size is 50. Write are quite frequent so the cache would
> not timeout in 18ms
>
>
> On Sun, Feb 14, 2016 at 12:44 PM, anil gupta 
> wrote:
>
> > Hi,
> >
> > We are using phoenix4.4, hbase 1.1(hdp2.3.4).
> > I have a MR job that is using PhoenixOutputFormat. My job keeps on
> failing
> > due to following error:
> >
> > 2016-02-14 12:29:43,182 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:29:53,197 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:30:03,212 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:30:13,225 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:30:23,239 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:30:33,253 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:30:43,266 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:30:53,279 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:31:03,293 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:31:13,305 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:31:23,318 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:31:33,331 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:31:43,345 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:31:53,358 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:32:03,371 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:32:13,385 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:32:23,399 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:32:33,412 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:32:43,428 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:32:53,443 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:33:03,457 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:33:13,472 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:33:23,486 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:33:33,524 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:33:43,538 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:33:53,551 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:34:03,565 INFO [main]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, waiting for 2000  actions
> to finish
> > 2016-02-14 12:34:03,953 INFO [hconnection-0xe82ca6e-shared--pool2-t16]
> org.apache.hadoop.hbase.client.AsyncProcess: #1, table=BI.SALES,
> attempt=10/35 failed=2000ops, last exception: null on 
> hdp3.truecar.com,16020,1455326291512,
> tracking started null, retrying after=10086ms, 

Re: Can't see any log in log file

2016-02-14 Thread Heng Chen
I changed the Phoenix lib from 4.6.0 to 4.5.1, and the logs came back...



2016-02-14 15:27 GMT+08:00 Heng Chen <heng.chen.1...@gmail.com>:

> I found some hints.  The logs seem to have disappeared after I installed
> phoenix; some suspicious log lines I found are below:
>
> SLF4J: Class path contains multiple SLF4J bindings.
> SLF4J: Found binding in
> [jar:file:/home/maintain/hadoop/hbase/hbase-1.1.1/lib/phoenix-4.6.0-HBase-1.1-client.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/home/maintain/hadoop/hbase/hbase-1.1.1/lib/phoenix-server-4.6.0-HBase-1.1-runnable.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: Found binding in
> [jar:file:/home/maintain/hadoop/hbase/hbase-1.1.1/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
> SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an
> explanation.
>
> 2016-02-14 15:17 GMT+08:00 Heng Chen <heng.chen.1...@gmail.com>:
>
>> This happens after i upgrade my cluster from 0.98 to 1.1
>>
>>
>>
>> 2016-02-14 12:47 GMT+08:00 Heng Chen <heng.chen.1...@gmail.com>:
>>
>>> I am not sure why this happens,   this is my command
>>>
>>> maintain 11444 66.9  1.1 10386988 1485888 pts/0 Sl  12:33   6:30
>>> /usr/java/jdk/bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9
>>> %p -Xmx8000m -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails
>>> -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1
>>> -XX:GCLogFileSize=512M
>>> -Dhbase.log.dir=/home/maintain/hadoop/hbase/hbase-1.1.1/logs
>>> -Dhbase.log.file=hbase-maintain-regionserver-dx-pipe-regionserver7-online.log
>>> -Dhbase.home.dir=/home/maintain/hadoop/hbase/hbase-1.1.1
>>> -Dhbase.id.str=maintain -Dhbase.root.logger=INFO,RFA
>>> -Dhbase.security.logger=INFO,RFAS
>>> org.apache.hadoop.hbase.regionserver.HRegionServer start
>>>
>>>
>>> in  hbase-maintain-regionserver-dx-pipe-regionserver7-online.log there
>>> is only information below:
>>>
>>> Sun Feb 14 12:33:19 CST 2016 Starting regionserver on
>>> dx-pipe-regionserver7-online
>>> core file size  (blocks, -c) 1024
>>> data seg size   (kbytes, -d) unlimited
>>> scheduling priority (-e) 0
>>> file size   (blocks, -f) unlimited
>>> pending signals (-i) 514904
>>> max locked memory   (kbytes, -l) 64
>>> max memory size (kbytes, -m) unlimited
>>> open files  (-n) 65536
>>> pipe size(512 bytes, -p) 8
>>> POSIX message queues (bytes, -q) 819200
>>> real-time priority  (-r) 0
>>> stack size  (kbytes, -s) 8192
>>> cpu time   (seconds, -t) unlimited
>>> max user processes  (-u) 32764
>>> virtual memory  (kbytes, -v) unlimited
>>> file locks  (-x) unlimited
>>>
>>>
>>>
>>> It seems there are some logs in
>>> hbase-maintain-regionserver-dx-pipe-regionserver7-online.out,  but not
>>> complete.
>>>
>>> 713517
>>> [RpcServer.reader=5,bindAddress=dx-pipe-regionserver7-online,port=16020]
>>> INFO  SecurityLogger.org.apache.hadoop.hbase.Server  - Connection from
>>> 10.11.51.75 port: 51239 with version info: version: "1.1.1" url:
>>> "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision:
>>> "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun
>>> 23 14:56:34 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"
>>> 2016-02-14T12:46:20.880+0800: [GC (Allocation Failure) [ParNew:
>>> 599568K->41393K(618048K), 0.0334895 secs] 1230469K->686558K(1991616K),
>>> 0.0336985 secs] [Times: user=0.36 sys=0.06, real=0.04 secs]
>>> 723538
>>> [RpcServer.reader=6,bindAddress=dx-pipe-regionserver7-online,port=16020]
>>> INFO  SecurityLogger.org.apache.hadoop.hbase.Server  - Connection from
>>> 10.11.53.52 port: 18965 with version info: version: "1.1.1" url:
>>> "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision:
>>> "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun
>>> 23 14:56:34 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"
>>> 733529
>>> [RpcServer.reader=7,bindAddress=dx-pipe-regionserver7-online,port=16020]
>>> INFO  SecurityLogger.org.apache.hadoop.hbase.Server  - Connection f

Can't see any log in log file

2016-02-13 Thread Heng Chen
I am not sure why this happens.  This is my command:

maintain 11444 66.9  1.1 10386988 1485888 pts/0 Sl  12:33   6:30
/usr/java/jdk/bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9
%p -Xmx8000m -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails
-XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1
-XX:GCLogFileSize=512M
-Dhbase.log.dir=/home/maintain/hadoop/hbase/hbase-1.1.1/logs
-Dhbase.log.file=hbase-maintain-regionserver-dx-pipe-regionserver7-online.log
-Dhbase.home.dir=/home/maintain/hadoop/hbase/hbase-1.1.1
-Dhbase.id.str=maintain -Dhbase.root.logger=INFO,RFA
-Dhbase.security.logger=INFO,RFAS
org.apache.hadoop.hbase.regionserver.HRegionServer start


in hbase-maintain-regionserver-dx-pipe-regionserver7-online.log there is
only the information below:

Sun Feb 14 12:33:19 CST 2016 Starting regionserver on
dx-pipe-regionserver7-online
core file size  (blocks, -c) 1024
data seg size   (kbytes, -d) unlimited
scheduling priority (-e) 0
file size   (blocks, -f) unlimited
pending signals (-i) 514904
max locked memory   (kbytes, -l) 64
max memory size (kbytes, -m) unlimited
open files  (-n) 65536
pipe size(512 bytes, -p) 8
POSIX message queues (bytes, -q) 819200
real-time priority  (-r) 0
stack size  (kbytes, -s) 8192
cpu time   (seconds, -t) unlimited
max user processes  (-u) 32764
virtual memory  (kbytes, -v) unlimited
file locks  (-x) unlimited



It seems there are some logs in
hbase-maintain-regionserver-dx-pipe-regionserver7-online.out, but they are
not complete.

713517
[RpcServer.reader=5,bindAddress=dx-pipe-regionserver7-online,port=16020]
INFO  SecurityLogger.org.apache.hadoop.hbase.Server  - Connection from
10.11.51.75 port: 51239 with version info: version: "1.1.1" url:
"git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision:
"d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun
23 14:56:34 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"
2016-02-14T12:46:20.880+0800: [GC (Allocation Failure) [ParNew:
599568K->41393K(618048K), 0.0334895 secs] 1230469K->686558K(1991616K),
0.0336985 secs] [Times: user=0.36 sys=0.06, real=0.04 secs]
723538
[RpcServer.reader=6,bindAddress=dx-pipe-regionserver7-online,port=16020]
INFO  SecurityLogger.org.apache.hadoop.hbase.Server  - Connection from
10.11.53.52 port: 18965 with version info: version: "1.1.1" url:
"git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision:
"d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun
23 14:56:34 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"
733529
[RpcServer.reader=7,bindAddress=dx-pipe-regionserver7-online,port=16020]
INFO  SecurityLogger.org.apache.hadoop.hbase.Server  - Connection from
10.11.51.75 port: 51325 with version info: version: "1.1.1" url:
"git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision:
"d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun
23 14:56:34 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"
733529
[RpcServer.reader=8,bindAddress=dx-pipe-regionserver7-online,port=16020]
INFO  SecurityLogger.org.apache.hadoop.hbase.Server  - Connection from
10.11.51.62 port: 32504 with version info: version: "1.1.1" url:
"git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision:
"d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun
23 14:56:34 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"
733552
[RpcServer.reader=9,bindAddress=dx-pipe-regionserver7-online,port=16020]
INFO  SecurityLogger.org.apache.hadoop.hbase.Server  - Connection from
10.11.53.52 port: 19005 with version info: version: "1.1.1" url:
"git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision:
"d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun
23 14:56:34 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"
2016-02-14T12:46:40.371+0800: [GC (Allocation Failure) [ParNew:
590769K->67969K(618048K), 0.0282183 secs] 1235934K->713134K(1991616K),
0.0284249 secs] [Times: user=0.27 sys=0.06, real=0.03 secs]





Has anyone met this problem?


Re: Can't see any log in log file

2016-02-13 Thread Heng Chen
This happened after I upgraded my cluster from 0.98 to 1.1.



2016-02-14 12:47 GMT+08:00 Heng Chen <heng.chen.1...@gmail.com>:

> I am not sure why this happens,   this is my command
>
> maintain 11444 66.9  1.1 10386988 1485888 pts/0 Sl  12:33   6:30
> /usr/java/jdk/bin/java -Dproc_regionserver -XX:OnOutOfMemoryError=kill -9
> %p -Xmx8000m -XX:+UseConcMarkSweepGC -verbose:gc -XX:+PrintGCDetails
> -XX:+PrintGCDateStamps -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1
> -XX:GCLogFileSize=512M
> -Dhbase.log.dir=/home/maintain/hadoop/hbase/hbase-1.1.1/logs
> -Dhbase.log.file=hbase-maintain-regionserver-dx-pipe-regionserver7-online.log
> -Dhbase.home.dir=/home/maintain/hadoop/hbase/hbase-1.1.1
> -Dhbase.id.str=maintain -Dhbase.root.logger=INFO,RFA
> -Dhbase.security.logger=INFO,RFAS
> org.apache.hadoop.hbase.regionserver.HRegionServer start
>
>
> in  hbase-maintain-regionserver-dx-pipe-regionserver7-online.log there is
> only information below:
>
> Sun Feb 14 12:33:19 CST 2016 Starting regionserver on
> dx-pipe-regionserver7-online
> core file size  (blocks, -c) 1024
> data seg size   (kbytes, -d) unlimited
> scheduling priority (-e) 0
> file size   (blocks, -f) unlimited
> pending signals (-i) 514904
> max locked memory   (kbytes, -l) 64
> max memory size (kbytes, -m) unlimited
> open files  (-n) 65536
> pipe size(512 bytes, -p) 8
> POSIX message queues (bytes, -q) 819200
> real-time priority  (-r) 0
> stack size  (kbytes, -s) 8192
> cpu time   (seconds, -t) unlimited
> max user processes  (-u) 32764
> virtual memory  (kbytes, -v) unlimited
> file locks  (-x) unlimited
>
>
>
> It seems there are some logs in
> hbase-maintain-regionserver-dx-pipe-regionserver7-online.out,  but not
> complete.
>
> 713517
> [RpcServer.reader=5,bindAddress=dx-pipe-regionserver7-online,port=16020]
> INFO  SecurityLogger.org.apache.hadoop.hbase.Server  - Connection from
> 10.11.51.75 port: 51239 with version info: version: "1.1.1" url:
> "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision:
> "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun
> 23 14:56:34 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"
> 2016-02-14T12:46:20.880+0800: [GC (Allocation Failure) [ParNew:
> 599568K->41393K(618048K), 0.0334895 secs] 1230469K->686558K(1991616K),
> 0.0336985 secs] [Times: user=0.36 sys=0.06, real=0.04 secs]
> 723538
> [RpcServer.reader=6,bindAddress=dx-pipe-regionserver7-online,port=16020]
> INFO  SecurityLogger.org.apache.hadoop.hbase.Server  - Connection from
> 10.11.53.52 port: 18965 with version info: version: "1.1.1" url:
> "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision:
> "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun
> 23 14:56:34 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"
> 733529
> [RpcServer.reader=7,bindAddress=dx-pipe-regionserver7-online,port=16020]
> INFO  SecurityLogger.org.apache.hadoop.hbase.Server  - Connection from
> 10.11.51.75 port: 51325 with version info: version: "1.1.1" url:
> "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision:
> "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun
> 23 14:56:34 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"
> 733529
> [RpcServer.reader=8,bindAddress=dx-pipe-regionserver7-online,port=16020]
> INFO  SecurityLogger.org.apache.hadoop.hbase.Server  - Connection from
> 10.11.51.62 port: 32504 with version info: version: "1.1.1" url:
> "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision:
> "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun
> 23 14:56:34 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"
> 733552
> [RpcServer.reader=9,bindAddress=dx-pipe-regionserver7-online,port=16020]
> INFO  SecurityLogger.org.apache.hadoop.hbase.Server  - Connection from
> 10.11.53.52 port: 19005 with version info: version: "1.1.1" url:
> "git://hw11397.local/Volumes/hbase-1.1.1RC0/hbase" revision:
> "d0a115a7267f54e01c72c603ec53e91ec418292f" user: "ndimiduk" date: "Tue Jun
> 23 14:56:34 PDT 2015" src_checksum: "6e2d8cecbd28738ad86daacb25dc467e"
> 2016-02-14T12:46:40.371+0800: [GC (Allocation Failure) [ParNew:
> 590769K->67969K(618048K), 0.0282183 secs] 1235934K->713134K(1991616K),
> 0.0284249 secs] [Times: user=0.27 sys=0.06, real=0.03 secs]
>
>
>
>
>
> Any one meet this problem?
>
>


Re: After namenode failed, some regions stuck in Closed state

2016-01-11 Thread Heng Chen



2016-01-12 10:36 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:

> Looks like the picture didn't go through.
>
> Consider using third party image hosting site.
>
> Pastebinning server log would help.
>
> Cheers
>
> On Mon, Jan 11, 2016 at 6:28 PM, Heng Chen <heng.chen.1...@gmail.com>
> wrote:
>
> > [image: inline image 1]
> >
> >
> > HBASE-1.1.1  hadoop-2.5.0
> >
> >
> > I want to recovery this regions, how?  ask for help.
> >
>


Re: After namenode failed, some regions stuck in Closed state

2016-01-11 Thread Heng Chen
After assigning it manually, everything is OK now.  Thanks Ted.
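
For anyone hitting the same state, a hedged sketch of triggering the
assignment from the Java client (the encoded region name below is the one
from this thread):

  import org.apache.hadoop.hbase.client.Admin;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.util.Bytes;

  // Ask the master to (re)assign a region stuck in CLOSED state; the
  // argument is the encoded region name shown in the master UI / logs.
  static void reassignStuckRegion() throws Exception {
    try (Connection conn = ConnectionFactory.createConnection();
         Admin admin = conn.getAdmin()) {
      admin.assign(Bytes.toBytes("4a5c3511dc0b880d063e56042a7da547"));
    }
  }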

2016-01-12 11:37 GMT+08:00 Ted Yu <yuzhih...@gmail.com>:

> Do you see table descriptor (on hdfs) for region
> 4a5c3511dc0b880d063e56042a7da547 ?
>
> Have you run fsck to see if there is any corrupt block(s) ?
>
> Cheers
>
> On Mon, Jan 11, 2016 at 6:52 PM, Heng Chen <heng.chen.1...@gmail.com>
> wrote:
>
> > Some relates region log on RS
> >
> >
> > 2016-01-12 10:45:01,570 INFO
> >  [PriorityRpcServer.handler=14,queue=0,port=16020]
> > regionserver.RSRpcServices: Open
> >
> >
> PIPE.TABLE_CONFIG,\x01\x00\x00\x00\x00\x00,1451875306059.4a5c3511dc0b880d063e56042a7da547.
> > 2016-01-12 10:45:01,573 ERROR
> > [RS_OPEN_REGION-dx-pipe-regionserver3-online:16020-0]
> > handler.OpenRegionHandler: Failed open of
> >
> >
> region=PIPE.TABLE_CONFIG,\x01\x00\x00\x00\x00\x00,1451875306059.4a5c3511dc0b880d063e56042a7da547.,
> > starting to roll back the global memstore size.
> > java.lang.IllegalStateException: Could not instantiate a region instance.
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:5836)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6143)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6115)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6071)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:6022)
> > at
> >
> >
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:362)
> > at
> >
> >
> org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:129)
> > at
> org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
> > at
> >
> >
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
> > at
> >
> >
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
> > at java.lang.Thread.run(Thread.java:745)
> > Caused by: java.lang.reflect.InvocationTargetException
> > at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
> > at
> >
> >
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
> > at
> >
> >
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
> > at java.lang.reflect.Constructor.newInstance(Constructor.java:408)
> > at
> >
> org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:5833)
> > ... 10 more
> > Caused by: java.lang.IllegalArgumentException: Need table descriptor
> > at org.apache.hadoop.hbase.regionserver.HRegion.(HRegion.java:643)
> > at org.apache.hadoop.hbase.regionserver.HRegion.(HRegion.java:620)
> > ... 15 more
> >
> > 2016-01-12 10:42 GMT+08:00 Heng Chen <heng.chen.1...@gmail.com>:
> >
> > > Information from Web UI
> > >
> > > regionstate
> > >RIT
> > > 4a5c3511dc0b880d063e56042a7da547
> >
> PIPE.TABLE_CONFIG,\x01\x00\x00\x00\x00\x00,1451875306059.4a5c3511dc0b880d063e56042a7da547.
> > > state=CLOSED, ts=Tue Jan 12 10:18:06 CST 2016 (1243s ago),
> > > server=dx-pipe-regionserver3-online,16020,1452554429647
> > > 1243053
> > >
> > >
> > >
> > >
> > > Some error logs in master
> > >
> > > 2016-01-12 07:18:18,345 ERROR
> > > [PriorityRpcServer.handler=10,queue=0,port=16000]
> > master.MasterRpcServices:
> > > Region server dx-pipe-regionserver4-online,16020,1447236435629
> reported a
> > > fatal error:
> > > ABORTING region server
> dx-pipe-regionserver4-online,16020,1447236435629:
> > > Replay of WAL required. Forcing server shutdown
> > > Cause:
> > > org.apache.hadoop.hbase.DroppedSnapshotException: region:
> > > ape_fenbi_exercise,\xCF\xB7\x9D\x02\x00\x00\x00\x00_\x00\x00\x00\x00
> > >
> >
> ^-\xF0_\x00\x00\x00\x00\x00\x00\x00,,1451863106090.2ee0e6e2baed75e214cc4074ff51d33b.
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushCacheAndCommit(HRegion.java:2346)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.internalFlushcache(HRegion.java:2049)
> > > at
> > >
> >
> org.apache.hadoop.hbase.regionserver.HRegion.int

After namenode failed, some regions stuck in Closed state

2016-01-11 Thread Heng Chen
[image: inline image 1]


HBASE-1.1.1  hadoop-2.5.0


I want to recover these regions.  How?  Asking for help.


Re: How to list the regions in an HBase table through the shell?

2015-12-03 Thread Heng Chen
@tedyu, should we add something like a 'list server table' command to list
all the regions of one table on a given RS?

I find in my practice that it is always needed.
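
Until something like that exists, a hedged client-side sketch with the
RegionLocator API (the table name t1 just echoes Ted's example below):

  import org.apache.hadoop.hbase.HRegionLocation;
  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.RegionLocator;

  // Print every region of one table with the server hosting it; filter on
  // the server name to get "all regions of table T on RS X".
  static void listRegions() throws Exception {
    try (Connection conn = ConnectionFactory.createConnection();
         RegionLocator locator =
             conn.getRegionLocator(TableName.valueOf("t1"))) {
      for (HRegionLocation loc : locator.getAllRegionLocations()) {
        System.out.println(loc.getRegionInfo().getRegionNameAsString()
            + " -> " + loc.getServerName());
      }
    }
  }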

2015-12-04 4:48 GMT+08:00 Ted Yu :

> There is get_splits command but it only shows the splits.
>
> status 'detailed' would show you enough information
> e.g.
>
> "t1,30,1449175546660.da5f3853f6e59d1ada0a8554f12885ab."
> numberOfStores=1, numberOfStorefiles=0,
> storefileUncompressedSizeMB=0, lastMajorCompactionTimestamp=0,
> storefileSizeMB=0, memstoreSizeMB=0, storefileIndexSizeMB=0,
> readRequestsCount=0, writeRequestsCount=0, rootIndexSizeKB=0,
> totalStaticIndexSizeKB=0, totalStaticBloomSizeKB=0, totalCompactingKVs=0,
> currentCompactedKVs=0, compactionProgressPct=NaN, completeSequenceId=-1,
> dataLocality=0.0
>
> However, you need to parse the regions of the table you're interested in
>
> FYI
>
> On Thu, Dec 3, 2015 at 12:41 PM, Kevin Pauli  wrote:
>
> > I would like to get the same information about the regions of a table
> that
> > appear in the web UI (i.e. region name, region server, start/end key,
> > locality), but through the hbase shell.
> >
> > (The UI is flaky/slow, and furthermore I want to process this information
> > as
> > part of a script.)
> >
> > After much googling, I can't find out how, and this surprises me
> immensely.
> > version is 1.0.0.-cdh5.4.0
> >
> >
> >
> > --
> > View this message in context:
> >
> http://apache-hbase.679495.n3.nabble.com/How-to-list-the-regions-in-an-HBase-table-through-the-shell-tp4076402.html
> > Sent from the HBase User mailing list archive at Nabble.com.
> >
>


Re: Row Versions in Apache Hbase

2015-12-01 Thread Heng Chen
So maybe we can use 1212 + customerId as the row key.
BTW, what is 1212 used for?
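
A minimal sketch of building such a composite row key (the customer id,
family, and qualifier names are made up for illustration):

  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  // Concatenate the fixed prefix with the customer id so all rows for one
  // customer sort together and can be fetched with a prefix scan.
  static Put customerPut(String customerId, int age) {
    byte[] rowKey =
        Bytes.add(Bytes.toBytes("1212-"), Bytes.toBytes(customerId));
    return new Put(rowKey).addColumn(
        Bytes.toBytes("info"), Bytes.toBytes("age"), Bytes.toBytes(age));
  }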

2015-12-01 17:49 GMT+08:00 Rajeshkumar J <rajeshkumarit8...@gmail.com>:

> Hi chen,
>
> yes I have customerid column to represent each customers
>
>
>
> On Tue, Dec 1, 2015 at 3:11 PM, Heng Chen <heng.chen.1...@gmail.com>
> wrote:
>
> > Hm.., is there anything unique like userId to represent one people?
> >
> >
> > 2015-12-01 16:33 GMT+08:00 Rajeshkumar J <rajeshkumarit8...@gmail.com>:
> >
> > > Is there any other way to store only id becoz there may be new rows
> with
> > > the same name like
> > >
> > > 1212  | xxxx | 20
> > > 1212  | xxxx | 21
> > > 1212  | xxxx | 22
> > >
> > >
> > > On Tue, Dec 1, 2015 at 1:59 PM, Heng Chen <heng.chen.1...@gmail.com>
> > > wrote:
> > >
> > > > Yeah,  if you want to get all records about 1212,  just scan rows
> with
> > > > prefix 1212
> > > >
> > > > 2015-12-01 16:27 GMT+08:00 Rajeshkumar J <
> rajeshkumarit8...@gmail.com
> > >:
> > > >
> > > > > so you want me to design row-key value by appending name column
> value
> > > to
> > > > > the rowkey
> > > > >
> > > > > On Tue, Dec 1, 2015 at 1:19 PM, Heng Chen <
> heng.chen.1...@gmail.com>
> > > > > wrote:
> > > > >
> > > > > > So, why not
> > > > > >
> > > > > > 1212-xxx | 20
> > > > > > 1212-yyy | 21
> > > > > > 1212-zzz | 22
> > > > > >
> > > > > > 2015-12-01 15:33 GMT+08:00 Rajeshkumar J <
> > > rajeshkumarit8...@gmail.com
> > > > >:
> > > > > >
> > > > > > > Hi
> > > > > > >
> > > > > > >   I meant like below is this possible
> > > > > > >
> > > > > > > Rowkey | column family
> > > > > > >
> > > > > > >    Name | Age
> > > > > > >
> > > > > > > 1212 | xxx | 20
> > > > > > > 1212 | yyy | 21
> > > > > > > 1212 | zzz | 22
> > > > > > >
> > > > > > > On Tue, Dec 1, 2015 at 12:03 PM, Heng Chen <
> > > heng.chen.1...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > >
> > > > > > > > why not
> > > > > > > >
> > > > > > > > 1212 | 10, 11, 12, 13, 14, 15, 16, 27,  28 ?
> > > > > > > >
> > > > > > > > 2015-12-01 14:29 GMT+08:00 Rajeshkumar J <
> > > > > rajeshkumarit8...@gmail.com
> > > > > > >:
> > > > > > > >
> > > > > > > > > Hi Ted,
> > > > > > > > >
> > > > > > > > >   This is my use case. I have to store values like this is
> it
> > > > > > possible?
> > > > > > > > >
> > > > > > > > > RowKey | Values
> > > > > > > > >
> > > > > > > > > 1212   | 10,11,12
> > > > > > > > >
> > > > > > > > > 1212  | 13, 14, 15
> > > > > > > > >
> > > > > > > > > 1212  | 16,27,28
> > > > > > > > >
> > > > > > > > > Thanks
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > On Mon, Nov 30, 2015 at 10:40 PM, Ted Yu <
> > yuzhih...@gmail.com>
> > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > Have you read
> > > http://hbase.apache.org/book.html#rowkey.design
> > > > ?
> > > > > > > > > >
> > > > > > > > > > bq. we can store more than one row for a row-key value.
> > > > > > > > > >
> > > > > > > > > > Can you clarify your intention / use case ? If row key is
> > the
> > > > > same,
> > > > > > > key
> > > > > > > > > > values would be in the same row.
> > > > > > > > > >
> > > > > > > > > > On Mon, Nov 30, 2015 at 8:30 AM, Rajeshkumar J <
> > > > > > > > > > rajeshkumarit8...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > >   I am new to Apache Hbase and I know that in a table
> > when
> > > we
> > > > > try
> > > > > > > to
> > > > > > > > > > insert
> > > > > > > > > > > row key value which is already present either new value
> > is
> > > > > > > discarded
> > > > > > > > or
> > > > > > > > > > > updated. Also I came across row version through which
> we
> > > can
> > > > > > store
> > > > > > > > > > > different versions of row key based on timestamp. Any
> one
> > > > > correct
> > > > > > > me
> > > > > > > > > if I
> > > > > > > > > > > am wrong? Also I need to know is there any way we can
> > store
> > > > > more
> > > > > > > than
> > > > > > > > > one
> > > > > > > > > > > row for a row-key value.
> > > > > > > > > > >
> > > > > > > > > > > Thanks
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Row Versions in Apache Hbase

2015-12-01 Thread Heng Chen
Yeah, if you want to get all the records about 1212, just scan the rows
with the prefix 1212.
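
A minimal sketch of such a prefix scan (the table name is illustrative; the
"1212-" prefix assumes the composite rowkey idea discussed in this thread):

  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  // Return every row whose key starts with "1212-"; with a 1212-<name>
  // rowkey design this is all records for customer 1212.
  static void scanCustomer() throws Exception {
    Scan scan = new Scan();
    scan.setRowPrefixFilter(Bytes.toBytes("1212-"));
    try (Connection conn = ConnectionFactory.createConnection();
         Table table = conn.getTable(TableName.valueOf("customers"));
         ResultScanner scanner = table.getScanner(scan)) {
      for (Result r : scanner) {
        System.out.println(Bytes.toString(r.getRow()));
      }
    }
  }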

2015-12-01 16:27 GMT+08:00 Rajeshkumar J <rajeshkumarit8...@gmail.com>:

> so you want me to design row-key value by appending name column value to
> the rowkey
>
> On Tue, Dec 1, 2015 at 1:19 PM, Heng Chen <heng.chen.1...@gmail.com>
> wrote:
>
> > So, why not
> >
> > 1212-xxx | 20
> > 1212-yyy | 21
> > 1212-zzz | 22
> >
> > 2015-12-01 15:33 GMT+08:00 Rajeshkumar J <rajeshkumarit8...@gmail.com>:
> >
> > > Hi
> > >
> > >   I meant like below is this possible
> > >
> > > Rowkey | column family
> > >
> > >    Name | Age
> > >
> > > 1212 | xxx | 20
> > > 1212 | yyy | 21
> > > 1212 | zzz | 22
> > >
> > > On Tue, Dec 1, 2015 at 12:03 PM, Heng Chen <heng.chen.1...@gmail.com>
> > > wrote:
> > >
> > > > why not
> > > >
> > > > 1212 | 10, 11, 12, 13, 14, 15, 16, 27,  28 ?
> > > >
> > > > 2015-12-01 14:29 GMT+08:00 Rajeshkumar J <
> rajeshkumarit8...@gmail.com
> > >:
> > > >
> > > > > Hi Ted,
> > > > >
> > > > >   This is my use case. I have to store values like this is it
> > possible?
> > > > >
> > > > > RowKey | Values
> > > > >
> > > > > 1212   | 10,11,12
> > > > >
> > > > > 1212  | 13, 14, 15
> > > > >
> > > > > 1212  | 16,27,28
> > > > >
> > > > > Thanks
> > > > >
> > > > >
> > > > > On Mon, Nov 30, 2015 at 10:40 PM, Ted Yu <yuzhih...@gmail.com>
> > wrote:
> > > > >
> > > > > > Have you read http://hbase.apache.org/book.html#rowkey.design ?
> > > > > >
> > > > > > bq. we can store more than one row for a row-key value.
> > > > > >
> > > > > > Can you clarify your intention / use case ? If row key is the
> same,
> > > key
> > > > > > values would be in the same row.
> > > > > >
> > > > > > On Mon, Nov 30, 2015 at 8:30 AM, Rajeshkumar J <
> > > > > > rajeshkumarit8...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > >   I am new to Apache Hbase and I know that in a table when we
> try
> > > to
> > > > > > insert
> > > > > > > row key value which is already present either new value is
> > > discarded
> > > > or
> > > > > > > updated. Also I came across row version through which we can
> > store
> > > > > > > different versions of row key based on timestamp. Any one
> correct
> > > me
> > > > > if I
> > > > > > > am wrong? Also I need to know is there any way we can store
> more
> > > than
> > > > > one
> > > > > > > row for a row-key value.
> > > > > > >
> > > > > > > Thanks
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>


Re: Row Versions in Apache Hbase

2015-11-30 Thread Heng Chen
why not

1212 | 10, 11, 12, 13, 14, 15, 16, 27,  28 ?
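
If the values really must accumulate in a single cell like that, one hedged
option is the Append operation (table, family, and qualifier names here are
made up):

  import org.apache.hadoop.hbase.TableName;
  import org.apache.hadoop.hbase.client.Append;
  import org.apache.hadoop.hbase.client.Connection;
  import org.apache.hadoop.hbase.client.ConnectionFactory;
  import org.apache.hadoop.hbase.client.Table;
  import org.apache.hadoop.hbase.util.Bytes;

  // Atomically append ",13" to whatever the cell already holds, growing a
  // value like "10,11,12" into "10,11,12,13" under row key 1212.
  static void appendValue() throws Exception {
    try (Connection conn = ConnectionFactory.createConnection();
         Table table = conn.getTable(TableName.valueOf("customers"))) {
      Append append = new Append(Bytes.toBytes("1212"));
      append.add(Bytes.toBytes("cf"), Bytes.toBytes("values"),
          Bytes.toBytes(",13"));
      table.append(append);
    }
  }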

2015-12-01 14:29 GMT+08:00 Rajeshkumar J :

> Hi Ted,
>
>   This is my use case. I have to store values like this is it possible?
>
> RowKey | Values
>
> 1212   | 10,11,12
>
> 1212  | 13, 14, 15
>
> 1212  | 16,27,28
>
> Thanks
>
>
> On Mon, Nov 30, 2015 at 10:40 PM, Ted Yu  wrote:
>
> > Have you read http://hbase.apache.org/book.html#rowkey.design ?
> >
> > bq. we can store more than one row for a row-key value.
> >
> > Can you clarify your intention / use case ? If row key is the same, key
> > values would be in the same row.
> >
> > On Mon, Nov 30, 2015 at 8:30 AM, Rajeshkumar J <
> > rajeshkumarit8...@gmail.com>
> > wrote:
> >
> > > Hi,
> > >
> > >   I am new to Apache Hbase and I know that in a table when we try to
> > insert
> > > row key value which is already present either new value is discarded or
> > > updated. Also I came across row version through which we can store
> > > different versions of row key based on timestamp. Any one correct me
> if I
> > > am wrong? Also I need to know is there any way we can store more than
> one
> > > row for a row-key value.
> > >
> > > Thanks
> > >
> >
>


Re: Row Versions in Apache Hbase

2015-11-30 Thread Heng Chen
So, why not

1212-xxx | 20
1212-yyy | 21
1212-zzz | 22

2015-12-01 15:33 GMT+08:00 Rajeshkumar J <rajeshkumarit8...@gmail.com>:

> Hi
>
>   I meant like below is this possible
>
> Rowkey | column family
>
>    Name | Age
>
> 1212 | xxx | 20
> 1212 | yyy | 21
> 1212 | zzz | 22
>
> On Tue, Dec 1, 2015 at 12:03 PM, Heng Chen <heng.chen.1...@gmail.com>
> wrote:
>
> > why not
> >
> > 1212 | 10, 11, 12, 13, 14, 15, 16, 27,  28 ?
> >
> > 2015-12-01 14:29 GMT+08:00 Rajeshkumar J <rajeshkumarit8...@gmail.com>:
> >
> > > Hi Ted,
> > >
> > >   This is my use case. I have to store values like this is it possible?
> > >
> > > RowKey | Values
> > >
> > > 1212   | 10,11,12
> > >
> > > 1212  | 13, 14, 15
> > >
> > > 1212  | 16,27,28
> > >
> > > Thanks
> > >
> > >
> > > On Mon, Nov 30, 2015 at 10:40 PM, Ted Yu <yuzhih...@gmail.com> wrote:
> > >
> > > > Have you read http://hbase.apache.org/book.html#rowkey.design ?
> > > >
> > > > bq. we can store more than one row for a row-key value.
> > > >
> > > > Can you clarify your intention / use case ? If row key is the same,
> key
> > > > values would be in the same row.
> > > >
> > > > On Mon, Nov 30, 2015 at 8:30 AM, Rajeshkumar J <
> > > > rajeshkumarit8...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > >   I am new to Apache Hbase and I know that in a table when we try
> to
> > > > insert
> > > > > row key value which is already present either new value is
> discarded
> > or
> > > > > updated. Also I came across row version through which we can store
> > > > > different versions of row key based on timestamp. Any one correct
> me
> > > if I
> > > > > am wrong? Also I need to know is there any way we can store more
> than
> > > one
> > > > > row for a row-key value.
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>


Re: Error while loading bulk data from pig to hbase

2015-11-18 Thread Heng Chen
org.apache.pig.backend.hadoop.hbase.HBaseStorage is in the Pig project.

*ERROR: pig script failed to validate: java.lang.RuntimeException: could
not instantiate 'org.apache.pig.backend.hadoop.hbase.HBaseStorage' with
arguments.*

This message means the arguments are not correct.
Please check your argument format; is it right?


Thanks

2015-11-18 17:36 GMT+08:00 Amit Hora :

> If you are cool with using some other tool for bulk insertion you can go
> with Apache Flume.
>
> -Original Message-
> From: "Nishant Aggarwal" 
> Sent: ‎18-‎11-‎2015 14:28
> To: "user@hbase.apache.org" 
> Subject: Re: Error while loading bulk data from pig to hbase
>
> Dear All,
>
> Please help us on this. We need to bulk import data into Hbase using pig
> (or any alternate way).
>
> Any help on this will be appreciated.
>
> Thanks and Regards
> Nishant Aggarwal, PMP
> Cell No:- +91 99588 94305
> http://in.linkedin.com/pub/nishant-aggarwal/53/698/11b
>
>
> On Tue, Nov 17, 2015 at 4:10 AM, Laurent H 
> wrote:
>
> > I remember that Pig lib with HBaseStorage (0.13 or 0.14) doesn't accept
> > bulk loading, (if you look at the java class, you could see that there is
> > only put method and no bulk function...) Hope it's available righ now !
> >
> > --
> > Laurent HATIER - Consultant Big Data & Business Intelligence chez
> CapGemini
> > fr.linkedin.com/pub/laurent-hatier/25/36b/a86/
> > 
> >
> > 2015-11-05 13:58 GMT+01:00 Naresh Reddy <
> naresh.re...@aletheconsulting.com
> > >:
> >
> > > Hi
> > >
> > > I have already replaced the hbase version with
> "*hbase95.version=1.1.2*"
> > in
> > > libraries.properties file and compiled it, but I am getting the same
> > error.
> > >
> > > Regards
> > > Naresh
> > >
> > > On Wed, Nov 4, 2015 at 11:29 PM, Daniel Dai  wrote:
> > >
> > > > Will need to change ivy/libraries.properties, specify the right hbase
> > > > version and compile again.
> > > >
> > > > On Wed, Nov 4, 2015 at 6:31 AM, Ted Yu  wrote:
> > > >
> > > > > ... 22 more
> > > > > Caused by: java.lang.NoSuchMethodError:
> > > > > org.apache.hadoop.hbase.client.Scan.setCacheBlocks(Z)V
> > > > >
> > > > > Looks like the version of Pig you use is not compiled against
> > > > > hbase 1.1.2
> > > > >
> > > > > This is related:
> > > > > Author: Enis Soztutar 
> > > > > Date:   Fri Sep 5 18:48:38 2014 -0700
> > > > >
> > > > > HBASE-10841 Scan,Get,Put,Delete,etc setters should consistently
> > > > return
> > > > > this
> > > > >
> > > > > FYI
> > > > >
> > > > > On Tue, Nov 3, 2015 at 10:36 PM, Naresh Reddy <
> > > > > naresh.re...@aletheconsulting.com> wrote:
> > > > >
> > > > >> Hi
> > > > >> Thanks for the reply. Below is the full error log.
> > > > >>
> > > > >> Pig Stack Trace
> > > > >> ---------------
> > > > >> ERROR 1200: Pig script failed to parse:
> > > > >> <line 2, column 0> pig script failed to validate:
> > > > >> java.lang.RuntimeException: could not instantiate
> > > > >> 'org.apache.pig.backend.hadoop.hbase.HBaseStorage' with arguments
> > > > >> '[info:fname info:lname]'
> > > > >>
> > > > >> Failed to parse: Pig script failed to parse:
> > > > >> <line 2, column 0> pig script failed to validate:
> > > > >> java.lang.RuntimeException: could not instantiate
> > > > >> 'org.apache.pig.backend.hadoop.hbase.HBaseStorage' with arguments
> > > > >> '[info:fname info:lname]'
> > > > >> at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:199)
> > > > >> at org.apache.pig.PigServer$Graph.validateQuery(PigServer.java:1707)
> > > > >> at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1680)
> > > > >> at org.apache.pig.PigServer.registerQuery(PigServer.java:623)
> > > > >> at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:1063)
> > > > >> at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:501)
> > > > >> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:230)
> > > > >> at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:205)
> > > > >> at org.apache.pig.tools.grunt.Grunt.run(Grunt.java:66)
> > > > >> at org.apache.pig.Main.run(Main.java:558)
> > > > >> at ...
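To make the incompatibility concrete: HBASE-10841 (shipped in HBase 1.0)
changed the Scan/Get/Put setters to return `this` instead of void, so a Pig
build compiled against an older client still expects the signature
setCacheBlocks(Z)V and fails at runtime with exactly the NoSuchMethodError
shown in the log above. A small illustration of the post-1.0 setter style
(for illustration only, not the fix itself; the fix is recompiling Pig
against the matching hbase version, as suggested in the thread):

import org.apache.hadoop.hbase.client.Scan;

public class ScanSetterStyle {
    public static void main(String[] args) {
        // Since HBase 1.0 the setters return `this`, so they chain.
        // Bytecode compiled against 0.9x looks for the old void-returning
        // setCacheBlocks(Z)V, which no longer exists in 1.1.2 jars.
        Scan scan = new Scan()
                .setCacheBlocks(false)
                .setCaching(500);
        System.out.println(scan);
    }
}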

Re: HBase client and server version is not compatible lead regionserver down

2015-11-18 Thread Heng Chen
It caused the regionserver to go down? Could you post some regionserver logs?

2015-11-18 16:22 GMT+08:00 聪聪 <175998...@qq.com>:

> We recently found that a regionserver went down. Later, we found it was
> because the client and server versions are not compatible: the client
> version is 1.0 and the server version is 0.98.6. I want to know why this
> happens, and whether there is a better protection mechanism. How can we
> avoid this problem, since some developers will make this kind of mistaken
> operation?


Re: isolation level of put and scan

2015-11-17 Thread Heng Chen
Oh, it will never happen.

Each put acquires the row lock, which guarantees consistency within a row.
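
A minimal sketch of that guarantee, assuming a hypothetical table 't' with
family 'cf': because both columns travel in a single Put, the update is
applied under one row lock, so a concurrent reader sees either the old pair
or the new pair, never a mix.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class AtomicRowUpdate {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        try (Connection conn = ConnectionFactory.createConnection(conf);
             Table table = conn.getTable(TableName.valueOf("t"))) {
            // Both columns in ONE Put: applied atomically within the row,
            // so a concurrent Get returns (A:a1, B:b1) or (A:a2, B:b2).
            Put put = new Put(Bytes.toBytes("row1"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("A"), Bytes.toBytes("a2"));
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("B"), Bytes.toBytes("b2"));
            table.put(put);
        }
    }
}

The flip side is that two SEPARATE puts (one per column) would not carry
this guarantee between them; a reader could then land between the two.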

2015-11-17 16:20 GMT+08:00 hongbin ma :

> i found a good article
> https://blogs.apache.org/hbase/entry/apache_hbase_internals_locking_and
> which seems to have answered my question.
>
> so the scenario I described will NEVER happen, no matter whether A and B
> are in the same column family? Please kindly confirm.
>
> thanks
>
> On Tue, Nov 17, 2015 at 4:07 PM, hongbin ma  wrote:
>
> > hi, experts:
> >
> > I have two concurrent threads that read/write the same HTable row.
> > This row has two columns, A and B.
> >
> > Currently this row's value is (A:a1, B:b1).
> > Thread 1 wants to read the values of this row's columns A and B,
> > and thread 2 wants to update this row to (A:a2, B:b2).
> >
> > If threads 1 and 2 run at the same time, is it possible that thread 1
> > gets (A:a1, B:b2) or (A:a2, B:b1)?
> >
> > I'm asking this because the document
> > https://hbase.apache.org/acid-semantics.html describes HBase's isolation
> > level as "read committed", and "read committed" may not prevent the above
> > scenario.
> >
> > Will the answer vary depending on whether A and B are in the same column
> > family? And will checkAndPut help here?
> >
> > Thanks in advance.
> >
> > --
> > Regards,
> >
> > *Bin Mahone | 马洪宾*
> > Apache Kylin: http://kylin.io
> > Github: https://github.com/binmahone
> >
>
>
>
> --
> Regards,
>
> *Bin Mahone | 马洪宾*
> Apache Kylin: http://kylin.io
> Github: https://github.com/binmahone
>


Re: Hbase put multiple values in a single cell

2015-10-23 Thread Heng Chen
How about this way?

rowkey: Parent1, cf: children
  col1 qualifier: Child1Name, col1 value: Child1 information
  col2 qualifier: Child2Name, col2 value: Child2 information
  ...

You can get one child's information easily, but if you want to modify the
child information packed inside a cell value, it is difficult.

2015-10-23 17:52 GMT+08:00 Spico Florin :

> Hello!
> Thank you very much for your responses. Actually the requirement is like
> this: I would like to retrieve all the information about a kid. The data
> looks like this:
> Parent1, Child1Name, Child1Age, Child1Height
> Parent1, Child2Name, Child2Age, Child2Height
> Parent1, Child3Name, Child3Age, Child3Height
> I would like to store all data about a child (Name, Age, Height) in a
> single cell value, with the child identifier as the column.
> So, what is the best approach here?
>
> On Thu, Oct 22, 2015 at 2:23 PM, Jeetendra Gangele 
> wrote:
>
> > As Ted mentioned, if you need to query the data for one child, it will
> > read unnecessary data.
> > But if your requirements are like that, use some separator, and be
> > careful: everything is bytes here.
> >
> > On 22 October 2015 at 16:37, Ted Yu  wrote:
> >
> > > Can you give some detail on why the 3 children's names need to be in
> > > the same cell (instead of under different columns)?
> > > I assume the combination of children's names varies. If you want to
> > > query data for a specific child (e.g. Child1Name), you may read
> > > unnecessary data which is discarded after parsing.
> > >
> > > Cheers
> > >
> > > On Thu, Oct 22, 2015 at 1:35 AM, Spico Florin 
> > > wrote:
> > >
> > > > In HBase, I would like to store data from many records in a single
> > > > cell value. For example, given the record: ParentId1,
> > > > Child1Name, Child2Name, Child3Name
> > > > I would like to store it as:
> > > >
> > > > rowkey: ParentId1, cf-children: col-name: Child1Name,Child2Name,Child3Name.
> > > >
> > > > So in the cell value I would like to add all the children's names.
> > > > Should I use a separator for storing these names, or is there an API
> > > > or best practice for how to store them? Thank you.
> > > >
> > > > Regards,
> > > >
> > > >  Florin
> > > >
> > >
> >
>