Hi,
We have a large 60 node CDH 5.5.2 HBase 1.0.0 cluster that takes a very
heavy write load. For increased performance, we are using the
BufferedMutator class in hbase-client, although we're using hbase-client
version 1.2.0 because it has a small performance fix to this class.
It seems to be working
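Our usage boils down to the following pattern; this is a minimal sketch, and the table, family, qualifier names and buffer size are made-up placeholders:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.BufferedMutator;
import org.apache.hadoop.hbase.client.BufferedMutatorParams;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.util.Bytes;

public class BufferedWriteExample {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    // table/family/qualifier names and the 8 MB buffer are hypothetical
    BufferedMutatorParams params =
        new BufferedMutatorParams(TableName.valueOf("my_table"))
            .writeBufferSize(8L * 1024 * 1024);
    try (Connection conn = ConnectionFactory.createConnection(conf);
         BufferedMutator mutator = conn.getBufferedMutator(params)) {
      Put put = new Put(Bytes.toBytes("row-1"));
      put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"),
          Bytes.toBytes("v"));
      mutator.mutate(put); // buffered client-side, shipped in batches
      mutator.flush();     // close() also flushes anything still buffered
    }
  }
}

The main win is that mutate() only buffers locally; the client sends batches when the buffer fills, instead of one RPC per Put.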
Hi,
We have a large HBase 1.x cluster in AWS and have disabled automatic major
compaction as advised. We were running our own code for compaction daily
around midnight which calls HBaseAdmin.majorCompactRegion(byte[]
regionName) in a rolling fashion across all regions.
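In outline, the rolling compactor does something like this (a sketch assuming the HBase 1.x Admin API; the table name is a placeholder):

import java.io.IOException;
import java.util.List;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;

public class RollingMajorCompactor {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    try (Connection conn = ConnectionFactory.createConnection(conf);
         Admin admin = conn.getAdmin()) {
      // "my_table" is hypothetical; walk the table's regions one at a time
      List<HRegionInfo> regions =
          admin.getTableRegions(TableName.valueOf("my_table"));
      for (HRegionInfo region : regions) {
        // the request is asynchronous; a real job should wait for the
        // region's compaction state to return to NONE before moving on
        admin.majorCompactRegion(region.getRegionName());
      }
    }
  }
}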
But we missed the fact that
Hi,
I came across this blog post from 2014 on sources of HBase client-side
latency which I found useful:
https://hadoop-hbase.blogspot.com/2014/08/hbase-client-response-times.html?showComment=1461099797978#c5266762058464276023
Since this is a bit dated, anyone have any other sources of latency to add?
On Tue, Apr 19, 2016 at 6:35 PM, Stack wrote:
> On Tue, Apr 19, 2016 at 2:07 PM, Saad Mufti wrote:
>
> > Hi,
> >
> > I came across this blog post from 2014 on sources of HBase client-side
> > latency which I found useful:
> >
> >
> >
> https://hadoop-hbase
Thanks for the pointer. Working like a charm.
Saad
On Tue, Apr 19, 2016 at 4:01 PM, Ted Yu wrote:
> Please use the following method of HBaseAdmin:
>
> public CompactionState getCompactionStateForRegion(final byte[]
> regionName)
>
> Cheers
>
> On Tue, Apr 19
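For anyone finding this thread later, the wait loop ended up roughly like this (a sketch; the poll interval is arbitrary, and in the 1.x client the CompactionState enum comes from the protobuf-generated AdminProtos class):

import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.protobuf.generated.AdminProtos.GetRegionInfoResponse.CompactionState;

// compact one region and block until the server reports it is done
static void compactAndWait(Admin admin, byte[] regionName) throws Exception {
  admin.majorCompactRegion(regionName); // asynchronous request
  Thread.sleep(5000); // give the server a moment to actually start
  while (admin.getCompactionStateForRegion(regionName)
         != CompactionState.NONE) {
    Thread.sleep(5000); // hypothetical poll interval
  }
}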
This is from just one region server, right? Are you sure it is co-located
with an HDFS data node after your upgrade?
I imagine that is a pretty obvious thing to check, but it's the only thing
I can think of.
Saad
On Wed, Apr 20, 2016 at 10:30 AM, Ted Tuttle wrote:
> Hello-
>
> We just upgraded to
Why can't you install HBase on your local machine, with the configuration
pointing it to your desired cluster, then run the HBase shell and your
script locally?
I believe the HBase web UI has a convenient link to download client
configuration.
Saad
On Sun, Apr 24, 2016 at 5:22 PM, Saurabh M
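Concretely, running the script locally looks something like this (the paths are hypothetical; the client config bundle is what the web UI / cluster manager lets you download):

export HBASE_CONF_DIR=/path/to/downloaded/client-config
hbase shell /path/to/your_script.rb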
Hi,
In our large HBase cluster based on CDH 5.5 in AWS, we're constantly seeing
the following messages in the region server logs:
2016-04-25 14:02:55,178 INFO
org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 258 ms,
current pipeline:
[DatanodeInfoWithStorage[10.99.182.165:50010,DS
or performance or other reasons.
>
> -Saurabh
>
> -----Original Message-
> From: Saad Mufti [mailto:saad.mu...@gmail.com]
> Sent: Sunday, April 24, 2016 2:55 PM
> To: user@hbase.apache.org
> Subject: Re: Hbase shell script from java
>
> Why can't you install hb
:
> w.r.t. the pipeline, please see this description:
>
> http://itm-vm.shidler.hawaii.edu/HDFS/ArchDocUseCases.html
>
> On Mon, Apr 25, 2016 at 12:18 PM, Saad Mufti wrote:
>
> > Hi,
> >
> > In our large HBase cluster based on CDH 5.5 in AWS, we're constantly
> Was the
> 250ms default chosen with SSDs and 10ge in mind or something? I guess I'm
> surprised a sync write several times through JVMs to 2 remote datanodes
> would be expected to consistently happen that fast.
>
> Regards,
>
> On Mon, Apr 25, 2016 at 12:18 PM, Saad Mufti wrote
our CDH5 HBase clusters too. We
> eventually correlated it very closely to GC pauses. Through heavily tuning
> our GC we were able to drastically reduce the logs, by keeping most GCs
> under 100ms.
>
> On Tue, Apr 26, 2016 at 6:25 AM Saad Mufti wrote:
>
> > From
Hi,
Does anyone have experience with HBase write performance under auto-split
conditions? Our keyspace is randomized so all regions roughly start
auto-splitting around the same time, although early on, when we had the 1024
regions we started with, they all decided to do so within an hour or so and
> We will also have a blog post coming out in the next week or so that talks
> specifically to tuning G1GC for HBase. I can update this thread when that's
> available.
>
> On Tue, Apr 26, 2016 at 8:08 PM Saad Mufti wrote:
>
> > That is interesting. Would it be possible
>
> http://product.hubspot.com/blog/g1gc-fundamentals-lessons-from-taming-garbage-collection
> > >
> > > We will also have a blog post coming out in the next week or so that
> > talks
> > > specifically to tuning G1GC for HBase. I can update this thread
.
>
> -Vlad
>
> On Wed, Apr 27, 2016 at 8:27 AM, Saad Mufti wrote:
>
> > Hi,
> >
> > Does anyone have experience with HBase write performance under auto-split
> > conditions? Our keyspace is randomized so all regions roughly start
> > auto-splitting
the system. For
> instance if they started spamming a lot of too-large requests, or badly
> filtered scans, etc. In the detention queue, they use their own RPC
> handlers, which we can aggressively limit or reject if need be to preserve
> the cluster.
>
> Hope this helps
>
> On We
e.
Thanks.
Saad
On Wed, Apr 20, 2016 at 1:19 PM, Saad Mufti wrote:
> Thanks for the pointer. Working like a charm.
>
>
> Saad
>
>
> On Tue, Apr 19, 2016 at 4:01 PM, Ted Yu wrote:
>
>> Please use the following method of HBaseAdmin:
>>
>> public Co
is a scary thing to do,
> but tell region servers to do MC.
>
> We have it running in our cluster for about 10 hours a day and it has
> virtually no impact on applications, and the cluster is doing far better
> than when using the default scheduled MC.
>
>
> -Original Message
ion is
> considered idle.
>
> -----Original Message-
> From: Saad Mufti [mailto:saad.mu...@gmail.com]
> Sent: Friday, April 29, 2016 5:37 PM
> To: user@hbase.apache.org
> Subject: Re: Major Compaction Strategy
>
> Unfortunately all our tables and regions are active 24/7. Tra
Hi,
We're running a CDH 5.5.2 HBase cluster (HBase Version 1.0.0-cdh5.5.2,
revision=Unknown). We are using the per-cell TTL feature (Mutation.setTTL).
As I learn more about and read up on HBase, I realized that in our HBase
config hfile.format.version was set to 2 (the default; we haven't touched
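For reference, the per-cell TTL call itself is just this (a minimal sketch; row, family and qualifier are placeholders):

import java.io.IOException;
import org.apache.hadoop.hbase.client.Put;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

// write one cell that expires on its own schedule
static void putWithTtl(Table table) throws IOException {
  Put put = new Put(Bytes.toBytes("row-1"));
  put.setTTL(7L * 24 * 60 * 60 * 1000); // per-mutation TTL, in milliseconds
  put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"), Bytes.toBytes("v"));
  table.put(put);
}

Worth noting: cell-level TTLs travel as cell tags, and as far as I understand tags are only persisted to HFiles with hfile.format.version set to 3, which is why the version 2 default above caught my eye.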
There is no real column schema in HBase other than defining the column
family; each write to a column writes a cell with the column name plus the
value, so in theory the number of columns doesn't really matter. What
matters is how much data you read and write.
That said there are settings in the column fa
Hi,
Don't have anything conclusive but I have seen some correlation where in
very high write rate situation, the write rate can increase when major
compaction or some other high CPU/network activity (for example we run some
Spark jobs on our replica HBase cluster) stops happening on the replica
cl
Hi,
We are using HBase 1.0 on CDH 5.5.2. We have taken great care to avoid
hotspotting due to inadvertent data patterns by prepending an MD5 based 4
digit hash prefix to all our data keys. This works fine most of the time,
but more and more (as much as once or twice a day) recently we have
occas
ach
>
>
>
> > On Dec 1, 2016, at 1:50 PM, Saad Mufti wrote:
> >
> > Hi,
> >
> > We are using HBase 1.0 on CDH 5.5.2 . We have taken great care to avoid
> > hotspotting due to inadvertent data patterns by prepending an MD5 based 4
> > digit hash pre
FWIW, in my company (AOL) we discovered a small, elegant, all-client-side
transaction library on top of HBase, originally written by a Korea-based
team, called Haeinsa. It doesn't look active anymore so we had to fork it
and have done a couple of minor enhancements and one bugfix, but it has
been working
> their meta.
>
> Key word is supposed. We have seen meta hot spotting from time to time
> and on different versions at Splice Machine.
>
> How confident are you in your hashing algorithm?
>
> Regards,
> John Leach
>
>
>
> > On Dec 1, 2016, at 2:25 PM, S
happens
> again ?
>
> Thanks
>
> > On Dec 2, 2016, at 4:48 AM, Saad Mufti wrote:
> >
> > We used a pre-split into 1024 regions at the start but we miscalculated
> our
> > data size, so there were still auto-split storms at the beginning as
> data
> > size s
One way to do this without knowing your data (still need some idea of size
of keyspace) is to prepend a fixed numeric prefix from a suitable range
based on a good hash like MD5. For example, let us say you can predict your
data will fit in about 1024 regions. You can decide to prepend a prefix
from
Forgot to mention in above example you would presplit into 1024 regions,
starting from "" to "1023" (start keys).
Cheers.
Saad
On Fri, Dec 2, 2016 at 8:47 AM, Saad Mufti wrote:
> One way to do this without knowing your data (still need some idea of size
>
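A sketch of the prefix function under those assumptions (the helper name is made up; MD5 is just a convenient well-mixed hash, and reads must apply the same function to locate a row):

import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// fixed-width 4-digit prefix in [0, 1024) derived from an MD5 hash of the
// natural key, so rows spread evenly across 1024 pre-split regions
static String saltedKey(String naturalKey) throws NoSuchAlgorithmException {
  byte[] md5 = MessageDigest.getInstance("MD5")
      .digest(naturalKey.getBytes(StandardCharsets.UTF_8));
  int bucket = (((md5[0] & 0xFF) << 8) | (md5[1] & 0xFF)) % 1024;
  return String.format("%04d", bucket) + naturalKey;
}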
, Dec 1, 2016 at 6:08 PM, Saad Mufti wrote:
> Sure will, the next time it happens.
>
> Thanks!!!
>
>
> Saad
>
>
> On Thu, Dec 1, 2016 at 5:01 PM, Ted Yu wrote:
>
>> From #2 in the initial email, the hbase:meta might not be the cause for
>> the hots
> at org.apache.hadoop.hbase.ipc.CallRunner.run(CallRunner.java:107)
> at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop(RpcExecutor.java:130)
> at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107)
> at java.lang.Thread.run(Thread.java:745)
>
>
> Too many writers being blocked a
No.
Saad
On Fri, Dec 2, 2016 at 3:27 PM, Ted Yu wrote:
> Somehow I couldn't access the pastebin (I am in China now).
> Did the region server showing hotspot host meta ?
> Thanks
>
> On Friday, December 2, 2016 11:53 AM, Saad Mufti
> wrote:
>
>
> > I would check compaction, investigate throttling if it's causing high
> CPU.
> >
> > On Sat, Dec 3, 2016 at 6:20 AM Saad Mufti wrote:
> >
> > > No.
> > >
> > >
> > > Saad
> > >
> > >
> > > On Fri, Dec 2,
Hi,
We are using HBase 1.0.0-cdh5.5.2 on AWS EC2 instances. The load on HBase
is heavy and a mix of reads and writes. For a few months we have had a
problem where occasionally (once a day or more) one of the region servers
starts consuming close to 100% CPU. This causes all the client thread pool
-tuning-tips.html
>
>
>
> Sent from my iPhone
>
> > On Mar 1, 2017, at 6:06 AM, Saad Mufti wrote:
> >
> > Hi,
> >
> > We are using HBase 1.0.0-cdh5.5.2 on AWS EC2 instances. The load on HBase
> > is heavy and a mix of reads and writes. For a few mont
Hi,
I have a main HBase 1.x cluster and some of the tables are being replicated
to a separate HBase cluster of the same version, and the table schemas are
identical. The column family being used has TTL set to "FOREVER", but we do
a per put TTL in every Put we issue on the main cluster.
Data is b
rs?
>
> -Anoop-
>
> On Thu, Apr 27, 2017 at 2:08 AM, Saad Mufti wrote:
> > Hi,
> >
> > I have a main HBase 1.x cluster and some of the tables are being
> replicated
> > to a separate HBase cluster of the same version, and the table schemas
> are
> > ide
'm not clear) to client side code. So how can I verify that a Cell in one
cluster has the TTL tag whereas the same replicated Cell in the next
cluster does or doesn't?
Thanks.
Saad
On Fri, Apr 28, 2017 at 1:06 PM, Saad Mufti wrote:
> Thanks for the feedback, I have confirmed tha
> be able to retrieve the tags back to client side and check
>
> -Anoop-
>
> On Mon, May 1, 2017 at 2:59 AM, Saad Mufti wrote:
> > Is there any facility to check what tags are on a Cell from a client side
> > program? I started writing some Java code to look at the tags
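What I had so far was along these lines (a sketch against the 1.x client API; note the caveat in the comment):

import java.util.Iterator;
import org.apache.hadoop.hbase.Cell;
import org.apache.hadoop.hbase.CellUtil;
import org.apache.hadoop.hbase.Tag;
import org.apache.hadoop.hbase.TagType;
import org.apache.hadoop.hbase.client.Result;

// region servers normally strip tags on the read path, so this can print
// nothing even when the cells carry TTL tags on disk
static void dumpTags(Result result) {
  for (Cell cell : result.rawCells()) {
    Iterator<Tag> tags = CellUtil.tagsIterator(
        cell.getTagsArray(), cell.getTagsOffset(), cell.getTagsLength());
    while (tags.hasNext()) {
      Tag tag = tags.next();
      System.out.println("tag type=" + tag.getType()
          + " isTtlTag=" + (tag.getType() == TagType.TTL_TAG_TYPE));
    }
  }
}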
Hi,
I'm looking for some guidance as our security team is requiring us to
implement encryption of our HBase data at rest and in motion. I'm reading
the docs and doing research and the choice seems to be between doing it at
the HBase level or the more general HDFS level.
I am leaning towards HDFS
Thank you everyone for the feedback. It was very helpful.
Cheers.
---
Saad Mufti
On Fri, Aug 18, 2017 at 3:20 PM, Andrew Purtell wrote:
> The Hadoop KMS in 2.6 or 2.7 can be suitable for demos or prototypes but I
> would advise against using it for more than that. Recent
Hi,
We have an HBase system running HBase 1.3.1 on an AWS EMR service. Our
BucketCache is configured for 400 GB on a set of attached EBS disk volumes,
with all column families marked for in-memory in their column family
schemas using INMEMORY => 'true' (except for one column family we only ever
wr
Sorry I meant BLOCKCACHE => 'false' on the one column family we don't want
getting cached.
Cheers.
Saad
On Sun, Feb 18, 2018 at 6:51 PM, Saad Mufti wrote:
> Hi,
>
> We have an HBase system running HBase 1.3.1 on an AWS EMR service. Our
> BucketCache is con
is zero but the
> #evicted blocks are there. Those might be the blocks of the compacted
> away files. Hope this helps you to understand what is going on.
>
> -Anoop-
>
>
> On Mon, Feb 19, 2018 at 5:25 AM, Saad Mufti wrote:
> > Sorry I meant BLOCKCACHE =>
Hi,
I am running an HBase 1.3.1 cluster on AWS EMR. The bucket cache is
configured to use two attached EBS disks of 50 GB each and I provisioned
the bucket cache to be a bit less than the total, at a total of 98 GB per
instance to be on the safe side. My tables have column families set to
prefetch
Hi,
We are running on Amazon EMR based HBase 1.4.0. We are currently seeing a
situation where sometimes a particular region gets into a state where a lot
of write requests to any row in that region time out, saying they failed to
obtain a lock on a row in the region, and eventually they experience
IOEngine");
>
> disableCache();
>
> Can you search in the region server log to see if the above occurred ?
>
> Was this server the only one with disabled cache ?
>
> Cheers
>
> On Sun, Feb 25, 2018 at 6:20 AM, Saad Mufti
> wrote:
>
> > Hi,
ache, please check your IOEngine");
> >
> > disableCache();
> >
> > Can you search in the region server log to see if the above occurred ?
> >
> > Was this server the only one with disabled cache ?
> >
> > Cheers
> >
> > On Sun, F
.
Saad
On Wed, Feb 28, 2018 at 9:31 PM, Saad Mufti wrote:
> Hi,
>
> We are running on Amazon EMR based HBase 1.4.0 . We are currently seeing a
> situation where sometimes a particular region gets into a situation where a
> lot of write requests to any row in that region time
patch is for hbase or some other component ?
>
> Thanks
>
> On Wed, Feb 28, 2018 at 6:33 PM, Saad Mufti wrote:
>
> > Thanks for the feedback, so you guys are right the bucket cache is
> getting
> > disabled due to too many I/O errors from the underlying files making u
there was correlation between this duration and timeout) ?
>
> Cheers
>
> On Wed, Feb 28, 2018 at 6:31 PM, Saad Mufti wrote:
>
> > Hi,
> >
> > We are running on Amazon EMR based HBase 1.4.0 . We are currently seeing
> a
> > situation where sometimes a particular r
stayed stable and eventually recovered,
although it did suffer all those timeouts.
Saad
On Wed, Feb 28, 2018 at 10:18 PM, Saad Mufti wrote:
> I'll paste a thread dump later, writing this from my phone :-)
>
> So the same issue has happened at different times for different region
or so and sometimes longer.
Saad
On Thu, Mar 1, 2018 at 7:54 AM, Saad Mufti wrote:
> Unfortunately I lost the stack trace overnight. But it does seem related
> to compaction, because now that the compaction tool is done, I don't see
> the issue anymore. I will run our incr
to recover
faster. I haven't quite tested that yet; any advice in the meantime would
be appreciated.
Cheers.
Saad
On Thu, Mar 1, 2018 at 9:21 AM, Saad Mufti wrote:
> Actually it happened again while some minor compactions were running, so
> don't think it's related to our maj
could try to see how things work when a read happens from S3 and after
> the
> > prefetch completes ensure the same checkandPut() is done (from cache this
> > time) to really know the difference what S3 does there.
> >
> > Regards
> > Ram
> >
> > On Fri
Hi,
I am running a Spark job (Spark 2.2.1) on an EMR cluster in AWS. There is
no HBase installed on the cluster, only HBase libs linked to my Spark app.
We are reading the snapshot info from a HBase folder in S3 using
TableSnapshotInputFormat class from HBase 1.4.0 to have the Spark job read
snaps
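The wiring is roughly as follows (a sketch; the bucket, paths and snapshot name are made up, and it assumes a JavaSparkContext is available):

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
import org.apache.hadoop.hbase.mapreduce.TableSnapshotInputFormat;
import org.apache.hadoop.mapreduce.Job;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaSparkContext;

// read an HBase snapshot straight from the S3-backed hbase.rootdir,
// without a live HBase cluster
static JavaPairRDD<ImmutableBytesWritable, Result> readSnapshot(
    JavaSparkContext sc) throws IOException {
  Configuration conf = HBaseConfiguration.create();
  conf.set("hbase.rootdir", "s3://my-bucket/hbase");
  Job job = Job.getInstance(conf);
  // the restore dir is scratch space where the input format materializes
  // references to the snapshot's files
  TableSnapshotInputFormat.setInput(job, "my_snapshot",
      new Path("s3://my-bucket/snapshot-restore-tmp"));
  return sc.newAPIHadoopRDD(job.getConfiguration(),
      TableSnapshotInputFormat.class,
      ImmutableBytesWritable.class, Result.class);
}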
, it looks like this setting also has the good effect of preventing
clients from hammering a region server that is slow because its IPC queues
are backed up, allowing it to recover faster.
Does that make sense?
Cheers.
Saad
On Sat, Mar 10, 2018 at 7:04 PM, Saad Mufti wrote:
> So i
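For reference, I believe the setting in question is the one added by HBASE-16388 and shipped in 1.4.0; on the client side it looks like this (the threshold value is arbitrary and workload dependent):

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

// once this many requests are already outstanding to a single region
// server, further requests to that server fail fast instead of queueing
Configuration conf = HBaseConfiguration.create();
conf.setInt("hbase.client.perserver.requests.threshold", 2048);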
Are you using the AuthUtil class to reauthenticate? This class is in HBase,
and uses the Hadoop class UserGroupInformation to do the actual login and
re-login. But if your UserGroupInformation class is from Hadoop 2.5.1 or
earlier, it has a bug if you are using Java 8, as most of us are. The
relogin c
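The usual keytab pattern, for reference (a sketch; the principal and keytab path are hypothetical):

import java.io.IOException;
import org.apache.hadoop.security.UserGroupInformation;

// call periodically, or before batches of HBase calls
static void ensureLogin() throws IOException {
  if (!UserGroupInformation.isSecurityEnabled()) return;
  UserGroupInformation ugi = UserGroupInformation.getLoginUser();
  if (ugi == null || !ugi.hasKerberosCredentials()) {
    UserGroupInformation.loginUserFromKeytab(
        "app/host@EXAMPLE.COM", "/etc/security/keytabs/app.keytab");
  } else {
    // no-op unless the TGT is close to expiring
    ugi.checkTGTAndReloginFromKeytab();
  }
}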
, Mar 10, 2018 at 8:04 PM, Saad Mufti wrote:
> Also, for now we have mitigated this problem by using the new setting in
> HBase 1.4.0 that prevents one slow region server from blocking all client
> requests. Of course it causes some timeouts but our overall ecosystem
> contains Kafk
See below more I found on item 3.
Cheers.
Saad
On Sat, Mar 10, 2018 at 7:17 PM, Saad Mufti wrote:
> Hi,
>
> I am running a Spark job (Spark 2.2.1) on an EMR cluster in AWS. There is
> no HBase installed on the cluster, only HBase libs linked to my Spark app.
> We are readi
r
the Spark job.
Saad
On Sat, Mar 10, 2018 at 9:51 PM, Saad Mufti wrote:
> See below more I found on item 3.
>
> Cheers.
>
>
> Saad
>
> On Sat, Mar 10, 2018 at 7:17 PM, Saad Mufti wrote:
>
>> Hi,
>>
>> I am running a Spark job (Spark 2.2
imes the
> compacted result file could be so large (what is major compaction) and
> that will exhaust the BC if written. Also it might contain some data
> which are very old. There is a recently raised jira which
> discusses this. Pls see HBASE-20045
>
>
> -Anoop-
>
willing to look at my patch? I have never done this before, so
would appreciate a quick pointer on how to send a patch and get some quick
feedback.
Cheers.
Saad
On Sat, Mar 10, 2018 at 9:56 PM, Saad Mufti wrote:
> The question remains, though, of why it is even accessing a column family
gly.
>
> Thanks
>
> On Mon, Mar 12, 2018 at 8:43 AM, Saad Mufti wrote:
>
> > I have created a company-specific branch and added 4 new flags to control
> > this behavior, these gave us a huge performance boost when running Spark
> > jobs on snapshots of very large ta
Another option if you have enough disk space/off heap memory space is to
enable bucket cache to cache even more of your data, and set the
PREFETCH_ON_OPEN => true option on the column families you want always
cached. That way HBase will prefetch your data into the bucket cache and
your scan won't ha
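Programmatically that can look like the following (a sketch against the 1.x API; table and family names are placeholders, and fetching the existing descriptor first keeps the family's other settings intact):

import java.io.IOException;
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Admin;
import org.apache.hadoop.hbase.util.Bytes;

// turn on in-memory preference and block prefetch for one family
static void enablePrefetch(Admin admin) throws IOException {
  TableName table = TableName.valueOf("my_table");
  HTableDescriptor desc = admin.getTableDescriptor(table);
  HColumnDescriptor cf = desc.getFamily(Bytes.toBytes("cf"));
  cf.setInMemory(true);             // prefer keeping this family's blocks cached
  cf.setPrefetchBlocksOnOpen(true); // warm the cache when regions open
  admin.modifyColumn(table, cf);
}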
Hi,
We are running on HBase 1.4.0 on an AWS EMR/HBase cluster.
We have started seeing the following stacktrace when trying to take a
snapshot of a table with a very large number of files (12000 regions and
roughly 36 - 40 files). The number of files should go down as we
haven't been compa
ured as 64MB or 128MB.
>
> Regards,
>
> Huaxiang Sun
>
>
> > On Mar 19, 2018, at 10:16 AM, Saad Mufti wrote:
> >
> > Hi,
> >
> > We are running on HBase 1.4.0 on an AWS EMR/HBase cluster.
> >
> > We have started seeing the following stacktrac
gt;
> > On Mar 19, 2018, at 10:52 AM, Saad Mufti wrote:
> >
> > Thanks!!! Wish that was documented somewhere in the manual.
> >
> > Cheers.
> >
> >
> > Saad
> >
> >
> > On Mon, Mar 19, 2018 at 1:38 PM, Huaxiang Sun wrote:
Hi,
We are using the stochastic load balancer, and have tuned it to do a
maximum of 1% of regions in any calculation. But it is way too conservative
after that; it moves one region at a time. Is there a way to tell it to go
faster with whatever number of regions it decided to do? I have been
looki
Hi,
We are using HBase 1.4.0 on AWS EMR. Since snapshots are in S3,
they take much longer than when using local disk. We have a cron script to
take regular snapshots as backup, and they fail quite often on our largest
table which takes close to an hour to complete the snapshot.
The on
computing new load balance
> plan. Computation took 1200227ms to try 2254 different iterations. Found
> a solution that moves 550 regions; Going from a computed cost of
> 77.52829271038965 to a new cost of 74.32764924425548
> >
> > If you have a dev cluster, you can try diffe
We are facing the exact same symptoms in HBase 1.4.0 running on AWS EMR
based cluster, and desperately need to take a snapshot to feed a downstream
job. So far we have tried using the "assign" command on all regions
involved to move them around but the snapshot still fails. Also saw the
same error
is available in 1.4
>
> -Vlad
>
> On Tue, Mar 20, 2018 at 8:00 PM, Saad Mufti wrote:
>
> > Hi,
> >
> > We are using HBase 1.4.0 on AWS EMR. Since snapshots are in
> S3,
> > they take much longer than when using local disk. We have a cron script
r, region merging and split before snapshot should help.
> > This works in 2.0
> >
> > Not sure if merge/split switch is available in 1.4
> >
> > -Vlad
> >
> > On Tue, Mar 20, 2018 at 8:00 PM, Saad Mufti
> wrote:
> >
> > > Hi,
> > >
>
Restarting the region server worked for us to recover from this error.
Saad
On Fri, Mar 23, 2018 at 7:19 PM, Saad Mufti wrote:
> We are facing the exact same symptoms in HBase 1.4.0 running on AWS EMR
> based cluster, and desperately need to take a snapshot to feed a downstream
>
Hi,
Here is my scenario, I have two secure/authenticated EMR based HBase
clusters, both have their own cluster dedicated KDC (using EMR support for
this which means we get Kerberos support by just turning on a config flag).
Now we want to get replication going between them. For other application
I think TableInputFormat will try to maintain as much locality as possible,
assigning one Spark partition per region and trying to assign that
partition to a YARN container/executor on the same node (assuming you're
using Spark over YARN). So the reason for the uneven distribution could be
that you
I am not clear how your snapshot even succeeds if this is the case. The
snapshot taking procedure includes a check for consistency at the end and
throws an exception on problems like this. I would run an hbck command on
your table to check if there are any consistency errors. It also has repair
op
Kerberos principal hbase/@PGS.dev when I ran
the add_peer command.
Thanks for taking the time to help me in any way you can.
Saad
On Wed, May 23, 2018 at 7:24 AM, Reid Chan wrote:
> Three places to check,
>
>
> 1. Would you mind showing your "/etc/zookeeper/conf/server-jaa