CDH Hbase 1.0.0 Requested row out of range during auto-split

2016-04-02 Thread Saad Mufti
Hi, We have a large 60 node CDH 5.5.2 Hbase 1.0.0 cluster that takes a very heavy write load. For increased performance, we are using the BufferedMutator class in hbase-client, although we're using hbase-client version 1.2.0 because it has a small performance fix to this class. It seems to be work

Major Compaction Strategy

2016-04-19 Thread Saad Mufti
Hi, We have a large HBase 1.x cluster in AWS and have disabled automatic major compaction as advised. We were running our own code for compaction daily around midnight which calls HBaseAdmin.majorCompactRegion(byte[] regionName) in a rolling fashion across all regions. But we missed the fact that

Sources Of HBase Client Side Latency

2016-04-19 Thread Saad Mufti
Hi, I found this blog post from 2014 on sources of HBase client side latency which I found useful: https://hadoop-hbase.blogspot.com/2014/08/hbase-client-response-times.html?showComment=1461099797978#c5266762058464276023 Since this is a bit dated, anyone have any other sources of latency to add?

Re: Sources Of HBase Client Side Latency

2016-04-20 Thread Saad Mufti
Apr 19, 2016 at 6:35 PM, Stack wrote: > On Tue, Apr 19, 2016 at 2:07 PM, Saad Mufti wrote: > > > Hi, > > > > I found this blog post from 2014 on sources of HBase client side latency > > which I found useful: > > > > > > > https://hadoop-hbase

Re: Major Compaction Strategy

2016-04-20 Thread Saad Mufti
Thanks for the pointer. Working like a charm. Saad On Tue, Apr 19, 2016 at 4:01 PM, Ted Yu wrote: > Please use the following method of HBaseAdmin: > > public CompactionState getCompactionStateForRegion(final byte[] > regionName) > > Cheers > > On Tue, Apr 19
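
The rolling approach discussed in this thread (request a major compaction for one region, then poll that region's compaction state before moving on) could be sketched roughly as below. This is only an illustration, not the poster's code: `major_compact` and `get_state` are hypothetical callables standing in for HBaseAdmin.majorCompactRegion and HBaseAdmin.getCompactionStateForRegion, and the sketch assumes that the compaction request returns before the compaction finishes and that an idle region reports the state "NONE".

```python
import time
from typing import Callable, Iterable


def rolling_major_compact(
    regions: Iterable[bytes],
    major_compact: Callable[[bytes], None],  # hypothetical stand-in for majorCompactRegion
    get_state: Callable[[bytes], str],       # hypothetical stand-in for getCompactionStateForRegion
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Compact regions one at a time, waiting for each to finish."""
    for region in regions:
        major_compact(region)
        # Assumption: the request is asynchronous, so poll until the
        # region reports no compaction in progress. A real tool would
        # also need to handle the window before the compaction starts
        # (state still "NONE") and add an overall timeout.
        while get_state(region) != "NONE":
            sleep(poll_seconds)
```

Passing the sleep function as a parameter keeps the loop easy to exercise without a cluster or real delays.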

Re: zero data locality

2016-04-20 Thread Saad Mufti
This is from just one region server, right? Are you sure it is co-located with an HDFS data node after your upgrade? I imagine that is a pretty obvious thing to check, but it is the only thing I can think of. Saad On Wed, Apr 20, 2016 at 10:30 AM, Ted Tuttle wrote: > Hello- > > We just upgraded to

Re: Hbase shell script from java

2016-04-24 Thread Saad Mufti
Why can't you install hbase on your local machine, with the configuration pointing it to your desired cluster, then run the hbase shell and its script locally? I believe the HBase web UI has a convenient link to download client configuration. Saad On Sun, Apr 24, 2016 at 5:22 PM, Saurabh M

Slow sync cost

2016-04-25 Thread Saad Mufti
Hi, In our large HBase cluster based on CDH 5.5 in AWS, we're constantly seeing the following messages in the region server logs: 2016-04-25 14:02:55,178 INFO org.apache.hadoop.hbase.regionserver.wal.FSHLog: Slow sync cost: 258 ms, current pipeline: [DatanodeInfoWithStorage[10.99.182.165:50010,DS

Re: Hbase shell script from java

2016-04-25 Thread Saad Mufti
or performance or other reasons. > > -Saurabh > > -----Original Message- > From: Saad Mufti [mailto:saad.mu...@gmail.com] > Sent: Sunday, April 24, 2016 2:55 PM > To: user@hbase.apache.org > Subject: Re: Hbase shell script from java > > Why can't you install hb

Re: Slow sync cost

2016-04-25 Thread Saad Mufti
: > w.r.t. the pipeline, please see this description: > > http://itm-vm.shidler.hawaii.edu/HDFS/ArchDocUseCases.html > > On Mon, Apr 25, 2016 at 12:18 PM, Saad Mufti wrote: > > > Hi, > > > > In our large HBase cluster based on CDH 5.5 in AWS, we're constantly

Re: Slow sync cost

2016-04-26 Thread Saad Mufti
as the > 250ms default chosen with SSDs and 10ge in mind or something? I guess I'm > surprised a sync write several times through JVMs to 2 remote datanodes > would be expected to consistently happen that fast. > > Regards, > > On Mon, Apr 25, 2016 at 12:18 PM, Saad Mufti wrote

Re: Slow sync cost

2016-04-26 Thread Saad Mufti
our CDH5 HBase clusters too. We > eventually correlated it very closely to GC pauses. Through heavily tuning > our GC we were able to drastically reduce the logs, by keeping most GC's > under 100ms. > > On Tue, Apr 26, 2016 at 6:25 AM Saad Mufti wrote: > > > From

HBase Write Performance Under Auto-Split

2016-04-27 Thread Saad Mufti
Hi, Does anyone have experience with HBase write performance under auto-split conditions? Our keyspace is randomized so all regions roughly start auto-splitting around the same time, although early on when we had the 1024 regions we started with, they all decided to do so within an hour or so and

Re: Slow sync cost

2016-04-27 Thread Saad Mufti
> We will also have a blog post coming out in the next week or so that talks > specifically to tuning G1GC for HBase. I can update this thread when that's > available. > > On Tue, Apr 26, 2016 at 8:08 PM Saad Mufti wrote: > > > That is interesting. Would it be possible

Re: Slow sync cost

2016-04-27 Thread Saad Mufti
> > http://product.hubspot.com/blog/g1gc-fundamentals-lessons-from-taming-garbage-collection > > > > > > We will also have a blog post coming out in the next week or so that > > talks > > > specifically to tuning G1GC for HBase. I can update this thread

Re: HBase Write Performance Under Auto-Split

2016-04-27 Thread Saad Mufti
. > > -Vlad > > On Wed, Apr 27, 2016 at 8:27 AM, Saad Mufti wrote: > > > Hi, > > > > Does anyone have experience with HBase write performance under auto-split > > conditions? Our keyspace is randomized so all regions roughly start > > auto-splitting

Re: Slow sync cost

2016-04-27 Thread Saad Mufti
he system. For > instance if they started spamming a lot of too large requests, or badly > filtered scans, etc. In the detention queue, they use their own RPC > handlers which we can aggressively limit or reject if need be to preserve > the cluster. > > Hope this helps > > On We

Re: Major Compaction Strategy

2016-04-29 Thread Saad Mufti
e. Thanks. Saad On Wed, Apr 20, 2016 at 1:19 PM, Saad Mufti wrote: > Thanks for the pointer. Working like a charm. > > > Saad > > > On Tue, Apr 19, 2016 at 4:01 PM, Ted Yu wrote: > >> Please use the following method of HBaseAdmin: >> >> public Co

Re: Major Compaction Strategy

2016-04-29 Thread Saad Mufti
is a scary thing to do, > but tell region servers to do MC. > > We have it running in our cluster for about 10 hours a day and it has > virtually no impact to applications and the cluster is doing far better > than when using default scheduled MC. > > > -Original Message

Re: Major Compaction Strategy

2016-04-29 Thread Saad Mufti
ion is > considered idle. > > -----Original Message- > From: Saad Mufti [mailto:saad.mu...@gmail.com] > Sent: Friday, April 29, 2016 5:37 PM > To: user@hbase.apache.org > Subject: Re: Major Compaction Strategy > > Unfortunately all our tables and regions are active 24/7. Tra

Cell Level TTL And hfile.format.version

2016-05-06 Thread Saad Mufti
Hi, We're running a CDH 5.5.2 HBase cluster (HBase Version 1.0.0-cdh5.5.2, revision=Unknown). We are using the per-cell TTL feature (Mutation.setTTL). As I learn more about and read up on HBase, I realized that in our HBase config hfile.format.version was set to 2 (the default, we haven't touche

Re: HBase number of columns

2016-06-16 Thread Saad Mufti
There is no real column schema in HBase other than defining the column family; each write to a column writes a cell with the column name plus value, so in theory the number of columns doesn't really matter. What matters is how much data you read and write. That said there are settings in the column fa

Does Replication Affect Write Performance On The Main Cluster

2016-07-15 Thread Saad Mufti
Hi, Don't have anything conclusive but I have seen some correlation where in very high write rate situations, the write rate can increase when major compaction or some other high CPU/network activity (for example we run some Spark jobs on our replica HBase cluster) stops happening on the replica cl

Hot Region Server With No Hot Region

2016-12-01 Thread Saad Mufti
Hi, We are using HBase 1.0 on CDH 5.5.2. We have taken great care to avoid hotspotting due to inadvertent data patterns by prepending an MD5 based 4 digit hash prefix to all our data keys. This works fine most of the time, but more and more (as much as once or twice a day) recently we have occas

Re: Hot Region Server With No Hot Region

2016-12-01 Thread Saad Mufti
ach > > > > > On Dec 1, 2016, at 1:50 PM, Saad Mufti wrote: > > > > Hi, > > > > We are using HBase 1.0 on CDH 5.5.2 . We have taken great care to avoid > > hotspotting due to inadvertent data patterns by prepending an MD5 based 4 > > digit hash pre

Re: Using Hbase as a transactional table

2016-12-01 Thread Saad Mufti
FWIW, in my company (AOL) we discovered a small elegant all client side transaction library on top of HBase, originally written by a Korea based team, called Haeinsa. It doesn't look active anymore so we had to fork it and have done a couple of minor enhancements and one bugfix, but has been workin

Re: Hot Region Server With No Hot Region

2016-12-01 Thread Saad Mufti
> their meta. > > Key word is supposed. We have seen meta hot spotting from time to time > and on different versions at Splice Machine. > > How confident are you in your hashing algorithm? > > Regards, > John Leach > > > > > On Dec 1, 2016, at 2:25 PM, S

Re: Hot Region Server With No Hot Region

2016-12-01 Thread Saad Mufti
happens > again ? > > Thanks > > > On Dec 2, 2016, at 4:48 AM, Saad Mufti wrote: > > > > We used a pre-split into 1024 regions at the start but we miscalculated > our > > data size, so there were still auto-splits storms at the beginning as > data > > size s

Re: Creating HBase table with presplits

2016-12-02 Thread Saad Mufti
One way to do this without knowing your data (still need some idea of size of keyspace) is to prepend a fixed numeric prefix from a suitable range based on a good hash like MD5. For example, let us say you can predict your data will fit in about 1024 regions. You can decide to prepend a prefix from

Re: Creating HBase table with presplits

2016-12-02 Thread Saad Mufti
Forgot to mention in above example you would presplit into 1024 regions, starting from "" to "1023" (start keys). Cheers. Saad On Fri, Dec 2, 2016 at 8:47 AM, Saad Mufti wrote: > One way to do this without knowing your data (still need some idea of size >
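
The salting scheme described in this message and the one it replies to could be sketched as follows. This is only an illustration under stated assumptions: the zero-padded 4-digit prefix, the modulo mapping of the MD5 digest into 1024 buckets, and the helper names `salted_key` and `split_keys` are hypothetical and do not come from the thread.

```python
import hashlib
from typing import List

NUM_BUCKETS = 1024  # assumed: one bucket per pre-split region
PREFIX_WIDTH = 4    # assumed: zero-padded so prefixes sort lexicographically


def salted_key(row_key: bytes) -> bytes:
    """Prepend a fixed-width numeric prefix derived from MD5(row_key)."""
    digest = hashlib.md5(row_key).digest()
    bucket = int.from_bytes(digest, "big") % NUM_BUCKETS
    return f"{bucket:0{PREFIX_WIDTH}d}".encode("ascii") + row_key


def split_keys() -> List[bytes]:
    """Split points for the pre-split table.

    The first region implicitly starts at the empty key, so only
    NUM_BUCKETS - 1 split points are needed.
    """
    return [f"{b:0{PREFIX_WIDTH}d}".encode("ascii") for b in range(1, NUM_BUCKETS)]
```

Because the prefix is computed from the original key, a point read can recompute it, while a range scan over the original key order has to be issued once per prefix.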

Re: Hot Region Server With No Hot Region

2016-12-02 Thread Saad Mufti
, Dec 1, 2016 at 6:08 PM, Saad Mufti wrote: > Sure will, the next time it happens. > > Thanks!!! > > > Saad > > > On Thu, Dec 1, 2016 at 5:01 PM, Ted Yu wrote: > >> From #2 in the initial email, the hbase:meta might not be the cause for >> the hots

Re: Hot Region Server With No Hot Region

2016-12-02 Thread Saad Mufti
un(CallRunner.java:107) > at org.apache.hadoop.hbase.ipc.RpcExecutor.consumerLoop( > RpcExecutor.java:130) > at org.apache.hadoop.hbase.ipc.RpcExecutor$1.run(RpcExecutor.java:107) > at java.lang.Thread.run(Thread.java:745) > > > Too many writers being blocked a

Re: Hot Region Server With No Hot Region

2016-12-03 Thread Saad Mufti
No. Saad On Fri, Dec 2, 2016 at 3:27 PM, Ted Yu wrote: > Somehow I couldn't access the pastebin (I am in China now). > Did the region server showing hotspot host meta? > Thanks > > On Friday, December 2, 2016 11:53 AM, Saad Mufti > wrote: > >

Re: Hot Region Server With No Hot Region

2016-12-13 Thread Saad Mufti
> > > I would check compaction, investigate throttling if it's causing high > CPU. > > > > On Sat, Dec 3, 2016 at 6:20 AM Saad Mufti wrote: > > > > > No. > > > > > > > > > Saad > > > > > > > > > On Fri, Dec 2,

Region Server Hotspot/CPU Problem

2017-03-01 Thread Saad Mufti
Hi, We are using HBase 1.0.0-cdh5.5.2 on AWS EC2 instances. The load on HBase is heavy and a mix of reads and writes. For a few months we have had a problem where occasionally (once a day or more) one of the region servers starts consuming close to 100% CPU. This causes all the client thread pool

Re: Region Server Hotspot/CPU Problem

2017-03-01 Thread Saad Mufti
-tuning-tips.html > > > > Sent from my iPhone > > > On Mar 1, 2017, at 6:06 AM, Saad Mufti wrote: > > > > Hi, > > > > We are using HBase 1.0.0-cdh5.5.2 on AWS EC2 instances. The load on HBase > > is heavy and a mix of reads and writes. For a few mont

HBase 1.0 Per Put TTL Not Being Obeyed On Replication

2017-04-26 Thread Saad Mufti
Hi, I have a main HBase 1.x cluster and some of the tables are being replicated to a separate HBase cluster of the same version, and the table schemas are identical. The column family being used has TTL set to "FOREVER", but we do a per put TTL in every Put we issue on the main cluster. Data is b

Re: HBase 1.0 Per Put TTL Not Being Obeyed On Replication

2017-04-28 Thread Saad Mufti
rs? > > -Anoop- > > On Thu, Apr 27, 2017 at 2:08 AM, Saad Mufti wrote: > > Hi, > > > > I have a main HBase 1.x cluster and some of the tables are being > replicated > > to a separate HBase cluster of the same version, and the table schemas > are > > ide

Re: HBase 1.0 Per Put TTL Not Being Obeyed On Replication

2017-04-30 Thread Saad Mufti
'm not clear) to client side code. So how can I verify that a Cell in one cluster has the TTL tag whereas the same replicated Cell in the next cluster does or doesn't? Thanks. Saad On Fri, Apr 28, 2017 at 1:06 PM, Saad Mufti wrote: > Thanks for the feedback, I have confirmed tha

Re: HBase 1.0 Per Put TTL Not Being Obeyed On Replication

2017-05-01 Thread Saad Mufti
> be able to retrieve the tags back to client side and check > > -Anoop- > > On Mon, May 1, 2017 at 2:59 AM, Saad Mufti wrote: > > Is there any facility to check what tags are on a Cell from a client side > > program? I started writing some Java code to look at the tags

HBase Encryption - HDFS Vs HBase Level

2017-08-18 Thread Saad Mufti
Hi, I'm looking for some guidance as our security team is requiring us to implement encryption of our HBase data at rest and in motion. I'm reading the docs and doing research and the choice seems to be between doing it at the HBase level or the more general HDFS level. I am leaning towards HDFS

Re: HBase Encryption - HDFS Vs HBase Level

2017-08-18 Thread Saad Mufti
Thank you everyone for the feedback. It was very helpful. Cheers. --- Saad Mufti On Fri, Aug 18, 2017 at 3:20 PM, Andrew Purtell wrote: > The Hadoop KMS in 2.6 or 2.7 can be suitable for demos or prototypes but I > would advise against using it for more than that. Recent

Trying To Understand BucketCache Evictions In HBase 1.3.1

2018-02-18 Thread Saad Mufti
Hi, We have an HBase system running HBase 1.3.1 on an AWS EMR service. Our BucketCache is configured for 400 GB on a set of attached EBS disk volumes, with all column families marked for in-memory in their column family schemas using INMEMORY => 'true' (except for one column family we only ever wr

Re: Trying To Understand BucketCache Evictions In HBase 1.3.1

2018-02-18 Thread Saad Mufti
Sorry I meant BLOCKCACHE => 'false' on the one column family we don't want getting cached. Cheers. Saad On Sun, Feb 18, 2018 at 6:51 PM, Saad Mufti wrote: > Hi, > > We have an HBase system running HBase 1.3.1 on an AWS EMR service. Our > BucketCache is con

Re: Trying To Understand BucketCache Evictions In HBase 1.3.1

2018-02-19 Thread Saad Mufti
is zero but the > #evicted blocks are there. Those might be the blocks of the compacted > away files. Hope this helps you to understand what is going on. > > -Anoop- > > > On Mon, Feb 19, 2018 at 5:25 AM, Saad Mufti wrote: > > Sorry I meant BLOCKCACHE =>

Bucket Cache Failure In HBase 1.3.1

2018-02-25 Thread Saad Mufti
Hi, I am running an HBase 1.3.1 cluster on AWS EMR. The bucket cache is configured to use two attached EBS disks of 50 GB each and I provisioned the bucket cache to be a bit less than the total, at a total of 98 GB per instance to be on the safe side. My tables have column families set to prefetch

How Long Will HBase Hold A Row Write Lock?

2018-02-28 Thread Saad Mufti
Hi, We are running on Amazon EMR based HBase 1.4.0. We are currently seeing a situation where a particular region sometimes gets into a state where a lot of write requests to any row in that region time out, saying they failed to obtain a lock on a row in a region, and eventually they experience

Re: Bucket Cache Failure In HBase 1.3.1

2018-02-28 Thread Saad Mufti
IOEngine"); > > disableCache(); > > Can you search in the region server log to see if the above occurred ? > > Was this server the only one with disabled cache ? > > Cheers > > On Sun, Feb 25, 2018 at 6:20 AM, Saad Mufti > wrote: > > > HI,

Re: Bucket Cache Failure In HBase 1.3.1

2018-02-28 Thread Saad Mufti
ache, please check your IOEngine"); > > > > disableCache(); > > > > Can you search in the region server log to see if the above occurred ? > > > > Was this server the only one with disabled cache ? > > > > Cheers > > > > On Sun, F

Re: How Long Will HBase Hold A Row Write Lock?

2018-02-28 Thread Saad Mufti
. Saad On Wed, Feb 28, 2018 at 9:31 PM, Saad Mufti wrote: > Hi, > > We are running on Amazon EMR based HBase 1.4.0 . We are currently seeing a > situation where sometimes a particular region gets into a situation where a > lot of write requests to any row in that region time

Re: Bucket Cache Failure In HBase 1.3.1

2018-02-28 Thread Saad Mufti
patch is for hbase or some other component ? > > Thanks > > On Wed, Feb 28, 2018 at 6:33 PM, Saad Mufti wrote: > > > Thanks for the feedback, so you guys are right the bucket cache is > getting > > disabled due to too many I/O errors from the underlying files making u

Re: How Long Will HBase Hold A Row Write Lock?

2018-02-28 Thread Saad Mufti
there was correlation between this duration and timeout) ? > > Cheers > > On Wed, Feb 28, 2018 at 6:31 PM, Saad Mufti wrote: > > > Hi, > > > > We are running on Amazon EMR based HBase 1.4.0 . We are currently seeing > a > > situation where sometimes a particular r

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-01 Thread Saad Mufti
tayed stable and eventually recovered, although it did suffer all those timeouts. Saad On Wed, Feb 28, 2018 at 10:18 PM, Saad Mufti wrote: > I'll paste a thread dump later, writing this from my phone :-) > > So the same issue has happened at different times for different region

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-01 Thread Saad Mufti
or so and sometimes longer. Saad On Thu, Mar 1, 2018 at 7:54 AM, Saad Mufti wrote: > Unfortunately I lost the stack trace overnight. But it does seem related > to compaction, because now that the compaction tool is done, I don't see > the issue anymore. I will run our incr

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-01 Thread Saad Mufti
to recover faster. I haven't quite tested that yet, any advice in the meantime would be appreciated. Cheers. Saad On Thu, Mar 1, 2018 at 9:21 AM, Saad Mufti wrote: > Actually it happened again while some minor compactions were running, so > don't think it is related to our maj

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-10 Thread Saad Mufti
could try to see how things work when a read happens from S3 and after > the > > prefetch completes ensure the same checkAndPut() is done (from cache this > > time) to really know the difference what S3 does there. > > > > Regards > > Ram > > > > On Fri

TableSnapshotInputFormat Behavior In HBase 1.4.0

2018-03-10 Thread Saad Mufti
Hi, I am running a Spark job (Spark 2.2.1) on an EMR cluster in AWS. There is no Hbase installed on the cluster, only HBase libs linked to my Spark app. We are reading the snapshot info from a HBase folder in S3 using TableSnapshotInputFormat class from HBase 1.4.0 to have the Spark job read snaps

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-10 Thread Saad Mufti
, it looks like this setting also has the good effect of preventing clients from hammering a region server that is slow because its IPC queues are backed up, allowing it to recover faster. Does that make sense? Cheers. Saad On Sat, Mar 10, 2018 at 7:04 PM, Saad Mufti wrote: > So i

Re: HBase failed on local exception and failed servers list.

2018-03-10 Thread Saad Mufti
Are you using AuthUtil class to reauthenticate? This class is in Hbase, and uses the Hadoop class UserGroupInformation to do the actual login and re-login. But, if your UserGroupInformation class is from Hadoop 2.5.1 or earlier, it has a bug if you are using Java 8, as most of us are. The relogin c

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-10 Thread Saad Mufti
, Mar 10, 2018 at 8:04 PM, Saad Mufti wrote: > Also, for now we have mitigated this problem by using the new setting in > HBase 1.4.0 that prevents one slow region server from blocking all client > requests. Of course it causes some timeouts but our overall ecosystem > contains Kafk

Re: TableSnapshotInputFormat Behavior In HBase 1.4.0

2018-03-10 Thread Saad Mufti
See below more I found on item 3. Cheers. Saad On Sat, Mar 10, 2018 at 7:17 PM, Saad Mufti wrote: > Hi, > > I am running a Spark job (Spark 2.2.1) on an EMR cluster in AWS. There is > no Hbase installed on the cluster, only HBase libs linked to my Spark app. > We are readi

Re: TableSnapshotInputFormat Behavior In HBase 1.4.0

2018-03-10 Thread Saad Mufti
r the Spark job. Saad On Sat, Mar 10, 2018 at 9:51 PM, Saad Mufti wrote: > See below more I found on item 3. > > Cheers. > > > Saad > > On Sat, Mar 10, 2018 at 7:17 PM, Saad Mufti wrote: > >> Hi, >> >> I am running a Spark job (Spark 2.2

Re: How Long Will HBase Hold A Row Write Lock?

2018-03-11 Thread Saad Mufti
imes the > compacted result file could be so large (what is major compaction) and > that will exhaust the BC if written. Also it might contain some data > which are very old. There is a recently raised jira which > discusses this. Pls see HBASE-20045 > > > -Anoop- >

Re: TableSnapshotInputFormat Behavior In HBase 1.4.0

2018-03-12 Thread Saad Mufti
lling to look at my patch? I have never done this before, so would appreciate a quick pointer on how to send a patch and get some quick feedback. Cheers. Saad On Sat, Mar 10, 2018 at 9:56 PM, Saad Mufti wrote: > The question remain though of why it is even accessing a column family

Re: TableSnapshotInputFormat Behavior In HBase 1.4.0

2018-03-12 Thread Saad Mufti
gly. > > Thanks > > On Mon, Mar 12, 2018 at 8:43 AM, Saad Mufti wrote: > > > I have created a company specific branch and added 4 new flags to control > > this behavior, these gave us a huge performance boost when running Spark > > jobs on snapshots of very large ta

Re: Scan problem

2018-03-19 Thread Saad Mufti
Another option if you have enough disk space/off heap memory space is to enable bucket cache to cache even more of your data, and set the PREFETCH_ON_OPEN => true option on the column families you want always cached. That way HBase will prefetch your data into the bucket cache and your scan won't ha

CorruptedSnapshotException Taking Snapshot Of Table With Large Number Of Files

2018-03-19 Thread Saad Mufti
Hi, We are running on HBase 1.4.0 on an AWS EMR/HBase cluster. We have started seeing the following stacktrace when trying to take a snapshot of a table with a very large number of files (12000 regions and roughly 36 - 40 files). The number of files should go down as we haven't been compa

Re: CorruptedSnapshotException Taking Snapshot Of Table With Large Number Of Files

2018-03-19 Thread Saad Mufti
ured as 64MB or 128MB. > > Regards, > > Huaxiang Sun > > > > On Mar 19, 2018, at 10:16 AM, Saad Mufti wrote: > > > > Hi, > > > > We are running on HBase 1.4.0 on an AWS EMR/HBase cluster. > > > > We have started seeing the following stacktrac

Re: CorruptedSnapshotException Taking Snapshot Of Table With Large Number Of Files

2018-03-19 Thread Saad Mufti
> > > On Mar 19, 2018, at 10:52 AM, Saad Mufti wrote: > > > > Thanks!!! Wish that was documented somewhere in the manual. > > > > Cheers. > > > > > > Saad > > > > > > On Mon, Mar 19, 2018 at 1:38 PM, Huaxiang Sun wrote:

Balance Regions Faster

2018-03-20 Thread Saad Mufti
Hi, We are using the stochastic load balancer, and have tuned it to do a maximum of 1% of regions in any calculation. But it is way too conservative after that, it moves one region at a time. Is there a way to tell it to go faster with whatever number of regions it decided to do? I have been looki

Should Taking A Snapshot Work Even If Balancer Is Moving A Few Regions Around?

2018-03-20 Thread Saad Mufti
Hi, We are using HBase 1.4.0 on AWS EMR based Hbase. Since snapshots are in S3, they take much longer than when using local disk. We have a cron script to take regular snapshots as backup, and they fail quite often on our largest table which takes close to an hour to complete the snapshot. The on

Re: Balance Regions Faster

2018-03-20 Thread Saad Mufti
computing new load balance > plan. Computation took 1200227ms to try 2254 different iterations. Found > a solution that moves 550 regions; Going from a computed cost of > 77.52829271038965 to a new cost of 74.32764924425548 > > > > If you have a dev cluster, you can try diffe

Anyone Have A Workaround For HBASE-19681?

2018-03-23 Thread Saad Mufti
We are facing the exact same symptoms in HBase 1.4.0 running on AWS EMR based cluster, and desperately need to take a snapshot to feed a downstream job. So far we have tried using the "assign" command on all regions involved to move them around but the snapshot still fails. Also saw the same error

Re: Should Taking A Snapshot Work Even If Balancer Is Moving A Few Regions Around?

2018-03-23 Thread Saad Mufti
is available in 1.4 > > -Vlad > > On Tue, Mar 20, 2018 at 8:00 PM, Saad Mufti wrote: > > > Hi, > > > > We are using HBase 1.4.0 on AWS EMR based Hbase. Since snapshots are in > S3, > > they take much longer than when using local disk. We have a cron script

Re: Should Taking A Snapshot Work Even If Balancer Is Moving A Few Regions Around?

2018-03-23 Thread Saad Mufti
r,region merging and split before snapshot should help. > > This works in 2.0 > > > > Not sure if merge/split switch is available in 1.4 > > > > -Vlad > > > > On Tue, Mar 20, 2018 at 8:00 PM, Saad Mufti > wrote: > > > > > Hi, > > > >

Re: Anyone Have A Workaround For HBASE-19681?

2018-03-26 Thread Saad Mufti
Restarting the region server worked for us to recover from this error. Saad On Fri, Mar 23, 2018 at 7:19 PM, Saad Mufti wrote: > We are facing the exact same symptoms in HBase 1.4.0 running on AWS EMR > based cluster, and desperately need to take a snapshot to feed a downstream >

HBase Replication Between Two Secure Clusters With Different Kerberos KDC's

2018-05-22 Thread Saad Mufti
Hi, Here is my scenario, I have two secure/authenticated EMR based HBase clusters, both have their own cluster dedicated KDC (using EMR support for this which means we get Kerberos support by just turning on a config flag). Now we want to get replication going between them. For other application

Re: Spark UNEVENLY distributing data

2018-05-22 Thread Saad Mufti
I think TableInputFormat will try to maintain as much locality as possible, assigning one Spark partition per region and trying to assign that partition to a YARN container/executor on the same node (assuming you're using Spark over YARN). So the reason for the uneven distribution could be that you

Re: Re:Got Duplicate Records for the Same Row Key from a Snapshot

2018-05-22 Thread Saad Mufti
I am not clear how your snapshot even succeeds if this is the case. The snapshot taking procedure includes a check for consistency at the end and throws an exception on problems like this. I would run an hbck command on your table to check if there are any consistency errors. It also has repair op

Re: HBase Replication Between Two Secure Clusters With Different Kerberos KDC's

2018-05-23 Thread Saad Mufti
ros principal hbase/@PGS.dev when I ran the add_peer command Thanks for taking the time to help me in any way you can. Saad On Wed, May 23, 2018 at 7:24 AM, Reid Chan wrote: > Three places to check, > > > 1. Would you mind showing your "/etc/zookeeper/conf/server-jaa