Thanks Enis & Ted! A few more questions inline.

On Wed, Jan 21, 2015 at 9:53 PM, Enis Söztutar <[email protected]> wrote:

> Online in this context is HBase cluster being online, not individual
> regions. For the merge process, the regions go briefly offline similar to
> how splits work. It should be on the order of seconds.

Hm, but how could it be so quick? Aren't regions first offlined and then
one of them is *moved*? Or maybe data is not actually sent over the
network? But if 2 regions are being merged, doesn't that mean that a
completely new region needs to be written (over the network, to disk, and
then HDFS replication also needs to take place). If regions are a few GB in
size, can that really be done in a matter of seconds?

What happens to the (in flight) writes or reads going to the regions that
are being merged?

Thanks,
Otis
--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/

> On Wed, Jan 21, 2015 at 10:26 AM, Ted Yu <[email protected]> wrote:
>
> > Please take a look at slides 5 and 6 in this file:
> >
> > https://issues.apache.org/jira/secure/attachment/12561887/merge%20region.pdf
> >
> > It is clear that the two regions to be merged are taken offline in
> > step 1.
> >
> > Cheers
> >
> > On Tue, Jan 20, 2015 at 5:26 PM, Otis Gospodnetic <[email protected]> wrote:
> >
> > > Hi,
> > >
> > > Considering this is called the *online* region merge, I would assume
> > > regions being merged never go offline during the merge and both
> > > regions being merged are available for reading and writing at all
> > > times, even during the merge.... though I don't get how writes would
> > > work if one region is being moved from one RS to another.... so maybe
> > > this is not truly online and writes are either rejected or
> > > buffered/blocked until the region is moved AND merged? Anyone knows
> > > for sure?
> > >
> > > I see this in one of the comments:
> > > Q: If one (or both) of the regions were receiving non-trivial load
> > > prior to this action, would client(s) be affected ?
> > > A: Yes, region would be off services in a short time, it is equal
> > > with moving region, e.g balance a region
> > >
> > > Also took a look at the patch:
> > >
> > > https://issues.apache.org/jira/secure/attachment/12574965/hbase-7403-trunkv33.patch
> > >
> > > And see:
> > >
> > > + /**
> > > +  * The merging region A has been taken out of the server's online regions list.
> > > +  */
> > > + OFFLINED_REGION_A,
> > >
> > > ... and if you look for the word "offline" in the patch I think it's
> > > pretty clear that BOTH regions being merged do go offline at some
> > > point. I guess it could be after the merge, too, not before....
> > >
> > > ... maybe others know?
> > >
> > > Thanks,
> > > Otis
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > > On Mon, Jan 19, 2015 at 4:17 AM, Vladimir Tretyakov <[email protected]> wrote:
> > >
> > > > Hi, I have one question about 'online region merge'
> > > > (https://issues.apache.org/jira/browse/HBASE-7403).
> > > > How I've understood regions which will be passed to merge method
> > > > will be unavailable for some time.
> > > >
> > > > That means:
> > > > 1. Some data will be unavailable some time.
> > > > 2. If client will try to write data to these regions it will get
> > > >    exceptions.
> > > >
> > > > Are above sentences correct?
> > > >
> > > > Somebody can estimate time which 1 and 2 will be true? Seconds,
> > > > minutes or hours? Is there any way to avoid 1 and 2?
> > > >
> > > > I am asking because now we have problem during time with number of
> > > > regions (our key contains timestamp), count of regions growing
> > > > constantly (splitting) and it become a cause of performance problem
> > > > with time.
> > > > For avoiding this effect we use 2 tables:
> > > > 1. First table we use for writing and reading data.
> > > > 2. Second we use only for reading data.
> > > >
> > > > After some time we truncate second table and rotate these tables
> > > > (first become second and second become first). That allow us control
> > > > count of regions, but solution looks a bit ugly, I looked at 'online
> > > > region merge', but we can't live with restrictions I've described in
> > > > first part of question.
> > > >
> > > > Can somebody help with answers?
> > > >
> > > > Thx, Vladimir Tretyakov.
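
For reference, the two-table rotation Vladimir describes at the bottom of
the thread can be modeled roughly as below. This is only an in-memory
sketch, not HBase client code: the Table and RotatingTables classes, the
table names and the timestamp-prefixed row keys are all made up for
illustration, and a plain dict stands in for each table.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Table:
    """Hypothetical stand-in for one table: a dict of row key -> value."""
    name: str
    rows: dict = field(default_factory=dict)

    def truncate(self) -> None:
        # Drop all rows, the way truncating a table would.
        self.rows.clear()


class RotatingTables:
    """'active' takes writes and reads; 'standby' is only read from."""

    def __init__(self) -> None:
        self.active = Table("table_a")
        self.standby = Table("table_b")

    def put(self, key: str, value: str) -> None:
        # Writes only ever go to the active table.
        self.active.rows[key] = value

    def get(self, key: str) -> Optional[str]:
        # Reads check the active table first, then fall back to standby.
        if key in self.active.rows:
            return self.active.rows[key]
        return self.standby.rows.get(key)

    def rotate(self) -> None:
        # Truncate the read-only table, then swap the two roles.
        self.standby.truncate()
        self.active, self.standby = self.standby, self.active


tables = RotatingTables()
tables.put("1421660220000-hostA", "cpu=0.42")  # assumed key layout
tables.rotate()                                # old data is now read-only
tables.put("1421660280000-hostA", "cpu=0.40")
```

In this model a row stays readable for one full rotation after the table
it lives in stops taking writes, and disappears on the rotation after
that, which is what keeps the amount of data (and so the region count)
bounded.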
