Re: HBase region server failure issues

2014-04-16 Thread Claudiu Soroiu
Thanks for the hints. I will take a look and explore the idea. Claudiu On Tue, Apr 15, 2014 at 1:43 PM, Claudiu Soroiu wrote: > First of all, thanks for the clarifications. > > **how about 300 regions with 3x replication? Or 1000 regions? This > is going to be 3000 files. on HDFS. per one RS.**

Re: HBase region server failure issues

2014-04-16 Thread Jonathan Hsieh
On Tue, Apr 15, 2014 at 1:43 PM, Claudiu Soroiu wrote: > First of all, thanks for the clarifications. > > **how about 300 regions with 3x replication? Or 1000 regions? This > is going to be 3000 files. on HDFS. per one RS.** > > Now I see that the trade-off is how to reduce the recovery time without affecting the overall performance of the cluster. …

Re: HBase region server failure issues

2014-04-16 Thread Nicolas Liochon
What you described seems to be the favored nodes feature, but there are still some open (and stale...) JIRAs there: HBASE-9116 and co. You may also want to look at the hbase.master.distributed.log.replay option, as it allows writes during recovery. And for the client there is hbase.status.published…
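For context, both settings Nicolas mentions are plain boolean properties in hbase-site.xml; a minimal sketch of turning them on programmatically (availability and default values depend on the HBase release in use):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class RecoveryTuningSketch {
        public static Configuration recoveryTunedConf() {
            Configuration conf = HBaseConfiguration.create();
            // Replay WAL edits directly into the new hosting region server,
            // so recovering regions can accept writes during recovery.
            conf.setBoolean("hbase.master.distributed.log.replay", true);
            // Let the master publish cluster status (e.g. dead servers) so
            // clients can react without waiting for RPC timeouts.
            conf.setBoolean("hbase.status.published", true);
            return conf;
        }
    }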

Re: HBase region server failure issues

2014-04-15 Thread Claudiu Soroiu
Yes, overall the second WAL would contain the same data, just distributed differently. A server's second WAL would hold data from the regions it would take over if their current host fails. It is just an idea, as it might not be good to duplicate the data across the cluster. On Wed, Apr 16, 2014 at 12:36 …

Re: HBase region server failure issues

2014-04-15 Thread Ted Yu
Would the second WAL contain the same contents as the first? We already have code that adds an interceptor on the calls to namenode#getBlockLocations so that blocks on the same DN as the dead RS are placed at the end of the priority queue. See addLocationsOrderInterceptor() in hbase-server/…
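The reordering Ted describes can be pictured with a small stand-alone sketch -- this is not the actual addLocationsOrderInterceptor() code, just the idea of stably moving replicas colocated with the dead region server to the back of the candidate list:

    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    public class LocationReorderSketch {
        // Stable reorder: keep relative order, but move any replica hosted
        // on the dead region server's node to the back of the list, so
        // reads during recovery try healthy datanodes first.
        static List<String> deprioritizeDeadHost(List<String> replicaHosts,
                                                 String deadHost) {
            List<String> healthy = new ArrayList<>();
            List<String> suspect = new ArrayList<>();
            for (String host : replicaHosts) {
                (host.equals(deadHost) ? suspect : healthy).add(host);
            }
            healthy.addAll(suspect);
            return healthy;
        }

        public static void main(String[] args) {
            List<String> hosts = Arrays.asList("dn1", "dn2", "dn3");
            // If the RS colocated with dn2 died, its replica goes last:
            System.out.println(deprioritizeDeadHost(hosts, "dn2"));
            // -> [dn1, dn3, dn2]
        }
    }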

Re: HBase region server failure issues

2014-04-15 Thread Claudiu Soroiu
First of all, thanks for the clarifications. **how about 300 regions with 3x replication? Or 1000 regions? This is going to be 3000 files. on HDFS. per one RS.** Now I see that the trade-off is how to reduce the recovery time without affecting the overall performance of the cluster. Having too many …

Re: HBase region server failure issues

2014-04-15 Thread Vladimir Rodionov
*We also had a global HDFS file limit to contend with* Yes, we have been seeing this from time to time in our production clusters. Periodic purging of old files helps, but the issue is obvious. -Vladimir Rodionov On Tue, Apr 15, 2014 at 11:58 AM, Stack wrote: > On Mon, Apr 14, 2014 at 1:47 PM, Claudiu Soroiu wrote: …

Re: HBase region server failure issues

2014-04-15 Thread Stack
On Mon, Apr 14, 2014 at 1:47 PM, Claudiu Soroiu wrote: > After some tuning I managed to > reduce it to 8 seconds in total and for the moment it fits the needs. > What did you do, Claudiu, to get the time down? Thanks, St.Ack
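For readers wondering which knobs usually drive detection time down to seconds: the ZooKeeper session timeout is the classic one, since a region server is only declared dead once its session expires. A hedged sketch (the value is illustrative, and this is not necessarily what Claudiu tuned):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class FastDetectionSketch {
        public static Configuration fastDetectionConf() {
            Configuration conf = HBaseConfiguration.create();
            // A dead RS is only noticed once its ZK session expires, so a
            // lower session timeout shrinks detection time. Set it too low
            // and a long GC pause gets a healthy RS killed -- tune carefully.
            conf.setInt("zookeeper.session.timeout", 10000); // ms, illustrative
            return conf;
        }
    }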

Re: HBase region server failure issues

2014-04-15 Thread Kevin O'dell
Andrew, I agree, there is definitely a chance HDFS doesn't have an extra 3GB of NN heap to squeeze out for HBase. It would be interesting to check in with the Flurry guys and see what their NN pressure looks like. As clusters become more multi-tenant, HDFS pressure could become a real concern.

Re: HBase region server failure issues

2014-04-15 Thread Andrew Purtell
You'd probably know better than I, Kevin, but I'd worry about the 1000*1000*32 case, where HDFS is as (over)committed as the HBase tier. On Tue, Apr 15, 2014 at 9:26 AM, Kevin O'dell wrote: > In general I have never seen nor heard of Federated Namespaces in the wild, > so I would be hesitant to go down that path. …
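A back-of-the-envelope for that 1000*1000*32 case, using the common rule of thumb of roughly 150 bytes of NameNode heap per namespace object (the 150-byte figure is an assumption, not a measurement):

    public class NnHeapEstimate {
        public static void main(String[] args) {
            long files = 1000L * 1000 * 32;  // nodes * regions/node * WALs/region = 32M
            long bytesPerObject = 150;       // assumed NN heap cost per file object
            double gb = files * bytesPerObject / 1e9;
            System.out.printf("~%.1f GB extra NameNode heap%n", gb); // ~4.8 GB
        }
    }

That lands in the same ballpark as the "extra 3GB of NN heap" Kevin mentions above, before even counting block replica objects.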

Re: HBase region server failure issues

2014-04-15 Thread Kevin O'dell
In general I have never seen nor heard of Federated Namespaces in the wild, so I would be hesitant to go down that path. But you know, for "Science", I would be interested in seeing how that worked out. Would we be looking at 32 WALs per region? At a large cluster with 1000 nodes, 100 regions per node …

Re: HBase region server failure issues

2014-04-15 Thread Andrew Purtell
# of WALs as roughly spindles / replication factor seems intuitive. Would be interesting to benchmark. As for one WAL per region, the BigTable paper IIRC says they didn't because of concerns about the number of seeks in the filesystems underlying GFS and because it would reduce the effectiveness of group commit. …

Re: HBase region server failure issues

2014-04-15 Thread Jonathan Hsieh
Thanks for catching that -- it was a typo -- one wal per region. On Tue, Apr 15, 2014 at 8:21 AM, Ted Yu wrote: > bq. In the case of an SSD world, it makes more sense to have one wal per > node > > Was there a typo in the sentence above (one wal per node)? > > Cheers > > > On Tue, Apr 15, 2014 …

Re: HBase region server failure issues

2014-04-15 Thread Ted Yu
bq. In the case of an SSD world, it makes more sense to have one wal per node Was there a typo in the sentence above (one wal per node)? Cheers On Tue, Apr 15, 2014 at 7:11 AM, Jonathan Hsieh wrote: > It makes sense to have as many wals as # of spindles / replication factor > per machine. This should be decoupled from the number of regions on a region server. …

Re: HBase region server failure issues

2014-04-15 Thread Jonathan Hsieh
It makes sense to have as many wals as # of spindles / replication factor per machine. This should be decoupled from the number of regions on a region server. So for a cluster with 12 spindles, we should likely have at least 4 wals (12 spindles / 3 replication factor), and need to do experiments to …
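Jonathan's arithmetic, expressed with the multiwal configuration that later shipped from the HBASE-8610 line of work (the property names below come from the later RegionGroupingProvider, so treat them as an assumption relative to this 2014 discussion):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;

    public class MultiWalSketch {
        public static Configuration multiWalConf() {
            int spindles = 12, replication = 3;
            // WAL count tied to hardware, not to region count:
            int wals = Math.max(1, spindles / replication); // 12 / 3 = 4 WALs
            Configuration conf = HBaseConfiguration.create();
            conf.set("hbase.wal.provider", "multiwal");
            conf.setInt("hbase.wal.regiongrouping.numgroups", wals);
            return conf;
        }
    }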

Re: HBase region server failure issues

2014-04-14 Thread Vladimir Rodionov
Todd, how about 300 regions with 3x replication? Or 1000 regions? This is going to be 3000 files. on HDFS. per one RS. When I said that it does not scale, I meant exactly that.

Re: HBase region server failure issues

2014-04-14 Thread Todd Lipcon
On Mon, Apr 14, 2014 at 6:32 PM, Vladimir Rodionov wrote: > *On the other hand, 95% of HBase users don't actually configure HDFS to > fsync() every edit. Given that, the random writes aren't actually going to > cause one seek per write -- they'll get buffered up and written back > periodically in a much more efficient fashion.* …

Re: HBase region server failure issues

2014-04-14 Thread Vladimir Rodionov
*On the other hand, 95% of HBase users don't actually configure HDFS to fsync() every edit. Given that, the random writes aren't actually going to cause one seek per write -- they'll get buffered up and written back periodically in a much more efficient fashion.* Todd, this is the theory. Reality is …

Re: HBase region server failure issues

2014-04-14 Thread Todd Lipcon
On the other hand, 95% of HBase users don't actually configure HDFS to fsync() every edit. Given that, the random writes aren't actually going to cause one seek per write -- they'll get buffered up and written back periodically in a much more efficient fashion. Plus, in some small number of years, …
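For reference, per-edit durability is an explicit client-side choice in HBase; a sketch using the standard Put API (method names as in later 1.x clients, and note FSYNC_WAL was not honored by every HDFS version of that era):

    import org.apache.hadoop.hbase.client.Durability;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DurabilitySketch {
        public static Put fsyncPut(byte[] row) {
            Put put = new Put(row);
            put.addColumn(Bytes.toBytes("cf"), Bytes.toBytes("q"),
                          Bytes.toBytes("value"));
            // Default is SYNC_WAL (hflush: data reaches DN memory, not disk).
            // FSYNC_WAL asks for a real fsync per edit -- the expensive case
            // most deployments avoid, as discussed above.
            put.setDurability(Durability.FSYNC_WAL);
            return put;
        }
    }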

Re: HBase region server failure issues

2014-04-14 Thread Vladimir Rodionov
I do not think it's a good idea to have one WAL file per region. The whole WAL-file idea is based on the assumption that writing data sequentially reduces average latency and increases total throughput. This is no longer the case in a one-WAL-file-per-region approach: you may have hundreds of active regions per RS …
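A toy illustration of the amortization argument: with one shared log, a single sync covers every edit buffered since the last one, so per-edit sync cost shrinks as concurrency grows; one log per region gives each sync a much smaller batch. Hypothetical classes, only to show the shape of the argument:

    import java.util.ArrayList;
    import java.util.List;

    // Toy shared log: appends accumulate, one sync() covers them all.
    class SharedLog {
        private final List<byte[]> pending = new ArrayList<>();
        synchronized void append(byte[] edit) { pending.add(edit); }
        synchronized int sync() {          // stands in for a single fsync
            int covered = pending.size();  // edits made durable together
            pending.clear();
            return covered;
        }
    }

    public class GroupCommitSketch {
        public static void main(String[] args) {
            SharedLog wal = new SharedLog();
            for (int i = 0; i < 100; i++) wal.append(new byte[]{(byte) i});
            // 100 edits from many regions, one sync: the cost is amortized.
            System.out.println("edits per sync: " + wal.sync());
        }
    }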

Re: HBase region server failure issues

2014-04-14 Thread Ted Yu
There is an ongoing effort to address this issue. See the following: HBASE-8610 (Introduce interfaces to support MultiWAL) and HBASE-10378 (Divide HLog interface into User and Implementor specific interfaces). Cheers On Mon, Apr 14, 2014 at 1:47 PM, Claudiu Soroiu wrote: > Hi all, > > My name is Claudiu Soroiu …

HBase region server failure issues

2014-04-14 Thread Claudiu Soroiu
Hi all, My name is Claudiu Soroiu and I am new to hbase/hadoop but not new to distributed computing in FT/HA environments, and I see there are a lot of issues reported related to region server failure. The main problem I see is related to recovery time in case of a node failure and distributed …