pre-splitting or not, that's the question

2015-04-07 Thread Marcelo Valle (BLOOMBERG/ LONDON)
Hello, I am still taking my first steps with HBase; I used Cassandra a while ago. For several years, I was used to thinking that trying to store data in Cassandra ordered among nodes was something evil, as its OrderedPartitioner is not supported and not recommended in production.
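HBase keeps rows sorted by row key and splits a table into regions that each cover a contiguous key range, so "pre-splitting" means creating the table with chosen split points instead of starting with a single region. The sketch below computes evenly spaced split keys, in the spirit of RegionSplitter's HexStringSplit. It assumes, hypothetically, that every row key starts with a fixed-width, uniformly distributed lowercase hex prefix (for example a hash of the natural key); the function name and parameters are illustrative only.

```python
def hex_split_points(num_regions: int, width: int = 8) -> list[str]:
    """Return num_regions - 1 split keys dividing the fixed-width hex
    keyspace [0, 16**width) into roughly equal ranges.

    Assumption (hypothetical): row keys begin with a `width`-character,
    uniformly distributed, lowercase hex prefix.
    """
    if num_regions < 2:
        return []  # a single region needs no split points
    step = (16 ** width) // num_regions
    # Zero-pad so the keys sort correctly as strings/bytes.
    return [format(i * step, f"0{width}x") for i in range(1, num_regions)]


print(hex_split_points(4))
```

Keys like these could then be passed at table creation time, for example with the HBase shell's SPLITS option (`create 't1', 'f1', SPLITS => [...]`). If the prefix is not uniformly distributed, evenly spaced split points will not give evenly loaded regions.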

Hbase 0.98 Distributed Mode with hadoop 2.6 HA:Issues of Hbase

2015-04-07 Thread sridhararao mutluri
Hi Team, I am trying to use HBase 0.98 in distributed mode with ZooKeeper 3.4.6 and Hadoop 2.6 HA (JDK 1.8). I am having the following issue, and Google has been of little help. I tried starting ZooKeeper first after clearing the ZooKeeper data dir, then starting the master first and the RegionServers later, with no luck. I used mycluster/hbase in

write availability

2015-04-07 Thread Marcelo Valle (BLOOMBERG/ LONDON)
If I have an application that writes to an HBase cluster, can I count on the cluster always being available to receive writes? I might not be able to read if a region server that handles a range of keys is down, but will I be able to keep writing to other nodes, so everything gets in sync when

Re: write availability

2015-04-07 Thread Marcelo Valle (BLOOMBERG/ LONDON)
Thanks, Serega, that helps me understand the differences. From: user@hbase.apache.org Subject: Re: write availability If I have an application that writes to a HBase cluster, can I count that the cluster will always available to receive writes? No, it's a CP system, not AP. so everything get in

Re: write availability

2015-04-07 Thread Michael Segel
I don’t know if I would say that… I read Marcelo’s question as “if the cluster is up, even though an RS may be down, can I still insert records into HBase?” So if the cluster is up, then you can insert records into HBase even though you lost an RS that was handling a specific region. But

Re: write availability

2015-04-07 Thread Serega Sheypak
If I have an application that writes to a HBase cluster, can I count that the cluster will always available to receive writes? No, it's a CP system, not AP. so everything get in sync when the other nodes get up again There is no hinted handoff; it's not Cassandra. 2015-04-07 14:48 GMT+02:00

RE: Hbase 0.98 Distributed Mode with hadoop 2.6 HA:Issues of Hbase

2015-04-07 Thread sridhararao mutluri
Hi, This is my hbase-site.xml: <configuration> <property><name>hbase.master</name> <value>hdfs://cluster1:6</value> </property> <property> <name>hbase.rootdir</name><value>hdfs://mycluster/hbase</value> </property> <property>

RE: Hbase 0.98 Distributed Mode with hadoop 2.6 HA:Issues of Hbase

2015-04-07 Thread sridhararao mutluri
Team, The port the HBase Master should bind to 6 Thanks, Sridhar From: serega.shey...@gmail.com Date: Tue, 7 Apr 2015 16:40:54 +0200 Subject: Re: Hbase 0.98 Distributed Mode with hadoop 2.6 HA:Issues of Hbase To: user@hbase.apache.org CC: bus...@cloudera.com <property>

Re: Hbase 0.98 Distributed Mode with hadoop 2.6 HA:Issues of Hbase

2015-04-07 Thread Serega Sheypak
<property> <name>hbase.master</name> <value>hdfs://cluster1:6</value> </property> What is it? 2015-04-07 16:34 GMT+02:00 sridhararao mutluri drm...@hotmail.com: Hi, This is my hbase-site.xml: <configuration> <property><name>hbase.master</name> <value>hdfs://cluster1:6</value>

Re: Hbase 0.98 Distributed Mode with hadoop 2.6 HA:Issues of Hbase

2015-04-07 Thread Ted Yu
bq. <property><name>hbase.rootdir</name> <value>hdfs://mycluster/hbase</value> </property> <property> Looks like there is a property missing at the end of the line. You showed a snippet from shell output. Have you checked the master log? Cheers On Tue, Apr 7, 2015 at 5:16 AM, sridhararao
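A malformed hbase-site.xml, such as a property element that is opened but never closed, can be caught before restarting anything by running the file through an XML parser. The sketch below uses only Python's standard `xml.etree.ElementTree`; the function name is hypothetical, and it checks syntax and the name/value shape only, not whether a property name or value actually makes sense to HBase.

```python
import xml.etree.ElementTree as ET


def load_hbase_site(xml_text: str) -> dict[str, str]:
    """Parse Hadoop-style <configuration><property>... XML into a dict.

    Raises ET.ParseError if the document is not well-formed (for example,
    a <property> that is never closed), and ValueError if the structure
    is not the expected configuration/property/name/value shape.
    """
    root = ET.fromstring(xml_text)
    if root.tag != "configuration":
        raise ValueError(f"unexpected root element: {root.tag!r}")
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is None or value is None:
            raise ValueError("each <property> needs a <name> and a <value>")
        props[name.strip()] = value.strip()
    return props
```

Feeding it the `hbase.rootdir` snippet from this thread with a closing `</property>` and `</configuration>` returns `{"hbase.rootdir": "hdfs://mycluster/hbase"}`; dropping the closing `</property>` makes the parse fail with a mismatched-tag error.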

Re: write availability

2015-04-07 Thread Marcelo Valle (BLOOMBERG/ LONDON)
Wellington, I might be misinterpreting this: http://stackoverflow.com/questions/13741946/role-of-datanode-regionserver-in-hbase-hadoop-integration But aren't HBase region servers and HDFS datanodes always on the same server? With a replication factor of 3, what happens if all 3 datanodes

Re: Hbase 0.98 Distributed Mode with hadoop 2.6 HA:Issues of Hbase

2015-04-07 Thread Esteban Gutierrez
Sridhar, What do you see in the HBase Master logs? The exception you are getting from the HBase Master is probably just a side effect and not the real cause. Could you upload the HBase Master logs to a site like pastebin.com or gist.github.com so we can look at them? cheers, esteban. --

Re: write availability

2015-04-07 Thread Serega Sheypak
Marcelo, if you are comparing with Cassandra:
1. Don't think about data replication/redundancy. It's outside HBase's scope. C* thinks about it, HBase doesn't; HBase uses HDFS. So assume you can never, ever lose the data, provided HDFS is configured properly.
2. HBase doesn't think in terms of

Re: write availability

2015-04-07 Thread Serega Sheypak
But aren't HBase region servers and HDFS datanodes always in the same server? It's a good point, but it's not mandatory. With a replication factor of 3, what happens if all 3 datanodes hosting that information go down and one of them come back, but with the disk intact? Should be OK. You have 3
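The "should be OK" answer rests on a simple rule: an HDFS block stays readable as long as at least one DataNode holding an intact replica is live, and a block with fewer live replicas than the target replication factor is under-replicated, which HDFS repairs by re-replicating in the background. The toy model below expresses only that rule; the DataNode names are hypothetical and it ignores NameNode state, corrupt replicas and rack placement.

```python
def block_readable(replica_nodes: set[str], live_nodes: set[str]) -> bool:
    """True if at least one DataNode holding a replica is live
    (assuming that replica's disk is intact)."""
    return bool(replica_nodes & live_nodes)


def under_replicated(replica_nodes: set[str], live_nodes: set[str],
                     replication: int = 3) -> bool:
    """True if fewer live replicas remain than the target replication factor."""
    return len(replica_nodes & live_nodes) < replication


replicas = {"dn1", "dn2", "dn3"}            # hypothetical DataNode names
print(block_readable(replicas, set()))      # all three down
print(block_readable(replicas, {"dn2"}))    # one came back with its disk intact
```

With all three replica holders down, the block is unreadable; once one of them returns with its disk intact, the block is readable again but still under-replicated until HDFS copies it to other nodes.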

Re: write availability

2015-04-07 Thread Esteban Gutierrez
Hello Marcelo, HBase has strong durability guarantees to avoid data loss. When a write arrives at a RegionServer, the data is persisted to a Write-Ahead Log (on HDFS) and kept temporarily in the RegionServer's memory until this memory store is flushed (also to HDFS). For the point of
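The write path described above (log first, then memory, then flush) is what makes a RegionServer crash survivable: anything still only in memory can be rebuilt by replaying the log. The toy class below models just that ordering. Python lists and dicts stand in for HDFS files, the class name and flush threshold are hypothetical, and discarding the log right after a flush is a simplification of how HBase actually rolls and archives WAL files.

```python
class ToyRegionStore:
    """Toy model: append to a write-ahead log, buffer in a memstore,
    and flush the memstore to an immutable store file past a threshold."""

    def __init__(self, flush_threshold: int = 3):
        self.flush_threshold = flush_threshold
        self.wal = []          # durable log (on HDFS in real HBase)
        self.memstore = {}     # volatile; lost if the process dies
        self.store_files = []  # flushed, immutable snapshots (HFiles on HDFS)

    def put(self, row: str, value: str) -> None:
        self.wal.append((row, value))  # 1. persist to the log first
        self.memstore[row] = value     # 2. then update memory
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        self.store_files.append(dict(sorted(self.memstore.items())))
        self.memstore.clear()
        self.wal.clear()  # simplification: flushed edits no longer need the log

    def crash_and_recover(self) -> None:
        """Simulate losing memory, then rebuild it by replaying the WAL."""
        self.memstore = {}
        for row, value in self.wal:
            self.memstore[row] = value

    def get(self, row: str):
        if row in self.memstore:
            return self.memstore[row]
        for store_file in reversed(self.store_files):  # newest file wins
            if row in store_file:
                return store_file[row]
        return None
```

For example, after two puts and a simulated crash, `get` still returns both values because they were replayed from the log; a third put then triggers a flush into a store file.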

Re: write availability

2015-04-07 Thread Marcelo Valle (BLOOMBERG/ LONDON)
So if an RS goes down, it's assumed you lost the data on it, right? HBase has replicas on HDFS, so if an RS goes down it doesn't mean I lost all the data, as I could still have the replicas... But what happens if all the RSs hosting a specific region go down? What if one of these RSs comes

Rowkey design question

2015-04-07 Thread Kristoffer Sjögren
Hi, I have a row with around 100,000 qualifiers, mostly with small values of around 1-5 KB, and maybe 5 larger ones of around 1-5 MB. A coprocessor does random access of 1-10 qualifiers per row. I would like to understand how HBase loads the data into memory. Will the entire row be loaded or only the

Re: write availability

2015-04-07 Thread Nick Dimiduk
Hi Marcelo, As you well know, HBase partitions your data set into row key ranges: regions. Each region is assigned to a single region server, which is the sole responsible host** for the availability of that region. When a region is offline, for whatever reason, it is not available for
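Because each key range has exactly one hosting server, a write for a key whose region is offline fails (or, in a real client, retries) rather than being parked on another node, while writes to other regions proceed. The toy model below shows that behaviour. The class, the exception name, the split keys and the round-robin initial assignment are all hypothetical, and it deliberately leaves out reassignment by the Master.

```python
import bisect


class RegionUnavailable(Exception):
    """Raised when the region covering a row key has no live server."""


class ToyCluster:
    def __init__(self, split_keys: list[str], servers: list[str]):
        # Region i covers [start_keys[i], start_keys[i + 1]); "" starts region 0.
        self.start_keys = [""] + list(split_keys)
        # Hypothetical round-robin initial assignment of regions to servers.
        self.assignment = {i: servers[i % len(servers)]
                           for i in range(len(self.start_keys))}
        self.live = set(servers)

    def region_for(self, row: str) -> int:
        # Start keys are inclusive, so bisect_right finds the owning region.
        return bisect.bisect_right(self.start_keys, row) - 1

    def put(self, row: str) -> str:
        region = self.region_for(row)
        server = self.assignment[region]
        if server not in self.live:
            # No hinted handoff: the write is not accepted elsewhere.
            raise RegionUnavailable(f"region {region} offline ({server} down)")
        return server


cluster = ToyCluster(["g", "n", "t"], ["rs1", "rs2"])
cluster.live.discard("rs2")   # rs2 hosted regions 1 and 3
print(cluster.put("a"))       # region 0 on rs1 still accepts writes
```

A put for row "h" (region 1, hosted on the dead rs2) raises `RegionUnavailable` in this model until that region is assigned to a live server.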

Re: write availability

2015-04-07 Thread Wellington Chevreuil
When an RS goes down, the Master will try to assign its regions to the remaining RSes. When the RS comes back, after a while the Master's balancer process will redistribute regions among the RSes, so the given RS will host regions again, but not necessarily the ones it hosted before it went
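The recovery step described here can be pictured as moving every region of the dead server onto the survivors. The function below is a minimal sketch of that idea using a simple "fewest regions first" rule; it is a hypothetical simplification, not HBase's actual assignment or balancer logic, which also accounts for WAL splitting, locality and other costs.

```python
def reassign(assignment: dict[str, str], dead: str, live: list[str]) -> dict[str, str]:
    """Move every region hosted by `dead` onto the remaining live servers,
    each time picking the server that currently hosts the fewest regions
    (ties broken by name). Returns a new region -> server mapping."""
    new = dict(assignment)
    load = {server: 0 for server in live}
    for server in new.values():
        if server in load:
            load[server] += 1
    for region in sorted(r for r, s in new.items() if s == dead):
        target = min(live, key=lambda s: (load[s], s))
        new[region] = target
        load[target] += 1
    return new


before = {"r1": "rs1", "r2": "rs2", "r3": "rs3", "r4": "rs3"}
print(reassign(before, dead="rs3", live=["rs1", "rs2"]))
```

When rs3 later rejoins, a separate balancing pass could move some regions onto it again, and, as noted above, these need not be r3 and r4.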

Re: write availability

2015-04-07 Thread Marcelo Valle (BLOOMBERG/ LONDON)
So if the cluster is up, then you can insert records into HBase even though you lost an RS that was handling a specific region. What happens when the RS goes down? Will writes to that region go to another region server? Does another RS take over the region's key range while the RS is down? What

Re: write availability

2015-04-07 Thread Andrew Purtell
Sorry, there is something I asked wrongly because I was understanding it wrongly. 1 region server correspond to 1 namenode and 1 write to 1 name node will replicate to 3 datanodes... No, but this may just be a terminology problem. The NameNode isn't an HBase daemon; it's part of HDFS. HDFS writers,

Re: Rowkey design question

2015-04-07 Thread Imants Cekusins
how HBase loads the data into memory. If you create a Get and specify columns with addColumn, it is likely that only the data for those columns is read and loaded into memory. Row keys are best kept short, and so are column qualifiers.
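Short row keys and qualifiers matter because every cell is stored with its full key (row, family, qualifier, timestamp, type) next to its value. The sketch below does that arithmetic using the classic KeyValue layout, assuming no tags, no compression and no data block encoding (encodings such as prefix or diff encoding can shrink repeated keys considerably). The function names and example sizes are hypothetical.

```python
def keyvalue_size(row_len: int, family_len: int, qualifier_len: int, value_len: int) -> int:
    """Approximate serialized size of one cell in the classic KeyValue layout
    (no tags, no compression, no data block encoding)."""
    key_len = (
        2 + row_len         # row length (short) + row bytes
        + 1 + family_len    # family length (byte) + family bytes
        + qualifier_len     # qualifier bytes (length is implied)
        + 8                 # timestamp (long)
        + 1                 # key type (Put, Delete, ...)
    )
    return 4 + 4 + key_len + value_len  # key-length and value-length ints


def key_overhead(num_cells: int, row_len: int, family_len: int, qualifier_len: int) -> int:
    """Bytes spent on everything except values across num_cells cells."""
    return num_cells * keyvalue_size(row_len, family_len, qualifier_len, 0)


# Hypothetical sizes: 10-byte row key, 2-byte family, 8-byte qualifiers.
print(keyvalue_size(10, 2, 8, 100))
print(key_overhead(100_000, 10, 2, 8))
```

With those hypothetical sizes, a row of 100,000 cells carries about 4 MB of key bytes before any values, and every extra byte in the row key or qualifier adds another 100 KB.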

Re: write availability

2015-04-07 Thread Marcelo Valle (BLOOMBERG/ LONDON)
Sorry, I asked something wrongly because I had understood it wrongly. 1 region server corresponds to 1 namenode, and 1 write to 1 namenode will replicate to 3 datanodes... So, to simplify the second question: what happens to the HBase cluster when 1 region server is down? -Marcelo

Re: write availability

2015-04-07 Thread Esteban Gutierrez
-- Cloudera, Inc. On Tue, Apr 7, 2015 at 10:36 AM, Marcelo Valle (BLOOMBERG/ LONDON) mvallemil...@bloomberg.net wrote: Sorry, there is something I asked wrongly because I was understanding it wrongly. 1 region server correspond to 1 namenode and 1 write to 1 name node will replicate to 3

Re: write availability

2015-04-07 Thread Marcelo Valle (BLOOMBERG/ LONDON)
Esteban, If I understood correctly what you said: for the failure mode you mention, if all DNs go down (not the NN), clients will be blocked waiting for the acknowledgment of a write to the DNs, and after a few retries the RS will consider that writing to the WAL failed; the RS will

Re: write availability

2015-04-07 Thread Esteban Gutierrez
Hello Marcelo, On Tue, Apr 7, 2015 at 10:16 AM, Marcelo Valle (BLOOMBERG/ LONDON) mvallemil...@bloomberg.net wrote: Esteban, If I understood correctly what you said: For the failure mode you mention if all DNs go down (not the NN) clients will be blocked waiting for the acknowledge of a

Re: Rowkey design question

2015-04-07 Thread Kristoffer Sjögren
Sorry, I should have explained my use case a bit more. Yes, it's a pretty big row and it's close to the worst case. Normally there would be fewer qualifiers and the largest qualifiers would be smaller. The reason these rows get big is that they store aggregated data in indexed compressed

Re: Rowkey design question

2015-04-07 Thread Michael Segel
Sorry, but your initial problem statement doesn’t seem to parse … Are you saying that you have a single row with approximately 100,000 elements where each element is roughly 1-5KB in size and in addition there are ~5 elements which will be between one and five MB in size? And you then mention a

Re: Rowkey design question

2015-04-07 Thread Nick Dimiduk
Those rows are written out into HBase blocks on cell boundaries. Your column family has a BLOCKSIZE attribute, for which you may or may not have overridden the default of 64 KB. Cells are written into a block until it is >= the target block size. So your single 500 MB row will be broken down into
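The block-building rule described here can be sketched as: append whole cells to the current block until it reaches the target size, then start a new one. A cell is never split, so a 1-5 MB value simply produces one oversized block. The functions below are a hypothetical model of that rule; they ignore compression, encoding, the block index and the block cache, and they only estimate which blocks a Get for a few qualifiers would have to touch.

```python
def pack_into_blocks(cell_sizes: list[int], block_size: int = 64 * 1024) -> list[list[int]]:
    """Group cell indices into blocks: cells are appended to the current
    block until its size reaches or exceeds block_size; cells are never split."""
    blocks, current, current_size = [], [], 0
    for i, size in enumerate(cell_sizes):
        current.append(i)
        current_size += size
        if current_size >= block_size:
            blocks.append(current)
            current, current_size = [], 0
    if current:
        blocks.append(current)
    return blocks


def blocks_for(cells_wanted: list[int], blocks: list[list[int]]) -> list[int]:
    """Indices of the blocks that contain the requested cells."""
    owner = {cell: b for b, cells in enumerate(blocks) for cell in cells}
    return sorted({owner[c] for c in cells_wanted})


blocks = pack_into_blocks([30, 30, 30, 100, 10], block_size=64)
print(blocks)
print(blocks_for([0, 4], blocks))
```

In this toy run the oversized 100-unit cell gets a block of its own, and reading cells 0 and 4 touches two of the three blocks rather than the whole row.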

HBase region assignment by range?

2015-04-07 Thread Demai Ni
Hi folks, I have a question about region assignment and would like to clarify some thoughts. Let's say I have a table with row keys row0 ~ row3 on a 4-node HBase cluster; is there a way to keep data partitioned by range on each node? For example: node1: =row1 node2: row10001~row2
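Range partitioning is exactly how HBase regions work, so one way to approach this is to pre-split the table so that each region covers one of the desired ranges. The sketch below computes split keys for row keys of a hypothetical form (a prefix plus a zero-padded numeric id) and finds which region a key falls into. Note that splitting only defines the ranges; which node hosts each region is decided by the Master's assignment and balancer, which can move regions between nodes.

```python
import bisect


def numeric_split_keys(prefix: str, width: int, max_id: int, regions: int) -> list[str]:
    """Split keys for hypothetical row keys of the form prefix + zero-padded id,
    with ids in [0, max_id). Zero-padding matters because HBase compares row
    keys as bytes: without it, 'key10' would sort before 'key9'."""
    step = max_id // regions
    return [f"{prefix}{i * step:0{width}d}" for i in range(1, regions)]


def region_index(split_keys: list[str], row: str) -> int:
    """Index of the region whose [start, end) range contains row."""
    return bisect.bisect_right(split_keys, row)


splits = numeric_split_keys("key", width=6, max_id=1_000_000, regions=4)
print(splits)
print(region_index(splits, "key250000"))
```

Each split key is the inclusive start of the next region, so `key250000` lands in region 1, while `key249999` stays in region 0.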

RE: Hbase 0.98 Distributed Mode with hadoop 2.6 HA:Issues of Hbase

2015-04-07 Thread sridhararao mutluri
Hi Esteban, I pasted the logs on github.com: Thanks, Sridhar From: este...@cloudera.com Date: Tue, 7 Apr 2015 08:44:55 -0700 Subject: Re: Hbase 0.98 Distributed Mode with hadoop 2.6 HA:Issues of Hbase To: user@hbase.apache.org Sridhar, What do you see in the HBase Master logs?

Re: Spinning up for 1.1 Release

2015-04-07 Thread Nick Dimiduk
Heya folks, We're down to a week remaining before my proposed branch date. A couple big-ticket items have made it in since my last mail (HBASE-12972, HBASE-12975, HBASE-11598, HBASE-13170). However, we still have about 70 unresolved issues marked for this release, including 3 blockers