That email was just informational. Below are the details on my cluster - let me
know if more is needed.
I have 2 HBase clusters set up:
- for production, a 6-node cluster, 32G RAM, 8 processors
- for dev, a 3-node cluster, 16G RAM, 4 processors
1. I installed Hadoop 0.20.2 and HBase 0.20.3 on both of these clusters
successfully.
2. After that I loaded 2G+ files into HDFS and into an HBase table.
An example HBase table looks like this:
{NAME => 'TABLE', FAMILIES => [{NAME => 'data', VERSIONS => '100',
COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536',
IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
3. I started Stargate on one server and successfully read from HBase through
it from a 3rd-party application.
Reading 0.5M records from HBase via Stargate took 600 seconds on the dev
cluster and 250 seconds on production.
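For reference, here is the arithmetic behind those two timings as a small Python sketch. It only divides the record count by the elapsed seconds quoted above; the function name is made up for illustration, and the result is an average over the whole run, not a measurement of any single request.

```python
# Rough read throughput implied by the timings above (0.5M records per run).
RECORDS = 500_000


def rows_per_second(records, seconds):
    """Average rows read per second over the whole run."""
    return records / seconds


dev = rows_per_second(RECORDS, 600)    # dev cluster: 600 seconds
prod = rows_per_second(RECORDS, 250)   # production cluster: 250 seconds

print(f"dev:  {dev:.0f} rows/s")
print(f"prod: {prod:.0f} rows/s")
print(f"production is {prod / dev:.1f}x faster than dev")
```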
4. Later, to boost read performance, it was suggested that upgrading to
HBase 0.20.6 would help. I did that on production (without running the migrate
script) and restarted Stargate. Everything ran fine, though I did not see any
improvement in performance.
5. Eventually I had to move from production to the dev cluster because of some
resource issues at our end. The dev cluster was on 0.20.3 at that time. As I
loaded more files into HBase (<10 versions of <1G files) and converted my app
to use HBase more heavily (via more Stargate clients), performance started
degrading. I decided it was time to upgrade the dev cluster to 0.20.6 as well.
(I did not run the migrate script here either; I missed this step in the doc.)
6. When HBase 0.20.6 came back up on the dev cluster (with an increased block
cache (.6) and region server handler count (75)), pointing to the same rootdir,
I noticed that some tables were missing. I could see them mentioned in the
logs, but not when I ran 'list' in the shell. I recovered those tables using
the add_table.rb script.
a. Is there a way to check the health of all HBase tables in the
cluster after an upgrade, or even periodically, to make sure that everything
is healthy?
b. I would like to be able to force this error again, run a health check,
and have it report to me that some tables were lost. This time I only found
out because I had very little data, so it was easy to tell.
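To make question 6a concrete, the kind of check I have in mind could look like the Python sketch below: keep an inventory of table names and compare it against what the cluster lists. This is only a sketch under assumptions. It assumes the Stargate root resource (GET /) returns the table names one per line when asked for text/plain; the inventory contents and both function names are hypothetical. It would also only catch tables that vanish from the listing, not other kinds of damage.

```python
import urllib.request


def list_tables(base_url):
    """Return the set of table names the REST gateway reports.

    Assumes GET <base_url>/ with Accept: text/plain yields one name per line.
    """
    req = urllib.request.Request(base_url.rstrip("/") + "/",
                                 headers={"Accept": "text/plain"})
    with urllib.request.urlopen(req, timeout=10) as resp:
        body = resp.read().decode("utf-8")
    return {line.strip() for line in body.splitlines() if line.strip()}


def missing_tables(expected, listed):
    """Names that should exist but are absent from the listing, sorted."""
    return sorted(set(expected) - set(listed))


# Hypothetical inventory, recorded before the upgrade.
expected = {"TABLE", "VRS"}
# Against a live gateway: listed = list_tables("http://localhost:8080")
listed = {"TABLE"}
lost = missing_tables(expected, listed)
print("missing tables:", lost)
```

Run from cron after each restart, a non-empty result would be the report asked for in 6b.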
7. Here are the issues I face after this upgrade:
a. When I run stop-hbase.sh, it does not stop my regionservers on the
other boxes.
b. start-hbase.sh does start them.
c. Could it be that the regionservers are in fact stopped and this is just
not reported (that is what I see happening on the production cluster)?
8. I started Stargate on the upgraded 0.20.6 dev cluster.
a. Earlier, when I requested a URL for a data row that did not exist, the
return value was NULL; now I get an HTML error page stating HTTP error 404/405.
Everything works as expected for an existing data row.
b. This works okay on the production cluster after the upgrade; only the
dev cluster gives this error.
c. Examples:
On production cluster:
:~ hadoop$curl http://localhost:8080/version
Stargate 0.0.1 [JVM: Sun Microsystems Inc.
1.6.0_20-16.3-b01] [OS: SunOS 5.10 x86] [Server: jetty/6.1.14] [Jersey:
1.1.0-ea]
:~ hadoop$curl http://localhost:8080/verison
:~ hadoop$curl http://localhost:8080/version/cluster
0.20.6
On dev cluster:
:~ hadoop$curl http://localhost:8080/version
Stargate 1.0 [JVM: Sun Microsystems Inc.
1.6.0_20-16.3-b01] [OS: SunOS 5.10 x86] [Server: jetty/6.1.14] [Jersey: 1.1.5.1]
:~ hadoop$curl http://localhost:8080/verison
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 405 METHOD_NOT_ALLOWED</title>
</head>
<body><h2>HTTP ERROR: 405</h2><pre>METHOD_NOT_ALLOWED</pre>
<p>RequestURI=/verson</p><p><i><small><a
href="http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>
<br/>
</body>
</html>
9. Since Stargate was working as desired before the upgrade, I thought I should
try downgrading to 0.20.3, basically starting HBase from the old dir I still
have on the dev cluster. I changed all my classpaths to point to the old dir
and restarted HBase and Stargate from the hbase-0.20.3 dir.
a. But I don't think that really works. It still recognizes 0.20.6
somehow: my hbase shell kept pointing to 0.20.6, and the Stargate URL
"curl http://localhost:8080/version/cluster" also reports 0.20.6.
b. I am not sure if there is any such thing as downgrading HBase.
10. I have now pointed everything back to 0.20.6 (running everything out of
that dir). I still get the same HTTP error as above.
Below is another error, an HTTP 404 this time, also with 0.20.6:
hadoop$curl http://localhost:8080/<table_name>/75
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
<title>Error 404 NOT_FOUND</title>
</head>
<body><h2>HTTP ERROR: 404</h2><pre>NOT_FOUND</pre>
<p>RequestURI=/VRS/75</p><p><i><small><a
href="http://jetty.mortbay.org/">Powered by Jetty://</a></small></i></p><br/>
<br/>
</body>
</html>
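On the client side, the old "NULL for a missing row" behaviour can be approximated by treating a 404 as "no such row". The Python sketch below is one way to do that under stated assumptions: it assumes a 404 from the gateway means the row (or table) was not found, and it deliberately lets other statuses, such as the 405 shown earlier, propagate as errors. The function name and the `opener` parameter are hypothetical; the parameter exists only so the behaviour can be exercised without a running gateway.

```python
import io
import urllib.error
import urllib.request


def fetch_row(url, opener=urllib.request.urlopen):
    """Return the response body for a row URL, or None on HTTP 404."""
    try:
        with opener(url) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None  # treat "not found" like the old NULL result
        raise            # 405 and anything else is a real error


# Stand-ins for the gateway, used here instead of a live server.
def _found(url):
    return io.BytesIO(b"row-value")


def _not_found(url):
    raise urllib.error.HTTPError(url, 404, "NOT_FOUND", {}, None)


existing = fetch_row("http://localhost:8080/VRS/1", opener=_found)
absent = fetch_row("http://localhost:8080/VRS/75", opener=_not_found)
print(existing, absent)
```

With a live gateway the `opener` argument would simply be left at its default.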
That was a long email. Please let me know if further clarifications are needed.
Thank you,
-Avani
-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Stack
Sent: Tuesday, August 31, 2010 12:24 PM
To: [email protected]
Subject: Re: HBase table lost on upgrade
On Tue, Aug 31, 2010 at 12:14 PM, Sharma, Avani <[email protected]> wrote:
> Thanks, Stack. Well, I was able to get the basic hbase cluster to run, but
> now that I am trying to boost read performance, I am running into stuff that
> is either not working or I cannot easily find solutions to on the net.
>
This mail that you've just written above gives us nothing to go on.
You want to boost read performance but say nothing about what current
performance, data size, hardware, or schema look like.
St.Ack