Ted,
I did look at that thread. It seems I need to modify the code in that file?
Could you point me to the exact steps to get it and compile it?

Did you get through the issue of regions being added to the catalog but not
showing up in master.jsp?
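
In case it helps, here is how I have been checking the catalog (this is just my
own usage, so treat it as a sketch):

```shell
# From the hbase shell: list the regions recorded in the .META. catalog table
hbase shell
> scan '.META.', {COLUMNS => 'info:regioninfo'}
```

The regions show up in this scan but not on master.jsp.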




On Sep 4, 2010, at 9:24 PM, Ted Yu <[email protected]> wrote:

> The tool Stack mentioned is hbck. If you want to port it to 0.20, see the
> email thread entitled "compiling HBaseFsck.java for 0.20.5".
> 
> You should also try reducing the number of tables in your system, possibly
> through HBASE-2473.
> 
> Cheers
> 
> On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani <[email protected]> wrote:
> 
>> 
>> 
>> 
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
>> Sent: Wednesday, September 01, 2010 10:45 PM
>> To: [email protected]
>> Subject: Re: HBase table lost on upgrade
>> 
>> On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <[email protected]> wrote:
>>> That email was just informational. Below are the details on my cluster -
>> let me know if more is needed.
>>> 
>>> I have 2 HBase clusters set up:
>>> -       for production, a 6-node cluster, 32G RAM, 8 processors
>>> -       for dev, a 3-node cluster, 16G RAM, 4 processors
>>> 
>>> 1. I installed Hadoop 0.20.2 and HBase 0.20.3 on both these clusters
>> successfully.
>> 
>> Why not latest stable version, 0.20.6?
>> 
>> This was a couple of months ago.
>> 
>> 
>>> 2. After that I loaded 2G+ of files into HDFS and an HBase table.
>> 
>> 
>> What's this mean?  Each of the .5M cells was 2G in size, or the total size
>> was 2G?
>> 
>> The total file size is 2G. Cells are of the order of hundreds of bytes.
>> 
>> 
>>>       An example HBase table looks like this:
>>>               {NAME => 'TABLE', FAMILIES => [{NAME => 'data', VERSIONS =>
>>>                '100', COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE =>
>>>                '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>> 
>> That looks fine.
>> 
>>> 3. I started stargate on one server and accessed Hbase for reading from
>> another 3rd party application successfully.
>>>       It took 600 seconds on dev cluster and 250 on production to read
>> .5M records from Hbase via stargate.
>> 
>> 
>> That don't sound so good.
>> 
>> 
>> 
>>> 4. Later, to boost read performance, it was suggested that upgrading to
>> HBase 0.20.6 would be helpful. I did that on production (without running the
>> migrate script) and restarted stargate; everything ran fine, though I did
>> not see a bump in performance.
>>> 
>>> 5. Eventually, I had to move from production to the dev cluster because of
>> some resource issues at our end. The dev cluster had 0.20.3 at this time. As I
>> started loading more files into HBase (<10 versions of <1G files) and
>> converting my app to use HBase more heavily (via more stargate clients), the
>> performance started degrading. I decided it was time to upgrade the dev
>> cluster to 0.20.6 as well. (I did not run the migrate script here either; I
>> missed this step in the doc.)
>>> 
>> 
>> What kinda perf you looking for from REST?
>> 
>> Do you have to use REST?  All is base64'd so it's safe to transport.
>> 
>> I also have Java API code (for testing purposes) and it gave similar
>> performance results (520 seconds on dev and 250 on the production cluster).
>> Is there a way to flush the cache before we run the next experiment? I
>> suspect that the first lookup always takes longer and then the later ones
>> perform better.
>> 
>> I need something that can integrate with C++ - libcurl and stargate were
>> the easiest to start with. I could look at thrift or anything else the Hbase
>> gurus think might be a better fit performance-wise.
>> 
>> 
>>> 6. When HBase 0.20.6 came back up on the dev cluster (with the block
>> cache increased to .6 and the region server handler count to 75), pointing
>> to the same rootdir, I noticed that some tables were missing. I could see
>> mentions of them in the logs, but not when I did 'list' in the shell. I
>> recovered those tables using the add_table.rb script.
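>> 
>> For the record, the recovery command I ran was along these lines (the paths
>> are from my setup, so adjust as needed):
>> 
>> ```shell
>> # Rebuild the .META. entry for a table whose data files are still in HDFS
>> ${HBASE_HOME}/bin/hbase org.jruby.Main ${HBASE_HOME}/bin/add_table.rb /hbase/TABLE
>> ```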
>> 
>> 
>> How did you shutdown this cluster?  Did you reboot machines?  Was your
>> hdfs homed on /tmp?  What is going on on your systems?  Are they
>> swapping?  Did you give HBase more than its default memory?  You read
>> the requirements and made sure ulimit and xceivers had been upped on
>> these machines?
>> 
>> 
>> I did not reboot the machines. Neither HDFS nor HBase stores data/logs in
>> /tmp. They are not swapping.
>> The HBase heap size is 2G.  I have upped the xceivers now on your
>> recommendation.  Do I need to restart HDFS after making this change in
>> hdfs-site.xml?
>> ulimit -n
>> 2048
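>> 
>> For reference, the change I made was along these lines (the value is just
>> what I picked; the property name is the 0.20-era spelling):
>> 
>> ```xml
>> <!-- hdfs-site.xml on each datanode; I believe HDFS must be restarted
>>      for datanodes to pick this up -->
>> <property>
>>   <name>dfs.datanode.max.xcievers</name>
>>   <value>4096</value>
>> </property>
>> ```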
>> 
>> 
>> 
>>>       a. Is there a way to check the health of all HBase tables in the
>> cluster after an upgrade, or even periodically, to make sure that everything
>> is healthy?
>>>       b. I would like to be able to force this error again, check the
>> health of HBase, and have it report to me that some tables were lost.
>> Currently, I only found out because I had very little data and it was easy
>> to tell.
>>> 
>> 
>> In trunk there is such a tool.  In 0.20.x, run a count against your
>> table.  See the hbase shell.  Type help to see how.
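>> 
>> For example (the table name here is a placeholder):
>> 
>> ```shell
>> # From the hbase shell: 'count' scans the table and reports the row count,
>> # so it is slow on big tables but will surface missing/broken regions
>> hbase shell
>> > count 'TABLE'
>> ```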
>> 
>> 
>> What tool are you talking about here? It wasn't clear. Count against
>> which table? I want HBase to check all tables, and I don't know how many
>> tables I have since there are too many. Is that possible?
>> 
>>> 7. Here are the issues I face after this upgrade
>>>       a. when I run stop-hbase.sh, it  does not stop my regionservers on
>> other boxes.
>> 
>> Why not?  What's going on on those machines?  If you tail the logs on
>> the hosts that won't go down and/or on the master, what do they say?
>> Tail the logs.  They should give you (us) a clue.
>> 
>> They do go down, with some errors in the log, but don't report it on the
>> terminal.
>> http://pastebin.com/0hYwaffL  regionserver log
>> 
>> 
>> 
>>>       b. It does start them using start-hbase.sh.
>>>       c. Is it that stopping regionservers is not reported, but it does
>> stop them (I see that happening on the production cluster)?
>>> 
>> 
>> 
>> 
>>> 8. I started stargate on the upgraded 0.20.6 dev cluster.
>>>       a. Earlier, when I sent a URL to look up a data row that did not
>> exist, the return value was NULL; now I get an XML body stating HTTP error
>> 404/405.        Everything works as expected for an existing data row.
>> 
>> The latter sounds RESTy.  What would you expect of it?  The null?
>> 
>> 
>> Yes, it should send NULL like it does on the production server. Is there
>> anyone else you can point me to who has used REST? This is the main
>> showstopper for me currently.
>> 
>> 
>> 
