dfs.datanode.max.xcievers is read in the DataXceiverServer constructor.
If you change its value, you need to restart HDFS (at least the DataNodes) for it to take effect.
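For reference, a minimal sketch of the setting in hdfs-site.xml (the value 4096 is a commonly recommended figure from the HBase requirements docs, not something from this thread):

```xml
<!-- conf/hdfs-site.xml on every DataNode; requires a DataNode restart -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```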

On Sat, Sep 4, 2010 at 9:23 PM, Ted Yu <[email protected]> wrote:

> The tool Stack mentioned is hbck. If you want to port it to 0.20, see the email
> thread entitled "compiling HBaseFsck.java for 0.20.5". You should also try
> reducing the number of tables in your system, possibly through HBASE-2473.
>
> Cheers
>
>
> On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani <[email protected]> wrote:
>
>>
>>
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
>> Sent: Wednesday, September 01, 2010 10:45 PM
>> To: [email protected]
>> Subject: Re: HBase table lost on upgrade
>>
>> On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <[email protected]> wrote:
>> > That email was just informational. Below are the details on my cluster -
>> let me know if more is needed.
>> >
>> > I have 2 hbase clusters setup
>> > -       for production, 6 node cluster,  32G, 8 processors
>> > -       for dev, 3 node cluster , 16GRAM , 4 processors
>> >
>> > 1. I installed hadoop0.20.2 and hbase0.20.3 on both these clusters,
>> successfully.
>>
>> Why not latest stable version, 0.20.6?
>>
>> This was a couple of months ago.
>>
>>
>> > 2. After that I loaded 2G+ files into HDFS and HBASE table.
>>
>>
>> What's this mean?  Each of the .5M cells was 2G in size, or was the total
>> size 2G?
>>
>> The total file size is 2G. Cells are of the order of hundreds of bytes.
>>
>>
>> >        An example Hbase table looks like this:
>> >                {NAME => 'TABLE', FAMILIES => [{NAME => 'data', VERSIONS => '100',
>> >                 COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536',
>> >                 IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>>
>> That looks fine.
>>
>> > 3. I started stargate on one server and accessed Hbase for reading from
>> another 3rd party application successfully.
>> >        It took 600 seconds on dev cluster and 250 on production to read
>> .5M records from Hbase via stargate.
>>
>>
>> That don't sound so good.
>>
>>
>>
>> > 4. later to boost read performance, it was suggested that upgrading to
>> Hbase0.20.6 will be helpful. I did that on production (w/o running the
>> migrate script) and re-started stargate and everything was running fine,
>> though I did not see a bump in performance.
>> >
>> > 5. Eventually, I had to move to dev cluster from production because of
>> some resource issues at our end. Dev cluster had 0.20.3 at this time. As I
>> started loading more files into Hbase (<10 versions of <1G files) and
>> converting my app to use hbase more heavily (via more stargate clients), the
>> performance started degrading. I decided it was time to upgrade dev cluster
>> as well to 0.20.6.  (I did not run the migrate script here as well, I missed
>> this step in the doc).
>> >
>>
>> What kinda perf you looking for from REST?
>>
>> Do you have to use REST?  All is base64'd so it's safe to transport.
>>
>> I also have the Java API code (for testing purposes) and it gave similar
>> performance results (520 seconds on dev and 250 on the production cluster). Is
>> there a way to flush the cache before we run the next experiment? I suspect
>> that the first lookup always takes longer and the later ones perform
>> better.
>>
>> I need something that can integrate with C++ - libcurl and stargate were
>> the easiest to start with. I could look at thrift or anything else the Hbase
>> gurus think might be a better fit performance-wise.
>>
>>
>> > 6. When Hbase 0.20.6 came back up on dev cluster (with increased block
>> cache (.6) and region server handler counts (75) ), pointing to the same
>> rootdir, I noticed that some tables were missing. I could see a mention of
>> them in the logs, but not when I did 'list' in the shell. I recovered those
>> tables using add_table.rb script.
>>
>>
>> How did you shutdown this cluster?  Did you reboot machines?  Was your
>> hdfs homed on /tmp?  What is going on on your systems?  Are they
>> swapping?  Did you give HBase more than its default memory?  You read
>> the requirements and made sure ulimit and xceivers had been upped on
>> these machines?
>>
>>
>> Did not reboot machines. HDFS and HBase do not store data/logs in /tmp.
>> They are not swapping.
>> HBase heap size is 2G.  I have upped the xcievers now on your
>> recommendation.  Do I need to restart HDFS after making this change in
>> hdfs-site.xml?
>> ulimit -n
>> 2048
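On the ulimit point: 2048 is low for HBase. A quick sketch for checking and raising it (the 32768 figure follows the HBase requirements docs; the file path and the 'hadoop' user name are assumptions for a typical Linux setup with PAM limits):

```shell
# Show the current open-file limit for this shell's user.
ulimit -n

# To raise it persistently, add lines like these to /etc/security/limits.conf
# (assumption: the daemons run as user 'hadoop'), then log in again:
#   hadoop  soft  nofile  32768
#   hadoop  hard  nofile  32768
```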
>>
>>
>>
>> >        a. Is there a way to check the health of all Hbase tables in the
>> cluster after an upgrade or even periodically, to make sure that everything
>> is healthy ?
>> >        b. I would like to be able to force this error again and check
>> the health of hbase and want it to report to me that some tables were lost.
Currently, I only found out because I had very little data and it was easy to
tell.
>> >
>>
>> In trunk there is such a tool.  In 0.20.x, run a count against your
>> table.  See the hbase shell.  Type help to see how.
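A sketch of what that looks like from the shell (the table name is made up; note that count scans every row, so it is slow on large tables, and list shows every table so you don't need to know the names up front):

```
$ bin/hbase shell
hbase> list            # enumerate all tables
hbase> count 'TABLE'   # full-scan row count for one table
```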
>>
>>
>> What tool are you talking about here? It wasn't clear. Count against
>> which table? I want HBase to check all tables, and I don't know how many
>> tables I have since there are too many - is that possible?
>>
>> > 7. Here are the issues I face after this upgrade
>> >        a. when I run stop-hbase.sh, it  does not stop my regionservers
>> on other boxes.
>>
>> Why not?  What's going on on those machines?  If you tail the logs on
>> the hosts that won't go down and/or on the master, what do they say?
>> Tail the logs.  Should give you (us) a clue.
>>
>> They do go down, with some errors in the log, but don't report it on the
>> terminal.
>> http://pastebin.com/0hYwaffL  regionserver log
>>
>>
>>
>> >        b. It does start them using start-hbase.sh.
>> >        c. Is it that stopping regionservers is not reported, but it does
>> stop them (I see that happening on production cluster) ?
>> >
>>
>>
>>
>> > 8. I started stargate in the upgraded 0.20.6 in dev cluster
>> >        a. earlier, when I sent a URL to look for a data row that did not
>> exist, the return value was NULL; now I get an XML body with HTTP error
>> 404/405.  Everything works as expected for an existing data row.
>>
>> The latter sounds RESTy.  What would you expect of it?  The null?
>>
>>
>> Yes, it should send NULL like it does on the production server. Is there
>> anyone else you can point to who has used REST? This is the main
>> showstopper for me currently.
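For what it's worth, an HTTP 404 for a missing row is standard REST behavior from Stargate in 0.20.6, so a client can branch on the status code instead of expecting an empty body. A hedged curl sketch (host, table, row, and column names are made up):

```shell
# -w prints just the HTTP status; Stargate listens on port 8080 by default.
curl -s -o /dev/null -w "%{http_code}\n" \
  -H "Accept: text/xml" \
  http://stargate-host:8080/TABLE/missing-row/data:col
# 200 -> row exists, body carries the cell; 404 -> treat as "no such row" (NULL).
```
A libcurl-based C++ client can do the same check via CURLINFO_RESPONSE_CODE rather than parsing the error body.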
>>
>>
>>
>
