Right. Anyway, where can I get this file from? Any pointers? I can't find it at
src/main/java/org/apache/hadoop/hbase/client/HBaseFsck.java in 0.20.6.
-----Original Message-----
From: Ted Yu [mailto:[email protected]]
Sent: Wednesday, September 08, 2010 3:09 PM
To: [email protected]
Subject: Re: HBase table lost on upgrade

master.jsp shows tables, not regions. I personally haven't encountered the
problem you're facing.

On Wed, Sep 8, 2010 at 2:36 PM, Sharma, Avani <[email protected]> wrote:
> Ted,
> I did look at that thread. It seems I need to modify the code in that
> file? Could you point me to the exact steps to get it and compile it?
>
> Did you get past the issue of regions being added to the catalog but not
> showing up in master.jsp?
>
> On Sep 4, 2010, at 9:24 PM, Ted Yu <[email protected]> wrote:
>
> > The tool Stack mentioned is hbck. If you want to port it to 0.20, see
> > the email thread entitled "compiling HBaseFsck.java for 0.20.5". You
> > should try reducing the number of tables in your system, possibly
> > through HBASE-2473.
> >
> > Cheers
> >
> > On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani <[email protected]>
> > wrote:
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
> >> Sent: Wednesday, September 01, 2010 10:45 PM
> >> To: [email protected]
> >> Subject: Re: HBase table lost on upgrade
> >>
> >> On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <[email protected]> wrote:
> >>> That email was just informational. Below are the details on my
> >>> cluster - let me know if more is needed.
> >>>
> >>> I have 2 HBase clusters set up:
> >>> - production: 6-node cluster, 32G RAM, 8 processors
> >>> - dev: 3-node cluster, 16G RAM, 4 processors
> >>>
> >>> 1. I installed Hadoop 0.20.2 and HBase 0.20.3 on both these clusters
> >>> successfully.
> >>
> >> Why not the latest stable version, 0.20.6?
> >>
> >> This was a couple of months ago.
> >>
> >>> 2. After that I loaded 2G+ of files into HDFS and an HBase table.
> >>
> >> What does this mean? Each of the .5M cells was 2G in size, or the
> >> total size was 2G?
> >> The total file size is 2G. Cells are on the order of hundreds of bytes.
> >>
> >>> An example HBase table looks like this:
> >>> {NAME => 'TABLE', FAMILIES => [{NAME => 'data', VERSIONS => '100',
> >>> COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536',
> >>> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> >>
> >> That looks fine.
> >>
> >>> 3. I started stargate on one server and accessed HBase for reading
> >>> from another 3rd-party application successfully.
> >>> It took 600 seconds on the dev cluster and 250 on production to read
> >>> .5M records from HBase via stargate.
> >>
> >> That doesn't sound so good.
> >>
> >>> 4. Later, to boost read performance, it was suggested that upgrading
> >>> to HBase 0.20.6 would help. I did that on production (without running
> >>> the migrate script) and restarted stargate, and everything ran fine,
> >>> though I did not see a bump in performance.
> >>>
> >>> 5. Eventually, I had to move from production to the dev cluster
> >>> because of some resource issues at our end. The dev cluster had
> >>> 0.20.3 at this time. As I started loading more files into HBase
> >>> (<10 versions of <1G files) and converting my app to use HBase more
> >>> heavily (via more stargate clients), performance started degrading.
> >>> I decided it was time to upgrade the dev cluster to 0.20.6 as well.
> >>> (I did not run the migrate script here either; I missed this step in
> >>> the doc.)
> >>
> >> What kind of perf are you looking for from REST?
> >>
> >> Do you have to use REST? All is base64'd so it's safe to transport.
> >>
> >> I also have the Java API code (for testing purposes) and it gave
> >> similar performance results (520 seconds on dev and 250 on the
> >> production cluster). Is there a way to flush the cache before we run
> >> the next experiment? I suspect the first lookup always takes longer
> >> and the later ones perform better.
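[Editor's note: since stargate base64-encodes cell values inside its XML
responses, a quick sanity check when debugging reads is to decode a value by
hand. A minimal sketch - the sample value 'VEFCTEU=' is illustrative, not
taken from this cluster:]

```shell
# Stargate (the HBase REST gateway) returns cell values base64-encoded
# inside the XML it serves. Decoding one value by hand:
# 'VEFCTEU=' is a made-up sample, not data from this thread.
echo 'VEFCTEU=' | base64 --decode   # prints: TABLE
```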
> >> I need something that can integrate with C++ - libcurl and stargate
> >> were the easiest to start with. I could look at Thrift or anything
> >> else the HBase gurus think might be a better fit performance-wise.
> >>
> >>> 6. When HBase 0.20.6 came back up on the dev cluster (with increased
> >>> block cache (.6) and region server handler count (75)), pointing to
> >>> the same rootdir, I noticed that some tables were missing. I could
> >>> see a mention of them in the logs, but not when I did 'list' in the
> >>> shell. I recovered those tables using the add_table.rb script.
> >>
> >> How did you shut down this cluster? Did you reboot machines? Was your
> >> HDFS homed on /tmp? What is going on on your systems? Are they
> >> swapping? Did you give HBase more than its default memory? Did you
> >> read the requirements and make sure ulimit and xceivers had been
> >> upped on these machines?
> >>
> >> I did not reboot machines. HDFS and HBase do not store data/logs in
> >> /tmp. They are not swapping. The HBase heap size is 2G. I have upped
> >> the xceivers now on your recommendation. Do I need to restart HDFS
> >> after making this change in hdfs-site.xml?
> >> ulimit -n
> >> 2048
> >>
> >>> a. Is there a way to check the health of all HBase tables in the
> >>> cluster after an upgrade, or even periodically, to make sure that
> >>> everything is healthy?
> >>> b. I would like to be able to force this error again, check the
> >>> health of HBase, and have it report to me that some tables were
> >>> lost. Currently, I only found out because I had very little data and
> >>> it was easy to tell.
> >>
> >> In trunk there is such a tool. In 0.20.x, run a count against your
> >> table. See the HBase shell; type help to see how.
> >>
> >> What tool are you talking about here - it wasn't clear. Count against
> >> which table?
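[Editor's note: for reference, the xceivers setting discussed above lives in
hdfs-site.xml on each datanode, and datanodes do need a restart to pick it
up. A sketch of the entry - note the property name keeps Hadoop's historical
misspelling, and the value 4096 is a commonly suggested figure, not one from
this thread:]

```xml
<!-- hdfs-site.xml on each datanode (Hadoop 0.20-era property name,
     which retains the historical misspelling "xcievers") -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <!-- 4096 is a commonly suggested value, not taken from this thread -->
  <value>4096</value>
</property>
```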
> >> I want HBase to check all tables, and I don't know how many tables I
> >> have since there are too many - is that possible?
> >>
> >>> 7. Here are the issues I face after this upgrade:
> >>> a. When I run stop-hbase.sh, it does not stop my regionservers on
> >>> other boxes.
> >>
> >> Why not? What's going on on those machines? If you tail the logs on
> >> the hosts that won't go down and/or on the master, what do they say?
> >> Tail the logs; they should give you (us) a clue.
> >>
> >> They do go down with some errors in the log, but it isn't reported on
> >> the terminal.
> >> http://pastebin.com/0hYwaffL regionserver log
> >>
> >>> b. It does start them using start-hbase.sh.
> >>> c. Is it that stopping regionservers is not reported, but it does
> >>> stop them (I see that happening on the production cluster)?
> >>
> >>> 8. I started stargate in the upgraded 0.20.6 on the dev cluster.
> >>> a. Earlier, when I sent a URL to look up a data row that did not
> >>> exist, the return value was NULL; now I get XML stating HTTP error
> >>> 404/405. Everything works as expected for an existing data row.
> >>
> >> The latter sounds RESTy. What would you expect of it? The null?
> >>
> >> Yes, it should send NULL like it does on the production server. Is
> >> there anyone else you can point to who would have used REST? This is
> >> the main showstopper for me currently.
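[Editor's note: one possible client-side workaround for the 404/405 change,
while a stargate-side answer is pending, is to map the HTTP status back to
the NULL the existing application expects. A hypothetical sketch - the
function name and mapping are assumptions, not anything stargate provides:]

```shell
# Hypothetical client-side shim: translate the HTTP status stargate in
# 0.20.6 returns for a missing row (404/405) back into the "NULL" the
# older deployment returned, so existing callers keep working.
interpret_status() {
  case "$1" in
    404|405) echo "NULL" ;;      # row not found: preserve old NULL behavior
    200)     echo "FOUND" ;;     # row exists
    *)       echo "ERROR $1" ;;  # anything else is a real error
  esac
}

# e.g. pair with: curl -s -o /dev/null -w '%{http_code}' "$row_url"
interpret_status 404   # prints: NULL
```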
