You can copy HBaseFsck.java from trunk and compile it against 0.20.6.

On Wed, Sep 8, 2010 at 3:43 PM, Sharma, Avani <[email protected]> wrote:
> Right.
>
> Anyway, where can I get this file from? Any pointers?
> I can't find it at
> src/main/java/org/apache/hadoop/hbase/client/HBaseFsck.java in 0.20.6.
>
> -----Original Message-----
> From: Ted Yu [mailto:[email protected]]
> Sent: Wednesday, September 08, 2010 3:09 PM
> To: [email protected]
> Subject: Re: HBase table lost on upgrade
>
> master.jsp shows tables, not regions.
> I personally haven't encountered the problem you're facing.
>
> On Wed, Sep 8, 2010 at 2:36 PM, Sharma, Avani <[email protected]> wrote:
>
> > Ted,
> > I did look at that thread. It seems I need to modify the code in that
> > file? Could you point me to the exact steps to get it and compile it?
> >
> > Did you get past the issue of regions being added to the catalog but
> > not showing up in master.jsp?
> >
> > On Sep 4, 2010, at 9:24 PM, Ted Yu <[email protected]> wrote:
> >
> > > The tool Stack mentioned is hbck. If you want to port it to 0.20, see
> > > the email thread entitled:
> > > "compiling HBaseFsck.java for 0.20.5"
> > > You should try reducing the number of tables in your system, possibly
> > > through HBASE-2473.
> > >
> > > Cheers
> > >
> > > On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani <[email protected]>
> > > wrote:
> > >
> > >> -----Original Message-----
> > >> From: [email protected] [mailto:[email protected]] On Behalf Of
> > >> Stack
> > >> Sent: Wednesday, September 01, 2010 10:45 PM
> > >> To: [email protected]
> > >> Subject: Re: HBase table lost on upgrade
> > >>
> > >> On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <[email protected]>
> > >> wrote:
> > >>> That email was just informational. Below are the details on my
> > >>> cluster - let me know if more is needed.
> > >>>
> > >>> I have 2 HBase clusters set up:
> > >>> - for production, a 6-node cluster, 32G RAM, 8 processors
> > >>> - for dev, a 3-node cluster, 16G RAM, 4 processors
> > >>>
> > >>> 1. I installed Hadoop 0.20.2 and HBase 0.20.3 on both these
> > >>> clusters successfully.
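[Editor's note: the porting steps discussed above, sketched as pseudo-shell. The trunk URL, jar names, and the exact amount of back-porting needed are assumptions (that is what the "compiling HBaseFsck.java for 0.20.5" thread covers); note the file lives under util/, not client/:

```
# Pseudo-shell sketch - paths and jar names are illustrative, not verified
# 1. Fetch HBaseFsck.java from trunk (package org.apache.hadoop.hbase.util):
svn cat http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java > HBaseFsck.java
# 2. Back-port any trunk-only API calls, then compile against the 0.20.6 jars:
javac -cp hbase-0.20.6.jar:hadoop-0.20.2-core.jar HBaseFsck.java
# 3. Run it with the cluster conf directory on the classpath:
java -cp .:hbase-0.20.6.jar:hadoop-0.20.2-core.jar:conf org.apache.hadoop.hbase.util.HBaseFsck
```
]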
> > >> Why not the latest stable version, 0.20.6?
> > >>
> > >> This was a couple of months ago.
> > >>
> > >>> 2. After that I loaded 2G+ of files into HDFS and an HBase table.
> > >>
> > >> What's this mean? Each of the .5M cells was 2G in size, or the total
> > >> size was 2G?
> > >>
> > >> The total file size is 2G. Cells are on the order of hundreds of
> > >> bytes.
> > >>
> > >>> An example HBase table looks like this:
> > >>> {NAME => 'TABLE', FAMILIES => [{NAME => 'data', VERSIONS => '100',
> > >>> COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536',
> > >>> IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
> > >>
> > >> That looks fine.
> > >>
> > >>> 3. I started Stargate on one server and accessed HBase for reading
> > >>> from another 3rd-party application successfully.
> > >>> It took 600 seconds on the dev cluster and 250 on production to
> > >>> read .5M records from HBase via Stargate.
> > >>
> > >> That doesn't sound so good.
> > >>
> > >>> 4. Later, to boost read performance, it was suggested that
> > >>> upgrading to HBase 0.20.6 would be helpful. I did that on production
> > >>> (w/o running the migrate script) and re-started Stargate and
> > >>> everything was running fine, though I did not see a bump in
> > >>> performance.
> > >>>
> > >>> 5. Eventually, I had to move to the dev cluster from production
> > >>> because of some resource issues at our end. The dev cluster had
> > >>> 0.20.3 at this time. As I started loading more files into HBase
> > >>> (<10 versions of <1G files) and converting my app to use HBase more
> > >>> heavily (via more Stargate clients), the performance started
> > >>> degrading. I decided it was time to upgrade the dev cluster as well
> > >>> to 0.20.6. (I did not run the migrate script here either; I missed
> > >>> this step in the doc.)
> > >>
> > >> What kind of perf are you looking for from REST?
> > >>
> > >> Do you have to use REST?
> > >> All is base64'd, so it's safe to transport.
> > >>
> > >> I also have the Java API code (for testing purposes) and that gave
> > >> similar performance results (520 seconds on dev and 250 on the
> > >> production cluster). Is there a way to flush the cache before we run
> > >> the next experiment? I suspect that the first lookup always takes
> > >> longer and then the later ones perform better.
> > >>
> > >> I need something that can integrate with C++ - libcurl and Stargate
> > >> were the easiest to start with. I could look at Thrift or anything
> > >> else the HBase gurus think might be a better fit performance-wise.
> > >>
> > >>> 6. When HBase 0.20.6 came back up on the dev cluster (with
> > >>> increased block cache (.6) and region server handler count (75)),
> > >>> pointing to the same rootdir, I noticed that some tables were
> > >>> missing. I could see a mention of them in the logs, but not when I
> > >>> did 'list' in the shell. I recovered those tables using the
> > >>> add_table.rb script.
> > >>
> > >> How did you shut down this cluster? Did you reboot machines? Was
> > >> your HDFS homed on /tmp? What is going on on your systems? Are they
> > >> swapping? Did you give HBase more than its default memory? You read
> > >> the requirements and made sure ulimit and xceivers had been upped on
> > >> these machines?
> > >>
> > >> Did not reboot machines. HDFS and HBase do not store data/logs in
> > >> /tmp. They are not swapping.
> > >> HBase heap size is 2G. I have upped the xceivers now on your
> > >> recommendation. Do I need to restart HDFS after making this change
> > >> in hdfs-site.xml?
> > >> ulimit -n
> > >> 2048
> > >>
> > >>> a. Is there a way to check the health of all HBase tables in the
> > >>> cluster after an upgrade, or even periodically, to make sure that
> > >>> everything is healthy?
> > >>> b.
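[Editor's note: the xceivers bump goes in hdfs-site.xml on every datanode, and yes, the datanodes need a restart to pick it up. The historical property name really is misspelled "xcievers"; the value 4096 below is a commonly suggested number, not one from this thread:

```xml
<!-- hdfs-site.xml on each datanode; requires a datanode restart -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```
]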
> > >>> I would like to be able to force this error again and check the
> > >>> health of HBase, and I want it to report to me that some tables
> > >>> were lost. Currently, I only found out because I had very little
> > >>> data and it was easy to tell.
> > >>
> > >> In trunk there is such a tool. In 0.20.x, run a count against your
> > >> table. See the hbase shell. Type help to see how.
> > >>
> > >> What tool are you talking about here - it wasn't clear? Count
> > >> against which table? I want HBase to check all tables, and I don't
> > >> know how many tables I have since there are too many - is that
> > >> possible?
> > >>
> > >>> 7. Here are the issues I face after this upgrade:
> > >>> a. When I run stop-hbase.sh, it does not stop my regionservers
> > >>> on other boxes.
> > >>
> > >> Why not? What's going on on those machines? If you tail the logs on
> > >> the hosts that won't go down and/or on the master, what do they say?
> > >> Tail the logs. Should give you (us) a clue.
> > >>
> > >> They do go down with some errors in the log, but don't report it on
> > >> the terminal.
> > >> http://pastebin.com/0hYwaffL regionserver log
> > >>
> > >>> b. It does start them using start-hbase.sh.
> > >>> c. Is it that stopping regionservers is not reported, but it
> > >>> does stop them (I see that happening on the production cluster)?
> > >>
> > >>> 8. I started Stargate in the upgraded 0.20.6 on the dev cluster.
> > >>> a. Earlier, when I sent a URL to look for a data row that did
> > >>> not exist, the return value was NULL; now I get an XML body stating
> > >>> HTTP error 404/405. Everything works as expected for an existing
> > >>> data row.
> > >>
> > >> The latter sounds RESTy. What would you expect of it? The null?
> > >>
> > >> Yes, it should send NULL like it does on the production server. Is
> > >> there anyone else you can point to who would have used REST? This is
> > >> the main showstopper for me currently.
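[Editor's note: for the periodic "did I lose a table?" question, one low-tech option is to diff Stargate's table listing against a list of tables you expect to exist. A minimal sketch, assuming Stargate's root resource returns its usual `<TableList>` XML; the HTTP fetch itself is left out, only the parse/diff is shown:

```python
import xml.etree.ElementTree as ET

def table_names(tablelist_xml):
    """Parse the XML returned by GET / on Stargate into a list of table names."""
    root = ET.fromstring(tablelist_xml)
    return [t.get("name") for t in root.findall("table")]

def missing_tables(expected, tablelist_xml):
    """Return the expected tables that Stargate no longer reports."""
    return sorted(set(expected) - set(table_names(tablelist_xml)))

# Example with a canned response (real code would GET the Stargate root URL):
sample = '<TableList><table name="TABLE"/><table name="other"/></TableList>'
print(missing_tables(["TABLE", "other", "lost_table"], sample))  # ['lost_table']
```

Run from cron, a non-empty result would have flagged the lost tables before the shell `list` did.]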
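[Editor's note: on the 404-for-a-missing-row change, a REST client can translate 404 back into "no value" itself rather than depending on the server returning an empty body. A client-side sketch; the helper and its status handling are illustrative, not Stargate API, and the base64 decode reflects the "all is base64'd" transport noted above:

```python
import base64

def cell_value(status, body_b64):
    """Map a Stargate-style response to a value: 404 means 'row not found'."""
    if status == 404:
        return None          # treat "not found" like the old NULL behaviour
    if status != 200:
        raise IOError("unexpected HTTP status: %d" % status)
    return base64.b64decode(body_b64)  # cell bytes arrive base64-encoded

# A found row: the payload arrives base64-encoded...
print(cell_value(200, base64.b64encode(b"hello")))  # b'hello'
# ...and a missing row becomes None instead of an error:
print(cell_value(404, None))  # None
```

The same mapping is easy to express in the C++/libcurl client mentioned above: check the HTTP status before touching the body.]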
