dfs.datanode.max.xcievers is read in the DataXceiverServer constructor. If you change its value, you need to restart the (HDFS) cluster for it to take effect.
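For reference, this is the sort of stanza involved -- it goes in hdfs-site.xml on every DataNode, and the misspelling "xcievers" is the actual (historical) property name. The value 4096 below is the commonly recommended setting for HBase, not something prescribed in this thread:

```xml
<!-- hdfs-site.xml: caps the number of concurrent DataXceiver threads
     (roughly, open file streams) a DataNode will serve. The property
     name really is misspelled "xcievers". 4096 is a common
     HBase-friendly value; the 0.20-era default of 256 is far too low
     for HBase workloads. -->
<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>4096</value>
</property>
```

Since the value is only read when the DataXceiverServer is constructed, the DataNodes must be restarted before the change takes effect.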
On Sat, Sep 4, 2010 at 9:23 PM, Ted Yu <[email protected]> wrote:
> The tool Stack mentioned is hbck. If you want to port it to 0.20, see the
> email thread entitled: compiling HBaseFsck.java for 0.20.5
>
> You should try reducing the number of tables in your system, possibly
> through HBASE-2473
>
> Cheers
>
> On Thu, Sep 2, 2010 at 11:41 AM, Sharma, Avani <[email protected]> wrote:
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Stack
>> Sent: Wednesday, September 01, 2010 10:45 PM
>> To: [email protected]
>> Subject: Re: HBase table lost on upgrade
>>
>> On Wed, Sep 1, 2010 at 5:49 PM, Sharma, Avani <[email protected]> wrote:
>> > That email was just informational. Below are the details on my cluster -
>> > let me know if more is needed.
>> >
>> > I have 2 hbase clusters set up
>> > - for production, 6 node cluster, 32G RAM, 8 processors
>> > - for dev, 3 node cluster, 16G RAM, 4 processors
>> >
>> > 1. I installed hadoop0.20.2 and hbase0.20.3 on both these clusters,
>> > successfully.
>>
>> Why not the latest stable version, 0.20.6?
>>
>> This was a couple of months ago.
>>
>> > 2. After that I loaded 2G+ files into HDFS and an HBASE table.
>>
>> What's this mean? Each of the .5M cells was 2G in size, or the total size
>> was 2G?
>>
>> The total file size is 2G. Cells are of the order of hundreds of bytes.
>>
>> > An example Hbase table looks like this:
>> > {NAME => 'TABLE', FAMILIES => [{NAME => 'data', VERSIONS => '100',
>> > COMPRESSION => 'NONE', TTL => '2147483647', BLOCKSIZE => '65536',
>> > IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
>>
>> That looks fine.
>>
>> > 3. I started stargate on one server and accessed Hbase for reading from
>> > another 3rd party application successfully.
>> > It took 600 seconds on the dev cluster and 250 on production to read
>> > .5M records from Hbase via stargate.
>>
>> That don't sound so good.
>>
>> > 4. Later, to boost read performance, it was suggested that upgrading to
>> > Hbase0.20.6 would be helpful. I did that on production (w/o running the
>> > migrate script) and re-started stargate and everything was running fine,
>> > though I did not see a bump in performance.
>> >
>> > 5. Eventually, I had to move to the dev cluster from production because
>> > of some resource issues at our end. The dev cluster had 0.20.3 at this
>> > time. As I started loading more files into Hbase (<10 versions of <1G
>> > files) and converting my app to use hbase more heavily (via more
>> > stargate clients), the performance started degrading. I decided it was
>> > time to upgrade the dev cluster as well to 0.20.6. (I did not run the
>> > migrate script here either; I missed this step in the doc.)
>>
>> What kinda perf you looking for from REST?
>>
>> Do you have to use REST? All is base64'd so it's safe to transport.
>>
>> I also have the Java API code (for testing purposes) and that gave similar
>> performance results (520 seconds on dev and 250 on the production
>> cluster). Is there a way to flush the cache before we run the next
>> experiment? I suspect that the first lookup always takes longer and then
>> the later ones perform better.
>>
>> I need something that can integrate with C++ - libcurl and stargate were
>> the easiest to start with. I could look at thrift or anything else the
>> Hbase gurus think might be a better fit performance-wise.
>>
>> > 6. When Hbase 0.20.6 came back up on the dev cluster (with increased
>> > block cache (.6) and region server handler count (75)), pointing to the
>> > same rootdir, I noticed that some tables were missing. I could see a
>> > mention of them in the logs, but not when I did 'list' in the shell. I
>> > recovered those tables using the add_table.rb script.
>>
>> How did you shut down this cluster? Did you reboot machines? Was your
>> hdfs homed on /tmp? What is going on on your systems? Are they
>> swapping? Did you give HBase more than its default memory? You read
>> the requirements and made sure ulimit and xceivers had been upped on
>> these machines?
>>
>> Did not reboot machines. hdfs and hbase do not store data/logs in /tmp.
>> They are not swapping. Hbase heap size is 2G. I have upped the xcievers
>> now on your recommendation. Do I need to restart hdfs after making this
>> change in hdfs-site.xml?
>> ulimit -n
>> 2048
>>
>> > a. Is there a way to check the health of all Hbase tables in the
>> > cluster after an upgrade, or even periodically, to make sure that
>> > everything is healthy?
>> > b. I would like to be able to force this error again and check the
>> > health of hbase and want it to report to me that some tables were
>> > lost. Currently, I only found out because I had very little data and
>> > it was easy to tell.
>>
>> In trunk there is such a tool. In 0.20.x, run a count against your
>> table. See the hbase shell. Type help to see how.
>>
>> What tool are you talking about here - it wasn't clear? Count against
>> which table? I want hbase to check all tables, and I don't know how many
>> tables I have since there are too many - is that possible?
>>
>> > 7. Here are the issues I face after this upgrade
>> > a. When I run stop-hbase.sh, it does not stop my regionservers
>> > on other boxes.
>>
>> Why not? What's going on on those machines? If you tail the logs on
>> the hosts that won't go down and/or on the master, what do they say?
>> Tail the logs. Should give you (us) a clue.
>>
>> They do go down with some errors in the log, but don't report it on the
>> terminal.
>> http://pastebin.com/0hYwaffL regionserver log
>>
>> > b. It does start them using start-hbase.sh.
>> > c. Is it that stopping regionservers is not reported, but it does
>> > stop them (I see that happening on the production cluster)?
>>
>> > 8. I started stargate in the upgraded 0.20.6 on the dev cluster
>> > a. Earlier, when I sent a URL to look for a data row that did not
>> > exist, the return value was NULL; now I get an xml stating HTTP error
>> > 404/405. Everything works as expected for an existing data row.
>>
>> The latter sounds RESTy. What would you expect of it? The null?
>>
>> Yes, it should send NULL like it does on the production server. Is there
>> anyone else you can point to who would have used REST? This is the main
>> showstopper for me currently.
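A side note on the base64 point above: Stargate returns row keys, column names, and cell values base64-encoded inside its XML. A minimal sketch of unwrapping such a response in Python -- the sample document and helper function below are hypothetical illustrations, not captured from a real cluster:

```python
import base64
import xml.etree.ElementTree as ET

# Hypothetical Stargate (HBase REST) response for a single-cell row; on the
# wire, the row key, column name, and cell value are all base64-encoded.
sample = ('<CellSet><Row key="cm93MQ==">'
          '<Cell column="ZGF0YTpjb2wx">aGVsbG8=</Cell>'
          '</Row></CellSet>')

def decode_cellset(xml_text):
    """Return (row_key, column, value) tuples with the base64 layer removed."""
    cells = []
    for row in ET.fromstring(xml_text).findall("Row"):
        key = base64.b64decode(row.get("key")).decode("utf-8")
        for cell in row.findall("Cell"):
            column = base64.b64decode(cell.get("column")).decode("utf-8")
            value = base64.b64decode(cell.text).decode("utf-8")
            cells.append((key, column, value))
    return cells

print(decode_cellset(sample))  # -> [('row1', 'data:col1', 'hello')]
```

The same unwrapping applies when driving Stargate from C++ via libcurl: the HTTP and XML layers add overhead per request, which is part of why the Java API or Thrift tends to be a better fit when raw read throughput matters.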
