OK, I deleted the logs the master was complaining about and restarted only the master. It seemed to be stable after a bunch of messages like the ones below, so I then restarted the regionservers, except the one that gave me trouble this morning. It now seems to be up and running again. I don't trust it, though; I've seen this kind of "OK, I'm about to make your life suck" behavior before. We will see...
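The staged restart described above (master first, then the regionservers, leaving out the node that misbehaved) matches the order Jimmy suggests further down the thread, and can be sketched in Python. This is a minimal sketch under loudly stated assumptions: the host names, passwordless `ssh` to each node, the install path (borrowed from the shell prompt quoted in the thread) and all helper names are hypothetical. It only prints the commands unless `dry_run=False`, and it does not read the master log itself; a person still watches the log (e.g. with `tail -f`) and confirms that the HLogs have been handled before any regionserver starts.

```python
import shlex
import subprocess
from typing import Callable, List, Sequence

# Hypothetical install path, taken from the shell prompt quoted in this thread.
HBASE_HOME = "/usr/lib/hbase-0.90.0"
DAEMON = HBASE_HOME + "/bin/hbase-daemon.sh"


def daemon_cmd(host: str, action: str, role: str) -> List[str]:
    """ssh command running `hbase-daemon.sh <action> <role>` on one host."""
    return ["ssh", host, f"{DAEMON} {action} {role}"]


def staged_restart_plan(master: str, regionservers: Sequence[str],
                        skip: Sequence[str] = ()) -> List[List[str]]:
    """Stop everything, start the master, then start the regionservers.

    Regionservers named in `skip` (e.g. the node that misbehaved) are
    stopped but not started again.
    """
    plan = [daemon_cmd(rs, "stop", "regionserver") for rs in regionservers]
    plan.append(daemon_cmd(master, "stop", "master"))
    plan.append(daemon_cmd(master, "start", "master"))
    plan += [daemon_cmd(rs, "start", "regionserver")
             for rs in regionservers if rs not in skip]
    return plan


def ask_operator(prompt: str) -> bool:
    """Default confirmation: a person answers after watching the master log."""
    return input(prompt).strip().lower() == "y"


def run_plan(plan: List[List[str]],
             confirm: Callable[[str], bool] = ask_operator,
             dry_run: bool = True) -> None:
    """Print (and, unless dry_run, execute) each command in order.

    After the master starts, wait for confirmation that it has finished
    processing the HLogs before any regionserver is started.
    """
    for cmd in plan:
        print(" ".join(shlex.quote(part) for part in cmd))
        if not dry_run:
            # check=True aborts the sequence if a command exits non-zero.
            subprocess.run(cmd, check=True)
        if cmd[-1].endswith(" start master"):
            if not confirm("Master log shows all HLogs handled? [y/N] "):
                raise RuntimeError("aborting: master has not finished with the HLogs")
```

For example, `run_plan(staged_restart_plan("m1", ["rs1", "rs2", "rs3"], skip=["rs2"]))` prints the seven commands in order and pauses after the master start. Keeping the plan as plain data makes it easy to review before running it for real.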
On Tue, Apr 12, 2011 at 12:40 PM, Robert Gonzalez <[email protected]> wrote:
> A bunch of this in the master log:
>
> 2011-04-12 12:38:23,771 WARN org.apache.hadoop.hbase.master.CatalogJanitor: REGIONINFO_QUALIFIER is empty in keyvalues={urlhashcopy,E3208173766FDD7C01FE9633E281ED0A,1296085183252.7501ae2b7e933057ea12610c4ec6d001./info:server/1296142856167/Put/vlen=41, urlhashcopy,E3208173766FDD7C01FE9633E281ED0A,1296085183252.7501ae2b7e933057ea12610c4ec6d001./info:serverstartcode/1296142856167/Put/vlen=8}
> 2011-04-12 12:38:23,772 WARN org.apache.hadoop.hbase.master.CatalogJanitor: REGIONINFO_QUALIFIER is empty in keyvalues={urlhashcopy,E3FBE7AD03D5618BD6AE9E28D4C68FA3,1296085263052.da74b3e6d534a1d2a2f6d75e5bd7686d./info:server/1296142855579/Put/vlen=41, urlhashcopy,E3FBE7AD03D5618BD6AE9E28D4C68FA3,1296085263052.da74b3e6d534a1d2a2f6d75e5bd7686d./info:serverstartcode/1296142855579/Put/vlen=8}
> 2011-04-12 12:38:23,773 WARN org.apache.hadoop.hbase.master.CatalogJanitor: REGIONINFO_QUALIFIER is empty in keyvalues={urlhashcopy,E5032151B4A9A65D45E961C29ECF3323,1296085338403.1b878d372ca96a8bdd830b7620d31464./info:server/1296142855706/Put/vlen=41, urlhashcopy,E5032151B4A9A65D45E961C29ECF3323,1296085338403.1b878d372ca96a8bdd830b7620d31464./info:serverstartcode/1296142855706/Put/vlen=8}
> 2011-04-12 12:38:23,774 WARN org.apache.hadoop.hbase.master.CatalogJanitor: REGIONINFO_QUALIFIER is empty in keyvalues={urlhashcopy,E5F2ADD100BDD9791417FEC48997213F,1296085429914.6aeb9c1db827acc7a7969d3b2c8470a7./info:server/1296142855577/Put/vlen=41, urlhashcopy,E5F2ADD100BDD9791417FEC48997213F,1296085429914.6aeb9c1db827acc7a7969d3b2c8470a7./info:serverstartcode/1296142855577/Put/vlen=8}
>
> On Tue, Apr 12, 2011 at 12:38 PM, Gary Helmling <[email protected]> wrote:
>> Robert,
>>
>> You can stop the daemons individually on each node:
>>
>> bin/hbase-daemon.sh stop master
>> bin/hbase-daemon.sh stop regionserver
>>
>> Use this to stop the processes that can be cleanly shut down. Then let's
>> look at which processes are still hanging and what the logs of the hanging
>> processes are showing.
>>
>> Thanks,
>> Gary
>>
>>
>> On Tue, Apr 12, 2011 at 10:34 AM, Robert Gonzalez <[email protected]> wrote:
>>
>>> You mean like this:
>>>
>>> hbase@c1-m02:/usr/lib/hbase-0.90.0/bin$ ./stop-hbase.sh
>>> stopping hbase............................................................
>>>
>>> Still going..... :(
>>>
>>> On Tue, Apr 12, 2011 at 12:01 PM, Jinsong Hu <[email protected]> wrote:
>>> > You probably should stop all masters/regionservers, then start one master,
>>> > tail -f the log to confirm all the hlogs are handled,
>>> > then start the first regionserver, and then the other regionservers.
>>> >
>>> > I have encountered this issue before.
>>> > HBase is not as good as you want, but not as bad as you say either. The
>>> > truth is in between.
>>> >
>>> > Jimmy
>>> >
>>> > --------------------------------------------------
>>> > From: "Robert Gonzalez" <[email protected]>
>>> > Sent: Tuesday, April 12, 2011 9:49 AM
>>> > To: <[email protected]>
>>> > Subject: HBase is not ready for Primetime
>>> >
>>> >> We've been using HBase for about a year, consistently running into
>>> >> problems where we lost data. After reading forums and some back and
>>> >> forth with other HBase users, we changed our data methodology to save
>>> >> less data per row. This last time, we upgraded to 0.90 at the
>>> >> recommendation of the HBase community, cleared off all our data, and
>>> >> started over. It seemed to be running OK for a couple of months, until
>>> >> this morning. One of the regionservers stopped responding to data
>>> >> requests and we tried to restart it, to no avail. Then we shut down our
>>> >> processes so that nothing was using HBase, and we shut down HBase and
>>> >> brought it back up. We waited a little bit, until hbase status
>>> >> indicated that all the servers were back up. We turned on our
>>> >> processes and, lo and behold, HBase is broken, getting
>>> >>
>>> >> org.apache.hadoop.hbase.NotServingRegionException:
>>> >> org.apache.hadoop.hbase.NotServingRegionException: Region is not online: -ROOT-,,0
>>> >>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getRegion(HRegionServer.java:2319)
>>> >>     at org.apache.hadoop.hbase.regionserver.HRegionServer.getClosestRowBefore(HRegionServer.java:1607)
>>> >>     at sun.reflect.GeneratedMethodAccessor2.invoke(Unknown Source)
>>> >>     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
>>> >>     at java.lang.reflect.Method.invoke(Method.java:597)
>>> >>     at org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570)
>>> >>     at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1036)
>>> >>
>>> >> And now we can't even shut it down.
>>> >>
>>> >> It seems that HBase is just too flaky to depend on for a serious system;
>>> >> we've not had this type of problem to this degree with conventional DB
>>> >> systems. Now that we are not saving that much data in HBase (we are
>>> >> using large HDFS files for that), we are probably going to move back to
>>> >> a conventional SQL system for our control data. We just can't afford to
>>> >> be constantly fighting problems like this.
>>> >>
>>> >> --
>>> >> Robert Gonzalez
>>> >> Maxpoint Interactive
>>> >
>>>
>>> --
>>> Robert Gonzalez / Senior Software Architect
>>> 7600 Burnet Road, Suite 500
>>> Austin, TX 78757
>>> T 512 981 9561 F 919 882 8529
>>> [email protected]
>>
>
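Each CatalogJanitor warning quoted above names a `.META.` row that carries only `info:server` and `info:serverstartcode` cells and no region info. To list those rows from a full master log, a small parser can pull out the region names. This is a hedged sketch: the regex is inferred only from the four warnings in this thread (not from any documented log format), it assumes each warning sits on a single line in the real log file and that the start key contains no `/`, and `OrphanRow`, `parse_row` and `orphan_rows` are hypothetical helper names, not part of HBase.

```python
import re
from typing import Dict, Iterable, List, NamedTuple


class OrphanRow(NamedTuple):
    table: str
    start_key: str
    region_id: str
    encoded_name: str


# Pattern inferred from the warnings quoted in this thread (assumption).
WARN_RE = re.compile(
    r"CatalogJanitor: REGIONINFO_QUALIFIER is empty in keyvalues=\{([^/]+)/info:"
)


def parse_row(row: str) -> OrphanRow:
    """Split a region row key of the form 'table,startkey,regionid.encodedname.'."""
    table, rest = row.split(",", 1)          # table names cannot contain commas
    start_key, tail = rest.rsplit(",", 1)    # region id is after the last comma
    region_id, encoded = tail.rstrip(".").split(".", 1)
    return OrphanRow(table, start_key, region_id, encoded)


def orphan_rows(lines: Iterable[str]) -> List[OrphanRow]:
    """Return each distinct row flagged by the warnings, in first-seen order."""
    seen: Dict[str, OrphanRow] = {}
    for line in lines:
        match = WARN_RE.search(line)
        if match and match.group(1) not in seen:
            seen[match.group(1)] = parse_row(match.group(1))
    return list(seen.values())
```

For example, iterating `orphan_rows(open(path))` over the master log (the path is whatever your install uses) yields one entry per flagged row, so the encoded region names can be reviewed before any further logs or rows are deleted.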
