Thanks, the patched version is now running (actually SU's version was already patched, ty). Another question, regarding REST and base64: when I send the header "Accept: application/octet-stream", I get just a raw byte stream, i.e. no base64. Does this mean that internally the cell is spit out byte for byte without any conversion to base64, and hence no overhead?
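[For reference, a quick illustration, in plain Python rather than anything HBase-specific, of the roughly 33% size overhead that base64 adds and that an octet-stream response avoids:]

```python
import base64

# Base64 encodes every 3 input bytes as 4 output characters, so an
# encoded cell value is roughly 4/3 the size of the raw bytes.
raw = bytes(range(256)) * 4          # 1024 bytes of sample cell data
encoded = base64.b64encode(raw)

print(len(raw))                      # 1024
print(len(encoded))                  # 1368, about 33% larger
```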
-Jack

On Tue, Sep 21, 2010 at 10:09 PM, Stack <[email protected]> wrote:
> Given the log snippet, I'd guess it's because your hbase doesn't have
> HBASE-2643.
>
> The above makes it so we continue through an EOF exception when
> splitting logs, where before we'd fail the splitting, requeue, split,
> then fail again.
>
> Here is a comment recently added to our little hbase book at
> src/docbkx/book.xml:
>
> <section>
>   <title>How EOFExceptions are treated when splitting a crashed
>   RegionServer's WALs</title>
>
>   <para>If we get an EOF while splitting logs, we proceed with the split
>   even when <varname>hbase.hlog.split.skip.errors</varname> ==
>   <constant>false</constant>. An EOF while reading the last log in the
>   set of files to split is near-guaranteed, since the RegionServer likely
>   crashed mid-write of a record. But we'll continue even if we got an
>   EOF reading other than the last file in the set.<footnote>
>     <para>For background, see <link
>     xlink:href="https://issues.apache.org/jira/browse/HBASE-2643">HBASE-2643
>     Figure how to deal with eof splitting logs</link></para>
>   </footnote></para>
> </section>
>
> St.Ack
>
> On Tue, Sep 21, 2010 at 3:00 PM, Jack Levin <[email protected]> wrote:
>> First, I saw:
>>
>> 2010-09-21 11:30:05,122 DEBUG
>> org.apache.hadoop.hbase.master.RegionServerOperationQueue: Put
>> ProcessServerShutdown of 10.103.2.5,60020,1285042335711 back on queue
>> 2010-09-21 11:30:05,122 DEBUG
>> org.apache.hadoop.hbase.master.RegionServerOperationQueue: Processing
>> todo: ProcessServerShutdown of 10.103.2.5,60020,1285042335711
>> 2010-09-21 11:30:05,122 INFO
>> org.apache.hadoop.hbase.master.RegionServerOperation: Process shutdown
>> of server 10.103.2.5,60020,1285042335711: logSplit: false,
>> rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
>>
>> repeated rapidly for 20 mins or so.
>>
>> Then a bunch of regions got unassigned:
>>
>> 2010-09-21 12:00:07,782 DEBUG
>> org.apache.hadoop.hbase.master.RegionManager: Unassigning 66 regions
>> from 10.103.2.3,60020,1285042333293
>> 2010-09-21 12:00:07,782 DEBUG
>> org.apache.hadoop.hbase.master.RegionManager: Going to close region
>> img816,img2103r.jpg,1285003791610.1592893332
>> 2010-09-21 12:00:07,782 DEBUG
>> org.apache.hadoop.hbase.master.RegionManager: Going to close region
>> img534,92166039.jpg,1284949117852.1009352950
>> 2010-09-21 12:00:07,782 DEBUG
>> org.apache.hadoop.hbase.master.RegionManager: Going to close region
>> img36,abcwu.jpg,1285001278990.272235177
>>
>> Restarting the master did not help. Ultimately, what brought the cluster
>> back up was a full shutdown of the regionservers and masters, then
>> bringing everything back up.
>>
>> Any ideas what might have happened here?
>>
>> We are running:
>>
>> HBase Version 0.89.20100726, r979826
>> Hadoop Version 0.20.2+320, r9b72d268a0b590b4fd7d13aca17c1c453f8bc957
>> Regions On FS 5057
>>
>> 3 zookeepers and 13 regionservers.
>>
>> -Jack
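[For readers skimming the archive: the EOF-handling policy quoted from the hbase book above can be sketched roughly as follows. This is illustrative Python, not the actual HBase log splitter; `split_one_log` and the file names are hypothetical.]

```python
# Sketch of the policy described in the book excerpt: while splitting a
# crashed RegionServer's WALs, an EOF is tolerated rather than failing
# the whole split, since the last log was almost certainly truncated
# mid-record. Per HBASE-2643, we continue past EOF on earlier files too.

class WalEOFError(Exception):
    """Stands in for Java's EOFException. Hypothetical name."""

def split_logs(log_files, split_one_log):
    """Split each WAL in order; continue past an EOF in any of them."""
    for log in log_files:
        try:
            split_one_log(log)
        except WalEOFError:
            # Expected for the last (partially written) log; tolerated
            # for earlier ones as well.
            continue

# Example: the last "file" is truncated mid-record and raises EOF.
def fake_splitter(log):
    if log.endswith(".truncated"):
        raise WalEOFError(log)

split_logs(["wal.0", "wal.1.truncated"], fake_splitter)
print("split completed despite EOF")
```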
