Re: all regions unregistered over time.

Jack Levin Tue, 21 Sep 2010 22:37:35 -0700

Thanks, the patched version is now running (actually it was SU's version
that got patched, ty).  Another question, about REST doing base64: when
I send the header "Accept: application/octet-stream", I get just the raw
byte stream, i.e. no base64. Does this mean the cell is internally spit
out byte by byte without any conversion to base64, and hence there is no
overhead?
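
For scale on the overhead question: standard base64 emits 4 ASCII characters for every 3 input bytes (with padding), so a base64-encoded value is about a third larger than the raw bytes returned under application/octet-stream. The Java sketch below only illustrates that arithmetic; it says nothing about how the HBase REST server handles cells internally. It assumes java.util.Base64 (Java 8+, newer than the JVMs of this thread), and the 30,000-byte cell size is an arbitrary example value.

```java
import java.util.Base64;
import java.util.Locale;

public class Base64Overhead {
    // Length of the padded base64 encoding of n bytes: 4 * ceil(n / 3).
    static int encodedLength(int n) {
        return 4 * ((n + 2) / 3);
    }

    public static void main(String[] args) {
        // Hypothetical cell value; only its length matters for the overhead.
        byte[] cell = new byte[30000];
        String encoded = Base64.getEncoder().encodeToString(cell);

        double overheadPct = 100.0 * (encoded.length() - cell.length) / cell.length;
        System.out.println("raw bytes:    " + cell.length);
        System.out.println("base64 chars: " + encoded.length());
        System.out.println("predicted:    " + encodedLength(cell.length));
        // Locale.ROOT keeps the decimal point as '.' regardless of system locale.
        System.out.println(String.format(Locale.ROOT, "overhead:     %.1f%%", overheadPct));
    }
}
```

Running it reports 30000 raw bytes, 40000 base64 characters, and an overhead of 33.3%.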


-Jack

On Tue, Sep 21, 2010 at 10:09 PM, Stack <[email protected]> wrote:
> Given the log snippet, I'd guess it's because your HBase doesn't have
> HBASE-2643.
>
> The above makes it so we continue through an EOF exception when
> splitting logs, where before we'd fail the split, requeue it, split
> again, and then fail again.
>
> Here is a comment recently added to our little HBase book at
> src/docbkx/book.xml:
>
>      <section>
>        <title>How EOFExceptions are treated when splitting a crashed
>        RegionServers' WALs</title>
>
>        <para>If we get an EOF while splitting logs, we proceed with the split
>        even when <varname>hbase.hlog.split.skip.errors</varname> ==
>        <constant>false</constant>. An EOF while reading the last log in the
>        set of files to split is near-guaranteed since the RegionServer likely
>        crashed mid-write of a record. But we'll continue even if we got an
>        EOF reading other than the last file in the set.<footnote>
>            <para>For background, see <link
>            xlink:href="https://issues.apache.org/jira/browse/HBASE-2643">HBASE-2643
>            Figure how to deal with eof splitting logs</link></para>
>          </footnote></para>
>      </section>
>
> St.Ack
>
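
The EOF rule in that book section can be sketched in Java. This is a hypothetical illustration, not HBase's actual log-splitting code: the class EofTolerantSplitter, its readEntries method, and the length-prefixed record format are invented for the example, and the skipErrors flag only loosely mirrors hbase.hlog.split.skip.errors. An EOFException ends reading of the current file and keeps the entries already recovered, whether or not it is the last file in the set, while other IOExceptions still fail the split unless skipErrors is true.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.List;

public class EofTolerantSplitter {
    // Hypothetical record format: an int length followed by that many bytes.
    static List<byte[]> readEntries(InputStream in, boolean skipErrors) throws IOException {
        List<byte[]> entries = new ArrayList<>();
        DataInputStream data = new DataInputStream(in);
        while (true) {
            try {
                int len = data.readInt();
                if (len < 0) {
                    throw new IOException("corrupt record length: " + len);
                }
                byte[] payload = new byte[len];
                data.readFully(payload);
                entries.add(payload);
            } catch (EOFException eof) {
                // Clean end of file, or a record truncated mid-write by a crashed
                // RegionServer: keep what was read, even when skipErrors == false.
                return entries;
            } catch (IOException ioe) {
                if (skipErrors) {
                    return entries; // abandon this file but keep its good entries
                }
                throw ioe; // any other error still fails the split
            }
        }
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buf);
        out.writeInt(3);
        out.write(new byte[] {1, 2, 3}); // complete record
        out.writeInt(5);
        out.write(new byte[] {4, 5});    // record cut off mid-write
        out.flush();
        List<byte[]> entries = readEntries(new ByteArrayInputStream(buf.toByteArray()), false);
        System.out.println("recovered entries: " + entries.size());
    }
}
```

Running main prints "recovered entries: 1": the complete record survives and the truncated one is dropped instead of aborting the split.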
> On Tue, Sep 21, 2010 at 3:00 PM, Jack Levin <[email protected]> wrote:
>> First, I saw:
>>
>>
>> 2010-09-21 11:30:05,122 DEBUG
>> org.apache.hadoop.hbase.master.RegionServerOperationQueue: Put
>> ProcessServerShutdown of 10.103.2.5,60020,1285042335711 back on queue
>> 2010-09-21 11:30:05,122 DEBUG
>> org.apache.hadoop.hbase.master.RegionServerOperationQueue: Processing
>> todo: ProcessServerShutdown of 10.103.2.5,60020,1285042335711
>> 2010-09-21 11:30:05,122 INFO
>> org.apache.hadoop.hbase.master.RegionServerOperation: Process shutdown
>> of server 10.103.2.5,60020,1285042335711: logSplit: false,
>> rootRescanned: false, numberOfMetaRegions: 1, onlineMetaRegions.size(): 0
>>
>> This repeated rapidly for 20 minutes or so.
>>
>> Then a bunch of regions got unassigned:
>>
>>
>> 2010-09-21 12:00:07,782 DEBUG
>> org.apache.hadoop.hbase.master.RegionManager: Unassigning 66 regions
>> from 10.103.2.3,60020,1285042333293
>> 2010-09-21 12:00:07,782 DEBUG
>> org.apache.hadoop.hbase.master.RegionManager: Going to close region
>> img816,img2103r.jpg,1285003791610.1592893332
>> 2010-09-21 12:00:07,782 DEBUG
>> org.apache.hadoop.hbase.master.RegionManager: Going to close region
>> img534,92166039.jpg,1284949117852.1009352950
>> 2010-09-21 12:00:07,782 DEBUG
>> org.apache.hadoop.hbase.master.RegionManager: Going to close region
>> img36,abcwu.jpg,1285001278990.272235177
>>
>>
>> Restarting the master did not help.  Ultimately, what brought the cluster
>> back up was a full shutdown of the regionservers and masters, and then
>> bringing everything back up.
>>
>> Any ideas what might have happened here?
>>
>> We are running:
>>
>> HBase Version   0.89.20100726, r979826
>> Hadoop Version  0.20.2+320, r9b72d268a0b590b4fd7d13aca17c1c453f8bc957
>> Regions On FS   5057
>>
>> 3 zookeepers and 13 regionservers.
>>
>> -Jack
>>
>