I don't know.  Can you dump the curl output to a file or STDOUT and
take a look at it?
St.Ack
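(A minimal illustration of the base64 question quoted below; the REST endpoint in the comment is a placeholder and this is not Stargate code, just a demonstration of the encoding overhead being asked about:)

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// To dump the curl output as suggested above, something like (placeholder
// host/table/row/column names -- adjust to your setup):
//   curl -v -H "Accept: application/octet-stream" \
//        http://localhost:8080/mytable/myrow/mycf:myqual -o cell.bin
// With octet-stream you get the raw cell bytes; the XML/JSON
// representations base64-encode the value, inflating it by roughly 33%:
public class Base64Overhead {
    public static void main(String[] args) {
        byte[] raw = "some raw cell bytes".getBytes(StandardCharsets.UTF_8);
        String encoded = Base64.getEncoder().encodeToString(raw);
        System.out.println("raw bytes:    " + raw.length);       // 19
        System.out.println("base64 chars: " + encoded.length()); // 28
    }
}
```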

On Tue, Sep 21, 2010 at 10:36 PM, Jack Levin <[email protected]> wrote:
> Thanks, the patched version is now running (actually SU's version was
> already patched, ty).  Another question, in regard to REST doing base64:
> when you send the header "Accept: application/octet-stream", I get just
> the byte stream, i.e. no base64. Does this mean internally the cell is
> sent out byte by byte without any conversion to base64? Hence no
> overhead?
>
> -Jack
>
> On Tue, Sep 21, 2010 at 10:09 PM, Stack <[email protected]> wrote:
>> Given the log snippet, I'd guess it's because your hbase doesn't have
>> HBASE-2643.
>>
>> The above makes it so we continue through an EOF exception when
>> splitting logs, where before we'd fail the splitting, requeue, split,
>> then fail again.
>>
>> Here is a comment recently added to our little hbase book at
>> src/docbkx/book.xml:
>>
>>      <section>
>>        <title>How EOFExceptions are treated when splitting a crashed
>>        RegionServer's WALs</title>
>>
>>        <para>If we get an EOF while splitting logs, we proceed with the split
>>        even when <varname>hbase.hlog.split.skip.errors</varname> ==
>>        <constant>false</constant>. An EOF while reading the last log in the
>>        set of files to split is near-guaranteed since the RegionServer likely
>>        crashed mid-write of a record. But we'll continue even if we got an
>>        EOF reading other than the last file in the set.<footnote>
>>            <para>For background, see <link
>>            xlink:href="https://issues.apache.org/jira/browse/HBASE-2643">HBASE-2643
>>            Figure how to deal with eof splitting logs</link></para>
>>          </footnote></para>
>>      </section>
>>
>> St.Ack
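(The EOF policy described in the quoted book section can be sketched roughly as follows; the class and method names here are made up for illustration, not the actual HBase log-splitting code:)

```java
import java.io.EOFException;
import java.io.IOException;

// Simplified illustration of the policy above: EOFs are tolerated even
// when hbase.hlog.split.skip.errors == false, since the RegionServer
// likely crashed mid-write of a record; other errors obey the setting.
public class WalSplitEofPolicy {
    /** Decide whether log splitting should proceed after an IOException. */
    public static boolean continueSplitting(IOException e, boolean skipErrors) {
        if (e instanceof EOFException) {
            return true;  // proceed with the split regardless of skipErrors
        }
        return skipErrors;  // honor hbase.hlog.split.skip.errors
    }

    public static void main(String[] args) {
        System.out.println(continueSplitting(new EOFException(), false));       // true
        System.out.println(continueSplitting(new IOException("bad block"), false)); // false
    }
}
```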
>>
>> On Tue, Sep 21, 2010 at 3:00 PM, Jack Levin <[email protected]> wrote:
>>> First, I saw:
>>>
>>>
>>> 2010-09-21 11:30:05,122 DEBUG
>>> org.apache.hadoop.hbase.master.RegionServerOperationQueue: Put
>>> ProcessServerShutdown of 10.103.2.5,60020,1285042335711 back on queue
>>> 2010-09-21 11:30:05,122 DEBUG
>>> org.apache.hadoop.hbase.master.RegionServerOperationQueue: Processing
>>> todo: ProcessServerShutdown of 10.103.2.5,60020,1285042335711
>>> 2010-09-21 11:30:05,122 INFO
>>> org.apache.hadoop.hbase.master.RegionServerOperation: Process shutdown
>>> of server 10.103.2.5,60020,1285042335711: logSplit: false,
>>> rootRescanned: false, numberOfMetaRegions: 1,
>>> onlineMetaRegions.size(): 0
>>>
>>> repeated rapidly for 20 mins or so.
>>>
>>> Then:
>>>
>>> Bunch of regions got unassigned:
>>>
>>>
>>> 2010-09-21 12:00:07,782 DEBUG
>>> org.apache.hadoop.hbase.master.RegionManager: Unassigning 66 regions
>>> from 10.103.2.3,60020,1285042333293
>>> 2010-09-21 12:00:07,782 DEBUG
>>> org.apache.hadoop.hbase.master.RegionManager: Going to close region
>>> img816,img2103r.jpg,1285003791610.1592893332
>>> 2010-09-21 12:00:07,782 DEBUG
>>> org.apache.hadoop.hbase.master.RegionManager: Going to close region
>>> img534,92166039.jpg,1284949117852.1009352950
>>> 2010-09-21 12:00:07,782 DEBUG
>>> org.apache.hadoop.hbase.master.RegionManager: Going to close region
>>> img36,abcwu.jpg,1285001278990.272235177
>>>
>>>
>>> Restarting the master did not help.  Ultimately what brought the
>>> cluster back up was a full shutdown of the regionservers and masters,
>>> and then bringing everything back up.
>>>
>>> Any ideas what might have happened here?
>>>
>>> We are running:
>>>
>>> HBase Version   0.89.20100726, r979826
>>> Hadoop Version  0.20.2+320, r9b72d268a0b590b4fd7d13aca17c1c453f8bc957
>>> Regions On FS   5057
>>>
>>> 3 zookeepers and 13 regionservers.
>>>
>>> -Jack
>>>
>>
>
