or:
LOG.warn("Region " + region.getRegionNameAsString() + " has too
many " +
"store files; delaying flush up to " + this.blockingWaitTime +
"ms");
which shows up in the RS log as something like:
WARN org.apache.hadoop.hbase.regionserver.MemStoreFlusher: Region
occurrence,\x17\xF1o\x9C,1340981109494.ecb85155563c6614e5448c7d700b909e.
has too many store files; delaying flush up to 90000ms
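
If that is the message you are seeing, the relevant knobs are
"hbase.hstore.blockingStoreFiles" and "hbase.hstore.blockingWaitTime"
(the 90000ms above is the blockingWaitTime). A rough, untested sketch for
dumping what your configuration resolves them to; the fallback values below
are only illustrative, not necessarily your release's defaults:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class DumpBlockingSettings {
  public static void main(String[] args) {
    // Picks up hbase-default.xml / hbase-site.xml from the classpath.
    Configuration conf = HBaseConfiguration.create();
    int blockingStoreFiles = conf.getInt("hbase.hstore.blockingStoreFiles", 10);
    long blockingWaitTime = conf.getLong("hbase.hstore.blockingWaitTime", 90000L);
    System.out.println("hbase.hstore.blockingStoreFiles = " + blockingStoreFiles);
    System.out.println("hbase.hstore.blockingWaitTime = " + blockingWaitTime + " ms");
  }
}

The values that matter are the ones the region servers run with, so check the
hbase-site.xml on the RS side as well.
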
On Wed, Nov 12, 2014 at 10:26 AM, Qiang Tian <[email protected]> wrote:
> the checkResources Ted mentioned is a good suspect; see the online HBase
> book, "9.7.7.7.1.1. Being Stuck".
> Did you see the message below in your RS log?
> LOG.info("Waited " + (System.currentTimeMillis() - fqe.createTime)
> +
> "ms on a compaction to clean up 'too many store files'; waited "
> +
> "long enough... proceeding with flush of " +
> region.getRegionNameAsString());
>
>
> I did a quick test setting "hbase.hregion.memstore.block.multiplier" = 0;
> issuing a put in the hbase shell triggers a flush and throws a
> RegionTooBusyException back to the client, and the client's retry mechanism
> completes the write on the next multi RPC call.
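
(For reference, the client-side retry behaviour there is controlled by
"hbase.client.retries.number" and "hbase.client.pause". A rough sketch of
bumping them for a write-heavy job; the values are examples only, not
recommendations:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class ClientRetryTuning {
  public static Configuration create() {
    Configuration conf = HBaseConfiguration.create();
    // More retry attempts and a longer base pause between them; the pause
    // is scaled up by the client's backoff schedule on each retry, so the
    // client can ride out a temporarily busy region.
    conf.setInt("hbase.client.retries.number", 35);
    conf.setLong("hbase.client.pause", 200L);
    return conf;
  }
}
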
>
>
>
> On Wed, Nov 12, 2014 at 1:21 AM, Brian Jeltema <
> [email protected]> wrote:
>
>> Thanks. I appear to have resolved this problem by restarting the HBase
>> Master and the RegionServers
>> that were reporting the failure.
>>
>> Brian
>>
>> On Nov 11, 2014, at 12:13 PM, Ted Yu <[email protected]> wrote:
>>
>> > For your first question: the region server web UI, under
>> > rs-status#regionRequestStats, shows the Write Request Count.
>> >
>> > You can monitor the value for the underlying region to see if it
>> > receives above-normal writes.
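
(Programmatically, the same per-region write counters are exposed through
ClusterStatus/RegionLoad. A rough, untested sketch against the 0.98 client
API; note the counter is cumulative, so sample it twice and diff the values
to spot a hot region:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.ClusterStatus;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.RegionLoad;
import org.apache.hadoop.hbase.ServerName;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class RegionWriteCounts {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);
    try {
      ClusterStatus status = admin.getClusterStatus();
      for (ServerName sn : status.getServers()) {
        // ServerLoad.getRegionsLoad() maps region name -> RegionLoad.
        for (RegionLoad rl : status.getLoad(sn).getRegionsLoad().values()) {
          System.out.println(rl.getNameAsString()
              + " writeRequests=" + rl.getWriteRequestsCount());
        }
      }
    } finally {
      admin.close();
    }
  }
}
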
>> >
>> > Cheers
>> >
>> > On Mon, Nov 10, 2014 at 4:06 PM, Brian Jeltema <[email protected]>
>> wrote:
>> >
>> >>> Was the region containing this row hot around the time of failure ?
>> >>
>> >> How do I measure that?
>> >>
>> >>>
>> >>> Can you check the region server log (along with a monitoring tool) to
>> >>> see what the memstore pressure was?
>> >>
>> >> I didn't see anything in the region server logs to indicate a problem.
>> >> And given the reproducibility of the behavior, it's hard to see how
>> >> dynamic parameters such as memory pressure could be at the root of
>> >> the problem.
>> >>
>> >> Brian
>> >>
>> >> On Nov 10, 2014, at 3:22 PM, Ted Yu <[email protected]> wrote:
>> >>
>> >>> Was the region containing this row hot around the time of failure ?
>> >>>
>> >>> Can you check the region server log (along with a monitoring tool) to
>> >>> see what the memstore pressure was?
>> >>>
>> >>> Thanks
>> >>>
>> >>> On Nov 10, 2014, at 11:34 AM, Brian Jeltema <
>> >> [email protected]> wrote:
>> >>>
>> >>>>> How many tasks may write to this row concurrently ?
>> >>>>
>> >>>> only 1 mapper should be writing to this row. Is there a way to check
>> >>>> which locks are being held?
>> >>>>
>> >>>>> Which 0.98 release are you using ?
>> >>>>
>> >>>> 0.98.0.2.1.2.1-471-hadoop2
>> >>>>
>> >>>> Thanks
>> >>>> Brian
>> >>>>
>> >>>> On Nov 10, 2014, at 2:21 PM, Ted Yu <[email protected]> wrote:
>> >>>>
>> >>>>> There could be more than one reason why RegionTooBusyException is
>> >>>>> thrown. Below are two (from HRegion):
>> >>>>>
>> >>>>> /**
>> >>>>>  * We throw RegionTooBusyException if above memstore limit
>> >>>>>  * and expect client to retry using some kind of backoff
>> >>>>>  */
>> >>>>> private void checkResources()
>> >>>>>
>> >>>>> /**
>> >>>>>  * Try to acquire a lock. Throw RegionTooBusyException
>> >>>>>  * if failed to get the lock in time. Throw InterruptedIOException
>> >>>>>  * if interrupted while waiting for the lock.
>> >>>>>  */
>> >>>>> private void lock(final Lock lock, final int multiplier)
>> >>>>>
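
(For reference: the checkResources() path pushes back once a region's
memstore exceeds "hbase.hregion.memstore.flush.size" times
"hbase.hregion.memstore.block.multiplier"; the lock() path gives up after a
wait derived, if I remember right, from "hbase.busy.wait.duration". A rough,
untested sketch for printing what a configuration resolves these to; the
fallback values below are illustrative, not your release's defaults:)

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class DumpBusySettings {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    long flushSize = conf.getLong("hbase.hregion.memstore.flush.size", 134217728L);
    int multiplier = conf.getInt("hbase.hregion.memstore.block.multiplier", 2);
    long busyWait = conf.getLong("hbase.busy.wait.duration", 60000L);
    // checkResources() starts rejecting writes once the region's memstore
    // grows past flushSize * multiplier.
    System.out.println("memstore blocking threshold = " + (flushSize * multiplier) + " bytes");
    System.out.println("busy wait duration = " + busyWait + " ms");
  }
}
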
>> >>>>> How many tasks may write to this row concurrently ?
>> >>>>>
>> >>>>> Which 0.98 release are you using ?
>> >>>>>
>> >>>>> Cheers
>> >>>>>
>> >>>>> On Mon, Nov 10, 2014 at 11:10 AM, Brian Jeltema <
>> >>>>> [email protected]> wrote:
>> >>>>>
>> >>>>>> I’m running a map/reduce job against a table that is performing a
>> >>>>>> large number of writes (probably updating every row).
>> >>>>>> The job is failing with the exception below. This is a solid
>> >>>>>> failure; it dies at the same point in the application, and at the
>> >>>>>> same row in the table. So I doubt it’s a conflict with compaction
>> >>>>>> (and the UI shows no compaction in progress), or that there is a
>> >>>>>> load-related cause.
>> >>>>>>
>> >>>>>> ‘hbase hbck’ does not report any inconsistencies. The
>> >>>>>> ‘waitForAllPreviousOpsAndReset’ leads me to suspect that there is
>> >>>>>> an operation in progress that is hung and blocking the update. I
>> >>>>>> don’t see anything suspicious in the HBase logs.
>> >>>>>> The data at the point of failure is not unusual, and is identical
>> >>>>>> to many preceding rows.
>> >>>>>> Does anybody have any ideas of what I should look for to find the
>> >>>>>> cause of this RegionTooBusyException?
>> >>>>>>
>> >>>>>> This is Hadoop 2.4 and HBase 0.98.
>> >>>>>>
>> >>>>>> 14/11/10 13:46:13 INFO mapreduce.Job: Task Id :
>> >>>>>> attempt_1415210751318_0010_m_000314_1, Status : FAILED
>> >>>>>> Error: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException:
>> >>>>>> Failed 1744 actions: RegionTooBusyException: 1744 times,
>> >>>>>>   at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:207)
>> >>>>>>   at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$1700(AsyncProcess.java:187)
>> >>>>>>   at org.apache.hadoop.hbase.client.AsyncProcess.waitForAllPreviousOpsAndReset(AsyncProcess.java:1568)
>> >>>>>>   at org.apache.hadoop.hbase.client.HTable.backgroundFlushCommits(HTable.java:1023)
>> >>>>>>   at org.apache.hadoop.hbase.client.HTable.doPut(HTable.java:995)
>> >>>>>>   at org.apache.hadoop.hbase.client.HTable.put(HTable.java:953)
>> >>>>>> Brian
>> >>>>
>> >>>
>> >>
>> >>
>>
>>
>