>
> On Tue, Aug 16, 2016 at 6:12 AM, Sterfield <[email protected]> wrote:
> > >
> > > This is a well-known issue over in tsdb-land. IIRC, they are working on
> > > alternative to the once-an-hour compression. See what they say over
> there
> > > Guillaume.
> > > Thanks,
> > > St.Ack
> >
> >
> > Thanks for the tips. I'll check on OpenTSDB side and come back here with
> > what I'll find.
> >
> > I have one last question: how can I handle the burst generated by the
> > OpenTSDB compaction?
> >
> > The OpenTSDB log has some line like :
> >
> > 12:04:56.586 ERROR [CompactionQueue.call] - Failed to write a row to
> > re-compact
> > org.hbase.async.RemoteException:
> > org.apache.hadoop.hbase.RegionTooBusyException:
> > Above memstore limit, regionName=tsdb,\x00\x03\xD9W\
> > xAD\x82\x00\x00\x00\x01\x00\x00T\x00\x00\x0A\x00\x008\x00\
> > x00\x0B\x00\x009\x00\x00\x0C\x00\x00
> > 5,1471090649103.b833cb8fdceff5cd21887aa9ff11e7bc.,
> > server=ip-10-200-0-6.eu-west-1.campaign.aws,16020,1471255225311,
> > memstoreSize=1098822960, blockingMemStoreSize=1073741824
> >
> > Indeed, the memstore limit was reached (256 MB x 4), hence the error.
> > However, the fact that HBase was not able to flush the memstore is a
> > bit concerning.
> >
>
> You have evidence it did not? Below in this note, it seems that the RS did
> flush?


No, you are right: it's probably not that "it didn't flush", but rather that
"it was flushing and did not have time to complete".


> > On the corresponding RS, at the same time, there's a message about a big
> > flush, but not with that much memory in the memstore. Also, I don't see any
> > warning that could explain why the memstore grew so large (nothing about
> > there being too many HFiles to compact, for example).
> >
> >
> HBase keeps writing to the memstore until it trips a lower limit. It then
> tries to flush the region, taking writes all the while, until it hits an
> upper limit, at which point it stops taking writes until the flush completes,
> to prevent running out of memory. If the RS is under load, it may take a
> while to flush out the memstore, causing us to hit the upper bound. Is that
> what is going on here?


What I gathered from the documentation and from research on the Internet is
the following (the corresponding property names are sketched right after this
list):

   - the memstore grows up to the memstore flush size (here 256 MB)
   - it can flush earlier if:
      - the total space taken by all the memstores on the RS hits the lower
      limit (0.35 of the heap by default) or the upper limit (0.40 by
      default); if it reaches the upper limit, writes are blocked
      - there is "memory pressure", meaning that no more memory is
      available
   - the "multiplier" allows a single memstore to grow past the flush size up
   to a certain point (here x4), apparently in various cases:
      - when too many HFiles have been generated and a compaction must
      happen first
      - apparently under heavy load.
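
For reference, here is how I believe those limits map to hbase-site.xml
properties (I'm assuming the HBase 1.x-era names from the docs, so please
correct me if I got them wrong):

    <!-- per-region memstore flush size (256 MB in my setup) -->
    <property>
      <name>hbase.hregion.memstore.flush.size</name>
      <value>268435456</value>
    </property>
    <!-- writes to a region block once its memstore reaches
         flush.size * multiplier (256 MB x 4 = 1 GB here) -->
    <property>
      <name>hbase.hregion.memstore.block.multiplier</name>
      <value>4</value>
    </property>
    <!-- global memstore limits, as a fraction of the RS heap -->
    <property>
      <name>hbase.regionserver.global.memstore.upperLimit</name>
      <value>0.4</value>
    </property>
    <property>
      <name>hbase.regionserver.global.memstore.lowerLimit</name>
      <value>0.35</value>
    </property>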

Back to your question: yes, the server is being load-tested at the moment, so
I'm constantly writing 150k rows through OpenTSDB as we speak.


> > 2016-08-16 12:04:57,752 INFO  [MemStoreFlusher.0] regionserver.HRegion:
> > Finished memstore flush of ~821.25 MB/861146920, currentsize=226.67
> > MB/237676040 for region
> > tsdb,\x00\x03\xD9W\xAD\x82\x00\x00\x00\x01\x00\x00T\x00\
> > x00\x0A\x00\x008\x00\x00\x0B\x00\x009\x00\x00\x0C\x00\x005,
> > 1471090649103.
> > b833cb8fdceff5cd21887aa9ff11e7bc.
> > in 13449ms, sequenceid=11332624, compaction requested=true
> >
> > So, what could explain this amount of memory taken by the memstore, and
> > how could I handle such a situation?
> >
> >
> You are taking on a lot of writes? The server is under load? Are lots of
> regions concurrently flushing? You could up the flushing thread count from
> the default of 1. You could up the multiplier to give the flush more runway
> to complete in (x6 instead of x4), etc.


Thanks for the flushing thread count, I had never heard of that!
I could indeed increase the multiplier; I'll try that as well.
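
Just to be sure I touch the right knobs, is this roughly what you mean? (The
property names are my guess from the docs.)

    <!-- more concurrent flush threads (up from the default) -->
    <property>
      <name>hbase.hstore.flusher.count</name>
      <value>2</value>
    </property>
    <!-- more runway before writes are blocked: 256 MB x 6 instead of x 4 -->
    <property>
      <name>hbase.hregion.memstore.block.multiplier</name>
      <value>6</value>
    </property>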

Guillaume
