OK, makes sense. So 1GB for the native maps is reasonable? Tablet server A was alive and well when I looked at the monitor. Those 'constraint violations' do not stop until after I've restarted all of the tservers.
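(For reference, these are the knobs being discussed, with property names from the 1.5-era docs; the values below are only placeholders, not a recommendation, and in accumulo-site.xml they go in <property> entries:)

    tserver.memory.maps.max=1G              # off-heap native in-memory map used for live ingest
    tserver.memory.maps.native.enabled=true
    tserver.cache.data.size=256M            # data block cache, lives in the JVM heap
    tserver.cache.index.size=128M           # index block cache, lives in the JVM heap
    instance.zookeeper.timeout=30s          # session timeout before a tserver loses its ZK lock
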
On Wed, Jan 15, 2014 at 8:49 AM, Eric Newton <[email protected]> wrote:

> When a tablet server (let's call it A) bulk imports a file, it makes a few
> bookkeeping entries in the !METADATA table. The tablet server that is
> serving the !METADATA table (let's call it B) checks a constraint: tablet
> server A must still have its zookeeper lock. This constraint is being
> violated because A has lost its lock.
>
> Tablet server A should have died.
>
> The native map is used for live data ingest and exists outside of the java
> heap. The caches live in the heap.
>
> -Eric
>
>
> On Wed, Jan 15, 2014 at 8:19 AM, Anthony F <[email protected]> wrote:
>
>> Just checked on the native mem maps . . . looks like it is set to 1GB.
>> Do the index and data caches reside in native mem maps if available or is
>> native mem used for something else?
>>
>> I just repeated an ingest . . . this time I did not lose any tablet
>> servers but my logs are filling up with the following messages:
>>
>> 2014-01-15 08:16:41,643 [constraints.MetadataConstraints] DEBUG: violating metadata mutation : b;74~thf
>> 2014-01-15 08:16:41,643 [constraints.MetadataConstraints] DEBUG: update: file:/b-00012bq/I00012cj.rf value 20272720,0,1389757684543
>> 2014-01-15 08:16:41,643 [constraints.MetadataConstraints] DEBUG: update: loaded:/b-00012bq/I00012cj.rf value 2675766456963732003
>> 2014-01-15 08:16:41,643 [constraints.MetadataConstraints] DEBUG: update: srv:time value M1389757684543
>> 2014-01-15 08:16:41,643 [constraints.MetadataConstraints] DEBUG: update: srv:lock value tservers/192.168.2.231:9997/zlock-0000000002$2438da698db13b4
>>
>>
>> On Mon, Jan 13, 2014 at 2:44 PM, Sean Busbey <[email protected]> wrote:
>>
>>> On Mon, Jan 13, 2014 at 12:02 PM, Anthony F <[email protected]> wrote:
>>>
>>>> Yes, system swappiness is set to 0. I'll run again and gather more
>>>> logs.
>>>>
>>>> Is there a zookeeper timeout setting that I can adjust to avoid this
>>>> issue and is that advisable? Basically, the tservers are colocated with
>>>> HDFS datanodes and Hadoop nodemanagers. The machines are overallocated in
>>>> terms of RAM. So, I have a feeling that when a map-reduce job is kicked
>>>> off, it causes the tserver to page out to swap space. Once the map-reduce
>>>> job finishes and the bulk ingest is kicked off, the tserver is paged back
>>>> in and the ZK timeout causes a shutdown.
>>>>
>>> You should not overallocate the amount of memory on the machines.
>>> Generally, you should provide memory limits under the assumption that
>>> everything will be on at once.
>>>
>>> Many parts of Hadoop (not just Accumulo) will degrade or malfunction in
>>> the presence of memory swapping.
>>>
>>> How much of the 12GB for Accumulo is for native memmaps?
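
To make the mechanism Eric describes concrete, here is a rough sketch of what a constraint looks like in the 1.x API. This is illustrative only, not the actual MetadataConstraints source; LockStillHeldConstraint and senderStillHoldsLock are made-up names, and the real check reads the srv:lock entry and the live locks under ZooKeeper.

    import java.util.Collections;
    import java.util.List;

    import org.apache.accumulo.core.constraints.Constraint;
    import org.apache.accumulo.core.data.Mutation;

    /**
     * Sketch of a constraint that rejects a mutation when the tablet server
     * that sent it no longer holds its ZooKeeper lock.
     */
    public class LockStillHeldConstraint implements Constraint {

      private static final short LOCK_NOT_HELD = 1;

      // Hypothetical helper: the real constraint compares the lock named in the
      // mutation's srv:lock column against the live locks in ZooKeeper.
      private boolean senderStillHoldsLock(Mutation mutation) {
        return true; // placeholder
      }

      @Override
      public String getViolationDescription(short violationCode) {
        return violationCode == LOCK_NOT_HELD
            ? "originating tablet server no longer holds its zookeeper lock" : null;
      }

      @Override
      public List<Short> check(Environment env, Mutation mutation) {
        if (!senderStillHoldsLock(mutation)) {
          return Collections.singletonList(LOCK_NOT_HELD);
        }
        return null; // null (or an empty list) means the mutation is accepted
      }
    }

Violations returned from check() are what the tserver serving !METADATA logs as the "violating metadata mutation" DEBUG lines above.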
