OK, makes sense. So 1GB for the native maps is reasonable? Tablet server A was alive and well when I looked at the monitor. Those 'constraint violations' do not stop until after I've restarted all of the tservers.
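(For reference, these are the knobs being discussed, with property names from the 1.5-era docs; the values below are only placeholders, not a recommendation, and in accumulo-site.xml they go in <property> entries:)

    tserver.memory.maps.max=1G              # off-heap native in-memory map used for live ingest
    tserver.memory.maps.native.enabled=true
    tserver.cache.data.size=256M            # data block cache, lives in the JVM heap
    tserver.cache.index.size=128M           # index block cache, lives in the JVM heap
    instance.zookeeper.timeout=30s          # session timeout before a tserver loses its ZK lock
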
On Wed, Jan 15, 2014 at 8:49 AM, Eric Newton <[email protected]> wrote:

> When a tablet server (let's call it A) bulk imports a file, it makes a few
> bookkeeping entries in the !METADATA table. The tablet server that is
> serving the !METADATA table (let's call it B) checks a constraint: tablet
> server A must still have its zookeeper lock. This constraint is being
> violated because A has lost its lock.
>
> Tablet server A should have died.
>
> The native map is used for live data ingest and exists outside of the java
> heap. The caches live in the heap.
>
> -Eric
>
>
> On Wed, Jan 15, 2014 at 8:19 AM, Anthony F <[email protected]> wrote:
>
>> Just checked on the native mem maps . . . looks like it is set to 1GB.
>> Do the index and data caches reside in native mem maps if available or is
>> native mem used for something else?
>>
>> I just repeated an ingest . . . this time I did not lose any tablet
>> servers but my logs are filling up with the following messages:
>>
>> 2014-01-15 08:16:41,643 [constraints.MetadataConstraints] DEBUG: violating metadata mutation : b;74~thf
>> 2014-01-15 08:16:41,643 [constraints.MetadataConstraints] DEBUG: update: file:/b-00012bq/I00012cj.rf value 20272720,0,1389757684543
>> 2014-01-15 08:16:41,643 [constraints.MetadataConstraints] DEBUG: update: loaded:/b-00012bq/I00012cj.rf value 2675766456963732003
>> 2014-01-15 08:16:41,643 [constraints.MetadataConstraints] DEBUG: update: srv:time value M1389757684543
>> 2014-01-15 08:16:41,643 [constraints.MetadataConstraints] DEBUG: update: srv:lock value tservers/192.168.2.231:9997/zlock-0000000002$2438da698db13b4
>>
>>
>> On Mon, Jan 13, 2014 at 2:44 PM, Sean Busbey <[email protected]> wrote:
>>
>>> On Mon, Jan 13, 2014 at 12:02 PM, Anthony F <[email protected]> wrote:
>>>
>>>> Yes, system swappiness is set to 0. I'll run again and gather more
>>>> logs.
>>>>
>>>> Is there a zookeeper timeout setting that I can adjust to avoid this
>>>> issue and is that advisable? Basically, the tservers are colocated with
>>>> HDFS datanodes and Hadoop nodemanagers. The machines are overallocated in
>>>> terms of RAM. So, I have a feeling that when a map-reduce job is kicked
>>>> off, it causes the tserver to page out to swap space. Once the map-reduce
>>>> job finishes and the bulk ingest is kicked off, the tserver is paged back
>>>> in and the ZK timeout causes a shutdown.
>>>>
>>> You should not overallocate the amount of memory on the machines.
>>> Generally, you should provide memory limits under the assumption that
>>> everything will be on at once.
>>>
>>> Many parts of Hadoop (not just Accumulo) will degrade or malfunction in
>>> the presence of memory swapping.
>>>
>>> How much of the 12GB for Accumulo is for native memmaps?
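
To make the mechanism Eric describes concrete, here is a rough sketch of what a constraint looks like in the 1.x API. This is illustrative only, not the actual MetadataConstraints source; LockStillHeldConstraint and senderStillHoldsLock are made-up names, and the real check reads the srv:lock entry and the live locks under ZooKeeper.

    import java.util.Collections;
    import java.util.List;

    import org.apache.accumulo.core.constraints.Constraint;
    import org.apache.accumulo.core.data.Mutation;

    /**
     * Sketch of a constraint that rejects a mutation when the tablet server
     * that sent it no longer holds its ZooKeeper lock.
     */
    public class LockStillHeldConstraint implements Constraint {

      private static final short LOCK_NOT_HELD = 1;

      // Hypothetical helper: the real constraint compares the lock named in the
      // mutation's srv:lock column against the live locks in ZooKeeper.
      private boolean senderStillHoldsLock(Mutation mutation) {
        return true; // placeholder
      }

      @Override
      public String getViolationDescription(short violationCode) {
        return violationCode == LOCK_NOT_HELD
            ? "originating tablet server no longer holds its zookeeper lock" : null;
      }

      @Override
      public List<Short> check(Environment env, Mutation mutation) {
        if (!senderStillHoldsLock(mutation)) {
          return Collections.singletonList(LOCK_NOT_HELD);
        }
        return null; // null (or an empty list) means the mutation is accepted
      }
    }

Violations returned from check() are what the tserver serving !METADATA logs as the "violating metadata mutation" DEBUG lines above.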
