Re: Orphaned FATE Locks [SEC=UNOFFICIAL]

2016-09-21 Thread Keith Turner
On Wed, Sep 7, 2016 at 8:55 PM, Dickson, Matt MR wrote:
> UNOFFICIAL
>
>
> In Zookeeper there don't appear to be any locks with the same txid that is 
> listed via Accumulo.  However under /accumulo//table_locks/+default/ 
> there are the same number of files as orphaned locks labelled 
> 'lock-00xx', are these the locks I can delete?

Every table lock should have an associated fate transaction.  The
fate print tool has analysis[1] to look for situations where this is
not the case, and it seems that analysis is finding locks without
transactions.  Looking at the code, I see there is a slight possibility
of a race condition: it reads the list of locks into memory and then
reads the transactions.  So if a transaction completed and deleted its
lock between those steps, the tool could report a false positive.  If
you run the tool multiple times and it reports the same thing each
time, then that race condition is not the cause.
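
To make the check concrete, here is a minimal Python sketch of the idea. It is not the AdminUtil code. It assumes each lock has already been reduced to a (lock name, txid) pair and that the set of live txids was read in a separate step. The names find_orphans and stable_orphans are hypothetical, and the sample data is made up. Keeping only the orphans that every run reports filters out the false positive described above.

```python
def find_orphans(locks, live_txids):
    """Return the (lock name, txid) pairs whose txid has no fate transaction."""
    return {(name, txid) for name, txid in locks if txid not in live_txids}


def stable_orphans(runs):
    """Keep only the orphans that every run reports.

    Each run is a (locks, live_txids) snapshot taken at a different time.
    """
    results = [find_orphans(locks, txids) for locks, txids in runs]
    if not results:
        return set()
    return set.intersection(*results)


# Run 1: tx "aa11" finished after the locks were read, so its lock looks orphaned.
run1 = ([("lock-0001", "aa11"), ("lock-0002", "bb22")], {"cc33"})
# Run 2: the lock for "aa11" is gone, but "bb22" is still reported.
run2 = ([("lock-0002", "bb22")], {"cc33"})

print(stable_orphans([run1, run2]))  # {('lock-0002', 'bb22')}
```

A lock that shows up in only one run was most likely released by a transaction that completed mid-scan; one that persists across runs is a real orphan candidate.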

Yes, you should delete the locks.  These orphaned locks can hold up
future table operations.  It would be nice to find out what caused
this.  You can grep for the fate txid related to the locks in the
master log to gather info about the fate tx that created the lock.
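
If plain grep is not handy, a rough Python equivalent could look like the sketch below. The function name lines_for_txid is hypothetical, and the sample lines are invented for illustration; they are not real Accumulo log output.

```python
import re


def lines_for_txid(log_lines, txid):
    """Return the log lines that mention the given fate txid (case-insensitive)."""
    pattern = re.compile(re.escape(txid), re.IGNORECASE)
    return [line.rstrip("\n") for line in log_lines if pattern.search(line)]


# Invented sample lines, only to show the filtering.
sample = [
    "line one mentions tx 5C1A here\n",
    "line two mentions something else\n",
    "line three mentions 5c1a again\n",
]

for hit in lines_for_txid(sample, "5c1a"):
    print(hit)
```

Against a real log you would pass an open file handle instead of the sample list; the path to the master log depends on your installation.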

Do you know if the table ids it's complaining about as having locks
were deleted?  Do you know if fate transactions were deleted (manually
in ZK or using Accumulo's tool)?


[1]: 
https://github.com/apache/accumulo/blob/rel/1.8.0/fate/src/main/java/org/apache/accumulo/fate/AdminUtil.java#L69

>
> I should note that while investigating this there were no other fate 
> transactions being listed by Accumulo for this table, +default, so the system 
> was in a stable state.
>
>
> -Original Message-
>
>
> From: Josh Elser [mailto:josh.el...@gmail.com]
> Sent: Thursday, 8 September 2016 01:07
> To: user@accumulo.apache.org
> Subject: Re: Orphaned FATE Locks [SEC=UNOFFICIAL]
>
> Hi Matt,
>
> What version of Accumulo are you using? Figuring out why those transactions 
> aren't automatically getting removed is something else we would want to look into.
>
> It sounds like these transactions are just vestigial (not actually running), 
> so I wouldn't think that they would affect current bulk loads.
>
> I believe you could just stop the Master and remove the corresponding nodes 
> in ZooKeeper (as that's where the txns are stored and `fate print` is reading 
> from), but I would defer to Keith for confirmation first :)
>
> Dickson, Matt MR wrote:
>> *UNOFFICIAL*
>>
>> When running 'fate print -t IN_PROGRESS' to list fate transactions
>> there are approximately eight orphaned locks listed as:
>> txid: xxx locked: [R:+default]
>> I'm looking into these
>> because bulk ingests are failing and there are a lot of CopyFailed
>> transactions in the fate lock list. Could these orphaned locks block
>> further bulk ingests and is there a way to kill them?
>> When I run 'fate fail ' it states there is no fate transaction
>> associated with the transaction id.
>> Thanks in advance,
>> Matt


RE: Orphaned FATE Locks [SEC=UNOFFICIAL]

2016-09-20 Thread Dickson, Matt MR
UNOFFICIAL

Josh,

Have you had an opportunity to run this by Keith?


-Original Message-
From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Thursday, 8 September 2016 12:21
To: user@accumulo.apache.org
Subject: Re: Orphaned FATE Locks [SEC=UNOFFICIAL]

Ah, this would be why I deferred to Keith. I apparently am not as knowledgeable 
as I thought :)

I'll try to catch him in IRC tmrw and see if we can get you an answer. 
Otherwise, I'll have to go digging into code to try to figure out an answer.

Dickson, Matt MR wrote:
> UNOFFICIAL
>
>
> In Zookeeper there don't appear to be any locks with the same txid that is 
> listed via Accumulo.  However under /accumulo//table_locks/+default/ 
> there are the same number of files as orphaned locks labelled 
> 'lock-00xx', are these the locks I can delete?
>
> I should note that while investigating this there were no other fate 
> transactions being listed by Accumulo for this table, +default, so the system 
> was in a stable state.
>
>
> -Original Message-
>
>
> From: Josh Elser [mailto:josh.el...@gmail.com]
> Sent: Thursday, 8 September 2016 01:07
> To: user@accumulo.apache.org
> Subject: Re: Orphaned FATE Locks [SEC=UNOFFICIAL]
>
> Hi Matt,
>
> What version of Accumulo are you using? Figuring out why those transactions 
> aren't automatically getting removed is something else we would want to look into.
>
> It sounds like these transactions are just vestigial (not actually running), 
> so I wouldn't think that they would affect current bulk loads.
>
> I believe you could just stop the Master and remove the corresponding 
> nodes in ZooKeeper (as that's where the txns are stored and `fate 
> print` is reading from), but I would defer to Keith for confirmation 
> first :)
>
> Dickson, Matt MR wrote:
>> *UNOFFICIAL*
>>
>> When running 'fate print -t IN_PROGRESS' to list fate transactions 
>> there are approximately eight orphaned locks listed as:
>> txid: xxx locked: [R:+default]
>> I'm looking into these 
>> because bulk ingests are failing and there are a lot of CopyFailed 
>> transactions in the fate lock list. Could these orphaned locks block 
>> further bulk ingests and is there a way to kill them?
>> When I run 'fate fail ' it states there is no fate 
>> transaction associated with the transaction id.
>> Thanks in advance,
>> Matt


Re: Orphaned FATE Locks [SEC=UNOFFICIAL]

2016-09-07 Thread Josh Elser
Ah, this would be why I deferred to Keith. I apparently am not as 
knowledgeable as I thought :)


I'll try to catch him in IRC tmrw and see if we can get you an answer. 
Otherwise, I'll have to go digging into code to try to figure out an answer.


Dickson, Matt MR wrote:

UNOFFICIAL


In Zookeeper there don't appear to be any locks with the same txid that is 
listed via Accumulo.  However under /accumulo//table_locks/+default/ 
there are the same number of files as orphaned locks labelled 'lock-00xx', 
are these the locks I can delete?

I should note that while investigating this there were no other fate 
transactions being listed by Accumulo for this table, +default, so the system 
was in a stable state.


-Original Message-


From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Thursday, 8 September 2016 01:07
To: user@accumulo.apache.org
Subject: Re: Orphaned FATE Locks [SEC=UNOFFICIAL]

Hi Matt,

What version of Accumulo are you using? Figuring out why those transactions 
aren't automatically getting removed is something else we would want to look into.

It sounds like these transactions are just vestigial (not actually running), so 
I wouldn't think that they would affect current bulk loads.

I believe you could just stop the Master and remove the corresponding nodes in 
ZooKeeper (as that's where the txns are stored and `fate print` is reading 
from), but I would defer to Keith for confirmation first :)

Dickson, Matt MR wrote:

*UNOFFICIAL*

When running 'fate print -t IN_PROGRESS' to list fate transactions
there are approximately eight orphaned locks listed as:
txid: xxx locked: [R:+default]
I'm looking into these
because bulk ingests are failing and there are a lot of CopyFailed
transactions in the fate lock list. Could these orphaned locks block
further bulk ingests and is there a way to kill them?
When I run 'fate fail ' it states there is no fate transaction
associated with the transaction id.
Thanks in advance,
Matt


RE: Orphaned FATE Locks [SEC=UNOFFICIAL]

2016-09-07 Thread Dickson, Matt MR
UNOFFICIAL

 
In Zookeeper there don't appear to be any locks with the same txid that is 
listed via Accumulo.  However under /accumulo//table_locks/+default/ 
there are the same number of files as orphaned locks labelled 'lock-00xx', 
are these the locks I can delete?  

I should note that while investigating this there were no other fate 
transactions being listed by Accumulo for this table, +default, so the system 
was in a stable state.


-Original Message-


From: Josh Elser [mailto:josh.el...@gmail.com]
Sent: Thursday, 8 September 2016 01:07
To: user@accumulo.apache.org
Subject: Re: Orphaned FATE Locks [SEC=UNOFFICIAL]

Hi Matt,

What version of Accumulo are you using? Figuring out why those transactions 
aren't automatically getting removed is something else we would want to look into.

It sounds like these transactions are just vestigial (not actually running), so 
I wouldn't think that they would affect current bulk loads.

I believe you could just stop the Master and remove the corresponding nodes in 
ZooKeeper (as that's where the txns are stored and `fate print` is reading 
from), but I would defer to Keith for confirmation first :)

Dickson, Matt MR wrote:
> *UNOFFICIAL*
>
> When running 'fate print -t IN_PROGRESS' to list fate transactions 
> there are approximately eight orphaned locks listed as:
> txid: xxx locked: [R:+default]
> I'm looking into these 
> because bulk ingests are failing and there are a lot of CopyFailed 
> transactions in the fate lock list. Could these orphaned locks block 
> further bulk ingests and is there a way to kill them?
> When I run 'fate fail ' it states there is no fate transaction 
> associated with the transaction id.
> Thanks in advance,
> Matt


Re: Orphaned FATE Locks [SEC=UNOFFICIAL]

2016-09-07 Thread Josh Elser

Hi Matt,

What version of Accumulo are you using? Figuring out why those 
transactions aren't automatically getting removed is something else we would 
want to look into.


It sounds like these transactions are just vestigial (not actually 
running), so I wouldn't think that they would affect current bulk loads.


I believe you could just stop the Master and remove the corresponding 
nodes in ZooKeeper (as that's where the txns are stored and `fate print` 
is reading from), but I would defer to Keith for confirmation first :)


Dickson, Matt MR wrote:

*UNOFFICIAL*

When running 'fate print -t IN_PROGRESS' to list fate transactions there
are approximately eight orphaned locks listed as:
txid: xxx locked: [R:+default]
I'm looking into these because bulk ingests are failing and there are a
lot of CopyFailed transactions in the fate lock list. Could these
orphaned locks block further bulk ingests and is there a way to kill them?
When I run 'fate fail ' it states there is no fate transaction
associated with the transaction id.
Thanks in advance,
Matt