Re: [External] Re: locked fate threads

Eric Newton Thu, 05 Sep 2013 06:52:02 -0700

stop-all probably won't work.  I'm suggesting a cluster-wide kill of all
tablet servers:


$ pssh -h conf/slaves pkill -f =tserve[r]   # <--- requires parallel ssh to be installed

On the master host:

$ pkill -f =master

Wait for the master lock to expire (typically 30 seconds), and kill all the
fate transactions:

$ ./bin/accumulo org.apache.accumulo.server.fate.Admin kill "<txid>"

Then do a start-all and cross your fingers. :-)

-Eric
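
With a dozen stuck transactions, the kill step above can be scripted rather than typed once per txid. What follows is only a minimal sketch, assuming the "fate print" output was saved to a file before the master was killed. The names extract_txids and fate_print.txt are hypothetical and not part of Accumulo. The kill command in the commented loop is the one quoted above.

```shell
#!/bin/sh
# Print the txid of every transaction found in "fate print" output on stdin.
# Scans each line for the "txid:" label, so it accepts both raw shell output
# ("txid: ...") and mail-quoted output ("> txid: ...").
extract_txids() {
    awk '{
        for (i = 1; i < NF; i++) {
            if ($i == "txid:") { print $(i + 1); break }
        }
    }'
}

# Hypothetical usage, not executed here. fate_print.txt is an assumed file
# holding the saved "fate print" output.
#
# extract_txids < fate_print.txt | while read -r txid; do
#     ./bin/accumulo org.apache.accumulo.server.fate.Admin kill "$txid" < /dev/null
# done
```

Lines without a "txid:" label, such as the "locked:" continuation lines and the trailing transaction count, produce no output, so the loop sees only transaction ids.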


On Thu, Sep 5, 2013 at 9:27 AM, Losco, Jason [USA] <[email protected]> wrote:

>  Thanks for the quick response.  I issued the command to take those
> offline, however, they were locked up due to the other threads so it didn’t
> take.  How do I go about deleting those fate transactions?  Fate delete and
> fate fail do not work from the shell.  Are you suggesting a stop-all of
> accumulo, then running something using the actual AdminUtil class to kill
> those transactions?  Any input into how to kick off that process would be
> greatly appreciated.
>
> losco
>
> *From:* Eric Newton [mailto:[email protected]]
> *Sent:* Thursday, September 05, 2013 9:18 AM
> *To:* [email protected]
> *Subject:* [External] Re: locked fate threads
>
> I can't believe I posted a note about using deletemany on the !METADATA
> table!  That was pretty reckless of me.
>
> If you really deleted your table data doing this, and your table was
> online at the time, you need to restart your cluster.
>
> That alone might fix the problem.  Otherwise, you are going to need to
> kill the master, delete the fate transactions, restart the master, and
> properly delete the tables.
>
> -Eric
>
> On Thu, Sep 5, 2013 at 8:00 AM, Losco, Jason [USA] <[email protected]> wrote:
>
> I recently tried to remove some tables, during which a shell thread got
> stuck on an IO error.  A fate print plus some digging into the logs
> revealed the operations were stuck waiting on WAL resources.  I found a
> thread in which Eric Newton explained how to manually remove the tables by
> removing lines from the !METADATA table using “deletemany -c file”, then
> cleaning up /accumulo/tables/<id> in hdfs.  I’ve done that, however the
> fate threads are still locked and I am unable to delete or fail them.
> Additionally, the tables I removed from !METADATA and hdfs still appear in
> the list returned by the “tables” command in the shell.  Below is the
> result of a “fate print”.  To note, table ids a and b are the two which
> I’ve removed.
>
> test@c4s> fate print
> txid: 4136e024209602eb  status: IN_PROGRESS         op: ChangeTableState
> locked: []              locking: [W:b]           top: ChangeTableState
> txid: 439193592e93e230  status: IN_PROGRESS         op: TableRangeOp
> locked: []              locking: [W:b]           top: TableRangeOp
> txid: 1576dca47dfa2c65  status: IN_PROGRESS         op: TableRangeOp
> locked: []              locking: [W:b]           top: TableRangeOp
> txid: 3ee6232db200f2c7  status: IN_PROGRESS         op: TableRangeOp
> locked: []              locking: [W:b]           top: TableRangeOp
> txid: 19e5d3349679ff6e  status: IN_PROGRESS         op: TableRangeOp
> locked: [W:a]           locking: []              top: TableRangeOpWait
> txid: 29204be9d141dc88  status: IN_PROGRESS         op: TableRangeOp
> locked: []              locking: [W:b]           top: TableRangeOp
> txid: 7d07c50ceb5ac487  status: IN_PROGRESS         op: DeleteTable
> locked: []              locking: [W:b]           top: DeleteTable
> txid: 72895b4b1a5a1640  status: IN_PROGRESS         op: DeleteTable
> locked: []              locking: [W:b]           top: DeleteTable
> txid: 6902bcb06c4f5ae7  status: IN_PROGRESS         op: DeleteTable
> locked: []              locking: [W:b]           top: DeleteTable
> txid: 08db2316eb783ba1  status: IN_PROGRESS         op: TableRangeOp
> locked: []              locking: [W:b]           top: TableRangeOp
> txid: 6b0b135ca643b709  status: IN_PROGRESS         op: TableRangeOp
> locked: []              locking: [W:b]           top: TableRangeOp
> txid: 0e174c9af5092e54  status: IN_PROGRESS         op: TableRangeOp
> locked: [W:b]           locking: []              top: TableRangeOpWait
>
> 12 transactions
>
> Thanks in advance for your help.
>
> losco
>
