stop-all probably won't work. I'm suggesting a cluster-wide kill of all tablet servers:

$ pssh -h conf/slaves pkill -f =tserve[r]   # <--- requires parallel ssh to be installed

On the master host:

$ pkill -f =master

Wait for the master lock to expire (typically 30 seconds), and kill all
the fate transactions:

$ ./bin/accumulo org.apache.accumulo.server.fate.Admin kill "<txid>"

Then do a start-all and cross your fingers. :-)

-Eric

On Thu, Sep 5, 2013 at 9:27 AM, Losco, Jason [USA] <[email protected]> wrote:

> Thanks for the quick response. I issued the command to take those
> offline, however, they were locked up due to the other threads so it
> didn’t take. How do I go about deleting those fate transactions? Fate
> delete and fate fail do not work from the shell. Are you suggesting a
> stop-all of accumulo, then running something using the actual AdminUtil
> class to kill those transactions? Any input into how to kick off that
> process would be greatly appreciated.
>
> losco
>
> From: Eric Newton [mailto:[email protected]]
> Sent: Thursday, September 05, 2013 9:18 AM
> To: [email protected]
> Subject: [External] Re: locked fate threads
>
> I can't believe I posted a note about using deletemany on the !METADATA
> table! That was pretty reckless of me.
>
> If you really deleted your table data doing this, and your table was
> online at the time, you need to restart your cluster.
>
> That alone might fix the problem. Otherwise, you are going to need to
> kill the master, delete the fate transactions, restart the master, and
> properly delete the tables.
>
> -Eric
>
> On Thu, Sep 5, 2013 at 8:00 AM, Losco, Jason [USA] <[email protected]> wrote:
>
> I recently tried to remove some tables, during which I was getting a
> shell thread stuck on IO error. A fate print plus some digging into the
> logs revealed they were stuck waiting on WAL resources. I found a thread
> in which Eric Newton explained how to manually remove the tables
> removing lines from the !METADATA table using “deletemany -c file,” then
> cleaning up the /accumulo/tables/<id> in hdfs. I’ve done that, however
> the fate threads are still locked and I am unable to delete or fail
> them. Additionally, the tables I removed from !METADATA and hdfs still
> appear in the list returned by the “tables” command in shell. Below is
> the result of a “fate print.” To note, tables id a and b are the two
> which I’ve removed.
>
> test@c4s> fate print
> txid: 4136e024209602eb  status: IN_PROGRESS  op: ChangeTableState  locked: []     locking: [W:b]  top: ChangeTableState
> txid: 439193592e93e230  status: IN_PROGRESS  op: TableRangeOp      locked: []     locking: [W:b]  top: TableRangeOp
> txid: 1576dca47dfa2c65  status: IN_PROGRESS  op: TableRangeOp      locked: []     locking: [W:b]  top: TableRangeOp
> txid: 3ee6232db200f2c7  status: IN_PROGRESS  op: TableRangeOp      locked: []     locking: [W:b]  top: TableRangeOp
> txid: 19e5d3349679ff6e  status: IN_PROGRESS  op: TableRangeOp      locked: [W:a]  locking: []     top: TableRangeOpWait
> txid: 29204be9d141dc88  status: IN_PROGRESS  op: TableRangeOp      locked: []     locking: [W:b]  top: TableRangeOp
> txid: 7d07c50ceb5ac487  status: IN_PROGRESS  op: DeleteTable       locked: []     locking: [W:b]  top: DeleteTable
> txid: 72895b4b1a5a1640  status: IN_PROGRESS  op: DeleteTable       locked: []     locking: [W:b]  top: DeleteTable
> txid: 6902bcb06c4f5ae7  status: IN_PROGRESS  op: DeleteTable       locked: []     locking: [W:b]  top: DeleteTable
> txid: 08db2316eb783ba1  status: IN_PROGRESS  op: TableRangeOp      locked: []     locking: [W:b]  top: TableRangeOp
> txid: 6b0b135ca643b709  status: IN_PROGRESS  op: TableRangeOp      locked: []     locking: [W:b]  top: TableRangeOp
> txid: 0e174c9af5092e54  status: IN_PROGRESS  op: TableRangeOp      locked: [W:b]  locking: []     top: TableRangeOpWait
> 12 transactions
>
> Thanks in advance for your help.
>
> losco
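
For the kill step in Eric's reply at the top of this thread, the command has to be repeated once per transaction id. The following is a rough sketch of how the ids might be pulled out of a saved "fate print" listing and turned into one kill command each. It assumes the listing has the format quoted above (each transaction line starting with "txid: <hex id>") and reuses the Admin class invocation exactly as Eric wrote it. The helper names list_txids and print_kill_commands are hypothetical and not part of Accumulo. The script only prints the commands; it does not run them.

```shell
# Sketch only: list_txids and print_kill_commands are hypothetical helper
# names, not part of Accumulo.

# Extract transaction ids from saved "fate print" output read on stdin.
# Assumes each transaction line begins with "txid: <hex id>", as in the
# listing quoted in this thread. Prompt lines and the trailing
# "N transactions" summary do not match and are skipped.
list_txids() {
    sed -n 's/^[[:space:]]*txid:[[:space:]]*\([0-9a-fA-F][0-9a-fA-F]*\).*/\1/p'
}

# Print (do not execute) one kill command per transaction id, using the
# Admin class invocation from Eric's message.
print_kill_commands() {
    list_txids | while read -r txid; do
        printf './bin/accumulo org.apache.accumulo.server.fate.Admin kill "%s"\n' "$txid"
    done
}
```

With the listing saved to a file (the name fate-print.txt is just an example), "print_kill_commands < fate-print.txt" would emit the commands for review. Printing rather than executing seems the safer choice here, since the kills should only be run after the tablet servers and master are down and the master lock has expired.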
