You can use the zkCli.sh utility to "rmr" /accumulo/xx.../fate and /accumulo/xx.../table_locks, and then recreate those nodes.
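For example, a rough sketch only - substitute your real instance id and ZooKeeper host for the placeholders below:

    zkCli.sh -server zkhost:2181
    rmr /accumulo/<instance-id>/fate
    rmr /accumulo/<instance-id>/table_locks
    create /accumulo/<instance-id>/fate ""
    create /accumulo/<instance-id>/table_locks ""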
-Eric

On Wed, Feb 19, 2014 at 5:58 PM, Dickson, Matt MR <[email protected]> wrote:

> *UNOFFICIAL*
>
> Thanks for your help on this Eric.
>
> I've started deleting the transactions by running ./accumulo
> ...fate.Admin delete <txid>, and notice this takes about 20 seconds per
> transaction. With 7500 to delete this is going to take a long time (almost
> 2 days), so I tried running several threads, each with a separate range of
> ids to delete. Unfortunately this seemed to have some contention and I
> kept receiving an InvocationTargetException .... Caused by
> zookeeper.KeeperException: KeeperErrorCode = noNode for
> /accumulo/xxxxx-xxxx-xxxx-xxxx/table_locks/3n/lock-xxxxxx
>
> When I go back to one thread this error disappears.
>
> Is there a better way to run this?
>
> Thanks in advance,
> Matt
>
> ------------------------------
> *From:* Eric Newton [mailto:[email protected]]
> *Sent:* Wednesday, 19 February 2014 01:21
> *To:* [email protected]
> *Subject:* Re: Failing to BulkIngest [SEC=UNOFFICIAL]
>
> The "LeaseExpiredException" is part of the recovery process. The master
> determines that a tablet server has lost its lock, or it is unresponsive
> and has been halted, possibly indirectly by removing the lock.
>
> The master then steals the write lease on the WAL file, which causes
> future writes to the WALog to fail. The message you have seen is part of
> that failure. You should have seen a tablet server failure associated with
> this message on the machine with <ip>.
>
> Having 50K FATE IN_PROGRESS lines is bad. That is preventing your bulk
> imports from getting run.
>
> Are there any lines that show locked: [W:3n]? The other FATE
> transactions are waiting to get a READ lock on table id 3n.
>
> -Eric
>
> On Sun, Feb 16, 2014 at 7:59 PM, Dickson, Matt MR <[email protected]> wrote:
>
>> UNOFFICIAL
>>
>> Josh,
>>
>> Zookeeper - 3.4.5-cdh4.3.0
>> Accumulo - 1.5.0
>> Hadoop - cdh 4.3.0
>>
>> In the Accumulo console we are getting:
>>
>> ERROR RemoteException(...LeaseExpiredException): Lease mismatch on
>> /accumulo/wal/<ip>+9997/<uid> owned by DFSClient_NONMAPREDUCE_699577321_12
>> but is accessed by DFSClient_NONMAPREDUCE_903051502_12
>>
>> We can scan the table without issues and can load rows directly, i.e. not
>> using bulk import.
>>
>> A bit more information - we recently extended how we manage old tablets
>> in the system. We load data by date, creating splits for each day, and then
>> age data off using the ageoff filters. This leaves empty tablets, so we now
>> merge these old tablets together to effectively remove them. I mention it
>> because I'm not sure whether this might have introduced another issue.
>>
>> Matt
>>
>> -----Original Message-----
>> From: Josh Elser [mailto:[email protected]]
>> Sent: Monday, 17 February 2014 11:32
>> To: [email protected]
>> Subject: Re: Failing to BulkIngest [SEC=UNOFFICIAL]
>>
>> Matt,
>>
>> Can you provide Hadoop, ZK and Accumulo versions? Does the cluster appear
>> to be functional otherwise (can you scan the table you're bulk importing
>> to? any other errors on the monitor? etc.)
>>
>> On 2/16/14, 7:07 PM, Dickson, Matt MR wrote:
>> > *UNOFFICIAL*
>> >
>> > I have a situation where bulk ingests are failing with a "Thread "shell"
>> > stuck on IO to xxx:9999:99999 ..."
>> > From the management console the table we are loading to has no
>> > compactions running, yet we ran "./accumulo
>> > org.apache.accumulo.server.fate.Admin print" and can see 50,000 lines
>> > stating
>> > txid: xxxx status:IN_PROGRESS op: CompactRange locked: []
>> > locking: [R:3n] top: Compact:Range
>> >
>> > Does this mean there are actually compactions running, or old
>> > compaction locks still hanging around that will be preventing the
>> > bulk ingest from running?
>> > Thanks in advance,
>> > Matt
>> >
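
P.S. If you would rather script the per-transaction deletes instead of removing the znodes, keeping them in a single sequential loop avoids the table_locks contention you hit with multiple threads. A rough sketch (txids.txt is just an assumed file holding one transaction id per line):

    for tx in $(cat txids.txt); do
      ./accumulo org.apache.accumulo.server.fate.Admin delete $tx
    done

At ~20 seconds per transaction it will still be slow, which is why clearing the fate and table_locks nodes in one go is the quicker option.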
