UNOFFICIAL

Is the xxx... the transaction id returned by 'fate.Admin print'?
What's involved with recreating a node?

Matt

________________________________
From: Eric Newton [mailto:[email protected]]
Sent: Friday, 21 February 2014 01:35
To: [email protected]
Subject: Re: Failing to BulkIngest [SEC=UNOFFICIAL]

You can use the zkCli.sh utility to "rmr" /accumulo/xx.../fate and /accumulo/xx.../table_locks, and then recreate those nodes.

-Eric

On Wed, Feb 19, 2014 at 5:58 PM, Dickson, Matt MR <[email protected]> wrote:

UNOFFICIAL

Thanks for your help on this, Eric. I've started deleting the transactions by running ./accumulo ...fate.Admin delete <txid>, and noticed this takes about 20 seconds per transaction. With 7,500 to delete this is going to take a long time (almost 2 days), so I tried running several threads, each with a separate range of ids to delete. Unfortunately this seemed to hit some contention and I kept receiving an InvocationTargetException:

Caused by zookeeper.KeeperException: KeeperErrorCode = NoNode for /accumulo/xxxxx-xxxx-xxxx-xxxx/table_locks/3n/lock-xxxxxx

When I go back to one thread this error disappears. Is there a better way to run this?

Thanks in advance,
Matt

________________________________
From: Eric Newton [mailto:[email protected]]
Sent: Wednesday, 19 February 2014 01:21
To: [email protected]
Subject: Re: Failing to BulkIngest [SEC=UNOFFICIAL]

The "LeaseExpiredException" is part of the recovery process. The master determines that a tablet server has lost its lock, or that it is unresponsive and has been halted, possibly indirectly by removing the lock. The master then steals the write lease on the WAL file, which causes future writes to the WALog to fail. The message you have seen is part of that failure. You should have seen a tablet server failure associated with this message on the machine with <ip>.

Having 50K FATE IN_PROGRESS lines is bad. That is what is preventing your bulk imports from getting run.
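Eric's zkCli.sh suggestion can be sketched as follows. This is a hedged dry run, not a verified procedure: the instance id and the quorum host are placeholders, and the commands are only printed, never sent to ZooKeeper. In zkCli.sh, "rmr" recursively deletes a znode and "create" recreates an empty one; you would want the Accumulo master stopped before actually removing these nodes.

```shell
# Placeholder instance id -- substitute the real uuid under /accumulo.
INSTANCE_ID="xxxx-xxxx-xxxx"
# Commands to feed to zkCli.sh: remove both nodes, then recreate them empty.
CMDS="rmr /accumulo/$INSTANCE_ID/fate
rmr /accumulo/$INSTANCE_ID/table_locks
create /accumulo/$INSTANCE_ID/fate ''
create /accumulo/$INSTANCE_ID/table_locks ''"
# Dry run: just show the commands.
printf '%s\n' "$CMDS"
# To execute for real (hypothetical quorum host):
#   printf '%s\n' "$CMDS" | ./zkCli.sh -server zkhost:2181
```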
Are there any lines that show locked: [W:3n]? The other FATE transactions are waiting to get a READ lock on table id 3n.

-Eric

On Sun, Feb 16, 2014 at 7:59 PM, Dickson, Matt MR <[email protected]> wrote:

UNOFFICIAL

Josh,

ZooKeeper - 3.4.5-cdh4.3.0
Accumulo - 1.5.0
Hadoop - cdh 4.3.0

In the Accumulo console we are getting:

ERROR RemoteException(...LeaseExpiredException): Lease mismatch on /accumulo/wal/<ip>+9997/<uid> owned by DFSClient_NONMAPREDUCE_699577321_12 but is accessed by DFSClient_NONMAPREDUCE_903051502_12

We can scan the table without issues and can load rows directly, i.e. not using bulk import.

A bit more information - we recently extended how we manage old tablets in the system. We load data by date, creating splits for each day, and then age data off using the ageoff filters. This leaves empty tablets, so we now merge these old tablets together to effectively remove them. I mention it because I'm not sure whether this might have introduced another issue.

Matt

-----Original Message-----
From: Josh Elser [mailto:[email protected]]
Sent: Monday, 17 February 2014 11:32
To: [email protected]
Subject: Re: Failing to BulkIngest [SEC=UNOFFICIAL]

Matt,

Can you provide Hadoop, ZK and Accumulo versions? Does the cluster appear to be functional otherwise (can you scan the table you're bulk importing to? any other errors on the monitor? etc.)

On 2/16/14, 7:07 PM, Dickson, Matt MR wrote:
> *UNOFFICIAL*
>
> I have a situation where bulk ingests are failing with a "Thread "shell"
> stuck on IO to xxx:9999:99999 ..." error.
> From the management console, the table we are loading to has no
> compactions running, yet we ran "./accumulo
> org.apache.accumulo.server.fate.Admin print" and can see 50,000 lines
> stating:
>
> txid: xxxx status: IN_PROGRESS op: CompactRange locked: [] locking: [R:3n] top: CompactRange
>
> Does this mean there are actually compactions running, or are these old
> compaction locks still hanging around that will be preventing the bulk
> ingest from running?
>
> Thanks in advance,
> Matt
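To answer Eric's "locked: [W:3n]" question without reading 50,000 lines by eye, saved fate.Admin print output can be filtered with grep and awk. A hedged sketch: the two sample lines and their txids below are invented to match the format quoted in the thread, and the delete commands are emitted as text (a dry run) so nothing touches a live instance. Since running the deletes in parallel contended on the ZooKeeper lock nodes, the generated commands should be piped into a single shell and run sequentially.

```shell
# Invented sample of "fate.Admin print" output, matching the quoted format.
cat > fate-print.txt <<'EOF'
txid: 1111 status: IN_PROGRESS op: CompactRange locked: [W:3n] locking: [] top: CompactRange
txid: 2222 status: IN_PROGRESS op: CompactRange locked: [] locking: [R:3n] top: CompactRange
EOF

# Transactions that actually HOLD the write lock on table 3n.
HOLDERS=$(grep 'locked: \[W:3n\]' fate-print.txt)
printf '%s\n' "$HOLDERS"

# One delete command per IN_PROGRESS txid, printed rather than executed;
# pipe the output into a single "sh" to run the deletes sequentially.
DELETES=$(awk '$1 == "txid:" && $4 == "IN_PROGRESS" {print "./bin/accumulo org.apache.accumulo.server.fate.Admin delete " $2}' fate-print.txt)
printf '%s\n' "$DELETES"
```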
