You can use the zkCli.sh utility to "rmr" /accumulo/xx.../fate and /accumulo/xx.../table_locks, and then recreate those nodes.
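For example, a rough sketch only - substitute your real instance id and ZooKeeper host for the placeholders below:

    zkCli.sh -server zkhost:2181
    rmr /accumulo/<instance-id>/fate
    rmr /accumulo/<instance-id>/table_locks
    create /accumulo/<instance-id>/fate ""
    create /accumulo/<instance-id>/table_locks ""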
-Eric

On Wed, Feb 19, 2014 at 5:58 PM, Dickson, Matt MR <[email protected]> wrote:

> *UNOFFICIAL*
>
> Thanks for your help on this Eric.
>
> I've started deleting the transactions by running ./accumulo
> ...fate.Admin delete <txid>, and notice this takes about 20 seconds per
> transaction. With 7500 to delete this is going to take a long time (almost
> 2 days), so I tried running several threads, each with a separate range of
> ids to delete. Unfortunately this seemed to have some contention and I
> kept receiving an InvocationTargetException .... Caused by
> zookeeper.KeeperException: KeeperErrorCode = noNode for
> /accumulo/xxxxx-xxxx-xxxx-xxxx/table_locks/3n/lock-xxxxxx
>
> When I go back to one thread this error disappears.
>
> Is there a better way to run this?
>
> Thanks in advance,
> Matt
>
> ------------------------------
> *From:* Eric Newton [mailto:[email protected]]
> *Sent:* Wednesday, 19 February 2014 01:21
> *To:* [email protected]
> *Subject:* Re: Failing to BulkIngest [SEC=UNOFFICIAL]
>
> The "LeaseExpiredException" is part of the recovery process. The master
> determines that a tablet server has lost its lock, or it is unresponsive
> and has been halted, possibly indirectly by removing the lock.
>
> The master then steals the write lease on the WAL file, which causes
> future writes to the WALog to fail. The message you have seen is part of
> that failure. You should have seen a tablet server failure associated with
> this message on the machine with <ip>.
>
> Having 50K FATE IN_PROGRESS lines is bad. That is preventing your bulk
> imports from getting run.
>
> Are there any lines that show locked: [W:3n]? The other FATE
> transactions are waiting to get a READ lock on table id 3n.
>
> -Eric
>
> On Sun, Feb 16, 2014 at 7:59 PM, Dickson, Matt MR <[email protected]> wrote:
>
>> UNOFFICIAL
>>
>> Josh,
>>
>> Zookeeper - 3.4.5-cdh4.3.0
>> Accumulo - 1.5.0
>> Hadoop - cdh 4.3.0
>>
>> In the Accumulo console we are getting:
>>
>> ERROR RemoteException(...LeaseExpiredException): Lease mismatch on
>> /accumulo/wal/<ip>+9997/<uid> owned by DFSClient_NONMAPREDUCE_699577321_12
>> but is accessed by DFSClient_NONMAPREDUCE_903051502_12
>>
>> We can scan the table without issues and can load rows directly, i.e. not
>> using bulk import.
>>
>> A bit more information - we recently extended how we manage old tablets
>> in the system. We load data by date, creating splits for each day, and then
>> age data off using the ageoff filters. This leaves empty tablets, so we now
>> merge these old tablets together to effectively remove them. I mention it
>> because I'm not sure whether this might have introduced another issue.
>>
>> Matt
>>
>> -----Original Message-----
>> From: Josh Elser [mailto:[email protected]]
>> Sent: Monday, 17 February 2014 11:32
>> To: [email protected]
>> Subject: Re: Failing to BulkIngest [SEC=UNOFFICIAL]
>>
>> Matt,
>>
>> Can you provide Hadoop, ZK and Accumulo versions? Does the cluster appear
>> to be functional otherwise (can you scan the table you're bulk importing
>> to? any other errors on the monitor? etc.)
>>
>> On 2/16/14, 7:07 PM, Dickson, Matt MR wrote:
>> > *UNOFFICIAL*
>> >
>> > I have a situation where bulk ingests are failing with a "Thread "shell"
>> > stuck on IO to xxx:9999:99999 ..."
>> > From the management console the table we are loading to has no
>> > compactions running, yet we ran "./accumulo
>> > org.apache.accumulo.server.fate.Admin print" and can see 50,000 lines
>> > stating
>> > txid: xxxx status:IN_PROGRESS op: CompactRange locked: []
>> > locking: [R:3n] top: Compact:Range
>> >
>> > Does this mean there are actually compactions running, or old
>> > compaction locks still hanging around that will be preventing the
>> > bulk ingest from running?
>> > Thanks in advance,
>> > Matt
>> >
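
P.S. If you would rather script the per-transaction deletes instead of removing the znodes, keeping them in a single sequential loop avoids the table_locks contention you hit with multiple threads. A rough sketch (txids.txt is just an assumed file holding one transaction id per line):

    for tx in $(cat txids.txt); do
      ./accumulo org.apache.accumulo.server.fate.Admin delete $tx
    done

At ~20 seconds per transaction it will still be slow, which is why clearing the fate and table_locks nodes in one go is the quicker option.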
