No, xxx... is your instance id. You can find it at the top of the monitor page. It's the ugly UUID there.
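So the zkCli.sh steps from the earlier mail would look roughly like this, with that UUID in place of the xx... (the ZooKeeper host, port and the empty node data are just placeholders for your own setup):

  $ zkCli.sh -server zkhost:2181
  rmr /accumulo/<instance-uuid>/fate
  rmr /accumulo/<instance-uuid>/table_locks
  create /accumulo/<instance-uuid>/fate ""
  create /accumulo/<instance-uuid>/table_locks ""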
-Eric

On Thu, Feb 20, 2014 at 3:26 PM, Dickson, Matt MR <[email protected]> wrote:

> UNOFFICIAL
>
> Is the xxx... the transaction id returned by the 'fate.Admin print'?
>
> What's involved with recreating a node?
>
> Matt
>
> ------------------------------
> From: Eric Newton [mailto:[email protected]]
> Sent: Friday, 21 February 2014 01:35
> To: [email protected]
> Subject: Re: Failing to BulkIngest [SEC=UNOFFICIAL]
>
> You can use the zkCli.sh utility to "rmr" /accumulo/xx.../fate and
> /accumulo/xx.../table_locks, and then recreate those nodes.
>
> -Eric
>
> On Wed, Feb 19, 2014 at 5:58 PM, Dickson, Matt MR <[email protected]> wrote:
>
>> UNOFFICIAL
>>
>> Thanks for your help on this, Eric.
>>
>> I've started deleting the transactions by running ./accumulo
>> ...fate.Admin delete <txid>, and notice this takes about 20 seconds per
>> transaction. With 7500 to delete this is going to take a long time (almost
>> 2 days), so I tried running several threads, each with a separate range of
>> ids to delete. Unfortunately this seemed to have some contention and I
>> kept receiving an InvocationTargetException .... Caused by
>> zookeeper.KeeperException: KeeperErrorCode = NoNode for
>> /accumulo/xxxxx-xxxx-xxxx-xxxx/table_locks/3n/lock-xxxxxx
>>
>> When I go back to one thread this error disappears.
>>
>> Is there a better way to run this?
>>
>> Thanks in advance,
>> Matt
>>
>> ------------------------------
>> From: Eric Newton [mailto:[email protected]]
>> Sent: Wednesday, 19 February 2014 01:21
>> To: [email protected]
>> Subject: Re: Failing to BulkIngest [SEC=UNOFFICIAL]
>>
>> The "LeaseExpiredException" is part of the recovery process. The
>> master determines that a tablet server has lost its lock, or that it is
>> unresponsive and has been halted, possibly indirectly by removing the lock.
>>
>> The master then steals the write lease on the WAL file, which causes
>> future writes to the WALog to fail. The message you have seen is part of
>> that failure. You should have seen a tablet server failure associated with
>> this message on the machine with <ip>.
>>
>> Having 50K FATE IN_PROGRESS lines is bad. That is preventing your bulk
>> imports from getting run.
>>
>> Are there any lines that show locked: [W:3n]? The other FATE
>> transactions are waiting to get a READ lock on table id 3n.
>>
>> -Eric
>>
>> On Sun, Feb 16, 2014 at 7:59 PM, Dickson, Matt MR <[email protected]> wrote:
>>
>>> UNOFFICIAL
>>>
>>> Josh,
>>>
>>> ZooKeeper - 3.4.5-cdh4.3.0
>>> Accumulo - 1.5.0
>>> Hadoop - cdh 4.3.0
>>>
>>> In the Accumulo console we are getting:
>>>
>>> ERROR RemoteException(...LeaseExpiredException): Lease mismatch on
>>> /accumulo/wal/<ip>+9997/<uid> owned by DFSClient_NONMAPREDUCE_699577321_12
>>> but is accessed by DFSClient_NONMAPREDUCE_903051502_12
>>>
>>> We can scan the table without issues and can load rows directly, i.e. not
>>> using bulk import.
>>>
>>> A bit more information - we recently extended how we manage old tablets
>>> in the system. We load data by date, creating splits for each day, and then
>>> age data off using the ageoff filters. This leaves empty tablets, so we now
>>> merge these old tablets together to effectively remove them. I mention it
>>> because I'm not sure if this might have introduced another issue.
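>>> Roughly, that merge step is just the shell's merge command run over each
>>> expired day's range, something like this (table name and rows below are
>>> only placeholders):
>>>
>>>   root@instance> merge -t mytable -b 20140101 -e 20140102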
>>> >>> Matt >>> >>> -----Original Message----- >>> From: Josh Elser [mailto:[email protected]] >>> Sent: Monday, 17 February 2014 11:32 >>> To: [email protected] >>> Subject: Re: Failing to BulkIngest [SEC=UNOFFICIAL] >>> >>> Matt, >>> >>> Can you provide Hadoop, ZK and Accumulo versions? Does the cluster >>> appear to be functional otherwise (can you scan that table you're bulk >>> importing to? any other errors on the monitor? etc) >>> >>> On 2/16/14, 7:07 PM, Dickson, Matt MR wrote: >>> > *UNOFFICIAL* >>> > >>> > I have a situation where bulk ingests are failing with a "Thread >>> "shell" >>> > stuck on IO to xxx:9999:99999 ... >>> > From the management console the table we are loading to has no >>> > compactions running, yet we ran "./accumulo >>> > org.apache.accumulo.server.fate.Admin print and can see 50,000 lines >>> > stating >>> > txid: xxxx status:IN_PROGRESS op: CompactRange locked: [] >>> > locking: [R:3n] top: Compact:Range >>> > Does this mean there are actually compactions running or old >>> > comapaction locks still hanging around that will be preventing the >>> builk ingest to run? >>> > Thanks in advance, >>> > Matt >>> >> >> >
