https://issues.apache.org/jira/browse/ACCUMULO-2261
I'll make my comments over on the ticket. Thanks for reporting!

-Eric

On Mon, Jan 27, 2014 at 9:09 AM, Anthony F <[email protected]> wrote:

> Eric, I'm currently digging through the logs and will report back. Keep
> in mind, I'm using Accumulo 1.5.0 on a Hadoop 2.2.0 stack. To determine
> data loss, I have a 'ConsistencyCheckingIterator' that verifies each row
> has the expected data (it takes a long time to scan the whole table).
> Below is a quick summary of what happened. The tablet in question is
> "d;72~gcm~201304". Notice that it is assigned to
> 192.168.2.233:9997[343bc1fa155242c] at 2014-01-25 09:49:36,233. At
> 2014-01-25 09:49:54,141, the tserver goes away. Then, the tablet gets
> assigned to 192.168.2.223:9997[143bc1f14412432] and shortly after that,
> I see the BadLocationStateException. The master never recovers from the
> BLSE - I have to manually delete one of the offending locations.
>
> 2014-01-25 09:49:36,233 [master.Master] DEBUG: Normal Tablets assigning
>   tablet d;72~gcm~201304;72=192.168.2.233:9997[343bc1fa155242c]
> 2014-01-25 09:49:36,233 [master.Master] DEBUG: Normal Tablets assigning
>   tablet p;18~thm~2012101;18=192.168.2.233:9997[343bc1fa155242c]
> 2014-01-25 09:49:54,141 [master.Master] WARN : Lost servers
>   [192.168.2.233:9997[343bc1fa155242c]]
> 2014-01-25 09:49:56,866 [master.Master] DEBUG: 42 assigned to dead servers:
>   [d;03~u36~201302;03~thm~2012091@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;06~u36~2013;06~thm~2012083@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;25;24~u36~2013@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;25~u36~201303;25~thm~201209@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;27~gcm~2013041;27@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;30~u36~2013031;30~thm~2012082@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;34~thm;34~gcm~2013022@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;39~thm~20121;39~gcm~20130418@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;41~thm;41~gcm~2013041@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;42~u36~201304;42~thm~20121@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;45~thm~201208;45~gcm~201303@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;48~gcm~2013052;48@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;60~u36~2013021;60~thm~20121@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;68~gcm~2013041;68@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;72;71~u36~2013@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;72~gcm~201304;72@(192.168.2.233:9997[343bc1fa155242c],null,null),
>   d;75~thm~2012101;75~gcm~2013032@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;78;77~u36~201305@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;90~u36~2013032;90~thm~2012092@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;91~thm;91~gcm~201304@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   d;93~u36~2013012;93~thm~20121@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   m;20;19@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   m;38;37@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   m;51;50@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   m;60;59@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   m;92;91@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   o;01<@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   o;04;03@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   o;50;49@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   o;63;62@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   o;74;73@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   o;97;96@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   p;08~thm~20121;08@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   p;09~thm~20121;09@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   p;10;09~thm~20121@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   p;18~thm~2012101;18@(192.168.2.233:9997[343bc1fa155242c],null,null),
>   p;21;20~thm~201209@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   p;22~thm~2012091;22@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   p;23;22~thm~2012091@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   p;41~thm~2012111;41@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   p;42;41~thm~2012111@(null,192.168.2.233:9997[343bc1fa155242c],null),
>   p;58~thm~201208;58@(null,192.168.2.233:9997[343bc1fa155242c],null)]...
> 2014-01-25 09:49:59,706 [master.Master] DEBUG: Normal Tablets assigning
>   tablet d;72~gcm~201304;72=192.168.2.223:9997[143bc1f14412432]
> 2014-01-25 09:50:13,515 [master.EventCoordinator] INFO : tablet
>   d;72~gcm~201304;72 was loaded on 192.168.2.223:9997
> 2014-01-25 09:51:20,058 [state.MetaDataTableScanner] ERROR:
>   java.lang.RuntimeException:
>   org.apache.accumulo.server.master.state.TabletLocationState$BadLocationStateException:
>   found two locations for the same extent d;72~gcm~201304:
>   192.168.2.223:9997[143bc1f14412432] and 192.168.2.233:9997[343bc1fa155242c]
> java.lang.RuntimeException:
>   org.apache.accumulo.server.master.state.TabletLocationState$BadLocationStateException:
>   found two locations for the same extent d;72~gcm~201304:
>   192.168.2.223:9997[143bc1f14412432] and 192.168.2.233:9997[343bc1fa155242c]
>
>
> On Mon, Jan 27, 2014 at 8:53 AM, Eric Newton <[email protected]> wrote:
>
>> Having two "last" locations... is annoying, and useless. Having two
>> "loc" locations is disastrous. We do a *lot* of testing that verifies
>> that data is not lost, with live ingest and with bulk ingest, and just
>> about every other condition you can imagine. Presently, this testing is
>> being done by me for 1.6.0 on Hadoop 2.2.0 and ZK 3.4.5.
>>
>> If you can provide any of the following, it would be helpful:
>>
>> * an automated test case that demonstrates the problem
>> * logs that document what happened
>> * a description of the *exact* things you did to detect data loss
>>
>> Please don't use the approximate counts displayed on the monitor pages
>> to confirm ingest. These are known to be incorrect with both bulk
>> ingested data and right after splits. The data is there, but the counts
>> are just estimates.
>>
>> If you find you have verified data loss, please open a ticket, and
>> provide as many details as you can, even if it does not happen
>> consistently.
>>
>> Thanks!
>>
>> -Eric
>>
>>
>> On Mon, Jan 27, 2014 at 7:57 AM, Anthony F <[email protected]> wrote:
>>
>>> I took a look in the code . . . the stack trace is not quite the same.
>>> In 1.6.0, the fixed issue related to METADATA_LAST_LOCATION_COLUMN_FAMILY.
>>> The issue I am seeing (in 1.5.0) is related to
>>> METADATA_CURRENT_LOCATION_COLUMN_FAMILY (line 144).
>>>
>>>
>>> On Sun, Jan 26, 2014 at 7:00 PM, Anthony F <[email protected]> wrote:
>>>
>>>> The stack trace is pretty close and the steps to reproduce match the
>>>> scenario in which I observed the issue. But there's no fix (in Jira)
>>>> against 1.5.0, just 1.6.0.
>>>>
>>>>
>>>> On Sun, Jan 26, 2014 at 5:56 PM, Josh Elser <[email protected]> wrote:
>>>>
>>>>> Just because the error message is the same doesn't mean that the root
>>>>> cause is also the same.
>>>>>
>>>>> Without looking more into Eric's changes, I'm not sure if
>>>>> ACCUMULO-2057 would also affect 1.5.0. We're usually pretty good about
>>>>> checking backwards when bugs are found in newer versions, but things
>>>>> slip through the cracks, too.
>>>>>
>>>>>
>>>>> On 1/26/2014 5:09 PM, Anthony F wrote:
>>>>>
>>>>>> This is pretty much the issue:
>>>>>>
>>>>>> https://issues.apache.org/jira/browse/ACCUMULO-2057
>>>>>>
>>>>>> Slightly different error message, but it's a different version. Looks
>>>>>> like it's fixed in 1.6.0. I'll probably need to upgrade.
>>>>>>
>>>>>>
>>>>>> On Sun, Jan 26, 2014 at 4:47 PM, Anthony F <[email protected]> wrote:
>>>>>>
>>>>>> Thanks, I'll check Jira. As for versions: Hadoop 2.2.0, ZooKeeper
>>>>>> 3.4.5, CentOS 64-bit (kernel 2.6.32-431.el6.x86_64). Has much testing
>>>>>> been done using Hadoop 2.2.0? I tried Hadoop 2.0.0 (CDH 4.5.0) but ran
>>>>>> into HDFS-5225/5031, which basically makes it a non-starter.
>>>>>>
>>>>>>
>>>>>> On Sun, Jan 26, 2014 at 4:29 PM, Josh Elser <[email protected]> wrote:
>>>>>>
>>>>>> I meant to reply to your original email, but I didn't yet, sorry.
>>>>>>
>>>>>> First off, if Accumulo is reporting that it found multiple locations
>>>>>> for the same extent, this is a (very bad) bug in Accumulo. It might
>>>>>> be worth looking at tickets that are marked as "affects 1.5.0" and
>>>>>> "fixed in 1.5.1" on Jira. It's likely that we've already encountered
>>>>>> and fixed the issue, but, if you can't find a fix that was already
>>>>>> made, we don't want to overlook the potential need for one.
>>>>>>
>>>>>> For both "live" and "bulk" ingest, *neither* should lose any data.
>>>>>> This is one thing that Accumulo should never be doing. If you have
>>>>>> multiple locations for an extent, it seems plausible to me that you
>>>>>> would run into data loss. However, you should focus on trying to
>>>>>> determine why you keep running into multiple locations for a tablet.
>>>>>>
>>>>>> After you take a look at Jira, I would likely go ahead and file a
>>>>>> jira to track this since it's easier to follow than an email thread.
>>>>>> Be sure to note if there is anything notable about your installation
>>>>>> (did you download it directly from the accumulo.apache.org site)? You
>>>>>> should also include what OS and version and what Hadoop and ZooKeeper
>>>>>> versions you are running.
>>>>>>
>>>>>>
>>>>>> On 1/26/2014 4:10 PM, Anthony F wrote:
>>>>>>
>>>>>> I have observed a loss of data when tservers fail during bulk ingest.
>>>>>> The keys that are missing are right around the table's splits,
>>>>>> indicating that data was lost when a tserver died during a split. I
>>>>>> am using Accumulo 1.5.0. At around the same time, I observe the
>>>>>> master logging a message about "Found two locations for the same
>>>>>> extent". Can anyone shed light on this behavior? Are tserver failures
>>>>>> during bulk ingest supposed to be fault tolerant?
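
A side note on the manual cleanup Anthony describes ("I have to manually
delete one of the offending locations"): the sketch below is one way one
might locate tablets in that state with the 1.5.x Java client, not a tool
from this thread. It scans the "!METADATA" table's current-location ("loc")
column family and flags any tablet row that carries more than one entry.
The instance name, ZooKeeper quorum, credentials, and class name are
placeholders.

import java.util.Map.Entry;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.client.ZooKeeperInstance;
import org.apache.accumulo.core.client.security.tokens.PasswordToken;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

public class FindDuplicateTabletLocations {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details -- substitute your own.
        Connector conn = new ZooKeeperInstance("myInstance", "zk1:2181,zk2:2181")
                .getConnector("root", new PasswordToken("secret"));

        // "!METADATA" is the metadata table name in 1.5.x; "loc" is the
        // current-location column family the BadLocationStateException is about.
        Scanner scanner = conn.createScanner("!METADATA", new Authorizations());
        scanner.fetchColumnFamily(new Text("loc"));

        Text currentRow = null;
        int locCount = 0;
        for (Entry<Key, Value> entry : scanner) {
            Text row = entry.getKey().getRow();
            if (currentRow == null || !row.equals(currentRow)) {
                currentRow = new Text(row);
                locCount = 0;
            }
            locCount++;
            if (locCount > 1) {
                // column qualifier = tserver session id, value = host:port
                System.out.println("extra loc entry for tablet " + row + ": "
                        + entry.getKey().getColumnQualifier() + " = " + entry.getValue());
            }
        }
    }
}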

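Anthony's 'ConsistencyCheckingIterator' itself is not shown anywhere in the
thread. As a rough, hypothetical stand-in for that kind of verification, the
sketch below does a client-side pass that walks a table row by row and
reports rows missing an expected column qualifier. The table name, expected
qualifier, and class name are placeholders, and a real check would compare
against the full set of keys that were ingested (this pass cannot notice
rows that are missing entirely).

import java.util.Iterator;
import java.util.Map.Entry;

import org.apache.accumulo.core.client.Connector;
import org.apache.accumulo.core.client.RowIterator;
import org.apache.accumulo.core.client.Scanner;
import org.apache.accumulo.core.data.Key;
import org.apache.accumulo.core.data.Value;
import org.apache.accumulo.core.security.Authorizations;
import org.apache.hadoop.io.Text;

public class RowCompletenessCheck {

    // Returns the number of rows that exist but lack the expected qualifier.
    public static long check(Connector conn, String table, Text expectedQualifier)
            throws Exception {
        Scanner scanner = conn.createScanner(table, new Authorizations());
        RowIterator rows = new RowIterator(scanner);

        long incompleteRows = 0;
        while (rows.hasNext()) {
            Iterator<Entry<Key, Value>> row = rows.next();
            Text rowId = null;
            boolean found = false;
            while (row.hasNext()) {
                Entry<Key, Value> e = row.next();
                if (rowId == null) {
                    rowId = new Text(e.getKey().getRow());
                }
                if (expectedQualifier.equals(e.getKey().getColumnQualifier())) {
                    found = true;
                }
            }
            if (!found) {
                incompleteRows++;
                System.out.println("row " + rowId + " is missing " + expectedQualifier);
            }
        }
        return incompleteRows;
    }
}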