Re: data loss around splits when tserver goes down

Josh Elser Sun, 26 Jan 2014 14:57:49 -0800

Just because the error message is the same doesn't mean that the root cause is also the same.

Without looking more into Eric's changes, I'm not sure if ACCUMULO-2057 would also affect 1.5.0. We're usually pretty good about checking backwards when bugs are found in newer versions, but things slip through the cracks, too.


On 1/26/2014 5:09 PM, Anthony F wrote:

This is pretty much the issue:

https://issues.apache.org/jira/browse/ACCUMULO-2057

Slightly different error message but it's a different version.  Looks
like it's fixed in 1.6.0.  I'll probably need to upgrade.


On Sun, Jan 26, 2014 at 4:47 PM, Anthony F <[email protected]
<mailto:[email protected]>> wrote:

    Thanks, I'll check Jira.  As for versions, Hadoop 2.2.0, Zk 3.4.5,
    CentOS 64bit (kernel 2.6.32-431.el6.x86_64).  Has much testing been
    done using Hadoop 2.2.0?  I tried Hadoop 2.0.0 (CDH 4.5.0) but ran
    into HDFS-5225/5031 which basically makes it a non-starter.


    On Sun, Jan 26, 2014 at 4:29 PM, Josh Elser <[email protected]
    <mailto:[email protected]>> wrote:

        I meant to reply to your original email, but I didn't yet, sorry.

        First off, if Accumulo is reporting that it found multiple
        locations for the same extent, this is a (very bad) bug in
        Accumulo. It might be worth looking at tickets that are marked as
        "affects 1.5.0" and "fixed in 1.5.1" on Jira. It's likely that
        we've already encountered and fixed the issue, but, if you can't
        find a fix that was already made, we don't want to overlook the
        potential need for one.

        For both "live" and "bulk" ingest, *neither* should lose any
        data. This is one thing that Accumulo should never be doing. If
        you have multiple locations for an extent, it seems plausible to
        me that you would run into data loss. However, you should focus
        on trying to determine why you keep running into multiple
        locations for a tablet.

        After you take a look at Jira, I would likely go ahead and file
        a jira to track this since it's easier to follow than an email
        thread. Be sure to note if there is anything notable about your
        installation (did you download it directly from the
        accumulo.apache.org <http://accumulo.apache.org> site?). You
        should also include what OS and version and what Hadoop and
        ZooKeeper versions you are running.


        On 1/26/2014 4:10 PM, Anthony F wrote:

            I have observed a loss of data when tservers fail during bulk
            ingest.  The keys that are missing are right around the table's
            splits indicating that data was lost when a tserver died
            during a split.  I am using Accumulo 1.5.0.  At around the same
            time, I observe the master logging a message about "Found two
            locations for the same extent".  Can anyone shed light on this
            behavior?  Are tserver failures during bulk ingest supposed to
            be fault tolerant?
