I took a look at the code... the stack trace is not quite the same. In 1.6.0, the fixed issue was related to METADATA_LAST_LOCATION_COLUMN_FAMILY. The issue I am seeing (in 1.5.0) is related to METADATA_CURRENT_LOCATION_COLUMN_FAMILY (line 144).

On Sun, Jan 26, 2014 at 7:00 PM, Anthony F <[email protected]> wrote:

> The stack trace is pretty close and the steps to reproduce match the
> scenario in which I observed the issue. But there's no fix (in Jira)
> against 1.5.0, just 1.6.0.
>
>
> On Sun, Jan 26, 2014 at 5:56 PM, Josh Elser <[email protected]> wrote:
>
>> Just because the error message is the same doesn't mean that the root
>> cause is also the same.
>>
>> Without looking more into Eric's changes, I'm not sure if ACCUMULO-2057
>> would also affect 1.5.0. We're usually pretty good about checking backwards
>> when bugs are found in newer versions, but things slip through the cracks,
>> too.
>>
>>
>> On 1/26/2014 5:09 PM, Anthony F wrote:
>>
>>> This is pretty much the issue:
>>>
>>> https://issues.apache.org/jira/browse/ACCUMULO-2057
>>>
>>> Slightly different error message but it's a different version. Looks
>>> like it's fixed in 1.6.0. I'll probably need to upgrade.
>>>
>>>
>>> On Sun, Jan 26, 2014 at 4:47 PM, Anthony F <[email protected]> wrote:
>>>
>>>     Thanks, I'll check Jira. As for versions, Hadoop 2.2.0, Zk 3.4.5,
>>>     CentOS 64bit (kernel 2.6.32-431.el6.x86_64). Has much testing been
>>>     done using Hadoop 2.2.0? I tried Hadoop 2.0.0 (CDH 4.5.0) but ran
>>>     into HDFS-5225/5031 which basically makes it a non-starter.
>>>
>>>
>>>     On Sun, Jan 26, 2014 at 4:29 PM, Josh Elser <[email protected]> wrote:
>>>
>>>         I meant to reply to your original email, but I didn't yet, sorry.
>>>
>>>         First off, if Accumulo is reporting that it found multiple
>>>         locations for the same extent, this is a (very bad) bug in
>>>         Accumulo. It might be worth looking at tickets that are marked as
>>>         "affects 1.5.0" and "fixed in 1.5.1" on Jira. It's likely that
>>>         we've already encountered and fixed the issue, but, if you can't
>>>         find a fix that was already made, we don't want to overlook the
>>>         potential need for one.
>>>
>>>         For both "live" and "bulk" ingest, *neither* should lose any
>>>         data. This is one thing that Accumulo should never be doing. If
>>>         you have multiple locations for an extent, it seems plausible to
>>>         me that you would run into data loss. However, you should focus
>>>         on trying to determine why you keep running into multiple
>>>         locations for a tablet.
>>>
>>>         After you take a look at Jira, I would likely go ahead and file
>>>         a jira to track this since it's easier to follow than an email
>>>         thread. Be sure to note if there is anything notable about your
>>>         installation (did you download it directly from the
>>>         accumulo.apache.org site?). You should also include what OS and
>>>         version and what Hadoop and ZooKeeper versions you are running.
>>>
>>>
>>>         On 1/26/2014 4:10 PM, Anthony F wrote:
>>>
>>>             I have observed a loss of data when tservers fail during
>>>             bulk ingest. The keys that are missing are right around the
>>>             table's splits, indicating that data was lost when a tserver
>>>             died during a split. I am using Accumulo 1.5.0. At around
>>>             the same time, I observe the master logging a message about
>>>             "Found two locations for the same extent". Can anyone shed
>>>             light on this behavior? Are tserver failures during bulk
>>>             ingest supposed to be fault tolerant?
