I meant to reply to your original email but haven't yet, sorry.
First off, if Accumulo is reporting that it found multiple locations for the same extent, this is a (very bad) bug in Accumulo. It might be worth looking at tickets that are marked as "affects 1.5.0" and "fixed in 1.5.1" on Jira. It's likely that we've already encountered and fixed the issue, but if you can't find a fix that was already made, we don't want to overlook the potential need for one.
*Neither* "live" nor "bulk" ingest should lose any data. This is one thing that Accumulo should never be doing. If you have multiple locations for an extent, it seems plausible to me that you would run into data loss. However, you should focus on trying to determine why you keep running into multiple locations for a tablet.
After you take a look at Jira, I would go ahead and file a ticket to track this, since a ticket is easier to follow than an email thread. Be sure to note anything unusual about your installation (for example, did you download it directly from the accumulo.apache.org site?). You should also include your OS and its version, and the Hadoop and ZooKeeper versions you are running.
On 1/26/2014 4:10 PM, Anthony F wrote:
I have observed a loss of data when tservers fail during bulk ingest. The keys that are missing are right around the table's splits indicating that data was lost when a tserver died during a split. I am using Accumulo 1.5.0. At around the same time, I observe the master logging a message about "Found two locations for the same extent". Can anyone shed light on this behavior? Are tserver failures during bulk ingest supposed to be fault tolerant?
