Just because the error message is the same doesn't mean that the root
cause is also the same.
Without looking more into Eric's changes, I'm not sure if ACCUMULO-2057
would also affect 1.5.0. We're usually pretty good about checking
backwards when bugs are found in newer versions, but things slip through
the cracks, too.
On 1/26/2014 5:09 PM, Anthony F wrote:
This is pretty much the issue:
https://issues.apache.org/jira/browse/ACCUMULO-2057
Slightly different error message, but it's a different version. Looks
like it's fixed in 1.6.0. I'll probably need to upgrade.
On Sun, Jan 26, 2014 at 4:47 PM, Anthony F wrote:
Thanks, I'll check Jira. As for versions, Hadoop 2.2.0, Zk 3.4.5,
CentOS 64bit (kernel 2.6.32-431.el6.x86_64). Has much testing been
done using Hadoop 2.2.0? I tried Hadoop 2.0.0 (CDH 4.5.0) but ran
into HDFS-5225/5031 which basically makes it a non-starter.
On Sun, Jan 26, 2014 at 4:29 PM, Josh Elser wrote:
I meant to reply to your original email, but I didn't yet, sorry.
First off, if Accumulo is reporting that it found multiple
locations for the same extent, this is a (very bad) bug in
Accumulo. It might be worth looking at tickets that are marked as
"affects 1.5.0" and "fixed in 1.5.1" on Jira. It's likely that
we've already encountered and fixed the issue, but, if you can't
find a fix that was already made, we don't want to overlook the
potential need for one.
For both "live" and "bulk" ingest, *neither* should lose any
data. This is one thing that Accumulo should never be doing. If
you have multiple locations for an extent, it seems plausible to
me that you would run into data loss. However, you should focus
on trying to determine why you keep running into multiple
locations for a tablet.
After you take a look at Jira, I would go ahead and file
a ticket to track this since it's easier to follow than an email
thread. Be sure to note if there is anything notable about your
installation (did you download it directly from the
accumulo.apache.org site?). You
should also include what OS and version and what Hadoop and
ZooKeeper versions you are running.
On 1/26/2014 4:10 PM, Anthony F wrote:
I have observed a loss of data when tservers fail during bulk ingest.
The keys that are missing are right around the table's splits,
indicating that data was lost when a tserver died during a split. I am
using Accumulo 1.5.0. At around the same time, I observe the master
logging a message about "Found two locations for the same extent".
Can anyone shed light on this behavior? Are tserver failures during
bulk ingest supposed to be fault tolerant?