Eric Newton wrote:
> Failure to talk to zookeeper is *really* unexpected.
> Have you noticed your nodes using any significant swap?
Emphasis on this. Failing to connect to ZooKeeper for 60s (2*30) is a
very long time (although I think I have seen longer JVM GC pauses before).
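For the swap question, a minimal sketch of one way to check a node, assuming
a Linux host where /proc/meminfo reports SwapTotal and SwapFree in kB. The
function names are made up for illustration and are not part of any Accumulo
tooling.

```python
def parse_swap_kb(meminfo_text):
    """Return (swap_total_kb, swap_used_kb) from /proc/meminfo-style text."""
    fields = {}
    for line in meminfo_text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip anything that is not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    total = fields.get("SwapTotal", 0)
    free = fields.get("SwapFree", 0)
    return total, total - free


def swap_used_fraction(meminfo_text):
    """Fraction of configured swap in use; 0.0 when no swap is configured."""
    total, used = parse_swap_kb(meminfo_text)
    return used / total if total else 0.0


def node_swap_used_fraction(path="/proc/meminfo"):
    """Read the local node's meminfo (Linux only) and return the fraction."""
    with open(path) as f:
        return swap_used_fraction(f.read())
```

A nonzero result only says that pages have been swapped out at some point.
To see whether a node is actively swapping, the si/so columns of vmstat are
a better signal.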
A couple of gene
I was simplifying a bit too much. If an error propagates all the way to an
Accumulo client call, then it has stopped retrying for you.
An example:
- create a batchwriter; this creates an update session within the tserver
- mutations are sent against this session id
- mutations are pushed
> Are the hadoop nodes handling your map-reduce job also running tservers?
>
Yes.
> Do the Accumulo log files show the exception? If so, can you post it?
Yes, but nothing helpful for tracking down the cause; it was a very sparse
error message. I will try to post the full error messages.
Are the hadoop nodes handling your map-reduce job also running tservers?
Do the Accumulo log files show the exception? If so, can you post it?
On Wed, Dec 23, 2015 at 9:12 AM, Jeff Kubina wrote:
> I have a mapreduce job that reads rfiles as Accumulo key/value
> pairs using FileSKVIterator wi
I have a mapreduce job that reads rfiles as Accumulo key/value
pairs using FileSKVIterator within a RecordReader, partitions/shuffles them
based on the byte string of the key, and writes them out as new rfiles
using the AccumuloFileOutputFormat. The objective is to create larger
rfiles for bulk i
Thanks for the beautiful explanation Eric, so this means that if I get a
mutations rejected exception due to tablet server failure, the
batchwriter will resend them to some other server and I do not have to
worry about them. Great...
But what is the case when we get mutations rejected exception a
By default, Accumulo traces major and minor compactions.
Distributed tracing is one way we try to figure out where time is being
spent. You can read the Google Dapper paper to get a better description of
the framework.
The tracing framework pushes the trace information into the trace table by
for
The Accumulo batch writer will re-send mutations if a tablet server fails,
or rejects the mutations because the tablet has moved. There's nothing you
have to do to recover from fail-overs and re-balancing.
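
To illustrate the idea, here is a toy model of that resend behavior. It is
not Accumulo code: every name in it (send_with_retry, locate, send, the two
exception classes) is hypothetical, and the fixed attempt limit is an
assumption made only to keep the sketch finite, not a description of how the
real client decides to give up.

```python
class TabletServerDown(Exception):
    """Hypothetical error: the server that was contacted went away."""


class MutationsRejected(Exception):
    """Hypothetical error shown to the caller once retries are exhausted."""


def send_with_retry(mutations, locate, send, max_attempts=3):
    """Send mutations, re-locating the tablet and resending on failure.

    locate() returns the server currently hosting the tablet (hypothetical).
    send(server, mutations) raises TabletServerDown on failure (hypothetical).
    """
    last_error = None
    for _ in range(max_attempts):
        server = locate()      # look the tablet up again on every attempt
        try:
            send(server, mutations)
            return server      # accepted; the caller never saw the failure
        except TabletServerDown as err:
            last_error = err   # remember the cause, then try again
    # Only after the retries are used up does an error reach the caller.
    raise MutationsRejected(
        f"gave up after {max_attempts} attempts"
    ) from last_error
```

In this model a single server failure is invisible to the caller, and an
exception reaching the caller means the retrying has already stopped.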
I'm not a kernel expert, but I believe that a swappiness setting of "1" is
equivalent to "0