Re: Mutation Rejected exception with server Error 1

2015-12-26 Thread Eric Newton
Generally speaking, rejected mutations due to resource contention are considered a system failure, requiring a re-examination of system resources. That means re-architecting your ingest or adding significant resources. You could do some substantial pre-processing of your ingest and bulk-load th

Re: Mutation Rejected exception with server Error 1

2015-12-24 Thread mohit.kaushik
@ Eric: Yes, I have noticed 3GB to 5GB of swap in use out of 32GB on the servers. And if I explicitly resend the rejected mutations, this may create a loop of mutations getting rejected again and again. How can I handle that? How did you? Am I getting it right? @ Josh: For one of the zookeep
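One way to avoid the endless resend loop raised above is to cap the number of attempts and back off between them. The following is a minimal sketch, not advice given in this thread: `BoundedResubmit` and the `send` function are hypothetical names, and the code deliberately uses only the Java standard library. In a real Accumulo client, `send` would presumably wrap a `BatchWriter` and report the batch as rejected when a `MutationsRejectedException` is caught. Since an error that reaches the client means Accumulo has already stopped retrying, anything still pending after the last attempt should be treated as a real failure.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Hypothetical helper (not part of Accumulo): re-submits rejected items a bounded number of times. */
final class BoundedResubmit {

    /**
     * @param batch                items to send
     * @param send                 hypothetical sender: takes a batch, returns the items that were rejected
     * @param maxAttempts          upper bound on send attempts, so resending can never loop forever
     * @param initialBackoffMillis pause before the second attempt; doubled after each further failure
     * @return the items still rejected after the last attempt (empty on success)
     */
    static <T> List<T> submit(List<T> batch,
                              Function<List<T>, List<T>> send,
                              int maxAttempts,
                              long initialBackoffMillis) {
        List<T> pending = new ArrayList<>(batch);
        long backoff = initialBackoffMillis;
        for (int attempt = 1; attempt <= maxAttempts && !pending.isEmpty(); attempt++) {
            // Only the items rejected on this attempt are carried into the next one.
            pending = new ArrayList<>(send.apply(pending));
            if (!pending.isEmpty() && attempt < maxAttempts) {
                try {
                    Thread.sleep(backoff); // give the servers time to recover
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt and give up
                    return pending;
                }
                backoff *= 2;
            }
        }
        return pending;
    }
}
```

A caller would check the returned list: if it is not empty, log or persist those items and stop, rather than resubmitting them immediately.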

Re: Mutation Rejected exception with server Error 1

2015-12-23 Thread Josh Elser
Eric Newton wrote: Failure to talk to zookeeper is *really* unexpected. Have you noticed your nodes using any significant swap? Emphasis on this. Failing to connect to ZooKeeper for 60s (2*30) is a very long time (although, I think I have seen JVM GC pauses longer before). A couple of gene

Re: Mutation Rejected exception with server Error 1

2015-12-23 Thread Eric Newton
I was simplifying a bit too much. If an error propagates all the way to an Accumulo client call, then it has stopped retrying for you. An example:
- create a batchwriter. This creates an update session within the tserver
- mutations are sent against this session id
- mutations are pushed

Re: Mutation Rejected exception with server Error 1

2015-12-23 Thread mohit.kaushik
Thanks for the beautiful explanation, Eric. So this means that if I get a mutations rejected exception due to a tablet server failure, the batch writer will resend the mutations to some other server and I do not have to worry about them. Great... But what is the case when we get mutations rejected exception a

Re: Mutation Rejected exception with server Error 1

2015-12-23 Thread Eric Newton
By default, Accumulo traces major and minor compactions. Distributed tracing is one way we try to figure out where time is being spent. You can read the Google Dapper paper to get a better description of the framework. The tracing framework pushes the trace information into the trace table by for

Re: Mutation Rejected exception with server Error 1

2015-12-23 Thread Eric Newton
The Accumulo batch writer will re-send mutations if a tablet server fails, or rejects the mutations because the tablet has moved. There's nothing you have to do to recover from fail-overs and re-balancing. I'm not a kernel expert, but I believe that a swappiness setting of "1" is equivalent to "0

Re: Mutation Rejected exception with server Error 1

2015-12-22 Thread mohit.kaushik
And why are there 5000 spans queued for delivery? *Tracing spans are being dropped because there are already 5000 spans queued for delivery. This does not affect performance, security or data integrity, but distributed tracing information is being lost.* On 12/23/2015 10:01 AM, mohit.kaushik
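The warning quoted above reads like the behaviour of a bounded delivery queue: when the consumer that ships spans falls behind and the queue reaches its limit, new spans are discarded instead of blocking the producer. The sketch below illustrates that general pattern only. `DroppingSpanQueue` is a hypothetical class, not Accumulo's tracing code, and spans are represented as plain strings for simplicity.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical bounded queue that drops new entries when full rather than blocking the caller. */
final class DroppingSpanQueue {
    private final BlockingQueue<String> queue;
    private final AtomicLong dropped = new AtomicLong();

    DroppingSpanQueue(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns true if the span was queued, false if it was dropped because the queue is full. */
    boolean enqueue(String span) {
        boolean accepted = queue.offer(span); // offer() never blocks; it returns false when full
        if (!accepted) {
            dropped.incrementAndGet();
        }
        return accepted;
    }

    /** Called by the delivery side; returns null when nothing is queued. */
    String poll() {
        return queue.poll();
    }

    long droppedCount() {
        return dropped.get();
    }
}
```

Under this reading, a full queue means the delivery side is not draining spans as fast as they are produced, which is consistent with servers that are already short on resources.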

Re: Mutation Rejected exception with server Error 1

2015-12-22 Thread mohit.kaushik
I have 3 tablet servers hosting around 1.4K tablets. If a tablet server loses its session with ZooKeeper and kills itself, the system takes some time to move all of its hosted tablets to other servers. In this case, if an ingest is in progress, what should happen with the mutations going to tablets h

Re: Mutation Rejected exception with server Error 1

2015-12-22 Thread Eric Newton
A tablet server is given the rights to manage a tablet. To maintain consistency, it is critical that no other server uses the tablet. To keep the right to access a tablet, the server must maintain a ZooKeeper session. The ZooKeeper session periodically exchanges keep-alive messages. If either party fa
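The session rule described above behaves like a lease: the right to serve tablets lasts only as long as keep-alives arrive within the timeout, and once the lease lapses it cannot be revived. Here is a minimal sketch of that idea, assuming a monotonic clock supplied by the caller. `SessionLease` is a hypothetical class and does not use the ZooKeeper client API.

```java
/** Hypothetical lease model: the holder keeps its rights only while keep-alives arrive in time. */
final class SessionLease {
    private final long timeoutMillis;
    private long lastHeartbeatMillis;
    private boolean expired = false;

    SessionLease(long timeoutMillis, long nowMillis) {
        this.timeoutMillis = timeoutMillis;
        this.lastHeartbeatMillis = nowMillis;
    }

    /** Marks the lease as expired once the gap since the last keep-alive exceeds the timeout. */
    private void checkExpiry(long nowMillis) {
        if (nowMillis - lastHeartbeatMillis > timeoutMillis) {
            expired = true;
        }
    }

    /** Records a keep-alive. Returns false if the lease had already lapsed; a late keep-alive cannot revive it. */
    boolean heartbeat(long nowMillis) {
        checkExpiry(nowMillis);
        if (expired) {
            return false;
        }
        lastHeartbeatMillis = nowMillis;
        return true;
    }

    /** The holder may act on its tablets only while the lease is still valid. */
    boolean mayServeTablets(long nowMillis) {
        checkExpiry(nowMillis);
        return !expired;
    }
}
```

For example, with an assumed 30-second timeout, a long GC pause or heavy swapping that delays keep-alives past that window ends the lease, so the holder has to stop serving even though the process itself is still running.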