Hi Vishal,

On Sat, Aug 20, 2011 at 1:43 AM, Vishal Kher <[email protected]> wrote:
> My few cents..
> I am not sure if we can distinguish between spurious/non-spurious warnings
> and I don't think we can time it well. The delay is applicable only in
> certain cases. If the user knows that there will be a start up delay, then
> the user can ignore those errors or modify their scripts to start the server
> after a delay.

I guess you misinterpreted it :-( Starting the server after a delay is not
a solution to the original problem I was referring to. I also do not see
how my original problem could be fixed through a script; at least, I do not
know how to do it. Maybe by changing the log level to something like FATAL
and reverting it to INFO after the delay? I do not think that is a good
idea, as it would cut off some of the output that I want to see.

> Does this have to be implemented in the server? It sounds to me that this
> is something that user scripts should handle.
>

As I said, I do not see how a user script can handle this. If there is any
option, please do let me know.

Sampath

>
> On Fri, Aug 19, 2011 at 7:00 AM, Flavio Junqueira <[email protected]> wrote:
>
>> Sampath, do you think something along the lines of what Ted describes
>> would work for you?
>>
>> -Flavio
>>
>> On Aug 18, 2011, at 7:13 PM, Ted Dunning wrote:
>>
>> The thought is that a server would not complain about connection refused
>> or inability to form a quorum during the first (say) twenty seconds of
>> operation.
>>
>> The thesis is that warnings from these causes during that time are
>> spurious.
>>
>> As I mentioned, I don't see this as urgent or even necessarily a good
>> idea. I completely reboot a ZK cluster once every year or three. When I am
>> doing a rolling upgrade, I *want* to see alerts when I bounce a machine. If
>> I don't want to see those alerts, my monitoring system allows me to put a
>> machine into maintenance mode for a short period of time to temporarily
>> suppress the warnings.
>>
>> All I was doing was translating and elaborating the original poster's
>> suggestion, not so much endorsing it.
>>
>> On Thu, Aug 18, 2011 at 8:54 AM, Flavio Junqueira <[email protected]> wrote:
>>
>>> Hi Ted, I don't see how one can automate the distinction between a
>>> machine that is down because it crashed and a machine that is down because
>>> it hasn't started yet. Assuming that we are logging the machine
>>> unavailability as we are doing currently, one can always look at the
>>> timestamp of the warning and remember that this is the time the machines
>>> were bootstrapping. Consequently, I don't really see the point of reducing
>>> the number of warnings, unless the warnings are really polluting the logs.
>>> I typically don't see so many that they prevent me from reading the rest,
>>> but you may have a different perception. Also, recall that we back off, so
>>> the warnings become less frequent over time.
>>>
>>> I'm open to ideas, though. If you see anything wrong in my rationale or
>>> if you have an idea of how to do it differently, then I'd be happy to hear.
>>> However, if the idea is simply to add a parameter that configures the time
>>> for leader election to start, then I'm currently not in favor.
>>>
>>> -Flavio
>>>
>>> On Aug 18, 2011, at 5:39 PM, Ted Dunning wrote:
>>>
>>> Flavio,
>>>
>>> What you say is correct, but the original poster does have a point that
>>> many of these warnings are to be expected and there is a heuristic that
>>> might assist in distinguishing some of these cases so that false alarms in
>>> the logs could be decreased.
>>>
>>> That doesn't seem like a big deal to me, but different people have
>>> different itches. In my experience, restarting a ZK cluster from zero
>>> almost never happens.
>>>
>>> On Thu, Aug 18, 2011 at 8:36 AM, Ted Dunning <[email protected]> wrote:
>>>
>>> On Thu, Aug 18, 2011 at 12:15 AM, Sampath Perera <[email protected]> wrote:
>>>
>>> >>> Hhmmm, I think this is a bit different, isn't it? Here we know that
>>> >>> the first server to come up will fail to connect to the others, as
>>> >>> they are not yet up. Anyway, our real issue is the warning.
>>>
>>> >>> We know that.
>>>
>>> >>> But how does the server know that it is the first server? That is the
>>> >>> whole point of the leader election. You might just have a server
>>> >>> rejoining a cluster. Or you might have a cluster that has been turned
>>> >>> off. Or a cluster with 2 out of 5 machines off and we tried to touch
>>> >>> the other down machine before the others.
>>>
>>> >>> Would you like to suggest a patch?
>>>
>>> >>> Of course I do.. will prepare a patch and attach it.
>>>
>>> >>> Great!
>>>
>>> *flavio*
>>> *junqueira*
>>>
>>> research scientist
>>>
>>> [email protected]
>>> direct +34 93-183-8828
>>>
>>> avinguda diagonal 177, 8th floor, barcelona, 08018, es
>>> phone (408) 349 3300 fax (408) 349 3301
>>>
>

--
Thanks,
Sampath
http://adroitlogic.org
