Re: [OpenAFS] Re: DB servers quorum and OpenAFS tools

2014-01-24 Thread Harald Barth
The problem is that you want the client to scan quickly to find a server that is up, but because networks are not perfectly reliable and drop packets all the time, it cannot know that a server is not up until that server has failed to respond to multiple retransmissions of the request. Those

Re: [OpenAFS] Re: DB servers quorum and OpenAFS tools

2014-01-24 Thread Simon Wilkinson
On 24 Jan 2014, at 07:48, Harald Barth h...@kth.se wrote: You are completely right if one must talk to that server. But I think that AFS/RX sometimes hangs too long waiting for one server instead of trying the next one. For example, for questions that could be answered by any VLDB. I'm

Re: [OpenAFS] Re: DB servers quorum and OpenAFS tools

2014-01-24 Thread Peter Grandi
For example in an ideal world putting more or less DB servers in the client 'CellServDB' should not matter, as long as one that belongs to the cell is up; again if the logic were for all types of client: scan quickly the list of potential DB servers, find one that is up and belongs to the

Re: [OpenAFS] Re: DB servers quorum and OpenAFS tools

2014-01-24 Thread Harald Barth
I have long thought that we should be using multi for vldb lookups, specifically to avoid the problems with down database servers. The situation is a little bit different for cache managers who can remember which servers are down and command line tools which normally discover how the world

Re: [OpenAFS] Re: DB servers quorum and OpenAFS tools

2014-01-24 Thread Jeffrey Hutzelman
On Fri, 2014-01-24 at 08:01 +, Simon Wilkinson wrote: On 24 Jan 2014, at 07:48, Harald Barth h...@kth.se wrote: You are completely right if one must talk to that server. But I think that AFS/RX sometimes hangs too long waiting for one server instead of trying the next one. For

Re: [OpenAFS] Re: DB servers quorum and OpenAFS tools

2014-01-24 Thread Brandon Allbery
On Fri, 2014-01-24 at 11:41 -0500, Jeffrey Hutzelman wrote: The problem is the one-off clients that make _one RPC_ and then exit. They have no opportunity to remember what didn't work last time. It Has it been considered to write a cache file somewhere (even a user dotfile) that could be used

Re: [OpenAFS] Re: DB servers quorum and OpenAFS tools

2014-01-24 Thread Jeffrey Altman
On 1/24/2014 11:45 AM, Brandon Allbery wrote: On Fri, 2014-01-24 at 11:41 -0500, Jeffrey Hutzelman wrote: The problem is the one-off clients that make _one RPC_ and then exit. They have no opportunity to remember what didn't work last time. It Has it been considered to write a cache file

Re: [OpenAFS] Re: DB servers quorum and OpenAFS tools

2014-01-23 Thread Jeffrey Hutzelman
On Thu, 2014-01-23 at 10:44 -0600, Andrew Deason wrote: For example in an ideal world putting more or less DB servers in the client 'CellServDB' should not matter, as long as one that belongs to the cell is up; again if the logic were for all types of client: scan quickly the list of

Re: [OpenAFS] Re: DB servers quorum and OpenAFS tools

2014-01-23 Thread Jeffrey Hutzelman
On Thu, 2014-01-23 at 14:58 +, Peter Grandi wrote: My real issue was 'server/CellServDB' because we could not prepare ahead of time all 3 new servers, but only one at a time. The issue is that with 'server/CellServDB' update there is potentially a DB daemon (PT, VL) restart (even if

Re: [OpenAFS] Re: DB servers quorum and OpenAFS tools

2014-01-17 Thread Jeffrey Hutzelman
On Fri, 2014-01-17 at 14:12 -0600, Andrew Deason wrote: time, so presumably if we contact a downed dbserver, the client will not try to contact that dbserver for quite some time. To elaborate: the cache manager keeps track of every server, and periodically sends a sort of ping to each server

Re: [OpenAFS] Re: DB servers quorum and OpenAFS tools

2014-01-17 Thread Jeffrey Hutzelman
On Fri, 2014-01-17 at 14:21 -0600, Andrew Deason wrote: On Fri, 17 Jan 2014 18:50:13 + p...@afs.list.sabi.co.uk (Peter Grandi) wrote: Planned to do this incrementally by adding a new DB server to the 'CellServDB', then starting it up, then removing an old DB server, and so on until