Re: Client connection patterns

2022-06-10 Thread Szalay-Bekő Máté
> the person who experienced a failure and loss of a collection evidently
> had a "not synced" zk server node that also didn't know it wasn't synced

This should indeed be a configuration issue or a bug. ZK servers, when
joining a quorum, sync their states, and later only the leader changes
the state. That said, I know of a few bugs in older ZK versions that can
lead to out-of-sync states in some rare cases.

In ZooKeeper, each committed state change made by the leader is marked by
a 'zxid', an atomically and monotonically increasing 'logical timestamp'.
It should never happen that different changes are marked by the same
zxid. Also, if a client got a response that some change was successful,
then that change has already been persisted, and even in the case of a
server failure it will be restored. But the theoretical configuration
problem you described (having multiple ZK quorums running, with clients
given servers from multiple quorums) can lead to all sorts of problems:
the states will diverge, and even the session ids can be duplicated.
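
As a small illustration (connect string and znode path are made up),
every znode's Stat carries these zxids, so a client can see which
transactions created and last modified it:

    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class ZxidPeek {
        public static void main(String[] args) throws Exception {
            ZooKeeper zk = new ZooKeeper("hostA:2181", 30_000, event -> { });
            Stat stat = new Stat();
            zk.getData("/some/znode", false, stat);
            // czxid / mzxid: zxids of the transactions that created and
            // last modified this znode; they totally order the changes.
            System.out.printf("czxid=0x%x mzxid=0x%x%n",
                    stat.getCzxid(), stat.getMzxid());
            zk.close();
        }
    }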

However, you mentioned that ZooKeeper is used differently in multiple
places in Solr. If Solr uses multiple zk client sessions in a single JVM,
then it is possible that these zk sessions are handled by different zk
servers, and that server A is a bit behind server B (the last known zxid
is smaller on zk server A than on zk server B). The state observed by the
client in session A will eventually catch up (within a given time bound),
and the changes in session A will follow exactly the changes observed in
session B. But ZK doesn't guarantee that different client sessions
(whether running on different hosts or on the same host / JVM) will
always see the same state. For the same session (even if it moves to a
different ZK server) this is guaranteed, though: the client communicates
to the server the last zxid it has seen, so the server knows if it is not
up-to-date and can handle the situation.
https://zookeeper.apache.org/doc/r3.8.0/zookeeperOver.html#Guarantees
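
When two sessions in the same JVM really need to agree, the usual
workaround is sync(): it asks the server serving that session to catch up
with the leader before the next read. A minimal sketch (znode path made
up; error handling on the callback's return code omitted for brevity):

    import java.util.concurrent.CountDownLatch;
    import org.apache.zookeeper.ZooKeeper;
    import org.apache.zookeeper.data.Stat;

    public class SyncedRead {
        // After sync() completes, a read on the same session reflects at
        // least every change committed before the sync was issued.
        public static byte[] syncedRead(ZooKeeper zk, String path)
                throws Exception {
            CountDownLatch latch = new CountDownLatch(1);
            zk.sync(path, (rc, p, ctx) -> latch.countDown(), null);
            latch.await();
            return zk.getData(path, false, new Stat());
        }
    }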

When designing any distributed synchronization on top of ZooKeeper, it is
important to account for these limitations. Relying on Curator can help a
lot here, as it provides a higher level of abstraction.
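
For instance, a few lines of Curator replace all the manual connection
management (connect string hypothetical):

    import org.apache.curator.framework.CuratorFramework;
    import org.apache.curator.framework.CuratorFrameworkFactory;
    import org.apache.curator.retry.ExponentialBackoffRetry;

    public class CuratorRead {
        public static void main(String[] args) throws Exception {
            CuratorFramework client = CuratorFrameworkFactory.newClient(
                    "hostA:2181,hostB:2181,hostC:2181",    // all ZK servers
                    new ExponentialBackoffRetry(1000, 3)); // retry policy
            client.start();
            // Curator handles connection loss and retries internally, so
            // application code stays free of reconnection boilerplate.
            byte[] data = client.getData().forPath("/some/znode");
            System.out.println(new String(data));
            client.close();
        }
    }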

> Based on your description, a load balancer would still potentially cause
> interference if (for example) it didn't allow long-running connections

Yes, definitely. ZooKeeper client connections will be broken frequently
if the load balancer doesn't support sticky TCP sessions. And
(re-)connecting can be an expensive operation, especially if e.g.
Kerberos is used.

Best regards,
Mate


Re: Client connection patterns

2022-06-10 Thread Gus Heck
Thanks for the response. This is helpful, and jibes with what I am seeing
in the code. WRT Solr, currently there are a couple of isolated places
where Curator is used, and there is a desire to move to using it
extensively, but that has not happened yet
<https://issues.apache.org/jira/browse/SOLR-16116>. (PR )

To date, Solr manages its own clients directly. Exactly what it does varies
a bit depending on who wrote the code and when, and I think part of the
impetus for Curator is to achieve consistency.

In the ticket I linked, the person who experienced a failure and loss of a
collection evidently had a "not synced" zk server node that also didn't
know it wasn't synced. Based on what you say above, I guess this means
the node wasn't configured properly and maybe didn't have a list of the
other servers? I'm assuming nodes start up as not synced... Obviously, if
true, that's a serious user error in configuration, but it's conceivable
if the deployment infrastructure is meant to generate the config and
succeeds incorrectly (imagine something conceptually like $OLDSERVERS +
$NEW, where $OLDSERVERS came out blank).

IIUC you rely on the target server *knowing* that it is out of sync.
Adding a server that has a false sense of confidence to the client's
connection string doesn't (yet) seem any different than adding it to the
load balancer here. In either case the client might select the "rogue"
server that thinks it's synced but isn't, with identical results. Based
on your description, a load balancer would still potentially cause
interference if (for example) it didn't allow long-running connections,
but this would just be spurious disconnect/reconnect cycles and
inconsistent load on individual zk servers: suboptimal, but
non-catastrophic unless already near machine capacity.

-Gus


On Fri, Jun 10, 2022 at 12:35 PM Szalay-Bekő Máté <
szalay.beko.m...@gmail.com> wrote:

> Hello Gus,
>
> I think you shouldn't use a load-balancer for ZooKeeper. Clients do the
> load balancing and also they won't connect to any 'out-of-sync' servers.
> The way it works normally:
>
> - You have ZK servers A, B and C. You list all these servers in all your
> ZooKeeper client configs. And in all server configs.
> - ZK servers form a quorum, when the majority of the servers are joined.
> There is always one quorum member selected as leader (this is the only one
> that can change the state stored in ZK and it will only commit a change if
> the majority of the servers approved it).
> - each client is connected to a single ZK server at a time.
> - If any ZK server goes out of sync (e.g. losing the connection to the
> leader, etc) then it will stop serving requests. So even if your client
> would connect to such a server, the client will lose the connection
> immediately if the server left the quorum and no client will be able to
> connect to such an "out of sync" server.
> - The ZK client first connects to a random server from the server list. If
> the connection fails (server is unreachable or not serving client request),
> the client will move to the next server in the list in a round-robin
> fashion, until it finds a working / in-sync server.
> - All ZK clients talk with the server in "sessions". Each client
> session has an ID, unique in the cluster. If the client loses the
> connection to a server, it will automatically try to connect to a different
> server using ('renewing') the same session id. "A client will see the same
> view of the service regardless of the server that it connects to. i.e., a
> client will never see an older view of the system even if the client fails
> over to a different server with the same session." This is (among other
> things) guaranteed by ZooKeeper.
> - Of course, sessions can terminate. There is a pre-configured and
> negotiated session timeout. The session will be deleted from the ZK cluster
> if the connection between any server and the given client breaks for more
> than a pre-configured time (e.g. no heartbeat for 30 seconds). After this
> time, the session can not be renewed. In this case the application needs to
> decide what to do. It can start a new session, but then it is possible that
> e.g. it will miss some watched events and also it will lose its ephemeral
> ZNodes. (e.g. I know HBase servers aborts themselves when ZK
> session timeout happens, as they can not guarantee consistency anymore) Now
> as far as I remember, Solr is using Curator to handle ZooKeeper
> connections. I'm not entirely sure how Solr is using ZooKeeper through
> Curator. Maybe Solr reconnects automatically with a new session if the old
> one terminates. Maybe Solr handles this case in a different way.
>
> see our overview docs:
> https://zookeeper.apache.org/doc/r3.8.0/zookeeperOver.html
>
> By default the list of the servers is a static config. If you want to add a
> new server (or remove an old one), then you need to rolling-restart all the
> ZK servers and also restart the 

Re: Client connection patterns

2022-06-10 Thread Szalay-Bekő Máté
Hello Gus,

I think you shouldn't use a load-balancer for ZooKeeper. The clients do
the load balancing themselves, and they won't connect to any
'out-of-sync' servers. This is the way it normally works:

- You have ZK servers A, B, and C. You list all these servers in all your
ZooKeeper client configs, and in all server configs.
- ZK servers form a quorum when the majority of the servers have joined.
There is always one quorum member selected as leader (this is the only
one that can change the state stored in ZK, and it will only commit a
change if the majority of the servers approve it).
- Each client is connected to a single ZK server at a time.
- If any ZK server goes out of sync (e.g. by losing the connection to the
leader), then it will stop serving requests. So even if your client tried
to connect to such a server, it would lose the connection immediately
once the server left the quorum, and no client will be able to connect to
such an "out of sync" server.
- The ZK client first connects to a random server from the server list.
If the connection fails (the server is unreachable or not serving client
requests), the client will move to the next server in the list in a
round-robin fashion, until it finds a working, in-sync server.
- All ZK clients talk to the servers in "sessions". Each client session
has an ID, unique in the cluster. If the client loses the connection to a
server, it will automatically try to connect to a different server,
reusing ('renewing') the same session id. "A client will see the same
view of the service regardless of the server that it connects to. i.e., a
client will never see an older view of the system even if the client
fails over to a different server with the same session." This is (among
other things) guaranteed by ZooKeeper.
- Of course, sessions can terminate. There is a pre-configured,
negotiated session timeout. The session will be deleted from the ZK
cluster if the connection between the given client and the servers stays
broken for more than this time (e.g. no heartbeat for 30 seconds). After
this time, the session can not be renewed. In this case the application
needs to decide what to do (see the sketch after this list). It can start
a new session, but then it is possible that e.g. it will miss some
watched events, and it will also lose its ephemeral ZNodes. (E.g. I know
HBase servers abort themselves when a ZK session timeout happens, as they
can not guarantee consistency anymore.) Now, as far as I remember, Solr
is using Curator to handle ZooKeeper connections. I'm not entirely sure
how Solr is using ZooKeeper through Curator. Maybe Solr reconnects
automatically with a new session if the old one terminates. Maybe Solr
handles this case in a different way.
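
To make the last two points concrete, here is a minimal sketch using the
plain ZooKeeper Java client (host names are hypothetical): all three
servers go into the connect string, and on session expiry the application
must build a brand new handle and decide how to rebuild its state:

    import java.io.IOException;
    import org.apache.zookeeper.WatchedEvent;
    import org.apache.zookeeper.Watcher;
    import org.apache.zookeeper.ZooKeeper;

    public class ExpiryAwareClient implements Watcher {
        // All three servers are listed; the client picks one at random
        // and fails over to the others on its own.
        private static final String CONNECT =
                "hostA:2181,hostB:2181,hostC:2181";
        private static final int SESSION_TIMEOUT_MS = 30_000;

        private volatile ZooKeeper zk;

        public void start() throws IOException {
            zk = new ZooKeeper(CONNECT, SESSION_TIMEOUT_MS, this);
        }

        @Override
        public void process(WatchedEvent event) {
            if (event.getState() == Event.KeeperState.Expired) {
                // The session is gone: watches and ephemeral znodes are
                // lost, and the old handle cannot be revived. Either
                // start a new session and rebuild state, or abort the
                // process, as HBase does.
                try {
                    start();
                    // ...re-register watches / recreate ephemeral znodes...
                } catch (IOException e) {
                    throw new IllegalStateException(
                            "cannot rebuild ZK session", e);
                }
            }
        }
    }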

see our overview docs:
https://zookeeper.apache.org/doc/r3.8.0/zookeeperOver.html

By default the list of the servers is a static config. If you want to add a
new server (or remove an old one), then you need to rolling-restart all the
ZK servers and also restart the clients with the new config. The dynamic
reconfig feature (if enabled) allows you to do this in a more clever way,
storing the list of the ZK servers inside a system znode, which can be
changed dynamically:
https://zookeeper.apache.org/doc/r3.8.0/zookeeperReconfig.html (this is
available since ZooKeeper 3.5)
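
For example, with reconfig enabled, adding a fourth server from Java
could look roughly like this, assuming the ZooKeeperAdmin API from 3.5+
(host names, ports, and the server id are hypothetical, and newer
versions require appropriate ACL / superuser rights for reconfig):

    import org.apache.zookeeper.admin.ZooKeeperAdmin;
    import org.apache.zookeeper.data.Stat;

    public class AddServerExample {
        public static void main(String[] args) throws Exception {
            ZooKeeperAdmin admin = new ZooKeeperAdmin(
                    "hostA:2181,hostB:2181,hostC:2181", 30_000, event -> { });
            byte[] newConfig = admin.reconfigure(
                    "server.4=hostD:2888:3888:participant;2181", // joining
                    null,        // leaving servers
                    null,        // newMembers (only for bulk replacement)
                    -1,          // fromConfig: -1 = skip version check
                    new Stat()); // filled with the config znode's metadata
            System.out.println(new String(newConfig)); // the new ensemble
            admin.close();
        }
    }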

Best regards,
Mate


On Thu, Jun 9, 2022 at 10:37 PM Gus Heck  wrote:

> Hi ZK Folks,
>
> Some prior information, including discussion on SOLR-13396
> <https://issues.apache.org/jira/browse/SOLR-13396?focusedCommentId=16822748&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16822748>,
> had led me to believe that the zookeeper client established connections
> to all members of the cluster. This then seemed to be the logic for
> saying that having a load balancer in front of zookeeper was dangerous,
> allowing the possibility that the client might decide to talk to a zk
> that had not synced up. In the Solr case this could lead to data loss
> (see discussion in the above ticket).
>
> However, I've now been reading code pursuing an issue for a client, and
> unless the multiple connections are hidden deep inside the handling of
> channels in the ClientCnxnSocketNIO class (or its close relatives), it
> looks a lot to me like only one actual connection is held at a time by
> an instance of ZooKeeper.java.
>
> If that's true, then while the ZooKeeper codebase certainly has logic to
> reconnect and to balance across the cluster etc., it's becoming murky to
> me how listing all zk servers directly vs. through a load balancer would
> protect against connecting to an as-yet unsynced zookeeper if it existed
> in the configured server list.
>
> Does such a protection exist? Or is it the user's responsibility not to
> add the server to the list (or load balancer) until it's clear that it
> has successfully joined the cluster and synced its data?
>
> -Gus
>
> --
> http://www.needhamsoftware.com (work)
> http://www.the111shift.com (play)
>