On 10/09/2020 07:11, Piotr Nowara wrote:
Andy, Rob,

thanks for your explanations and suggestions - now I understand the issue
much better.

I tried different approaches to solve it, but the freeze keeps
occurring when the leading Zookeeper instance is down/stopped permanently.
(BTW: shutting down even 2 out of 3 RDF Delta servers does not have any
negative impact on SPARQL execution, which is great.)

I think this is to do with the way ZooKeeper works. I don't know (= can't remember) whether ZK has a way to shut down gracefully that hands over the leader role without a full election with timeouts. It does look like exactly 10s timeouts are happening somewhere. (Same situation as the 30s you reported originally?)

If the way you stop the leader is abrupt, then the system is going to have to do some kind of timeout because it may be a transitory network glitch.

Did you try any changes to the zk server configuration like Rob's [2] link?
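For reference, these are the zoo.cfg settings that govern how quickly failure is detected and how long sync/election can take. The values shown are illustrative (tickTime=2000 is ZooKeeper's common default; the session-timeout bounds default to 2x and 20x tickTime), not a recommendation for this deployment:

```
# zoo.cfg -- timing-related settings (illustrative values)
tickTime=2000             # base time unit in ms; other timeouts are multiples of this
initLimit=10              # ticks a follower may take to connect and sync with the leader
syncLimit=5               # ticks a follower may lag behind the leader before being dropped
minSessionTimeout=4000    # lower bound on negotiated client session timeout (default 2 * tickTime)
maxSessionTimeout=40000   # upper bound on negotiated client session timeout (default 20 * tickTime)
```

A client-side freeze of a round number of seconds often lines up with one of these session or connection timeouts rather than with the server election itself.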

    Andy

Andy,
to answer your question about when the freeze happens:

For SELECTs it always occurs between the second and third (last) log entry:

[2020-09-10 05:52:13] Fuseki INFO [13] POST http://localhost:3031/ds
[2020-09-10 05:52:13] Fuseki INFO [13] Query = SELECT * {?s ?p ?o}
[2020-09-10 05:52:23] Fuseki INFO [13] 200 OK (10.023 s)

For INSERTs the log looks like this:
[2020-09-10 05:48:50] Fuseki INFO [7] POST http://localhost:3031/ds
[2020-09-10 05:49:00] HTTP INFO Send patch id:758bcf (165 bytes) -> ds:090688
[2020-09-10 05:49:09] Fuseki INFO [7] 200 OK (19.059 s)
[2020-09-10 05:50:15] Fuseki INFO [10] POST http://localhost:3031/ds
[2020-09-10 05:50:25] HTTP INFO Send patch id:472bf1 (165 bytes) -> ds:090688
[2020-09-10 05:50:44] Fuseki INFO [10] 200 OK (29.042 s)

Thanks,
Piotr

On Tue, 18 Aug 2020 at 22:03, Andy Seaborne <[email protected]> wrote:

Piotr,

It will depend on how long zookeeper takes to resync. One of the factors
is how big the zookeeper database has become, because when a ZK server
starts, it has to process the snapshot and any transaction logs to
rebuild its state before it can serve requests.

But while it is syncing, the other zk servers should still provide service
(read-only). Maybe the Fuseki server is choosing the fresh zk server, so
it waits, whereas if it went to another server it would be OK (for a
read transaction). I thought the new one would not service requests until
ready, but I'm not sure.

At what point in the Fuseki log file does the freeze happen? After the
HTTP request is received or as it exits (is the freeze before or after
the middle log line of the three for a request).

With 3 zk servers, the system can survive one outage.
With 5 zk servers, it can survive two outages and still accept writes;
beyond that quorum is lost (a server in ZooKeeper's read-only mode may
still answer reads).
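The quorum arithmetic behind those numbers can be sketched as follows (a minimal illustration, not part of RDF Delta or ZooKeeper itself):

```python
# ZooKeeper needs a strict majority of the ensemble to elect a leader
# and accept writes, so an n-server ensemble tolerates n - majority failures.

def quorum(n: int) -> int:
    """Smallest strict majority of an n-server ensemble."""
    return n // 2 + 1

def tolerated_failures(n: int) -> int:
    """How many servers can fail while writes are still possible."""
    return n - quorum(n)

for n in (3, 5):
    print(f"{n} servers: quorum={quorum(n)}, tolerates {tolerated_failures(n)} failure(s)")
# 3 servers: quorum=2, tolerates 1 failure(s)
# 5 servers: quorum=3, tolerates 2 failure(s)
```

This is also why even-sized ensembles buy nothing: 4 servers have a quorum of 3 and tolerate only one failure, the same as 3 servers.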

You can also configure zk in more complex primary-secondary configurations.

A load balancer between the Fuseki servers and the zk servers may help.

      Andy


On 17/08/2020 17:09, Piotr Nowara wrote:
Hi,

We are testing RDF Delta with three Zookeeper instances. Sometimes when we
kill one of those Zookeeper instances, Fuseki freezes for about 30 seconds,
which is bad. Is this expected? Will increasing the number of Zookeeper
instances help to avoid such issues?

Yes and no.


Thanks,
Piotr Nowara


