Christopher Gorge A. Marges [15/10/07 14:12]:
...


When we tested running the program without inserting
the "large" files, there was no problem: we tried to
insert 2000 records (just the id field, for example)
and it was fine.  I suspect that the controller chokes
on the large volume of data coming from the other
machine.

Could there be a way to fix this?


Hi Christopher,

Thanks for the detailed report.

We experienced similar issues, though with a different group-communication package. Our conclusions were similar to yours: the controller gets overwhelmed when asked to insert many "large" binary values (we tested with inserts of 1 to 8 MB blobs).
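For reference, a minimal sketch of the kind of load test we ran: repeated large-blob inserts through the controller over plain JDBC. The driver class, JDBC URL, credentials, and table name below are assumptions for illustration; adjust them to your deployment.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;

public class BlobLoadTest {
    // Build an in-memory payload of the given size in megabytes.
    static byte[] makeBlob(int megabytes) {
        byte[] data = new byte[megabytes * 1024 * 1024];
        java.util.Arrays.fill(data, (byte) 0x5A); // arbitrary filler byte
        return data;
    }

    public static void main(String[] args) throws Exception {
        // Assumed Sequoia driver class and URL -- check your distribution.
        Class.forName("org.continuent.sequoia.driver.Driver");
        try (Connection c = DriverManager.getConnection(
                "jdbc:sequoia://controller-host/mydb", "user", "secret")) {
            c.setAutoCommit(false);
            // Hypothetical table: files(id INT, data <binary type>)
            try (PreparedStatement ps = c.prepareStatement(
                    "INSERT INTO files (id, data) VALUES (?, ?)")) {
                byte[] blob = makeBlob(8); // 8 MB, the upper end we tested
                for (int i = 0; i < 100; i++) {
                    ps.setInt(1, i);
                    ps.setBytes(2, blob);
                    ps.executeUpdate();
                }
            }
            c.commit();
        }
    }
}
```

With small rows (just an id) the same loop completes without trouble; it is the large payloads that slow the controller down enough to trip the partition detector.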

Either the group-comm subsystem threads (e.g. the group membership service in the controller's JVM), or the whole machine, including the backends and the recovery-log database (you are using collocated backends and an hsqldb recovery log), gets slow enough on network replies to cause a false positive in the network-partition detector. You then get a spurious split-brain.

We fixed one performance bottleneck somewhere in the controller, which made the issue less frequent. However, that apparently was not sufficient.

The usual workaround was to:
1/ not use hsqldb as the recovery log
2/ raise the timeouts on the network-partition detector
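For workaround 1/, a hypothetical sketch of what the virtual-database configuration change looks like: back the recovery log with a standalone database (PostgreSQL here) instead of the in-process hsqldb. The element and attribute names below are from memory and are assumptions; verify them against the DTD shipped with your Sequoia release.

```xml
<!-- Hypothetical fragment: point the recovery log at a standalone
     PostgreSQL instance instead of a collocated hsqldb.
     Element/attribute names are assumptions; check the sequoia DTD. -->
<RecoveryLog driver="org.postgresql.Driver"
             url="jdbc:postgresql://loghost/recoverydb"
             login="recovery"
             password="secret"/>
```

Moving the recovery log off the controller host keeps its disk and CPU load from competing with the group-communication threads under heavy blob traffic.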


Hope this helps.

A+O.
_______________________________________________
Sequoia mailing list
[email protected]
https://forge.continuent.org/mailman/listinfo/sequoia
