Findings from using Hash Equivalence server at scale:
=====================================================
We have now used the OE Hash Equivalence server at scale in the CI chain at our company. This has given us some insight into what works and what can be improved when running this service in full production.

Some stats from our run:
------------------------

### Hash Equivalence server:
* ~30M reqs/day
* ~500K new entries/day
* 2-3K reqs/s
* Data growth ~10GiB/day (with dbg info)

### CI builds:
* ~20K builds/day
* 15-20K tasks/build
* 140K sstate misses/day

The good:
---------
* It works! It finds reusable sstate tasks (in all of our builds)
* It is very valuable for recipes that invalidate almost every other package. (Examples: glib, openssl)
* Hashserve scales to the need even when running only one instance
  - Tested at 12K req/s, which was sufficient for our needs
* Robust enough (if no external cleanup takes place)
  - _No_ non-recoverable crashes during 300M requests served

The client side (Bitbake):
--------------------------
We have added a sanity check that disables the use of a remote HE server and switches to a local one if the remote HE server cannot be reached. This is done from the "ConfigParsed" event. The reason is to avoid builds hanging when the remote HE server is not responding.

Areas of improvement:
---------------------

### Data retention:
There is no built-in data retention. We solve this with a recurring external cleanup script. It works, but it also exposed locking in the server: the server and the external cleanup block each other, and clients time out. This may partly be down to the nature of SQLite and a single-file database. OE Hashserve is not built for cleaning up data, and the data growth is high, so it has to be handled.

### Protocol:
Our first tested deployment option was Kubernetes, where the absence of the de facto standard HTTP(S) protocol required some workarounds. Routing, authentication and monitoring support get lost on the way.
I would suggest that we look into using HTTP(S) + JSON and some form of Basic Auth as the next basis of the protocol.

### Security:
Supply-chain attacks are nowadays something to be aware of. This service uses an unencrypted and unauthenticated protocol and is thus open to man-in-the-middle attacks, or any kind of fake data and manipulation. The protocol changes suggested above could mitigate some of this. Additionally, one thought is for the client to provide a signature of the hash together with the hash, so that it can be verified by the client using its secret upon retrieval.

### External database connection
An option for an external DB (for example PostgreSQL) would make concurrent cleanup and vacuum possible while the server is running. In the stateless world of cloud/containers/pods, an option for an external DB would be favorable. It would probably require some external package for DB interaction, so it would differ a little from the standard Python-only nature of Bitbake in general.

### Nice to have - Minor changes that should be more easily added as patches
* Hash (LRU) cache. From our stats, there seems to be a 1:20 ratio of writes to reads from the DB, so a cache might save some resources.

Conclusions
===========
Hashserve is FOSS and if we want improvements, we have to contribute. I will investigate to what extent we can chip in on some of these parts. However, especially for the protocol, and maybe also for external Python dependencies via PyPI, it would be nice to know whether these are acceptable and desired changes before starting out.
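
As an illustration of the client-side sanity check described earlier, here is a minimal sketch of the fallback decision. It is not necessarily the code used in the setup described above. The function name `pick_hashserve`, the timeout value, and the assumption that the remote server is given as a plain "host:port" string are all hypothetical. The only BitBake behavior relied on is that setting `BB_HASHSERVE` to "auto" makes BitBake start a local server.

```python
import socket


def pick_hashserve(remote, timeout=5.0):
    """Return `remote` if a TCP connection to it succeeds, else "auto".

    `remote` is assumed to be a plain "host:port" string. "auto" is the
    BB_HASHSERVE value that makes BitBake start a local server.
    """
    host, _, port = remote.rpartition(":")
    try:
        # This only checks that something accepts TCP connections on the port.
        # It does not verify that the peer speaks the hashserv protocol.
        with socket.create_connection((host, int(port)), timeout=timeout):
            return remote
    except (OSError, ValueError, OverflowError):
        # OSError: refused, unreachable, DNS failure, or timeout.
        # ValueError: no numeric port in `remote` (e.g. it is already "auto").
        # OverflowError: port number outside the valid range.
        return "auto"


# Hypothetical wiring in a .bbclass (not runnable outside BitBake):
#
#   addhandler hashserve_sanity
#   hashserve_sanity[eventmask] = "bb.event.ConfigParsed"
#   python hashserve_sanity() {
#       d.setVar("BB_HASHSERVE", pick_hashserve(d.getVar("BB_HASHSERVE") or "auto"))
#   }
```

A short connect timeout keeps the check from becoming a new source of hanging builds. A check like this only covers the moment of parsing; a server that goes away mid-build would still need to be handled separately.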
