D5194: wireprotov2: add an extension to cache wireproto v2 responses in S3
sheehan abandoned this revision.
sheehan added a comment.

  > Either way, we'll be deploying this to Mozilla's hg servers in the next few months and testing it out. Perhaps after it's been in production for some time we will have a stronger case for inclusion in core. :)

  Going to deploy and maintain this at Mozilla for the time being and consider moving into core at a later time.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5194

To: sheehan, #hg-reviewers
Cc: martinvonz, indygreg, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D5194: wireprotov2: add an extension to cache wireproto v2 responses in S3
indygreg added a comment.

  In https://phab.mercurial-scm.org/D5194#77606, @martinvonz wrote:

  > Is this useful enough to others that it should live in the hg core repo? It doesn't seem like it to me, but maybe I'm wrong.

  I think having plug-and-play caching solutions in the official Mercurial distribution would be an extremely compelling product feature. We could tell people "just install Mercurial and add these config options to make your server scale nearly effortlessly." That's a killer feature IMO.

  S3 is pretty popular as a key-value store and I think there is a market for it. Obviously other cache backends would be useful too. And if we move forward with cache backends in core, we should be prepared to support GCP, Redis, other backends. Whether those are supported in the same extension or in separate extensions, I'm not sure. Time will tell.
D5194: wireprotov2: add an extension to cache wireproto v2 responses in S3
sheehan added a comment.

  In https://phab.mercurial-scm.org/D5194#77606, @martinvonz wrote:

  > Is this useful enough to others that it should live in the hg core repo? It doesn't seem like it to me, but maybe I'm wrong.

  My thought process was that since the new wire protocol supports caching command responses but does not actually provide any cache implementations, including some optional out-of-the-box support for something as common as S3 would be useful for anyone considering use of that feature. Maybe that's not enough reason to justify an extension in the core repo, I'm not certain.

  Either way, we'll be deploying this to Mozilla's hg servers in the next few months and testing it out. Perhaps after it's been in production for some time we will have a stronger case for inclusion in core. :)
D5194: wireprotov2: add an extension to cache wireproto v2 responses in S3
martinvonz added a comment.

  Is this useful enough to others that it should live in the hg core repo? It doesn't seem like it to me, but maybe I'm wrong.
D5194: wireprotov2: add an extension to cache wireproto v2 responses in S3
sheehan added a subscriber: indygreg.
sheehan added a comment.

  Throwing this up for review now, but there are a few things that could be done to improve this. A cache expiration policy might be useful, but is difficult to test with the S3 bucket expiration rules. It may also be desirable to be able to specify more than one S3 bucket/region/account in the future.

  @indygreg will have more thoughts when he returns, I'm sure. :)

INLINE COMMENTS

> s3wireprotocache.py:185
> +    def adjustcachekeystate(self, state):
> +        if self.s3_endpoint_url:  # testing backdoor
> +            del state[b'repo']

This is needed for determinism in testing, but there is likely a better way to avoid it than checking for an alternative endpoint url.
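To illustrate why the inline comment above drops `repo` from the cache key state, here is a minimal sketch. It assumes the `repo` entry holds a value that varies between test runs, such as a temporary repository path; the thread does not say so explicitly. The helper names `derive_cache_key` and `adjust_cache_key_state`, the SHA-1 hashing scheme, and the example paths are hypothetical and are not Mercurial's actual key derivation.

```python
import hashlib


def derive_cache_key(state):
    """Hash a command-state mapping into a hex cache key.

    Hypothetical scheme for illustration only: entries are hashed in
    sorted order so the key depends only on the mapping's contents.
    """
    hasher = hashlib.sha1()
    for name in sorted(state):
        hasher.update(name + b'=' + state[name] + b'\n')
    return hasher.hexdigest()


def adjust_cache_key_state(state, testing=False):
    """Return a copy of ``state`` without the ``repo`` entry when testing.

    Assumption: ``repo`` is the only entry that differs between runs.
    """
    adjusted = dict(state)
    if testing:
        adjusted.pop(b'repo', None)
    return adjusted


# Two runs of the same command against repositories at different paths.
state_a = {b'command': b'manifestdata', b'repo': b'/tmp/run-1/server'}
state_b = {b'command': b'manifestdata', b'repo': b'/tmp/run-2/server'}

# With ``repo`` included, the derived keys differ from run to run.
keys_differ = derive_cache_key(state_a) != derive_cache_key(state_b)

# With ``repo`` removed, both runs map to the same cache key.
keys_match = (
    derive_cache_key(adjust_cache_key_state(state_a, testing=True))
    == derive_cache_key(adjust_cache_key_state(state_b, testing=True))
)
```

Under these assumptions, a test that asserts on the cached object's key can only be stable if the run-specific entry is excluded, which is the effect the backdoor achieves. A dedicated configuration option, rather than a check on the endpoint URL, would be one way to make that intent explicit.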
D5194: wireprotov2: add an extension to cache wireproto v2 responses in S3
sheehan created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  With wireprotocol version two introducing command response caching and enabling content redirect responses, it is possible to store response objects in an arbitrary blob store and send clients to the store to retrieve large responses.

  This commit adds an extension which implements such wire protocol caching in Amazon S3. Servers add their AWS access key and key ID to an hgrc config, and specify the name of the S3 bucket which holds the objects.

  When a cache lookup request comes in, the cacher sends a HEAD request to S3 which will return a 404 if the object does not exist (i.e., a cache miss). If the request is a cache hit, a presigned URL for the object is generated and used to issue a content redirect response which is sent to the client. If the response indicates a cache miss, the response is generated by the server and buffered in the cache until `onfinished` is called. During `onfinished`, we calculate the size of the response and can optionally avoid caching if the response is below a configured minimum threshold. Otherwise we insert the object into the cache bucket using the `put_object` API.

  To test this extension, we require the `moto` mock AWS library. Specifically, we use the "standalone server" functionality, which creates a Flask application that imitates S3. A new hghave predicate is added to check for this functionality before testing.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5194

AFFECTED FILES
  hgext/s3wireprotocache.py
  tests/hghave.py
  tests/test-help.t
  tests/test-s3wireprotocache.t

CHANGE DETAILS

diff --git a/tests/test-s3wireprotocache.t b/tests/test-s3wireprotocache.t
new file mode 100644
--- /dev/null
+++ b/tests/test-s3wireprotocache.t
@@ -0,0 +1,219 @@
+#require motoserver
+
+  $ . $TESTDIR/wireprotohelpers.sh
+
+Set up the mock S3 server, create a bucket
+
+  $ moto_server -p 15467 s3 >> mocks3.log 2>&1 &
+  $ MOTO_PID=$!
+  >>> import boto3
+  >>> s3 = boto3.client('s3',
+  ...     aws_access_key_id='dummyaccessid',
+  ...     aws_secret_access_key='dummysecretkey',
+  ...     endpoint_url='http://localhost:15467/')
+  >>> _ = s3.create_bucket(
+  ...     ACL='public-read',
+  ...     Bucket='testbucket')
+
+  $ cat >> $HGRCPATH << EOF
+  > [extensions]
+  > blackbox =
+  > [blackbox]
+  > track = s3wireprotocache
+  > EOF
+  $ hg init server
+  $ enablehttpv2 server
+  $ cd server
+  $ cat >> .hg/hgrc << EOF
+  > [extensions]
+  > s3wireprotocache =
+  > [s3wireprotocache]
+  > access_key_id = dummyaccessid
+  > secret_access_key = dummysecretkey
+  > bucket = testbucket
+  > redirecttargets = http://localhost:15467/
+  > endpoint_url = http://localhost:15467/
+  > EOF
+
+  $ echo a0 > a
+  $ echo b0 > b
+  $ hg -q commit -A -m 'commit 0'
+  $ echo a1 > a
+  $ hg commit -m 'commit 1'
+  $ echo b1 > b
+  $ hg commit -m 'commit 2'
+  $ echo a2 > a
+  $ echo b2 > b
+  $ hg commit -m 'commit 3'
+
+  $ hg log -G -T '{rev}:{node} {desc}'
+  @  3:50590a86f3ff5d1e9a1624a7a6957884565cc8e8 commit 3
+  |
+  o  2:4d01eda50c6ac5f7e89cbe1880143a32f559c302 commit 2
+  |
+  o  1:4432d83626e8a98655f062ec1f2a43b07f7fbbb0 commit 1
+  |
+  o  0:3390ef850073fbc2f0dfff2244342c8e9229013a commit 0
+
+  $ hg --debug debugindex -m
+     rev linkrev nodeid                                   p1                                       p2
+       0       0 992f4779029a3df8d0666d00bb924f69634e2641
+       1       1 a988fb43583e871d1ed5750ee074c6d840bbbfc8 992f4779029a3df8d0666d00bb924f69634e2641
+       2       2 a8853dafacfca6fc807055a660d8b835141a3bb4 a988fb43583e871d1ed5750ee074c6d840bbbfc8
+       3       3 3fe11dfbb13645782b0addafbe75a87c210ffddc a8853dafacfca6fc807055a660d8b835141a3bb4
+
+  $ hg serve -p $HGPORT -d --pid-file hg.pid -E error.log
+  $ HGSERVEPID=`cat hg.pid`
+
+  $ cat hg.pid > $DAEMON_PIDS
+  $ printf "\n" >> $DAEMON_PIDS
+  $ echo $MOTO_PID >> $DAEMON_PIDS
+
+Performing the same request twice should produce the same result,
+with the first request caching the response in S3 and the second
+result coming as an S3 redirect
+
+  $ sendhttpv2peer << EOF
+  > command manifestdata
+  >     nodes eval:[b'\x99\x2f\x47\x79\x02\x9a\x3d\xf8\xd0\x66\x6d\x00\xbb\x92\x4f\x69\x63\x4e\x26\x41']
+  >     tree eval:b''
+  >     fields eval:[b'parents']
+  > EOF
+  creating http peer for wire protocol version 2
+  sending manifestdata command
+  response: gen[
+    {
+      b'totalitems': 1
+    },
+    {
+      b'node': b'\x99/Gy\x02\x9a=\xf8\xd0fm\x00\xbb\x92OicN&A',
+      b'parents': [
+        b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
+