D5194: wireprotov2: add an extension to cache wireproto v2 responses in S3

2018-11-02 Thread sheehan (Connor Sheehan)
sheehan abandoned this revision.
sheehan added a comment.


  > Either way, we'll be deploying this to Mozilla's hg servers in the next few
  > months and testing it out. Perhaps after it's been in production for some time
  > we will have a stronger case for inclusion in core. :)
  
  I'm going to deploy and maintain this at Mozilla for the time being and will 
consider moving it into core at a later time.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5194

To: sheehan, #hg-reviewers
Cc: martinvonz, indygreg, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D5194: wireprotov2: add an extension to cache wireproto v2 responses in S3

2018-10-31 Thread indygreg (Gregory Szorc)
indygreg added a comment.


  In https://phab.mercurial-scm.org/D5194#77606, @martinvonz wrote:
  
  > Is this useful enough to others that it should live in the hg core repo? It
  > doesn't seem like it to me, but maybe I'm wrong.
  
  
  I think having plug-and-play caching solutions in the official Mercurial 
distribution would be an extremely compelling product feature. We could tell 
people "just install Mercurial and add these config options to make your server 
scale nearly effortlessly." That's a killer feature IMO.
  
  S3 is pretty popular as a key-value store and I think there is a market for 
it.
  
  Obviously other cache backends would be useful too. And if we move forward 
with cache backends in core, we should be prepared to support GCP, Redis, and 
other backends. Whether those are supported in the same extension or in separate 
extensions, I'm not sure. Time will tell.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5194

To: sheehan, #hg-reviewers
Cc: martinvonz, indygreg, mercurial-devel


D5194: wireprotov2: add an extension to cache wireproto v2 responses in S3

2018-10-30 Thread sheehan (Connor Sheehan)
sheehan added a comment.


  In https://phab.mercurial-scm.org/D5194#77606, @martinvonz wrote:
  
  > Is this useful enough to others that it should live in the hg core repo? It
  > doesn't seem like it to me, but maybe I'm wrong.
  
  
  My thought process was that since the new wire protocol supports caching 
command responses but does not actually provide any cache implementations, 
including some optional OOB support for something as common as S3 would be 
useful for anyone considering use of that feature.
  
  Maybe that's not enough reason to justify an extension in the core repo, I'm 
not certain. Either way, we'll be deploying this to Mozilla's hg servers in the 
next few months and testing it out. Perhaps after it's been in production for 
some time we will have a stronger case for inclusion in core. :)

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5194

To: sheehan, #hg-reviewers
Cc: martinvonz, indygreg, mercurial-devel


D5194: wireprotov2: add an extension to cache wireproto v2 responses in S3

2018-10-26 Thread martinvonz (Martin von Zweigbergk)
martinvonz added a comment.


  Is this useful enough to others that it should live in the hg core repo? It 
doesn't seem like it to me, but maybe I'm wrong.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5194

To: sheehan, #hg-reviewers
Cc: martinvonz, indygreg, mercurial-devel


D5194: wireprotov2: add an extension to cache wireproto v2 responses in S3

2018-10-26 Thread sheehan (Connor Sheehan)
sheehan added a subscriber: indygreg.
sheehan added a comment.


  Throwing this up for review now, but there are a few things that could be 
done to improve this. A cache expiration policy might be useful, but is 
difficult to test with the S3 bucket expiration rules. It may also be desirable 
to be able to specify more than one S3 bucket/region/account in the future.
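
  For illustration, an expiration policy of the kind mentioned above could be
  expressed as an S3 bucket lifecycle rule. This is a minimal sketch, not part
  of the patch: the function names, the rule ID, and the 30-day value are
  hypothetical, and it assumes a boto3 S3 client, whose
  `put_bucket_lifecycle_configuration` call accepts a configuration of this
  shape.

```python
def build_expiration_config(days, prefix=""):
    """Build a lifecycle configuration that expires objects after `days` days.

    An empty prefix applies the rule to every object in the bucket.
    The rule ID is an arbitrary, hypothetical label.
    """
    return {
        "Rules": [
            {
                "ID": "expire-cached-responses",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_expiration(client, bucket, days):
    """Attach the expiration rule to `bucket` using a boto3-style S3 client."""
    client.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_expiration_config(days),
    )
```

  With a real client this would be called as
  `apply_expiration(boto3.client('s3'), 'testbucket', 30)`. S3 applies
  lifecycle rules asynchronously, which is part of why such a policy is hard
  to exercise in a deterministic test.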
  
  @indygreg will have more thoughts when he returns, I'm sure. :)

INLINE COMMENTS

> s3wireprotocache.py:185
> +    def adjustcachekeystate(self, state):
> +        if self.s3_endpoint_url:  # testing backdoor
> +            del state[b'repo']

This is needed for determinism in testing, but there is likely a better way to 
handle it than checking for an alternative endpoint URL.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5194

To: sheehan, #hg-reviewers
Cc: indygreg, mercurial-devel


D5194: wireprotov2: add an extension to cache wireproto v2 responses in S3

2018-10-26 Thread sheehan (Connor Sheehan)
sheehan created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  With wireprotocol version two introducing command response caching
  and enabling content redirect responses, it is possible to store
  response objects in an arbitrary blob store and send clients to
  the store to retrieve large responses. This commit adds an extension
  which implements such wire protocol caching in Amazon S3.
  
  Servers add their AWS access key and key ID to an hgrc config,
  and specify the name of the S3 bucket which holds the objects.
  When a cache lookup request comes in, the cacher sends a HEAD
  request to S3, which returns a 404 if the object does not
  exist (i.e. a cache miss). If the request is a cache hit, a presigned
  url for the object is generated and used to issue a content
  redirect response which is sent to the client. If the response
  indicates a cache miss, the response is generated by the server
  and buffered in the cache until `onfinished` is called. During
  `onfinished`, we calculate the size of the response and can
  optionally avoid caching if the response is below a configured
  minimum threshold. Otherwise we insert the object into the
  cache bucket using the `put_object` API.
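
  The lookup and store flow described above can be sketched roughly as
  follows. This is an illustrative sketch only, not the code in
  hgext/s3wireprotocache.py: `S3ResponseCache`, `buffer`, the simplified
  `onfinished(key)` signature, `FakeS3Client`, and `NotFound` are all
  hypothetical names. The fake client mimics the three boto3 S3 client calls
  involved (`head_object`, `generate_presigned_url`, `put_object`) so the
  sketch runs without AWS credentials; a real boto3 client could be passed in
  its place.

```python
class NotFound(Exception):
    """Stand-in for botocore's ClientError carrying a 404 error code."""

    def __init__(self):
        super().__init__("Not Found")
        self.response = {"Error": {"Code": "404"}}


class FakeS3Client:
    """In-memory stand-in for the boto3 S3 client calls used below."""

    def __init__(self):
        self.objects = {}

    def head_object(self, Bucket, Key):
        if (Bucket, Key) not in self.objects:
            raise NotFound()
        return {"ContentLength": len(self.objects[(Bucket, Key)])}

    def generate_presigned_url(self, ClientMethod, Params=None, ExpiresIn=3600):
        # A real client returns a signed URL; this one is a placeholder.
        return "https://%s.example.invalid/%s?signed=1" % (
            Params["Bucket"], Params["Key"])

    def put_object(self, Bucket, Key, Body):
        self.objects[(Bucket, Key)] = Body
        return {}


class S3ResponseCache:
    """Hypothetical cacher: HEAD for lookup, buffer on miss, PUT when finished."""

    def __init__(self, client, bucket, minimum_size=0):
        self.client = client
        self.bucket = bucket
        self.minimum_size = minimum_size
        self._buffers = {}

    def lookup(self, key):
        """Return a presigned URL on a cache hit, or None on a cache miss."""
        try:
            self.client.head_object(Bucket=self.bucket, Key=key)
        except Exception as exc:
            error = getattr(exc, "response", {}).get("Error", {})
            if error.get("Code") == "404":
                return None  # object does not exist: cache miss
            raise
        return self.client.generate_presigned_url(
            "get_object",
            Params={"Bucket": self.bucket, "Key": key},
            ExpiresIn=3600,
        )

    def buffer(self, key, chunk):
        """Collect a piece of the server-generated response for `key`."""
        self._buffers.setdefault(key, []).append(chunk)

    def onfinished(self, key):
        """Store the buffered response unless it is below the size threshold."""
        body = b"".join(self._buffers.pop(key, []))
        if len(body) < self.minimum_size:
            return False  # too small to be worth caching
        self.client.put_object(Bucket=self.bucket, Key=key, Body=body)
        return True


if __name__ == "__main__":
    cache = S3ResponseCache(FakeS3Client(), "testbucket", minimum_size=4)
    print(cache.lookup("somekey"))   # miss before anything is stored
    cache.buffer("somekey", b"large ")
    cache.buffer("somekey", b"response")
    cache.onfinished("somekey")
    print(cache.lookup("somekey"))   # hit: redirect target for the client
```

  Injecting the client keeps the S3 dependency at the edge, which is also what
  makes a mock such as moto's standalone server usable in tests.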
  
  To test this extension, we require the `moto` mock AWS library.
  Specifically, we use the "standalone server" functionality,
  which creates a Flask application that imitates S3. A new hghave
  predicate is added to check for this functionality before
  testing.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D5194

AFFECTED FILES
  hgext/s3wireprotocache.py
  tests/hghave.py
  tests/test-help.t
  tests/test-s3wireprotocache.t

CHANGE DETAILS

diff --git a/tests/test-s3wireprotocache.t b/tests/test-s3wireprotocache.t
new file mode 100644
--- /dev/null
+++ b/tests/test-s3wireprotocache.t
@@ -0,0 +1,219 @@
+#require motoserver
+
+  $ . $TESTDIR/wireprotohelpers.sh
+
+Set up the mock S3 server, create a bucket
+
+  $ moto_server -p 15467 s3 >> mocks3.log 2>&1 &
+  $ MOTO_PID=$!
+  >>> import boto3
+  >>> s3 = boto3.client('s3',
+  ...     aws_access_key_id='dummyaccessid',
+  ...     aws_secret_access_key='dummysecretkey',
+  ...     endpoint_url='http://localhost:15467/')
+  >>> _ = s3.create_bucket(
+  ...     ACL='public-read',
+  ...     Bucket='testbucket')
+
+  $ cat >> $HGRCPATH << EOF
+  > [extensions]
+  > blackbox =
+  > [blackbox]
+  > track = s3wireprotocache
+  > EOF
+  $ hg init server
+  $ enablehttpv2 server
+  $ cd server
+  $ cat >> .hg/hgrc << EOF
+  > [extensions]
+  > s3wireprotocache =
+  > [s3wireprotocache]
+  > access_key_id = dummyaccessid
+  > secret_access_key = dummysecretkey
+  > bucket = testbucket
+  > redirecttargets = http://localhost:15467/
+  > endpoint_url = http://localhost:15467/
+  > EOF
+
+  $ echo a0 > a
+  $ echo b0 > b
+  $ hg -q commit -A -m 'commit 0'
+  $ echo a1 > a
+  $ hg commit -m 'commit 1'
+  $ echo b1 > b
+  $ hg commit -m 'commit 2'
+  $ echo a2 > a
+  $ echo b2 > b
+  $ hg commit -m 'commit 3'
+
+  $ hg log -G -T '{rev}:{node} {desc}'
+  @  3:50590a86f3ff5d1e9a1624a7a6957884565cc8e8 commit 3
+  |
+  o  2:4d01eda50c6ac5f7e89cbe1880143a32f559c302 commit 2
+  |
+  o  1:4432d83626e8a98655f062ec1f2a43b07f7fbbb0 commit 1
+  |
+  o  0:3390ef850073fbc2f0dfff2244342c8e9229013a commit 0
+  
+  $ hg --debug debugindex -m
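+     rev linkrev nodeid                                   p1                                       p2
+       0       0 992f4779029a3df8d0666d00bb924f69634e2641 0000000000000000000000000000000000000000 0000000000000000000000000000000000000000
+       1       1 a988fb43583e871d1ed5750ee074c6d840bbbfc8 992f4779029a3df8d0666d00bb924f69634e2641 0000000000000000000000000000000000000000
+       2       2 a8853dafacfca6fc807055a660d8b835141a3bb4 a988fb43583e871d1ed5750ee074c6d840bbbfc8 0000000000000000000000000000000000000000
+       3       3 3fe11dfbb13645782b0addafbe75a87c210ffddc a8853dafacfca6fc807055a660d8b835141a3bb4 0000000000000000000000000000000000000000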
+
+  $ hg serve -p $HGPORT -d --pid-file hg.pid -E error.log
+  $ HGSERVEPID=`cat hg.pid`
+
+  $ cat hg.pid > $DAEMON_PIDS
+  $ printf "\n" >> $DAEMON_PIDS
+  $ echo $MOTO_PID >> $DAEMON_PIDS
+
+Performing the same request twice should produce the same result,
+with the first request caching the response in S3 and the second
+result coming as an S3 redirect
+
+  $ sendhttpv2peer << EOF
+  > command manifestdata
+  > nodes eval:[b'\x99\x2f\x47\x79\x02\x9a\x3d\xf8\xd0\x66\x6d\x00\xbb\x92\x4f\x69\x63\x4e\x26\x41']
+  > tree eval:b''
+  > fields eval:[b'parents']
+  > EOF
+  creating http peer for wire protocol version 2
+  sending manifestdata command
+  response: gen[
+    {
+      b'totalitems': 1
+    },
+    {
+      b'node': b'\x99/Gy\x02\x9a=\xf8\xd0fm\x00\xbb\x92OicN&A',
+      b'parents': [
+        b'\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00\x00',
+