D3845: worker: support more return types in posix worker
durin42 added a comment. >> It's been recommended to me that we avoid the streaming flavor of >> cbor, so we'd probably just do one-shot messages. > > > > Out of curiosity, could you elaborate? Here’s the relevant excerpt of the conversation: < davidben> TBH, I haven't really been impressed by CBOR. The data model is sane, but I think they messed up the serialization. < durin42> probably still better than something bespoke on balance < davidben> I dunno, I have approximately no filter against making bespoke serializations so I'm perhaps not the best judge there. < davidben> Though I do feel people should do it more. I had to push a team away from trying to use streaming CBOR when all they really needed was like a single length prefix. Streaming CBOR is really really bad. < davidben> Whatever you do, *never* use streaming CBOR. They somehow managed to make it worse than streaming BER which is a true accomplishment. < Alex_Gaynor> 👏 < durin42> noted < davidben> (Problem is CBOR uses item count rather than item length so you need to recursively parse things to bound an element. They did it to make encoding easier but I think that was a mistake. Decoding is where you really need to worry about attacker-controlled input.) I believe davidben works on Chrome, and I can try and get a more detailed critique of the format if you’re interested. I mostly trust davidben’s judgement on things like this. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D3845 To: hooper, #hg-reviewers Cc: indygreg, yuja, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D3845: worker: support more return types in posix worker
indygreg added a comment. In https://phab.mercurial-scm.org/D3845#60316, @durin42 wrote: > It's been recommended to me that we avoid the streaming flavor of > cbor, so we'd probably just do one-shot messages. Out of curiosity, could you elaborate? One of the critiques against CBOR is that naive consumption of streaming data types can lead to resource exhaustion. e.g. by streaming a very large byte string. Of course, resource exhaustion can occur without streaming as well if the sender sends a very large document. Parsers need to deal with resource exhaustion regardless. Anyway, I don't believe ``cbor2`` prevents the use of the streaming types. Nor does it have support for limiting bytes read. For the latter, we have ``util.cappedreader`` which can expose a minimal wrap of a file object. But it needs work to be used in the context of limiting resource consumption (e.g. it should throw a reasonable error if an overrun is encountered). REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D3845 To: hooper, #hg-reviewers Cc: indygreg, yuja, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D3845: worker: support more return types in posix worker
yuja added a comment. > Yuya, it had passed tests for me with cbor, so is that a portability issue? I don't think it's a portability issue. `cbor.load(StringIO(''))` doesn't raise EOFError. > One of the benefits of pickle/marshal is that we don't lose information, > > like when tuples become lists. That would be an insidious problem for callers. Good point. > Also, in case it wasn't obvious, we'll need another patch to add some > > handling of len(dumps(result)) > PIPE_BUF (which was an existing issue). Yup. A payload was much smaller than PIPE_BUF, but that's no longer true. The most straightforward fix will be to create pipe per worker and select() them. I don't know if there's a simpler mechanism. DGRAM socket can ensure message boundaries, but it's still has a size limit. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D3845 To: hooper, #hg-reviewers Cc: yuja, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: D3845: worker: support more return types in posix worker
> Yuya, it had passed tests for me with cbor, so is that a portability issue? I don't think it's a portability issue. `cbor.load(StringIO(''))` doesn't raise EOFError. > One of the benefits of pickle/marshal is that we don't lose information, > like when tuples become lists. That would be an insidious problem for callers. Good point. > Also, in case it wasn't obvious, we'll need another patch to add some > handling of len(dumps(result)) > PIPE_BUF (which was an existing issue). Yup. A payload was much smaller than PIPE_BUF, but that's no longer true. The most straightforward fix will be to create pipe per worker and select() them. I don't know if there's a simpler mechanism. DGRAM socket can ensure message boundaries, but it's still has a size limit. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D3845: worker: support more return types in posix worker
This revision was automatically updated to reflect the committed changes. Closed by commit rHG8c38d2948217: worker: support more return types in posix worker (authored by hooper, committed by ). REPOSITORY rHG Mercurial CHANGES SINCE LAST UPDATE https://phab.mercurial-scm.org/D3845?vs=9418&id=9429 REVISION DETAIL https://phab.mercurial-scm.org/D3845 AFFECTED FILES mercurial/worker.py CHANGE DETAILS diff --git a/mercurial/worker.py b/mercurial/worker.py --- a/mercurial/worker.py +++ b/mercurial/worker.py @@ -155,8 +155,8 @@ def workerfunc(): os.close(rfd) -for i, item in func(*(staticargs + (pargs,))): -os.write(wfd, '%d %s\n' % (i, item)) +for result in func(*(staticargs + (pargs,))): +os.write(wfd, util.pickle.dumps(result)) return 0 ret = scmutil.callcatch(ui, workerfunc) @@ -187,9 +187,15 @@ os.kill(os.getpid(), -status) sys.exit(status) try: -for line in util.iterfile(fp): -l = line.split(' ', 1) -yield int(l[0]), l[1][:-1] +while True: +try: +yield util.pickle.load(fp) +except EOFError: +break +except IOError as e: +if e.errno == errno.EINTR: +continue +raise except: # re-raises killworkers() cleanup() To: hooper, #hg-reviewers Cc: yuja, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D3845: worker: support more return types in posix worker
hooper added a comment. Yuya, it had passed tests for me with cbor, so is that a portability issue? One of the benefits of pickle/marshal is that we don't lose information, like when tuples become lists. That would be an insidious problem for callers. Also, in case it wasn't obvious, we'll need another patch to add some handling of len(dumps(result)) > PIPE_BUF (which was an existing issue). REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D3845 To: hooper, #hg-reviewers Cc: yuja, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D3845: worker: support more return types in posix worker
hooper updated this revision to Diff 9418. REPOSITORY rHG Mercurial CHANGES SINCE LAST UPDATE https://phab.mercurial-scm.org/D3845?vs=9401&id=9418 REVISION DETAIL https://phab.mercurial-scm.org/D3845 AFFECTED FILES mercurial/worker.py CHANGE DETAILS diff --git a/mercurial/worker.py b/mercurial/worker.py --- a/mercurial/worker.py +++ b/mercurial/worker.py @@ -155,8 +155,8 @@ def workerfunc(): os.close(rfd) -for i, item in func(*(staticargs + (pargs,))): -os.write(wfd, '%d %s\n' % (i, item)) +for result in func(*(staticargs + (pargs,))): +os.write(wfd, util.pickle.dumps(result)) return 0 ret = scmutil.callcatch(ui, workerfunc) @@ -187,9 +187,15 @@ os.kill(os.getpid(), -status) sys.exit(status) try: -for line in util.iterfile(fp): -l = line.split(' ', 1) -yield int(l[0]), l[1][:-1] +while True: +try: +yield util.pickle.load(fp) +except EOFError: +break +except IOError as e: +if e.errno == errno.EINTR: +continue +raise except: # re-raises killworkers() cleanup() To: hooper, #hg-reviewers Cc: yuja, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D3845: worker: support more return types in posix worker
durin42 added a comment. > This makes me feel that pickle is "okay" tool. @durin42, any idea? I can live with pickle in that case, I guess. Sigh. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D3845 To: hooper, #hg-reviewers Cc: yuja, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D3845: worker: support more return types in posix worker
yuja added a comment. > +while True: > +try: > +yield cbor.load(fp) > +except EOFError: > +break Unfortunately this doesn't work because the cbor decoder doesn't care for EOF. It tries to raise CBORDEcodeError and fail at `fp.tell()`. We'll have to either fix the upstream cbor library or duplicate some parts to cborutil. (or add an extra length field to feed a single chunk to `cbor.loads()`.) This makes me feel that pickle is "okay" tool. @durin42, any idea? REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D3845 To: hooper, #hg-reviewers Cc: yuja, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: D3845: worker: support more return types in posix worker
> +while True: > +try: > +yield cbor.load(fp) > +except EOFError: > +break Unfortunately this doesn't work because the cbor decoder doesn't care for EOF. It tries to raise CBORDEcodeError and fail at `fp.tell()`. We'll have to either fix the upstream cbor library or duplicate some parts to cborutil. (or add an extra length field to feed a single chunk to `cbor.loads()`.) This makes me feel that pickle is "okay" tool. @durin42, any idea? ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D3845: worker: support more return types in posix worker
hooper updated this revision to Diff 9401. REPOSITORY rHG Mercurial CHANGES SINCE LAST UPDATE https://phab.mercurial-scm.org/D3845?vs=9364&id=9401 REVISION DETAIL https://phab.mercurial-scm.org/D3845 AFFECTED FILES mercurial/worker.py CHANGE DETAILS diff --git a/mercurial/worker.py b/mercurial/worker.py --- a/mercurial/worker.py +++ b/mercurial/worker.py @@ -15,6 +15,9 @@ import time from .i18n import _ +from .thirdparty import ( +cbor, +) from . import ( encoding, error, @@ -155,8 +158,8 @@ def workerfunc(): os.close(rfd) -for i, item in func(*(staticargs + (pargs,))): -os.write(wfd, '%d %s\n' % (i, item)) +for result in func(*(staticargs + (pargs,))): +os.write(wfd, cbor.dumps(result)) return 0 ret = scmutil.callcatch(ui, workerfunc) @@ -187,9 +190,15 @@ os.kill(os.getpid(), -status) sys.exit(status) try: -for line in util.iterfile(fp): -l = line.split(' ', 1) -yield int(l[0]), l[1][:-1] +while True: +try: +yield cbor.load(fp) +except EOFError: +break +except IOError as e: +if e.errno == errno.EINTR: +continue +raise except: # re-raises killworkers() cleanup() To: hooper, #hg-reviewers Cc: yuja, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D3845: worker: support more return types in posix worker
yuja added a comment. > > > I'm not in love with pickle. Could we use json or cbor instead? > > > > Perhaps cbor is better since it can be streamed and the overhead is pretty > > low. We have to keep the message size small since rfd/wfd is a multi-writer > > pipe. > > It's been recommended to me that we avoid the streaming flavor of > cbor, so we'd probably just do one-shot messages. I meant multiple one-shot messages can be serialized over the pipe. JSON parser doesn't work in that way. Each message must be written atomically. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D3845 To: hooper, #hg-reviewers Cc: yuja, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: D3845: worker: support more return types in posix worker
> > > I'm not in love with pickle. Could we use json or cbor instead? > > > > Perhaps cbor is better since it can be streamed and the overhead is > pretty > > low. We have to keep the message size small since rfd/wfd is a > multi-writer > > pipe. > > It's been recommended to me that we avoid the streaming flavor of > cbor, so we'd probably just do one-shot messages. I meant multiple one-shot messages can be serialized over the pipe. JSON parser doesn't work in that way. Each message must be written atomically. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D3845: worker: support more return types in posix worker
durin42 added a comment. > > I'm not in love with pickle. Could we use json or cbor instead? > > Perhaps cbor is better since it can be streamed and the overhead is pretty > low. We have to keep the message size small since rfd/wfd is a multi-writer > pipe. It's been recommended to me that we avoid the streaming flavor of cbor, so we'd probably just do one-shot messages. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D3845 To: hooper, #hg-reviewers Cc: yuja, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
Re: D3845: worker: support more return types in posix worker
> I'm not in love with pickle. Could we use json or cbor instead? Perhaps cbor is better since it can be streamed and the overhead is pretty low. We have to keep the message size small since rfd/wfd is a multi-writer pipe. ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D3845: worker: support more return types in posix worker
yuja added a comment. > I'm not in love with pickle. Could we use json or cbor instead? Perhaps cbor is better since it can be streamed and the overhead is pretty low. We have to keep the message size small since rfd/wfd is a multi-writer pipe. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D3845 To: hooper, #hg-reviewers Cc: yuja, durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D3845: worker: support more return types in posix worker
durin42 added a comment. I'm not in love with pickle. Could we use json or cbor instead? REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D3845 To: hooper, #hg-reviewers Cc: durin42, mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D3845: worker: support more return types in posix worker
hooper updated this revision to Diff 9364. REPOSITORY rHG Mercurial CHANGES SINCE LAST UPDATE https://phab.mercurial-scm.org/D3845?vs=9317&id=9364 REVISION DETAIL https://phab.mercurial-scm.org/D3845 AFFECTED FILES mercurial/worker.py CHANGE DETAILS diff --git a/mercurial/worker.py b/mercurial/worker.py --- a/mercurial/worker.py +++ b/mercurial/worker.py @@ -155,8 +155,8 @@ def workerfunc(): os.close(rfd) -for i, item in func(*(staticargs + (pargs,))): -os.write(wfd, '%d %s\n' % (i, item)) +for result in func(*(staticargs + (pargs,))): +os.write(wfd, util.pickle.dumps(result)) return 0 ret = scmutil.callcatch(ui, workerfunc) @@ -187,9 +187,15 @@ os.kill(os.getpid(), -status) sys.exit(status) try: -for line in util.iterfile(fp): -l = line.split(' ', 1) -yield int(l[0]), l[1][:-1] +while True: +try: +yield util.pickle.load(fp) +except EOFError: +break +except IOError as e: +if e.errno == errno.EINTR: +continue +raise except: # re-raises killworkers() cleanup() To: hooper, #hg-reviewers Cc: mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
D3845: worker: support more return types in posix worker
hooper created this revision. Herald added a subscriber: mercurial-devel. Herald added a reviewer: hg-reviewers. REVISION SUMMARY This allows us to return things that aren't tuple(int, str) from worker functions. I wanted to use marshal instead of pickle, but it seems to read from the pipe in non-blocking mode, which means it stops before it sees the results. The windows worker already supports arbitrary return values without serialization, because it uses threads instead of subprocesses. REPOSITORY rHG Mercurial REVISION DETAIL https://phab.mercurial-scm.org/D3845 AFFECTED FILES mercurial/worker.py CHANGE DETAILS diff --git a/mercurial/worker.py b/mercurial/worker.py --- a/mercurial/worker.py +++ b/mercurial/worker.py @@ -155,8 +155,8 @@ def workerfunc(): os.close(rfd) -for i, item in func(*(staticargs + (pargs,))): -os.write(wfd, '%d %s\n' % (i, item)) +for result in func(*(staticargs + (pargs,))): +os.write(wfd, util.pickle.dumps(result)) return 0 ret = scmutil.callcatch(ui, workerfunc) @@ -187,9 +187,15 @@ os.kill(os.getpid(), -status) sys.exit(status) try: -for line in util.iterfile(fp): -l = line.split(' ', 1) -yield int(l[0]), l[1][:-1] +while True: +try: +yield util.pickle.load(fp) +except EOFError: +break +except IOError as e: +if e.errno == errno.EINTR: +continue +raise except: # re-raises killworkers() cleanup() To: hooper, #hg-reviewers Cc: mercurial-devel ___ Mercurial-devel mailing list Mercurial-devel@mercurial-scm.org https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel