D3845: worker: support more return types in posix worker

2018-07-07 Thread durin42 (Augie Fackler)
durin42 added a comment.


  >> It's been recommended to me that we avoid the streaming flavor of
  >>  cbor, so we'd probably just do one-shot messages.
  > 
  > 
  > 
  >   Out of curiosity, could you elaborate?
  
  Here’s the relevant excerpt of the conversation:
  
  < davidben> TBH, I haven't really been impressed by CBOR. The data model is 
sane, but I think they messed up the serialization.
  < durin42> probably still better than something bespoke on balance
  < davidben> I dunno, I have approximately no filter against making bespoke 
serializations so I'm perhaps not the best judge there.
  < davidben> Though I do feel people should do it more. I had to push a team 
away from trying to use streaming CBOR when all they really needed was like a 
single length prefix. Streaming CBOR is really really bad.
  < davidben> Whatever you do, *never* use streaming CBOR. They somehow managed 
to make it worse than streaming BER which is a true accomplishment.
  < Alex_Gaynor> 👏
  < durin42> noted
  < davidben> (Problem is CBOR uses item count rather than item length so you 
need to recursively parse things to bound an element. They did it to make 
encoding easier but I think that was a mistake. Decoding is where you really 
need to worry about attacker-controlled input.)
  
  I believe davidben works on Chrome, and I can try and get a more detailed 
critique of the format if you’re interested. I mostly trust davidben’s 
judgement on things like this.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: indygreg, yuja, durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D3845: worker: support more return types in posix worker

2018-07-04 Thread indygreg (Gregory Szorc)
indygreg added a comment.


  In https://phab.mercurial-scm.org/D3845#60316, @durin42 wrote:
  
  > It's been recommended to me that we avoid the streaming flavor of
  >  cbor, so we'd probably just do one-shot messages.
  
  
  Out of curiosity, could you elaborate?
  
  One of the critiques against CBOR is that naive consumption of streaming data 
types can lead to resource exhaustion. e.g. by streaming a very large byte 
string. Of course, resource exhaustion can occur without streaming as well if 
the sender sends a very large document. Parsers need to deal with resource 
exhaustion regardless.
  
  Anyway, I don't believe ``cbor2`` prevents the use of the streaming types. 
Nor does it have support for limiting bytes read. For the latter, we have 
``util.cappedreader`` which can expose a minimal wrap of a file object. But it 
needs work to be used in the context of limiting resource consumption (e.g. it 
should throw a reasonable error if an overrun is encountered).

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: indygreg, yuja, durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D3845: worker: support more return types in posix worker

2018-07-04 Thread yuja (Yuya Nishihara)
yuja added a comment.


  >   Yuya, it had passed tests for me with cbor, so is that a portability 
issue?
  
  I don't think it's a portability issue. `cbor.load(StringIO(''))` doesn't
  raise EOFError.
  
  >   One of the benefits of pickle/marshal is that we don't lose information,
  > 
  > like when tuples become lists. That would be an insidious problem for 
callers.
  
  Good point.
  
  >   Also, in case it wasn't obvious, we'll need another patch to add some
  > 
  > handling of len(dumps(result)) > PIPE_BUF (which was an existing issue).
  
  Yup. A payload was much smaller than PIPE_BUF, but that's no longer true.
  
  The most straightforward fix will be to create pipe per worker and select()
  them. I don't know if there's a simpler mechanism. DGRAM socket can ensure
  message boundaries, but it's still has a size limit.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: yuja, durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: D3845: worker: support more return types in posix worker

2018-07-04 Thread Yuya Nishihara
>   Yuya, it had passed tests for me with cbor, so is that a portability issue?

I don't think it's a portability issue. `cbor.load(StringIO(''))` doesn't
raise EOFError.

>   One of the benefits of pickle/marshal is that we don't lose information,
> like when tuples become lists. That would be an insidious problem for callers.

Good point.

>   Also, in case it wasn't obvious, we'll need another patch to add some
> handling of len(dumps(result)) > PIPE_BUF (which was an existing issue).

Yup. A payload was much smaller than PIPE_BUF, but that's no longer true.

The most straightforward fix will be to create pipe per worker and select()
them. I don't know if there's a simpler mechanism. DGRAM socket can ensure
message boundaries, but it's still has a size limit.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D3845: worker: support more return types in posix worker

2018-07-04 Thread hooper (Danny Hooper)
This revision was automatically updated to reflect the committed changes.
Closed by commit rHG8c38d2948217: worker: support more return types in posix 
worker (authored by hooper, committed by ).

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D3845?vs=9418&id=9429

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

AFFECTED FILES
  mercurial/worker.py

CHANGE DETAILS

diff --git a/mercurial/worker.py b/mercurial/worker.py
--- a/mercurial/worker.py
+++ b/mercurial/worker.py
@@ -155,8 +155,8 @@
 
 def workerfunc():
 os.close(rfd)
-for i, item in func(*(staticargs + (pargs,))):
-os.write(wfd, '%d %s\n' % (i, item))
+for result in func(*(staticargs + (pargs,))):
+os.write(wfd, util.pickle.dumps(result))
 return 0
 
 ret = scmutil.callcatch(ui, workerfunc)
@@ -187,9 +187,15 @@
 os.kill(os.getpid(), -status)
 sys.exit(status)
 try:
-for line in util.iterfile(fp):
-l = line.split(' ', 1)
-yield int(l[0]), l[1][:-1]
+while True:
+try:
+yield util.pickle.load(fp)
+except EOFError:
+break
+except IOError as e:
+if e.errno == errno.EINTR:
+continue
+raise
 except: # re-raises
 killworkers()
 cleanup()



To: hooper, #hg-reviewers
Cc: yuja, durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D3845: worker: support more return types in posix worker

2018-07-03 Thread hooper (Danny Hooper)
hooper added a comment.


  Yuya, it had passed tests for me with cbor, so is that a portability issue?
  
  One of the benefits of pickle/marshal is that we don't lose information, like 
when tuples become lists. That would be an insidious problem for callers.
  
  Also, in case it wasn't obvious, we'll need another patch to add some 
handling of len(dumps(result)) > PIPE_BUF (which was an existing issue).

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: yuja, durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D3845: worker: support more return types in posix worker

2018-07-03 Thread hooper (Danny Hooper)
hooper updated this revision to Diff 9418.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D3845?vs=9401&id=9418

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

AFFECTED FILES
  mercurial/worker.py

CHANGE DETAILS

diff --git a/mercurial/worker.py b/mercurial/worker.py
--- a/mercurial/worker.py
+++ b/mercurial/worker.py
@@ -155,8 +155,8 @@
 
 def workerfunc():
 os.close(rfd)
-for i, item in func(*(staticargs + (pargs,))):
-os.write(wfd, '%d %s\n' % (i, item))
+for result in func(*(staticargs + (pargs,))):
+os.write(wfd, util.pickle.dumps(result))
 return 0
 
 ret = scmutil.callcatch(ui, workerfunc)
@@ -187,9 +187,15 @@
 os.kill(os.getpid(), -status)
 sys.exit(status)
 try:
-for line in util.iterfile(fp):
-l = line.split(' ', 1)
-yield int(l[0]), l[1][:-1]
+while True:
+try:
+yield util.pickle.load(fp)
+except EOFError:
+break
+except IOError as e:
+if e.errno == errno.EINTR:
+continue
+raise
 except: # re-raises
 killworkers()
 cleanup()



To: hooper, #hg-reviewers
Cc: yuja, durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D3845: worker: support more return types in posix worker

2018-07-03 Thread durin42 (Augie Fackler)
durin42 added a comment.


  >   This makes me feel that pickle is "okay" tool. @durin42, any idea?
  
  I can live with pickle in that case, I guess. Sigh.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: yuja, durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D3845: worker: support more return types in posix worker

2018-07-03 Thread yuja (Yuya Nishihara)
yuja added a comment.


  > +while True:
  >  +try:
  >  +yield cbor.load(fp)
  >  +except EOFError:
  >  +break
  
  Unfortunately this doesn't work because the cbor decoder doesn't care for
  EOF. It tries to raise CBORDEcodeError and fail at `fp.tell()`. We'll have
  to either fix the upstream cbor library or duplicate some parts to cborutil.
  (or add an extra length field to feed a single chunk to `cbor.loads()`.)
  
  This makes me feel that pickle is "okay" tool. @durin42, any idea?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: yuja, durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: D3845: worker: support more return types in posix worker

2018-07-03 Thread Yuya Nishihara
> +while True:
> +try:
> +yield cbor.load(fp)
> +except EOFError:
> +break

Unfortunately this doesn't work because the cbor decoder doesn't care for
EOF. It tries to raise CBORDEcodeError and fail at `fp.tell()`. We'll have
to either fix the upstream cbor library or duplicate some parts to cborutil.
(or add an extra length field to feed a single chunk to `cbor.loads()`.)

This makes me feel that pickle is "okay" tool. @durin42, any idea?
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D3845: worker: support more return types in posix worker

2018-07-01 Thread hooper (Danny Hooper)
hooper updated this revision to Diff 9401.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D3845?vs=9364&id=9401

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

AFFECTED FILES
  mercurial/worker.py

CHANGE DETAILS

diff --git a/mercurial/worker.py b/mercurial/worker.py
--- a/mercurial/worker.py
+++ b/mercurial/worker.py
@@ -15,6 +15,9 @@
 import time
 
 from .i18n import _
+from .thirdparty import (
+cbor,
+)
 from . import (
 encoding,
 error,
@@ -155,8 +158,8 @@
 
 def workerfunc():
 os.close(rfd)
-for i, item in func(*(staticargs + (pargs,))):
-os.write(wfd, '%d %s\n' % (i, item))
+for result in func(*(staticargs + (pargs,))):
+os.write(wfd, cbor.dumps(result))
 return 0
 
 ret = scmutil.callcatch(ui, workerfunc)
@@ -187,9 +190,15 @@
 os.kill(os.getpid(), -status)
 sys.exit(status)
 try:
-for line in util.iterfile(fp):
-l = line.split(' ', 1)
-yield int(l[0]), l[1][:-1]
+while True:
+try:
+yield cbor.load(fp)
+except EOFError:
+break
+except IOError as e:
+if e.errno == errno.EINTR:
+continue
+raise
 except: # re-raises
 killworkers()
 cleanup()



To: hooper, #hg-reviewers
Cc: yuja, durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D3845: worker: support more return types in posix worker

2018-06-29 Thread yuja (Yuya Nishihara)
yuja added a comment.


  >   >   >   I'm not in love with pickle. Could we use json or cbor instead?
  >   >   
  >   >   Perhaps cbor is better since it can be streamed and the overhead is 
pretty
  >   >   low. We have to keep the message size small since rfd/wfd is a 
multi-writer
  >   >   pipe.
  >   
  >   It's been recommended to me that we avoid the streaming flavor of
  >   cbor, so we'd probably just do one-shot messages.
  
  I meant multiple one-shot messages can be serialized over the pipe. JSON 
parser
  doesn't work in that way. Each message must be written atomically.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: yuja, durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: D3845: worker: support more return types in posix worker

2018-06-29 Thread Yuya Nishihara
>   >   >   I'm not in love with pickle. Could we use json or cbor instead?
>   >   
>   >   Perhaps cbor is better since it can be streamed and the overhead is 
> pretty
>   >   low. We have to keep the message size small since rfd/wfd is a 
> multi-writer
>   >   pipe.
>   
>   It's been recommended to me that we avoid the streaming flavor of
>   cbor, so we'd probably just do one-shot messages.

I meant multiple one-shot messages can be serialized over the pipe. JSON parser
doesn't work in that way. Each message must be written atomically.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D3845: worker: support more return types in posix worker

2018-06-29 Thread durin42 (Augie Fackler)
durin42 added a comment.


  >   >   I'm not in love with pickle. Could we use json or cbor instead?
  >   
  >   Perhaps cbor is better since it can be streamed and the overhead is pretty
  >   low. We have to keep the message size small since rfd/wfd is a 
multi-writer
  >   pipe.
  
  It's been recommended to me that we avoid the streaming flavor of
  cbor, so we'd probably just do one-shot messages.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: yuja, durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: D3845: worker: support more return types in posix worker

2018-06-29 Thread Yuya Nishihara
>   I'm not in love with pickle. Could we use json or cbor instead?

Perhaps cbor is better since it can be streamed and the overhead is pretty
low. We have to keep the message size small since rfd/wfd is a multi-writer
pipe.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D3845: worker: support more return types in posix worker

2018-06-29 Thread yuja (Yuya Nishihara)
yuja added a comment.


  >   I'm not in love with pickle. Could we use json or cbor instead?
  
  Perhaps cbor is better since it can be streamed and the overhead is pretty
  low. We have to keep the message size small since rfd/wfd is a multi-writer
  pipe.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: yuja, durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D3845: worker: support more return types in posix worker

2018-06-29 Thread durin42 (Augie Fackler)
durin42 added a comment.


  I'm not in love with pickle. Could we use json or cbor instead?

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

To: hooper, #hg-reviewers
Cc: durin42, mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D3845: worker: support more return types in posix worker

2018-06-29 Thread hooper (Danny Hooper)
hooper updated this revision to Diff 9364.

REPOSITORY
  rHG Mercurial

CHANGES SINCE LAST UPDATE
  https://phab.mercurial-scm.org/D3845?vs=9317&id=9364

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

AFFECTED FILES
  mercurial/worker.py

CHANGE DETAILS

diff --git a/mercurial/worker.py b/mercurial/worker.py
--- a/mercurial/worker.py
+++ b/mercurial/worker.py
@@ -155,8 +155,8 @@
 
 def workerfunc():
 os.close(rfd)
-for i, item in func(*(staticargs + (pargs,))):
-os.write(wfd, '%d %s\n' % (i, item))
+for result in func(*(staticargs + (pargs,))):
+os.write(wfd, util.pickle.dumps(result))
 return 0
 
 ret = scmutil.callcatch(ui, workerfunc)
@@ -187,9 +187,15 @@
 os.kill(os.getpid(), -status)
 sys.exit(status)
 try:
-for line in util.iterfile(fp):
-l = line.split(' ', 1)
-yield int(l[0]), l[1][:-1]
+while True:
+try:
+yield util.pickle.load(fp)
+except EOFError:
+break
+except IOError as e:
+if e.errno == errno.EINTR:
+continue
+raise
 except: # re-raises
 killworkers()
 cleanup()



To: hooper, #hg-reviewers
Cc: mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


D3845: worker: support more return types in posix worker

2018-06-26 Thread hooper (Danny Hooper)
hooper created this revision.
Herald added a subscriber: mercurial-devel.
Herald added a reviewer: hg-reviewers.

REVISION SUMMARY
  This allows us to return things that aren't tuple(int, str) from worker
  functions. I wanted to use marshal instead of pickle, but it seems to read 
from
  the pipe in non-blocking mode, which means it stops before it sees the 
results.
  
  The windows worker already supports arbitrary return values without
  serialization, because it uses threads instead of subprocesses.

REPOSITORY
  rHG Mercurial

REVISION DETAIL
  https://phab.mercurial-scm.org/D3845

AFFECTED FILES
  mercurial/worker.py

CHANGE DETAILS

diff --git a/mercurial/worker.py b/mercurial/worker.py
--- a/mercurial/worker.py
+++ b/mercurial/worker.py
@@ -155,8 +155,8 @@
 
 def workerfunc():
 os.close(rfd)
-for i, item in func(*(staticargs + (pargs,))):
-os.write(wfd, '%d %s\n' % (i, item))
+for result in func(*(staticargs + (pargs,))):
+os.write(wfd, util.pickle.dumps(result))
 return 0
 
 ret = scmutil.callcatch(ui, workerfunc)
@@ -187,9 +187,15 @@
 os.kill(os.getpid(), -status)
 sys.exit(status)
 try:
-for line in util.iterfile(fp):
-l = line.split(' ', 1)
-yield int(l[0]), l[1][:-1]
+while True:
+try:
+yield util.pickle.load(fp)
+except EOFError:
+break
+except IOError as e:
+if e.errno == errno.EINTR:
+continue
+raise
 except: # re-raises
 killworkers()
 cleanup()



To: hooper, #hg-reviewers
Cc: mercurial-devel
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel