Re: [PATCH 07 of 10 V9] exchange: getbundle `bookmarks` part generator

2016-11-15 Thread Gregory Szorc
On Sun, Nov 13, 2016 at 2:30 AM, Stanislau Hlebik  wrote:

> # HG changeset patch
> # User Stanislau Hlebik 
> # Date 1479032456 28800
> #  Sun Nov 13 02:20:56 2016 -0800
> # Branch stable
> # Node ID 606bb4a7fb818f24d52e764828ba0d0a7921f999
> # Parent  bf21586f26e5a41f7d8bf342d4b4c16d71dbc6d2
> exchange: getbundle `bookmarks` part generator
>
> This generator will be used during pull operation.
>
> diff --git a/mercurial/exchange.py b/mercurial/exchange.py
> --- a/mercurial/exchange.py
> +++ b/mercurial/exchange.py
> @@ -1680,6 +1680,21 @@
>  if chunks:
>  bundler.newpart('hgtagsfnodes', data=''.join(chunks))
>
> +@getbundle2partsgenerator('bookmarks')
> +def _getbundlebookmarkspart(bundler, repo, source, bundlecaps=None,
> +b2caps=None, heads=None, common=None,
> +**kwargs):
> +if not kwargs.get('bookmarks'):
> +return
> +if 'bookmarks' not in b2caps:
> +raise ValueError(
> +_('bookmarks are requested but client is not capable '
> +  'of receiving it'))
> +
> +bookmarks = _getbookmarks(repo, **kwargs)
> +encodedbookmarks = bookmod.encodebookmarks(bookmarks)
> +bundler.newpart('bookmarks', data=encodedbookmarks)
> +
>  def _getbookmarks(repo, **kwargs):
>  """Returns bookmark to node mapping.
>
> diff --git a/mercurial/wireproto.py b/mercurial/wireproto.py
> --- a/mercurial/wireproto.py
> +++ b/mercurial/wireproto.py
> @@ -228,7 +228,9 @@
>   'bundlecaps': 'scsv',
>   'listkeys': 'csv',
>   'cg': 'boolean',
> - 'cbattempted': 'boolean'}
> + 'cbattempted': 'boolean',
> + 'bookmarks': 'boolean',
> +}
>

IIRC there were patches submitted last cycle around bookmark filtering -
clients saying which bookmarks they were interested in. This implies there
is advanced functionality around bookmarks exchange around the corner. So I
have reservations about a "bookmarks" argument that is simply a boolean and
doesn't allow more advanced uses later.

Anyway, does this argument to the "getbundle" wire protocol command need to
exist at all? Today, clients can ask for bookmarks data by adding a
"bookmarks" value to the "listkeys" getbundle wire protocol argument. If
the client sent a bundle2 capability that indicated it can receive the
"bookmarks" dedicated bundle2 part, then we wouldn't need to introduce a
new argument to "getbundle."

That being said, I can go both ways on this. Sometimes having an explicit
argument for "send me X" is nice to have for clarity. My final opinion
likely hinges on what other bookmarks exchange features are planned. If
"send a subset of bookmarks" or a similar feature is in the works, I'd
prefer to shoehorn that into the same getbundle argument so there is only 1
variable controlling how bookmarks are sent.



>
>  # client side
>
> ___
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 04 of 10 V9] bookmarks: introduce binary encoding

2016-11-15 Thread Gregory Szorc
On Sun, Nov 13, 2016 at 2:30 AM, Stanislau Hlebik  wrote:

> # HG changeset patch
> # User Stanislau Hlebik 
> # Date 1479032456 28800
> #  Sun Nov 13 02:20:56 2016 -0800
> # Branch stable
> # Node ID 8b6ab3b8df3aae90ac72d7fa4603e7d5b4ab55e9
> # Parent  0917ce41b232a5fc2a6e503cbad15879e037a2f9
> bookmarks: introduce binary encoding
>
> Bookmarks binary encoding will be used for `bookmarks` bundle2 part.
> The format is: <4 bytes - bookmark size, big-endian>
><1 byte - 0 if node is empty 1 otherwise><20 bytes node>
> BookmarksEncodeError and BookmarksDecodeError maybe thrown if input is
> incorrect.
>

I'm pretty happy with this series up until and including this patch.


>
> diff --git a/mercurial/bookmarks.py b/mercurial/bookmarks.py
> --- a/mercurial/bookmarks.py
> +++ b/mercurial/bookmarks.py
> @@ -8,7 +8,9 @@
>  from __future__ import absolute_import
>
>  import errno
> +import io
>  import os
> +import struct
>
>  from .i18n import _
>  from .node import (
> @@ -23,6 +25,71 @@
>  util,
>  )
>
> +_NONEMPTYNODE = struct.pack('?', False)
> +_EMPTYNODE = struct.pack('?', True)
> +
> +def _unpackbookmarksize(stream):
> +"""Returns 0 if stream is empty.
> +"""
> +
> +intstruct = struct.Struct('>i')
> +expectedsize = intstruct.size
> +encodedbookmarksize = stream.read(expectedsize)
> +if not encodedbookmarksize:
> +return 0
> +if len(encodedbookmarksize) != expectedsize:
> +raise ValueError(
> +_('cannot decode bookmark size: '
> +  'expected size: %d, actual size: %d') %
> +(expectedsize, len(encodedbookmarksize)))
> +return intstruct.unpack(encodedbookmarksize)[0]
> +
> +def encodebookmarks(bookmarks):
> +"""Encodes bookmark to node mapping to the byte string.
> +
> +Format: <4 bytes - bookmark size>
> +<1 byte - 0 if node is empty 1 otherwise><20 bytes node>
> +Node may be 20 byte binary string or empty
> +"""
> +
> +intstruct = struct.Struct('>i')
> +for bookmark, node in sorted(bookmarks.iteritems()):
> +yield intstruct.pack(len(bookmark))
> +yield encoding.fromlocal(bookmark)
> +if node:
> +if len(node) != 20:
> +raise ValueError(_('node must be 20 or bytes long'))
> +yield _NONEMPTYNODE
> +yield node
> +else:
> +yield _EMPTYNODE
> +
> +def decodebookmarks(buf):
> +"""Decodes buffer into bookmark to node mapping.
> +
> +Node is either 20 bytes or empty.
> +"""
> +
> +stream = io.BytesIO(buf)
> +bookmarks = {}
> +bookmarksize = _unpackbookmarksize(stream)
> +boolstruct = struct.Struct('?')
> +while bookmarksize:
> +bookmark = stream.read(bookmarksize)
> +if len(bookmark) != bookmarksize:
> +raise ValueError(
> +_('cannot decode bookmark: expected size: %d, '
> +'actual size: %d') % (bookmarksize, len(bookmark)))
> +bookmark = encoding.tolocal(bookmark)
> +packedemptynodeflag = stream.read(boolstruct.size)
> +emptynode = boolstruct.unpack(packedemptynodeflag)[0]
> +node = ''
> +if not emptynode:
> +node = stream.read(20)
> +bookmarks[bookmark] = node
> +bookmarksize = _unpackbookmarksize(stream)
> +return bookmarks
> +
>  def _getbkfile(repo):
>  """Hook so that extensions that mess with the store can hook bm
> storage.
>
> ___
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 5 of 7 v2] bdiff: give slight preference to longest matches in the middle of the B side

2016-11-15 Thread Gregory Szorc
On Tue, Nov 15, 2016 at 1:13 PM, Mads Kiilerich  wrote:

> On 11/15/2016 09:57 PM, Mads Kiilerich wrote:
>
>> # HG changeset patch
>> # User Mads Kiilerich 
>> # Date 1478626653 -3600
>> #  Tue Nov 08 18:37:33 2016 +0100
>> # Node ID 5ae7e8061a9671e8941a7a17e316254d228acf59
>> # Parent  5bb26f29b1509520ca3af4c540775cab50b4d6c0
>> bdiff: give slight preference to longest matches in the middle of the B
>> side
>>
>
> This and the following patches can probably be rearranged and folded to
> give less churn. I would however appreciate to get another round of
> thorough review with this structure - that seems to me to give a more
> natural progression.
>
> The benefit from these changes is mainly "better diffs". Better diffs do
> not necessarily compress better and there might be some small increases in
> actual diff size. I have not noticed any significant performance changes.
>
> Greg, can you verify it doesn't impact your bdiff benchmarks in your
> environment in a bad way?
>

I /think/ my perfbdiff changes have all mostly landed. So you should be
able to perform benchmarks. I've been using
https://hg.mozilla.org/mozilla-unified for many measurements.

If you want me to double verify, could you please push this series to a
repository somewhere? I don't have Mercurial configured to apply patches
from the mailing list and don't want to go through the hassle of doing that
:(
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 5 of 5 rfc] bdiff: make sure we append to repeated lines instead of inserting into range

2016-11-15 Thread Gregory Szorc
On Tue, Nov 15, 2016 at 1:06 PM, Mads Kiilerich  wrote:

> On 11/12/2016 03:58 PM, Jun Wu wrote:
>
>> Excerpts from Mads Kiilerich's message of 2016-11-03 22:34:15 +0100:
>>
>>> # HG changeset patch
>>> # User Mads Kiilerich 
>>> # Date 1478208837 -3600
>>> #  Thu Nov 03 22:33:57 2016 +0100
>>> # Node ID be83c5f4ec8931cb7e771a80a3ef5e2042a005c1
>>> # Parent  3e0216b2a0995cb21946bc13fb21391013332c57
>>> bdiff: make sure we append to repeated lines instead of inserting into
>>> range
>>>
>>> This will mitigate the symptoms that tests exposed in the previous
>>> changeset.
>>>
>>> Arguably, we need similar handling for longer sequences of repeated
>>> lines...
>>>
>> I'm concerned about this. This patch looks like a workaround for the
>> simplest case. While you are aware it could have issues with "longer
>> sequences of repeated lines". And that's not that uncommon. There are
>> real-world JSON files like:
>>
>>[
>>  ...
>>  {
>>"a": "b",
>>  },
>>  {
>>"a": "b",
>>  },
>>  {
>>"a": "b",
>>  },
>>  ...
>>]
>>
>> Actually, those kinds of JSON files are part of the motivation of mpm's
>> f1ca249696ed. This series would surely affect them, while I'm not sure the
>> change is in a good way or not yet.
>>
>> Note that after f1ca249696ed, there are still unsolved issues with some
>> large JSON files - we have seen (from user report) diff output being
>> "wrong", where human could generate much smaller diff easily.
>>
>> If you are not in a hurry, I'd like to learn this area deeper before
>> commenting confidently. That may take me 1 to 2 weeks.
>>
>
> The test case I added was made to cover the same problem as your JSON
> example. I think v2 of my bdiff patch series addresses the concerns you
> mention in a reasonable way.
>
> Sure, I would appreciate to get more thorough review and feedback and test
> cases. I page in and out and have no hurry ... except that Greg is making
> rapid progress in the same code ;-)


I wouldn't worry too much about me :)

My profiling shows we're still spending upwards of 2/3 of CPU time in
bdiff_splitlines. This is code that runs *before* the actual diff algorithm
parts that you are touching. As long as CPU time is dominated by obtaining
state for the diff algorithm, I'll likely stay away from that code and
anything I do shouldn't interfere too much with your work.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 2 of 2] commands: introduce `hg display`

2016-11-15 Thread Gregory Szorc
On Thu, Nov 10, 2016 at 9:44 AM, Pierre-Yves David <
pierre-yves.da...@ens-lyon.org> wrote:

>
>
> On 11/08/2016 03:50 AM, Gregory Szorc wrote:
>
>> On Mon, Nov 7, 2016 at 4:41 AM, Pierre-Yves David
>> >
>> wrote:
>>
>> On 11/05/2016 07:26 PM, Gregory Szorc wrote:
>>
>> # HG changeset patch
>> # User Gregory Szorc > >
>> # Date 1478370119 25200
>> #  Sat Nov 05 11:21:59 2016 -0700
>> # Node ID 0ebc1e627383bdf017f509a60725c0b39d6d40d9
>> # Parent  4bce42e7330311b58ecd38a4acbcd2d2dd1351ba
>> commands: introduce `hg display`
>>
>>
>> I've not looked into this in details yet¹, but is looks like it
>> should be flagged (EXPERIMENTAL) and having some plan page could be
>> useful.
>>
>>
>> I'm fine marking it as experimental. How do you want that done?
>>
>
> As for everything else, having "(EXPERIMENTAL)" on the first line should
> do it.
>
> Regarding the plan page, this proposed command was discussed and had
>> buy-in at the 4.0 Sprint. I believe the commit message captures most of
>> the important details of that discussion.
>>
> >
>
>> I'm not sure what benefit a wiki page will provide. If it is capturing
>> planned display views, I can list those here:
>>
>> * Things we have read-only commands for today (tags, branches,
>> bookmarks, possibly qseries)
>> * inprogress/smartlog/wip/unfinished (whatever we call it)
>> * stack
>> * Anything else people think of
>>
>
> You certainly have buy in the the general idea, I expect many details to
> be ironed out. A wiki page will allow us to discuss and track many of them:
>
> - Possible targets,
> - Possible output,
> - backward compatibility story,
> - progress tracking,
> - name discussion (because apparently some people have idea)
> - etc,
>
> Cheers and extra thank for tackling this,
>

https://www.mercurial-scm.org/wiki/DisplayCommandPlan now exists.

What are the next plans for this series? I'd *really* like to get something
landed because I feel it could benefit from us developers using it during
the 4.1 cycle. This feature is not something I feel comfortable landing
towards the end of a cycle and then shipping: it needs time to bake.

Perhaps we could take a laissez faire attitude and let it exist in @
enabled by default and make a shipping decision towards the end of the
release cycle?
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 2 of 2 V3] perf: add asv benchmarks

2016-11-15 Thread Gregory Szorc
On Tue, Nov 15, 2016 at 7:55 AM, Philippe Pepiot  wrote:

> # HG changeset patch
> # User Philippe Pepiot 
> # Date 1475136994 -7200
> #  Thu Sep 29 10:16:34 2016 +0200
> # Node ID 94e48d7dc9630543e0f4179b2ca96f3c66967f6a
> # Parent  ab6e50ddc2c56dcf170991293005be6d6f80a232
> perf: add asv benchmarks
>
> Airspeed velocity (ASV) is a python framework for benchmarking Python
> packages
> over their lifetime. The results are displayed in an interactive web
> frontend.
>
> Add ASV benchmarks for mercurial that use contrib/perf.py extension that
> could
> be run against multiple reference repositories.
>
> The benchmark suite now includes revsets from contrib/base-revsets.txt with
> variants, perftags, perfstatus, perfmanifest and perfheads.
>
> Installation requires asv>=0.2, python-hglib and virtualenv
>
> This is part of PerformanceTrackingSuitePlan
> https://www.mercurial-scm.org/wiki/PerformanceTrackingSuitePlan
>
> diff --git a/.hgignore b/.hgignore
> --- a/.hgignore
> +++ b/.hgignore
> @@ -49,6 +49,7 @@ mercurial.egg-info
>  tags
>  cscope.*
>  .idea/*
> +.asv/*
>  i18n/hg.pot
>  locale/*/LC_MESSAGES/hg.mo
>  hgext/__index__.py
> diff --git a/contrib/asv.conf.json b/contrib/asv.conf.json
> new file mode 100644
> --- /dev/null
> +++ b/contrib/asv.conf.json
> @@ -0,0 +1,127 @@
> +{
> +// The version of the config file format.  Do not change, unless
> +// you know what you are doing.
> +"version": 1,
>

I'll pretend that a custom JSON parser that recognizes a non-standard
comment syntax doesn't exist :)


> +
> +// The name of the project being benchmarked
> +"project": "mercurial",
> +
> +// The project's homepage
> +"project_url": "http://mercurial-scm.org/;,
>

https:/


> +
> +// The URL or local path of the source code repository for the
> +// project being benchmarked
> +"repo": "..",
> +
> +// List of branches to benchmark. If not provided, defaults to
> "master"
> +// (for git) or "default" (for mercurial).
> +// "branches": ["master"], // for git
> +// "branches": ["default"],// for mercurial
> +"branches": ["default", "stable"],
> +
> +// The DVCS being used.  If not set, it will be automatically
> +// determined from "repo" by looking at the protocol in the URL
> +// (if remote), or by looking for special directories, such as
> +// ".git" (if local).
> +// "dvcs": "git",
>

This block can likely be deleted.


> +
> +// The tool to use to create environments.  May be "conda",
> +// "virtualenv" or other value depending on the plugins in use.
> +// If missing or the empty string, the tool will be automatically
> +// determined by looking for tools on the PATH environment
> +// variable.
> +"environment_type": "virtualenv",
> +
> +// the base URL to show a commit for the project.
> +"show_commit_url": "https://www.mercurial-scm.org/repo/hg/rev/;,
> +
> +// The Pythons you'd like to test against.  If not provided, defaults
> +// to the current version of Python used to run `asv`.
> +// "pythons": ["2.7", "3.3"],
> +
> +// The matrix of dependencies to test.  Each key is the name of a
> +// package (in PyPI) and the values are version numbers.  An empty
> +// list or empty string indicates to just test against the default
> +// (latest) version. null indicates that the package is to not be
> +// installed. If the package to be tested is only available from
> +// PyPi, and the 'environment_type' is conda, then you can preface
> +// the package name by 'pip+', and the package will be installed via
> +// pip (with all the conda available packages installed first,
> +// followed by the pip installed packages).
> +//
> +// "matrix": {
> +// "numpy": ["1.6", "1.7"],
> +// "six": ["", null],// test with and without six
> installed
> +// "pip+emcee": [""],   // emcee is only available for install
> with pip.
> +// },
> +
> +// Combinations of libraries/python versions can be excluded/included
> +// from the set to test. Each entry is a dictionary containing
> additional
> +// key-value pairs to include/exclude.
> +//
> +// An exclude entry excludes entries where all values match. The
> +// values are regexps that should match the whole string.
> +//
> +// An include entry adds an environment. Only the packages listed
> +// are installed. The 'python' key is required. The exclude rules
> +// do not apply to includes.
> +//
> +// In addition to package names, the following keys are available:
> +//
> +// - python
> +// Python version, as in the *pythons* variable above.
> +// - environment_type
> +// Environment type, as above.
> +// - sys_platform
> +// Platform, as in sys.platform. Possible values for the common
> +// cases: 'linux2', 'win32', 

Re: [PATCH 5 of 5 rfc] bdiff: make sure we append to repeated lines instead of inserting into range

2016-11-15 Thread Mads Kiilerich

On 11/12/2016 03:58 PM, Jun Wu wrote:

Excerpts from Mads Kiilerich's message of 2016-11-03 22:34:15 +0100:

# HG changeset patch
# User Mads Kiilerich 
# Date 1478208837 -3600
#  Thu Nov 03 22:33:57 2016 +0100
# Node ID be83c5f4ec8931cb7e771a80a3ef5e2042a005c1
# Parent  3e0216b2a0995cb21946bc13fb21391013332c57
bdiff: make sure we append to repeated lines instead of inserting into range

This will mitigate the symptoms that tests exposed in the previous changeset.

Arguably, we need similar handling for longer sequences of repeated lines...

I'm concerned about this. This patch looks like a workaround for the
simplest case. While you are aware it could have issues with "longer
sequences of repeated lines". And that's not that uncommon. There are
real-world JSON files like:

   [
 ...
 {
   "a": "b",
 },
 {
   "a": "b",
 },
 {
   "a": "b",
 },
 ...
   ]

Actually, those kinds of JSON files are part of the motivation of mpm's
f1ca249696ed. This series would surely affect them, while I'm not sure the
change is in a good way or not yet.

Note that after f1ca249696ed, there are still unsolved issues with some
large JSON files - we have seen (from user report) diff output being
"wrong", where human could generate much smaller diff easily.

If you are not in a hurry, I'd like to learn this area deeper before
commenting confidently. That may take me 1 to 2 weeks.


The test case I added was made to cover the same problem as your JSON 
example. I think v2 of my bdiff patch series addresses the concerns you 
mention in a reasonable way.


Sure, I would appreciate to get more thorough review and feedback and 
test cases. I page in and out and have no hurry ... except that Greg is 
making rapid progress in the same code ;-)


/Mads
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 2 of 7 v2] tests: explore some bdiff cases

2016-11-15 Thread Mads Kiilerich
# HG changeset patch
# User Mads Kiilerich 
# Date 1478626653 -3600
#  Tue Nov 08 18:37:33 2016 +0100
# Node ID 0f97aaa1b3f5c25e02296c73e6e93d77a197910c
# Parent  b7354208678098580f6e5f975d734c1e8dd9d846
tests: explore some bdiff cases

diff --git a/tests/test-bdiff.py b/tests/test-bdiff.py
--- a/tests/test-bdiff.py
+++ b/tests/test-bdiff.py
@@ -78,3 +78,17 @@ testfixws("", "", 1)
 testfixws("", "", 0)
 
 print("done")
+
+print("Odd diff for a trivial change:")
+showdiff(
+''.join('<%s\n-\n' % i for i in range(5)),
+''.join('>%s\n-\n' % i for i in range(5)))
+
+print("Diff 1 to 3 lines - preference for adding / removing at the end of 
sequences:")
+showdiff('a\n', 'a\n' * 3)
+print("Diff 1 to 5 lines - preference for adding / removing at the end of 
sequences:")
+showdiff('a\n', 'a\n' * 5)
+print("Diff 3 to 1 lines - preference for adding / removing at the end of 
sequences:")
+showdiff('a\n' * 3, 'a\n')
+print("Diff 5 to 1 lines - this diff seems weird:")
+showdiff('a\n' * 5, 'a\n')
diff --git a/tests/test-bdiff.py.out b/tests/test-bdiff.py.out
--- a/tests/test-bdiff.py.out
+++ b/tests/test-bdiff.py.out
@@ -42,3 +42,41 @@ showdiff(
  'f\n'
 done
 done
+Odd diff for a trivial change:
+showdiff(
+  '<0\n-\n<1\n-\n<2\n-\n<3\n-\n<4\n-\n',
+  '>0\n-\n>1\n-\n>2\n-\n>3\n-\n>4\n-\n'):
+ 0 8 '<0\n-\n<1\n' -> '>0\n'
+ '-\n'
+ 10 13 '<2\n' -> '>1\n'
+ '-\n'
+ 15 18 '<3\n' -> '>2\n'
+ '-\n'
+ 20 23 '<4\n' -> '>3\n'
+ '-\n'
+ 25 25 '' -> '>4\n-\n'
+Diff 1 to 3 lines - preference for adding / removing at the end of sequences:
+showdiff(
+  'a\n',
+  'a\na\na\n'):
+ 'a\n'
+ 2 2 '' -> 'a\na\n'
+Diff 1 to 5 lines - preference for adding / removing at the end of sequences:
+showdiff(
+  'a\n',
+  'a\na\na\na\na\n'):
+ 'a\n'
+ 2 2 '' -> 'a\na\na\na\n'
+Diff 3 to 1 lines - preference for adding / removing at the end of sequences:
+showdiff(
+  'a\na\na\n',
+  'a\n'):
+ 'a\n'
+ 2 6 'a\na\n' -> ''
+Diff 5 to 1 lines - this diff seems weird:
+showdiff(
+  'a\na\na\na\na\n',
+  'a\n'):
+ 0 2 'a\n' -> ''
+ 'a\n'
+ 4 10 'a\na\na\n' -> ''
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 2] manifest: make revlog verification optional

2016-11-15 Thread Martin von Zweigbergk via Mercurial-devel
On Tue, Nov 15, 2016 at 9:59 AM, Gregory Szorc  wrote:
> On Mon, Nov 14, 2016 at 3:27 PM, Durham Goode  wrote:
>>
>> # HG changeset patch
>> # User Durham Goode 
>> # Date 1479165447 28800
>> #  Mon Nov 14 15:17:27 2016 -0800
>> # Node ID 27209d52a5865422c5ef4ba05cedb28ce32919ed
>> # Parent  046a7e828ea63ec940ffae1089a33fae7954da2e
>> manifest: make revlog verification optional
>>
>> This patches adds an parameter to manifestlog.get() to disable hash
>> checking.
>> This will be used in an upcoming patch to support treemanifestctx reading
>> sub-trees without loading them from the revlog. (This is already supported
>> but
>> does not go through the manifestlog.get() code path)
>
>
> I could leverage this on the base revlog class because `hg debugupgraderepo`
> can spend >50% of its CPU time doing SHA-1 verification (most of that when
> converting manifests - there are tens of gigabytes of raw manifest text that
> needs to be hashed when converting the revlog). That could be a follow-up,
> of course.

I suspect you thought the "revlog verification" was about hash
verification. So did I when read the subject line. I was confused why
that would be relevant, but then I looked at the patch and forgot
about that. It's queued already, so it doesn't seem worth updating it
at this point.

>
>>
>>
>> diff --git a/mercurial/manifest.py b/mercurial/manifest.py
>> --- a/mercurial/manifest.py
>> +++ b/mercurial/manifest.py
>> @@ -1278,9 +1278,12 @@ class manifestlog(object):
>>  """
>>  return self.get('', node)
>>
>> -def get(self, dir, node):
>> +def get(self, dir, node, verify=True):
>>  """Retrieves the manifest instance for the given node. Throws a
>>  LookupError if not found.
>> +
>> +`verify` - if True an exception will be thrown if the node is not
>> in
>> +   the revlog
>>  """
>>  if node in self._dirmancache.get(dir, ()):
>>  cachemf = self._dirmancache[dir][node]
>> @@ -1292,19 +1295,21 @@ class manifestlog(object):
>>
>>  if dir:
>>  if self._revlog._treeondisk:
>> -dirlog = self._revlog.dirlog(dir)
>> -if node not in dirlog.nodemap:
>> -raise LookupError(node, dirlog.indexfile,
>> -  _('no node'))
>> +if verify:
>> +dirlog = self._revlog.dirlog(dir)
>> +if node not in dirlog.nodemap:
>> +raise LookupError(node, dirlog.indexfile,
>> +  _('no node'))
>>  m = treemanifestctx(self._repo, dir, node)
>>  else:
>>  raise error.Abort(
>>  _("cannot ask for manifest directory '%s' in a
>> flat "
>>"manifest") % dir)
>>  else:
>> -if node not in self._revlog.nodemap:
>> -raise LookupError(node, self._revlog.indexfile,
>> -  _('no node'))
>> +if verify:
>> +if node not in self._revlog.nodemap:
>> +raise LookupError(node, self._revlog.indexfile,
>> +  _('no node'))
>>  if self._treeinmem:
>>  m = treemanifestctx(self._repo, '', node)
>>  else:
>> ___
>> Mercurial-devel mailing list
>> Mercurial-devel@mercurial-scm.org
>> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
>
>
> ___
> Mercurial-devel mailing list
> Mercurial-devel@mercurial-scm.org
> https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel
>
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 1 of 7 v2] tests: make test-bdiff.py easier to maintain

2016-11-15 Thread Mads Kiilerich
# HG changeset patch
# User Mads Kiilerich 
# Date 1479243409 -3600
#  Tue Nov 15 21:56:49 2016 +0100
# Node ID b7354208678098580f6e5f975d734c1e8dd9d846
# Parent  e1d6aa0e4c3aed73e0dc523b8a8fd5f9fe23510a
tests: make test-bdiff.py easier to maintain

Add more stdout logging to help navigate the .out file.

diff --git a/tests/test-bdiff.py b/tests/test-bdiff.py
--- a/tests/test-bdiff.py
+++ b/tests/test-bdiff.py
@@ -11,13 +11,12 @@ def test1(a, b):
 if d:
 c = mpatch.patches(a, [d])
 if c != b:
-print("***", repr(a), repr(b))
-print("bad:")
-print(repr(c)[:200])
-print(repr(d))
+print("bad diff+patch result from\n  %r to\n  %r:" % (a, b))
+print("bdiff: %r" % d)
+print("patched: %r" % c[:200])
 
 def test(a, b):
-print("***", repr(a), repr(b))
+print("test", repr(a), repr(b))
 test1(a, b)
 test1(b, a)
 
@@ -43,13 +42,21 @@ test("a\nb", "a\nb")
 
 #issue1295
 def showdiff(a, b):
+print('showdiff(\n  %r,\n  %r):' % (a, b))
 bin = bdiff.bdiff(a, b)
 pos = 0
+q = 0
 while pos < len(bin):
 p1, p2, l = struct.unpack(">lll", bin[pos:pos + 12])
 pos += 12
-print(p1, p2, repr(bin[pos:pos + l]))
+if p1:
+print('', repr(a[q:p1]))
+print('', p1, p2, repr(a[p1:p2]), '->', repr(bin[pos:pos + l]))
 pos += l
+q = p2
+if q < len(a):
+print('', repr(a[q:]))
+
 showdiff("x\n\nx\n\nx\n\nx\n\nz\n", "x\n\nx\n\ny\n\nx\n\nx\n\nz\n")
 showdiff("x\n\nx\n\nx\n\nx\n\nz\n", "x\n\nx\n\ny\n\nx\n\ny\n\nx\n\nz\n")
 # we should pick up abbbc. rather than bc.de as the longest match
diff --git a/tests/test-bdiff.py.out b/tests/test-bdiff.py.out
--- a/tests/test-bdiff.py.out
+++ b/tests/test-bdiff.py.out
@@ -1,27 +1,44 @@
-*** 'a\nc\n\n\n\n' 'a\nb\n\n\n'
-*** 'a\nb\nc\n' 'a\nc\n'
-*** '' ''
-*** 'a\nb\nc' 'a\nb\nc'
-*** 'a\nb\nc\nd\n' 'a\nd\n'
-*** 'a\nb\nc\nd\n' 'a\nc\ne\n'
-*** 'a\nb\nc\n' 'a\nc\n'
-*** 'a\n' 'c\na\nb\n'
-*** 'a\n' ''
-*** 'a\n' 'b\nc\n'
-*** 'a\n' 'c\na\n'
-*** '' 'adjfkjdjksdhfksj'
-*** '' 'ab'
-*** '' 'abc'
-*** 'a' 'a'
-*** 'ab' 'ab'
-*** 'abc' 'abc'
-*** 'a\n' 'a\n'
-*** 'a\nb' 'a\nb'
-6 6 'y\n\n'
-6 6 'y\n\n'
-9 9 'y\n\n'
-0 0 'a\nb\nb\n'
-12 12 'b\nc\n.\n'
-16 18 ''
+test 'a\nc\n\n\n\n' 'a\nb\n\n\n'
+test 'a\nb\nc\n' 'a\nc\n'
+test '' ''
+test 'a\nb\nc' 'a\nb\nc'
+test 'a\nb\nc\nd\n' 'a\nd\n'
+test 'a\nb\nc\nd\n' 'a\nc\ne\n'
+test 'a\nb\nc\n' 'a\nc\n'
+test 'a\n' 'c\na\nb\n'
+test 'a\n' ''
+test 'a\n' 'b\nc\n'
+test 'a\n' 'c\na\n'
+test '' 'adjfkjdjksdhfksj'
+test '' 'ab'
+test '' 'abc'
+test 'a' 'a'
+test 'ab' 'ab'
+test 'abc' 'abc'
+test 'a\n' 'a\n'
+test 'a\nb' 'a\nb'
+showdiff(
+  'x\n\nx\n\nx\n\nx\n\nz\n',
+  'x\n\nx\n\ny\n\nx\n\nx\n\nz\n'):
+ 'x\n\nx\n\n'
+ 6 6 '' -> 'y\n\n'
+ 'x\n\nx\n\nz\n'
+showdiff(
+  'x\n\nx\n\nx\n\nx\n\nz\n',
+  'x\n\nx\n\ny\n\nx\n\ny\n\nx\n\nz\n'):
+ 'x\n\nx\n\n'
+ 6 6 '' -> 'y\n\n'
+ 'x\n\n'
+ 9 9 '' -> 'y\n\n'
+ 'x\n\nz\n'
+showdiff(
+  'a\nb\nb\nb\nc\n.\nd\ne\n.\nf\n',
+  'a\nb\nb\na\nb\nb\nb\nc\n.\nb\nc\n.\nd\ne\nf\n'):
+ 0 0 '' -> 'a\nb\nb\n'
+ 'a\nb\nb\nb\nc\n.\n'
+ 12 12 '' -> 'b\nc\n.\n'
+ 'd\ne\n'
+ 16 18 '.\n' -> ''
+ 'f\n'
 done
 done
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 5 of 7 v2] bdiff: give slight preference to longest matches in the middle of the B side

2016-11-15 Thread Mads Kiilerich
# HG changeset patch
# User Mads Kiilerich 
# Date 1478626653 -3600
#  Tue Nov 08 18:37:33 2016 +0100
# Node ID 5ae7e8061a9671e8941a7a17e316254d228acf59
# Parent  5bb26f29b1509520ca3af4c540775cab50b4d6c0
bdiff: give slight preference to longest matches in the middle of the B side

We already have a slight preference for matches close to the middle on the A
side. Now, do the same on the B side.

j is iterating the b range backwards and we thus accept a new j if the previous
match was in the upper half.

This makes the test-bhalf diff "correct". It obviously also gives more
preference to balanced recursion than to appending to sequences. That is kind
of correct, but will also unfortunately make some bundles bigger. No doubt, we
can also create examples where it will make them smaller ...

The bundle size for 4.0 (hg bundle --base null -r 4.0 x.hg) happens to go from
22803824 to 22806817 bytes - an 0.01% increase.

diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c
--- a/mercurial/bdiff.c
+++ b/mercurial/bdiff.c
@@ -143,7 +143,7 @@ static int longest_match(struct bdiff_li
struct pos *pos,
 int a1, int a2, int b1, int b2, int *omi, int *omj)
 {
-   int mi = a1, mj = b1, mk = 0, i, j, k, half;
+   int mi = a1, mj = b1, mk = 0, i, j, k, half, bhalf;
 
/* window our search on large regions to better bound
   worst-case performance. by choosing a window at the end, we
@@ -152,6 +152,7 @@ static int longest_match(struct bdiff_li
a1 = a2 - 3;
 
half = (a1 + a2 - 1) / 2;
+   bhalf = (b1 + b2 - 1) / 2;
 
for (i = a1; i < a2; i++) {
/* skip all lines in b after the current block */
@@ -187,8 +188,8 @@ static int longest_match(struct bdiff_li
/* same match but closer to half */
mi = i;
mj = j;
-   } else if (i == mi) {
-   /* same i but earlier j */
+   } else if (i == mi && mj > bhalf) {
+   /* same i but best earlier j */
mj = j;
}
}
diff --git a/tests/test-annotate.t b/tests/test-annotate.t
--- a/tests/test-annotate.t
+++ b/tests/test-annotate.t
@@ -91,9 +91,9 @@ annotate (JSON)
 annotate -n b
 
   $ hg annotate -n b
+  1: a
   0: a
   1: a
-  1: a
   3: b4
   3: b5
   3: b6
@@ -111,8 +111,8 @@ annotate --no-follow b
 annotate -nl b
 
   $ hg annotate -nl b
+  1:1: a
   0:1: a
-  1:2: a
   1:3: a
   3:4: b4
   3:5: b5
@@ -121,9 +121,9 @@ annotate -nl b
 annotate -nf b
 
   $ hg annotate -nf b
+  1 a: a
   0 a: a
   1 a: a
-  1 a: a
   3 b: b4
   3 b: b5
   3 b: b6
@@ -131,8 +131,8 @@ annotate -nf b
 annotate -nlf b
 
   $ hg annotate -nlf b
+  1 a:1: a
   0 a:1: a
-  1 a:2: a
   1 a:3: a
   3 b:4: b4
   3 b:5: b5
@@ -156,9 +156,9 @@ annotate -nlf b
 annotate after merge
 
   $ hg annotate -nf b
+  1 a: a
   0 a: a
   1 a: a
-  1 a: a
   3 b: b4
   4 b: c
   3 b: b5
@@ -166,8 +166,8 @@ annotate after merge
 annotate after merge with -l
 
   $ hg annotate -nlf b
+  1 a:1: a
   0 a:1: a
-  1 a:2: a
   1 a:3: a
   3 b:4: b4
   4 b:5: c
@@ -198,7 +198,7 @@ annotate after merge with -l
 annotate after rename merge
 
   $ hg annotate -nf b
-  0 a: a
+  1 a: a
   6 b: z
   1 a: a
   3 b: b4
@@ -209,7 +209,7 @@ annotate after rename merge
 annotate after rename merge with -l
 
   $ hg annotate -nlf b
-  0 a:1: a
+  1 a:1: a
   6 b:2: z
   1 a:3: a
   3 b:4: b4
@@ -226,7 +226,7 @@ Issue2807: alignment of line numbers wit
   $ echo more >> b
   $ hg ci -mmore -d '7 0'
   $ hg annotate -nlf b
-   0 a: 1: a
+   1 a: 1: a
6 b: 2: z
1 a: 3: a
3 b: 4: b4
@@ -240,15 +240,15 @@ Issue2807: alignment of line numbers wit
 linkrev vs rev
 
   $ hg annotate -r tip -n a
+  1: a
   0: a
   1: a
-  1: a
 
 linkrev vs rev with -l
 
   $ hg annotate -r tip -nl a
+  1:1: a
   0:1: a
-  1:2: a
   1:3: a
 
 Issue589: "undelete" sequence leads to crash
diff --git a/tests/test-bdiff.py b/tests/test-bdiff.py
--- a/tests/test-bdiff.py
+++ b/tests/test-bdiff.py
@@ -79,14 +79,14 @@ testfixws("", "", 0)
 
 print("done")
 
-print("Odd diff for a trivial change:")
+print("Nice diff for a trivial change:")
 showdiff(
 ''.join('<%s\n-\n' % i for i in range(5)),
 ''.join('>%s\n-\n' % i for i in range(5)))
 
-print("Diff 1 to 3 lines - preference for adding / removing at the end of 
sequences:")
+print("Diff 1 to 3 lines - preference for balanced recursion:")
 showdiff('a\n', 'a\n' * 3)
-print("Diff 1 to 5 lines - preference for adding / removing at the end of 
sequences:")
+print("Diff 1 to 5 lines - preference for balanced recursion:")
 showdiff('a\n', 'a\n' * 5)
 print("Diff 3 to 1 lines - preference for balanced recursion:")
 showdiff('a\n' * 3, 

[PATCH 4 of 7 v2] bdiff: rearrange the "better longest match" code

2016-11-15 Thread Mads Kiilerich
# HG changeset patch
# User Mads Kiilerich 
# Date 1478626653 -3600
#  Tue Nov 08 18:37:33 2016 +0100
# Node ID 5bb26f29b1509520ca3af4c540775cab50b4d6c0
# Parent  fbce612c9be26860338f6ba4ab5df819b6d79521
bdiff: rearrange the "better longest match" code

This is primarily to make the code more managable and prepare for later
changes.

More specific assignments might also be slightly faster, even thought it also
might generate a bit more code.

diff --git a/mercurial/bdiff.c b/mercurial/bdiff.c
--- a/mercurial/bdiff.c
+++ b/mercurial/bdiff.c
@@ -177,10 +177,20 @@ static int longest_match(struct bdiff_li
 
/* best match so far? we prefer matches closer
   to the middle to balance recursion */
-   if (k > mk || (k == mk && (i <= mi || i <= half))) {
+   if (k > mk) {
+   /* a longer match */
mi = i;
mj = j;
mk = k;
+   } else if (k == mk) {
+   if (i > mi && i <= half) {
+   /* same match but closer to half */
+   mi = i;
+   mj = j;
+   } else if (i == mi) {
+   /* same i but earlier j */
+   mj = j;
+   }
}
}
}
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH V2] util: improve iterfile so it chooses code path wisely

2016-11-15 Thread Jun Wu
# HG changeset patch
# User Jun Wu 
# Date 1479241551 0
#  Tue Nov 15 20:25:51 2016 +
# Node ID f3d2f4ebc4006043684db52e4487756dd4e2d238
# Parent  d1a0a64f6e16432333bea0476098c46a61222b9b
# Available At https://bitbucket.org/quark-zju/hg-draft
#  hg pull https://bitbucket.org/quark-zju/hg-draft -r f3d2f4ebc400
util: improve iterfile so it chooses code path wisely

We have performance concerns on "iterfile" as it is 4X slower on normal
files. While modern systems have the nice property that reading a "fast"
(on-disk) file cannot be interrupted and should be made use of.

This patch dumps the related knowledge in comments. And "iterfile" chooses
code paths wisely:
1. If fp is a normal file, use the fast path.
2. If it's CPython 3, or PyPY, use the fast path.
3. If fp is not a normal file and CPython version >= 2.7.4, use the same
   workaround (4x slower) as before.
4. If fp is not a normal file and CPython version < 2.7.4, use another
   workaround (2x slower but may block longer then necessary) which
   basically re-invents the buffer + readline logic in Python.

This will give us good confidence on both correctness and performance
dealing with EINTR in iterfile(fp) for all known Python versions.

diff --git a/mercurial/util.py b/mercurial/util.py
--- a/mercurial/util.py
+++ b/mercurial/util.py
@@ -25,8 +25,10 @@ import hashlib
 import imp
 import os
+import platform as pyplatform
 import re as remod
 import shutil
 import signal
 import socket
+import stat
 import string
 import subprocess
@@ -2191,8 +2193,71 @@ def wrap(line, width, initindent='', han
 return wrapper.fill(line).encode(encoding.encoding)
 
-def iterfile(fp):
-"""like fp.__iter__ but does not have issues with EINTR. Python 2.7.12 is
-known to have such issues."""
-return iter(fp.readline, '')
+if (pyplatform.python_implementation() == 'CPython' and
+sys.version_info < (3, 0)):
+# There is an issue in CPython that some IO methods do not handle EINTR
+# correctly. The following table shows what CPython version (and functions)
+# are affected (Y: has the EINTR bug, N: otherwise):
+#
+#| < 2.7.4 | 2.7.4 to 2.7.12 | >= 3.0
+#   --
+#fp.__iter__ | Y   | Y   | N
+#fp.read*| Y   | N [1]   | N
+#
+# [1]: fixed by hg changeset 67dc99a989cd.
+#
+# Here we workaround the EINTR issue for fileobj.__iter__. Other methods
+# like "read*" are ignored for now, as Python < 2.7.4 is a minority.
+#
+# Although we can workaround the EINTR issue for fp.__iter__, it is slower:
+# "for x in fp" is 4x faster than "for x in iter(fp.readline, '')" in
+# CPython 2, because the latter maintains an internal readahead buffer.
+#
+# On modern systems like Linux, the "read" syscall cannot be interrupted
+# when reading "fast" files like on-disk files. So the EINTR issue only
+# affects things like pipes, sockets, ttys etc. We treat "normal" (S_ISREG)
+# files approximately as "fast" files and use the fast (unsafe) code path,
+# to minimize the performance impact.
+if sys.version_info >= (2, 7, 4):
+# fp.readline deals with EINTR correctly, use it as a workaround
+def _safeiterfile(fp):
+return iter(fp.readline, '')
+else:
+# fp.read* are broken too, manually deal with EINTR in a stupid way
+# note: this may block longer than necessary because of bufsize.
+def _safeiterfile(fp, bufsize=4096):
+fd = fp.fileno()
+line = ''
+while True:
+try:
+buf = os.read(fd, bufsize)
+except OSError as ex:
+if ex.errno == errno.EINTR:
+continue
+else:
+raise
+line += buf
+if '\n' in buf:
+splitted = line.splitlines(True)
+line = ''
+for l in splitted:
+if l[-1] == '\n':
+yield l
+else:
+line = l
+if not buf:
+break
+
+def iterfile(fp):
+fastpath = True
+if type(fp) is file:
+fastpath = stat.S_ISREG(os.fstat(fp.fileno()).st_mode)
+if fastpath:
+return fp
+else:
+return _safeiterfile(fp)
+else:
+# PyPy and CPython 3 do not have the EINTR issue thus no workaround needed.
+def iterfile(fp):
+return fp
 
 def iterlines(iterator):
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] util: improve iterfile so it impacts little on performance

2016-11-15 Thread Jun Wu
Excerpts from Jun Wu's message of 2016-11-15 16:04:13 +:
> # HG changeset patch
> # User Jun Wu 
> # Date 1479225350 0
> #  Tue Nov 15 15:55:50 2016 +
> # Node ID 3cd2e9873bc1d565300b629e72100800075d12bb
> # Parent  d1a0a64f6e16432333bea0476098c46a61222b9b
> # Available At https://bitbucket.org/quark-zju/hg-draft 
> #  hg pull https://bitbucket.org/quark-zju/hg-draft  -r 
> 3cd2e9873bc1
> util: improve iterfile so it impacts little on performance
> 
> We have performance concerns on "iterfile" as it is 4X slower on normal
> files. While modern systems have the nice property that reading a "fast"
> (on-disk) file cannot be interrupted and should be made use of.
> 
> This patch dumps the related knowledge in comments. And tries to minimize
> the performance impact: it only use the slower but safer approach for
> non-normal files. It gives up for Python < 2.7.4 because the slower approach

I think I can fix Python < 2.7.4 using some slow code path as well.
So I'm dropping this one. A new version is coming.

> does not make a difference in terms of safety. And it avoids the workaround
> for Python >= 3 and PyPy who don't have the EINTR issue.
> 
> diff --git a/mercurial/util.py b/mercurial/util.py
> --- a/mercurial/util.py
> +++ b/mercurial/util.py
> @@ -25,8 +25,10 @@ import hashlib
>  import imp
>  import os
> +import platform as pyplatform
>  import re as remod
>  import shutil
>  import signal
>  import socket
> +import stat
>  import string
>  import subprocess
> @@ -2191,8 +2193,31 @@ def wrap(line, width, initindent='', han
>  return wrapper.fill(line).encode(encoding.encoding)
>  
> -def iterfile(fp):
> -"""like fp.__iter__ but does not have issues with EINTR. Python 2.7.12 is
> -known to have such issues."""
> -return iter(fp.readline, '')
> +if (pyplatform.python_implementation() == 'CPython' and
> +sys.version_info <= (3, 0) and sys.version_info >= (2, 7, 4)):
> +# There is an issue with CPython 2 that file.__iter__ does not handle 
> EINTR
> +# correctly. CPython <= 2.7.12 is known to have the issue.
> +# In CPython >= 2.7.4, file.read, file.readline etc. deal with EINTR
> +# correctly so we can use the workaround below. However the workaround is
> +# about 4X slower than the native iterator because the latter does
> +# readahead caching in CPython layer.
> +# On modern systems like Linux, the "read" syscall cannot be interrupted
> +# for reading "fast" files like on-disk files. So the EINTR issue only
> +# affects things like pipes, sockets, ttys etc. We treat "normal" 
> (S_ISREG)
> +# files approximately as "fast" files and use the fast (unsafe) code 
> path.
> +def iterfile(fp):
> +fastpath = True
> +try:
> +fastpath = stat.S_ISREG(os.fstat(fp.fileno()).st_mode)
> +except (AttributeError, OSError): # no fileno, or stat fails
> +pass
> +if fastpath:
> +return fp
> +else:
> +return iter(fp.readline, '')
> +else:
> +# For CPython < 2.7.4, the workaround wouldn't make things better.
> +# PyPy and CPython 3 do not have the EINTR issue thus no workaround 
> needed.
> +def iterfile(fp):
> +return fp
>  
>  def iterlines(iterator):
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 2 of 2 narrowhg-ext] manifest: fix narrowmanifest to be narrowmanifestlog

2016-11-15 Thread Martin von Zweigbergk via Mercurial-devel
queued these with some minor changes (dropped unused import and
changed an "add" to "write"). Thanks!

On Mon, Nov 14, 2016 at 3:30 PM, Durham Goode  wrote:
> # HG changeset patch
> # User Durham Goode 
> # Date 1479166205 28800
> #  Mon Nov 14 15:30:05 2016 -0800
> # Node ID 331ce6adfeccf8f21bdda8ddfb28da8f4b410848
> # Parent  33646477f95dcfd2b7d4eede4d190569ca142be6
> manifest: fix narrowmanifest to be narrowmanifestlog
>
> Upstream hg has completely refactored the manifest class, so we need to update
> our internal narrowmanifest* logic. In some ways the new logic is simpler, 
> since
> we only need to wrap manifestlog.get() to cover all manifestctx creations
> instead of having to wrap every manifest revlog to capture all creations. We
> still need to wrap manifestrevlog creations to insert the narrowrevlog mixin,
> but that is now completely separate logic from the tree oriented
> excludeddirmanfiestctx logic.
>
> diff --git a/src/narrowrepo.py b/src/narrowrepo.py
> --- a/src/narrowrepo.py
> +++ b/src/narrowrepo.py
> @@ -37,11 +37,16 @@ def wraprepo(repo):
>  narrowrevlog.makenarrowchangelog(cl)
>  return cl
>
> +def _constructmanifest(self):
> +manifest = super(narrowrepository, self)._constructmanifest()
> +narrowrevlog.makenarrowmanifestrevlog(manifest)
> +return manifest
> +
>  @cacheprop('00manifest.i')
> -def manifest(self):
> -mf = super(narrowrepository, self).manifest
> -narrowrevlog.makenarrowmanifest(mf, self)
> -return mf
> +def manifestlog(self):
> +mfl = super(narrowrepository, self).manifestlog
> +narrowrevlog.makenarrowmanifestlog(mfl, self)
> +return mfl
>
>  def file(self, f):
>  fl = super(narrowrepository, self).file(f)
> diff --git a/src/narrowrevlog.py b/src/narrowrevlog.py
> --- a/src/narrowrevlog.py
> +++ b/src/narrowrevlog.py
> @@ -17,15 +17,6 @@ assert (revlog.REVIDX_KNOWN_FLAGS & ELLI
>  revlog.REVIDX_KNOWN_FLAGS |= ELLIPSIS_NODE_FLAG
>
>
> -if util.safehasattr(manifest, 'treemanifestctx'):
> -def inittreemanifestctx(orig, self, revlog, dir, node):
> -orig(self, revlog, dir, node)
> -if isinstance(self._revlog, excludeddirmanifestlog):
> -self._data = excludeddir(dir, node)
> -
> -extensions.wrapfunction(manifest.treemanifestctx, '__init__',
> -inittreemanifestctx)
> -
>  class narrowrevlogmixin(object):
>  """Mixin for revlog subclasses to make them support narrow clones."""
>
> @@ -77,15 +68,15 @@ class excludeddir(manifest.treemanifest)
>  def copy(self):
>  return self
>
> -class excludeddirmanifestlog(manifest.manifest):
> -def __init__(self, dir):
> +class excludeddirmanifestctx(manifest.treemanifestctx):
> +def __init__(self, dir, node):
>  self._dir = dir
> -self._treeondisk = True
> +self._node = node
>
> -def read(self, node):
> -return excludeddir(self._dir, node)
> +def read(self):
> +return excludeddir(self._dir, self._node)
>
> -def add(self, *args):
> +def write(self, *args):
>  # We should never write entries in dirlogs outside the narrow clone.
>  # However, the method still gets called from writesubtree() in
>  # _addtree(), so we need to handle it. We should possibly make that
> @@ -93,17 +84,23 @@ class excludeddirmanifestlog(manifest.ma
>  # in excludeddir instances).
>  pass
>
> -def makenarrowmanifest(mf, repo):
> -if not isinstance(mf, narrowrevlogmixin):
> -class narrowmanifest(narrowrevlogmixin, mf.__class__):
> +def makenarrowmanifestrevlog(mfrevlog):
> +if not isinstance(mfrevlog, narrowrevlogmixin):
> +class narrowmanifestrevlog(narrowrevlogmixin, mfrevlog.__class__):
>  def dirlog(self, dir):
> -if not repo.narrowmatch().visitdir(dir[:-1] or '.'):
> -dirlog = excludeddirmanifestlog(dir)
> -else:
> -dirlog = super(narrowmanifest, self).dirlog(dir)
> -makenarrowmanifest(dirlog, repo)
> -return dirlog
> -mf.__class__ = narrowmanifest
> +result = super(narrowmanifestrevlog, self).dirlog(dir)
> +makenarrowmanifestrevlog(result)
> +return result
> +
> +mfrevlog.__class__ = narrowmanifestrevlog
> +
> +def makenarrowmanifestlog(mfl, repo):
> +class narrowmanifestlog(mfl.__class__):
> +def get(self, dir, node, verify=True):
> +if not repo.narrowmatch().visitdir(dir[:-1] or '.'):
> +return excludeddirmanifestctx(dir, node)
> +return super(narrowmanifestlog, self).get(dir, node, 
> verify=verify)
> +mfl.__class__ = narrowmanifestlog
>
>  def makenarrowfilelog(fl, narrowmatch):
>  if not 

Re: [PATCH V2] perf: add asv benchmarks

2016-11-15 Thread Philippe Pepiot

On 11/14/2016 11:02 AM, Jun Wu wrote:


Generally looks good to me. Some inline comments about small issues.

I wonder if it is possible to get rid of the dependencies on virtualenv and
hglib as installing them could be troublesome in some environments. I think
"virtualenv" could be replaced by just copying the checked out files to
somewhere, and hglib seems redundant.

Thanks, sent a V3 with requested changes.

Currently asv only support virtualenv (+pip) and conda environments to 
install target package. It could be enhanced to support installation 
using `setup.py install --prefix` and set PYTHONPATH accordingly 
(assuming the package requirements are satisfied).
hglib is required to interact with "repo" from asv.conf.json, it could 
directly use "hg" command via subprocess instead (like git support) but 
I'm not sure it will be relevant from asv point of view.


Cheers,


Excerpts from Philippe Pepiot's message of 2016-11-03 16:58:13 +0100:

# HG changeset patch
# User Philippe Pepiot 
# Date 1475136994 -7200
#  Thu Sep 29 10:16:34 2016 +0200
# Node ID 67e096cea548a37ba80ddf04e62a1cc1d50e9c96
# Parent  b032a7b676c6637b2ac6f3ef29431013b15a08f9
perf: add asv benchmarks

Airspeed velocity (ASV) is a python framework for benchmarking Python packages
over their lifetime. The results are displayed in an interactive web frontend.

Add ASV benchmarks for mercurial that use contrib/perf.py extension that could
be run against multiple reference repositories.

The benchmark suite now includes revsets from contrib/base-revsets.txt with
variants, perftags, perfstatus, perfmanifest and perfheads.

Installation requires asv>=0.2, python-hglib and virtualenv

This is part of PerformanceTrackingSuitePlan
https://www.mercurial-scm.org/wiki/PerformanceTrackingSuitePlan

diff --git a/.hgignore b/.hgignore
--- a/.hgignore
+++ b/.hgignore
@@ -49,6 +49,7 @@ mercurial.egg-info
  tags
  cscope.*
  .idea/*
+.asv/*
  i18n/hg.pot
  locale/*/LC_MESSAGES/hg.mo
  hgext/__index__.py
diff --git a/contrib/asv.conf.json b/contrib/asv.conf.json
new file mode 100644
--- /dev/null
+++ b/contrib/asv.conf.json
@@ -0,0 +1,127 @@
+{
+// The version of the config file format.  Do not change, unless
+// you know what you are doing.
+"version": 1,
+
+// The name of the project being benchmarked
+"project": "mercurial",
+
+// The project's homepage
+"project_url": "http://mercurial-scm.org/ ",
+
+// The URL or local path of the source code repository for the
+// project being benchmarked
+"repo": "..",
+
+// List of branches to benchmark. If not provided, defaults to "master"
+// (for git) or "default" (for mercurial).
+// "branches": ["master"], // for git
+// "branches": ["default"],// for mercurial
+"branches": ["default", "stable"],
+
+// The DVCS being used.  If not set, it will be automatically
+// determined from "repo" by looking at the protocol in the URL
+// (if remote), or by looking for special directories, such as
+// ".git" (if local).
+// "dvcs": "git",
+
+// The tool to use to create environments.  May be "conda",
+// "virtualenv" or other value depending on the plugins in use.
+// If missing or the empty string, the tool will be automatically
+// determined by looking for tools on the PATH environment
+// variable.
+"environment_type": "virtualenv",
+
+// the base URL to show a commit for the project.
+"show_commit_url": "https://www.mercurial-scm.org/repo/hg/rev/ ",
+
+// The Pythons you'd like to test against.  If not provided, defaults
+// to the current version of Python used to run `asv`.
+// "pythons": ["2.7", "3.3"],
+
+// The matrix of dependencies to test.  Each key is the name of a
+// package (in PyPI) and the values are version numbers.  An empty
+// list or empty string indicates to just test against the default
+// (latest) version. null indicates that the package is to not be
+// installed. If the package to be tested is only available from
+// PyPi, and the 'environment_type' is conda, then you can preface
+// the package name by 'pip+', and the package will be installed via
+// pip (with all the conda available packages installed first,
+// followed by the pip installed packages).
+//
+// "matrix": {
+// "numpy": ["1.6", "1.7"],
+// "six": ["", null],// test with and without six installed
+// "pip+emcee": [""],   // emcee is only available for install with 
pip.
+// },
+
+// Combinations of libraries/python versions can be excluded/included
+// from the set to test. Each entry is a dictionary containing additional
+// key-value pairs to include/exclude.
+//
+// An exclude entry excludes entries where all values match. The
+// values are regexps that should match the whole string.
+//
+// An include entry adds an environment. Only the packages 

[PATCH 1 of 2 V3] perf: omit copying ui and redirect to ferr if buffer API is in use

2016-11-15 Thread Philippe Pepiot
# HG changeset patch
# User Philippe Pepiot 
# Date 1479222657 -3600
#  Tue Nov 15 16:10:57 2016 +0100
# Node ID ab6e50ddc2c56dcf170991293005be6d6f80a232
# Parent  e1d6aa0e4c3aed73e0dc523b8a8fd5f9fe23510a
perf: omit copying ui and redirect to ferr if buffer API is in use

This allow to get the output of contrib/perf.py commands using the
ui.pushbuffer() API.

diff --git a/contrib/perf.py b/contrib/perf.py
--- a/contrib/perf.py
+++ b/contrib/perf.py
@@ -136,13 +136,14 @@ def gettimer(ui, opts=None):
 
 if opts is None:
 opts = {}
-# redirect all to stderr
-ui = ui.copy()
-uifout = safeattrsetter(ui, 'fout', ignoremissing=True)
-if uifout:
-# for "historical portability":
-# ui.fout/ferr have been available since 1.9 (or 4e1ccd4c2b6d)
-uifout.set(ui.ferr)
+# redirect all to stderr unless buffer api is in use
+if not ui._buffers:
+ui = ui.copy()
+uifout = safeattrsetter(ui, 'fout', ignoremissing=True)
+if uifout:
+# for "historical portability":
+# ui.fout/ferr have been available since 1.9 (or 4e1ccd4c2b6d)
+uifout.set(ui.ferr)
 
 # get a formatter
 uiformatter = getattr(ui, 'formatter', None)
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


[PATCH 2 of 2 V3] perf: add asv benchmarks

2016-11-15 Thread Philippe Pepiot
# HG changeset patch
# User Philippe Pepiot 
# Date 1475136994 -7200
#  Thu Sep 29 10:16:34 2016 +0200
# Node ID 94e48d7dc9630543e0f4179b2ca96f3c66967f6a
# Parent  ab6e50ddc2c56dcf170991293005be6d6f80a232
perf: add asv benchmarks

Airspeed velocity (ASV) is a python framework for benchmarking Python packages
over their lifetime. The results are displayed in an interactive web frontend.

Add ASV benchmarks for mercurial that use contrib/perf.py extension that could
be run against multiple reference repositories.

The benchmark suite now includes revsets from contrib/base-revsets.txt with
variants, perftags, perfstatus, perfmanifest and perfheads.

Installation requires asv>=0.2, python-hglib and virtualenv

This is part of PerformanceTrackingSuitePlan
https://www.mercurial-scm.org/wiki/PerformanceTrackingSuitePlan

diff --git a/.hgignore b/.hgignore
--- a/.hgignore
+++ b/.hgignore
@@ -49,6 +49,7 @@ mercurial.egg-info
 tags
 cscope.*
 .idea/*
+.asv/*
 i18n/hg.pot
 locale/*/LC_MESSAGES/hg.mo
 hgext/__index__.py
diff --git a/contrib/asv.conf.json b/contrib/asv.conf.json
new file mode 100644
--- /dev/null
+++ b/contrib/asv.conf.json
@@ -0,0 +1,127 @@
+{
+// The version of the config file format.  Do not change, unless
+// you know what you are doing.
+"version": 1,
+
+// The name of the project being benchmarked
+"project": "mercurial",
+
+// The project's homepage
+"project_url": "http://mercurial-scm.org/;,
+
+// The URL or local path of the source code repository for the
+// project being benchmarked
+"repo": "..",
+
+// List of branches to benchmark. If not provided, defaults to "master"
+// (for git) or "default" (for mercurial).
+// "branches": ["master"], // for git
+// "branches": ["default"],// for mercurial
+"branches": ["default", "stable"],
+
+// The DVCS being used.  If not set, it will be automatically
+// determined from "repo" by looking at the protocol in the URL
+// (if remote), or by looking for special directories, such as
+// ".git" (if local).
+// "dvcs": "git",
+
+// The tool to use to create environments.  May be "conda",
+// "virtualenv" or other value depending on the plugins in use.
+// If missing or the empty string, the tool will be automatically
+// determined by looking for tools on the PATH environment
+// variable.
+"environment_type": "virtualenv",
+
+// the base URL to show a commit for the project.
+"show_commit_url": "https://www.mercurial-scm.org/repo/hg/rev/;,
+
+// The Pythons you'd like to test against.  If not provided, defaults
+// to the current version of Python used to run `asv`.
+// "pythons": ["2.7", "3.3"],
+
+// The matrix of dependencies to test.  Each key is the name of a
+// package (in PyPI) and the values are version numbers.  An empty
+// list or empty string indicates to just test against the default
+// (latest) version. null indicates that the package is to not be
+// installed. If the package to be tested is only available from
+// PyPi, and the 'environment_type' is conda, then you can preface
+// the package name by 'pip+', and the package will be installed via
+// pip (with all the conda available packages installed first,
+// followed by the pip installed packages).
+//
+// "matrix": {
+// "numpy": ["1.6", "1.7"],
+// "six": ["", null],// test with and without six installed
+// "pip+emcee": [""],   // emcee is only available for install with 
pip.
+// },
+
+// Combinations of libraries/python versions can be excluded/included
+// from the set to test. Each entry is a dictionary containing additional
+// key-value pairs to include/exclude.
+//
+// An exclude entry excludes entries where all values match. The
+// values are regexps that should match the whole string.
+//
+// An include entry adds an environment. Only the packages listed
+// are installed. The 'python' key is required. The exclude rules
+// do not apply to includes.
+//
+// In addition to package names, the following keys are available:
+//
+// - python
+// Python version, as in the *pythons* variable above.
+// - environment_type
+// Environment type, as above.
+// - sys_platform
+// Platform, as in sys.platform. Possible values for the common
+// cases: 'linux2', 'win32', 'cygwin', 'darwin'.
+//
+// "exclude": [
+// {"python": "3.2", "sys_platform": "win32"}, // skip py3.2 on windows
+// {"environment_type": "conda", "six": null}, // don't run without 
six on conda
+// ],
+//
+// "include": [
+// // additional env for python2.7
+// {"python": "2.7", "numpy": "1.8"},
+// // additional env if run on windows+conda
+// {"platform": "win32", "environment_type": "conda", "python": "2.7", 

Re: [PATCH 3 of 6] patch: migrate to util.iterfile

2016-11-15 Thread Yuya Nishihara
On Tue, 15 Nov 2016 15:14:09 +, Jun Wu wrote:
> Excerpts from Jun Wu's message of 2016-11-15 15:11:52 +:
> > Excerpts from Pierre-Yves David's message of 2016-11-15 14:07:15 +:
> > > 
> > > On 11/15/2016 01:38 PM, Jun Wu wrote:
> > > > Excerpts from Yuya Nishihara's message of 2016-11-15 21:14:10 +0900:
> > > >> On Mon, 14 Nov 2016 23:35:55 +, Jun Wu wrote:
> > > >>> # HG changeset patch
> > > >>> # User Jun Wu 
> > > >>> # Date 1479165246 0
> > > >>> #  Mon Nov 14 23:14:06 2016 +
> > > >>> # Node ID d1637df5ffd91e95da25213f06f346adb3269ace
> > > >>> # Parent  8097b8c66952003addd5b683a14265c1d3cc201f
> > > >>> # Available At https://bitbucket.org/quark-zju/hg-draft 
> > > >>> #  hg pull https://bitbucket.org/quark-zju/hg-draft   -r 
> > > >>> d1637df5ffd9
> > > >>> patch: migrate to util.iterfile
> > > >>
> > > >> Not all read()s are interruptible by signals. I don't remember the 
> > > >> detail, but
> > > >> signal(7) of Linux says "read(2) ... calls on "slow" devices. ... Note 
> > > >> that a
> > > >> (local) disk is not a slow device according to this definition; I/O 
> > > >> operations
> > > >> on disk devices are not interrupted by signals."
> > > >
> > > > Good to know! I guess modern OS have similar decisions, although POSIX 
> > > > seems
> > > > to be not helpful here.
> > > >
> > > >> Since readline() doesn't have cache in CPython layer, it would be 
> > > >> slightly
> > > >> slower than fp.__iter__().
> > > >
> > > > A quick benchmark shows it's ~4x slower.
> > > >
> > > > Maybe make the function smarter to only use the workaround for a pipe?
> > > >
> > > >   if hassafeattr(fp, 'fileno') and 
> > > > stat.S_ISFIFO(os.fstat(fp.fileno()).st_mode):
> > > >   return iter(fp.readline, '')

No idea if it can cover all cases. (e.g. i don't know if tty is the "slow"
device.)

> > > Should we drop this series from hg-committed or a followup is enough?
> > 
> > +1. I guess I have to abandon the SIGCHILD patch, too (sadly). As long as we
> > support Python 2.6, we are at risk that any fileobj.read() is broken.
> 
> Actually, I think the worker issue is still worthwhile to fix (the current
> situation is better than none anyway). So I will send follow-ups:
> 
>   - Fix util.iterfile perf
>   - Only use SIGCHILD handler in worker, if Python >= 2.7.4

It sounds good to patch worker.py which is known to have the problem, and we
wouldn't care if the read of progress messages gets 4x slower.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: Vendoring zstd

2016-11-15 Thread Gregory Szorc


> On Nov 15, 2016, at 06:31, Pierre-Yves David  
> wrote:
> 
> 
> 
>> On 11/10/2016 10:22 PM, Augie Fackler wrote:
>>> On Thu, Nov 10, 2016 at 10:07:32AM -0800, Gregory Szorc wrote:
>>> I plan to add support for zstd to Mercurial this release cycle. I don't
>>> think support itself is too controversial. (There will be bikeshedding over
>>> implementation details, of course.)
>>> 
>>> Since zstd is relatively new and not part of a Python distribution, we
>>> can't rely on its shared libraries being available on systems or even part
>>> of the system packager. In addition, the Python bindings to zstd rely on
>>> zstd APIs that require static linking to access. In addition, said zstd
>>> APIs are not guaranteed stable (which is why they aren't exported). So it
>>> is important for the Python bindings to be compiled against a known version
>>> of zstd. This is why the Python bindings ship the zstd sources and don't
>>> treat zstd as an external dependency. On top of that, the Python bindings
>>> are a bit fluid and targeting multiple versions from Mercurial could be
>>> painful until the API matures.
>>> 
>>> If we're going to have proper zstd support that "just works," I think we
>>> need to vendor the Python bindings (which includes the zstd sources) into
>>> the hg repo.
>>> 
>>> I have 2 questions:
>>> 
>>> 1) Is vendoring python-zstandard (
>>> https://github.com/indygreg/python-zstandard) (or at least the parts we
>>> need) acceptable?
>> 
>> It's fine with me. We might want to send a note out to the packaging
>> list giving them a heads up that:
>> 
>> 1) We're doing this
>> 2) We'd appreciate it if they didn't unbundle zstd, given the above
>> 
>> On our end, we should probably degrade gracefully if python-zstandard
>> isn't available, so that distros can unbundle and just not have the
>> dependency (at least until the APIs settle down).
> 
> I'm fine with that too and Augie extra step seems good.
> 
> To me, the main reason to vendor this is the API instability (the format 
> itself is stable, right?).

The format should be stable.

> 
> Otherwise having Mercurial degrade gracefully if zst is not available seems 
> like the best behavior for now (while zstd in Mercurial get mature). Later We 
> could build package includig zstd (as we do for python2.7) for distribution 
> that are missing it and just use the distribution package for the other 
> (provided the API eventually stabilize). Maintaining zstd package is more of 
> a distribution job and I would be happier if we do not have to handle 
> security maintainance of zstd on the long run.

In the series I sent, there is not an option to disable zstd. I can do that as 
a follow-up enhancement to setup.py.

> 
>>> 2) Should I patchbomb the ~1 MB of sources or should the vendoring
>>> changeset(s) be submitted a different way?
>> 
>> I think for the monster import change, we should probably just have
>> you send a pull request.
> 
> +1
> 
> -- 
> Pierre-Yves David
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 3 of 6] patch: migrate to util.iterfile

2016-11-15 Thread Jun Wu
Excerpts from Jun Wu's message of 2016-11-15 15:11:52 +:
> Excerpts from Pierre-Yves David's message of 2016-11-15 14:07:15 +:
> > 
> > On 11/15/2016 01:38 PM, Jun Wu wrote:
> > > Excerpts from Yuya Nishihara's message of 2016-11-15 21:14:10 +0900:
> > >> On Mon, 14 Nov 2016 23:35:55 +, Jun Wu wrote:
> > >>> # HG changeset patch
> > >>> # User Jun Wu 
> > >>> # Date 1479165246 0
> > >>> #  Mon Nov 14 23:14:06 2016 +
> > >>> # Node ID d1637df5ffd91e95da25213f06f346adb3269ace
> > >>> # Parent  8097b8c66952003addd5b683a14265c1d3cc201f
> > >>> # Available At https://bitbucket.org/quark-zju/hg-draft 
> > >>> #  hg pull https://bitbucket.org/quark-zju/hg-draft   -r 
> > >>> d1637df5ffd9
> > >>> patch: migrate to util.iterfile
> > >>
> > >> Not all read()s are interruptible by signals. I don't remember the 
> > >> detail, but
> > >> signal(7) of Linux says "read(2) ... calls on "slow" devices. ... Note 
> > >> that a
> > >> (local) disk is not a slow device according to this definition; I/O 
> > >> operations
> > >> on disk devices are not interrupted by signals."
> > >
> > > Good to know! I guess modern OS have similar decisions, although POSIX 
> > > seems
> > > to be not helpful here.
> > >
> > >> Since readline() doesn't have cache in CPython layer, it would be 
> > >> slightly
> > >> slower than fp.__iter__().
> > >
> > > A quick benchmark shows it's ~4x slower.
> > >
> > > Maybe make the function smarter to only use the workaround for a pipe?
> > >
> > >   if hassafeattr(fp, 'fileno') and 
> > > stat.S_ISFIFO(os.fstat(fp.fileno()).st_mode):
> > >   return iter(fp.readline, '')
> > >   else:
> > >   return fp
> > 
> > Should we drop this series from hg-committed or a followup is enough?
> 
> +1. I guess I have to abandon the SIGCHILD patch, too (sadly). As long as we
> support Python 2.6, we are at risk that any fileobj.read() is broken.

Actually, I think the worker issue is still worthwhile to fix (the current
situation is better than none anyway). So I will send follow-ups:

  - Fix util.iterfile perf
  - Only use SIGCHILD handler in worker, if Python >= 2.7.4
> 
> > 
> > Cheers,
> > 
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 3 of 6] patch: migrate to util.iterfile

2016-11-15 Thread Jun Wu
Excerpts from Pierre-Yves David's message of 2016-11-15 14:07:15 +:
> 
> On 11/15/2016 01:38 PM, Jun Wu wrote:
> > Excerpts from Yuya Nishihara's message of 2016-11-15 21:14:10 +0900:
> >> On Mon, 14 Nov 2016 23:35:55 +, Jun Wu wrote:
> >>> # HG changeset patch
> >>> # User Jun Wu 
> >>> # Date 1479165246 0
> >>> #  Mon Nov 14 23:14:06 2016 +
> >>> # Node ID d1637df5ffd91e95da25213f06f346adb3269ace
> >>> # Parent  8097b8c66952003addd5b683a14265c1d3cc201f
> >>> # Available At https://bitbucket.org/quark-zju/hg-draft 
> >>> #  hg pull https://bitbucket.org/quark-zju/hg-draft   -r 
> >>> d1637df5ffd9
> >>> patch: migrate to util.iterfile
> >>
> >> Not all read()s are interruptible by signals. I don't remember the detail, 
> >> but
> >> signal(7) of Linux says "read(2) ... calls on "slow" devices. ... Note 
> >> that a
> >> (local) disk is not a slow device according to this definition; I/O 
> >> operations
> >> on disk devices are not interrupted by signals."
> >
> > Good to know! I guess modern OS have similar decisions, although POSIX seems
> > to be not helpful here.
> >
> >> Since readline() doesn't have cache in CPython layer, it would be slightly
> >> slower than fp.__iter__().
> >
> > A quick benchmark shows it's ~4x slower.
> >
> > Maybe make the function smarter to only use the workaround for a pipe?
> >
> >   if hassafeattr(fp, 'fileno') and 
> > stat.S_ISFIFO(os.fstat(fp.fileno()).st_mode):
> >   return iter(fp.readline, '')
> >   else:
> >   return fp
> 
> Should we drop this series from hg-committed or a followup is enough?

+1. I guess I have to abandon the SIGCHILD patch, too (sadly). As long as we
support Python 2.6, we are at risk that any fileobj.read() is broken.

> 
> Cheers,
> 
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 3 of 6] patch: migrate to util.iterfile

2016-11-15 Thread Martijn Pieters
On 15 November 2016 at 12:14, Yuya Nishihara  wrote:

> Not all read()s are interruptible by signals. I don't remember the detail,
> but
> signal(7) of Linux says "read(2) ... calls on "slow" devices. ... Note
> that a
> (local) disk is not a slow device according to this definition; I/O
> operations
> on disk devices are not interrupted by signals."
>
> Since readline() doesn't have cache in CPython layer, it would be slightly
> slower than fp.__iter__().
>

It looks like this isn't the case for Windows; I found
http://bugs.python.org/issue1633941, where you can provoke the bug by using
^Z while reading from stdin.

Also, before Python 2.7.4 (including 2.6) there is no EINTR handling *at
all*, all reading operations that can be interrupted are affected. See
http://bugs.python.org/issue12268, which aimed at fixing that issue but
managed to miss readahead.

-- 
Martijn Pieters
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: Vendoring zstd

2016-11-15 Thread Pierre-Yves David



On 11/10/2016 10:22 PM, Augie Fackler wrote:

On Thu, Nov 10, 2016 at 10:07:32AM -0800, Gregory Szorc wrote:

I plan to add support for zstd to Mercurial this release cycle. I don't
think support itself is too controversial. (There will be bikeshedding over
implementation details, of course.)

Since zstd is relatively new and not part of a Python distribution, we
can't rely on its shared libraries being available on systems or even part
of the system packager. In addition, the Python bindings to zstd rely on
zstd APIs that require static linking to access. In addition, said zstd
APIs are not guaranteed stable (which is why they aren't exported). So it
is important for the Python bindings to be compiled against a known version
of zstd. This is why the Python bindings ship the zstd sources and don't
treat zstd as an external dependency. On top of that, the Python bindings
are a bit fluid and targeting multiple versions from Mercurial could be
painful until the API matures.

If we're going to have proper zstd support that "just works," I think we
need to vendor the Python bindings (which includes the zstd sources) into
the hg repo.

I have 2 questions:

1) Is vendoring python-zstandard (
https://github.com/indygreg/python-zstandard) (or at least the parts we
need) acceptable?


It's fine with me. We might want to send a note out to the packaging
list giving them a heads up that:

1) We're doing this
2) We'd appreciate it if they didn't unbundle zstd, given the above

On our end, we should probably degrade gracefully if python-zstandard
isn't available, so that distros can unbundle and just not have the
dependency (at least until the APIs settle down).


I'm fine with that too and Augie extra step seems good.

To me, the main reason to vendor this is the API instability (the format 
itself is stable, right?).


Otherwise having Mercurial degrade gracefully if zst is not available 
seems like the best behavior for now (while zstd in Mercurial get 
mature). Later We could build package includig zstd (as we do for 
python2.7) for distribution that are missing it and just use the 
distribution package for the other (provided the API eventually 
stabilize). Maintaining zstd package is more of a distribution job and I 
would be happier if we do not have to handle security maintainance of 
zstd on the long run.



2) Should I patchbomb the ~1 MB of sources or should the vendoring
changeset(s) be submitted a different way?


I think for the monster import change, we should probably just have
you send a pull request.


+1

--
Pierre-Yves David
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 02 of 10 V8] bookmarks: introduce listbookmarks()

2016-11-15 Thread Pierre-Yves David



On 11/12/2016 11:41 PM, Gregory Szorc wrote:

On Sat, Nov 12, 2016 at 12:19 PM, Stanislau Hlebik > wrote:

# HG changeset patch
# User Stanislau Hlebik >
# Date 1478980589 28800
#  Sat Nov 12 11:56:29 2016 -0800
# Branch stable
# Node ID 13b3e16c68d303a523550980d7739bb676420851
# Parent  fe9e78883230a875ae7acf6c7a5b324c1d9016f5
bookmarks: introduce listbookmarks()

`bookmarks` bundle2 part will work with binary nodes.
To avoid unnecessary conversions between binary and hex nodes
let's add `listbookmarks()` that returns binary nodes.


Between this and part 1, you've introduced API compatibility breakage
without an "(API)" annotation in the commit message.

I'm not sure why you've done it this way. Introducing
`listbookmarksbin()` gets you to a near identical end state without the
API breakage.


Having a different function name seems like a good idea. We could add a 
deprecation warning on the old one. (and drop it when due)


Cheers,

--
Pierre-Yves David
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH] match: adding support for repository-root-based globs

2016-11-15 Thread Pierre-Yves David



On 11/15/2016 04:59 AM, Rodrigo Damazio Bovendorp via Mercurial-devel wrote:

# HG changeset patch
# User Rodrigo Damazio Bovendorp 
# Date 1475944120 25200
#  Sat Oct 08 09:28:40 2016 -0700
# Node ID 93434cce258a797fcc3997c0af994a524695e273
# Parent  b032a7b676c6637b2ac6f3ef29431013b15a08f9
match: adding support for repository-root-based globs


I saw that Foozy created a plan page about this, it seem to have good 
summary of the current matcher we have but from my reading it is a bit 
fuzzy about the current variation we have in behavior from one command 
to another and from flag usage. I think it is important to have a global 
view of the situation here to be able to efficiently tackle the issues 
at hand.


I'm traveling in Marocco with poor internet connectivity until the end 
of the week. I would prefer if we could not take a final discussion 
until I've time to discuss it more at the beginning of next week. Sorry 
for the extra delay.



The broader plan is to add explicit base directories for all patterns:
  === ===
pattern type root-ed  cwd-ed  any-of-path
  === ===
wildcard rootglob cwdglob anyglob
regexp   rootre   cwdre   anyre
raw string   rootpath cwdpath anypath
  === ===
(table by foozy)

I'm starting by adding rootglob.
One important characteristic and difference from the older glob types is
that rootglob does a *full* match, meaning that a * at the end will never
match recursively, even when the glob is used as an include pattern.

diff -r b032a7b676c6 -r 93434cce258a mercurial/help/patterns.txt
--- a/mercurial/help/patterns.txt   Tue Nov 01 18:54:03 2016 -0700
+++ b/mercurial/help/patterns.txt   Sat Oct 08 09:28:40 2016 -0700
@@ -40,6 +40,11 @@
 ``-I`` or ``-X`` options), can match also against directories: files
 under matched directories are treated as matched.

+For ``-I`` and ``-X`` options, ``glob:`` will match directories recursively.
+``rootglob:``, on the other end, does a full match, meaning that all files, in
+directories or subdirectories, will only match if the entire expression 
matches.
+In that case, ``**`` can be used to obtain recursiveness.
+
 Plain examples::

   path:foo/bar   a name bar in a directory named foo in the root
@@ -48,13 +53,18 @@

 Glob examples::

-  glob:*.c   any name ending in ".c" in the current directory
-  *.cany name ending in ".c" in the current directory
-  **.c   any name ending in ".c" in any subdirectory of the
- current directory including itself.
-  foo/*.cany name ending in ".c" in the directory foo
-  foo/**.c   any name ending in ".c" in any subdirectory of foo
- including itself.
+  glob:*.cany name ending in ".c" in the current directory
+  *.c any name ending in ".c" in the current directory
+  **.cany name ending in ".c" in any subdirectory of the
+  current directory including itself.
+  foo/*   any file in directory foo plus all its subdirectories,
+  recursively
+  foo/*.c any name ending in ".c" in the directory foo
+  foo/**.cany name ending in ".c" in any subdirectory of foo
+  including itself.
+  rootglob:*.cany name ending in ".c" in the repository root
+  rootglob:foo/*  all files inside foo but not its subdirectories
+  rootglob:foo/** all files inside foo and its subdirectories

 Regexp examples::

diff -r b032a7b676c6 -r 93434cce258a mercurial/match.py
--- a/mercurial/match.pyTue Nov 01 18:54:03 2016 -0700
+++ b/mercurial/match.pySat Oct 08 09:28:40 2016 -0700
@@ -105,6 +105,8 @@
 'glob:' - a glob relative to cwd
 're:' - a regular expression
 'path:' - a path relative to repository root
+'rootglob:' - a glob relative to repository root. Unlike glob, *
+will never match subdirectories.
 'relglob:' - an unrooted glob (*.c matches C files in all dirs)
 'relpath:' - a path relative to cwd
 'relre:' - a regexp that needn't match the start of a name
@@ -286,7 +288,7 @@
 for kind, pat in [_patsplit(p, default) for p in patterns]:
 if kind in ('glob', 'relpath'):
 pat = pathutil.canonpath(root, cwd, pat, auditor)
-elif kind in ('relglob', 'path'):
+elif kind in ('relglob', 'path', 'rootglob'):
 pat = util.normpath(pat)
 elif kind in ('listfile', 'listfile0'):
 try:
@@ -447,7 +449,8 @@
 if ':' in pattern:
 kind, pat = pattern.split(':', 1)
 if kind in ('re', 'glob', 'path', 'relglob', 'relpath', 'relre',
-'listfile', 'listfile0', 'set', 'include', 'subinclude'):
+'listfile', 'listfile0', 'set', 'include', 'subinclude',
+'rootglob'):
  

Re: [PATCH 1 of 3 STABLE] util: add utility function to skip avoiding file stat ambiguity if EPERM

2016-11-15 Thread Yuya Nishihara
On Tue, 15 Nov 2016 22:33:31 +0900, FUJIWARA Katsunori wrote:
> At Mon, 14 Nov 2016 22:43:04 +0900,
> Yuya Nishihara wrote:
> > 
> > On Sun, 13 Nov 2016 06:16:08 +0900, FUJIWARA Katsunori wrote:
> > > # HG changeset patch
> > > # User FUJIWARA Katsunori 
> > > # Date 1478984783 -32400
> > > #  Sun Nov 13 06:06:23 2016 +0900
> > > # Branch stable
> > > # Node ID b496a464399cb68628b09e52aa8cf379c98428e6
> > > # Parent  4ed8bb8a153f91420777d98dea10ebbcd403a375
> > > util: add utility function to skip avoiding file stat ambiguity if EPERM
> > 
> > The series looks good to me. I found a nit, but it isn't important. I'll
> > push the patches tomorrow if there are no other comments.
> > 
> > > --- a/mercurial/util.py
> > > +++ b/mercurial/util.py
> > > @@ -1497,6 +1497,24 @@ class filestat(object):
> > >  except AttributeError:
> > >  return False
> > >  
> > > +def avoidambig(self, path, old):
> > > +"""Change file stat of specified path to avoid ambiguity
> > > +
> > > +'old' should be previous filestat of 'path'.
> > > +
> > > +This skips avoiding ambiguity, if a process doesn't have
> > > +appropriate privileges for 'path'.
> > > +"""
> > > +advanced = (old.stat.st_mtime + 1) & 0x7fff
> > > +try:
> > > +os.utime(path, (advanced, advanced))
> > > +except OSError as inst:
> > > +if inst.errno == errno.EPERM:
> > > +# utime() on the file created by another user causes 
> > > EPERM,
> > > +# if a process doesn't have appropriate privileges
> > > +return
> > > +raise
> > 
> > avoidambig() has no relation to self, which requires path and old filestat 
> > as
> > arguments. So I don't think it should be a method of filestat class.
> 
> Oh, this useless "self" is a vestige of my wavering, sorry.
> 
> In fact, avoidambig() was implemented as a combination of examining
> ambiguity (= using self.stat) and advancing st_mtime for convenience,
> at first.
> 
>   before:
> if newstat.isambig(oldstat):
> advanced = (oldstat.stat.st_mtime + 1) & 0x7fff
> os.utime(dstpath, (advanced, advanced))
> 
>   after:
> newstat.avoidambig(oldstat, dstpath)
> 
> But finally, this patch separates advancing st_mtime from examining
> ambiguity for review-ability of fixing urgent bug on stable
> branch. With this simplified changes, reviewer can focus only on
> centralizing code path around os.utime() into avoidambig().
> 
> Then, which fixing below would you like on stable ?
> 
>   1. define avoidambig() as a regular function in util.py
> 
>   2. mark avoidambig() by @static annotation
> 
>  this can centralize logic related to ExactCacheValidationPlan
>  into filestat class
> 
>   3. integrate advancing st_mtime and examining ambiguity into
>  avoidambig() for convenience
> 
>  this requires "self" for avoidambig()

I've just pushed these patches. If we have a plan to extend avoidambig()
in default branch, I think the current state is fine. Anyway, that's a
really minor nit.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 3 of 6] patch: migrate to util.iterfile

2016-11-15 Thread Pierre-Yves David



On 11/15/2016 01:38 PM, Jun Wu wrote:

Excerpts from Yuya Nishihara's message of 2016-11-15 21:14:10 +0900:

On Mon, 14 Nov 2016 23:35:55 +, Jun Wu wrote:

# HG changeset patch
# User Jun Wu 
# Date 1479165246 0
#  Mon Nov 14 23:14:06 2016 +
# Node ID d1637df5ffd91e95da25213f06f346adb3269ace
# Parent  8097b8c66952003addd5b683a14265c1d3cc201f
# Available At https://bitbucket.org/quark-zju/hg-draft
#  hg pull https://bitbucket.org/quark-zju/hg-draft  -r d1637df5ffd9
patch: migrate to util.iterfile


Not all read()s are interruptible by signals. I don't remember the detail, but
signal(7) of Linux says "read(2) ... calls on "slow" devices. ... Note that a
(local) disk is not a slow device according to this definition; I/O operations
on disk devices are not interrupted by signals."


Good to know! I guess modern OS have similar decisions, although POSIX seems
to be not helpful here.


Since readline() doesn't have cache in CPython layer, it would be slightly
slower than fp.__iter__().


A quick benchmark shows it's ~4x slower.

Maybe make the function smarter to only use the workaround for a pipe?

  if hassafeattr(fp, 'fileno') and stat.S_ISFIFO(os.fstat(fp.fileno()).st_mode):
  return iter(fp.readline, '')
  else:
  return fp


Should we drop this series from hg-committed or a followup is enough?

Cheers,

--
Pierre-Yves David
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 6 of 8 V5] worker: make sure killworkers runs at most once

2016-11-15 Thread Yuya Nishihara
On Tue, 15 Nov 2016 02:39:09 +, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu 
> # Date 1479176697 0
> #  Tue Nov 15 02:24:57 2016 +
> # Node ID 8402c91c250a9dd369296dcdf00f7b50110ff6ae
> # Parent  9dbb3532b173a980f341e41d9e96338a386364e5
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #  hg pull https://bitbucket.org/quark-zju/hg-draft -r 
> 8402c91c250a
> worker: make sure killworkers runs at most once

> @@ -90,4 +91,5 @@ def _posixworker(ui, func, staticargs, a
>  signal.signal(signal.SIGINT, signal.SIG_IGN)
>  pids, problem = set(), [0]
> +problemcount = itertools.count()
>  def killworkers():
>  # if one worker bails, there's no good reason to wait for the rest
> @@ -114,5 +116,8 @@ def _posixworker(ui, func, staticargs, a
>  pids.remove(p)
>  st = _exitstatus(st)
> -if st and not problem[0]:
> +# next(itertools.count()) cannot be interrupted by a Python 
> signal
> +# handler - true for both CPython and PyPy. So killworkers() runs
> +# at most once.
> +if st and next(problemcount) == 0:

I'm not sure if this atomic counter buys. Since we'll remove threading and
unregister SIGCHLD handler before killworkers(), the only concern would be
whether pids are collected by waitpid() or not. If waitforworkers() is
interrupted between os.waitpid() and pids.remove(), the waited pid could be
killed again. But this patch doesn't prevent it.

  two workers; the latter fails:
  # pids = [2, 3]
  SIGCHLD-2
waitpid(2) -> p = 2, st = ok
SIGCHLD-3
  waitpid(2) -> ECHLD
  waitpid(3) -> p = 3, st = error
  pids.remove(3)
  # pids = [2]
  killworkers()
pids.remove(2)

  two workers; both fail:
  # pids = [2, 3]
  SIGCHLD-2
waitpid(2) -> p = 2, st = error
SIGCHLD-3
  waitpid(2) -> ECHLD
  waitpid(3) -> p = 3, st = error
  pids.remove(3)
  # pids = [2]
  killworkers()
pids.remove(2)
# pids = []
killworkers()

Maybe we can discard pid if waitpid() fails with ECHLD?

The series looks good. I'm going to queue these patches. If you agree this
atomic counter is somewhat redundant, I'll drop it.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 1 of 3 STABLE] util: add utility function to skip avoiding file stat ambiguity if EPERM

2016-11-15 Thread FUJIWARA Katsunori
At Mon, 14 Nov 2016 22:43:04 +0900,
Yuya Nishihara wrote:
> 
> On Sun, 13 Nov 2016 06:16:08 +0900, FUJIWARA Katsunori wrote:
> > # HG changeset patch
> > # User FUJIWARA Katsunori 
> > # Date 1478984783 -32400
> > #  Sun Nov 13 06:06:23 2016 +0900
> > # Branch stable
> > # Node ID b496a464399cb68628b09e52aa8cf379c98428e6
> > # Parent  4ed8bb8a153f91420777d98dea10ebbcd403a375
> > util: add utility function to skip avoiding file stat ambiguity if EPERM
> 
> The series looks good to me. I found a nit, but it isn't important. I'll
> push the patches tomorrow if there are no other comments.
> 
> > --- a/mercurial/util.py
> > +++ b/mercurial/util.py
> > @@ -1497,6 +1497,24 @@ class filestat(object):
> >  except AttributeError:
> >  return False
> >  
> > +def avoidambig(self, path, old):
> > +"""Change file stat of specified path to avoid ambiguity
> > +
> > +'old' should be previous filestat of 'path'.
> > +
> > +This skips avoiding ambiguity, if a process doesn't have
> > +appropriate privileges for 'path'.
> > +"""
> > +advanced = (old.stat.st_mtime + 1) & 0x7fff
> > +try:
> > +os.utime(path, (advanced, advanced))
> > +except OSError as inst:
> > +if inst.errno == errno.EPERM:
> > +# utime() on the file created by another user causes EPERM,
> > +# if a process doesn't have appropriate privileges
> > +return
> > +raise
> 
> avoidambig() has no relation to self, which requires path and old filestat as
> arguments. So I don't think it should be a method of filestat class.

Oh, this useless "self" is a vestige of my wavering, sorry.

In fact, avoidambig() was implemented as a combination of examining
ambiguity (= using self.stat) and advancing st_mtime for convenience,
at first.

  before:
if newstat.isambig(oldstat):
advanced = (oldstat.stat.st_mtime + 1) & 0x7fff
os.utime(dstpath, (advanced, advanced))

  after:
newstat.avoidambig(oldstat, dstpath)

But finally, this patch separates advancing st_mtime from examining
ambiguity for review-ability of fixing urgent bug on stable
branch. With this simplified changes, reviewer can focus only on
centralizing code path around os.utime() into avoidambig().

Then, which fixing below would you like on stable ?

  1. define avoidambig() as a regular function in util.py

  2. mark avoidambig() by @static annotation

 this can centralize logic related to ExactCacheValidationPlan
 into filestat class

  3. integrate advancing st_mtime and examining ambiguity into
 avoidambig() for convenience

 this requires "self" for avoidambig()

--
[FUJIWARA Katsunori] fo...@lares.dti.ne.jp
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH 2 of 2 V4] osutil: implement setprocname to set process title for some platforms

2016-11-15 Thread Yuya Nishihara
On Mon, 14 Nov 2016 16:35:42 +, Jun Wu wrote:
> # HG changeset patch
> # User Jun Wu 
> # Date 1478898677 0
> #  Fri Nov 11 21:11:17 2016 +
> # Node ID 695fa432ce250fdc40d5871a0009907262fe7949
> # Parent  98761d64eaaf67f3bdb99f3f80a57910e2624b78
> # Available At https://bitbucket.org/quark-zju/hg-draft
> #  hg pull https://bitbucket.org/quark-zju/hg-draft -r 
> 695fa432ce25
> osutil: implement setprocname to set process title for some platforms

Queued, thanks.

> +#ifndef SETPROCNAME_USE_NONE
> +static PyObject *setprocname(PyObject *self, PyObject *args)
> +{
> + const char *name = NULL;
> + if (!PyArg_ParseTuple(args, "s", ))
> + return NULL;
> +
> +#if defined(SETPROCNAME_USE_SETPROCTITLE)
> + setproctitle("%s", name);
> +#elif defined(SETPROCNAME_USE_ARGVREWRITE)
> + {
> + static char *argvstart = NULL;
> + static size_t argvsize = 0;
> + if (argvstart == NULL) {
> + int argc = 0, i;
> + char **argv = NULL;
> + extern void Py_GetArgcArgv(int *argc, char ***argv);
> + Py_GetArgcArgv(, );
> +
> + /* Check the memory we can use. Typically, argv[i] and
> +  * argv[i + 1] are continuous. */
> + char *argvend = argvstart = argv[0];

Moved the declaration of argvend next to argv.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel


Re: [PATCH RESEND] revert: do not reverse hunks in interactive when REV is not parent (issue5096)

2016-11-15 Thread Denis Laxalde

Pierre-Yves David a écrit :

We can extract two main points in the section you reply to and I'm not
sure which one you address in your reply. I'm going to re-emit them in a
simplified/modified form to clarify

First important point
=

In the following case:
* `hg revert -i`
* `hg revert -i -r ancestor`
* `hg revert -i -r precursor`
* `hg revert -i -r background` #combination of the above,

Seeing the hunk the way they are currently is exactly what I want to
see. These hunks are shown that way by all other commands (diff, commit,
amend, shelve) and not seeing the same thing in revert is super
confusing to me.


There's an essential different with these "other commands"
(commit/shelve) and revert: the former operate on the repository history
while the latter operates on the working directory. In this respect,
when interactively applying changes to the repository (commit-like
commands, former case), one gets prompted with a diff as it would apply
to the current revision. This patch essentially proposes to apply the
same logic to the revert command (i.e. prompt the user with a change
that would be applied to the working directory) and to prompt the user
with an intelligible action (avoid double negation).

For me, as a user, REV being in the "background" or not is not
particularly relevant. I sometimes use -r REV with REV being on some
"unrelated" branch. More often do I use -r REV with REV being a
descendant changeset (for instance to extract a refactoring that I
committed together with a functional change -- hg up REV^; hg rev -i -r
REV; then commit/rebase). I can't figure out how your reasoning would
apply in this case, but I'm certain that being prompted with changes as
they would apply to the working directory won't be confusing for me.


 (typical, (use hg diff, see something to drop; use hg
revert, drop the same thing at the same spot).


That one (which corresponds to `hg revert -i [-r .]`) already works that
way, only it shows a "discard this hunk" prompt.



An excellent example for such case is `hg revert -i -r .~1`, associate
with `hg pdif `(hg diff -r .~1) and amend I'm a bit user of it.

In that case, I'm not "worried it will be confusing" I know from
experience that I'm greatly confused by it. This was one of the
motivation of dcc56e10c23b;

The fact "hg revert -i" display the right thing in these situation is
partly validated by the fact this patch for not change the direction for
the first item in my list (plain `hg revert -i`)


Without -r REV, `hg revert -i` is basically a "discard uncommitted
changes" command. So ones gets prompted with these uncommitted changes
and whether to discard them or not. Sure the diff is reversed with
respected to other cases, but the fundamental idea is that the
combination of "diff direction" and "operation" (discard or apply) is
consistent.


The second import point
===

Having different direction given invocation is confusing.

Case one (this exact patch):


We have different direction between

* hg revert -i

and

* hg revert -i -r any


Again, if we consider the working directory as the reference to compare
to, the direction is not reversed in either case. We'll always be
prompted to accept changes (be they a "discard" or an "apply" operation)
from either the parent (first case, diff hunks as outputted by `hg diff
-r .`) or this "any" revision (second case, diff hunks as outputted by
`hg diff -r . -r any`) to the working directory.



I expect some confusion out of that and as pointed in the previous
section I think the behavior for '--rev background' is the wrong one.

Case two (keep current state for case point in the first section)
-

We have different direction between

* `hg revert -i`
* `hg revert -i -r ancestor`
* `hg revert -i -r precursor`
* `hg revert -i -r background` #combination of the above,

and

* `hg revert -i -r other`


Unless I missed something, directions should be the same for all cases
where -r is specified (i.e. `hg diff -r . -r REV`).



I'm still strongly inclined to take this patch, because you're the
only one opposed.


This is a backward compatibility breaking change, confusing some users,
and with currently no way to restore the previous experience. I greatly
advice we move with caution here, especially since we have a softer way
to more forward on the topic.


Which "softer way"? We could not drop the experimental option to let the
users you refer to have the direction "reversed", I guess.
___
Mercurial-devel mailing list
Mercurial-devel@mercurial-scm.org
https://www.mercurial-scm.org/mailman/listinfo/mercurial-devel