Re: [PATCH v2 6/8] git-remote-testpy: hash bytes explicitly
John Keeping writes:

> On Thu, Jan 17, 2013 at 02:24:37PM -0800, Junio C Hamano wrote:
>> John Keeping writes:
>>
>>> You're right - I think we need to add ", errors='replace'" to the call
>>> to encode.
>>
>> Or if it is used just as an opaque token, you can .encode('hex') or
>> something to punt on the whole issue, no?
>
> Even better. Are you happy to squash that in (assuming nothing else
> comes up) or shall I resend?

If you go the .encode('hex') route, the log message needs to explain
why the hashed values are now different from the old implementation
and justify why it is safe to do so. I do not think I want to do
that myself ;-).

Thanks.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
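To illustrate why the log message would need to mention the change: hashing the hex-encoded form of a path feeds different bytes to the digest, so the resulting value differs from the one produced by hashing the path bytes directly. The sketch below is only an illustration under assumptions: the path is hypothetical, SHA-1 is assumed because the thread mentions a 40-character hexadecimal string, and binascii.hexlify stands in for the Python 2 .encode('hex') idiom.

```python
import binascii
import hashlib

path = "/tmp/repo"          # hypothetical repository path
raw = path.encode("utf-8")  # the bytes the patch under review would hash

# Digest of the path bytes themselves.
old_style = hashlib.sha1(raw).hexdigest()

# Digest of the hex-encoded path: same information, different input bytes.
hex_style = hashlib.sha1(binascii.hexlify(raw)).hexdigest()

print(old_style)
print(hex_style)
```

Both values are valid opaque tokens; they are simply not interchangeable, which matters only if something relies on the old value staying the same.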
Re: [PATCH v2 6/8] git-remote-testpy: hash bytes explicitly
On Thu, Jan 17, 2013 at 02:24:37PM -0800, Junio C Hamano wrote:
> John Keeping writes:
>
>> You're right - I think we need to add ", errors='replace'" to the call
>> to encode.
>
> Or if it is used just as an opaque token, you can .encode('hex') or
> something to punt on the whole issue, no?

Even better. Are you happy to squash that in (assuming nothing else
comes up) or shall I resend?
Re: [PATCH v2 6/8] git-remote-testpy: hash bytes explicitly
John Keeping writes:

> You're right - I think we need to add ", errors='replace'" to the call
> to encode.

Or if it is used just as an opaque token, you can .encode('hex') or
something to punt on the whole issue, no?
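A note on the .encode('hex') suggestion: that idiom works on Python 2 byte strings, but on Python 3 the 'hex' codec is not a text encoding and str.encode('hex') raises LookupError. A rough Python 3 equivalent, sketched here with a hypothetical path, is binascii.hexlify, which itself needs bytes as input, so the text path still has to be encoded first.

```python
import binascii

path = "/tmp/repo"  # hypothetical repository path (a text string)

# The Python 2 idiom from the thread is rejected for text on Python 3.
try:
    path.encode("hex")
    hex_codec_ok = True
except LookupError:
    hex_codec_ok = False

# hexlify() takes bytes and returns bytes made of ASCII hex digits.
token = binascii.hexlify(path.encode("utf-8"))
print(hex_codec_ok, token)
```

Because the result consists only of ASCII hex digits, it can be passed to a hash function without any further encoding questions.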
Re: [PATCH v2 6/8] git-remote-testpy: hash bytes explicitly
On Thu, Jan 17, 2013 at 09:00:48PM +, John Keeping wrote:
> On Thu, Jan 17, 2013 at 12:36:33PM -0800, Junio C Hamano wrote:
>> John Keeping writes:
>>
>>> Under Python 3 'hasher.update(...)' must take a byte string and not a
>>> unicode string. Explicitly encode the argument to this method as UTF-8
>>> so that this code works under Python 3.
>>>
>>> This moves the required Python version forward to 2.0.
>>>
>>> Signed-off-by: John Keeping
>>> ---
>>
>> Hmph. So what happens when the path is _not_ encoded in UTF-8?
>
> Do you mean encodable? As you say below it will currently throw an
> exception.

Now my brain's not working - we shouldn't get an error converting from a
Unicode string to UTF-8, so I think this patch is OK as it is.

>> Is the repo.hash (and local.hash that gets a copy of it) something
>> that needs to stay the same across multiple invocations of this
>> remote helper, and between the currently shipped Git and the version
>> of Git after applying this patch?
>
> It's used to specify the path of the repository for importing or
> exporting, so it should stay consistent across invocations. However,
> this is only an example remote helper so I don't think we should worry
> if it changes from one Git release to the next.
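One caveat to "we shouldn't get an error converting from a Unicode string to UTF-8", offered as a sketch rather than a claim about this helper: under Python 3, file names and command-line arguments that are not valid in the file system encoding are decoded with the surrogateescape error handler, which leaves lone surrogates in the text string, and those cannot be encoded as strict UTF-8. Whether the helper ever receives such a path is an assumption; the file name below is hypothetical.

```python
raw_name = b"repo-\xff"  # hypothetical file name that is not valid UTF-8

# This mirrors how Python 3 turns undecodable bytes into text.
as_text = raw_name.decode("utf-8", "surrogateescape")

# Strict UTF-8 encoding refuses the lone surrogate.
try:
    as_text.encode("utf-8")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

# Encoding with the same handler restores the original bytes exactly.
round_trip = as_text.encode("utf-8", "surrogateescape")
print(strict_ok, round_trip == raw_name)
```

For ordinary text paths the strict encode succeeds, so the statement in the message holds for the common case.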
Re: [PATCH v2 6/8] git-remote-testpy: hash bytes explicitly
On Thu, Jan 17, 2013 at 12:36:33PM -0800, Junio C Hamano wrote:
> John Keeping writes:
>
>> Under Python 3 'hasher.update(...)' must take a byte string and not a
>> unicode string. Explicitly encode the argument to this method as UTF-8
>> so that this code works under Python 3.
>>
>> This moves the required Python version forward to 2.0.
>>
>> Signed-off-by: John Keeping
>> ---
>
> Hmph. So what happens when the path is _not_ encoded in UTF-8?

Do you mean encodable? As you say below it will currently throw an
exception.

> Is the repo.hash (and local.hash that gets a copy of it) something
> that needs to stay the same across multiple invocations of this
> remote helper, and between the currently shipped Git and the version
> of Git after applying this patch?

It's used to specify the path of the repository for importing or
exporting, so it should stay consistent across invocations. However,
this is only an example remote helper so I don't think we should worry
if it changes from one Git release to the next.

> If that is not the case, and if
> this is used only to get a randomly-looking 40-byte hexadecimal
> string, then a lossy attempt to .encode('utf-8') and falling back to
> replace or ignore bytes in the original that couldn't be interpreted
> as part of a UTF-8 string would be OK, but doesn't .encode('utf-8')
> throw an exception if not told to 'ignore' or something?

You're right - I think we need to add ", errors='replace'" to the call
to encode.
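A sketch of what errors='replace' would trade away, under stated assumptions: the encode becomes lossy, so two distinct paths that differ only in unencodable characters map to the same bytes and therefore to the same hash. The helper name lossy_hash and both paths are hypothetical, and SHA-1 is assumed from the 40-character hexadecimal string mentioned above.

```python
import hashlib

def lossy_hash(path):
    """Hash a text path, replacing unencodable characters with '?'."""
    data = path.encode("utf-8", errors="replace")
    return hashlib.sha1(data).hexdigest()

# Two different paths containing lone surrogates, which strict UTF-8 rejects.
path_a = "repo-\udc80"
path_b = "repo-\udc81"

# Both encode to b"repo-?" and so share a digest.
collides = lossy_hash(path_a) == lossy_hash(path_b)
print(collides)
```

This is harmless if the hash is only an opaque, random-looking token, and a problem if distinct repositories must never share one.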
Re: [PATCH v2 6/8] git-remote-testpy: hash bytes explicitly
Junio C Hamano writes:

> John Keeping writes:
>
>> Under Python 3 'hasher.update(...)' must take a byte string and not a
>> unicode string. Explicitly encode the argument to this method as UTF-8
>> so that this code works under Python 3.
>>
>> This moves the required Python version forward to 2.0.
>>
>> Signed-off-by: John Keeping
>> ---
>
> Hmph. So what happens when the path is _not_ encoded in UTF-8?

Oh, my brain was not working. Forget this part, and sorry for the
noise. We are not decoding a bytestring to an array of unicode
characters, but going the other way around here.
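The distinction drawn here can be sketched in a few lines of Python 3. Encoding goes from text to bytes and succeeds for ordinary text, while decoding goes from bytes to text and is the direction that fails on input that is not valid UTF-8. The sample strings are made up for illustration.

```python
text = "caf\u00e9"              # a text (unicode) string
encoded = text.encode("utf-8")  # text -> bytes: succeeds

raw = b"caf\xe9"                # Latin-1 bytes, not valid UTF-8
try:
    raw.decode("utf-8")         # bytes -> text: this direction can fail
    decode_failed = False
except UnicodeDecodeError:
    decode_failed = True

print(encoded, decode_failed)
```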
Re: [PATCH v2 6/8] git-remote-testpy: hash bytes explicitly
John Keeping writes:

> Under Python 3 'hasher.update(...)' must take a byte string and not a
> unicode string. Explicitly encode the argument to this method as UTF-8
> so that this code works under Python 3.
>
> This moves the required Python version forward to 2.0.
>
> Signed-off-by: John Keeping
> ---

Hmph. So what happens when the path is _not_ encoded in UTF-8?

Is the repo.hash (and local.hash that gets a copy of it) something
that needs to stay the same across multiple invocations of this
remote helper, and between the currently shipped Git and the version
of Git after applying this patch?

If that is not the case, and if this is used only to get a
randomly-looking 40-byte hexadecimal string, then a lossy attempt to
.encode('utf-8') and falling back to replace or ignore bytes in the
original that couldn't be interpreted as part of a UTF-8 string
would be OK, but doesn't .encode('utf-8') throw an exception if not
told to 'ignore' or something?
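For context on the quoted commit message, here is a minimal Python 3 sketch of the behaviour the patch addresses: hashlib's update() raises TypeError when given a text string and accepts the UTF-8 encoded bytes. The function names and the path are hypothetical, and hashlib.sha1 stands in for the helper's _digest() on the assumption that it produces the 40-character hexadecimal string mentioned above.

```python
import hashlib

def text_is_rejected(path):
    """Return True if update() refuses a text string (Python 3 behaviour)."""
    try:
        hashlib.sha1().update(path)
    except TypeError:
        return True
    return False

def hash_path(path):
    """Hash a text path as UTF-8 bytes, as the patch does."""
    hasher = hashlib.sha1()  # assumption: stands in for _digest()
    hasher.update(path.encode("utf-8"))
    return hasher.hexdigest()

path = "/tmp/repo-\u00e9"  # hypothetical repository path
print(text_is_rejected(path), hash_path(path))
```

The same path always yields the same digest, so the value stays stable across invocations as long as the encoding step does not change.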
[PATCH v2 6/8] git-remote-testpy: hash bytes explicitly
Under Python 3 'hasher.update(...)' must take a byte string and not a
unicode string. Explicitly encode the argument to this method as UTF-8
so that this code works under Python 3.

This moves the required Python version forward to 2.0.

Signed-off-by: John Keeping
---
 git-remote-testpy.py | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/git-remote-testpy.py b/git-remote-testpy.py
index d94a66a..f8dc196 100644
--- a/git-remote-testpy.py
+++ b/git-remote-testpy.py
@@ -31,9 +31,9 @@ from git_remote_helpers.git.exporter import GitExporter
 from git_remote_helpers.git.importer import GitImporter
 from git_remote_helpers.git.non_local import NonLocalGit
 
-if sys.hexversion < 0x01050200:
-    # os.makedirs() is the limiter
-    sys.stderr.write("git-remote-testgit: requires Python 1.5.2 or later.\n")
+if sys.hexversion < 0x0200:
+    # string.encode() is the limiter
+    sys.stderr.write("git-remote-testgit: requires Python 2.0 or later.\n")
     sys.exit(1)
 
 def get_repo(alias, url):
@@ -45,7 +45,7 @@ def get_repo(alias, url):
     repo.get_head()
 
     hasher = _digest()
-    hasher.update(repo.path)
+    hasher.update(repo.path.encode('utf-8'))
     repo.hash = hasher.hexdigest()
 
     repo.get_base_path = lambda base: os.path.join(
-- 
1.8.1.1.260.g99b33f4.dirty
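A note on the version check in the first hunk, offered as a reading aid rather than as part of the patch: sys.hexversion packs the major, minor, micro, release level and serial fields into a single 32-bit integer, so Python 2.0 corresponds to 0x02000000. The literal 0x0200 is 512, which is smaller than the hexversion of any interpreter, so the check as posted would never fire. The sketch below shows the packing; pack_hexversion is a hypothetical helper written for this illustration.

```python
import sys

def pack_hexversion(major, minor, micro=0, level=0x0, serial=0):
    """Pack version fields the way sys.hexversion lays them out."""
    return (major << 24) | (minor << 16) | (micro << 8) | (level << 4) | serial

PY_2_0 = pack_hexversion(2, 0)       # lower bound for any 2.0 release
PY_1_5_2 = pack_hexversion(1, 5, 2)  # the constant the patch removes

# A meaningful comparison against the packed 2.0 value.
too_old = sys.hexversion < PY_2_0

# 0x0200 sits below every real hexversion, so this is always False.
never_fires = sys.hexversion < 0x0200

print(hex(PY_2_0), hex(PY_1_5_2), too_old, never_fires)
```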