Re: Review of git multimail
Jed Brown j...@59a2.org writes: Note that RHEL5 has only python2.4 and will be supported through March, 2017. Since it is not feasible to have code that works in both python3 and any versions prior to python2.6, any chosen dialect will be broken by default on some major distributions that still have full vendor support. At worse, if git-multimail is ported to Python 3, people using old distros will still be able to use today's version which works in Python 2 and already does a good job. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Review of git multimail
On 07/03/2013 12:21 AM, Junio C Hamano wrote: Ramkumar Ramachandra artag...@gmail.com writes: def get(self, name, default=''): try: values = self._split(read_git_output( ['config', '--get', '--null', '%s.%s' % (self.section, name)], env=self.env, keepends=True, )) Wait, what is the point of using --null and then splitting by hand using a poorly-defined static method? Why not drop the --null and splitlines() as usual? You may actually have spotted a bug or misuse of --get here. With this sample configuration: $ cat sample \EOF [a] one = value one = another [b] one = value\nanother EOF A script cannot differentiate between them without using '--null'. $ git config -f sample --get-all a.one $ git config -f sample --get-all b.one But that matters only when you use --get-all, not --get. If this method wants to make sure that the user did not misuse a.one as a multi-valued configuration variable, use of --null --get-all followed by checking how many items the command gives you back would be a way to do so. No, the code in question was a simple sanity check (i.e., mostly a check of my own sanity and understanding of git config behavior) preceding the information-losing next line return values[0]. If it had been meant as a check that the user hadn't misconfigured the system, then I wouldn't have used assert but rather raised a ConfigurationException with an explanatory message. I would be happy to add the checking that you described, but I didn't have the impression that it is the usual convention. Does code that wants a single value from the config usually verify that there is one-and-only-one value, or does it typically just do the equivalent of git config --get and use the returned (effectively the last) value? Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Review of git multimail
Michael Haggerty mhag...@alum.mit.edu writes: I would be happy to add the checking that you described, but I didn't have the impression that it is the usual convention. Does code that wants a single value from the config usually verify that there is one-and-only-one value, or does it typically just do the equivalent of git config --get and use the returned (effectively the last) value? In most cases, variables are one value per key and follow the last one wins rule, which is the reason why we read from the most generic to the most specific (i.e. $GIT_DIR/config is read last). For such uses, reading from --get, and not from --get-all, is absolutely the right thing to do. But then as Ram said, there probably is not a need for --null; you can just read from textual --get to the end without any splitting (using splitlines is of course wrong if you do so). Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Review of git multimail
On Wed, Jul 03, 2013 at 10:02:34AM +0200, Michael Haggerty wrote: On 07/03/2013 12:21 AM, Junio C Hamano wrote: Ramkumar Ramachandra artag...@gmail.com writes: def get(self, name, default=''): try: values = self._split(read_git_output( ['config', '--get', '--null', '%s.%s' % (self.section, name)], env=self.env, keepends=True, )) Wait, what is the point of using --null and then splitting by hand using a poorly-defined static method? Why not drop the --null and splitlines() as usual? You may actually have spotted a bug or misuse of --get here. With this sample configuration: $ cat sample \EOF [a] one = value one = another [b] one = value\nanother EOF A script cannot differentiate between them without using '--null'. $ git config -f sample --get-all a.one $ git config -f sample --get-all b.one But that matters only when you use --get-all, not --get. If this method wants to make sure that the user did not misuse a.one as a multi-valued configuration variable, use of --null --get-all followed by checking how many items the command gives you back would be a way to do so. No, the code in question was a simple sanity check (i.e., mostly a check of my own sanity and understanding of git config behavior) preceding the information-losing next line return values[0]. If it had been meant as a check that the user hadn't misconfigured the system, then I wouldn't have used assert but rather raised a ConfigurationException with an explanatory message. I would be happy to add the checking that you described, but I didn't have the impression that it is the usual convention. Does code that wants a single value from the config usually verify that there is one-and-only-one value, or does it typically just do the equivalent of git config --get and use the returned (effectively the last) value? Doesn't git config --get return an error if there are multiple values? The answer is apparently no - I wrote the text below from git-config(1) and then checked the behaviour. This seems to be a regression in git-config (bisect running now). I think the correct answer is what's below, but it doesn't work like this in current Git: If you want a single value then I think it's normal to just read the output of git config and let it handle the error cases, without needing to split the result at all. I think there is a different issue in the except block following the code quoted at the top though - you will return default if a key happens to be multi-valued. The script should check the return code and raise a ConfigurationException if it is 2. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Review of git multimail
On Wed, Jul 03, 2013 at 09:29:02AM +0100, John Keeping wrote: Doesn't git config --get return an error if there are multiple values? The answer is apparently no - I wrote the text below from git-config(1) and then checked the behaviour. This seems to be a regression in git-config (bisect running now). Ah, that was an intentional change in commit 00b347d (git-config: do not complain about duplicate entries, 2012-10-23). So the issue is that the documentation was not updated when the behaviour was changed. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Review of git multimail
Michael Haggerty wrote: On 07/02/2013 09:23 PM, Ramkumar Ramachandra wrote: git_multimail.py wrote: #! /usr/bin/env python2 Do all distributions ship it as python2 now? No, but nor is python always Python version 2.x (I believe that Arch Linux now installs Python 3 as python). This topic was discussed here [1]. Basically, my reasoning is that (a) PEP 394 [2] says that python2 is the correct name for a Python 2.x interpreter, (b) I believe that other distros are moving in that direction, and (c) if the script says python2 but no python2 is installed, the error is pretty obvious, whereas if the script says python and that is actually Python 3.x, the error would be more cryptic. Yeah, this is good reasoning. And yes, I'm on Arch: python points to python3, and python2 points to python2. A couple of thoughts while we're on the subject: 1. We should probably convert git-remote-{hg,bzr} to use this style too: they give cryptic errors now, and I have a ~/bin/python pointing to python2 which is higher up in $PATH to work around it. Debian uses an alternatives mechanism to have multiple versions of the same package, but I personally think the system is ugly. 2. Is there a way to determine the python version in-script and error-out quickly? Is it worth the ugliness? 'Command %s failed with retcode %s' % (' '.join(cmd), retcode,) So cmd is a list. Yes, commands are always lists in my code because it is less error-prone than trying to quote arguments correctly for the shell. Do you think I should document that here, or elsewhere, or everywhere, or ...? If you look at the prototype of execvpe(), the calling semantics are immediately clear, but we don't have that luxury in Python: probably rename the variable cmd_argv? class ConfigurationException(Exception): pass Dead code? No, this defines an exception class that inherits all of its methods (including its constructors) from Exception. This is useful because an exception of type ConfigurationException is distinguishable from other types of Exceptions, and can be caught using except ConfigurationException, e. Okay. I was under the impression that you had some future extension plans. def get(self, name, default=''): try: values = self._split(read_git_output( ['config', '--get', '--null', '%s.%s' % (self.section, name)], env=self.env, keepends=True, )) Wait, what is the point of using --null and then splitting by hand using a poorly-defined static method? Why not drop the --null and splitlines() as usual? To avoid confusion if a single config value contains end-of-line characters. In this case we are using --get, so only a single value is allowed anyway, and presumably we could take the output and strip a single last '\n' from it. But null-terminated output is simply easier to handle in general and I don't see an advantage to avoiding its usage. My rationale for avoiding it comes from Python's lack of inbuilt functions to handle it; but yeah, Junio pointed out that quirk in the previous email. return default else: raise raise what? This is the Python construct to re-throw the exception that was caught in the catch block containing it; i.e., the CommandError from a few lines earlier. Ah, thanks. def get_recipients(self, name, default=None): lines = self.get_all(name, default=None) if lines is None: return default return ', '.join(line.strip() for line in lines) Ugh. ? I would expect it to return a list that can be churned on further, not a string that's ready for rendering. Doesn't it resemble the dictionary get(), and even your own get_all() in name and signature? Dead code? git_multimail is used as a library by migrate-mailhook-config, and that script uses these methods. I see. Perhaps clean separation to avoid confusing readers? def generate_summaries(*log_args): cmd = [ 'log', '--abbrev', '--format=%h %s', ] + list(log_args) + ['--'] What is log_args if not a list? It is a tuple and therefore needs to be converted to a list here. Oh, I always thought *log_args meant list; to my surprise, it's a tuple. Just rev-parse --verify --short $SHA1^0: if it resolves, set self.short; one liner? I don't follow. We need both the long and the short SHA-1s to fill in the templates. What code exactly do you propose to replace with your one-liner? Oh, you need both. I was hoping to cut the cat-file -t too, but you need another call resolving $SHA1^{object} to distinguish between commits and tags; so never mind. What is the point of letting the user instantiate a GitObject without a valid .sha1 in the first place? '0'*40 is passed to the post-receive script to indicate no object; for example, a branch deletion is represented as
Re: Review of git multimail
Ramkumar Ramachandra wrote: New-style class. I wonder why you suddenly switched. ? All of the classes are new-style classes. When you say class Foo:, aren't you declaring an old-style class by default in python2? New-style classes are those that explicitly inherit from object (implicit in python3). I just noticed that all your classes are new-style. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Review of git multimail
On 07/03/2013 12:23 PM, Ramkumar Ramachandra wrote: Michael Haggerty wrote: On 07/02/2013 09:23 PM, Ramkumar Ramachandra wrote: git_multimail.py wrote: #! /usr/bin/env python2 Do all distributions ship it as python2 now? No, but nor is python always Python version 2.x (I believe that Arch Linux now installs Python 3 as python). This topic was discussed here [1]. Basically, my reasoning is that (a) PEP 394 [2] says that python2 is the correct name for a Python 2.x interpreter, (b) I believe that other distros are moving in that direction, and (c) if the script says python2 but no python2 is installed, the error is pretty obvious, whereas if the script says python and that is actually Python 3.x, the error would be more cryptic. Yeah, this is good reasoning. And yes, I'm on Arch: python points to python3, and python2 points to python2. A couple of thoughts while we're on the subject: 1. We should probably convert git-remote-{hg,bzr} to use this style too: they give cryptic errors now, and I have a ~/bin/python pointing to python2 which is higher up in $PATH to work around it. Debian uses an alternatives mechanism to have multiple versions of the same package, but I personally think the system is ugly. 2. Is there a way to determine the python version in-script and error-out quickly? Is it worth the ugliness? The problems is that the whole script is parsed before execution starts, so using the wrong interpreter likely leads to a SyntaxError before the script even gains control. The correct solution, I'm afraid, is to have a build step that determines the correct Python shebang contents at build times and rewrites the script like the filename.perl - filename transformations that are already done for Perl. 'Command %s failed with retcode %s' % (' '.join(cmd), retcode,) So cmd is a list. Yes, commands are always lists in my code because it is less error-prone than trying to quote arguments correctly for the shell. Do you think I should document that here, or elsewhere, or everywhere, or ...? If you look at the prototype of execvpe(), the calling semantics are immediately clear, but we don't have that luxury in Python: probably rename the variable cmd_argv? Added to to-do list. [...] def get_recipients(self, name, default=None): lines = self.get_all(name, default=None) if lines is None: return default return ', '.join(line.strip() for line in lines) Ugh. ? I would expect it to return a list that can be churned on further, not a string that's ready for rendering. Doesn't it resemble the dictionary get(), and even your own get_all() in name and signature? We support both single and multiple-valued options, and each value can contain multiple comma-separated RFC 2822 email addresses: git config multimailhook.mailinglist la...@example.com git config --add multimailhook.mailinglist m...@example.com, cur...@example.com git config --add multimailhook.mailinglist 'Shemp, the other one sh...@example.com' (I think that last one is valid.) So we could turn the arguments into a list, but to be useful it would require the individual values to be parsed into possibly multiple addresses. That seemed overkill, given that we only need the result as a single string. [...] Dead code? git_multimail is used as a library by migrate-mailhook-config, and that script uses these methods. I see. Perhaps clean separation to avoid confusing readers? Think of git_multimail.py as a library that can be included, e.g. by a post-receive script, but just happens to be executable as well. Splitting it up more would prevent a one-file install, which I think would be unfortunate. [...] try: value = value % values except KeyError, e: if DEBUG: sys.stderr.write( 'Warning: unknown variable %r in the following line; line skipped:\n' '%s' % (e.args[0], line,) ) If DEBUG isn't on, you risk leaving the value string interpolated without even telling the user. What does it mean to the end user? There are some nice-to-have values in the templates that are not necessary and might be missing if the user hasn't gone to the trouble to configure every last setting. For example, if no emaildomain is defined then the pusher_email cannot be determined, resulting in the Reply-To header being omitted. My assumption is that a sysadmin would turn on DEBUG when testing the script, check that any missing headers are uninteresting, and then turn off DEBUG for production use so that users don't have to see the warnings every time they push. Ah, so that is the intended usage. If the impact of omitting certain headers (due to lack of information) doesn't result in unusable emails being generated, I think we're good.
Re: Review of git multimail
Ramkumar Ramachandra artag...@gmail.com writes: Yeah, this is good reasoning. And yes, I'm on Arch: python points to python3, and python2 points to python2. I'm also on Arch and it has been this way since October 2010 [1]. Ubuntu plans to remove python2 from the desktop CD images in 14.04 [2], so having code that does not work with python3 will become more painful pretty soon. Note that RHEL5 has only python2.4 and will be supported through March, 2017. Since it is not feasible to have code that works in both python3 and any versions prior to python2.6, any chosen dialect will be broken by default on some major distributions that still have full vendor support. A couple of thoughts while we're on the subject: 1. We should probably convert git-remote-{hg,bzr} to use this style too: Python-2.6.8 from python.org installs only python2.6 and python, but not python2, so this will break on a lot of older systems. Some distributions have been nice enough to provide python2 symlinks anyway. Michael's rationale that at least the error message is obvious still stands. Debian uses an alternatives mechanism to have multiple versions of the same package Alternatives is global configuration that doesn't really solve this problem anyway. [1] https://www.archlinux.org/news/python-is-now-python-3/ [2] https://wiki.ubuntu.com/Python/3 pgpSMrbzwufrN.pgp Description: PGP signature
Review of git multimail
Hi, I figured that we should quickly read through git-multimail and give it an on-list review. Hopefully, it'll educate the list about what this is, and help improve the script itself. Sources: https://github.com/mhagger/git-multimail git_multimail.py wrote: #! /usr/bin/env python2 Do all distributions ship it as python2 now? class CommandError(Exception): def __init__(self, cmd, retcode): self.cmd = cmd self.retcode = retcode Exception.__init__( self, 'Command %s failed with retcode %s' % (' '.join(cmd), retcode,) So cmd is a list. class ConfigurationException(Exception): pass Dead code? def read_git_output(args, input=None, keepends=False, **kw): Read the output of a Git command. return read_output( ['git', '-c', 'i18n.logoutputencoding=%s' % (ENCODING,)] + args, input=input, keepends=keepends, **kw ) Okay, although I'm wondering what i18n.logoutputencoding has to do with anything. def read_output(cmd, input=None, keepends=False, **kw): if input: stdin = subprocess.PIPE else: stdin = None p = subprocess.Popen( cmd, stdin=stdin, stdout=subprocess.PIPE, stderr=subprocess.PIPE, **kw ) (out, err) = p.communicate(input) retcode = p.wait() if retcode: raise CommandError(cmd, retcode) if not keepends: out = out.rstrip('\n\r') return out Helper function that serves a single caller, read_git_output(). def read_git_lines(args, keepends=False, **kw): Return the lines output by Git command. Return as single lines, with newlines stripped off. return read_git_output(args, keepends=True, **kw).splitlines(keepends) Okay. class Config(object): def __init__(self, section, git_config=None): Represent a section of the git configuration. If git_config is specified, it is passed to git config in the GIT_CONFIG environment variable, meaning that git config will read the specified path rather than the Git default config paths. self.section = section if git_config: self.env = os.environ.copy() self.env['GIT_CONFIG'] = git_config else: self.env = None Okay. @staticmethod def _split(s): Split NUL-terminated values. words = s.split('\0') assert words[-1] == '' return words[:-1] Ugh. Two callers of this poorly-defined static method: I wonder if we'd be better off inlining it. def get(self, name, default=''): try: values = self._split(read_git_output( ['config', '--get', '--null', '%s.%s' % (self.section, name)], env=self.env, keepends=True, )) Wait, what is the point of using --null and then splitting by hand using a poorly-defined static method? Why not drop the --null and splitlines() as usual? assert len(values) == 1 When does this assert fail? return values[0] except CommandError: return default If you're emulating the dictionary get method, default=None. This is not C, where all codepaths of the function must return the same type. def get_bool(self, name, default=None): try: value = read_git_output( ['config', '--get', '--bool', '%s.%s' % (self.section, name)], env=self.env, ) except CommandError: return default return value == 'true' Correct. On success, return bool. On failure, return None. def get_all(self, name, default=None): Read a (possibly multivalued) setting from the configuration. Return the result as a list of values, or default if the name is unset. try: return self._split(read_git_output( ['config', '--get-all', '--null', '%s.%s' % (self.section, name)], env=self.env, keepends=True, )) except CommandError, e: CommandError as e? if e.retcode == 1: What does this cryptic retcode mean? return default else: raise raise what? You've instantiated the Config class in two places: user and multimailhook sections. Considering that you're going to read all the keys in that section, why not --get-regexp, pre-load the configuration into a dictionary and refer to that instead of spawning 'git config' every time you need a configuration value? def get_recipients(self, name, default=None): Read a recipients list from the configuration. Return the result as a comma-separated list of email addresses, or default if the option is unset. If the setting has multiple values, concatenate them with comma separators. lines =
Re: Review of git multimail
On Wed, Jul 03, 2013 at 12:53:39AM +0530, Ramkumar Ramachandra wrote: class CommandError(Exception): def __init__(self, cmd, retcode): self.cmd = cmd self.retcode = retcode Exception.__init__( self, 'Command %s failed with retcode %s' % (' '.join(cmd), retcode,) So cmd is a list. class ConfigurationException(Exception): pass Dead code? Huh? ConfigurationException is used elsewhere. def read_git_output(args, input=None, keepends=False, **kw): Read the output of a Git command. return read_output( ['git', '-c', 'i18n.logoutputencoding=%s' % (ENCODING,)] + args, input=input, keepends=keepends, **kw ) Okay, although I'm wondering what i18n.logoutputencoding has to do with anything. Making sure the output is what's expected and not influenced by environment variables? def read_output(cmd, input=None, keepends=False, **kw): if input: stdin = subprocess.PIPE else: stdin = None p = subprocess.Popen( cmd, stdin=stdin, stdout=subprocess.PIPE, stderr=subprocess.PIPE, **kw ) (out, err) = p.communicate(input) retcode = p.wait() if retcode: raise CommandError(cmd, retcode) if not keepends: out = out.rstrip('\n\r') return out Helper function that serves a single caller, read_git_output(). def read_git_lines(args, keepends=False, **kw): Return the lines output by Git command. Return as single lines, with newlines stripped off. return read_git_output(args, keepends=True, **kw).splitlines(keepends) Okay. class Config(object): def __init__(self, section, git_config=None): Represent a section of the git configuration. If git_config is specified, it is passed to git config in the GIT_CONFIG environment variable, meaning that git config will read the specified path rather than the Git default config paths. self.section = section if git_config: self.env = os.environ.copy() self.env['GIT_CONFIG'] = git_config else: self.env = None Okay. @staticmethod def _split(s): Split NUL-terminated values. words = s.split('\0') assert words[-1] == '' return words[:-1] Ugh. Two callers of this poorly-defined static method: I wonder if we'd be better off inlining it. In what way poorly defined? Personally I'd make it _split_null at the top level but it seems sensible. def get(self, name, default=''): try: values = self._split(read_git_output( ['config', '--get', '--null', '%s.%s' % (self.section, name)], env=self.env, keepends=True, )) Wait, what is the point of using --null and then splitting by hand using a poorly-defined static method? Why not drop the --null and splitlines() as usual? assert len(values) == 1 When does this assert fail? In can't, which is presumably why it's an assert - it checks that we're processing the Git output as expected. return values[0] except CommandError: return default If you're emulating the dictionary get method, default=None. This is not C, where all codepaths of the function must return the same type. def get_bool(self, name, default=None): try: value = read_git_output( ['config', '--get', '--bool', '%s.%s' % (self.section, name)], env=self.env, ) except CommandError: return default return value == 'true' Correct. On success, return bool. On failure, return None. def get_all(self, name, default=None): Read a (possibly multivalued) setting from the configuration. Return the result as a list of values, or default if the name is unset. try: return self._split(read_git_output( ['config', '--get-all', '--null', '%s.%s' % (self.section, name)], env=self.env, keepends=True, )) except CommandError, e: CommandError as e? Not before Python 2.6. if e.retcode == 1: What does this cryptic retcode mean? It mirrors subprocess.CalledProcessError, retcode is the return code of the process. return default else: raise raise what? The current exception - this is pretty idiomatic Python. You've instantiated the Config class in two places: user and multimailhook sections. Considering that you're going to read all the keys in that section, why not --get-regexp, pre-load the configuration into a dictionary and refer to that instead of
Re: Review of git multimail
John Keeping wrote: I have to say that I don't think this is a particularly useful review, you seem to have skipped huge portions of the code and spent a lot of time making us read your thought process rather than providing constructive feedback. What feedback there is mostly seems to be expressions of disgust rather than actionable points. Well, ignore it then. I'm sorry for having wasted your time. I did what I could in the time I was willing to spend reading it. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Review of git multimail
Ramkumar Ramachandra artag...@gmail.com writes: def get(self, name, default=''): try: values = self._split(read_git_output( ['config', '--get', '--null', '%s.%s' % (self.section, name)], env=self.env, keepends=True, )) Wait, what is the point of using --null and then splitting by hand using a poorly-defined static method? Why not drop the --null and splitlines() as usual? You may actually have spotted a bug or misuse of --get here. With this sample configuration: $ cat sample \EOF [a] one = value one = another [b] one = value\nanother EOF A script cannot differentiate between them without using '--null'. $ git config -f sample --get-all a.one $ git config -f sample --get-all b.one But that matters only when you use --get-all, not --get. If this method wants to make sure that the user did not misuse a.one as a multi-valued configuration variable, use of --null --get-all followed by checking how many items the command gives you back would be a way to do so. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html