Re: [PATCH 3/3] CodingGuidelines: describe naming rules for configuration variables
Hi, On 2015-02-01 23:34, Junio C Hamano wrote: Jeff King p...@peff.net writes: 1. I'm a user who has set my preferred core.whitespace in my ~/.gitconfig. A particular project I am working on uses an alternate tabwidth. How do I set that in the repo config without repeating my defaults? Isn't it cumulative? At least it should be (but I wouldn't be too surprised if the recent config reader caching broke it). Please note that it is really, really difficult for a regular Git user to figure out which whitespace settings are in effect using just `git config` whether it is cumulative or not. Also, please note it makes it hard on users that there are a bunch of config settings which *override* previous settings, and others that are *cumulative*. The latter we cannot change, and the former we cannot change for the whitespace settings. However, when introducing new settings (such as the fsck severity levels), we should go out of our way to keep things as simple as possible. For example, if a *simple* `git config` can answer the question Is feature XYZ turned on, I would consider the design superior to a design that requires additional code just to figure out whether feature XYZ is turned on, let alone to turn it on without affecting other settings. In other words, I would like to caution against recommending the whitespace setting as an example to follow: anybody who is unfamiliar with the whitespace setting will find the cumulative last-setting-wins (per item inside the whitespace-separated list, not per config setting) strategy confusing. On the other hand, if you offer `whitespace.tabWidth` and `whitespace.indentWithNoTab` separately, everything is really crystal clear to new users without having much explaining to do, let alone having to introduce new tooling. Ciao, Dscho P.S.: Of course I know that it is too late for `whitespace`, but it offers itself as a good subject to demonstrate the merits of the different suggestions. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] CodingGuidelines: describe naming rules for configuration variables
Jeff King p...@peff.net writes: I didn't reply to the latter part of this message yesterday, because I wanted to think more on it. But is it such a bad thing to have them in conflict? Can't we define a set of rules that does what people expects? For example, by the last one wins principle, any time we see whitespace.tab-in-indent, it clears the setting of whitespace.indent-with-non-tab, and vice versa. This isn't represented syntactically in the config file, ... I agree. Both one-variable-per-knob and value-with-list-of-knobs do not express the semantic linkage between knobs; once we convince the users that one-variable-per-knob format does not mean they represent independent and orthgonal settings, the issue becomes a trade-off between * Is it concise to let end users skim through? * Is it easy to parse by scripters? By the way, this is the exact case I am concerned about for fsck.*. Our use case at GitHub would be something like: a. We set up sane defaults for fsck.* in /etc/gitconfig b. User complains that we will not accept their push, which contains objects with malformed committers. c. Support investigates, determines that the malformed objects are part of a well-established history, and that they are OK to enter. d. We relax fsck.committerIdent in that repo's $GIT_DIR/config file. Copy-and-pasting the rest of the rules from (a) into the repo config file in step (d) is not ideal. It probably can be worked around by the later-one-wins rule per item, e.g. after seeing fsck.ignore = A B C in /etc/gitconfig and then seeing fsck.error = B in $GIT_DIR/config, the latter will flip the three-way radio button for B from ignore to error (the other possible setting of the radio button is 'warn'), while leaving the three-way radio buttons for A and C set to ignore. We can (and have to) do the same with fsck.B = ignore in /etc/gitconfig that gets overridden by fsck.B = error in $GIT_DIR/config, and that comes _free_, which makes it an attractive proposition. As I already said, I am fine with fsck.missingTagger = ignore. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] CodingGuidelines: describe naming rules for configuration variables
Michael Haggerty mhag...@alum.mit.edu writes: You make an interesting point: values that start as a list of independent booleans can grow dependencies over time. In retrospect, ISTM that a better interface for the indentation-related whitespace settings would have been something like * whitespace.indent -- a single value chosen from tabs-only | spaces-only | tabs-when-possible | anything * whitespace.tabwidth -- an integer value This would have made the mutual-exclusivity of those choices manifest in the style of configuration rather than hoping that the user notices that his settings contradict each other. Let's dig into this example some more by imagining some other hypothetical future extensions. Let's not; that line of thought entirely misses the point. If you start from one set of variables, you can define a structure (e.g. there are indentation-related and you must choose only one among them) over it after the fact. Once you have chosen a structure, you have to live with it. Either you make sure that a structure itself is extensible, or you make sure you accept a new variable only if it fits within a structure. Either way, you lose. You cannot predict the future, and you do not want to constrain those who will contribute to the project in the future. My aversion to one-variable-per-knob was primarily against the because that is how the variables are internally represented; a collection of enums that can be independently set argument. If we assume that one-variable-per-knob style implies variables that can be independently set, that _is_ defining the structure the future work has to live within. But as I and Peff discussed in the other sub(sub)thread, having two variables placed flatly in the namespace, e.g. ws.indentWithTab and ws.noTabInIndent, does not have to mean they are independent. And the opposite is also true; having these two knobs as possible elements of the value of a single variable does not imply they always have meaningful interactions. So I am fine with fsck.missingTagger = ignore/warn/error, as long as the argument that supports the style is not because fsck.missingTagger and fsck.malformedIdent are independent. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] CodingGuidelines: describe naming rules for configuration variables
On Sun, Feb 01, 2015 at 06:12:38AM +0100, Michael Haggerty wrote: + When choosing the variable namespace, do not use variable name for + specifying possibly unbounded set of things, most notably anything + an end user can freely come up with (e.g. branch names), but also + large fixed set defined by the system that can grow over time + (e.g. what kind of common whitespace problems to notice). I think we can all agree with this rule for anything an end user can freely come up with. Such sets are truly unbounded. But what is the justification for applying it to large fixed set defined by the system that can grow over time? Any set of items that needs to be programmed one by one is not unbounded in the same sense. It is true that it can grow over time, but there is a practical limit on how many such options we would ever implement, and at any given time the set has a well-defined, finite number of members. I had the same reaction on reading this. We should be striving to break config options down as much as possible to single scalar values, because that is the only format that is understood systematically by the config code. If a config option's value is a list, then we have to come up with an ad-hoc syntax for the list, which we parse in the config callback. And that leaves users of git config to reinvent that parsing themselves when they want to do simple things like remove item B from the list. I think the examples you gave over on the fsck thread all make the same point. + [...] Use + subsection names or variable values, like existing variables + branch.name.description and core.whitespace do, instead. But there is also a precedent for the opposite approach: advice.*. The pager.*, color.* (and color.$program.*) examples come to mind. For example, we did not add: [core] usePagerFor = log, diff, -status but instead: [pager] log = true diff = true status = false Not only is the latter easier to manipulate and examine with the existing config tools, I think it is more flexible in the long run. We later extended the syntax to allow: [pager] log = diff-highlight | less which would have been even more awkward in the userPagerFor format (you could use log=..., of course, but now you need to get into whitespace quoting and other complexities, all of which are handled already by the config code in the latter case). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] CodingGuidelines: describe naming rules for configuration variables
Michael Haggerty mhag...@alum.mit.edu writes: On 01/28/2015 11:33 PM, Junio C Hamano wrote: + When choosing the variable namespace, do not use variable name for + specifying possibly unbounded set of things, most notably anything + an end user can freely come up with (e.g. branch names), but also + large fixed set defined by the system that can grow over time + (e.g. what kind of common whitespace problems to notice). I think we can all agree with this rule for anything an end user can freely come up with. Such sets are truly unbounded. But what is the justification for applying it to large fixed set defined by the system that can grow over time? Any set of items that needs to be programmed one by one is not unbounded in the same sense. It is true that it can grow over time, but there is a practical limit on how many such options we would ever implement, and at any given time the set has a well-defined, finite number of members. I suppose that this is a reaction to Johannes's proposal [1] to add configuration settings like git config fsck.badDate ignore Not really. This started after I looked at the add.ignore-errors issue discussed and I was trying to codify existing practices by running greps, blames and logs on 'master' to see what potential pitfalls are, and what good existing practices to follow suit we already had. The fsck changes were not in my mind at all. I can see it argued that for things that are completely independent (e.g. the consequence of setting fsck.badDate can never be affected by how fsck.missingTagger is configured), separate configuration variables may not be wrong per-se, but I can see that a set of knobs that would have been originally independent, as the operation grow richer, gain more semantics to make them related, and at that point, I doubt they are internally independent; expose them as independent to the end users argument would hold water that well. A good example is core.whitespace, where you would have started with a simple set of booleans (blank-at-eol and space-before-tab are conceptually independent and will stay to be), but once you start supporting indent-with-non-tab and tab-in-indent you cannot pretend that they are independent. And that is the existing practice I primarily had in mind. We didn't do whitespace.tabInIndent = true whitespace.indentWithNonTab = true to pretend they are independent and still internally having to make sure the consistency of the setting. We structured the syntax for ease of the end user (not scripter) to shorter core.whitespace = tab-in-indent,indent-with-non-tab as we need the consistency thing either way (and it is easier to see the consistency mistakes when they appear next to each other). And I am happy that we chose wisely in an early version that didn't use one-variable-per-knob but used list-of-features-as-value instead. [2] which you didn't like [3] but I did [4]. (Did you miss [4], in which I think I made some good arguments for Johannes's proposal? I don't think you responded to it.) I saw it, found it as an argument between not good and weak (see above), kept in my mailbox while trying to decide if it was worth responding, and then forgot about it after I got busy dealing with other topics that have more relevance to the upcoming release ;-) I think it would be more productive to finish the concrete discussion about the fsck proposal,... Surely. (0) fsck.bad-date is out. (1) fsck.badDate is OK. I expect there is very high chance for them to stay independent forever. (2) fsck.ignore = bad-date,... is still my preference, only if it aligns with existing core.whitespace precedence that allows users to leverage the familiarity. I see Peff cites pager.cmd, but I think it was something that we would rather shouldn't have done, similar to alias.cmd. They are bad precedents we shouldn't encourage new things to mimic. But that is not from one-variable-with-list-is-better (it is not better for these independent ones) but is purely from the syntax point of view. There is no good reason to force case insensitivity to the command names used as the key in these cases, but we do because we made them the last-level variable names, and we have to avoid hyphens in the command names while at it if we want to be consistent (and the consistency was the point of the original add.ignore-errors discussion that started all this, after all). If they were pager.diff.enable = true alias.co.command = checkout the world would have been a lot better place. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] CodingGuidelines: describe naming rules for configuration variables
On Sun, Feb 01, 2015 at 12:18:34PM -0800, Junio C Hamano wrote: I can see it argued that for things that are completely independent (e.g. the consequence of setting fsck.badDate can never be affected by how fsck.missingTagger is configured), separate configuration variables may not be wrong per-se, but I can see that a set of knobs that would have been originally independent, as the operation grow richer, gain more semantics to make them related, and at that point, I doubt they are internally independent; expose them as independent to the end users argument would hold water that well. I'm not sure I buy this argument. There are two syntaxes being discussed here. Let's look at each. Let's call this type 1: [core] foo = bar -moof xyzzy=whatever We have a single variable, but the value of that variable is essentially a list of sub-variables. The sub-variables are either booleans (like bar or -moof) or possibly have values of their own (with an =). The sub-variables are tied together logically by being part of a single core.foo. Presumably the user can infer some relationship between them through this. Type 2 is more like: [foo] bar = true moof = false xyzzy = whatever So we still have our same set of sub-variables, except now each is a first-class config variable. They are tied together logically by being part of a single section (I called it foo here, which drops the core; but clearly we could give it whatever descriptive name we wanted). From the user's perspective, I don't see how the implied relationships are significantly different. In type 1, they are placed inside a single value and in type 2, they are not. Both are a form of grouping. Moreover, I am not even sure that the syntax is an important element in communicating semantic relationships. In these examples, are bar and moof dependent? Clearly they are _related_ by our grouping. But does one depend on the other? Similarly, with the existing core.whitespace, what tells you which of the sub-variables are related. The blank-at-eol and space-before-tab variables are both whitespace-related, but do not depend on each other at all. But indent-with-non-tab and tab-in-indent are. Yet those two pairs are represented the same way syntactically. I don't think you can infer semantic independence from syntax. It's simply too blunt an instrument (and as you noted, it may even change over time). We structured the syntax for ease of the end user (not scripter) to shorter core.whitespace = tab-in-indent,indent-with-non-tab as we need the consistency thing either way (and it is easier to see the consistency mistakes when they appear next to each other). I agree that this provides a slight advantage to type 1, because it requires syntactically that the definitions are near each other (whereas with split variables, they can literally be in different files). And if everything else were equal, that would be enough. But I think type 1 has a lot of other disadvantages. For example: 1. I'm a user who has set my preferred core.whitespace in my ~/.gitconfig. A particular project I am working on uses an alternate tabwidth. How do I set that in the repo config without repeating my defaults? 2. I'm writing a hook whose behavior depends on the whitespace settings. How do I ask git whether blank-at-eol is enabled? 3. I'm a user who wants to set whitespace config. I prefer using git config to editing the file manually. How do I turn off blank-at-eol without disrupting my existing settings? An astute reader will notice that case (1) is a double-edged sword. If your defaults have blank-at-eol and you want to set indent-with-non-tab in the repo, that's fine. If the default is tab-in-indent and you want to set indent-with-non-tab, then those are in conflict (i.e., this is the exact thing you were complaining about above). But is it such a bad thing to have them in conflict? Can't we define a set of rules that does what people expects? For example, by the last one wins principle, any time we see whitespace.tab-in-indent, it clears the setting of whitespace.indent-with-non-tab, and vice versa. This isn't represented syntactically in the config file, but it is an easy rule, does what people would want, and can be documented in config.txt (which is where we have to talk about such semantic conflicts anyway). By the way, this is the exact case I am concerned about for fsck.*. Our use case at GitHub would be something like: a. We set up sane defaults for fsck.* in /etc/gitconfig b. User complains that we will not accept their push, which contains objects with malformed committers. c. Support investigates, determines that the malformed objects are part of a well-established history, and that they are OK to enter. d. We relax fsck.committerIdent in that repo's $GIT_DIR/config file. Copy-and-pasting the rest of the rules from (a) into the repo config file in step (d) is not ideal.
Re: [PATCH 3/3] CodingGuidelines: describe naming rules for configuration variables
Jeff King p...@peff.net writes: From the user's perspective, I don't see how the implied relationships are significantly different. In type 1, they are placed inside a single value and in type 2, they are not. Both are a form of grouping. Moreover, I am not even sure that the syntax is an important element in communicating semantic relationships. I think we are in agreement and I do not see how you can draw different conclusions. The above argument refutes the point Michael made a big deal out of, which was that these are individual and independent bools in the implementation detail of the code, expose that as such to the end users. I do not buy that point, i.e. it is not a good argument to favor style 2 over style 1, which was the primary thing I wanted to illustrate in the message you are responding to. 1. I'm a user who has set my preferred core.whitespace in my ~/.gitconfig. A particular project I am working on uses an alternate tabwidth. How do I set that in the repo config without repeating my defaults? Isn't it cumulative? At least it should be (but I wouldn't be too surprised if the recent config reader caching broke it). 2. I'm writing a hook whose behavior depends on the whitespace settings. How do I ask git whether blank-at-eol is enabled? If that becomes an issue, git config would have to learn about them, just like it knows about how to do --color depending on the tty-ness of the output. 3. I'm a user who wants to set whitespace config. I prefer using git config to editing the file manually. How do I turn off blank-at-eol without disrupting my existing settings? See above 1. I see Peff cites pager.cmd, but I think it was something that we would rather shouldn't have done, similar to alias.cmd. They are bad precedents we shouldn't encourage new things to mimic. But that is not from one-variable-with-list-is-better (it is not better for these independent ones) but is purely from the syntax point of view. Yeah, I'd agree that the problem there is orthogonal to the type 1/2 thing above. I don't think it has been a big deal in practice, just because people with good taste do not name their commands with uppercase anyway. I'd be happy to transition to pager.*.enabled, etc, if we care. I have no strong opinion. It was primarily meant to illustrate why pager.cmd and alias.cmd were bad precedents that should not be used to support design of future things. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] CodingGuidelines: describe naming rules for configuration variables
Junio, Thanks for your thoughtful response. On 02/01/2015 09:18 PM, Junio C Hamano wrote: Michael Haggerty mhag...@alum.mit.edu writes: On 01/28/2015 11:33 PM, Junio C Hamano wrote: + When choosing the variable namespace, do not use variable name for + specifying possibly unbounded set of things, most notably anything + an end user can freely come up with (e.g. branch names), but also + large fixed set defined by the system that can grow over time + (e.g. what kind of common whitespace problems to notice). I think we can all agree with this rule for anything an end user can freely come up with. Such sets are truly unbounded. But what is the justification for applying it to large fixed set defined by the system that can grow over time? [...] I can see it argued that for things that are completely independent (e.g. the consequence of setting fsck.badDate can never be affected by how fsck.missingTagger is configured), separate configuration variables may not be wrong per-se, but I can see that a set of knobs that would have been originally independent, as the operation grow richer, gain more semantics to make them related, and at that point, I doubt they are internally independent; expose them as independent to the end users argument would hold water that well. A good example is core.whitespace, where you would have started with a simple set of booleans (blank-at-eol and space-before-tab are conceptually independent and will stay to be), but once you start supporting indent-with-non-tab and tab-in-indent you cannot pretend that they are independent. And that is the existing practice I primarily had in mind. We didn't do whitespace.tabInIndent = true whitespace.indentWithNonTab = true to pretend they are independent and still internally having to make sure the consistency of the setting. We structured the syntax for ease of the end user (not scripter) to shorter core.whitespace = tab-in-indent,indent-with-non-tab as we need the consistency thing either way (and it is easier to see the consistency mistakes when they appear next to each other). And I am happy that we chose wisely in an early version that didn't use one-variable-per-knob but used list-of-features-as-value instead. You make an interesting point: values that start as a list of independent booleans can grow dependencies over time. In retrospect, ISTM that a better interface for the indentation-related whitespace settings would have been something like * whitespace.indent -- a single value chosen from tabs-only | spaces-only | tabs-when-possible | anything * whitespace.tabwidth -- an integer value This would have made the mutual-exclusivity of those choices manifest in the style of configuration rather than hoping that the user notices that his settings contradict each other. Let's dig into this example some more by imagining some other hypothetical future extensions. Suppose we wanted to support different whitespace rules for different types of files [1]. For example, I might never want to forbid cr-at-eol everywhere, and might usually like to uses spaces for my indentation, but for Makefiles need to indent using tabs. The type 2 syntax, I think, is pretty straightforward: [whitespace] cr-at-eol = error indent = spaces-only [whitespace makefile] indent = tabs-only Our usual rules, last setting wins and foo.*.bar, if present, takes precedence overrides foo.bar, make it pretty clear how the above should be interpreted. What would be the type 1 syntax for this? Would cr-at-eol be inherited from core.whitespace to core.makefile.whitespace? If not, then I have to repeat cr-at-eol: [core] whitespace = cr-at-eol tab-in-indent [core makefile] whitespace = cr-at-eol indent-with-non-tab [2]. If values are inherited, then do I also have to countermand tab-in-indent in the makefile rule? [core] whitespace = cr-at-eol tab-in-indent [core makefile] whitespace = indent-with-non-tab -tab-in-indent Or does indent-with-non-tab automatically supersede tab-in-indent, according to last-setting-wins (an interpretation that starts requiring the config parser to have domain-specific knowledge)?: [core] whitespace = cr-at-eol tab-in-indent [core makefile] whitespace = indent-with-non-tab But if that is the case, which setting wins in this scenario?: [core] whitespace = cr-at-eol tab-in-indent [core makefile] whitespace = indent-with-non-tab # In another config file maybe: [core] whitespace = space-before-tab Does core.whitespace = space-before-tab supersede core.makefile.whitespace = indent-with-non-tab here? No matter which of the type 1 variants we choose, we would have to invent new rules for users and config parsers to learn, and some of those rules would require
Re: [PATCH 3/3] CodingGuidelines: describe naming rules for configuration variables
On 01/28/2015 11:33 PM, Junio C Hamano wrote: We may want to say something about command line option names in the new section as well, but for now, let's make sure everybody is clear on how to structure and name their configuration variables. The text for the rules are partly taken from the log message of Jonathan's 6b3020a2 (add: introduce add.ignoreerrors synonym for add.ignore-errors, 2010-12-01). Signed-off-by: Junio C Hamano gits...@pobox.com --- Documentation/CodingGuidelines | 25 + 1 file changed, 25 insertions(+) diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines index 894546d..8fbac15 100644 --- a/Documentation/CodingGuidelines +++ b/Documentation/CodingGuidelines @@ -413,6 +413,31 @@ Error Messages - Say what the error is first (cannot open %s, not %s: cannot open) +Externally Visible Names + + - For configuration variable names, follow the existing convention: + + . The section name indicates the affected subsystem. + + . The subsection name, if any, indicates which of an unbounded set + of things to set the value for. + + . The variable name describes the effect of tweaking this knob. + + The section and variable names that consist of multiple words are + formed by concatenating the words without punctuations (e.g. `-`), + and are broken using bumpyCaps in documentation as a hint to the + reader. + + When choosing the variable namespace, do not use variable name for + specifying possibly unbounded set of things, most notably anything + an end user can freely come up with (e.g. branch names), but also + large fixed set defined by the system that can grow over time + (e.g. what kind of common whitespace problems to notice). I think we can all agree with this rule for anything an end user can freely come up with. Such sets are truly unbounded. But what is the justification for applying it to large fixed set defined by the system that can grow over time? Any set of items that needs to be programmed one by one is not unbounded in the same sense. It is true that it can grow over time, but there is a practical limit on how many such options we would ever implement, and at any given time the set has a well-defined, finite number of members. I suppose that this is a reaction to Johannes's proposal [1] to add configuration settings like git config fsck.badDate ignore [2] which you didn't like [3] but I did [4]. (Did you miss [4], in which I think I made some good arguments for Johannes's proposal? I don't think you responded to it.) I think it would be more productive to finish the concrete discussion about the fsck proposal, and to use the insight gained in that specific case to inform the decision about whether we need the new general rule to cover large fixed set[s] defined by the system. + [...] Use + subsection names or variable values, like existing variables + branch.name.description and core.whitespace do, instead. But there is also a precedent for the opposite approach: advice.*. [advice] pushUpdateRejected = true|false pushNonFFCurrent = true|false statusHints= true|false resolveConflict= true|false etc. In fact, I would argue that [core] whitespace = blank-at-eol -indent-with-non-tab is a bad model to follow, and would better have been expressed in a style like [whitespace] blankAtEOL = warn indentWithNonTab = ignore for the same reasons that I argued for fsck.*. The latter style would also have made it easier to add other choices like whitespace.spaceBeforeTab = error. Not that I am advocating that particular enhancement; I am just giving an example of how the current style of configuration is less flexible. [...] Michael [1] http://article.gmane.org/gmane.comp.version-control.git/261068 [2] Actually, the original proposal was for fsck.bad-date, but I think we agree that fsck.badDate would be the correct version of this proposal. [3] http://article.gmane.org/gmane.comp.version-control.git/261738 [4] http://article.gmane.org/gmane.comp.version-control.git/262841 -- Michael Haggerty mhag...@alum.mit.edu -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html