Re: [PATCH v14 09/27] bisect--helper: `bisect_write` shell function in C
Hey Junio, On Sun, Aug 28, 2016 at 2:52 AM, Junio C Hamano wrote: > > Pranit Bauva writes: > > >>> +struct bisect_terms { > >>> + struct strbuf term_good; > >>> + struct strbuf term_bad; > >>> +}; > >> > >> I think "struct strbuf" is overrated. ... > >> I think you can just say "const char *" in this case. > > > > Using struct strbuf is not really overrated but in fact required. But > > Nothing is required. > > I can make your life easier by requiring you to never use struct > strbuf as a structure field type, though. You will almost never > need it unless you are building something exotic anyway. > > Step back and think what "strbuf" is good for. It holds a pointer > to a piece of memory the field owns, over-allocates and knows the > size of allocation and usage. That is good if you need to > > (1) frequently find out the length of the string; without a > separate .len member you would have to run strlen(). > > (2) incrementally add to the string in-place; as it overallocates, > appending to the string would not have to involve realloc() > every time and the cost of it is amortized. > > (3) make complex operations like splicing another string in, > trimming substring out, etc. > > You need to do none of the above to these fields. term.term_good is > either taken from an argv[] element, or you read from a line from a > file and set it. You may do some trivial computation and set it to > the result, like "the other field uses 'old', so this one need to be > set to 'new'". The user of the field either has the string and sets > it there, or reads the field's value as a whole string. No string > manipulation in the field in-place is needed. > > > yes, for this patch it might seem as overrated. In the shell code > > initally TERM_GOOD is set to "good" while TERM_BAD is set to "bad". > > Now there are a lot of instances (one of which is bisect_start() > > function) where this can change. So if we keep it as "const char *", > > it would be right to change the value of it after wards. And we cannot > > keep it as "char []" because we don't know its size before hand. > > You are not making sense. Nobody would suggest to use a fixed-size > char array in this structure. That wouldn't even work in the case > where you are stuffing what comes in an argv[] element in there, > e.g. > > terms.term_good = argv[3]; > > And you can of course support changing the value of the field > without using strbuf. Just update the pointer to point at the piece > of memory that holds the new value. > > In short, I do not see any good reason why the term_good field has > to be anything other than "char *term_good" or "const char *term_good". > > Now, what you need to consider choosing between the two depends on > where these strings can come from. If they are known to be always > unchanging between the time you set it til the end of the program > (e.g. using an element in argv[]), you can just assign without > making any copy and you can use "const char *term_good". All other > cases, the structure needs to take the ownership, and you would need > to make a copy if you don't have the ownership, e.g. > > terms.term_good = xstrdup(argv[3]); > > You may be reading from a file, a line at a time and you may have a > line's content in a strbuf. You do not (yet) own the buffer after > reading it, e.g. > > strbuf_getline(&buf, fp); > terms.term_good = strbuf_detach(&buf, NULL); > > Of course, if you need to take ownership of the memory, you would > need to free(3) it as needed, which means the pattern to set the > field would become > > free(terms.term_good); > terms.term_good = ... some new value ...; > > Using strbuf as a local variable is good. It gives a higher level > of abstraction when you are actually performing string operations. > In most applications, however, a field in a struct is where the > result of a step of computation is kept, not a scratch-pad to > perform steps of computation in. When you are ready to update the > value of a field, you _have_ a completed string, and you can just > use "char *" field to point at it. There is no need for strbuf in > the field. > > Don't look at the data structure used in trailer.[ch] as a model; it > is an example of a terribly bad implementation taste, a pattern that > should not be followed. Print it, not read it and burn it as a good > symbolic gesture. Thanks for explaining when to use strbuf. I am convinced that the thing I am aiming for can be done with the help of "const char *". Though I will have to use strbuf in get_terms() and detach the string buffer from there as you have mentioned previously. Regards, Pranit Bauva
Re: [PATCH v14 09/27] bisect--helper: `bisect_write` shell function in C
Pranit Bauva writes: >>> +struct bisect_terms { >>> + struct strbuf term_good; >>> + struct strbuf term_bad; >>> +}; >> >> I think "struct strbuf" is overrated. ... >> I think you can just say "const char *" in this case. > > Using struct strbuf is not really overrated but in fact required. But Nothing is required. I can make your life easier by requiring you to never use struct strbuf as a structure field type, though. You will almost never need it unless you are building something exotic anyway. Step back and think what "strbuf" is good for. It holds a pointer to a piece of memory the field owns, over-allocates and knows the size of allocation and usage. That is good if you need to (1) frequently find out the length of the string; without a separate .len member you would have to run strlen(). (2) incrementally add to the string in-place; as it overallocates, appending to the string would not have to involve realloc() every time and the cost of it is amortized. (3) make complex operations like splicing another string in, trimming substring out, etc. You need to do none of the above to these fields. term.term_good is either taken from an argv[] element, or you read from a line from a file and set it. You may do some trivial computation and set it to the result, like "the other field uses 'old', so this one need to be set to 'new'". The user of the field either has the string and sets it there, or reads the field's value as a whole string. No string manipulation in the field in-place is needed. > yes, for this patch it might seem as overrated. In the shell code > initally TERM_GOOD is set to "good" while TERM_BAD is set to "bad". > Now there are a lot of instances (one of which is bisect_start() > function) where this can change. So if we keep it as "const char *", > it would be right to change the value of it after wards. And we cannot > keep it as "char []" because we don't know its size before hand. You are not making sense. Nobody would suggest to use a fixed-size char array in this structure. That wouldn't even work in the case where you are stuffing what comes in an argv[] element in there, e.g. terms.term_good = argv[3]; And you can of course support changing the value of the field without using strbuf. Just update the pointer to point at the piece of memory that holds the new value. In short, I do not see any good reason why the term_good field has to be anything other than "char *term_good" or "const char *term_good". Now, what you need to consider choosing between the two depends on where these strings can come from. If they are known to be always unchanging between the time you set it til the end of the program (e.g. using an element in argv[]), you can just assign without making any copy and you can use "const char *term_good". All other cases, the structure needs to take the ownership, and you would need to make a copy if you don't have the ownership, e.g. terms.term_good = xstrdup(argv[3]); You may be reading from a file, a line at a time and you may have a line's content in a strbuf. You do not (yet) own the buffer after reading it, e.g. strbuf_getline(&buf, fp); terms.term_good = strbuf_detach(&buf, NULL); Of course, if you need to take ownership of the memory, you would need to free(3) it as needed, which means the pattern to set the field would become free(terms.term_good); terms.term_good = ... some new value ...; Using strbuf as a local variable is good. It gives a higher level of abstraction when you are actually performing string operations. In most applications, however, a field in a struct is where the result of a step of computation is kept, not a scratch-pad to perform steps of computation in. When you are ready to update the value of a field, you _have_ a completed string, and you can just use "char *" field to point at it. There is no need for strbuf in the field. Don't look at the data structure used in trailer.[ch] as a model; it is an example of a terribly bad implementation taste, a pattern that should not be followed. Print it, not read it and burn it as a good symbolic gesture. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v14 09/27] bisect--helper: `bisect_write` shell function in C
Hey Junio, On Thu, Aug 25, 2016 at 4:00 AM, Junio C Hamano wrote: > Pranit Bauva writes: > >> +struct bisect_terms { >> + struct strbuf term_good; >> + struct strbuf term_bad; >> +}; > > I think "struct strbuf" is overrated. For things like this, where > these fields will never change once it is set (and setting it is > done atomically, not incrementally), there is no good reason to use > define the fields as strbuf. > > Only because you chose to use strbuf for these two fields, you have > to make unnecessarily copies of argv[] in the command parser, and > you have to remember to discard these copies later. > > I think you can just say "const char *" in this case. Using struct strbuf is not really overrated but in fact required. But yes, for this patch it might seem as overrated. In the shell code initally TERM_GOOD is set to "good" while TERM_BAD is set to "bad". Now there are a lot of instances (one of which is bisect_start() function) where this can change. So if we keep it as "const char *", it would be right to change the value of it after wards. And we cannot keep it as "char []" because we don't know its size before hand. >> +static int bisect_write(const char *state, const char *rev, >> + const struct bisect_terms *terms, int nolog) >> +{ >> + struct strbuf tag = STRBUF_INIT; >> + struct strbuf commit_name = STRBUF_INIT; >> + struct object_id oid; >> + struct commit *commit; >> + struct pretty_print_context pp = {0}; >> + FILE *fp; >> + >> + if (!strcmp(state, terms->term_bad.buf)) >> + strbuf_addf(&tag, "refs/bisect/%s", state); >> + else if (one_of(state, terms->term_good.buf, "skip", NULL)) >> + strbuf_addf(&tag, "refs/bisect/%s-%s", state, rev); >> + else >> + return error(_("Bad bisect_write argument: %s"), state); > > OK. > >> + if (get_oid(rev, &oid)) { >> + strbuf_release(&tag); >> + return error(_("couldn't get the oid of the rev '%s'"), rev); >> + } >> + >> + if (update_ref(NULL, tag.buf, oid.hash, NULL, 0, >> +UPDATE_REFS_MSG_ON_ERR)) { >> + strbuf_release(&tag); >> + return -1; >> + } >> + strbuf_release(&tag); >> + >> + fp = fopen(git_path_bisect_log(), "a"); >> + if (!fp) >> + return error_errno(_("couldn't open the file '%s'"), >> git_path_bisect_log()); >> + >> + commit = lookup_commit_reference(oid.hash); >> + format_commit_message(commit, "%s", &commit_name, &pp); >> + fprintf(fp, "# %s: [%s] %s\n", state, sha1_to_hex(oid.hash), >> + commit_name.buf); >> + strbuf_release(&commit_name); >> + >> + if (!nolog) >> + fprintf(fp, "git bisect %s %s\n", state, rev); >> + >> + fclose(fp); >> + return 0; > > You seem to be _release()ing tag all over the place. > > Would it make sense to have a single clean-up label at the end of > function, introduce a "int retval" variable and set it to -1 (or > whatever) when an error is detected and "goto" to the label? It may > make it harder to make such a leak. That is, to end the function > more like: I think I could use goto for this function. > finish: > if (fp) > fclose(fp); > strbuf_release(&tag); > strbuf_release(&commit_name); > return retval; > } > > and have sites with potential errors do something like this: > > if (update_ref(...)) { > retval = -1; > goto finish; > } > >> + struct bisect_terms terms; >> + bisect_terms_init(&terms); > > With the type of "struct bisect_terms" members corrected, you do not > even need the _init() function. Discussed above. >> @@ -182,24 +251,38 @@ int cmd_bisect__helper(int argc, const char **argv, >> const char *prefix) >> usage_with_options(git_bisect_helper_usage, options); >> >> switch (cmdmode) { >> + int nolog; >> case NEXT_ALL: >> return bisect_next_all(prefix, no_checkout); >> case WRITE_TERMS: >> if (argc != 2) >> die(_("--write-terms requires two arguments")); >> - return write_terms(argv[0], argv[1]); >> + res = write_terms(argv[0], argv[1]); >> + break; >> case BISECT_CLEAN_STATE: >> if (argc != 0) >> die(_("--bisect-clean-state requires no arguments")); >> - return bisect_clean_state(); >> + res = bisect_clean_state(); >> + break; >> case BISECT_RESET: >> if (argc > 1) >> die(_("--bisect-reset requires either zero or one >> arguments")); >> - return bisect_reset(argc ? argv[0] : NULL); >> + res = bisect_reset(argc ? argv[0] : NULL); >> + break; >> case CHECK_EXPECTED_REVS: >> - re
Re: [PATCH v14 09/27] bisect--helper: `bisect_write` shell function in C
Pranit Bauva writes: > +struct bisect_terms { > + struct strbuf term_good; > + struct strbuf term_bad; > +}; I think "struct strbuf" is overrated. For things like this, where these fields will never change once it is set (and setting it is done atomically, not incrementally), there is no good reason to use define the fields as strbuf. Only because you chose to use strbuf for these two fields, you have to make unnecessarily copies of argv[] in the command parser, and you have to remember to discard these copies later. I think you can just say "const char *" in this case. > +static int bisect_write(const char *state, const char *rev, > + const struct bisect_terms *terms, int nolog) > +{ > + struct strbuf tag = STRBUF_INIT; > + struct strbuf commit_name = STRBUF_INIT; > + struct object_id oid; > + struct commit *commit; > + struct pretty_print_context pp = {0}; > + FILE *fp; > + > + if (!strcmp(state, terms->term_bad.buf)) > + strbuf_addf(&tag, "refs/bisect/%s", state); > + else if (one_of(state, terms->term_good.buf, "skip", NULL)) > + strbuf_addf(&tag, "refs/bisect/%s-%s", state, rev); > + else > + return error(_("Bad bisect_write argument: %s"), state); OK. > + if (get_oid(rev, &oid)) { > + strbuf_release(&tag); > + return error(_("couldn't get the oid of the rev '%s'"), rev); > + } > + > + if (update_ref(NULL, tag.buf, oid.hash, NULL, 0, > +UPDATE_REFS_MSG_ON_ERR)) { > + strbuf_release(&tag); > + return -1; > + } > + strbuf_release(&tag); > + > + fp = fopen(git_path_bisect_log(), "a"); > + if (!fp) > + return error_errno(_("couldn't open the file '%s'"), > git_path_bisect_log()); > + > + commit = lookup_commit_reference(oid.hash); > + format_commit_message(commit, "%s", &commit_name, &pp); > + fprintf(fp, "# %s: [%s] %s\n", state, sha1_to_hex(oid.hash), > + commit_name.buf); > + strbuf_release(&commit_name); > + > + if (!nolog) > + fprintf(fp, "git bisect %s %s\n", state, rev); > + > + fclose(fp); > + return 0; You seem to be _release()ing tag all over the place. Would it make sense to have a single clean-up label at the end of function, introduce a "int retval" variable and set it to -1 (or whatever) when an error is detected and "goto" to the label? It may make it harder to make such a leak. That is, to end the function more like: finish: if (fp) fclose(fp); strbuf_release(&tag); strbuf_release(&commit_name); return retval; } and have sites with potential errors do something like this: if (update_ref(...)) { retval = -1; goto finish; } > + struct bisect_terms terms; > + bisect_terms_init(&terms); With the type of "struct bisect_terms" members corrected, you do not even need the _init() function. > @@ -182,24 +251,38 @@ int cmd_bisect__helper(int argc, const char **argv, > const char *prefix) > usage_with_options(git_bisect_helper_usage, options); > > switch (cmdmode) { > + int nolog; > case NEXT_ALL: > return bisect_next_all(prefix, no_checkout); > case WRITE_TERMS: > if (argc != 2) > die(_("--write-terms requires two arguments")); > - return write_terms(argv[0], argv[1]); > + res = write_terms(argv[0], argv[1]); > + break; > case BISECT_CLEAN_STATE: > if (argc != 0) > die(_("--bisect-clean-state requires no arguments")); > - return bisect_clean_state(); > + res = bisect_clean_state(); > + break; > case BISECT_RESET: > if (argc > 1) > die(_("--bisect-reset requires either zero or one > arguments")); > - return bisect_reset(argc ? argv[0] : NULL); > + res = bisect_reset(argc ? argv[0] : NULL); > + break; > case CHECK_EXPECTED_REVS: > - return check_expected_revs(argv, argc); > + res = check_expected_revs(argv, argc); > + break; > + case BISECT_WRITE: > + if (argc != 4 && argc != 5) > + die(_("--bisect-write requires either 4 or 5 > arguments")); > + nolog = (argc == 5) && !strcmp(argv[4], "nolog"); > + strbuf_addstr(&terms.term_good, argv[2]); > + strbuf_addstr(&terms.term_bad, argv[3]); Here, terms.term_good = argv[2]; terms.term_bad = argv[3]; and then you do not need bisect_terms_release() at all. > + res = bisect_write(argv[0], argv[1], &terms, nolog); > + break; > default: > die("BUG: unknown subcommand '%d'", cmd