Re: [PATCH v14 09/27] bisect--helper: `bisect_write` shell function in C

2016-08-29 Thread Pranit Bauva
Hey Junio,

On Sun, Aug 28, 2016 at 2:52 AM, Junio C Hamano  wrote:
>
> Pranit Bauva  writes:
>
> >>> +struct bisect_terms {
> >>> + struct strbuf term_good;
> >>> + struct strbuf term_bad;
> >>> +};
> >>
> >> I think "struct strbuf" is overrated.  ...
> >> I think you can just say "const char *" in this case.
> >
> > Using struct strbuf is not really overrated but in fact required. But
>
> Nothing is required.
>
> I can make your life easier by requiring you to never use struct
> strbuf as a structure field type, though.  You will almost never
> need it unless you are building something exotic anyway.
>
> Step back and think what "strbuf" is good for.  It holds a pointer
> to a piece of memory the field owns, over-allocates and knows the
> size of allocation and usage.  That is good if you need to
>
>  (1) frequently find out the length of the string; without a
>  separate .len member you would have to run strlen().
>
>  (2) incrementally add to the string in-place; as it overallocates,
>  appending to the string would not have to involve realloc()
>  every time and the cost of it is amortized.
>
>  (3) make complex operations like splicing another string in,
>  trimming substring out, etc.
>
> You need to do none of the above to these fields.  term.term_good is
> either taken from an argv[] element, or you read from a line from a
> file and set it.  You may do some trivial computation and set it to
> the result, like "the other field uses 'old', so this one need to be
> set to 'new'".  The user of the field either has the string and sets
> it there, or reads the field's value as a whole string.  No string
> manipulation in the field in-place is needed.
>
> > yes, for this patch it might seem as overrated. In the shell code
> > initally TERM_GOOD is set to "good" while TERM_BAD is set to "bad".
> > Now there are a lot of instances (one of which is bisect_start()
> > function) where this can change. So if we keep it as "const char *",
> > it would be right to change the value of it after wards. And we cannot
> > keep it as "char []" because we don't know its size before hand.
>
> You are not making sense.  Nobody would suggest to use a fixed-size
> char array in this structure.  That wouldn't even work in the case
> where you are stuffing what comes in an argv[] element in there,
> e.g.
>
> terms.term_good = argv[3];
>
> And you can of course support changing the value of the field
> without using strbuf.  Just update the pointer to point at the piece
> of memory that holds the new value.
>
> In short, I do not see any good reason why the term_good field has
> to be anything other than "char *term_good" or "const char *term_good".
>
> Now, what you need to consider choosing between the two depends on
> where these strings can come from.  If they are known to be always
> unchanging between the time you set it til the end of the program
> (e.g. using an element in argv[]), you can just assign without
> making any copy and you can use "const char *term_good".  All other
> cases, the structure needs to take the ownership, and you would need
> to make a copy if you don't have the ownership, e.g.
>
> terms.term_good = xstrdup(argv[3]);
>
> You may be reading from a file, a line at a time and you may have a
> line's content in a strbuf.  You do not (yet) own the buffer after
> reading it, e.g.
>
> strbuf_getline(&buf, fp);
> terms.term_good = strbuf_detach(&buf, NULL);
>
> Of course, if you need to take ownership of the memory, you would
> need to free(3) it as needed, which means the pattern to set the
> field would become
>
> free(terms.term_good);
> terms.term_good = ... some new value ...;
>
> Using strbuf as a local variable is good.  It gives a higher level
> of abstraction when you are actually performing string operations.
> In most applications, however, a field in a struct is where the
> result of a step of computation is kept, not a scratch-pad to
> perform steps of computation in.  When you are ready to update the
> value of a field, you _have_ a completed string, and you can just
> use "char *" field to point at it.  There is no need for strbuf in
> the field.
>
> Don't look at the data structure used in trailer.[ch] as a model; it
> is an example of a terribly bad implementation taste, a pattern that
> should not be followed.  Print it, not read it and burn it as a good
> symbolic gesture.

Thanks for explaining when to use strbuf. I am convinced that the
thing I am aiming for can be done with the help of "const char *".
Though I will have to use strbuf in get_terms() and detach the string
buffer from there as you have mentioned previously.

Regards,
Pranit Bauva


Re: [PATCH v14 09/27] bisect--helper: `bisect_write` shell function in C

2016-08-27 Thread Junio C Hamano
Pranit Bauva  writes:

>>> +struct bisect_terms {
>>> + struct strbuf term_good;
>>> + struct strbuf term_bad;
>>> +};
>>
>> I think "struct strbuf" is overrated.  ...
>> I think you can just say "const char *" in this case.
>
> Using struct strbuf is not really overrated but in fact required. But

Nothing is required.

I can make your life easier by requiring you to never use struct
strbuf as a structure field type, though.  You will almost never
need it unless you are building something exotic anyway.

Step back and think what "strbuf" is good for.  It holds a pointer
to a piece of memory the field owns, over-allocates and knows the
size of allocation and usage.  That is good if you need to

 (1) frequently find out the length of the string; without a
 separate .len member you would have to run strlen().

 (2) incrementally add to the string in-place; as it overallocates,
 appending to the string would not have to involve realloc()
 every time and the cost of it is amortized.

 (3) make complex operations like splicing another string in,
 trimming substring out, etc.

You need to do none of the above to these fields.  term.term_good is
either taken from an argv[] element, or you read from a line from a
file and set it.  You may do some trivial computation and set it to
the result, like "the other field uses 'old', so this one need to be
set to 'new'".  The user of the field either has the string and sets
it there, or reads the field's value as a whole string.  No string
manipulation in the field in-place is needed.

> yes, for this patch it might seem as overrated. In the shell code
> initally TERM_GOOD is set to "good" while TERM_BAD is set to "bad".
> Now there are a lot of instances (one of which is bisect_start()
> function) where this can change. So if we keep it as "const char *",
> it would be right to change the value of it after wards. And we cannot
> keep it as "char []" because we don't know its size before hand.

You are not making sense.  Nobody would suggest to use a fixed-size
char array in this structure.  That wouldn't even work in the case
where you are stuffing what comes in an argv[] element in there,
e.g.

terms.term_good = argv[3];

And you can of course support changing the value of the field
without using strbuf.  Just update the pointer to point at the piece
of memory that holds the new value.

In short, I do not see any good reason why the term_good field has
to be anything other than "char *term_good" or "const char *term_good".

Now, what you need to consider choosing between the two depends on
where these strings can come from.  If they are known to be always
unchanging between the time you set it til the end of the program
(e.g. using an element in argv[]), you can just assign without
making any copy and you can use "const char *term_good".  All other
cases, the structure needs to take the ownership, and you would need
to make a copy if you don't have the ownership, e.g.

terms.term_good = xstrdup(argv[3]);

You may be reading from a file, a line at a time and you may have a
line's content in a strbuf.  You do not (yet) own the buffer after
reading it, e.g.

strbuf_getline(&buf, fp);
terms.term_good = strbuf_detach(&buf, NULL);

Of course, if you need to take ownership of the memory, you would
need to free(3) it as needed, which means the pattern to set the
field would become

free(terms.term_good);
terms.term_good = ... some new value ...;

Using strbuf as a local variable is good.  It gives a higher level
of abstraction when you are actually performing string operations.
In most applications, however, a field in a struct is where the
result of a step of computation is kept, not a scratch-pad to
perform steps of computation in.  When you are ready to update the
value of a field, you _have_ a completed string, and you can just
use "char *" field to point at it.  There is no need for strbuf in
the field.

Don't look at the data structure used in trailer.[ch] as a model; it
is an example of a terribly bad implementation taste, a pattern that
should not be followed.  Print it, not read it and burn it as a good
symbolic gesture.


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v14 09/27] bisect--helper: `bisect_write` shell function in C

2016-08-27 Thread Pranit Bauva
Hey Junio,

On Thu, Aug 25, 2016 at 4:00 AM, Junio C Hamano  wrote:
> Pranit Bauva  writes:
>
>> +struct bisect_terms {
>> + struct strbuf term_good;
>> + struct strbuf term_bad;
>> +};
>
> I think "struct strbuf" is overrated.  For things like this, where
> these fields will never change once it is set (and setting it is
> done atomically, not incrementally), there is no good reason to use
> define the fields as strbuf.
>
> Only because you chose to use strbuf for these two fields, you have
> to make unnecessarily copies of argv[] in the command parser, and
> you have to remember to discard these copies later.
>
> I think you can just say "const char *" in this case.

Using struct strbuf is not really overrated but in fact required. But
yes, for this patch it might seem as overrated. In the shell code
initally TERM_GOOD is set to "good" while TERM_BAD is set to "bad".
Now there are a lot of instances (one of which is bisect_start()
function) where this can change. So if we keep it as "const char *",
it would be right to change the value of it after wards. And we cannot
keep it as "char []" because we don't know its size before hand.

>> +static int bisect_write(const char *state, const char *rev,
>> + const struct bisect_terms *terms, int nolog)
>> +{
>> + struct strbuf tag = STRBUF_INIT;
>> + struct strbuf commit_name = STRBUF_INIT;
>> + struct object_id oid;
>> + struct commit *commit;
>> + struct pretty_print_context pp = {0};
>> + FILE *fp;
>> +
>> + if (!strcmp(state, terms->term_bad.buf))
>> + strbuf_addf(&tag, "refs/bisect/%s", state);
>> + else if (one_of(state, terms->term_good.buf, "skip", NULL))
>> + strbuf_addf(&tag, "refs/bisect/%s-%s", state, rev);
>> + else
>> + return error(_("Bad bisect_write argument: %s"), state);
>
> OK.
>
>> + if (get_oid(rev, &oid)) {
>> + strbuf_release(&tag);
>> + return error(_("couldn't get the oid of the rev '%s'"), rev);
>> + }
>> +
>> + if (update_ref(NULL, tag.buf, oid.hash, NULL, 0,
>> +UPDATE_REFS_MSG_ON_ERR)) {
>> + strbuf_release(&tag);
>> + return -1;
>> + }
>> + strbuf_release(&tag);
>> +
>> + fp = fopen(git_path_bisect_log(), "a");
>> + if (!fp)
>> + return error_errno(_("couldn't open the file '%s'"), 
>> git_path_bisect_log());
>> +
>> + commit = lookup_commit_reference(oid.hash);
>> + format_commit_message(commit, "%s", &commit_name, &pp);
>> + fprintf(fp, "# %s: [%s] %s\n", state, sha1_to_hex(oid.hash),
>> + commit_name.buf);
>> + strbuf_release(&commit_name);
>> +
>> + if (!nolog)
>> + fprintf(fp, "git bisect %s %s\n", state, rev);
>> +
>> + fclose(fp);
>> + return 0;
>
> You seem to be _release()ing tag all over the place.
>
> Would it make sense to have a single clean-up label at the end of
> function, introduce a "int retval" variable and set it to -1 (or
> whatever) when an error is detected and "goto" to the label?  It may
> make it harder to make such a leak.  That is, to end the function
> more like:

I think I could use goto for this function.

> finish:
> if (fp)
> fclose(fp);
> strbuf_release(&tag);
> strbuf_release(&commit_name);
> return retval;
> }
>
> and have sites with potential errors do something like this:
>
> if (update_ref(...)) {
> retval = -1;
> goto finish;
> }
>
>> + struct bisect_terms terms;
>> + bisect_terms_init(&terms);
>
> With the type of "struct bisect_terms" members corrected, you do not
> even need the _init() function.

Discussed above.

>> @@ -182,24 +251,38 @@ int cmd_bisect__helper(int argc, const char **argv, 
>> const char *prefix)
>>   usage_with_options(git_bisect_helper_usage, options);
>>
>>   switch (cmdmode) {
>> + int nolog;
>>   case NEXT_ALL:
>>   return bisect_next_all(prefix, no_checkout);
>>   case WRITE_TERMS:
>>   if (argc != 2)
>>   die(_("--write-terms requires two arguments"));
>> - return write_terms(argv[0], argv[1]);
>> + res = write_terms(argv[0], argv[1]);
>> + break;
>>   case BISECT_CLEAN_STATE:
>>   if (argc != 0)
>>   die(_("--bisect-clean-state requires no arguments"));
>> - return bisect_clean_state();
>> + res = bisect_clean_state();
>> + break;
>>   case BISECT_RESET:
>>   if (argc > 1)
>>   die(_("--bisect-reset requires either zero or one 
>> arguments"));
>> - return bisect_reset(argc ? argv[0] : NULL);
>> + res = bisect_reset(argc ? argv[0] : NULL);
>> + break;
>>   case CHECK_EXPECTED_REVS:
>> - re

Re: [PATCH v14 09/27] bisect--helper: `bisect_write` shell function in C

2016-08-24 Thread Junio C Hamano
Pranit Bauva  writes:

> +struct bisect_terms {
> + struct strbuf term_good;
> + struct strbuf term_bad;
> +};

I think "struct strbuf" is overrated.  For things like this, where
these fields will never change once it is set (and setting it is
done atomically, not incrementally), there is no good reason to use
define the fields as strbuf.

Only because you chose to use strbuf for these two fields, you have
to make unnecessarily copies of argv[] in the command parser, and
you have to remember to discard these copies later.

I think you can just say "const char *" in this case.  

> +static int bisect_write(const char *state, const char *rev,
> + const struct bisect_terms *terms, int nolog)
> +{
> + struct strbuf tag = STRBUF_INIT;
> + struct strbuf commit_name = STRBUF_INIT;
> + struct object_id oid;
> + struct commit *commit;
> + struct pretty_print_context pp = {0};
> + FILE *fp;
> +
> + if (!strcmp(state, terms->term_bad.buf))
> + strbuf_addf(&tag, "refs/bisect/%s", state);
> + else if (one_of(state, terms->term_good.buf, "skip", NULL))
> + strbuf_addf(&tag, "refs/bisect/%s-%s", state, rev);
> + else
> + return error(_("Bad bisect_write argument: %s"), state);

OK.

> + if (get_oid(rev, &oid)) {
> + strbuf_release(&tag);
> + return error(_("couldn't get the oid of the rev '%s'"), rev);
> + }
> +
> + if (update_ref(NULL, tag.buf, oid.hash, NULL, 0,
> +UPDATE_REFS_MSG_ON_ERR)) {
> + strbuf_release(&tag);
> + return -1;
> + }
> + strbuf_release(&tag);
> +
> + fp = fopen(git_path_bisect_log(), "a");
> + if (!fp)
> + return error_errno(_("couldn't open the file '%s'"), 
> git_path_bisect_log());
> +
> + commit = lookup_commit_reference(oid.hash);
> + format_commit_message(commit, "%s", &commit_name, &pp);
> + fprintf(fp, "# %s: [%s] %s\n", state, sha1_to_hex(oid.hash),
> + commit_name.buf);
> + strbuf_release(&commit_name);
> +
> + if (!nolog)
> + fprintf(fp, "git bisect %s %s\n", state, rev);
> +
> + fclose(fp);
> + return 0;

You seem to be _release()ing tag all over the place.

Would it make sense to have a single clean-up label at the end of
function, introduce a "int retval" variable and set it to -1 (or
whatever) when an error is detected and "goto" to the label?  It may
make it harder to make such a leak.  That is, to end the function
more like:

finish:
if (fp)
fclose(fp);
strbuf_release(&tag);
strbuf_release(&commit_name);
return retval;
}

and have sites with potential errors do something like this:

if (update_ref(...)) {
retval = -1;
goto finish;
}

> + struct bisect_terms terms;
> + bisect_terms_init(&terms);

With the type of "struct bisect_terms" members corrected, you do not
even need the _init() function.

> @@ -182,24 +251,38 @@ int cmd_bisect__helper(int argc, const char **argv, 
> const char *prefix)
>   usage_with_options(git_bisect_helper_usage, options);
>  
>   switch (cmdmode) {
> + int nolog;
>   case NEXT_ALL:
>   return bisect_next_all(prefix, no_checkout);
>   case WRITE_TERMS:
>   if (argc != 2)
>   die(_("--write-terms requires two arguments"));
> - return write_terms(argv[0], argv[1]);
> + res = write_terms(argv[0], argv[1]);
> + break;
>   case BISECT_CLEAN_STATE:
>   if (argc != 0)
>   die(_("--bisect-clean-state requires no arguments"));
> - return bisect_clean_state();
> + res = bisect_clean_state();
> + break;
>   case BISECT_RESET:
>   if (argc > 1)
>   die(_("--bisect-reset requires either zero or one 
> arguments"));
> - return bisect_reset(argc ? argv[0] : NULL);
> + res = bisect_reset(argc ? argv[0] : NULL);
> + break;
>   case CHECK_EXPECTED_REVS:
> - return check_expected_revs(argv, argc);
> + res = check_expected_revs(argv, argc);
> + break;
> + case BISECT_WRITE:
> + if (argc != 4 && argc != 5)
> + die(_("--bisect-write requires either 4 or 5 
> arguments"));
> + nolog = (argc == 5) && !strcmp(argv[4], "nolog");
> + strbuf_addstr(&terms.term_good, argv[2]);
> + strbuf_addstr(&terms.term_bad, argv[3]);

Here,

terms.term_good = argv[2];
terms.term_bad = argv[3];

and then you do not need bisect_terms_release() at all.

> + res = bisect_write(argv[0], argv[1], &terms, nolog);
> + break;
>   default:
>   die("BUG: unknown subcommand '%d'", cmd