[PATCH] describe: rewrite name_rev() iteratively
The git describe --contains command uses the name_rev() function which is currently a recursive function. This causes a Stack Overflow when the history is large enough. Rewrite name_rev iteratively using a stack on the heap. This slightly reduces performance due to the extra operations on the heap, but the function no longer overflows the stack. Reported-by: Sylvestre Ledru sylves...@mozilla.com Signed-off-by: Dragos Foianu dragos.foi...@gmail.com --- builtin/name-rev.c | 176 ++-- 1 file changed, 128 insertions(+), 48 deletions(-) diff --git a/builtin/name-rev.c b/builtin/name-rev.c index c824d4e..5848d81 100644 --- a/builtin/name-rev.c +++ b/builtin/name-rev.c @@ -19,66 +19,146 @@ static long cutoff = LONG_MAX; /* How many generations are maximally preferred over _one_ merge traversal? */ #define MERGE_TRAVERSAL_WEIGHT 65535 +typedef struct rev_data { + struct commit *commit; + const char *tip_name; + int generation; + int distance; + int deref; +} *rev_data; + +typedef struct rev_stack { + struct rev_data *rev; + struct rev_stack *next; +} *rev_stack; + +static void stack_push(rev_stack *stack, rev_data data) { + rev_stack new_node = xmalloc(sizeof(*new_node)); + + new_node-rev = data; + new_node-next = *stack; + *stack = new_node; +} + +static void stack_push_end(rev_stack *stack, rev_data data) { + rev_stack new_node = xmalloc(sizeof(*new_node)); + + while (*stack != NULL) + stack = (*stack)-next; + new_node-rev = data; + new_node-next = *stack; + *stack = new_node; +} + +static rev_data stack_pop(rev_stack *stack) { + rev_stack next = (*stack)-next; + rev_data rev = (*stack)-rev; + free(*stack); + + *stack = next; + return rev; +} + +static rev_data make_rev_data(struct commit *commit, + const char* tip_name, int generation, int distance, + int deref) +{ + rev_data data = xmalloc(sizeof(*data)); + + data-commit = commit; + data-tip_name = tip_name; + data-generation = generation; + data-distance = distance; + data-deref = deref; + + return data; +} + static void name_rev(struct commit *commit, const char *tip_name, int generation, int distance, int deref) { - struct rev_name *name = (struct rev_name *)commit-util; - struct commit_list *parents; - int parent_number = 1; + rev_stack stack = NULL; + rev_data data, next_rev; - parse_commit(commit); + data = make_rev_data(commit, tip_name, generation, distance, deref); + stack_push(stack, data); - if (commit-date cutoff) - return; + while (stack != NULL) { + rev_data rev = stack_pop(stack); - if (deref) { - char *new_name = xmalloc(strlen(tip_name)+3); - strcpy(new_name, tip_name); - strcat(new_name, ^0); - tip_name = new_name; + struct rev_name *name = (struct rev_name *) rev-commit-util; + struct commit_list *parents; + int parent_number = 1; - if (generation) - die(generation: %d, but deref?, generation); - } + parse_commit(rev-commit); + + if (rev-commit-date cutoff) + continue; + + if (rev-deref) { + char *new_name = xmalloc(strlen(rev-tip_name) + 3); + strcpy(new_name, rev-tip_name); + strcat(new_name, ^0); + rev-tip_name = new_name; - if (name == NULL) { - name = xmalloc(sizeof(rev_name)); - commit-util = name; - goto copy_data; - } else if (name-distance distance) { + if (rev-generation) + die(generation: %d, but deref?, + rev-generation); + } + + if (name == NULL) { + name = xmalloc(sizeof(rev_name)); + rev-commit-util = name; + goto copy_data; + } else if (name-distance rev-distance) { copy_data: - name-tip_name = tip_name; - name-generation = generation; - name-distance = distance; - } else - return; - - for (parents = commit-parents; - parents; - parents = parents-next, parent_number++) { - if (parent_number 1) { - int len = strlen(tip_name); - char *new_name = xmalloc(len + - 1 + decimal_length(generation) + /* ~n */ - 1 + 2 + /* ^NN
Re: [BUG] Segfault on git describe
Jeff King peff at peff.net writes: So I think we'd be happy to see it converted to an iterative process (probably with a stack on the heap). In addition to name-rev, I believe that tag --contains will recurse down the longest history path, too (I think there may have been experimental patches for the latter, but you'd have to search the list archive). -Peff Alright. I'll look into it and hopefully have a patch by the end of the weekend. -Dragos -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [BUG] Segfault on git describe
Sylvestre Ledru sylvestre at mozilla.com writes: Hello, Trying to do some stats using the Firefox git repository (https://github.com/mozilla/gecko-dev), I found a bug on git describe. The following command will segfault: git describe --contains a9ff31aebd6dbda82a3c733a72eeeaa0b0525b96 Please note that the Firefox history is a pretty long and this commit date is 2001. I experience this issue with the git version, and Debian packages (1.9.0-1 and 2.0~next.20140214-2) As attachment, the backtrace. I removed about 87250 calls to the name_rev function. I guess that is a potential source of problem. Full is available here: http://people.mozilla.org/~sledru/bt-git-on-ff.txt (21 MB) I am available to test patches if needed. Thanks, Sylvestre PS: I am not registered, please cc me. Hello, The name_rev function recursively calls itself which is why the backtrace is so big. Unfortunately, for repos with long histories it can lead to Stack Overflows. This is pretty much what happened in your case. I tested it on my computer and I get the same results. I managed to get it working by doubling my default stacksize: ulimit -S -s 16192 Considering your project is a very large one and merely allocating a few more resources fixes the problem, I'm not sure it warrants rewriting the function to use less stack. You will have to wait for one of the maintainers to give you a definitive answer. All the best, Dragos -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv2] branch.c: simplify chain of if statements
Eric Sunshine sunshine at sunshineco.com writes: Other submissions have computed this value mathematically without need for conditionals. For instance, we've seen: index = (!!origin 0) + (!!remote_is_branch 1) + (!!rebasing 2) as, well as the equivalent: index = !!origin + !!remote_is_branch * 2 + !!rebasing * 4 Although this works, it does place greater cognitive demands on the reader by requiring more effort to figure out what is going on and how it relates to table position. The original (ungainly) chain of 'if' statements in the original code does not suffer this problem. It likewise is harder to understand than merely indexing into a multi-dimension table where each variable is a key. I have seen other submissions using this logic, but I didn't think it accomplished the goal of the patch - simplifying the code. Not that my approach does either, but I found it a little easier to understand. Indexing into a table like this is always going to have this problem so this is probably not the right approach to accomplishing the microproject's goals. It's possible to simplify this logic and have only a single printf_ln() invocation. Hint: It's safe to pass in more arguments than there are %s directives in the format string. Indeed. It's a habit of mine to pass the exact number of arguments to printf functions and I can't seem to get away from it. You can use ARRAY_SIZE() in place of sizeof(...)/sizeof(...). Since an out-of-bound index would be a programmer bug, it would probably be more appropriate to use an assert(), just after 'index' is computed, rather than if+die(). The original code used die() because it couldn't detect the error until the end of the if-chain. Thank you for this hint. Using already defined helpers in the project is better and will prevent the need to patch the constructs later on. On Tue, Mar 18, 2014 at 6:31 PM, Eric Sunshine sunshine at sunshineco.com wrote: One other observation: You have a one-off error in your out-of-bounds check. It should be 'index = sizeof...' Well this is embarrasing. Thank you again for the feedback. It's incredibily helpful and I learned a lot from submitting these patches. Making the code simple is harder than it appears at first sight. I'm not sure it's worth pursuing the table approach further, especially since a solution has already been accepted and merged into the codebase. In this case, is it okay to try another microproject? I was thinking about trying #17 (finding bugs/inefficiencies in builtin/apply.c), but I've already had my one microproject. All the best, Dragos -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] diff: optimise parse_dirstat_params() to only compare strings when necessary
parse_dirstat_params() goes through a chain of if statements using strcmp to parse parameters. When the parameter is a digit, the value must go through all comparisons before the function realises it is a digit. Optimise this logic by only going through the chain of string compares when the parameter is not a digit. Signed-off-by: Dragos Foianu dragos.foi...@gmail.com --- diff.c | 37 +++-- 1 file changed, 19 insertions(+), 18 deletions(-) diff --git a/diff.c b/diff.c index e343191..733764e 100644 --- a/diff.c +++ b/diff.c @@ -84,20 +84,25 @@ static int parse_dirstat_params(struct diff_options *options, const char *params string_list_split_in_place(params, params_copy, ',', -1); for (i = 0; i params.nr; i++) { const char *p = params.items[i].string; - if (!strcmp(p, changes)) { - DIFF_OPT_CLR(options, DIRSTAT_BY_LINE); - DIFF_OPT_CLR(options, DIRSTAT_BY_FILE); - } else if (!strcmp(p, lines)) { - DIFF_OPT_SET(options, DIRSTAT_BY_LINE); - DIFF_OPT_CLR(options, DIRSTAT_BY_FILE); - } else if (!strcmp(p, files)) { - DIFF_OPT_CLR(options, DIRSTAT_BY_LINE); - DIFF_OPT_SET(options, DIRSTAT_BY_FILE); - } else if (!strcmp(p, noncumulative)) { - DIFF_OPT_CLR(options, DIRSTAT_CUMULATIVE); - } else if (!strcmp(p, cumulative)) { - DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE); - } else if (isdigit(*p)) { + if (!isdigit(*p)) { + if (!strcmp(p, changes)) { + DIFF_OPT_CLR(options, DIRSTAT_BY_LINE); + DIFF_OPT_CLR(options, DIRSTAT_BY_FILE); + } else if (!strcmp(p, lines)) { + DIFF_OPT_SET(options, DIRSTAT_BY_LINE); + DIFF_OPT_CLR(options, DIRSTAT_BY_FILE); + } else if (!strcmp(p, files)) { + DIFF_OPT_CLR(options, DIRSTAT_BY_LINE); + DIFF_OPT_SET(options, DIRSTAT_BY_FILE); + } else if (!strcmp(p, noncumulative)) { + DIFF_OPT_CLR(options, DIRSTAT_CUMULATIVE); + } else if (!strcmp(p, cumulative)) { + DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE); + } else { + strbuf_addf(errmsg, _( Unknown dirstat parameter '%s'\n), p); + ret++; + } + } else { char *end; int permille = strtoul(p, end, 10) * 10; if (*end == '.' isdigit(*++end)) { @@ -114,11 +119,7 @@ static int parse_dirstat_params(struct diff_options *options, const char *params p); ret++; } - } else { - strbuf_addf(errmsg, _( Unknown dirstat parameter '%s'\n), p); - ret++; } - } string_list_clear(params, 0); free(params_copy); -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] diff: optimise parse_dirstat_params() to only compare strings when necessary
I will send another version of this patch after review because there is an extra whitespace following the else statement. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] branch.c: simplify chain of if statements
Eric Sunshine sunshine at sunshineco.com writes: In fact, this change is not table-driven (emphasis on *driven*). It merely moves the strings into a table, but all the logic is still in the code. To be table-driven, the logic would be encoded in the table as well, and that logic would *drive* the code. This is not to say that the code must be table-driven. The GSoC microproject merely asks if doing so would make sense. Whether it would is for you to decide, and then to explain to reviewers the reason(s) why you did or did not make it so. The if-chain is just sufficiently complex that the print messages may actually help the reader understand the logic of the code, so this argument seems specious. I understand now after reading your review. I will try to fix this in a future attempt. Matthieu already mentioned [2] that this sort of lego string construction is not internationalization-friendly. See section 4.3 [3] of the gettext manual for details. I was hoping to get away with using less memory by having only four entries in the table. I suppose that is not possible. The rebasing check can still be moved outside of the four if statements and calculate the index correctly. The strings would then have to be arranged in such a way to make this work. Using a multiple-dimension array as suggested in other submissions for this particular microproject would probably be better, but it has already been done. These hard-coded indexing constants (0, 1, 2, 3) are fragile and convey little meaning to the reader. Try to consider how to compute the index into verbose_prints[] based upon the values of 'remote_is_branch' and 'origin'. What are the different ways you could do so? I was going to do something like this: if !remote_is_branch the index goes incremented by 2, because the first two entries are of no interest and if !origin, the index is incremented by 1. This would correctly compute the index. It should also work with the rebasing check if the four rebasing-specific messages are at the end of the table and when rebasing the index is set to start at those messages. The reason I did not go with this is because I would still need the four ifs in order to keep the bug check part of the code. I might be able to find a work-around for it on the second attempt. I have seen N_() used in other code but I wasn't sure what its purpose was. Thank you very much for the review. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: push fail
shawn wilson ag4ve.us at gmail.com writes: How do I get more info here (and hopefully resolve this)? % git push To ssh://server/foo/repo.git ! [rejected]test - test (non-fast-forward) error: failed to push some refs to 'ssh://server/foo/repo.git' non-fast-forward means that someone else pushed to branch test before you did and your push would end up overwriting their changes. Make sure you merge your local branch with the remote branch: git pull origin test It might also be a result of local destructive changes made by git rebase. If you're absolutely certain you will not mess up the remote branch you can add the --force parameter when you push. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCHv2] branch.c: simplify chain of if statements
This patch uses a table to store the different messages that can be emitted by the verbose install_branch_config function. It computes an index based on the three flags and prints the message located at the specific index in the table of messages. If the index somehow is not within the table, we have a bug. Signed-off-by: Dragos Foianu dragos.foi...@gmail.com --- branch.c | 44 +--- 1 file changed, 25 insertions(+), 19 deletions(-) diff --git a/branch.c b/branch.c index 723a36b..95645d5 100644 --- a/branch.c +++ b/branch.c @@ -54,6 +54,18 @@ void install_branch_config(int flag, const char *local, const char *origin, cons struct strbuf key = STRBUF_INIT; int rebasing = should_setup_rebase(origin); + const char *messages[] = { + N_(Branch %s set up to track local ref %s.), + N_(Branch %s set up to track remote ref %s.), + N_(Branch %s set up to track local branch %s.), + N_(Branch %s set up to track remote branch %s from %s.), + N_(Branch %s set up to track local ref %s by rebasing.), + N_(Branch %s set up to track remote ref %s by rebasing.), + N_(Branch %s set up to track local branch %s by rebasing.), + N_(Branch %s set up to track remote branch %s from %s by rebasing.) + }; + int index = 0; + if (remote_is_branch !strcmp(local, shortname) !origin) { @@ -76,28 +88,22 @@ void install_branch_config(int flag, const char *local, const char *origin, cons } strbuf_release(key); + if (origin) + index += 1; + if (remote_is_branch) + index += 2; + if (rebasing) + index += 4; + if (flag BRANCH_CONFIG_VERBOSE) { if (remote_is_branch origin) - printf_ln(rebasing ? - _(Branch %s set up to track remote branch %s from %s by rebasing.) : - _(Branch %s set up to track remote branch %s from %s.), - local, shortname, origin); - else if (remote_is_branch !origin) - printf_ln(rebasing ? - _(Branch %s set up to track local branch %s by rebasing.) : - _(Branch %s set up to track local branch %s.), - local, shortname); - else if (!remote_is_branch origin) - printf_ln(rebasing ? - _(Branch %s set up to track remote ref %s by rebasing.) : - _(Branch %s set up to track remote ref %s.), - local, remote); - else if (!remote_is_branch !origin) - printf_ln(rebasing ? - _(Branch %s set up to track local ref %s by rebasing.) : - _(Branch %s set up to track local ref %s.), - local, remote); + printf_ln(_(messages[index]), + local, shortname, origin); else + printf_ln(_(messages[index]), + local, (!remote_is_branch) ? remote : shortname); + + if (index 0 || index sizeof(messages) / sizeof(*messages)) die(BUG: impossible combination of %d and %p, remote_is_branch, origin); } -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] branch.c: simplify chain of if statements
This patch uses a table-driven approach in order to make the code cleaner. Although not necessary, it helps code reability by not forcing the user to read the print message when trying to understand what the code does. The rebase check has been moved to the verbose if statement to avoid making the same check in each of the four if statements. Signed-off-by: Dragos Foianu dragos.foi...@gmail.com --- branch.c | 32 1 file changed, 16 insertions(+), 16 deletions(-) diff --git a/branch.c b/branch.c index 723a36b..e2fe455 100644 --- a/branch.c +++ b/branch.c @@ -54,6 +54,14 @@ void install_branch_config(int flag, const char *local, const char *origin, cons struct strbuf key = STRBUF_INIT; int rebasing = should_setup_rebase(origin); + const char *verbose_prints[4] = { + Branch %s set up to track remote branch %s from %s%s, + Branch %s set up to track local branch %s%s, + Branch %s set up to track remote ref %s%s, + Branch %s set up to track local ref %s%s + }; + char *verbose_rebasing = rebasing ? by rebasing. : .; + if (remote_is_branch !strcmp(local, shortname) !origin) { @@ -78,25 +86,17 @@ void install_branch_config(int flag, const char *local, const char *origin, cons if (flag BRANCH_CONFIG_VERBOSE) { if (remote_is_branch origin) - printf_ln(rebasing ? - _(Branch %s set up to track remote branch %s from %s by rebasing.) : - _(Branch %s set up to track remote branch %s from %s.), - local, shortname, origin); + printf_ln(_(verbose_prints[0]), + local, shortname, origin, verbose_rebasing); else if (remote_is_branch !origin) - printf_ln(rebasing ? - _(Branch %s set up to track local branch %s by rebasing.) : - _(Branch %s set up to track local branch %s.), - local, shortname); + printf_ln(_(verbose_prints[1]), + local, shortname, verbose_rebasing); else if (!remote_is_branch origin) - printf_ln(rebasing ? - _(Branch %s set up to track remote ref %s by rebasing.) : - _(Branch %s set up to track remote ref %s.), - local, remote); + printf_ln(_(verbose_prints[2]), + local, remote, verbose_rebasing); else if (!remote_is_branch !origin) - printf_ln(rebasing ? - _(Branch %s set up to track local ref %s by rebasing.) : - _(Branch %s set up to track local ref %s.), - local, remote); + printf_ln(_(verbose_prints[3]), + local, remote, verbose_rebasing); else die(BUG: impossible combination of %d and %p, remote_is_branch, origin); -- 1.8.3.2 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[GSoC 2014] git config API improvements
Hello, My name is Dragos Foianu and I am an undergraduate student at University Politehnica of Bucharest in Romania. This is my final year and I'm planning on doing something more exciting than the simple assignments I get from the university. I have been working with git for quite some time now and I'm currently using it for my diploma project. It was annoying at first but now I love it and it has helped me many times in the past. I wanted to contribute to the project and I feel that GSoC 2014 is the perfect opportunity for this. I am primarily interested in the git config API improvements project. I have glanced over the code in order to get an idea of how git-config currently works. The project idea page gives a very good description of what is desired. Caching and retrieving values by name sounds to me like a hint to use a hashtable-like data structure. Conveniently, there is already a hashmap implementation in git and even more conveniently, there is a cache implementation that uses that data structure. So that part is fairly straightforward. I have a question, however: how would I go about detecting when a cache invalidation is necessary? Considering I have read a configuration file and cached the configuration in memory, what can trigger one of the existing cache entries to be invalidated? That is all I have to ask for now. I will look over the code during the next few days to get the bigger picture and submit my application. At [1] you can find my microproject patch. I am eagerly awaiting for any questions you might have. All the best, Dragos [1] http://thread.gmane.org/gmane.comp.version-control.git/244210 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html