[PATCH] describe: rewrite name_rev() iteratively

2014-04-06 Thread Dragos Foianu
The git describe --contains command uses the name_rev() function which
is currently a recursive function. This causes a Stack Overflow when the
history is large enough.

Rewrite name_rev iteratively using a stack on the heap. This slightly
reduces performance due to the extra operations on the heap, but the
function no longer overflows the stack.

Reported-by: Sylvestre Ledru sylves...@mozilla.com
Signed-off-by: Dragos Foianu dragos.foi...@gmail.com
---
 builtin/name-rev.c |  176 ++--
 1 file changed, 128 insertions(+), 48 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index c824d4e..5848d81 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -19,66 +19,146 @@ static long cutoff = LONG_MAX;
 /* How many generations are maximally preferred over _one_ merge traversal? */
 #define MERGE_TRAVERSAL_WEIGHT 65535
 
+typedef struct rev_data {
+   struct commit *commit;
+   const char *tip_name;
+   int generation;
+   int distance;
+   int deref;
+} *rev_data;
+
+typedef struct rev_stack {
+   struct rev_data *rev;
+   struct rev_stack *next;
+} *rev_stack;
+
+static void stack_push(rev_stack *stack, rev_data data) {
+   rev_stack new_node = xmalloc(sizeof(*new_node));
+
+   new_node-rev = data;
+   new_node-next = *stack;
+   *stack = new_node;
+}
+
+static void stack_push_end(rev_stack *stack, rev_data data) {
+   rev_stack new_node = xmalloc(sizeof(*new_node));
+
+   while (*stack != NULL)
+   stack = (*stack)-next;
+   new_node-rev = data;
+   new_node-next = *stack;
+   *stack = new_node;
+}
+
+static rev_data stack_pop(rev_stack *stack) {
+   rev_stack next = (*stack)-next;
+   rev_data rev = (*stack)-rev;
+   free(*stack);
+
+   *stack = next;
+   return rev;
+}
+
+static rev_data make_rev_data(struct commit *commit,
+   const char* tip_name, int generation, int distance,
+   int deref)
+{
+   rev_data data = xmalloc(sizeof(*data));
+
+   data-commit = commit;
+   data-tip_name = tip_name;
+   data-generation = generation;
+   data-distance = distance;
+   data-deref = deref;
+
+   return data;
+}
+
 static void name_rev(struct commit *commit,
const char *tip_name, int generation, int distance,
int deref)
 {
-   struct rev_name *name = (struct rev_name *)commit-util;
-   struct commit_list *parents;
-   int parent_number = 1;
+   rev_stack stack = NULL;
+   rev_data data, next_rev;
 
-   parse_commit(commit);
+   data = make_rev_data(commit, tip_name, generation, distance, deref);
+   stack_push(stack, data);
 
-   if (commit-date  cutoff)
-   return;
+   while (stack != NULL) {
+   rev_data rev = stack_pop(stack);
 
-   if (deref) {
-   char *new_name = xmalloc(strlen(tip_name)+3);
-   strcpy(new_name, tip_name);
-   strcat(new_name, ^0);
-   tip_name = new_name;
+   struct rev_name *name = (struct rev_name *) rev-commit-util;
+   struct commit_list *parents;
+   int parent_number = 1;
 
-   if (generation)
-   die(generation: %d, but deref?, generation);
-   }
+   parse_commit(rev-commit);
+
+   if (rev-commit-date  cutoff)
+   continue;
+
+   if (rev-deref) {
+   char *new_name = xmalloc(strlen(rev-tip_name) + 3);
+   strcpy(new_name, rev-tip_name);
+   strcat(new_name, ^0);
+   rev-tip_name = new_name;
 
-   if (name == NULL) {
-   name = xmalloc(sizeof(rev_name));
-   commit-util = name;
-   goto copy_data;
-   } else if (name-distance  distance) {
+   if (rev-generation)
+   die(generation: %d, but deref?,
+   rev-generation);
+   }
+
+   if (name == NULL) {
+   name = xmalloc(sizeof(rev_name));
+   rev-commit-util = name;
+   goto copy_data;
+   } else if (name-distance  rev-distance) {
 copy_data:
-   name-tip_name = tip_name;
-   name-generation = generation;
-   name-distance = distance;
-   } else
-   return;
-
-   for (parents = commit-parents;
-   parents;
-   parents = parents-next, parent_number++) {
-   if (parent_number  1) {
-   int len = strlen(tip_name);
-   char *new_name = xmalloc(len +
-   1 + decimal_length(generation) +  /* ~n */
-   1 + 2 +   /* ^NN

Re: [BUG] Segfault on git describe

2014-03-22 Thread Dragos Foianu
Jeff King peff at peff.net writes:

 
 So I think we'd be happy to see it converted to an iterative process
 (probably with a stack on the heap). In addition to name-rev, I believe
 that tag --contains will recurse down the longest history path, too (I
 think there may have been experimental patches for the latter, but you'd
 have to search the list archive).
 
 -Peff
 

Alright. I'll look into it and hopefully have a patch by the end of the weekend.

-Dragos

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [BUG] Segfault on git describe

2014-03-19 Thread Dragos Foianu
Sylvestre Ledru sylvestre at mozilla.com writes:

 
 Hello,
 
 Trying to do some stats using the Firefox git repository
 (https://github.com/mozilla/gecko-dev), I found a bug
 on git describe. The following command will segfault:
 git describe --contains a9ff31aebd6dbda82a3c733a72eeeaa0b0525b96
 
 Please note that the Firefox history is a pretty long and this commit
 date is 2001.
 
 I experience this issue with the git version, and Debian packages
 (1.9.0-1 and 2.0~next.20140214-2)
 
 As attachment, the backtrace. I removed about 87250 calls to the
 name_rev function. I guess that is a potential source of problem.
 
 Full is available here:
 http://people.mozilla.org/~sledru/bt-git-on-ff.txt (21 MB)
 
 I am available to test patches if needed.
 
 Thanks,
 Sylvestre
 PS: I am not registered, please cc me.

Hello,

The name_rev function recursively calls itself which is why the backtrace is
so big. Unfortunately, for repos with long histories it can lead to Stack
Overflows. This is pretty much what happened in your case.

I tested it on my computer and I get the same results. I managed to get it
working by doubling my default stacksize:

ulimit -S -s 16192

Considering your project is a very large one and merely allocating a few
more resources fixes the problem, I'm not sure it warrants rewriting the
function to use less stack. You will have to wait for one of the maintainers
to give you a definitive answer.

All the best,
Dragos

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCHv2] branch.c: simplify chain of if statements

2014-03-19 Thread Dragos Foianu
Eric Sunshine sunshine at sunshineco.com writes:

 
 Other submissions have computed this value mathematically without need
 for conditionals. For instance, we've seen:
 
 index = (!!origin  0) + (!!remote_is_branch  1) + (!!rebasing  2)
 
 as, well as the equivalent:
 
 index = !!origin + !!remote_is_branch * 2 + !!rebasing * 4
 
 Although this works, it does place greater cognitive demands on the
 reader by requiring more effort to figure out what is going on and how
 it relates to table position. The original (ungainly) chain of 'if'
 statements in the original code does not suffer this problem. It
 likewise is harder to understand than merely indexing into a
 multi-dimension table where each variable is a key.

I have seen other submissions using this logic, but I didn't think it
accomplished the goal of the patch - simplifying the code. Not that my
approach does either, but I found it a little easier to understand.

Indexing into a table like this is always going to have this problem so this
is probably not the right approach to accomplishing the microproject's goals.

 It's possible to simplify this logic and have only a single
 printf_ln() invocation. Hint: It's safe to pass in more arguments than
 there are %s directives in the format string.

Indeed. It's a habit of mine to pass the exact number of arguments to printf
functions and I can't seem to get away from it.

 You can use ARRAY_SIZE() in place of sizeof(...)/sizeof(...).
 
 Since an out-of-bound index would be a programmer bug, it would
 probably be more appropriate to use an assert(), just after 'index' is
 computed, rather than if+die(). The original code used die() because
 it couldn't detect the error until the end of the if-chain.

Thank you for this hint. Using already defined helpers in the project is
better and will prevent the need to patch the constructs later on.

 On Tue, Mar 18, 2014 at 6:31 PM, Eric Sunshine sunshine at
sunshineco.com wrote:
 
 One other observation: You have a one-off error in your out-of-bounds
 check. It should be 'index = sizeof...'

Well this is embarrasing.

Thank you again for the feedback. It's incredibily helpful and I learned a
lot from submitting these patches. Making the code simple is harder than it
appears at first sight.

I'm not sure it's worth pursuing the table approach further, especially
since a solution has already been accepted and merged into the codebase.

In this case, is it okay to try another microproject? I was thinking about
trying #17 (finding bugs/inefficiencies in builtin/apply.c), but I've
already had my one microproject.

All the best,
Dragos

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] diff: optimise parse_dirstat_params() to only compare strings when necessary

2014-03-19 Thread Dragos Foianu
parse_dirstat_params() goes through a chain of if statements using
strcmp to parse parameters. When the parameter is a digit, the
value must go through all comparisons before the function realises
it is a digit. Optimise this logic by only going through the chain
of string compares when the parameter is not a digit.

Signed-off-by: Dragos Foianu dragos.foi...@gmail.com
---
 diff.c | 37 +++--
 1 file changed, 19 insertions(+), 18 deletions(-)

diff --git a/diff.c b/diff.c
index e343191..733764e 100644
--- a/diff.c
+++ b/diff.c
@@ -84,20 +84,25 @@ static int parse_dirstat_params(struct diff_options 
*options, const char *params
string_list_split_in_place(params, params_copy, ',', -1);
for (i = 0; i  params.nr; i++) {
const char *p = params.items[i].string;
-   if (!strcmp(p, changes)) {
-   DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
-   DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
-   } else if (!strcmp(p, lines)) {
-   DIFF_OPT_SET(options, DIRSTAT_BY_LINE);
-   DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
-   } else if (!strcmp(p, files)) {
-   DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
-   DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
-   } else if (!strcmp(p, noncumulative)) {
-   DIFF_OPT_CLR(options, DIRSTAT_CUMULATIVE);
-   } else if (!strcmp(p, cumulative)) {
-   DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
-   } else if (isdigit(*p)) {
+   if (!isdigit(*p)) {
+   if (!strcmp(p, changes)) {
+   DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
+   DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
+   } else if (!strcmp(p, lines)) {
+   DIFF_OPT_SET(options, DIRSTAT_BY_LINE);
+   DIFF_OPT_CLR(options, DIRSTAT_BY_FILE);
+   } else if (!strcmp(p, files)) {
+   DIFF_OPT_CLR(options, DIRSTAT_BY_LINE);
+   DIFF_OPT_SET(options, DIRSTAT_BY_FILE);
+   } else if (!strcmp(p, noncumulative)) {
+   DIFF_OPT_CLR(options, DIRSTAT_CUMULATIVE);
+   } else if (!strcmp(p, cumulative)) {
+   DIFF_OPT_SET(options, DIRSTAT_CUMULATIVE);
+   } else {
+   strbuf_addf(errmsg, _(  Unknown dirstat 
parameter '%s'\n), p);
+   ret++;
+   }
+   } else  {
char *end;
int permille = strtoul(p, end, 10) * 10;
if (*end == '.'  isdigit(*++end)) {
@@ -114,11 +119,7 @@ static int parse_dirstat_params(struct diff_options 
*options, const char *params
p);
ret++;
}
-   } else {
-   strbuf_addf(errmsg, _(  Unknown dirstat parameter 
'%s'\n), p);
-   ret++;
}
-
}
string_list_clear(params, 0);
free(params_copy);
-- 
1.8.3.2

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] diff: optimise parse_dirstat_params() to only compare strings when necessary

2014-03-19 Thread Dragos Foianu
I will send another version of this patch after review because there is an
extra whitespace following the else statement.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] branch.c: simplify chain of if statements

2014-03-17 Thread Dragos Foianu
Eric Sunshine sunshine at sunshineco.com writes:

 In fact, this change is not table-driven (emphasis on *driven*). It
 merely moves the strings into a table, but all the logic is still in
 the code. To be table-driven, the logic would be encoded in the table
 as well, and that logic would *drive* the code.
 
 This is not to say that the code must be table-driven. The GSoC
 microproject merely asks if doing so would make sense. Whether it
 would is for you to decide, and then to explain to reviewers the
 reason(s) why you did or did not make it so.

 The if-chain is just sufficiently complex that the print messages may
 actually help the reader understand the logic of the code, so this
 argument seems specious.

I understand now after reading your review. I will try to fix this in a
future attempt.
 
 Matthieu already mentioned [2] that this sort of lego string
 construction is not internationalization-friendly. See section 4.3 [3]
 of the gettext manual for details.

I was hoping to get away with using less memory by having only four entries
in the table. I suppose that is not possible. The rebasing check can still
be moved outside of the four if statements and calculate the index
correctly. The strings would then have to be arranged in such a way to make
this work.

Using a multiple-dimension array as suggested in other submissions for this
particular microproject would probably be better, but it has already been done.

 These hard-coded indexing constants (0, 1, 2, 3) are fragile and
 convey little meaning to the reader. Try to consider how to compute
 the index into verbose_prints[] based upon the values of
 'remote_is_branch' and 'origin'. What are the different ways you could
 do so?

I was going to do something like this: if !remote_is_branch the index goes
incremented by 2, because the first two entries are of no interest and if
!origin, the index is incremented by 1. This would correctly compute the
index. It should also work with the rebasing check if the four
rebasing-specific messages are at the end of the table and when rebasing the
index is set to start at those messages.

The reason I did not go with this is because I would still need the four ifs
in order to keep the bug check part of the code. I might be able to find a
work-around for it on the second attempt.

I have seen N_() used in other code but I wasn't sure what its purpose was.

Thank you very much for the review.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: push fail

2014-03-17 Thread Dragos Foianu
shawn wilson ag4ve.us at gmail.com writes:

 
 How do I get more info here (and hopefully resolve this)?
 
  % git push
 To ssh://server/foo/repo.git
  ! [rejected]test - test (non-fast-forward)
 error: failed to push some refs to 'ssh://server/foo/repo.git'
 

non-fast-forward means that someone else pushed to branch test before you
did and your push would end up overwriting their changes. Make sure you
merge your local branch with the remote branch:

git pull origin test

It might also be a result of local destructive changes made by git rebase.
If you're absolutely certain you will not mess up the remote branch you can
add the --force parameter when you push.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCHv2] branch.c: simplify chain of if statements

2014-03-17 Thread Dragos Foianu
This patch uses a table to store the different messages that can
be emitted by the verbose install_branch_config function. It
computes an index based on the three flags and prints the message
located at the specific index in the table of messages. If the
index somehow is not within the table, we have a bug.

Signed-off-by: Dragos Foianu dragos.foi...@gmail.com
---
 branch.c | 44 +---
 1 file changed, 25 insertions(+), 19 deletions(-)

diff --git a/branch.c b/branch.c
index 723a36b..95645d5 100644
--- a/branch.c
+++ b/branch.c
@@ -54,6 +54,18 @@ void install_branch_config(int flag, const char *local, 
const char *origin, cons
struct strbuf key = STRBUF_INIT;
int rebasing = should_setup_rebase(origin);
 
+   const char *messages[] = {
+   N_(Branch %s set up to track local ref %s.),
+   N_(Branch %s set up to track remote ref %s.),
+   N_(Branch %s set up to track local branch %s.),
+   N_(Branch %s set up to track remote branch %s from %s.),
+   N_(Branch %s set up to track local ref %s by rebasing.),
+   N_(Branch %s set up to track remote ref %s by rebasing.),
+   N_(Branch %s set up to track local branch %s by rebasing.),
+   N_(Branch %s set up to track remote branch %s from %s by 
rebasing.)
+   };
+   int index = 0;
+
if (remote_is_branch
 !strcmp(local, shortname)
 !origin) {
@@ -76,28 +88,22 @@ void install_branch_config(int flag, const char *local, 
const char *origin, cons
}
strbuf_release(key);
 
+   if (origin)
+   index += 1;
+   if (remote_is_branch)
+   index += 2;
+   if (rebasing)
+   index += 4;
+
if (flag  BRANCH_CONFIG_VERBOSE) {
if (remote_is_branch  origin)
-   printf_ln(rebasing ?
- _(Branch %s set up to track remote branch %s 
from %s by rebasing.) :
- _(Branch %s set up to track remote branch %s 
from %s.),
- local, shortname, origin);
-   else if (remote_is_branch  !origin)
-   printf_ln(rebasing ?
- _(Branch %s set up to track local branch %s 
by rebasing.) :
- _(Branch %s set up to track local branch 
%s.),
- local, shortname);
-   else if (!remote_is_branch  origin)
-   printf_ln(rebasing ?
- _(Branch %s set up to track remote ref %s by 
rebasing.) :
- _(Branch %s set up to track remote ref %s.),
- local, remote);
-   else if (!remote_is_branch  !origin)
-   printf_ln(rebasing ?
- _(Branch %s set up to track local ref %s by 
rebasing.) :
- _(Branch %s set up to track local ref %s.),
- local, remote);
+   printf_ln(_(messages[index]),
+   local, shortname, origin);
else
+   printf_ln(_(messages[index]),
+   local, (!remote_is_branch) ? remote : 
shortname);
+
+   if (index  0 || index  sizeof(messages) / sizeof(*messages))
die(BUG: impossible combination of %d and %p,
remote_is_branch, origin);
}
-- 
1.8.3.2

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] branch.c: simplify chain of if statements

2014-03-16 Thread Dragos Foianu
This patch uses a table-driven approach in order to make the code
cleaner. Although not necessary, it helps code reability by not
forcing the user to read the print message when trying to
understand what the code does. The rebase check has been moved to
the verbose if statement to avoid making the same check in each of
the four if statements.

Signed-off-by: Dragos Foianu dragos.foi...@gmail.com
---
 branch.c | 32 
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/branch.c b/branch.c
index 723a36b..e2fe455 100644
--- a/branch.c
+++ b/branch.c
@@ -54,6 +54,14 @@ void install_branch_config(int flag, const char *local, 
const char *origin, cons
struct strbuf key = STRBUF_INIT;
int rebasing = should_setup_rebase(origin);
 
+   const char *verbose_prints[4] = {
+   Branch %s set up to track remote branch %s from %s%s,
+   Branch %s set up to track local branch %s%s,
+   Branch %s set up to track remote ref %s%s,
+   Branch %s set up to track local ref %s%s
+   };
+   char *verbose_rebasing = rebasing ?  by rebasing. : .;
+
if (remote_is_branch
 !strcmp(local, shortname)
 !origin) {
@@ -78,25 +86,17 @@ void install_branch_config(int flag, const char *local, 
const char *origin, cons
 
if (flag  BRANCH_CONFIG_VERBOSE) {
if (remote_is_branch  origin)
-   printf_ln(rebasing ?
- _(Branch %s set up to track remote branch %s 
from %s by rebasing.) :
- _(Branch %s set up to track remote branch %s 
from %s.),
- local, shortname, origin);
+   printf_ln(_(verbose_prints[0]),
+   local, shortname, origin, verbose_rebasing);
else if (remote_is_branch  !origin)
-   printf_ln(rebasing ?
- _(Branch %s set up to track local branch %s 
by rebasing.) :
- _(Branch %s set up to track local branch 
%s.),
- local, shortname);
+   printf_ln(_(verbose_prints[1]),
+   local, shortname, verbose_rebasing);
else if (!remote_is_branch  origin)
-   printf_ln(rebasing ?
- _(Branch %s set up to track remote ref %s by 
rebasing.) :
- _(Branch %s set up to track remote ref %s.),
- local, remote);
+   printf_ln(_(verbose_prints[2]),
+   local, remote, verbose_rebasing);
else if (!remote_is_branch  !origin)
-   printf_ln(rebasing ?
- _(Branch %s set up to track local ref %s by 
rebasing.) :
- _(Branch %s set up to track local ref %s.),
- local, remote);
+   printf_ln(_(verbose_prints[3]),
+   local, remote, verbose_rebasing);
else
die(BUG: impossible combination of %d and %p,
remote_is_branch, origin);
-- 
1.8.3.2

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[GSoC 2014] git config API improvements

2014-03-16 Thread Dragos Foianu
Hello,

My name is Dragos Foianu and I am an undergraduate student at University 
Politehnica of Bucharest in Romania. This is my final year and I'm planning on 
doing something more exciting than the simple 
assignments I get from the university.

I have been working with git for quite some time now and I'm currently using it 
for my diploma project. It was annoying at first but now I love it and it has 
helped me many times in the past. I wanted to 
contribute to the project and I feel that GSoC 2014 is the perfect opportunity 
for this.

I am primarily interested in the git config API improvements project. I have 
glanced over the code in order to get an idea of how git-config currently 
works. The project idea page gives a very good 
description of what is desired. Caching and retrieving values by name sounds to 
me like a hint to use a hashtable-like data structure. Conveniently, there is 
already a hashmap implementation in git and 
even more conveniently, there is a cache implementation that uses that data 
structure. So that part is fairly straightforward.

I have a question, however: how would I go about detecting when a cache 
invalidation is necessary? Considering I have read a configuration file and 
cached the configuration in memory, what can trigger 
one of the existing cache entries to be invalidated?

That is all I have to ask for now. I will look over the code during the next 
few days to get the bigger picture and submit my application. At [1] you can 
find my microproject patch. I am eagerly awaiting 
for any questions you might have.

All the best,
Dragos

[1] http://thread.gmane.org/gmane.comp.version-control.git/244210
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html