Re: [PATCH 05/23] pack v4: add commit object parsing
Nicolas Pitre writes: > Hmmm The problem I have with split_ident_line() right now is about > the fact that it is too liberal with whitespaces. Here I must be sure I > can deconstruct a commit object and be sure I still can regenerate it > byte for byte in order to match its SHA1 signature. Yeah, I now see. It's the same "accept with leniency" get_sha1_hex() does, which is not appropriate for the purpose of this codepath. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/23] pack v4: add commit object parsing
On Tue, 27 Aug 2013, Junio C Hamano wrote: > Nicolas Pitre writes: > > > Let's create another dictionary table to hold the author and committer > > entries. We use the same table format used for tree entries where the > > 16 bit data prefix is conveniently used to store the timezone value. > > > > In order to copy straight from a commit object buffer, dict_add_entry() > > is modified to get the string length as the provided string pointer is > > not always be null terminated. > > > > Signed-off-by: Nicolas Pitre > > --- > > @@ -135,8 +136,73 @@ static void sort_dict_entries_by_hits(struct > > dict_table *t) > > rehash_entries(t); > > } > > > > +static struct dict_table *commit_name_table; > > static struct dict_table *tree_path_table; > > > > +/* > > + * Parse the author/committer line from a canonical commit object. > > + * The 'from' argument points right after the "author " or "committer " > > + * string. The time zone is parsed and stored in *tz_val. The returned > > + * pointer is right after the end of the email address which is also just > > + * before the time value, or NULL if a parsing error is encountered. > > + */ > > +static char *get_nameend_and_tz(char *from, int *tz_val) > > +{ > > + char *end, *tz; > > + > > + tz = strchr(from, '\n'); > > + /* let's assume the smallest possible string to be "x 0 +\n" */ > > + if (!tz || tz - from < 13) > > + return NULL; > > + tz -= 4; > > + end = tz - 4; > > + while (end - from > 5 && *end != ' ') > > + end--; > > + if (end[-1] != '>' || end[0] != ' ' || tz[-2] != ' ') > > + return NULL; > > + *tz_val = (tz[0] - '0') * 1000 + > > + (tz[1] - '0') * 100 + > > + (tz[2] - '0') * 10 + > > + (tz[3] - '0'); > > + switch (tz[-1]) { > > + default:return NULL; > > + case '+': break; > > + case '-': *tz_val = -*tz_val; > > + } > > + return end; > > +} > > This may want to share code with ident.c::split_ident_line(), as we > have been trying to reduce the number of ident-line parsers. Hmmm The problem I have with split_ident_line() right now is about the fact that it is too liberal with whitespaces. Here I must be sure I can deconstruct a commit object and be sure I still can regenerate it byte for byte in order to match its SHA1 signature. So there _must_ always be only one space between the email closing bracket and the time stamp, only one space between the time stamp and the time zone value, and no space after the time zone. Is there a reason why split_ident_line() is not stricter in that regard? Nicolas -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 05/23] pack v4: add commit object parsing
Nicolas Pitre writes: > Let's create another dictionary table to hold the author and committer > entries. We use the same table format used for tree entries where the > 16 bit data prefix is conveniently used to store the timezone value. > > In order to copy straight from a commit object buffer, dict_add_entry() > is modified to get the string length as the provided string pointer is > not always be null terminated. > > Signed-off-by: Nicolas Pitre > --- > @@ -135,8 +136,73 @@ static void sort_dict_entries_by_hits(struct dict_table > *t) > rehash_entries(t); > } > > +static struct dict_table *commit_name_table; > static struct dict_table *tree_path_table; > > +/* > + * Parse the author/committer line from a canonical commit object. > + * The 'from' argument points right after the "author " or "committer " > + * string. The time zone is parsed and stored in *tz_val. The returned > + * pointer is right after the end of the email address which is also just > + * before the time value, or NULL if a parsing error is encountered. > + */ > +static char *get_nameend_and_tz(char *from, int *tz_val) > +{ > + char *end, *tz; > + > + tz = strchr(from, '\n'); > + /* let's assume the smallest possible string to be "x 0 +\n" */ > + if (!tz || tz - from < 13) > + return NULL; > + tz -= 4; > + end = tz - 4; > + while (end - from > 5 && *end != ' ') > + end--; > + if (end[-1] != '>' || end[0] != ' ' || tz[-2] != ' ') > + return NULL; > + *tz_val = (tz[0] - '0') * 1000 + > + (tz[1] - '0') * 100 + > + (tz[2] - '0') * 10 + > + (tz[3] - '0'); > + switch (tz[-1]) { > + default:return NULL; > + case '+': break; > + case '-': *tz_val = -*tz_val; > + } > + return end; > +} This may want to share code with ident.c::split_ident_line(), as we have been trying to reduce the number of ident-line parsers. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 05/23] pack v4: add commit object parsing
Let's create another dictionary table to hold the author and committer entries. We use the same table format used for tree entries where the 16 bit data prefix is conveniently used to store the timezone value. In order to copy straight from a commit object buffer, dict_add_entry() is modified to get the string length as the provided string pointer is not always be null terminated. Signed-off-by: Nicolas Pitre --- packv4-create.c | 98 +++-- 1 file changed, 89 insertions(+), 9 deletions(-) diff --git a/packv4-create.c b/packv4-create.c index eccd9fc..5c08871 100644 --- a/packv4-create.c +++ b/packv4-create.c @@ -1,5 +1,5 @@ /* - * packv4-create.c: management of dictionary tables used in pack v4 + * packv4-create.c: creation of dictionary tables and objects used in pack v4 * * (C) Nicolas Pitre * @@ -80,9 +80,9 @@ static void rehash_entries(struct dict_table *t) } } -int dict_add_entry(struct dict_table *t, int val, const char *str) +int dict_add_entry(struct dict_table *t, int val, const char *str, int str_len) { - int i, val_len = 2, str_len = strlen(str) + 1; + int i, val_len = 2; if (t->ptr + val_len + str_len > t->size) { t->size = (t->size + val_len + str_len + 1024) * 3 / 2; @@ -92,6 +92,7 @@ int dict_add_entry(struct dict_table *t, int val, const char *str) t->data[t->ptr] = val >> 8; t->data[t->ptr + 1] = val; memcpy(t->data + t->ptr + val_len, str, str_len); + t->data[t->ptr + val_len + str_len] = 0; i = (t->nb_entries) ? locate_entry(t, t->data + t->ptr, val_len + str_len) : -1; @@ -107,7 +108,7 @@ int dict_add_entry(struct dict_table *t, int val, const char *str) t->entry[t->nb_entries].offset = t->ptr; t->entry[t->nb_entries].size = val_len + str_len; t->entry[t->nb_entries].hits = 1; - t->ptr += val_len + str_len; + t->ptr += val_len + str_len + 1; t->nb_entries++; if (t->hash_size * 3 <= t->nb_entries * 4) @@ -135,8 +136,73 @@ static void sort_dict_entries_by_hits(struct dict_table *t) rehash_entries(t); } +static struct dict_table *commit_name_table; static struct dict_table *tree_path_table; +/* + * Parse the author/committer line from a canonical commit object. + * The 'from' argument points right after the "author " or "committer " + * string. The time zone is parsed and stored in *tz_val. The returned + * pointer is right after the end of the email address which is also just + * before the time value, or NULL if a parsing error is encountered. + */ +static char *get_nameend_and_tz(char *from, int *tz_val) +{ + char *end, *tz; + + tz = strchr(from, '\n'); + /* let's assume the smallest possible string to be "x 0 +\n" */ + if (!tz || tz - from < 13) + return NULL; + tz -= 4; + end = tz - 4; + while (end - from > 5 && *end != ' ') + end--; + if (end[-1] != '>' || end[0] != ' ' || tz[-2] != ' ') + return NULL; + *tz_val = (tz[0] - '0') * 1000 + + (tz[1] - '0') * 100 + + (tz[2] - '0') * 10 + + (tz[3] - '0'); + switch (tz[-1]) { + default:return NULL; + case '+': break; + case '-': *tz_val = -*tz_val; + } + return end; +} + +static int add_commit_dict_entries(void *buf, unsigned long size) +{ + char *name, *end = NULL; + int tz_val; + + if (!commit_name_table) + commit_name_table = create_dict_table(); + + /* parse and add author info */ + name = strstr(buf, "\nauthor "); + if (name) { + name += 8; + end = get_nameend_and_tz(name, &tz_val); + } + if (!name || !end) + return -1; + dict_add_entry(commit_name_table, tz_val, name, end - name); + + /* parse and add committer info */ + name = strstr(end, "\ncommitter "); + if (name) { + name += 11; + end = get_nameend_and_tz(name, &tz_val); + } + if (!name || !end) + return -1; + dict_add_entry(commit_name_table, tz_val, name, end - name); + + return 0; +} + static int add_tree_dict_entries(void *buf, unsigned long size) { struct tree_desc desc; @@ -146,13 +212,16 @@ static int add_tree_dict_entries(void *buf, unsigned long size) tree_path_table = create_dict_table(); init_tree_desc(&desc, buf, size); - while (tree_entry(&desc, &name_entry)) + while (tree_entry(&desc, &name_entry)) { + int pathlen = tree_entry_len(&name_entry); dict_add_entry(tree_path_table, name_entry.mode, - name_entry.path); + name_entry.path, pathlen); + } + return 0; } -void dict_dump(struct dict_