Re: [PATCH v3 18/24] read-cache: write index-v5

2013-11-25 Thread Thomas Gummerer

[sorry about taking so much time to get back to you, was too busy with
other stuff to work on git]

Duy Nguyen  writes:

> On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer  wrote:
>> +char *super_directory(const char *filename)
>> +{
>> +   char *slash;
>> +
>> +   slash = strrchr(filename, '/');
>> +   if (slash)
>> +   return xmemdupz(filename, slash-filename);
>> +   return NULL;
>> +}
>> +
>
> Why is this function not static? There are a few other in this patch
> (I did not notice in others, but I wasn't looking for them..)

That's just an oversight, it should be static.  Will look for the others
and change them to static too.

> But isn't this what dirname() is?
>
> Another point about this function is it returns a new allocated
> string, but I see no free() anywhere in this patch. Leak alert!

Yes, you're right, I'll use dirname instead.  That also solves the free
problem, as dirname modifies the string.  I'll still use a wrapper
around it, because dirname returns "." when there is no super_directory,
but using NULL for that makes it simpler to check it.

>> +struct directory_entry *init_directory_entry(char *pathname, int len)
>> +{
>> +   struct directory_entry *de = xmalloc(directory_entry_size(len));
>> +
>> +   memcpy(de->pathname, pathname, len);
>> +   de->pathname[len] = '\0';
>> +   de->de_flags  = 0;
>> +   de->de_foffset= 0;
>> +   de->de_cr = 0;
>> +   de->de_ncr= 0;
>> +   de->de_nsubtrees  = 0;
>> +   de->de_nfiles = 0;
>> +   de->de_nentries   = 0;
>> +   memset(de->sha1, 0, 20);
>> +   de->de_pathlen= len;
>> +   de->next  = NULL;
>> +   de->next_hash = NULL;
>> +   de->ce= NULL;
>> +   de->ce_last   = NULL;
>> +   de->conflict  = NULL;
>> +   de->conflict_last = NULL;
>> +   de->conflict_size = 0;
>> +   return de;
>> +}
>
> I think this function could be shortened to
>
> struct directory_entry *de = xcalloc(1, directory_entry_size(len));
> memcpy(de->pathname, pathname, len);
> de->de_pathlen = len;
> return de;

Makes sense, thanks.

>> +static struct directory_entry *get_directory(char *dir, unsigned int 
>> dir_len,
>> +struct hash_table *table,
>> +unsigned int *total_dir_len,
>> +unsigned int *ndir,
>> +struct directory_entry 
>> **current)
>> +{
>> +   struct directory_entry *tmp = NULL, *search, *new, *ret;
>> +   uint32_t crc;
>> +
>> +   search = find_directory(dir, dir_len, &crc, table);
>> +   if (search)
>> +   return search;
>> +   while (!search) {
>> +   new = init_directory_entry(dir, dir_len);
>> +   insert_directory_entry(new, table, total_dir_len, ndir, crc);
>> +   if (!tmp)
>> +   ret = new;
>> +   else
>> +   new->de_nsubtrees = 1;
>> +   new->next = tmp;
>> +   tmp = new;
>> +   dir = super_directory(dir);
>
> It feels more natural to create directory_entry(s) from parent to
> subdir. If you do so you could reset dir to the remaining of directory
> and perform strchr() and do not need to allocate new memory everytime
> you call super_directory (because it relies on NUL at the end of the
> string).

I'm creating them from subdir to parent because it saves a few calls
whenever a superdir is already in the hash_table.  And I'm using dirname
so the new memory allocations are not a problem anymore.

>> +   dir_len = dir ? strlen(dir) : 0;
>> +   search = find_directory(dir, dir_len, &crc, table);
>> +   }
>> +   search->de_nsubtrees++;
>> +   (*current)->next = tmp;
>> +   while ((*current)->next)
>> +   *current = (*current)->next;
>> +
>> +   return ret;
>> +}
>
>> +static struct directory_entry *compile_directory_data(struct index_state 
>> *istate,
>> + int nfile,
>> + unsigned int *ndir,
>> + unsigned int 
>> *total_dir_len,
>> + unsigned int 
>> *total_file_len)
>> +{
>> +   int i, dir_len = -1;
>> +   char *dir;
>> +   struct directory_entry *de, *current, *search;
>> +   struct cache_entry **cache = istate->cache;
>> +   struct conflict_entry *conflict_entry;
>> +   struct hash_table table;
>> +   uint32_t crc;
>> +
>> +   init_hash(&table);
>> +   de = init_directory_entry("", 0);
>> +   current = de;
>> +   *ndir = 1;
>> +   *total_dir_len = 1;
>> +   crc = crc32(0, (Bytef*)de->pathname, de->de_pathlen);
>> +   insert_hash(crc, de, &table);
>> + 

Re: [PATCH v3 18/24] read-cache: write index-v5

2013-08-24 Thread Duy Nguyen
On Sat, Aug 24, 2013 at 11:07 AM, Duy Nguyen  wrote:
> On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer  wrote:
>> Write the index version 5 file format to disk. This version doesn't
>> write the cache-tree data and resolve-undo data to the file.
>
> I keep having things to add after sending my emails. Now that we have
> all conflicted entries in file block, the conflict data block becomes
> optional, it functions exactly (I think) like resolve-undo extension,
> which makes me think it might make sense to make conflict data block
> an extension.
>
> If we make it so, we might want to move "cr" and "ncr" fields out of
> direntries. I don't see a solution yet, but I think it's interesting
> because future extensions might want to attach stuff to direntries,
> just like "cr"/"ncr" from conflict extension. We may want to think now
> how that might be done (and conflict extension is a good exercise to
> see how it works out)

And the solution is pretty obvious: keep resolve-undo extension as
_extension_ just like in v2 (and no "cr/ncr" in direntries). The only
difference is the time resolve-undo extension is updated: in v2, new
entries are added at remove_index_entry_at(), in v5 new entries are
added when new stage entries  are detected at write_index().
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 18/24] read-cache: write index-v5

2013-08-23 Thread Duy Nguyen
On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer  wrote:
> Write the index version 5 file format to disk. This version doesn't
> write the cache-tree data and resolve-undo data to the file.

I keep having things to add after sending my emails. Now that we have
all conflicted entries in file block, the conflict data block becomes
optional, it functions exactly (I think) like resolve-undo extension,
which makes me think it might make sense to make conflict data block
an extension.

If we make it so, we might want to move "cr" and "ncr" fields out of
direntries. I don't see a solution yet, but I think it's interesting
because future extensions might want to attach stuff to direntries,
just like "cr"/"ncr" from conflict extension. We may want to think now
how that might be done (and conflict extension is a good exercise to
see how it works out)
-- 
Duy
I
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3 18/24] read-cache: write index-v5

2013-08-23 Thread Duy Nguyen
On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer  wrote:
> +char *super_directory(const char *filename)
> +{
> +   char *slash;
> +
> +   slash = strrchr(filename, '/');
> +   if (slash)
> +   return xmemdupz(filename, slash-filename);
> +   return NULL;
> +}
> +

Why is this function not static? There are a few other in this patch
(I did not notice in others, but I wasn't looking for them..)

But isn't this what dirname() is?

Another point about this function is it returns a new allocated
string, but I see no free() anywhere in this patch. Leak alert!

> +struct directory_entry *init_directory_entry(char *pathname, int len)
> +{
> +   struct directory_entry *de = xmalloc(directory_entry_size(len));
> +
> +   memcpy(de->pathname, pathname, len);
> +   de->pathname[len] = '\0';
> +   de->de_flags  = 0;
> +   de->de_foffset= 0;
> +   de->de_cr = 0;
> +   de->de_ncr= 0;
> +   de->de_nsubtrees  = 0;
> +   de->de_nfiles = 0;
> +   de->de_nentries   = 0;
> +   memset(de->sha1, 0, 20);
> +   de->de_pathlen= len;
> +   de->next  = NULL;
> +   de->next_hash = NULL;
> +   de->ce= NULL;
> +   de->ce_last   = NULL;
> +   de->conflict  = NULL;
> +   de->conflict_last = NULL;
> +   de->conflict_size = 0;
> +   return de;
> +}

I think this function could be shortened to

struct directory_entry *de = xcalloc(1, directory_entry_size(len));
memcpy(de->pathname, pathname, len);
de->de_pathlen = len;
return de;

> +static struct directory_entry *get_directory(char *dir, unsigned int dir_len,
> +struct hash_table *table,
> +unsigned int *total_dir_len,
> +unsigned int *ndir,
> +struct directory_entry **current)
> +{
> +   struct directory_entry *tmp = NULL, *search, *new, *ret;
> +   uint32_t crc;
> +
> +   search = find_directory(dir, dir_len, &crc, table);
> +   if (search)
> +   return search;
> +   while (!search) {
> +   new = init_directory_entry(dir, dir_len);
> +   insert_directory_entry(new, table, total_dir_len, ndir, crc);
> +   if (!tmp)
> +   ret = new;
> +   else
> +   new->de_nsubtrees = 1;
> +   new->next = tmp;
> +   tmp = new;
> +   dir = super_directory(dir);

It feels more natural to create directory_entry(s) from parent to
subdir. If you do so you could reset dir to the remaining of directory
and perform strchr() and do not need to allocate new memory everytime
you call super_directory (because it relies on NUL at the end of the
string).

> +   dir_len = dir ? strlen(dir) : 0;
> +   search = find_directory(dir, dir_len, &crc, table);
> +   }
> +   search->de_nsubtrees++;
> +   (*current)->next = tmp;
> +   while ((*current)->next)
> +   *current = (*current)->next;
> +
> +   return ret;
> +}

> +static struct directory_entry *compile_directory_data(struct index_state 
> *istate,
> + int nfile,
> + unsigned int *ndir,
> + unsigned int 
> *total_dir_len,
> + unsigned int 
> *total_file_len)
> +{
> +   int i, dir_len = -1;
> +   char *dir;
> +   struct directory_entry *de, *current, *search;
> +   struct cache_entry **cache = istate->cache;
> +   struct conflict_entry *conflict_entry;
> +   struct hash_table table;
> +   uint32_t crc;
> +
> +   init_hash(&table);
> +   de = init_directory_entry("", 0);
> +   current = de;
> +   *ndir = 1;
> +   *total_dir_len = 1;
> +   crc = crc32(0, (Bytef*)de->pathname, de->de_pathlen);
> +   insert_hash(crc, de, &table);
> +   conflict_entry = NULL;
> +   for (i = 0; i < nfile; i++) {
> +   if (cache[i]->ce_flags & CE_REMOVE)
> +   continue;
> +
> +   if (dir_len < 0
> +   || cache[i]->name[dir_len] != '/'

Need a check to make sure name[dir_len] is not out of bound

> +   || strchr(cache[i]->name + dir_len + 1, '/')
> +   || cache_name_compare(cache[i]->name, 
> ce_namelen(cache[i]),
> + dir, dir_len)) {

In my opinon, "if (dir_len < 0 || !(must && be && a && subdirectory))"
is easier to read..

> +   dir = super_directory(cache[i]->name);
> +   dir_len = dir ? strlen(dir) : 0;
> +   search = get_directory(dir, dir_len, &table,
> +