Re: [PATCH v3 15/24] read-cache: read index-v5
Nit, add_part_to_conflict_entry(), create_new_conflict() and related structures/macros are not used in this patch. The first caller is in the next patch (read resolve-undo data). -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 15/24] read-cache: read index-v5
On Wed, Aug 21, 2013 at 3:59 AM, Thomas Gummerer wrote: >>> +static int read_entries(struct index_state *istate, struct directory_entry >>> *de, >>> + unsigned int first_entry_offset, void *mmap, >>> + unsigned long mmap_size, unsigned int *nr, >>> + unsigned int foffsetblock) >>> +{ >>> + struct cache_entry *ce; >>> + int i, subdir = 0; >>> + >>> + for (i = 0; i < de->de_nfiles; i++) { >>> + unsigned int subdir_foffsetblock = de->de_foffset + >>> foffsetblock + (i * 4); >>> + if (read_entry(&ce, de->pathname, de->de_pathlen, mmap, >>> mmap_size, >>> + first_entry_offset, subdir_foffsetblock) < 0) >>> + return -1; >> >> You read one file entry, say abc/def... > > You're not quite right here. I'm reading def here, de is the root > directory and de->sub[subdir] is the first sub directory, named abc/ > >>> + while (subdir < de->de_nsubtrees && >>> + cache_name_compare(ce->name + de->de_pathlen, >>> + ce_namelen(ce) - de->de_pathlen, >>> + de->sub[subdir]->pathname + >>> de->de_pathlen, >>> + de->sub[subdir]->de_pathlen - >>> de->de_pathlen) > 0) { >> >> Oh right the entry belongs the the substree "abc" so.. > > abc/ comes before def, so lets read everything in that directory first. > >>> + read_entries(istate, de->sub[subdir], >>> first_entry_offset, mmap, >>> +mmap_size, nr, foffsetblock); >> >> you recurse in, which will add following entries like abc/def and abc/xyz... > > Recurse in, add abc/def and abc/xyz, and increase nr in the recursion, > so the new entry gets added at the right place. > >>> + subdir++; >>> + } >>> + if (!ce) >>> + continue; >>> + set_index_entry(istate, (*nr)++, ce); >> >> then back here after recusion and add abc/def, again, after abc/xyz. >> Did I read this code correctly? > > After the recursion add def to at the 3rd position in the index. After > that it looks like: > abc/def > abc/xyz > def > > I hope that makes it a little clearer. It does. Thanks. >>> + de = root_directory; >>> + last_de = de; >>> + while (de) { >>> + if (need_root || >>> + match_pathspec_depth(&adjusted_pathspec, de->pathname, >>> de->de_pathlen, 0, NULL)) { >>> + if (read_entries(istate, de, entry_offset, >>> +mmap, mmap_size, &nr, >>> +foffsetblock) < 0) >>> + return -1; >>> + } else { >>> + for (i = 0; i < de->de_nsubtrees; i++) { >>> + last_de->next = de->sub[i]; >>> + last_de = last_de->next; >>> + } >>> + } >>> + de = de->next; >> >> I'm missing something here. read_entries is a function that reads all >> entries inside "de" including subdirectories and the first "de" is >> root_directory, which makes it read the whole index in. > > It does, except when the adjusted_pathspec doesn't match the > root_directory. In that case all the subdirectories of the > root_directory are added to a queue, which will then be iterated over > and tried to match with the adjusted_pathspec. That's what I missed. Thanks. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 15/24] read-cache: read index-v5
Duy Nguyen writes: > On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer wrote: >> +static int read_entry(struct cache_entry **ce, char *pathname, size_t >> pathlen, >> + void *mmap, unsigned long mmap_size, >> + unsigned int first_entry_offset, >> + unsigned int foffsetblock) >> +{ >> + int len, offset_to_offset; >> + char *name; >> + uint32_t foffsetblockcrc, *filecrc, *beginning, *end, entry_offset; >> + struct ondisk_cache_entry *disk_ce; >> + >> + beginning = ptr_add(mmap, foffsetblock); >> + end = ptr_add(mmap, foffsetblock + 4); >> + len = ntoh_l(*end) - ntoh_l(*beginning) - sizeof(struct >> ondisk_cache_entry) - 5; >> + entry_offset = first_entry_offset + ntoh_l(*beginning); >> + name = ptr_add(mmap, entry_offset); >> + disk_ce = ptr_add(mmap, entry_offset + len + 1); >> + *ce = cache_entry_from_ondisk(disk_ce, pathname, name, len, pathlen); >> + filecrc = ptr_add(mmap, entry_offset + len + 1 + sizeof(*disk_ce)); >> + offset_to_offset = htonl(foffsetblock); >> + foffsetblockcrc = crc32(0, (Bytef*)&offset_to_offset, 4); >> + if (!check_crc32(foffsetblockcrc, >> + ptr_add(mmap, entry_offset), len + 1 + sizeof(*disk_ce), >> + ntoh_l(*filecrc))) >> + return -1; >> + >> + return 0; >> +} > > Last thought before book+bed time. I wonder if moving the name part to > the end of the entry (i.e. chaging on disk format) would simplify this > code. The new ondisk_cache_entry would be something like this > > struct ondisk_cache_entry { >uint16_t flags; >uint16_t mode; >struct cache_time mtime; >uint32_t size; >int stat_crc; >unsigned char sha1[20]; >char name[FLEX_ARRAY]; > }; I think it simplifies it a bit, but not too much, the only thing I see avoiding the use of the name variable. I think it will also simplify the writing code a bit. The only negative part would be for bisecting the index, but that would still be possible, and only slightly more complicated. I'll give it a try tomorrow and check if it's worth it. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 15/24] read-cache: read index-v5
Duy Nguyen writes: > General comment: a short comment before each function describing what > the function does would be helpful. This only applies for complex > functions (read_* ones). Of course verify_hdr does not require extra > explanantion. Yes, makes sense, I'll do that in the re-roll. > On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer > wrote: >> +static struct directory_entry *directory_entry_from_ondisk(struct >> ondisk_directory_entry *ondisk, >> + const char *name, >> + size_t len) >> +{ >> + struct directory_entry *de = xmalloc(directory_entry_size(len)); >> + >> + memcpy(de->pathname, name, len); >> + de->pathname[len] = '\0'; >> + de->de_flags = ntoh_s(ondisk->flags); >> + de->de_foffset= ntoh_l(ondisk->foffset); >> + de->de_cr = ntoh_l(ondisk->cr); >> + de->de_ncr= ntoh_l(ondisk->ncr); >> + de->de_nsubtrees = ntoh_l(ondisk->nsubtrees); >> + de->de_nfiles = ntoh_l(ondisk->nfiles); >> + de->de_nentries = ntoh_l(ondisk->nentries); >> + de->de_pathlen= len; >> + hashcpy(de->sha1, ondisk->sha1); >> + return de; >> +} > > This function leaves a lot of fields uninitialized.. > >> +static struct directory_entry *read_directories(unsigned int *dir_offset, >> + unsigned int *dir_table_offset, >> + void *mmap, >> + int mmap_size) >> +{ >> >> + de = directory_entry_from_ondisk(disk_de, name, len); >> + de->next = NULL; >> + de->sub = NULL; > > ..and two of them are set to NULL here. Maybe > directory_entry_from_ondisk() could be made to call > init_directory_entry() instead so that we don't need to manually reset > some fields here, which leaves me wondering why other fields are not > important to reset. init_directory_entry() is introduced later in > "write index-v5" patch, you so may want to move it up a few patches. The rest of the fields are only used for compiling the data that will be written. Using init_directory_entry() here makes sense anyway though, thanks for the suggestion. >> +static int read_entry(struct cache_entry **ce, char *pathname, size_t >> pathlen, >> + void *mmap, unsigned long mmap_size, >> + unsigned int first_entry_offset, >> + unsigned int foffsetblock) >> +{ >> + int len, offset_to_offset; >> + char *name; >> + uint32_t foffsetblockcrc, *filecrc, *beginning, *end, entry_offset; >> + struct ondisk_cache_entry *disk_ce; >> + >> + beginning = ptr_add(mmap, foffsetblock); >> + end = ptr_add(mmap, foffsetblock + 4); >> + len = ntoh_l(*end) - ntoh_l(*beginning) - sizeof(struct >> ondisk_cache_entry) - 5; > > It took me a while to check and figure out " - 5" here means minus NUL > and the crc. A short comment would help. I think there's also another > -5 in read_directories(). Or maybe just rename len to namelen. Will add a short comment. >> +struct conflict_entry *create_new_conflict(char *name, int len, int pathlen) >> +{ >> + struct conflict_entry *conflict_entry; >> + >> + if (pathlen) >> + pathlen++; >> + conflict_entry = xmalloc(conflict_entry_size(len)); >> + conflict_entry->entries = NULL; >> + conflict_entry->nfileconflicts = 0; >> + conflict_entry->namelen = len; >> + memcpy(conflict_entry->name, name, len); >> + conflict_entry->name[len] = '\0'; >> + conflict_entry->pathlen = pathlen; >> + conflict_entry->next = NULL; > > A memset followed by memcpy and conflict_entry->pathlen = pathlen > would make this shorter and won't miss new fields added in future. Makes sense, thanks. >> +static int read_entries(struct index_state *istate, struct directory_entry >> *de, >> + unsigned int first_entry_offset, void *mmap, >> + unsigned long mmap_size, unsigned int *nr, >> + unsigned int foffsetblock) >> +{ >> + struct cache_entry *ce; >> + int i, subdir = 0; >> + >> + for (i = 0; i < de->de_nfiles; i++) { >> + unsigned int subdir_foffsetblock = de->de_foffset + >> foffsetblock + (i * 4); >> + if (read_entry(&ce, de->pathname, de->de_pathlen, mmap, >> mmap_size, >> + first_entry_offset, subdir_foffsetblock) < 0) >> + return -1; > > You read one file entry, say abc/def... You're not quite right here. I'm reading def here, de is the root directory and de->sub[subdir] is the first sub directory, named abc/ >> + while (subdir < de->de_nsubtrees && >> + cache_name_compare(ce->name + de->de_pathlen, >> + ce_namelen(ce) - de->de_pathlen, >> +
Re: [PATCH v3 15/24] read-cache: read index-v5
On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer wrote: > +static int read_entry(struct cache_entry **ce, char *pathname, size_t > pathlen, > + void *mmap, unsigned long mmap_size, > + unsigned int first_entry_offset, > + unsigned int foffsetblock) > +{ > + int len, offset_to_offset; > + char *name; > + uint32_t foffsetblockcrc, *filecrc, *beginning, *end, entry_offset; > + struct ondisk_cache_entry *disk_ce; > + > + beginning = ptr_add(mmap, foffsetblock); > + end = ptr_add(mmap, foffsetblock + 4); > + len = ntoh_l(*end) - ntoh_l(*beginning) - sizeof(struct > ondisk_cache_entry) - 5; > + entry_offset = first_entry_offset + ntoh_l(*beginning); > + name = ptr_add(mmap, entry_offset); > + disk_ce = ptr_add(mmap, entry_offset + len + 1); > + *ce = cache_entry_from_ondisk(disk_ce, pathname, name, len, pathlen); > + filecrc = ptr_add(mmap, entry_offset + len + 1 + sizeof(*disk_ce)); > + offset_to_offset = htonl(foffsetblock); > + foffsetblockcrc = crc32(0, (Bytef*)&offset_to_offset, 4); > + if (!check_crc32(foffsetblockcrc, > + ptr_add(mmap, entry_offset), len + 1 + sizeof(*disk_ce), > + ntoh_l(*filecrc))) > + return -1; > + > + return 0; > +} Last thought before book+bed time. I wonder if moving the name part to the end of the entry (i.e. chaging on disk format) would simplify this code. The new ondisk_cache_entry would be something like this struct ondisk_cache_entry { uint16_t flags; uint16_t mode; struct cache_time mtime; uint32_t size; int stat_crc; unsigned char sha1[20]; char name[FLEX_ARRAY]; }; -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 15/24] read-cache: read index-v5
General comment: a short comment before each function describing what the function does would be helpful. This only applies for complex functions (read_* ones). Of course verify_hdr does not require extra explanantion. On Mon, Aug 19, 2013 at 2:42 AM, Thomas Gummerer wrote: > +static struct directory_entry *directory_entry_from_ondisk(struct > ondisk_directory_entry *ondisk, > + const char *name, > + size_t len) > +{ > + struct directory_entry *de = xmalloc(directory_entry_size(len)); > + > + memcpy(de->pathname, name, len); > + de->pathname[len] = '\0'; > + de->de_flags = ntoh_s(ondisk->flags); > + de->de_foffset= ntoh_l(ondisk->foffset); > + de->de_cr = ntoh_l(ondisk->cr); > + de->de_ncr= ntoh_l(ondisk->ncr); > + de->de_nsubtrees = ntoh_l(ondisk->nsubtrees); > + de->de_nfiles = ntoh_l(ondisk->nfiles); > + de->de_nentries = ntoh_l(ondisk->nentries); > + de->de_pathlen= len; > + hashcpy(de->sha1, ondisk->sha1); > + return de; > +} This function leaves a lot of fields uninitialized.. > +static struct directory_entry *read_directories(unsigned int *dir_offset, > + unsigned int *dir_table_offset, > + void *mmap, > + int mmap_size) > +{ > > + de = directory_entry_from_ondisk(disk_de, name, len); > + de->next = NULL; > + de->sub = NULL; ..and two of them are set to NULL here. Maybe directory_entry_from_ondisk() could be made to call init_directory_entry() instead so that we don't need to manually reset some fields here, which leaves me wondering why other fields are not important to reset. init_directory_entry() is introduced later in "write index-v5" patch, you so may want to move it up a few patches. > +static int read_entry(struct cache_entry **ce, char *pathname, size_t > pathlen, > + void *mmap, unsigned long mmap_size, > + unsigned int first_entry_offset, > + unsigned int foffsetblock) > +{ > + int len, offset_to_offset; > + char *name; > + uint32_t foffsetblockcrc, *filecrc, *beginning, *end, entry_offset; > + struct ondisk_cache_entry *disk_ce; > + > + beginning = ptr_add(mmap, foffsetblock); > + end = ptr_add(mmap, foffsetblock + 4); > + len = ntoh_l(*end) - ntoh_l(*beginning) - sizeof(struct > ondisk_cache_entry) - 5; It took me a while to check and figure out " - 5" here means minus NUL and the crc. A short comment would help. I think there's also another -5 in read_directories(). Or maybe just rename len to namelen. > +struct conflict_entry *create_new_conflict(char *name, int len, int pathlen) > +{ > + struct conflict_entry *conflict_entry; > + > + if (pathlen) > + pathlen++; > + conflict_entry = xmalloc(conflict_entry_size(len)); > + conflict_entry->entries = NULL; > + conflict_entry->nfileconflicts = 0; > + conflict_entry->namelen = len; > + memcpy(conflict_entry->name, name, len); > + conflict_entry->name[len] = '\0'; > + conflict_entry->pathlen = pathlen; > + conflict_entry->next = NULL; A memset followed by memcpy and conflict_entry->pathlen = pathlen would make this shorter and won't miss new fields added in future. > +static int read_entries(struct index_state *istate, struct directory_entry > *de, > + unsigned int first_entry_offset, void *mmap, > + unsigned long mmap_size, unsigned int *nr, > + unsigned int foffsetblock) > +{ > + struct cache_entry *ce; > + int i, subdir = 0; > + > + for (i = 0; i < de->de_nfiles; i++) { > + unsigned int subdir_foffsetblock = de->de_foffset + > foffsetblock + (i * 4); > + if (read_entry(&ce, de->pathname, de->de_pathlen, mmap, > mmap_size, > + first_entry_offset, subdir_foffsetblock) < 0) > + return -1; You read one file entry, say abc/def... > + while (subdir < de->de_nsubtrees && > + cache_name_compare(ce->name + de->de_pathlen, > + ce_namelen(ce) - de->de_pathlen, > + de->sub[subdir]->pathname + > de->de_pathlen, > + de->sub[subdir]->de_pathlen - > de->de_pathlen) > 0) { Oh right the entry belongs the the substree "abc" so.. > + read_entries(istate, de->sub[subdir], > first_entry_offset, mmap, > +mmap_size, nr, foffsetblock); you recurse in, which will add following entries like abc/def and abc/xyz... > + subdir++; > + } > +
Re: [PATCH v3 15/24] read-cache: read index-v5
On Sun, Aug 18, 2013 at 3:42 PM, Thomas Gummerer wrote: > Make git read the index file version 5 without complaining. > > This version of the reader doesn't read neither the cache-tree > nor the resolve undo data, but doesn't choke on an index that > includes such data. The double-negatives are difficult to digest. Grammatical fixup: -->8-- This version of the reader reads neither the cache-tree nor the resolve undo data, however, it won't choke on an index that includes such data. -->8-- > Helped-by: Junio C Hamano > Helped-by: Nguyen Thai Ngoc Duy > Helped-by: Thomas Rast > Signed-off-by: Thomas Gummerer -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v3 15/24] read-cache: read index-v5
Make git read the index file version 5 without complaining. This version of the reader doesn't read neither the cache-tree nor the resolve undo data, but doesn't choke on an index that includes such data. Helped-by: Junio C Hamano Helped-by: Nguyen Thai Ngoc Duy Helped-by: Thomas Rast Signed-off-by: Thomas Gummerer --- Makefile| 1 + cache.h | 32 +++- read-cache-v5.c | 473 read-cache.h| 1 + 4 files changed, 506 insertions(+), 1 deletion(-) create mode 100644 read-cache-v5.c diff --git a/Makefile b/Makefile index afae23e..a55206d 100644 --- a/Makefile +++ b/Makefile @@ -857,6 +857,7 @@ LIB_OBJS += quote.o LIB_OBJS += reachable.o LIB_OBJS += read-cache.o LIB_OBJS += read-cache-v2.o +LIB_OBJS += read-cache-v5.o LIB_OBJS += reflog-walk.o LIB_OBJS += refs.o LIB_OBJS += remote.o diff --git a/cache.h b/cache.h index 714a334..89f556b 100644 --- a/cache.h +++ b/cache.h @@ -99,7 +99,7 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long); #define CACHE_SIGNATURE 0x44495243 /* "DIRC" */ #define INDEX_FORMAT_LB 2 -#define INDEX_FORMAT_UB 4 +#define INDEX_FORMAT_UB 5 /* * The "cache_time" is just the low 32 bits of the @@ -121,6 +121,15 @@ struct stat_data { unsigned int sd_size; }; +/* + * The *next pointer is used in read_entries_v5 for holding + * all the elements of a directory, and points to the next + * cache_entry in a directory. + * + * It is reset by the add_name_hash call in set_index_entry + * to set it to point to the next cache_entry in the + * correct in-memory format ordering. + */ struct cache_entry { struct stat_data ce_stat_data; unsigned int ce_mode; @@ -132,11 +141,17 @@ struct cache_entry { char name[FLEX_ARRAY]; /* more */ }; +#define CE_NAMEMASK (0x0fff) #define CE_STAGEMASK (0x3000) #define CE_EXTENDED (0x4000) #define CE_VALID (0x8000) +#define CE_SMUDGED (0x0400) /* index v5 only flag */ #define CE_STAGESHIFT 12 +#define CONFLICT_CONFLICTED (0x8000) +#define CONFLICT_STAGESHIFT 13 +#define CONFLICT_STAGEMASK (0x6000) + /* * Range 0x in ce_flags is divided into * two parts: in-memory flags and on-disk ones. @@ -173,6 +188,19 @@ struct cache_entry { #define CE_EXTENDED_FLAGS (CE_INTENT_TO_ADD | CE_SKIP_WORKTREE) /* + * Representation of the extended on-disk flags in the v5 format. + * They must not collide with the ordinary on-disk flags, and need to + * fit in 16 bits. Note however that v5 does not save the name + * length. + */ +#define CE_INTENT_TO_ADD_V5 (0x4000) +#define CE_SKIP_WORKTREE_V5 (0x0800) +#define CE_INVALID_V5(0x0200) +#if (CE_VALID|CE_STAGEMASK) & (CE_INTENTTOADD_V5|CE_SKIPWORKTREE_V5|CE_INVALID_V5) +#error "v5 on-disk flags collide with ordinary on-disk flags" +#endif + +/* * Safeguard to avoid saving wrong flags: * - CE_EXTENDED2 won't get saved until its semantic is known * - Bits in 0x have been saved in ce_flags already @@ -213,6 +241,8 @@ static inline unsigned create_ce_flags(unsigned stage) #define ce_skip_worktree(ce) ((ce)->ce_flags & CE_SKIP_WORKTREE) #define ce_mark_uptodate(ce) ((ce)->ce_flags |= CE_UPTODATE) +#define conflict_stage(c) ((CONFLICT_STAGEMASK & (c)->flags) >> CONFLICT_STAGESHIFT) + #define ce_permissions(mode) (((mode) & 0100) ? 0755 : 0644) static inline unsigned int create_ce_mode(unsigned int mode) { diff --git a/read-cache-v5.c b/read-cache-v5.c new file mode 100644 index 000..799b8e7 --- /dev/null +++ b/read-cache-v5.c @@ -0,0 +1,473 @@ +#include "cache.h" +#include "read-cache.h" +#include "resolve-undo.h" +#include "cache-tree.h" +#include "dir.h" +#include "pathspec.h" + +#define ptr_add(x,y) ((void *)(((char *)(x)) + (y))) + +struct cache_header_v5 { + uint32_t hdr_ndir; + uint32_t hdr_fblockoffset; + uint32_t hdr_nextension; +}; + +struct directory_entry { + struct directory_entry **sub; + struct directory_entry *next; + struct directory_entry *next_hash; + struct cache_entry *ce; + struct cache_entry *ce_last; + struct conflict_entry *conflict; + struct conflict_entry *conflict_last; + uint32_t conflict_size; + uint32_t de_foffset; + uint32_t de_cr; + uint32_t de_ncr; + uint32_t de_nsubtrees; + uint32_t de_nfiles; + uint32_t de_nentries; + unsigned char sha1[20]; + uint16_t de_flags; + uint32_t de_pathlen; + char pathname[FLEX_ARRAY]; +}; + +struct conflict_part { + struct conflict_part *next; + uint16_t flags; + uint16_t entry_mode; + unsigned char sha1[20]; +}; + +struct conflict_entry { + struct conflict_entry *next; + uint32_t nfileconflicts; + struct conflict_part *entries; + uint32_t namelen; + uint32_t pathlen; + char name[FLEX_ARRAY]; +}; + +#define directory_entry_size(len) (offsetof(struct directory_entry,pathname) +