Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects
On Mon, Jan 28, 2013 at 1:36 PM, Junio C Hamano wrote: > Duy Nguyen writes: > >> On Mon, Jan 28, 2013 at 12:48 PM, Duy Nguyen wrote: >>> Regardless the submodule odb issue, I think we should prefer >>> reading local loose objects over alternate packed ones. >> >> I think I went from one problem to another and did not make it clear. >> The reason behind this preference is security. > > Reading from ours first would not help you at all if you are lacking > some that you do need from your local repository and the only > solution is not to borrow from untrustworthy sources, I think. > > You borrow only from a trusted source in the first place, no? Hmm.. yeah. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects
Duy Nguyen writes: > On Mon, Jan 28, 2013 at 12:48 PM, Duy Nguyen wrote: >> Regardless the submodule odb issue, I think we should prefer >> reading local loose objects over alternate packed ones. > > I think I went from one problem to another and did not make it clear. > The reason behind this preference is security. Reading from ours first would not help you at all if you are lacking some that you do need from your local repository and the only solution is not to borrow from untrustworthy sources, I think. You borrow only from a trusted source in the first place, no? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects
On Mon, Jan 28, 2013 at 12:48 PM, Duy Nguyen wrote: > Regardless the submodule odb issue, I think we should prefer > reading local loose objects over alternate packed ones. I think I went from one problem to another and did not make it clear. The reason behind this preference is security. With "all packs first" reading order, someone can create a pack in the alternate source and that pack will override the same local loose objects (local packed ones are safe). If someone can create a malicious version with the same SHA-1, we (well, the user) are in trouble. If the user uses this repository directly then the malicious object can be used without detected, even if it does not match the original SHA-1, as we do not always verify content against its SHA-1 for common commands. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects
Duy Nguyen writes: >>> Another thing needs to be done for this to work. The current reading >> >> For *what* to work??? > > The "forbid making our repository depend on objects we do not have but > we know about afterwe peek submodule odb" With your "when our object database is contaminated, check objects we base our new object on are available in local or our alternates" together with the "when we try write_object(), do not bypass it with has_sha1_file() check because that may find the object in submodule odb we should *not* have access to; instead check with the same 'local or our alternates' test" I brought up in the message you were responding to, I do not think object read order does not make a difference to our effort to prevent the object database breakage due to temporary contamination by submodule objects. >>> Regardless the submodule odb issue, I think we should prefer >>> reading local loose objects over alternate packed ones. I suspect you are alluding to make write_object() check with has_sha1_file_locally() so that we can wean our repository from existing alternates, but I do not think it is a right approach (instead, you just fully repack locally if you want to dissociate yourself from your alternates). What I was suggesting was to change it to check with has_sha1_file_proper(), to make sure we do not omit writing an object we need to access to, when we know it will vanish once we stop temporarily borrowing from the submodule object store. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects
On Mon, Jan 28, 2013 at 12:54 PM, Junio C Hamano wrote: > Duy Nguyen writes: > >> On Sat, Jan 26, 2013 at 2:00 AM, Junio C Hamano wrote: >>> This is not a tangent, but if you want to go this "forbid making our >>> repository depend on objects we do not have but we know about after >>> we peek submodule odb" route [*1*], write_sha1_file() needs to be >>> told about has_sha1_file_proper(). We may "git add" a new blob in >>> the superproject, the blob may not yet exist in *our* repository, >>> but may happen to already exist in the submodue odb. In such a >>> case, write_sha1_file() has to write that blob in our repository, >>> without the existing has_sha1_file() check bypassing it. Otherwise >>> our attempt to create a tree that contains that blob will fail, >>> saying that the blob only seems to exist to us via submodule odb but >>> not in our repository. >> >> Another thing needs to be done for this to work. The current reading > > For *what* to work??? The "forbid making our repository depend on objects we do not have but we know about afterwe peek submodule odb" > I think the performance consideration is the > only thing that should drive the read-order; correctness should not > be affected. > >> order is packs first, loose objects next. If we create a local loose >> duplicate of an alternate packed object, our local version will never >> be read. Regardless the submodule odb issue, I think we should prefer >> reading local loose objects over alternate packed ones. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects
Duy Nguyen writes: > On Sat, Jan 26, 2013 at 2:00 AM, Junio C Hamano wrote: >> This is not a tangent, but if you want to go this "forbid making our >> repository depend on objects we do not have but we know about after >> we peek submodule odb" route [*1*], write_sha1_file() needs to be >> told about has_sha1_file_proper(). We may "git add" a new blob in >> the superproject, the blob may not yet exist in *our* repository, >> but may happen to already exist in the submodue odb. In such a >> case, write_sha1_file() has to write that blob in our repository, >> without the existing has_sha1_file() check bypassing it. Otherwise >> our attempt to create a tree that contains that blob will fail, >> saying that the blob only seems to exist to us via submodule odb but >> not in our repository. > > Another thing needs to be done for this to work. The current reading For *what* to work??? I think the performance consideration is the only thing that should drive the read-order; correctness should not be affected. > order is packs first, loose objects next. If we create a local loose > duplicate of an alternate packed object, our local version will never > be read. Regardless the submodule odb issue, I think we should prefer > reading local loose objects over alternate packed ones. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects
On Sat, Jan 26, 2013 at 2:00 AM, Junio C Hamano wrote: > This is not a tangent, but if you want to go this "forbid making our > repository depend on objects we do not have but we know about after > we peek submodule odb" route [*1*], write_sha1_file() needs to be > told about has_sha1_file_proper(). We may "git add" a new blob in > the superproject, the blob may not yet exist in *our* repository, > but may happen to already exist in the submodue odb. In such a > case, write_sha1_file() has to write that blob in our repository, > without the existing has_sha1_file() check bypassing it. Otherwise > our attempt to create a tree that contains that blob will fail, > saying that the blob only seems to exist to us via submodule odb but > not in our repository. Another thing needs to be done for this to work. The current reading order is packs first, loose objects next. If we create a local loose duplicate of an alternate packed object, our local version will never be read. Regardless the submodule odb issue, I think we should prefer reading local loose objects over alternate packed ones. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects
Duy Nguyen writes: > Even > when cache-tree is not involved, I do not want the index to point to > an non-existing SHA-1 ("git diff --cached" may fail next time, for > example). I think we have tests that explicitly add SHA-1 that names an object that does not exist to the index and check failures; you may have to think what to do with them. >> This is a tangent, but in addition, you may want to add an even >> narrower variant that checks the same but ignoring _all_ alternate >> object databases, "external" or not: >> >> int has_sha1_file_local(const unsigned char *sha1); >> >> That may be useful if we want to later make the alternate safer to >> use; instead of letting the codepath to create an object ask >> has_sha1_file() to see if it already exists and refrain from writing >> a new copy, we switch to ask has_sha1_file_locally() and even if an >> alternate object database we borrow from has it, we keep our own >> copy in our repository. This is not a tangent, but if you want to go this "forbid making our repository depend on objects we do not have but we know about after we peek submodule odb" route [*1*], write_sha1_file() needs to be told about has_sha1_file_proper(). We may "git add" a new blob in the superproject, the blob may not yet exist in *our* repository, but may happen to already exist in the submodue odb. In such a case, write_sha1_file() has to write that blob in our repository, without the existing has_sha1_file() check bypassing it. Otherwise our attempt to create a tree that contains that blob will fail, saying that the blob only seems to exist to us via submodule odb but not in our repository. [Footnote] *1* which I do not necessarily agree with---I am in favor of getting rid of add_submodule_odb() to fix these issues at the root cause of them. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects
On Fri, Jan 25, 2013 at 2:15 AM, Junio C Hamano wrote: > Nguyễn Thái Ngọc Duy writes: > >> Signed-off-by: Nguyễn Thái Ngọc Duy >> --- >> read-cache.c | 20 >> 1 file changed, 20 insertions(+) >> >> diff --git a/read-cache.c b/read-cache.c >> index fda78bc..4579215 100644 >> --- a/read-cache.c >> +++ b/read-cache.c >> @@ -1720,6 +1720,26 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, >> struct cache_entry *ce, >> ce->name + common, ce_namelen(ce) - common); >> } >> >> + if (object_database_contaminated) { >> + struct object_info oi; >> + switch (ce->ce_mode) { >> + case S_IFGITLINK: >> + break; >> + case S_IFDIR: >> + if (sha1_object_info_extended(ce->sha1, &oi) != >> OBJ_TREE || > > This case should never happen. Do we store any tree object in the > index in the first place? No it was copy/paste mistake (from cache-tree.c, before I deleted it and added check_sha1_file_for_external_source in 3/7) >> + (oi.alt && oi.alt->external)) >> + die("cannot create index referring to an >> external tree %s", >> + sha1_to_hex(ce->sha1)); >> + break; >> + default: >> + if (sha1_object_info_extended(ce->sha1, &oi) != >> OBJ_BLOB || >> + (oi.alt && oi.alt->external)) >> + die("cannot create index referring to an >> external blob %s", >> + sha1_to_hex(ce->sha1)); >> + break; >> + } >> + } >> + >> result = ce_write(c, fd, ondisk, size); >> free(ondisk); >> return result; > > I think the check you want to add is to the cache-tree code; before > writing the new index out, if you suspect you might have called > cache_tree_update() before, invalidate any hierarchy that is > recorded to produce a tree object that refers to an object that is > only available through external object store. cache-tree is covered by check_sha1_file_for_external_source() when it actually writes a tree (dry-run mode goes through unchecked). Even when cache-tree is not involved, I do not want the index to point to an non-existing SHA-1 ("git diff --cached" may fail next time, for example). > As to the logic to check if a object is something that we should > refuse to create a new dependent, I think you should not butcher > sha1_object_info_extended(), and instead add a new call that checks > if a given SHA-1 yields an object if you ignore alternate object > databases that are marked as "external", whose signature may > resemble: > > int has_sha1_file_proper(const unsigned char *sha1); > > or something. Agreed. > This is a tangent, but in addition, you may want to add an even > narrower variant that checks the same but ignoring _all_ alternate > object databases, "external" or not: > > int has_sha1_file_local(const unsigned char *sha1); > > That may be useful if we want to later make the alternate safer to > use; instead of letting the codepath to create an object ask > has_sha1_file() to see if it already exists and refrain from writing > a new copy, we switch to ask has_sha1_file_locally() and even if an > alternate object database we borrow from has it, we keep our own > copy in our repository. > > Not tested beyond syntax, but rebasing something like this patch on > top of your "mark external sources" change may take us closer to > that. Thanks, will incorporate in the reroll. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects
Nguyễn Thái Ngọc Duy writes: > Signed-off-by: Nguyễn Thái Ngọc Duy > --- > read-cache.c | 20 > 1 file changed, 20 insertions(+) > > diff --git a/read-cache.c b/read-cache.c > index fda78bc..4579215 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -1720,6 +1720,26 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, > struct cache_entry *ce, > ce->name + common, ce_namelen(ce) - common); > } > > + if (object_database_contaminated) { > + struct object_info oi; > + switch (ce->ce_mode) { > + case S_IFGITLINK: > + break; > + case S_IFDIR: > + if (sha1_object_info_extended(ce->sha1, &oi) != > OBJ_TREE || This case should never happen. Do we store any tree object in the index in the first place? > + (oi.alt && oi.alt->external)) > + die("cannot create index referring to an > external tree %s", > + sha1_to_hex(ce->sha1)); > + break; > + default: > + if (sha1_object_info_extended(ce->sha1, &oi) != > OBJ_BLOB || > + (oi.alt && oi.alt->external)) > + die("cannot create index referring to an > external blob %s", > + sha1_to_hex(ce->sha1)); > + break; > + } > + } > + > result = ce_write(c, fd, ondisk, size); > free(ondisk); > return result; I think the check you want to add is to the cache-tree code; before writing the new index out, if you suspect you might have called cache_tree_update() before, invalidate any hierarchy that is recorded to produce a tree object that refers to an object that is only available through external object store. As to the logic to check if a object is something that we should refuse to create a new dependent, I think you should not butcher sha1_object_info_extended(), and instead add a new call that checks if a given SHA-1 yields an object if you ignore alternate object databases that are marked as "external", whose signature may resemble: int has_sha1_file_proper(const unsigned char *sha1); or something. This is a tangent, but in addition, you may want to add an even narrower variant that checks the same but ignoring _all_ alternate object databases, "external" or not: int has_sha1_file_local(const unsigned char *sha1); That may be useful if we want to later make the alternate safer to use; instead of letting the codepath to create an object ask has_sha1_file() to see if it already exists and refrain from writing a new copy, we switch to ask has_sha1_file_locally() and even if an alternate object database we borrow from has it, we keep our own copy in our repository. Not tested beyond syntax, but rebasing something like this patch on top of your "mark external sources" change may take us closer to that. cache.h | 2 ++ sha1_file.c | 60 +++- 2 files changed, 57 insertions(+), 5 deletions(-) diff --git a/cache.h b/cache.h index 1f96f65..8d651b6 100644 --- a/cache.h +++ b/cache.h @@ -766,6 +766,8 @@ extern int move_temp_to_file(const char *tmpfile, const char *filename); extern int has_sha1_pack(const unsigned char *sha1); extern int has_sha1_file(const unsigned char *sha1); +extern int has_sha1_file_proper(const unsigned char *sha1); +extern int has_sha1_file_local(const unsigned char *sha1); extern int has_loose_object_nonlocal(const unsigned char *sha1); extern int has_pack_index(const unsigned char *sha1); diff --git a/sha1_file.c b/sha1_file.c index 40b2329..1a3192a 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -420,11 +420,18 @@ static int has_loose_object_local(const unsigned char *sha1) return !access(name, F_OK); } -int has_loose_object_nonlocal(const unsigned char *sha1) +enum odb_origin { + odb_default = 0, odb_proper, odb_local +}; + +static int has_loose_object_nonlocal_limited(const unsigned char *sha1, +enum odb_origin origin) { struct alternate_object_database *alt; prepare_alt_odb(); for (alt = alt_odb_list; alt; alt = alt->next) { + if (origin == odb_proper && 0 /* alt->external */) + continue; fill_sha1_path(alt->name, sha1); if (!access(alt->base, F_OK)) return 1; @@ -432,6 +439,11 @@ int has_loose_object_nonlocal(const unsigned char *sha1) return 0; } +int has_loose_object_nonlocal(const unsigned char *sha1) +{ + return has_loose_object_nonlocal_limited(sha1, odb_default); +} + static int has_loose_object(const unsigned char *sha1) { return has_loose_object_local(sha1) || @@ -2062,12 +2074,28 @@ int is_pack_valid(struct packed_git
[PATCH 6/7] read-cache: refuse to create index referring to external objects
Signed-off-by: Nguyễn Thái Ngọc Duy --- read-cache.c | 20 1 file changed, 20 insertions(+) diff --git a/read-cache.c b/read-cache.c index fda78bc..4579215 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1720,6 +1720,26 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct cache_entry *ce, ce->name + common, ce_namelen(ce) - common); } + if (object_database_contaminated) { + struct object_info oi; + switch (ce->ce_mode) { + case S_IFGITLINK: + break; + case S_IFDIR: + if (sha1_object_info_extended(ce->sha1, &oi) != OBJ_TREE || + (oi.alt && oi.alt->external)) + die("cannot create index referring to an external tree %s", + sha1_to_hex(ce->sha1)); + break; + default: + if (sha1_object_info_extended(ce->sha1, &oi) != OBJ_BLOB || + (oi.alt && oi.alt->external)) + die("cannot create index referring to an external blob %s", + sha1_to_hex(ce->sha1)); + break; + } + } + result = ce_write(c, fd, ondisk, size); free(ondisk); return result; -- 1.8.0.rc3.18.g0d9b108 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html