Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects

2013-01-27 Thread Duy Nguyen
On Mon, Jan 28, 2013 at 1:36 PM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
>
>> On Mon, Jan 28, 2013 at 12:48 PM, Duy Nguyen  wrote:
>>> Regardless the submodule odb issue, I think we should prefer
>>> reading local loose objects over alternate packed ones.
>>
>> I think I went from one problem to another and did not make it clear.
>> The reason behind this preference is security.
>
> Reading from ours first would not help you at all if you are lacking
> some that you do need from your local repository and the only
> solution is not to borrow from untrustworthy sources, I think.
>
> You borrow only from a trusted source in the first place, no?

Hmm.. yeah.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects

2013-01-27 Thread Junio C Hamano
Duy Nguyen  writes:

> On Mon, Jan 28, 2013 at 12:48 PM, Duy Nguyen  wrote:
>> Regardless the submodule odb issue, I think we should prefer
>> reading local loose objects over alternate packed ones.
>
> I think I went from one problem to another and did not make it clear.
> The reason behind this preference is security.

Reading from ours first would not help you at all if you are lacking
some that you do need from your local repository and the only
solution is not to borrow from untrustworthy sources, I think.

You borrow only from a trusted source in the first place, no?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects

2013-01-27 Thread Duy Nguyen
On Mon, Jan 28, 2013 at 12:48 PM, Duy Nguyen  wrote:
> Regardless the submodule odb issue, I think we should prefer
> reading local loose objects over alternate packed ones.

I think I went from one problem to another and did not make it clear.
The reason behind this preference is security. With "all packs first"
reading order, someone can create a pack in the alternate source and
that pack will override the same local loose objects (local packed
ones are safe). If someone can create a malicious version with the
same SHA-1, we (well, the user) are in trouble. If the user uses this
repository directly then the malicious object can be used without
detected, even if it does not match the original SHA-1, as we do not
always verify content against its SHA-1 for common commands.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects

2013-01-27 Thread Junio C Hamano
Duy Nguyen  writes:

>>> Another thing needs to be done for this to work. The current reading
>>
>> For *what* to work???
>
> The "forbid making our repository depend on objects we do not have but
> we know about afterwe peek submodule odb"

With your "when our object database is contaminated, check objects
we base our new object on are available in local or our alternates"
together with the "when we try write_object(), do not bypass it with
has_sha1_file() check because that may find the object in submodule odb
we should *not* have access to; instead check with the same 'local
or our alternates' test" I brought up in the message you were
responding to, I do not think object read order does not make a
difference to our effort to prevent the object database breakage due
to temporary contamination by submodule objects.

>>> Regardless the submodule odb issue, I think we should prefer
>>> reading local loose objects over alternate packed ones.

I suspect you are alluding to make write_object() check with
has_sha1_file_locally() so that we can wean our repository from
existing alternates, but I do not think it is a right approach
(instead, you just fully repack locally if you want to dissociate
yourself from your alternates).  What I was suggesting was to change
it to check with has_sha1_file_proper(), to make sure we do not omit
writing an object we need to access to, when we know it will vanish
once we stop temporarily borrowing from the submodule object store.

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects

2013-01-27 Thread Duy Nguyen
On Mon, Jan 28, 2013 at 12:54 PM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
>
>> On Sat, Jan 26, 2013 at 2:00 AM, Junio C Hamano  wrote:
>>> This is not a tangent, but if you want to go this "forbid making our
>>> repository depend on objects we do not have but we know about after
>>> we peek submodule odb" route [*1*], write_sha1_file() needs to be
>>> told about has_sha1_file_proper().  We may "git add" a new blob in
>>> the superproject, the blob may not yet exist in *our* repository,
>>> but may happen to already exist in the submodue odb.  In such a
>>> case, write_sha1_file() has to write that blob in our repository,
>>> without the existing has_sha1_file() check bypassing it.  Otherwise
>>> our attempt to create a tree that contains that blob will fail,
>>> saying that the blob only seems to exist to us via submodule odb but
>>> not in our repository.
>>
>> Another thing needs to be done for this to work. The current reading
>
> For *what* to work???

The "forbid making our repository depend on objects we do not have but
we know about afterwe peek submodule odb"

> I think the performance consideration is the
> only thing that should drive the read-order; correctness should not
> be affected.
>
>> order is packs first, loose objects next. If we create a local loose
>> duplicate of an alternate packed object, our local version will never
>> be read. Regardless the submodule odb issue, I think we should prefer
>> reading local loose objects over alternate packed ones.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects

2013-01-27 Thread Junio C Hamano
Duy Nguyen  writes:

> On Sat, Jan 26, 2013 at 2:00 AM, Junio C Hamano  wrote:
>> This is not a tangent, but if you want to go this "forbid making our
>> repository depend on objects we do not have but we know about after
>> we peek submodule odb" route [*1*], write_sha1_file() needs to be
>> told about has_sha1_file_proper().  We may "git add" a new blob in
>> the superproject, the blob may not yet exist in *our* repository,
>> but may happen to already exist in the submodue odb.  In such a
>> case, write_sha1_file() has to write that blob in our repository,
>> without the existing has_sha1_file() check bypassing it.  Otherwise
>> our attempt to create a tree that contains that blob will fail,
>> saying that the blob only seems to exist to us via submodule odb but
>> not in our repository.
>
> Another thing needs to be done for this to work. The current reading

For *what* to work???  I think the performance consideration is the
only thing that should drive the read-order; correctness should not
be affected.

> order is packs first, loose objects next. If we create a local loose
> duplicate of an alternate packed object, our local version will never
> be read. Regardless the submodule odb issue, I think we should prefer
> reading local loose objects over alternate packed ones.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects

2013-01-27 Thread Duy Nguyen
On Sat, Jan 26, 2013 at 2:00 AM, Junio C Hamano  wrote:
> This is not a tangent, but if you want to go this "forbid making our
> repository depend on objects we do not have but we know about after
> we peek submodule odb" route [*1*], write_sha1_file() needs to be
> told about has_sha1_file_proper().  We may "git add" a new blob in
> the superproject, the blob may not yet exist in *our* repository,
> but may happen to already exist in the submodue odb.  In such a
> case, write_sha1_file() has to write that blob in our repository,
> without the existing has_sha1_file() check bypassing it.  Otherwise
> our attempt to create a tree that contains that blob will fail,
> saying that the blob only seems to exist to us via submodule odb but
> not in our repository.

Another thing needs to be done for this to work. The current reading
order is packs first, loose objects next. If we create a local loose
duplicate of an alternate packed object, our local version will never
be read. Regardless the submodule odb issue, I think we should prefer
reading local loose objects over alternate packed ones.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects

2013-01-25 Thread Junio C Hamano
Duy Nguyen  writes:

>  Even
> when cache-tree is not involved, I do not want the index to point to
> an non-existing SHA-1 ("git diff --cached" may fail next time, for
> example).

I think we have tests that explicitly add SHA-1 that names an object
that does not exist to the index and check failures; you may have to
think what to do with them.

>> This is a tangent, but in addition, you may want to add an even
>> narrower variant that checks the same but ignoring _all_ alternate
>> object databases, "external" or not:
>>
>> int has_sha1_file_local(const unsigned char *sha1);
>>
>> That may be useful if we want to later make the alternate safer to
>> use; instead of letting the codepath to create an object ask
>> has_sha1_file() to see if it already exists and refrain from writing
>> a new copy, we switch to ask has_sha1_file_locally() and even if an
>> alternate object database we borrow from has it, we keep our own
>> copy in our repository.

This is not a tangent, but if you want to go this "forbid making our
repository depend on objects we do not have but we know about after
we peek submodule odb" route [*1*], write_sha1_file() needs to be
told about has_sha1_file_proper().  We may "git add" a new blob in
the superproject, the blob may not yet exist in *our* repository,
but may happen to already exist in the submodue odb.  In such a
case, write_sha1_file() has to write that blob in our repository,
without the existing has_sha1_file() check bypassing it.  Otherwise
our attempt to create a tree that contains that blob will fail,
saying that the blob only seems to exist to us via submodule odb but
not in our repository.


[Footnote]

*1* which I do not necessarily agree with---I am in favor of getting
rid of add_submodule_odb() to fix these issues at the root cause of
them.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects

2013-01-24 Thread Duy Nguyen
On Fri, Jan 25, 2013 at 2:15 AM, Junio C Hamano  wrote:
> Nguyễn Thái Ngọc Duy   writes:
>
>> Signed-off-by: Nguyễn Thái Ngọc Duy 
>> ---
>>  read-cache.c | 20 
>>  1 file changed, 20 insertions(+)
>>
>> diff --git a/read-cache.c b/read-cache.c
>> index fda78bc..4579215 100644
>> --- a/read-cache.c
>> +++ b/read-cache.c
>> @@ -1720,6 +1720,26 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, 
>> struct cache_entry *ce,
>> ce->name + common, ce_namelen(ce) - common);
>>   }
>>
>> + if (object_database_contaminated) {
>> + struct object_info oi;
>> + switch (ce->ce_mode) {
>> + case S_IFGITLINK:
>> + break;
>> + case S_IFDIR:
>> + if (sha1_object_info_extended(ce->sha1, &oi) != 
>> OBJ_TREE ||
>
> This case should never happen.  Do we store any tree object in the
> index in the first place?

No it was copy/paste mistake (from cache-tree.c, before I deleted it
and added check_sha1_file_for_external_source in 3/7)

>> + (oi.alt && oi.alt->external))
>> + die("cannot create index referring to an 
>> external tree %s",
>> + sha1_to_hex(ce->sha1));
>> + break;
>> + default:
>> + if (sha1_object_info_extended(ce->sha1, &oi) != 
>> OBJ_BLOB ||
>> + (oi.alt && oi.alt->external))
>> + die("cannot create index referring to an 
>> external blob %s",
>> + sha1_to_hex(ce->sha1));
>> + break;
>> + }
>> + }
>> +
>>   result = ce_write(c, fd, ondisk, size);
>>   free(ondisk);
>>   return result;
>
> I think the check you want to add is to the cache-tree code; before
> writing the new index out, if you suspect you might have called
> cache_tree_update() before, invalidate any hierarchy that is
> recorded to produce a tree object that refers to an object that is
> only available through external object store.

cache-tree is covered by check_sha1_file_for_external_source() when it
actually writes a tree (dry-run mode goes through unchecked). Even
when cache-tree is not involved, I do not want the index to point to
an non-existing SHA-1 ("git diff --cached" may fail next time, for
example).

> As to the logic to check if a object is something that we should
> refuse to create a new dependent, I think you should not butcher
> sha1_object_info_extended(), and instead add a new call that checks
> if a given SHA-1 yields an object if you ignore alternate object
> databases that are marked as "external", whose signature may
> resemble:
>
> int has_sha1_file_proper(const unsigned char *sha1);
>
> or something.

Agreed.

> This is a tangent, but in addition, you may want to add an even
> narrower variant that checks the same but ignoring _all_ alternate
> object databases, "external" or not:
>
> int has_sha1_file_local(const unsigned char *sha1);
>
> That may be useful if we want to later make the alternate safer to
> use; instead of letting the codepath to create an object ask
> has_sha1_file() to see if it already exists and refrain from writing
> a new copy, we switch to ask has_sha1_file_locally() and even if an
> alternate object database we borrow from has it, we keep our own
> copy in our repository.
>
> Not tested beyond syntax, but rebasing something like this patch on
> top of your "mark external sources" change may take us closer to
> that.

Thanks, will incorporate in the reroll.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 6/7] read-cache: refuse to create index referring to external objects

2013-01-24 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  read-cache.c | 20 
>  1 file changed, 20 insertions(+)
>
> diff --git a/read-cache.c b/read-cache.c
> index fda78bc..4579215 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1720,6 +1720,26 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, 
> struct cache_entry *ce,
> ce->name + common, ce_namelen(ce) - common);
>   }
>  
> + if (object_database_contaminated) {
> + struct object_info oi;
> + switch (ce->ce_mode) {
> + case S_IFGITLINK:
> + break;
> + case S_IFDIR:
> + if (sha1_object_info_extended(ce->sha1, &oi) != 
> OBJ_TREE ||

This case should never happen.  Do we store any tree object in the
index in the first place?

> + (oi.alt && oi.alt->external))
> + die("cannot create index referring to an 
> external tree %s",
> + sha1_to_hex(ce->sha1));
> + break;
> + default:
> + if (sha1_object_info_extended(ce->sha1, &oi) != 
> OBJ_BLOB ||
> + (oi.alt && oi.alt->external))
> + die("cannot create index referring to an 
> external blob %s",
> + sha1_to_hex(ce->sha1));
> + break;
> + }
> + }
> +
>   result = ce_write(c, fd, ondisk, size);
>   free(ondisk);
>   return result;

I think the check you want to add is to the cache-tree code; before
writing the new index out, if you suspect you might have called
cache_tree_update() before, invalidate any hierarchy that is
recorded to produce a tree object that refers to an object that is
only available through external object store.

As to the logic to check if a object is something that we should
refuse to create a new dependent, I think you should not butcher
sha1_object_info_extended(), and instead add a new call that checks
if a given SHA-1 yields an object if you ignore alternate object
databases that are marked as "external", whose signature may
resemble:

int has_sha1_file_proper(const unsigned char *sha1);

or something.

This is a tangent, but in addition, you may want to add an even
narrower variant that checks the same but ignoring _all_ alternate
object databases, "external" or not:

int has_sha1_file_local(const unsigned char *sha1);

That may be useful if we want to later make the alternate safer to
use; instead of letting the codepath to create an object ask
has_sha1_file() to see if it already exists and refrain from writing
a new copy, we switch to ask has_sha1_file_locally() and even if an
alternate object database we borrow from has it, we keep our own
copy in our repository.

Not tested beyond syntax, but rebasing something like this patch on
top of your "mark external sources" change may take us closer to
that.

 cache.h |  2 ++
 sha1_file.c | 60 +++-
 2 files changed, 57 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index 1f96f65..8d651b6 100644
--- a/cache.h
+++ b/cache.h
@@ -766,6 +766,8 @@ extern int move_temp_to_file(const char *tmpfile, const 
char *filename);
 
 extern int has_sha1_pack(const unsigned char *sha1);
 extern int has_sha1_file(const unsigned char *sha1);
+extern int has_sha1_file_proper(const unsigned char *sha1);
+extern int has_sha1_file_local(const unsigned char *sha1);
 extern int has_loose_object_nonlocal(const unsigned char *sha1);
 
 extern int has_pack_index(const unsigned char *sha1);
diff --git a/sha1_file.c b/sha1_file.c
index 40b2329..1a3192a 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -420,11 +420,18 @@ static int has_loose_object_local(const unsigned char 
*sha1)
return !access(name, F_OK);
 }
 
-int has_loose_object_nonlocal(const unsigned char *sha1)
+enum odb_origin {
+   odb_default = 0, odb_proper, odb_local
+};
+
+static int has_loose_object_nonlocal_limited(const unsigned char *sha1,
+enum odb_origin origin)
 {
struct alternate_object_database *alt;
prepare_alt_odb();
for (alt = alt_odb_list; alt; alt = alt->next) {
+   if (origin == odb_proper && 0 /* alt->external */)
+   continue;
fill_sha1_path(alt->name, sha1);
if (!access(alt->base, F_OK))
return 1;
@@ -432,6 +439,11 @@ int has_loose_object_nonlocal(const unsigned char *sha1)
return 0;
 }
 
+int has_loose_object_nonlocal(const unsigned char *sha1)
+{
+   return has_loose_object_nonlocal_limited(sha1, odb_default);
+}
+
 static int has_loose_object(const unsigned char *sha1)
 {
return has_loose_object_local(sha1) ||
@@ -2062,12 +2074,28 @@ int is_pack_valid(struct packed_git 

[PATCH 6/7] read-cache: refuse to create index referring to external objects

2013-01-24 Thread Nguyễn Thái Ngọc Duy

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 read-cache.c | 20 
 1 file changed, 20 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index fda78bc..4579215 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1720,6 +1720,26 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct 
cache_entry *ce,
  ce->name + common, ce_namelen(ce) - common);
}
 
+   if (object_database_contaminated) {
+   struct object_info oi;
+   switch (ce->ce_mode) {
+   case S_IFGITLINK:
+   break;
+   case S_IFDIR:
+   if (sha1_object_info_extended(ce->sha1, &oi) != 
OBJ_TREE ||
+   (oi.alt && oi.alt->external))
+   die("cannot create index referring to an 
external tree %s",
+   sha1_to_hex(ce->sha1));
+   break;
+   default:
+   if (sha1_object_info_extended(ce->sha1, &oi) != 
OBJ_BLOB ||
+   (oi.alt && oi.alt->external))
+   die("cannot create index referring to an 
external blob %s",
+   sha1_to_hex(ce->sha1));
+   break;
+   }
+   }
+
result = ce_write(c, fd, ondisk, size);
free(ondisk);
return result;
-- 
1.8.0.rc3.18.g0d9b108

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html