Re: [PATCH] Allow hideRefs to match refs outside the namespace

2015-11-01 Thread Lukas Fleischer
On Sun, 01 Nov 2015 at 00:40:39, Lukas Fleischer wrote:
> On Sat, 31 Oct 2015 at 18:31:23, Junio C Hamano wrote:
> > [...]
> > You earlier (re)discovered a good approach to introduce a new
> > feature without breaking settings of existing users when we
> > discussed a "whitelist".  Since setting the configuration to an
> > empty string did not do anything in the old code, an empty string
> > was an invalid and non-working setting.  By taking advantage of that
> > fact, you safely can say "if you start with an empty that would
> > match everything, we'll treat all the others differently from the
> > way we did before" if you wanted to.  I think you can follow the
> > same principle here.  For example, I can imagine that the rule for
> > the "ref-is-hidden" can be updated to:
> > 
> >  * it now takes refname and also the fullname before stripping the
> >namespace;
> > 
> >  * hide patterns that is prefixed with '!' means negative, just as
> >before;
> > 
> >  * (after possibly '!' is stripped), hide patterns that is prefixed
> >with '^', which was invalid before, means check the fullname with
> >namespace prefix, which is a new rule;
> > 
> >  * otherwise, check the refname after stripping the namespace.
> > 
> > Such an update would allow a new feature "we now allow you to write
> > a pattern that determines the match before stripping the namespace
> > prefix" without breaking the existing repositories, no?
> > 
> 
> Yes. If I understood you correctly, this is exactly what I suggested in
> the last paragraph of my previous email (the only difference being that
> I suggested to use "/" as full name indicator instead of "^" but that is
> just an implementation detail). I will look into implementing this if
> that is the way we want to go.
> [...]

There are two more things I noticed.

Firstly, while looking for other callers of ref_is_hidden(), I realized
that send_ref() in upload-pack.c contains these lines of code:

const char *refname_nons = strip_namespace(refname);
struct object_id peeled;

if (mark_our_ref(refname, oid)) 
return 0;   

where mark_our_ref() performs the ref_is_hidden() check on its first
parameter. So, in contrast to receive-pack, we already match the
original full reference (and not the stripped one) against the hideRefs
pattern there. In particular, when using transfer.hideRefs, the same
pattern does different things when receiving and uploading.

Now, this cannot be intended behavior and I do not think this is
something we want to retain when improving that feature. My suggestion
is:

1. Define the (current) semantics of hideRefs pattern. It either needs
   to be defined to match full references or stripped references. Both
   definitions are equivalent when Git namespaces are not used.
   
   It probably makes sense to define hideRefs patterns to match stripped
   references. If anybody relied on the upload-pack behavior of patterns
   matching full references, it may happen that more refs are hidden
   when that behavior is adjusted to match the new hideRefs semantics.
   The administrator would become aware of that change soon if it
   affects anything (i.e. hides things that should not be hidden). But I
   am pretty sure that this behavior isn't currently being relied on
   either way. Both Git namespaces and hideRefs aren't very popular
   features and anybody using that combination would probably have
   noticed the inconsistency and reported a bug earlier.

2. Improve the documentation and describe the hideRefs semantics better.
   Include details on the choice we made in (1).

3. Fix the send_ref() code in either receive-pack or upload-pack,
   depending on which is buggy according to our new definition.

4. Improve hideRefs patterns and allow to match both full references and
   stripped references by using a special indicator as suggested
   earlier.

5. Add a note on the change in behavior to the release notes of the
   release that "breaks backwards compatibility". Putting it in quotes
   because I actually think that we are fixing a bug rather than
   breaking compatibility. But since there was no documentation on the
   correct behavior, the former implementation was, technically, the
   only specification of "correct" behavior that existed at that
   point...

The second thing I noticed is that having syntax for allowing matches
against both full references and stripped references is extremely handy
and desirable, even if we would not have to introduce it for backwards
compatibility. For example, using the syntax Junio described earlier, my
initial use case could be solved by

receive.hideRefs=^refs/
receive.hideRefs=!refs/

which means "Hide all references but do not hide references from 

Re: [PATCH] Allow hideRefs to match refs outside the namespace

2015-11-01 Thread Junio C Hamano
Lukas Fleischer  writes:

> Now, this cannot be intended behavior and I do not think this is
> something we want to retain when improving that feature.

Yup, that makes me suspect that namespace support with hiderefs was
done without giving much thought even stronger than before, and the
fact that nobody has brought it up so far suggests it would be much
smaller deal than usual if a fix brings in incompatibilities to
those who use namespaces.

> 1. Define the (current) semantics of hideRefs pattern. It either needs
>to be defined to match full references or stripped references. Both
>definitions are equivalent when Git namespaces are not used.
>
>It probably makes sense to define hideRefs patterns to match stripped
>references.

OK.

> 2. Improve the documentation and describe the hideRefs semantics better.
> 3. Fix the send_ref() code in either receive-pack or upload-pack,
> 4. Improve hideRefs patterns and allow to match both full references and
> 5. Add a note on the change in behavior to the release notes of the

All OK.

> The second thing I noticed is that having syntax for allowing matches
> against both full references and stripped references is extremely handy
> and desirable, even if we would not have to introduce it for backwards
> compatibility. For example, using the syntax Junio described earlier, my
> initial use case could be solved by
>
> receive.hideRefs=^refs/
> receive.hideRefs=!refs/
>
> which means "Hide all references but do not hide references from the
> current namespace." Here, I am assuming that patterns for stripped refs
> never match anything outside the current namespace because those
> patterns become NULL after stripping.

I would instead assume that the presence of ^ (or !^) in front would
signal "do not strip before checking".  !refs/ would mean "after
stripping, does it begin with refs/?  If so then do not hide it".

But that does not change the conclusion.  With ^refs/ that says
"hide everything that matches refs/ before stripping" (i.e. do not
include anything from anywhere), that is overriden by !refs/ that
says "but do not hide anything that matches refs/ after stripping"
(do include everything from my namespace), I'd think that you'd get
your desired behaviour.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Allow hideRefs to match refs outside the namespace

2015-10-31 Thread Lukas Fleischer
I wrote this email on Thursday but it seems like it did not make it
through the mailing list. Resubmitting...

On Wed, 28 Oct 2015 at 17:21:59, Junio C Hamano wrote:
> Lukas Fleischer  writes:
> 
> > Right now, refs with a path outside the current namespace are replaced
> > by ".have" before passing them to show_ref() which in turn checks
> > whether the ref matches the hideRefs pattern. Move the check before the
> > path substitution in show_ref_cb() such that the hideRefs feature can be
> > used to hide specific refs outside the current namespace.
> >
> > Signed-off-by: Lukas Fleischer 
> > ---
> > The other show_ref() call sites are in show_one_alternate_sha1() and in
> > write_head_info(). The call site in show_one_alternate_sha1() is for
> > alternates and passes ".have". The other one is
> >
> > show_ref("capabilities^{}", null_sha1);
> >
> > and is not relevant to the hideRefs feature. Note that this kind of
> > breaks backwards compatibility since the "magic" hideRefs patterns
> > ".have" and "capabilities^{}" no longer work, as explained in the
> > discussion.
> 
> If somebody is using namespaces and has "refs/frotz/" in the
> hiderefs configuration, we hide refs/frotz/ no matter which
> namespace is being accessed.  With this change, with the removal the
> check from show_ref(), wouldn't such a repository suddenly see a
> behaviour change?
> [...]

It would indeed. However, we cannot stay 100% backwards compatible when
adding support for matching refs outside the current namespace without
introducing new syntax. For example, if Git namespaces are in use (i.e.
GIT_NAMESPACE is set), "refs/namespaces/foo/refs/bar" in hideRefs would
not have hidden refs/namespaces/foo/refs/bar before the change but it
does afterwards. You might argue that nobody would have added
"refs/namespaces/foo/refs/bar" to hideRefs in the first place but
namespaces can be nested and it might be that the user meant to hide
refs/namespaces/bar/refs/namespaces/foo/refs/bar instead. Yes, those are
weird corner cases. But then again, I think that using hideRefs with
namespaces already is a corner case as well. I also think that using the
same syntax to match both original and stripped refs is bad design. It
makes things complicated and the resulting feature doesn't have the full
expressive power of the simpler version only matching original refs.

So, we can either intentionally break backwards compatibility for some
rare corner cases, or keep the current behavior and introduce some new
syntax for matching the original (unstripped) refs. For the latter, we
could either introduce a new option ("hideUnstrippedRefs"?) or special
syntax inside hideRefs ("/refs/foo" instead of "refs/foo" for matching
the unstripped version only?) What do you think?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Allow hideRefs to match refs outside the namespace

2015-10-31 Thread Junio C Hamano
Lukas Fleischer  writes:

>> If somebody is using namespaces and has "refs/frotz/" in the
>> hiderefs configuration, we hide refs/frotz/ no matter which
>> namespace is being accessed.  With this change, with the removal the
>> check from show_ref(), wouldn't such a repository suddenly see a
>> behaviour change?
>> [...]
>
> It would indeed. However, we cannot stay 100% backwards compatible when
> adding support for matching refs outside the current namespace without
> introducing new syntax. For example, if Git namespaces are in use (i.e.
> GIT_NAMESPACE is set), "refs/namespaces/foo/refs/bar" in hideRefs would
> not have hidden refs/namespaces/foo/refs/bar before the change but it
> does afterwards. You might argue that nobody would have added
> "refs/namespaces/foo/refs/bar" to hideRefs in the first place...

I won't.  To the current users, when they say they want to exclude
"refs/foo", they mean they do not want to advertise the fact that a
ref "refs/foo/*" exists in their repository (either the whole thing
if that is how it is accessed, or in the namespace being
accessed). and you can replace "foo" with any string, including the
ones that contain "/namespaces/", i.e. the user wanted to exclude
refs from nested ones.

I suspect what you wrote in the above is being a bit too defeatist,
though.  We only need to prevent regressions to user with existing
and valid configurations.

You earlier (re)discovered a good approach to introduce a new
feature without breaking settings of existing users when we
discussed a "whitelist".  Since setting the configuration to an
empty string did not do anything in the old code, an empty string
was an invalid and non-working setting.  By taking advantage of that
fact, you safely can say "if you start with an empty that would
match everything, we'll treat all the others differently from the
way we did before" if you wanted to.  I think you can follow the
same principle here.  For example, I can imagine that the rule for
the "ref-is-hidden" can be updated to:

 * it now takes refname and also the fullname before stripping the
   namespace;

 * hide patterns that is prefixed with '!' means negative, just as
   before;

 * (after possibly '!' is stripped), hide patterns that is prefixed
   with '^', which was invalid before, means check the fullname with
   namespace prefix, which is a new rule;

 * otherwise, check the refname after stripping the namespace.

Such an update would allow a new feature "we now allow you to write
a pattern that determines the match before stripping the namespace
prefix" without breaking the existing repositories, no?

After looking at the current code, I have to say that the way
ref-is-hidden and show_ref_cb() interact with each other is not very
well designed when namespaces are in use.  I suspect that this is
because the "namespace" stuff was bolted on to the system without
thinking things through.  For example, people may want to hide
refs/changes/* and with the current code, refs/changes/* from your
own namespace will be filtered out, but the corresponding hierarchy
from other namespaces will be included after getting turned into
".have".  And that cannot be a useful behaviour.  Tips of
refs/changes/* would be closely related to the corresponding
branches, which means that it would help reducing the object
transfer if they are included, and the fact that the user hides them
is that the user values it more to reduce the size of the initial
ref advertisement more than the potential reduction of the object
transfer cost.  If other pseudo repositories (aka namespaces) are
projects totally unrelated to yours, inluding their refs/changes/*
(or any of their refs, for that matter) would not help the later
object transfer cost, and including them in the initial ref
advertisement would not achieve anything.  Even if other namespaces
are projects that are closely related to yours, if you are excluding
refs/changes/* from your own, that is a strong sign that you do not
want their refs/changes/*, either.

Assuming other namespaces are forks of the same project as yours
(and otherwise the repository management strategy needs to be
rethought--using namespace for them is not gaining anything other
than making your repack more costly), it is likely that all of them
share a lot of refs that point at the same object (think "tags").
Do we end up sending a lot of ".have" for exactly the same object
number of times?  Even though we cannot dedup show_ref() lines that
talk about concrete refs (because they talk about what refs exist at
which value, and the sending side would use them to locally reject
non-ff pushes, for example), ".have" lines that talk about the same
object can be safely deduped.  This is not directly related to your
topic of "what should be included in the advertisement", but a
potentially good thing to fix, if it indeed turns out that we are
sending a lot of duplicate ".have"s.  A fix in that would make
things better for 

Re: [PATCH] Allow hideRefs to match refs outside the namespace

2015-10-31 Thread Lukas Fleischer
On Sat, 31 Oct 2015 at 18:31:23, Junio C Hamano wrote:
> [...]
> You earlier (re)discovered a good approach to introduce a new
> feature without breaking settings of existing users when we
> discussed a "whitelist".  Since setting the configuration to an
> empty string did not do anything in the old code, an empty string
> was an invalid and non-working setting.  By taking advantage of that
> fact, you safely can say "if you start with an empty that would
> match everything, we'll treat all the others differently from the
> way we did before" if you wanted to.  I think you can follow the
> same principle here.  For example, I can imagine that the rule for
> the "ref-is-hidden" can be updated to:
> 
>  * it now takes refname and also the fullname before stripping the
>namespace;
> 
>  * hide patterns that is prefixed with '!' means negative, just as
>before;
> 
>  * (after possibly '!' is stripped), hide patterns that is prefixed
>with '^', which was invalid before, means check the fullname with
>namespace prefix, which is a new rule;
> 
>  * otherwise, check the refname after stripping the namespace.
> 
> Such an update would allow a new feature "we now allow you to write
> a pattern that determines the match before stripping the namespace
> prefix" without breaking the existing repositories, no?
> 

Yes. If I understood you correctly, this is exactly what I suggested in
the last paragraph of my previous email (the only difference being that
I suggested to use "/" as full name indicator instead of "^" but that is
just an implementation detail). I will look into implementing this if
that is the way we want to go.

> [...]
> Assuming other namespaces are forks of the same project as yours
> (and otherwise the repository management strategy needs to be
> rethought--using namespace for them is not gaining anything other
> than making your repack more costly), it is likely that all of them
> share a lot of refs that point at the same object (think "tags").
> Do we end up sending a lot of ".have" for exactly the same object
> number of times?  Even though we cannot dedup show_ref() lines that
> talk about concrete refs (because they talk about what refs exist at
> which value, and the sending side would use them to locally reject
> non-ff pushes, for example), ".have" lines that talk about the same
> object can be safely deduped.  This is not directly related to your
> topic of "what should be included in the advertisement", but a
> potentially good thing to fix, if it indeed turns out that we are
> sending a lot of duplicate ".have"s.  A fix in that would make
> things better for everybody (not just namespace users, but those who
> show the ".have" lines from the refs in the repository we borrow
> objects from).

Yes, I think we currently send a lot of duplicate lines. Would be nice
to have that fixed as well.

Note that we do use Git namespaces to store a lot of different but
similar pseudo repositories (i.e. they do not share any history but the
objects have huge similarities). Even though the pseudo repositories
itself are tiny, having the objects in a shared object storage reduces
the size significantly. Other people probably use separate repositories,
combined with something like GIT_OBJECT_DIRECTORY and preciousObjects
for that. Using Git namespaces, however, allows to run `git gc`/`git
repack` without needing to take care of maintaining back references to
the pseudo repositories and, more importantly, allows for storing all
the refs in a single "packed-refs" file which did reduce the size the
size by another factor of 10 in our tests. That massive difference in
size is probably mostly due to the fact that the actual content of each
repository is just some 100 bytes. Not sure if saving that much space
can currently be achieved with any other approach.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Allow hideRefs to match refs outside the namespace

2015-10-28 Thread Lukas Fleischer
Right now, refs with a path outside the current namespace are replaced
by ".have" before passing them to show_ref() which in turn checks
whether the ref matches the hideRefs pattern. Move the check before the
path substitution in show_ref_cb() such that the hideRefs feature can be
used to hide specific refs outside the current namespace.

Signed-off-by: Lukas Fleischer 
---
The other show_ref() call sites are in show_one_alternate_sha1() and in
write_head_info(). The call site in show_one_alternate_sha1() is for
alternates and passes ".have". The other one is

show_ref("capabilities^{}", null_sha1);

and is not relevant to the hideRefs feature. Note that this kind of
breaks backwards compatibility since the "magic" hideRefs patterns
".have" and "capabilities^{}" no longer work, as explained in the
discussion.

 builtin/receive-pack.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index bcb624b..4a5d0ae 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -195,9 +195,6 @@ static int receive_pack_config(const char *var, const char 
*value, void *cb)
 
 static void show_ref(const char *path, const unsigned char *sha1)
 {
-   if (ref_is_hidden(path))
-   return;
-
if (sent_capabilities) {
packet_write(1, "%s %s\n", sha1_to_hex(sha1), path);
} else {
@@ -221,6 +218,9 @@ static void show_ref(const char *path, const unsigned char 
*sha1)
 
 static int show_ref_cb(const char *path, const struct object_id *oid, int 
flag, void *unused)
 {
+   if (ref_is_hidden(path))
+   return 0;
+
path = strip_namespace(path);
/*
 * Advertise refs outside our current namespace as ".have"
-- 
2.6.2

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Allow hideRefs to match refs outside the namespace

2015-10-28 Thread Junio C Hamano
Lukas Fleischer  writes:

> Right now, refs with a path outside the current namespace are replaced
> by ".have" before passing them to show_ref() which in turn checks
> whether the ref matches the hideRefs pattern. Move the check before the
> path substitution in show_ref_cb() such that the hideRefs feature can be
> used to hide specific refs outside the current namespace.
>
> Signed-off-by: Lukas Fleischer 
> ---
> The other show_ref() call sites are in show_one_alternate_sha1() and in
> write_head_info(). The call site in show_one_alternate_sha1() is for
> alternates and passes ".have". The other one is
>
> show_ref("capabilities^{}", null_sha1);
>
> and is not relevant to the hideRefs feature. Note that this kind of
> breaks backwards compatibility since the "magic" hideRefs patterns
> ".have" and "capabilities^{}" no longer work, as explained in the
> discussion.

If somebody is using namespaces and has "refs/frotz/" in the
hiderefs configuration, we hide refs/frotz/ no matter which
namespace is being accessed.  With this change, with the removal the
check from show_ref(), wouldn't such a repository suddenly see a
behaviour change?

>  builtin/receive-pack.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
> index bcb624b..4a5d0ae 100644
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -195,9 +195,6 @@ static int receive_pack_config(const char *var, const 
> char *value, void *cb)
>  
>  static void show_ref(const char *path, const unsigned char *sha1)
>  {
> - if (ref_is_hidden(path))
> - return;
> -
>   if (sent_capabilities) {
>   packet_write(1, "%s %s\n", sha1_to_hex(sha1), path);
>   } else {
> @@ -221,6 +218,9 @@ static void show_ref(const char *path, const unsigned 
> char *sha1)
>  
>  static int show_ref_cb(const char *path, const struct object_id *oid, int 
> flag, void *unused)
>  {
> + if (ref_is_hidden(path))
> + return 0;
> +
>   path = strip_namespace(path);
>   /*
>* Advertise refs outside our current namespace as ".have"
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html