Re: [PATCH] Allow hideRefs to match refs outside the namespace
On Sun, 01 Nov 2015 at 00:40:39, Lukas Fleischer wrote: > On Sat, 31 Oct 2015 at 18:31:23, Junio C Hamano wrote: > > [...] > > You earlier (re)discovered a good approach to introduce a new > > feature without breaking settings of existing users when we > > discussed a "whitelist". Since setting the configuration to an > > empty string did not do anything in the old code, an empty string > > was an invalid and non-working setting. By taking advantage of that > > fact, you safely can say "if you start with an empty that would > > match everything, we'll treat all the others differently from the > > way we did before" if you wanted to. I think you can follow the > > same principle here. For example, I can imagine that the rule for > > the "ref-is-hidden" can be updated to: > > > > * it now takes refname and also the fullname before stripping the > >namespace; > > > > * hide patterns that is prefixed with '!' means negative, just as > >before; > > > > * (after possibly '!' is stripped), hide patterns that is prefixed > >with '^', which was invalid before, means check the fullname with > >namespace prefix, which is a new rule; > > > > * otherwise, check the refname after stripping the namespace. > > > > Such an update would allow a new feature "we now allow you to write > > a pattern that determines the match before stripping the namespace > > prefix" without breaking the existing repositories, no? > > > > Yes. If I understood you correctly, this is exactly what I suggested in > the last paragraph of my previous email (the only difference being that > I suggested to use "/" as full name indicator instead of "^" but that is > just an implementation detail). I will look into implementing this if > that is the way we want to go. > [...] There are two more things I noticed. Firstly, while looking for other callers of ref_is_hidden(), I realized that send_ref() in upload-pack.c contains these lines of code: const char *refname_nons = strip_namespace(refname); struct object_id peeled; if (mark_our_ref(refname, oid)) return 0; where mark_our_ref() performs the ref_is_hidden() check on its first parameter. So, in contrast to receive-pack, we already match the original full reference (and not the stripped one) against the hideRefs pattern there. In particular, when using transfer.hideRefs, the same pattern does different things when receiving and uploading. Now, this cannot be intended behavior and I do not think this is something we want to retain when improving that feature. My suggestion is: 1. Define the (current) semantics of hideRefs pattern. It either needs to be defined to match full references or stripped references. Both definitions are equivalent when Git namespaces are not used. It probably makes sense to define hideRefs patterns to match stripped references. If anybody relied on the upload-pack behavior of patterns matching full references, it may happen that more refs are hidden when that behavior is adjusted to match the new hideRefs semantics. The administrator would become aware of that change soon if it affects anything (i.e. hides things that should not be hidden). But I am pretty sure that this behavior isn't currently being relied on either way. Both Git namespaces and hideRefs aren't very popular features and anybody using that combination would probably have noticed the inconsistency and reported a bug earlier. 2. Improve the documentation and describe the hideRefs semantics better. Include details on the choice we made in (1). 3. Fix the send_ref() code in either receive-pack or upload-pack, depending on which is buggy according to our new definition. 4. Improve hideRefs patterns and allow to match both full references and stripped references by using a special indicator as suggested earlier. 5. Add a note on the change in behavior to the release notes of the release that "breaks backwards compatibility". Putting it in quotes because I actually think that we are fixing a bug rather than breaking compatibility. But since there was no documentation on the correct behavior, the former implementation was, technically, the only specification of "correct" behavior that existed at that point... The second thing I noticed is that having syntax for allowing matches against both full references and stripped references is extremely handy and desirable, even if we would not have to introduce it for backwards compatibility. For example, using the syntax Junio described earlier, my initial use case could be solved by receive.hideRefs=^refs/ receive.hideRefs=!refs/ which means "Hide all references but do not hide references from
Re: [PATCH] Allow hideRefs to match refs outside the namespace
Lukas Fleischerwrites: > Now, this cannot be intended behavior and I do not think this is > something we want to retain when improving that feature. Yup, that makes me suspect that namespace support with hiderefs was done without giving much thought even stronger than before, and the fact that nobody has brought it up so far suggests it would be much smaller deal than usual if a fix brings in incompatibilities to those who use namespaces. > 1. Define the (current) semantics of hideRefs pattern. It either needs >to be defined to match full references or stripped references. Both >definitions are equivalent when Git namespaces are not used. > >It probably makes sense to define hideRefs patterns to match stripped >references. OK. > 2. Improve the documentation and describe the hideRefs semantics better. > 3. Fix the send_ref() code in either receive-pack or upload-pack, > 4. Improve hideRefs patterns and allow to match both full references and > 5. Add a note on the change in behavior to the release notes of the All OK. > The second thing I noticed is that having syntax for allowing matches > against both full references and stripped references is extremely handy > and desirable, even if we would not have to introduce it for backwards > compatibility. For example, using the syntax Junio described earlier, my > initial use case could be solved by > > receive.hideRefs=^refs/ > receive.hideRefs=!refs/ > > which means "Hide all references but do not hide references from the > current namespace." Here, I am assuming that patterns for stripped refs > never match anything outside the current namespace because those > patterns become NULL after stripping. I would instead assume that the presence of ^ (or !^) in front would signal "do not strip before checking". !refs/ would mean "after stripping, does it begin with refs/? If so then do not hide it". But that does not change the conclusion. With ^refs/ that says "hide everything that matches refs/ before stripping" (i.e. do not include anything from anywhere), that is overriden by !refs/ that says "but do not hide anything that matches refs/ after stripping" (do include everything from my namespace), I'd think that you'd get your desired behaviour. Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Allow hideRefs to match refs outside the namespace
I wrote this email on Thursday but it seems like it did not make it through the mailing list. Resubmitting... On Wed, 28 Oct 2015 at 17:21:59, Junio C Hamano wrote: > Lukas Fleischerwrites: > > > Right now, refs with a path outside the current namespace are replaced > > by ".have" before passing them to show_ref() which in turn checks > > whether the ref matches the hideRefs pattern. Move the check before the > > path substitution in show_ref_cb() such that the hideRefs feature can be > > used to hide specific refs outside the current namespace. > > > > Signed-off-by: Lukas Fleischer > > --- > > The other show_ref() call sites are in show_one_alternate_sha1() and in > > write_head_info(). The call site in show_one_alternate_sha1() is for > > alternates and passes ".have". The other one is > > > > show_ref("capabilities^{}", null_sha1); > > > > and is not relevant to the hideRefs feature. Note that this kind of > > breaks backwards compatibility since the "magic" hideRefs patterns > > ".have" and "capabilities^{}" no longer work, as explained in the > > discussion. > > If somebody is using namespaces and has "refs/frotz/" in the > hiderefs configuration, we hide refs/frotz/ no matter which > namespace is being accessed. With this change, with the removal the > check from show_ref(), wouldn't such a repository suddenly see a > behaviour change? > [...] It would indeed. However, we cannot stay 100% backwards compatible when adding support for matching refs outside the current namespace without introducing new syntax. For example, if Git namespaces are in use (i.e. GIT_NAMESPACE is set), "refs/namespaces/foo/refs/bar" in hideRefs would not have hidden refs/namespaces/foo/refs/bar before the change but it does afterwards. You might argue that nobody would have added "refs/namespaces/foo/refs/bar" to hideRefs in the first place but namespaces can be nested and it might be that the user meant to hide refs/namespaces/bar/refs/namespaces/foo/refs/bar instead. Yes, those are weird corner cases. But then again, I think that using hideRefs with namespaces already is a corner case as well. I also think that using the same syntax to match both original and stripped refs is bad design. It makes things complicated and the resulting feature doesn't have the full expressive power of the simpler version only matching original refs. So, we can either intentionally break backwards compatibility for some rare corner cases, or keep the current behavior and introduce some new syntax for matching the original (unstripped) refs. For the latter, we could either introduce a new option ("hideUnstrippedRefs"?) or special syntax inside hideRefs ("/refs/foo" instead of "refs/foo" for matching the unstripped version only?) What do you think? -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Allow hideRefs to match refs outside the namespace
Lukas Fleischerwrites: >> If somebody is using namespaces and has "refs/frotz/" in the >> hiderefs configuration, we hide refs/frotz/ no matter which >> namespace is being accessed. With this change, with the removal the >> check from show_ref(), wouldn't such a repository suddenly see a >> behaviour change? >> [...] > > It would indeed. However, we cannot stay 100% backwards compatible when > adding support for matching refs outside the current namespace without > introducing new syntax. For example, if Git namespaces are in use (i.e. > GIT_NAMESPACE is set), "refs/namespaces/foo/refs/bar" in hideRefs would > not have hidden refs/namespaces/foo/refs/bar before the change but it > does afterwards. You might argue that nobody would have added > "refs/namespaces/foo/refs/bar" to hideRefs in the first place... I won't. To the current users, when they say they want to exclude "refs/foo", they mean they do not want to advertise the fact that a ref "refs/foo/*" exists in their repository (either the whole thing if that is how it is accessed, or in the namespace being accessed). and you can replace "foo" with any string, including the ones that contain "/namespaces/", i.e. the user wanted to exclude refs from nested ones. I suspect what you wrote in the above is being a bit too defeatist, though. We only need to prevent regressions to user with existing and valid configurations. You earlier (re)discovered a good approach to introduce a new feature without breaking settings of existing users when we discussed a "whitelist". Since setting the configuration to an empty string did not do anything in the old code, an empty string was an invalid and non-working setting. By taking advantage of that fact, you safely can say "if you start with an empty that would match everything, we'll treat all the others differently from the way we did before" if you wanted to. I think you can follow the same principle here. For example, I can imagine that the rule for the "ref-is-hidden" can be updated to: * it now takes refname and also the fullname before stripping the namespace; * hide patterns that is prefixed with '!' means negative, just as before; * (after possibly '!' is stripped), hide patterns that is prefixed with '^', which was invalid before, means check the fullname with namespace prefix, which is a new rule; * otherwise, check the refname after stripping the namespace. Such an update would allow a new feature "we now allow you to write a pattern that determines the match before stripping the namespace prefix" without breaking the existing repositories, no? After looking at the current code, I have to say that the way ref-is-hidden and show_ref_cb() interact with each other is not very well designed when namespaces are in use. I suspect that this is because the "namespace" stuff was bolted on to the system without thinking things through. For example, people may want to hide refs/changes/* and with the current code, refs/changes/* from your own namespace will be filtered out, but the corresponding hierarchy from other namespaces will be included after getting turned into ".have". And that cannot be a useful behaviour. Tips of refs/changes/* would be closely related to the corresponding branches, which means that it would help reducing the object transfer if they are included, and the fact that the user hides them is that the user values it more to reduce the size of the initial ref advertisement more than the potential reduction of the object transfer cost. If other pseudo repositories (aka namespaces) are projects totally unrelated to yours, inluding their refs/changes/* (or any of their refs, for that matter) would not help the later object transfer cost, and including them in the initial ref advertisement would not achieve anything. Even if other namespaces are projects that are closely related to yours, if you are excluding refs/changes/* from your own, that is a strong sign that you do not want their refs/changes/*, either. Assuming other namespaces are forks of the same project as yours (and otherwise the repository management strategy needs to be rethought--using namespace for them is not gaining anything other than making your repack more costly), it is likely that all of them share a lot of refs that point at the same object (think "tags"). Do we end up sending a lot of ".have" for exactly the same object number of times? Even though we cannot dedup show_ref() lines that talk about concrete refs (because they talk about what refs exist at which value, and the sending side would use them to locally reject non-ff pushes, for example), ".have" lines that talk about the same object can be safely deduped. This is not directly related to your topic of "what should be included in the advertisement", but a potentially good thing to fix, if it indeed turns out that we are sending a lot of duplicate ".have"s. A fix in that would make things better for
Re: [PATCH] Allow hideRefs to match refs outside the namespace
On Sat, 31 Oct 2015 at 18:31:23, Junio C Hamano wrote: > [...] > You earlier (re)discovered a good approach to introduce a new > feature without breaking settings of existing users when we > discussed a "whitelist". Since setting the configuration to an > empty string did not do anything in the old code, an empty string > was an invalid and non-working setting. By taking advantage of that > fact, you safely can say "if you start with an empty that would > match everything, we'll treat all the others differently from the > way we did before" if you wanted to. I think you can follow the > same principle here. For example, I can imagine that the rule for > the "ref-is-hidden" can be updated to: > > * it now takes refname and also the fullname before stripping the >namespace; > > * hide patterns that is prefixed with '!' means negative, just as >before; > > * (after possibly '!' is stripped), hide patterns that is prefixed >with '^', which was invalid before, means check the fullname with >namespace prefix, which is a new rule; > > * otherwise, check the refname after stripping the namespace. > > Such an update would allow a new feature "we now allow you to write > a pattern that determines the match before stripping the namespace > prefix" without breaking the existing repositories, no? > Yes. If I understood you correctly, this is exactly what I suggested in the last paragraph of my previous email (the only difference being that I suggested to use "/" as full name indicator instead of "^" but that is just an implementation detail). I will look into implementing this if that is the way we want to go. > [...] > Assuming other namespaces are forks of the same project as yours > (and otherwise the repository management strategy needs to be > rethought--using namespace for them is not gaining anything other > than making your repack more costly), it is likely that all of them > share a lot of refs that point at the same object (think "tags"). > Do we end up sending a lot of ".have" for exactly the same object > number of times? Even though we cannot dedup show_ref() lines that > talk about concrete refs (because they talk about what refs exist at > which value, and the sending side would use them to locally reject > non-ff pushes, for example), ".have" lines that talk about the same > object can be safely deduped. This is not directly related to your > topic of "what should be included in the advertisement", but a > potentially good thing to fix, if it indeed turns out that we are > sending a lot of duplicate ".have"s. A fix in that would make > things better for everybody (not just namespace users, but those who > show the ".have" lines from the refs in the repository we borrow > objects from). Yes, I think we currently send a lot of duplicate lines. Would be nice to have that fixed as well. Note that we do use Git namespaces to store a lot of different but similar pseudo repositories (i.e. they do not share any history but the objects have huge similarities). Even though the pseudo repositories itself are tiny, having the objects in a shared object storage reduces the size significantly. Other people probably use separate repositories, combined with something like GIT_OBJECT_DIRECTORY and preciousObjects for that. Using Git namespaces, however, allows to run `git gc`/`git repack` without needing to take care of maintaining back references to the pseudo repositories and, more importantly, allows for storing all the refs in a single "packed-refs" file which did reduce the size the size by another factor of 10 in our tests. That massive difference in size is probably mostly due to the fact that the actual content of each repository is just some 100 bytes. Not sure if saving that much space can currently be achieved with any other approach. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Allow hideRefs to match refs outside the namespace
Right now, refs with a path outside the current namespace are replaced by ".have" before passing them to show_ref() which in turn checks whether the ref matches the hideRefs pattern. Move the check before the path substitution in show_ref_cb() such that the hideRefs feature can be used to hide specific refs outside the current namespace. Signed-off-by: Lukas Fleischer--- The other show_ref() call sites are in show_one_alternate_sha1() and in write_head_info(). The call site in show_one_alternate_sha1() is for alternates and passes ".have". The other one is show_ref("capabilities^{}", null_sha1); and is not relevant to the hideRefs feature. Note that this kind of breaks backwards compatibility since the "magic" hideRefs patterns ".have" and "capabilities^{}" no longer work, as explained in the discussion. builtin/receive-pack.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index bcb624b..4a5d0ae 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -195,9 +195,6 @@ static int receive_pack_config(const char *var, const char *value, void *cb) static void show_ref(const char *path, const unsigned char *sha1) { - if (ref_is_hidden(path)) - return; - if (sent_capabilities) { packet_write(1, "%s %s\n", sha1_to_hex(sha1), path); } else { @@ -221,6 +218,9 @@ static void show_ref(const char *path, const unsigned char *sha1) static int show_ref_cb(const char *path, const struct object_id *oid, int flag, void *unused) { + if (ref_is_hidden(path)) + return 0; + path = strip_namespace(path); /* * Advertise refs outside our current namespace as ".have" -- 2.6.2 -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Allow hideRefs to match refs outside the namespace
Lukas Fleischerwrites: > Right now, refs with a path outside the current namespace are replaced > by ".have" before passing them to show_ref() which in turn checks > whether the ref matches the hideRefs pattern. Move the check before the > path substitution in show_ref_cb() such that the hideRefs feature can be > used to hide specific refs outside the current namespace. > > Signed-off-by: Lukas Fleischer > --- > The other show_ref() call sites are in show_one_alternate_sha1() and in > write_head_info(). The call site in show_one_alternate_sha1() is for > alternates and passes ".have". The other one is > > show_ref("capabilities^{}", null_sha1); > > and is not relevant to the hideRefs feature. Note that this kind of > breaks backwards compatibility since the "magic" hideRefs patterns > ".have" and "capabilities^{}" no longer work, as explained in the > discussion. If somebody is using namespaces and has "refs/frotz/" in the hiderefs configuration, we hide refs/frotz/ no matter which namespace is being accessed. With this change, with the removal the check from show_ref(), wouldn't such a repository suddenly see a behaviour change? > builtin/receive-pack.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c > index bcb624b..4a5d0ae 100644 > --- a/builtin/receive-pack.c > +++ b/builtin/receive-pack.c > @@ -195,9 +195,6 @@ static int receive_pack_config(const char *var, const > char *value, void *cb) > > static void show_ref(const char *path, const unsigned char *sha1) > { > - if (ref_is_hidden(path)) > - return; > - > if (sent_capabilities) { > packet_write(1, "%s %s\n", sha1_to_hex(sha1), path); > } else { > @@ -221,6 +218,9 @@ static void show_ref(const char *path, const unsigned > char *sha1) > > static int show_ref_cb(const char *path, const struct object_id *oid, int > flag, void *unused) > { > + if (ref_is_hidden(path)) > + return 0; > + > path = strip_namespace(path); > /* >* Advertise refs outside our current namespace as ".have" -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html