Re: [RFC 0/3] Reflogs for deleted refs: fix breakage and suggest namespace change
On 20 Aug 2012, at 13:32, Alexey Muranov wrote:

The problem of mapping branch names to file paths looks to me very similar to the problem of mapping URLs to file paths for static web sites, so i would propose to use the same solution: add a special extension to distinguish a file from a directory, for example .branch and .tag (like .html in the case of URLs). This would allow having both branches next and next/foo, with refs stored in files next.branch and next/foo.branch. This will look very clear and familiar to people who are not Git specialists but are familiar with static web sites.

The only limitation this would introduce is that branch names like foo.branch would need to be forbidden. If the extension is optional, the new rule is almost compatible with the current one, except if somebody is currently using branches named like foo.branch or next.branch/foo.

Another possible choice for the extensions: .~br and .~tg (to keep file names readable and allow all currently allowed branch names).

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
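For illustration only, the extension scheme proposed in this message could be sketched in Python roughly as follows. The function name `branch_to_path`, the `refs/heads/` prefix handling, and the choice to reject any path component ending in `.branch` are assumptions made for the example; nothing here is taken from Git's source.

```python
from pathlib import PurePosixPath

BRANCH_EXT = ".branch"  # extension suggested in the message; not a Git feature


def branch_to_path(name: str) -> PurePosixPath:
    """Map a branch name to a hypothetical loose-ref path under refs/heads/."""
    parts = name.split("/")
    # Names such as foo.branch or next.branch/foo would collide with the
    # scheme, so this sketch simply refuses them.
    if any(part.endswith(BRANCH_EXT) for part in parts):
        raise ValueError(f"branch name not allowed under this scheme: {name}")
    return PurePosixPath("refs/heads", *parts[:-1], parts[-1] + BRANCH_EXT)


if __name__ == "__main__":
    for branch in ("next", "next/foo"):
        print(branch, "->", branch_to_path(branch))
```

With this mapping, `next` becomes the file `refs/heads/next.branch` while `next/foo` lives inside the directory `refs/heads/next/`, so the two names no longer compete for the same path.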
Re: [RFC 0/3] Reflogs for deleted refs: fix breakage and suggest namespace change
On 19 Aug 2012, at 02:02, Junio C Hamano wrote:

> Alexey Muranov alexey.mura...@gmail.com writes:
>
>> I hope my opinion might be useful because i do not know anything about the actual implementation of Git,...
>
> That sounds like contradiction.

I think that the implementation (the code), the model, and the interface are independent. On the top level, for example, one does not need to know how commit storage is optimized, it is enough to understand that each commit is a snapshot of a subtree in a file directory.

>> To just give a quick idea of my ideas, i thought that 'fetching' in Git was an inevitable evil that stands apart from other operations and is necessary only because the computer communication on Earth is not sufficiently developed to keep all Git repositories constantly in sync,...
>
> It is a feature, not a symptom of an insufficiently developed technology, that I do not have to know what random tweaks and experiments are done in repositories of 47 thousands people who clone from me, and I can sync with any one of them only when I know there is something worth looking at when I say git fetch.

Currently, one of the main functions of 'fetch', apart from changing the remote tracking branches, is downloading the remote objects. This is necessary because of an insufficiently developed technology. The other main function is changing the local copies of remote branches (changing the remote tracking branches), this is what i described as taking a snapshot.

I did not understand what you meant by

> I do not have to know what random tweaks and experiments are done in repositories of 47 thousands people who clone from me, and I can sync with any one of them only when I know there is something worth looking at when I say git fetch.

How is it possible to know and not to know what is going on in the remote repository at the same time?

-Alexey.
Re: [RFC 0/3] Reflogs for deleted refs: fix breakage and suggest namespace change
On 19 Aug 2012, at 02:02, Junio C Hamano wrote:

> Alexey Muranov alexey.mura...@gmail.com writes:
>
>> I hope my opinion might be useful because i do not know anything about the actual implementation of Git,...
>
> That sounds like contradiction.

I meant that i am psychologically not attached to the current behavior, and may provide a naïve view point, if you like.

-Alexey.
Re: [RFC 0/3] Reflogs for deleted refs: fix breakage and suggest namespace change
On 19 Aug 2012, at 02:02, Junio C Hamano wrote:

> Alexey Muranov alexey.mura...@gmail.com writes:
>
>> Excuse me if i miss something again, but i might be willing to discuss the ultimate destination. Could you possibly state in simple terms what the problem with determining the ultimate destination is?
>
> Decide if it makes sense to break backward compatibility of loose ref representation merely to support having a branch next and another branch next/foo in the same repository, and if it does, what the new loose ref representation looks like.

I looked again through this thread and tried to understand the issues better.

1. I vote for moving dead reflogs to logs/graveyard (or to logs/deadlogs).

2. I think that allowing both next and next/foo complicates the mapping from branch names to file paths, and it does not seem necessary if dead reflogs are moved away to graveyard anyway.

3. There remains the question of what to do with dead reflogs for different branches having the same name. Maybe keep the death date and time under the graveyard directory and not allow the user to delete 2 times in less than 1 second?

   /logs/graveyard/-mm-dd-hhmmss/refs/heads/next/foo

In a sense this is similar to the git storage model: an atomic destructive operation creates a timestamped commit in logs/graveyard directory.
Re: [RFC 0/3] Reflogs for deleted refs: fix breakage and suggest namespace change
On 19 Aug 2012, at 19:38, Junio C Hamano wrote:

> Alexey Muranov alexey.mura...@gmail.com writes:
>
>> 2. I think that allowing both next and next/foo complicates the mapping from branch names to file paths, and it does not seem necessary if dead reflogs are moved away to graveyard anyway.
>
> It is unclear why the first two lines above leads to the conclusion "it does not seem necessary" (but honestly, I do not particularly care).

I thought that the first reason that allowing next and next/foo seemed necessary was avoiding conflicts with dead reflogs or between dead reflogs. If dead reflog for next/foo is moved away, it will not conflict with a new one for next. There remains a problem with a conflict between dead next/foo and dead next. This can be solved as Jeff suggested by adding special escape symbols, or as i suggested below, by keeping reflogs deleted on different occasions in different timestamp directories.

>> 3. There remains the question what to do with dead reflogs for different branches having the same name. Maybe, keep the death date and time under the graveyard directory and not allow the user to delete 2 times in less than 1 second?
>>
>>    /logs/graveyard/-mm-dd-hhmmss/refs/heads/next/foo
>
> How would that help us in what way? When I ask git log -g next/foo for the next/foo branch that currently exists, I want to see the update history of its tip since I created it for the last time, and then an entry that says I created it at such and such time. If I used to have the branch before but deleted, then the output should be followed by another entry that says I deleted it at such and such time, followed by the history of the tip updates.

I only suggested how to resolve conflicts between dead reflogs in graveyard if next and next/foo cannot coexist. For example, if first next/foo was created and deleted, and then next was created and deleted.

It also seems nice to me to have dead reflogs for different identically named branches (created and deleted independently) in separate files. It is possible to collect the information for git log -g next/foo by looking through all timestamp subdirectories in graveyard.

-Alexey.
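As a rough sketch of the timestamp-directory idea discussed in this message, the Python below builds a graveyard path per deletion and then gathers every dead reflog for one ref name. The timestamp layout `%Y-%m-%d-%H%M%S` is an assumption (the format in the message is only partly legible), and the names `graveyard_path` and `dead_reflogs_for` are hypothetical helpers, not Git functions.

```python
from datetime import datetime
from pathlib import PurePosixPath

GRAVEYARD = PurePosixPath("logs/graveyard")
STAMP_FORMAT = "%Y-%m-%d-%H%M%S"  # assumed layout, one directory per deletion


def graveyard_path(refname: str, deleted_at: datetime) -> PurePosixPath:
    """Where the dead reflog of `refname` would go under this scheme."""
    return GRAVEYARD / deleted_at.strftime(STAMP_FORMAT) / refname


def dead_reflogs_for(refname, graveyard_files):
    """Collect all dead reflogs of `refname`, oldest deletion first."""
    depth = len(GRAVEYARD.parts) + 1  # skip "logs", "graveyard" and the stamp
    wanted = PurePosixPath(refname).parts
    hits = [p for p in graveyard_files if p.parts[depth:] == wanted]
    # The stamp sorts lexicographically in chronological order.
    return sorted(hits, key=lambda p: p.parts[depth - 1])


if __name__ == "__main__":
    first = graveyard_path("refs/heads/next/foo", datetime(2012, 8, 19, 19, 38, 0))
    second = graveyard_path("refs/heads/next", datetime(2012, 8, 20, 13, 32, 0))
    third = graveyard_path("refs/heads/next/foo", datetime(2012, 8, 21, 9, 0, 0))
    for path in dead_reflogs_for("refs/heads/next/foo", [third, second, first]):
        print(path)
```

Because each deletion gets its own directory, a dead next/foo and a dead next never share a parent path, at the cost of having to scan every timestamp directory to rebuild the history of one name.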
Re: [RFC 0/3] Reflogs for deleted refs: fix breakage and suggest namespace change
On 18 Aug 2012, at 22:39, Junio C Hamano wrote:

> Do we _know_ already what the ultimate destination looks like? If the answer is yes, then I agree, but otherwise, I doubt it is a good idea to introduce unnecessary complexity to the system that may have to be ripped out and redone.
>
> I didn't get the impression that we know the ultimate destination from the previous discussion, especially if we discount the tangent around having next and next/foo at the same time which was on nobody's wish, but I may be misremembering things.

Excuse me if i miss something again, but i might be willing to discuss the ultimate destination. Could you possibly state in simple terms what the problem with determining the ultimate destination is?

I hope my opinion might be useful because i do not know anything about the actual implementation of Git, but for a while i thought i understood its intended mathematical model, until i ran into the default behavior, unexpected for me, of not pruning when fetching.

To just give a quick idea of my ideas, i thought that 'fetching' in Git was an inevitable evil that stands apart from other operations and is necessary only because the computer communication on Earth is not sufficiently developed to keep all Git repositories constantly in sync, and because one might prefer to work with a somewhat dated snapshot of a remote than with the constantly changing current version. I thought snapshot could be a good alternative name for fetch.

-Alexey.
Re: [PATCH 1/3] retain reflogs for deleted refs
On 26 Jul 2012, at 14:47, Nguyen Thai Ngoc Duy wrote:

> So we haven't found any way to present both branches foo and foo/bar on file system at the same time. How about, when a new branch introduces such a conflict, we push the new branch directly to packed-refs? If we need either of them on a separate file, for fast update for example, then we unpack just one and repack all refs that conflict with it. Attempting to update two conflict branches in parallel may impact performance, but I don't think that happens often.
> -- 
> Duy

How about simply deprecating / in branch name?

-Alexey.
Re: [PATCH 1/3] retain reflogs for deleted refs
On 26 Jul 2012, at 18:59, Jeff King wrote:

> Not to mention git itself, as it splits up the refs/remotes hierarchy into subdirectories. I think deprecating / is out of the question.
>
> -Peff

Ok, i guess you know better than me, my vision of Git is probably still too simplistic.

-Alexey.
Re: [PATCH 1/3] retain reflogs for deleted refs
On 20 Jul 2012, at 19:09, Jeff King wrote:

> On Fri, Jul 20, 2012 at 06:37:02PM +0200, Johannes Sixt wrote:
>
>> On 20.07.2012 17:44, Jeff King wrote:
>>> So I think a suffix like :d is probably the least horrible.
>>
>> Not so. It does not work on Windows :-( in the expected way. Trying to open a file with a colon-separated suffix either opens a resource fork on NTFS or fails with invalid path.
>
> Bleh. It seems that we did too good a job in coming up with a list of disallowed ref characters; they really are things you don't want in your filenames at all. :)

How about using '@' as an escape character ?

-Alexey.
Re: [PATCH 1/3] retain reflogs for deleted refs
On 20 Jul 2012, at 17:44, Jeff King wrote:

> On Fri, Jul 20, 2012 at 11:49:07AM +0200, Michael Haggerty wrote:
>
>>> This patch moves reflog entries into a special graveyard namespace, and appends a tilde (~) character, which is not allowed in a valid ref name. This means that the deleted reflogs of these refs:
>>>
>>>    refs/heads/a
>>>    refs/heads/a/b
>>>    refs/heads/a/b/c
>>>
>>> will be stored in:
>>>
>>>    logs/graveyard/refs/heads/a~
>>>    logs/graveyard/refs/heads/a/b~
>>>    logs/graveyard/refs/heads/a/b/c~
>>>
>>> Putting them in the graveyard namespace ensures they will not conflict with live refs, and the tilde prevents D/F conflicts within the graveyard namespace.

Sorry if this idea is stupid or if i miss something, but how about putting deleted reflogs for

   refs/heads/a
   refs/heads/a/b
   refs/heads/a/b/c

to

   refs/heads/a@-mm-dd-hhmmss
   refs/heads/a/b@-mm-dd-hhmmss
   refs/heads/a/b/c@-mm-dd-hhmmss

with the time they were deleted?

-Alexey.
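The tilde mapping quoted above is simple enough to sketch. The Python below is only an illustration of the naming rule as described in the quoted text; `to_graveyard` and `from_graveyard` are hypothetical names and are not the functions used in the actual patch.

```python
GRAVEYARD_PREFIX = "logs/graveyard/"
DEAD_SUFFIX = "~"  # tilde is not allowed in a valid ref name


def to_graveyard(refname: str) -> str:
    """Path of the dead reflog for a deleted ref, per the quoted description."""
    return f"{GRAVEYARD_PREFIX}{refname}{DEAD_SUFFIX}"


def from_graveyard(path: str) -> str:
    """Reverse the transformation to recover the original ref name."""
    if not (path.startswith(GRAVEYARD_PREFIX) and path.endswith(DEAD_SUFFIX)):
        raise ValueError(f"not a graveyard reflog path: {path}")
    return path[len(GRAVEYARD_PREFIX):-len(DEAD_SUFFIX)]


if __name__ == "__main__":
    for ref in ("refs/heads/a", "refs/heads/a/b", "refs/heads/a/b/c"):
        dead = to_graveyard(ref)
        print(ref, "->", dead, "->", from_graveyard(dead))
```

The dead reflog for `refs/heads/a` is the file `a~`, while the one for `refs/heads/a/b` sits in the directory `a/`, so the file and the directory have different names and can coexist.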
Re: [PATCH 1/3] retain reflogs for deleted refs
On 22 Jul 2012, at 15:14, Jeff King wrote:

> 3. Most importantly, it does not resolve D/F conflicts (it has the same problem as logs/refs/heads/a~). If you delete foo/bar, you will end up with logs/refs/heads/foo/bar@{...}. That will prevent D/F conflicts with a new branch foo/bar/baz, but will still have a problem with just foo.

Unfortunately i do not really follow this, because i have not seen any directories in logs/refs/heads/, i only saw files named after local branches there. I do not know how directories are used there.

-Alexey.
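To make the directory/file (D/F) conflict in the quoted point concrete, here is a small Python sketch. It assumes, as the quoted text implies, that a branch named foo/bar is stored as a file bar inside a directory foo, so one path cannot be both a file and a directory. The helper `df_conflict` and the suffix `@dead` are made up for the example; the real suffix in the quoted message is elided.

```python
from pathlib import PurePosixPath


def df_conflict(a: str, b: str) -> bool:
    """True if one path would have to be a directory on the way to the other."""
    pa, pb = PurePosixPath(a), PurePosixPath(b)
    return pa in pb.parents or pb in pa.parents


if __name__ == "__main__":
    # Dead reflog left behind after deleting foo/bar ("@dead" is a stand-in suffix).
    dead = "logs/refs/heads/foo/bar@dead"

    # A new branch foo/bar/baz needs the directory foo/bar, which is still free.
    print(df_conflict(dead, "logs/refs/heads/foo/bar/baz"))

    # A new branch foo needs foo to be a file, but foo is already a directory.
    print(df_conflict(dead, "logs/refs/heads/foo"))
```

The second case is the one the suffix cannot fix: the dead reflog still lives inside the directory `foo`, so a plain file `foo` cannot be created next to it.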
Re: Feature request: fetch --prune by default
On 19 Jul 2012, at 13:55, Jeff King wrote:

> On Thu, Jul 19, 2012 at 09:30:59AM +0200, Alexey Muranov wrote:
>
>> i would like `git fetch --prune remote` to be the default behavior of `git fetch remote`
>>
>> In fact, i think this is the only reasonable behavior. Keeping copies of deleted remote branches after `fetch` is more confusing than useful.
>
> I agree it would be much less confusing. However, one downside is that we do not keep reflogs on deleted branches (and nor did the commits in remote branches necessarily make it into the HEAD reflog). That makes git fetch a potentially destructive operation (you irrevocably lose the notion of which remote branches pointed where before the fetch, and you open up new commits to immediate pruning by gc --auto).

I still do not understand very well some aspects of Git, like the exact purpose of remote tracking branches (are they for pull or for push?), so i may be wrong. However, i thought that a user was not expected to follow the moves of a remote branch of which the user is not an owner: if the user needs to follow the branch and not lose its commits, he/she should create a remote tracking branch.

> So I think it would be a lot more palatable if we kept reflogs on deleted branches. That, in turn, has a few open issues, such as how to manage namespace conflicts (e.g., the fact that a deleted foo branch can conflict with a new foo/bar branch).

I prefer to think of a remote branch and its local copy as the same thing, which are physically different only because of current real world/hardware/software limitations, which make it necessary to keep a local cache of remote data. With this approach, reflogs should be deleted with the branch, and there will be no namespace conflicts.

Alexey.
Re: Feature request: fetch --prune by default
On 19 Jul 2012, at 13:55, Jeff King wrote:

> I agree it would be much less confusing. However, one downside is that we do not keep reflogs on deleted branches (and nor did the commits in remote branches necessarily make it into the HEAD reflog). That makes git fetch a potentially destructive operation (you irrevocably lose the notion of which remote branches pointed where before the fetch, and you open up new commits to immediate pruning by gc --auto).

If i understand correctly, existence of a reflog entry will not stop gc from removing a commit, will it? In this case, if a remote branch was rebased or reset, commits can be lost anyway, right?

Alexey.
Re: Feature request: fetch --prune by default
On 19 Jul 2012, at 18:48, Dan Johnson wrote:

> From the git-gc man page:
>
>    git gc tries very hard to be safe about the garbage it collects. In particular, it will keep not only objects referenced by your current set of branches and tags, but also objects referenced by the index, remote-tracking branches, refs saved by git filter-branch in refs/original/, or reflogs (which may reference commits in branches that were later amended or rewound).
>
> So yes, a reflog entry does stop gc from removing objects, including commits. It will expire old reflog entries (90 days by default) though, so it's not like they will stay around forever.

Dan, thanks for the explanation.

Alexey.
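The expiry rule Dan describes can be illustrated with a simplified Python sketch. It assumes the usual reflog line layout (old id, new id, identity, Unix timestamp, timezone, then a tab and the message) and applies a single age cutoff; the function names are hypothetical, and Git's real expiry logic has more cases (for example a separate, shorter limit for unreachable entries), so treat this as a model rather than a reimplementation.

```python
SECONDS_PER_DAY = 86400


def reflog_timestamp(line: str) -> int:
    """Extract the Unix timestamp from one reflog line."""
    header = line.split("\t", 1)[0]
    # The header ends with "<timestamp> <timezone>".
    return int(header.rsplit(" ", 2)[1])


def expire_entries(lines, now: int, max_age_days: int = 90):
    """Keep only entries younger than the cutoff (90 days, per the quoted text)."""
    cutoff = now - max_age_days * SECONDS_PER_DAY
    return [line for line in lines if reflog_timestamp(line) >= cutoff]


if __name__ == "__main__":
    now = 1_350_000_000  # arbitrary "current time" for the example

    def entry(age_days: int, message: str) -> str:
        stamp = now - age_days * SECONDS_PER_DAY
        ident = "A U Thor <author@example.com>"
        return f"{'0' * 40} {'1' * 40} {ident} {stamp} +0200\t{message}"

    log = [entry(100, "fetch: fast-forward"), entry(10, "fetch: forced-update")]
    for line in expire_entries(log, now):
        print(line.split("\t", 1)[1])
```

Until an entry ages out like this, the commit it names stays protected from gc, which is why keeping the reflog of a deleted branch would give a grace period rather than permanent storage.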
Re: Feature request: fetch --prune by default
On 19 Jul 2012, at 19:34, Konstantin Khomoutov wrote:

> On Thu, 19 Jul 2012 18:21:21 +0200 Alexey Muranov alexey.mura...@gmail.com wrote:
>
> [...]
>> I do not still understand very well some aspects of Git, like the exact purpose of remote tracking branches (are they for pull or for push?), so i may be wrong.
>
> This is very well explained in the Pro Git book, for instance. And in numerous blog posts etc.

I have read the Pro Git book and numerous blog posts, but i keep forgetting the explanation because it does not make much sense to me:

   "Tracking branches are local branches that have a direct relationship to a remote branch. If you’re on a tracking branch and type git push, Git automatically knows which server and branch to push to. Also, running git pull while on one of these branches fetches all the remote references and then automatically merges in the corresponding remote branch."

etc. Why the same direct relationship for push and pull? What happens if one of the branches was reset (yes, i know, push -f)? Most importantly, what is the purpose of it? It is natural to expect that you might be pushing to and pulling from different remotes, i can even imagine pulling from more than one.

>> However, i thought that a user was not expected to follow the moves of a remote branch of which the user is not an owner: if the user needs to follow the branch and not lose its commits, he/she should create a remote tracking branch.
>
> This would present another namespacing issue: how would you name the branches you're interested in so that they don't clash with your own personal local branches? You'd have to invent a scheme which would encode the remote's name in a branch name. But remote branches already do just this. So you create a remote tracking branch when you intend to actually *develop* something on that branch with the final intention to push that work back.

But i am not interested in remote branches, they are just fetched automatically when i do git fetch. You cannot commit to a remote branch, and i think it is not common to check them out without a -b option. If i am interested in them, i name them somehow. I think this is the only practical way if i do not want to chase reflogs, because the owner of the branch can reset or rebase it anytime. I do not develop on tracking branches. In fact, i am not even using git pull.

>>> So I think it would be a lot more palatable if we kept reflogs on deleted branches. That, in turn, has a few open issues, such as how to manage namespace conflicts (e.g., the fact that a deleted foo branch can conflict with a new foo/bar branch).
>>
>> I prefer to think of a remote branch and its local copy as the same thing, which are physically different only because of current real world/hardware/software limitations, which make it necessary to keep a local cache of remote data. With this approach, reflogs should be deleted with the branch, and there will be no namespace conflicts.
>
> It appears, the distributed nature of a DVCS did not fully sink into your mindset yet. ;-) Looks like you mentally treat a Git remote as a thing being used to access a centralized reference server which maintains a master copy of a repository, of which you happen to also have a local copy. Then it's quite logical to think that if someone deleted a branch in the master copy, everyone downstream should have the same remote branch deleted to be in sync with that master copy. But this is not the only way to organize your work. You could fetch from someone else's repository and be interested in their branch foo, but think what happens when you fetch next time from that repo and see Git happily deleting your local branch thatremote/foo simply because someone with push access deleted that branch from the repo. This might *not* be what you really want or expect.

But it is true that the object store of Git can be viewed as a single centralized repository. The fact that not everybody has access to every object in Git is a limitation and not a benefit. These are the branches which are individual, and i do not think it is a good habit to treat every reference that was ever fetched with git fetch as your own, and put reflogs of all fetched remote branches under Git version control :D. If i care about thatremote/foo branch, i track it, i do not plan to go through reflogs if it is rebased.

Alexey.
Re: Feature request: fetch --prune by default
I just want to correct my mistake in what i've just sent:

On 19 Jul 2012, at 23:20, Alexey Muranov wrote:

> because the owner of the branch can reset or rebase it anytime. I do not develop on tracking branches. In fact, i am not even using git pull.

"I do not develop on tracking branches."

Of course i develop on tracking branches, i just got confused once again by pull/push thing: i develop on branches that track origin, not upstream. I think they should be called remotely tracked branches, so there would be remote tracking branches for pull and remotely tracked branches for push.

Alexey.
Re: [PATCH 1/3] retain reflogs for deleted refs
Jeff, i have no idea about Git source and little idea of how it is working internally, but reading through your message i wonder: wouldn't it be a good idea to timestamp the dead reflogs ?

Alexey.

On 19 Jul 2012, at 23:33, Jeff King wrote:

> When a ref is deleted, we completely delete its reflog on the spot, leaving very little help for the user to reverse the action. One can sometimes reconstruct the missing entries based on the HEAD reflog, but not always; the deleted entries may not have ever been on HEAD (for example, in the case of a refs/remotes branch that was pruned). That leaves git fsck --lost-found, which can be quite tedious.
>
> Instead, let's keep the reflogs for deleted refs around until their entries naturally expire according to the regular reflog expiration rules.
>
> This cannot be done by simply leaving the reflog files in place. The ref namespace does not allow D/F conflicts, so a ref foo would block the creation of another ref foo/bar, and vice versa. This limitation is acceptable for two refs to exist simultaneously, but should not have an impact if one of the refs is deleted.
>
> This patch moves reflog entries into a special graveyard namespace, and appends a tilde (~) character, which is not allowed in a valid ref name. This means that the deleted reflogs of these refs:
>
>    refs/heads/a
>    refs/heads/a/b
>    refs/heads/a/b/c
>
> will be stored in:
>
>    logs/graveyard/refs/heads/a~
>    logs/graveyard/refs/heads/a/b~
>    logs/graveyard/refs/heads/a/b/c~
>
> Putting them in the graveyard namespace ensures they will not conflict with live refs, and the tilde prevents D/F conflicts within the graveyard namespace.
>
> The implementation is fairly straightforward, but it's worth noting a few things:
>
>   1. Updates to logs/graveyard/refs/heads/foo~ happen under the ref-lock for refs/heads/foo. So deletion still takes a single lock, and anyone touching the reflog directly needs to reverse the transformation to find the correct lockfile.
>
>   2. We append entries to the graveyard reflog rather than simply renaming the file into place. This means that if you create and delete a branch repeatedly, the graveyard will contain the concatenation of all iterations.
>
>   3. We do not resurrect dead entries when a new ref is created with the same name. However, it would be possible to build an undelete feature on top of this if one was so inclined.
>
>   4. The for_each_reflog code has been loosened to allow reflogs that do not have a matching ref. In this case, the callback is passed the null_sha1, and callers must be prepared to handle this case (the only caller that cares is the reflog expiration code, which is updated here).
>
> Only one test needed to be updated; t7701 tries to create unreachable objects by deleting branches. Of course that no longer works, which is the intent of this patch. The test now works around it by removing the graveyard logs.
>
> Signed-off-by: Jeff King p...@peff.net
> ---
>  builtin/reflog.c                     |  9 +++--
>  refs.c                               | 69 +---
>  refs.h                               |  3 ++
>  t/t7701-repack-unpack-unreachable.sh |  5 ++-
>  4 files changed, 79 insertions(+), 7 deletions(-)
>
> diff --git a/builtin/reflog.c b/builtin/reflog.c
> index b3c9e27..e79a2ca 100644
> --- a/builtin/reflog.c
> +++ b/builtin/reflog.c
> @@ -359,6 +359,7 @@ static int expire_reflog(const char *ref, const unsigned char *sha1, int unused,
>  	struct commit *tip_commit;
>  	struct commit_list *tips;
>  	int status = 0;
> +	int updateref = cmd->updateref && !is_null_sha1(sha1);
>  
>  	memset(&cb, 0, sizeof(cb));
>  
> @@ -367,6 +368,10 @@ static int expire_reflog(const char *ref, const unsigned char *sha1, int unused,
>  	 * getting updated.
>  	 */
>  	lock = lock_any_ref_for_update(ref, sha1, 0);
> +	if (!lock && is_null_sha1(sha1))
> +		lock = lock_any_ref_for_update(
> +				graveyard_reflog_to_refname(ref),
> +				sha1, 0);
>  	if (!lock)
>  		return error("cannot lock ref '%s'", ref);
>  	log_file = git_pathdup("logs/%s", ref);
> @@ -426,7 +431,7 @@ static int expire_reflog(const char *ref, const unsigned char *sha1, int unused,
>  			status |= error("%s: %s", strerror(errno),
>  					newlog_path);
>  			unlink(newlog_path);
> -		} else if (cmd->updateref &&
> +		} else if (updateref &&
>  			(write_in_full(lock->lock_fd,
>  				sha1_to_hex(cb.last_kept_sha1), 40) != 40 ||
>  			 write_str_in_full(lock->lock_fd, "\n") != 1 ||
> @@ -438,7 +443,7 @@ static int expire_reflog(const char *ref, const unsigned char *sha1, int unused,
>  			status |= error("cannot rename %s to %s",