Re: [PATCH v5 0/3] interactive git clean
Eric Sunshine <sunsh...@sunshineco.com> writes:

> The pattern [y] will match a file named 'y'. It probably is unusual
> for files named 'y', 'n', etc. to exist in the top-level directory,
> but the gitignore patterns already provide an escape hatch for these
> unusual cases.

But how does the user know that? I'd rather stay away from dwim that works in 99% of cases but does something dangerous in the 1% remaining, and from a complex, un-guessable escape scheme to solve these few cases. The two stages (yes/no/edit, and then escape patterns) are clear, and do not require so many additional keystrokes.

--
Matthieu Moy
http://www-verimag.imag.fr/~moy/

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re[4]: [PATCH 4/5] git-svn: fix bottleneck in stash_placeholder_list()
The last error I encountered is:

  r7009 = 39805bb078983e34f2fc8d2c8c02d695d00d11c0 (refs/remotes/DMC4_Basic)
  Too many open files: Can't open file '/home/il/builds/sicap/gitsvn/prd_dmc4.svn/db/revs/0/786':
  Too many open files at /.snapshots/persist/builds/git/git-git/perl/blib/lib/Git/SVN/Ra.pm line 282.

I think it's unrelated to empty dirs.

EW> Can you get an lsof on the git-svn process right before this?

IB> /.snapshots/persist/builds/sicap/gitsvn/aaa/.git/A4O_OTQxWc
IB> /.snapshots/persist/builds/sicap/gitsvn/aaa/.git/LfpcENJduN
IB> /.snapshots/persist/builds/sicap/gitsvn/aaa/.git/Dkk7pN4Mpz
IB> etc.

EW> What's your open files limit?

IB> 1024
IB> Why no call to close() from temp_release() in Git.pm?

Found, fixed. It was related to empty dirs.
Re[3]: [PATCH 4/5] git-svn: fix bottleneck in stash_placeholder_list()
Hi Eric. I'm running out of spare time and I am still unable to import my repo. The code of SVN.pm is too complex. Please help me. Here's the list of my issues:

* I think git-svn doesn't handle the case when a tag is deleted. I expected it to rename the ref from tags/tagname to tags/tagname@rev, but that doesn't happen. If a tag is replaced, there's no way to tell what the previous state of that tag was: git-svn just rewrites the ref. On the contrary, the temporary refs (with the @rev suffix) used to re-import subdir tags are kept after a successful reimport, although they have no use.

* As I said already, I have 25k revisions and 200 tags created from subdirs in trunk. This increases the import time from 2h to 12h. I would bear it if it had to be done once, but fetching a new revision may cause a re-import of all 25k revisions too. You should implement some mechanism to find the parent branches of subdir tags. Maybe the unused refs I mentioned in the previous issue are good candidates for that, but I would name them somehow differently, to distinguish them from deleted/replaced tags/branches.

* There are mistaken commits in the svn history, similar to this:

    r21255 | xxx_xx_x | 2012-03-02 18:46:30 +0300 (Fri, 02 Mar 2012) | 1 line
    Changed paths:
       A /tags/dmagentenabler-4.1.31/DMAgent (from /tags:20998)

    Delivery 4.1.31

  git-svn tries to create a tag containing dirs with other tags. Technically, it behaves correctly, but it hangs because of the size of the commit. To solve it, I had to edit the svn dump file:

    Node-path: tags/dmagentenabler-4.1.31/DMAgent
    Node-kind: dir
    Node-action: add
   -Node-copyfrom-rev: 20998
   -Node-copyfrom-path: tags
   +Prop-content-length: 10
   +Content-length: 10
   +
   +PROPS-END

    Revision-number: 21256

  This creates an empty dir instead of copying. Since the author noticed the mistake, he immediately deleted the dir in the next revision, so it works.
Re: [PATCH v5 0/3] interactive git clean
Hi Matthieu,

On Mon, May 6, 2013 at 3:58 AM, Matthieu Moy matthieu@grenoble-inp.fr wrote:
> Eric Sunshine sunsh...@sunshineco.com writes:
>> The pattern [y] will match a file named 'y'. It probably is unusual
>> for files named 'y', 'n', etc. to exist in the top-level directory,
>> but the gitignore patterns already provide an escape hatch for these
>> unusual cases.
>
> But how does the user know that?

Documentation. Junio also mentioned having a '?' in each prompt explaining the options at that point.

> I'd rather stay away from dwim that works in 99% of cases but does
> something dangerous in the 1% remaining, and from a complex,
> un-guessable escape scheme to solve these few cases. The two stages
> (yes/no/edit, and then escape patterns) are clear, and do not require
> so many additional keystrokes.

The scheme doesn't have to be unsafe. git-clean *knows* which files are up for deletion. If one of those filenames conflicts with one of the prompt options, --interactive mode can warn about the ambiguity at the point the user types that particular ambiguous response, and ask for clarification (did you mean 'yes, delete everything' or 'exclude filename y'?). You get DWIM for 99.9% of the cases, and an extra prompt for the other 0.1%. Nothing unsafe.

Is such an implementation more complicated or ugly than the separate 'edit' mode? Is the user experience better or worse (more or less convenient)? Is it easier or harder to document and explain?

Another observation which came to mind after my original email: would it make sense for --interactive mode to launch an editor with the list of files scheduled for deletion and allow the user to remove lines he does not want deleted (similar to git-rebase --interactive)? Depending upon the case, this could be more or less convenient than the proposed gitignore-patterned 'edit' mode. It could be yet another option in the yes/no/prompt/edit menu, or it could be the default and only behavior of git-clean --interactive (again similar to git-rebase --interactive).
I'm not advocating any particular user interface. The observations and questions about convenience and user experience, both here and in my original email, were made with the hope of promoting discussion before the interface gets locked in. How clunky or fluid should it be? How verbose or terse?

-- ES
Re: trouble on windows network share
David Goldfarb <d...@degel.com> writes:

> Git works correctly under Linux (Ubuntu 12.04; git 1.7.9.5). I've
> attached the strace outputs. (Note: for reasons that are probably
> irrelevant, I needed to run the commands sudo'd. Shout back if this
> is an issue).
>
> Under Windows 7, Cygwin git 1.7.9, commit fails:
>
>   U:\foo> git commit -m "added foo2"
>   error: unable to find 0b89efdeef245ed6a0a7eacc5c578629a141f856
>   fatal: 0b89efdeef245ed6a0a7eacc5c578629a141f856 is not a valid object
>
> For what it's worth, note that the file does exist.
>
>   U:\foo> ls -l .git/objects/0b
>   total 1024
>   -rwxrw-r-- 1 74 May 5 01:15 89efdeef245ed6a0a7eacc5c578629a141f856
>
> (I'm not sure why the permissions are trashed. Seems to be a Cygwin
> thing, or maybe my Cygwin config. The ?? also appears on local files,
> and I believe also with files on the old Buffalo drive, so I don't
> think it is relevant to the problem).
>
> Just in case, here's the same dir, as seen from the Ubuntu VM:
>
>   deg@ubuntu:/mnt/users/foo$ ls -l .git/objects/0b
>   total 64
>   -rwxr-xr-x 0 root root 74 May 5 01:15 89efdeef245ed6a0a7eacc5c578629a141f856
>
> Again, note that there is some user permissions lossage here. I don't
> know enough about Linux mount or CIFS, and apparently did the mount in
> a way that everything appears to be stuck owned by root (same problem
> I hinted at above). Hope this is not relevant to the problem.
>
> Here's how the same directory looks when I'm ssh'd into the NAS box
> itself:
>
>   CentralPark:/shares/Users/foo# ls -l .git/objects/0b
>   total 64
>   -rwxrw-r-- 1 deg share 74 May 5 01:15 89efdeef245ed6a0a7eacc5c578629a141f856
>
> In any event, the symptoms don't seem to be a permissions problem, so
> all this extra info is probably just a red herring, I hope.

Hrm. What about what Jeff already asked of the OP (and AFAICS never got a reply)?

} If it's a race condition between the write and the subsequent read in
} the same process, then it would be solved by looking at the object
} later. Does "git cat-file -p 6838761d549cf76033d2e9faf5954e62839eb25d"
} work, or is the object forever inaccessible?

In your case:

  git cat-file -p 0b89efdeef245ed6a0a7eacc5c578629a141f856

--
Thomas Rast
trast@{inf,student}.ethz.ch
Re: Add porcelain command to revert a single staged file to its HEAD state while preserving its staged state
Dimitar Bonev <dsbo...@gmail.com> writes:

> One more argument against the suggestion of doing a commit ahead of
> time is that I like to think in separation to reduce complexity - in
> particular I like to think only about the working dir and index
> states; commits I treat as finished work. If I have to amend a
> commit, it is about fixing a mistake - adding/removing a wrong file,
> fixing a typo, that sort of thing - and not about actual work to be
> done.

Aside from what's been said already, there's a fundamental problem with your approach: it doesn't scale in any direction. You cannot use it if you want to do the same dance with more than one version set aside, because the index gives you exactly one more storage area. And it does not generalize in any useful way, whereas using temporary commits is useful also if you want to briefly leave a branch with unfinished work to do something else.

Furthermore, knowing how to work with temporary commits (in the plural), and how to clean them up using 'rebase -i', *is* a part of the work to be done in most git workflows. Namely, it is used to reshape your branch from a "how did I hack this up" angle into a "how do I explain this to others" angle.

So I hope I'm not misconstruing what Junio and Jonathan said, but I think you can sum up our position as: your proposal is a very specific solution to a very narrow problem -- one that is only needed because your environment (PowerShell) is arguably broken. All the while, there are several existing solutions that we consider more natural, and that generalize to other use-cases.

--
Thomas Rast
trast@{inf,student}.ethz.ch
RE: trouble on windows network share
Looks like it works. From the windows machine:

  U:\foo> git cat-file -p 0b89efdeef245ed6a0a7eacc5c578629a141f856
  100644 blob b02e7c87fe376a353ea4f014bdb3f5200a946b37    foo1
  100644 blob 2cbf64f759a62392ad9dfe1fb9c2cdb175876014    foo2

Double-checking that nothing was fixed or changed when I earlier committed the file from Linux, here's a second test:

  U:\foo> git cat-file -p 0b89efdeef245ed6a0a7eacc5c578629a141f856
  100644 blob b02e7c87fe376a353ea4f014bdb3f5200a946b37    foo1
  100644 blob 2cbf64f759a62392ad9dfe1fb9c2cdb175876014    foo2

  U:\foo> git status
  # On branch master
  # Untracked files:
  #   (use "git add <file>..." to include in what will be committed)
  #
  #       trace1
  #       trace2
  nothing added to commit but untracked files present (use "git add" to track)

  U:\foo> git add trace1
  U:\foo> git commit trace1 -m testing
  error: unable to find cecae5b4c87ea21aef513fcfcd5c27fe87e0536f
  fatal: cecae5b4c87ea21aef513fcfcd5c27fe87e0536f is not a valid object

  U:\foo> git cat-file -p cecae5b4c87ea21aef513fcfcd5c27fe87e0536f
  100644 blob b02e7c87fe376a353ea4f014bdb3f5200a946b37    foo1
  100644 blob 2cbf64f759a62392ad9dfe1fb9c2cdb175876014    foo2
  100644 blob 19102815663d23f8b75a47e7a01965dcdc96468c    test.txt
  100644 blob c9009b02950964cf1d5281125e6e2f647dd9dc16    trace1

David

-----Original Message-----
From: Thomas Rast [mailto:tr...@inf.ethz.ch]
Sent: Monday, May 06, 2013 12:42 PM
To: David Goldfarb
Cc: git@vger.kernel.org
Subject: Re: trouble on windows network share

[...]
Re: jn/config-ignore-inaccessible (Re: What's cooking in git.git (Apr 2013, #10; Mon, 29))
Jonathan Nieder <jrnie...@gmail.com> writes:

> Thomas Rast wrote:
>> Junio C Hamano gits...@pobox.com writes:
>>> * jn/config-ignore-inaccessible (2013-04-15) 1 commit
>>>  - config: allow inaccessible configuration under $HOME
>>>
>>>  When $HOME is misconfigured to point at an unreadable directory, we
>>>  used to complain and die. This loosens the check.
>>
>> I do not think we agreed that this is a good idea, though.
>
> As a data point: yesterday on IRC, two users complained that they each
> had this problem.
>
>   http://colabti.org/irclogger/irclogger_log/git?date=2013-05-03#l3022
>   http://colabti.org/irclogger/irclogger_log/git?date=2013-05-03#l3111
>
> I think the approach taken in the patch above is a good one. If
> /etc/gitconfig contains important configuration, it is still not
> ignored, errors other than permissions reading ~/.gitconfig are still
> fatal, and permissions errors accessing ~/.gitconfig are no longer
> fatal because they are expected as something very common in normal
> setups.
>
> I haven't been able to convince myself there is a different, better
> behavior to be found. Special-casing inaccessible $HOME while still
> forbidding inaccessible $HOME/.config/git and $HOME/.gitconfig would
> seem strange.

What I found iffy about all of it is that the current failures happen really late: they prevent the children spawned by the daemon for repo handling from doing any useful work, while the daemon itself chugs along nicely.

Wouldn't it be better to (attempt to) reload configs immediately after switching to the new user/group, and then either warn or exit?

--
Thomas Rast
trast@{inf,student}.ethz.ch
Re: trouble on windows network share
David Goldfarb <d...@degel.com> writes:

> Looks like it works. From the windows machine:
>
>   U:\foo> git cat-file -p 0b89efdeef245ed6a0a7eacc5c578629a141f856
>   100644 blob b02e7c87fe376a353ea4f014bdb3f5200a946b37    foo1
>   100644 blob 2cbf64f759a62392ad9dfe1fb9c2cdb175876014    foo2
>
> Double-checking that nothing was fixed or changed when I earlier
> committed the file from Linux, here's a second test:
>
>   [...]
>   U:\foo> git add trace1
>   U:\foo> git commit trace1 -m testing

Note that specifying 'trace1' here is redundant, since you already added it and had no other staged changes. Perhaps you can re-run a test like this without the extra argument for comparison. That would tell us if it matters that the write and read happen in the same process.

>   error: unable to find cecae5b4c87ea21aef513fcfcd5c27fe87e0536f
>   fatal: cecae5b4c87ea21aef513fcfcd5c27fe87e0536f is not a valid object
>
>   U:\foo> git cat-file -p cecae5b4c87ea21aef513fcfcd5c27fe87e0536f
>   100644 blob b02e7c87fe376a353ea4f014bdb3f5200a946b37    foo1
>   100644 blob 2cbf64f759a62392ad9dfe1fb9c2cdb175876014    foo2
>   100644 blob 19102815663d23f8b75a47e7a01965dcdc96468c    test.txt
>   100644 blob c9009b02950964cf1d5281125e6e2f647dd9dc16    trace1

I'm inclined to just say that your FS is crazy. What's unsatisfactory is that we already have a bunch of crazy-FS workarounds in move_temp_to_file(), which is one obvious candidate for what is going on here. So this wouldn't be something new; just another craziness to work around.

For example, you could test the theory that rename() has something to do with it by patching this into move_temp_to_file():

diff --git i/sha1_file.c w/sha1_file.c
index 67e815b..22af015 100644
--- i/sha1_file.c
+++ w/sha1_file.c
@@ -2635,6 +2635,10 @@ int move_temp_to_file(const char *tmpfile, const char *filename)
 		/* FIXME!!! Collision check here ? */
 	}
 
+	if (access(filename, R_OK) < 0)
+		return error("access(%s, R_OK) failed immediately after rename(): %s",
+			     filename, strerror(errno));
+
 out:
 	if (adjust_shared_perm(filename))
 		return error("unable to set permission to '%s'", filename);

--
Thomas Rast
trast@{inf,student}.ethz.ch
RE: trouble on windows network share
Ok. Continuing in the same shell:

  U:\foo> git status
  # On branch master
  # Changes to be committed:
  #   (use "git reset HEAD <file>..." to unstage)
  #
  #       new file: trace1
  #
  # Untracked files:
  #   (use "git add <file>..." to include in what will be committed)
  #
  #       trace2

  U:\foo> git commit -m "test b"
  error: Trying to write ref refs/heads/master with nonexistent object 428dafc292b8396fc7b3c7f692dc9dfe1196a40a
  fatal: cannot update HEAD ref

  U:\foo> dir /s *a40a
   Volume in drive U is Users
   Volume Serial Number is FC97-3DA5

   Directory of U:\foo\.git\objects\42

  05/06/2013  12:26 PM    152 8dafc292b8396fc7b3c7f692dc9dfe1196a40a
                 1 File(s)    152 bytes

       Total Files Listed:
                 1 File(s)    152 bytes
                 0 Dir(s)  1,594,477,576,192 bytes free

  U:\foo> date
  The current date is: Mon 05/06/2013
  Enter the new date: (mm-dd-yy)

  U:\foo> time
  The current time is: 13:10:49.55
  Enter the new time:

  U:\foo> git cat-file -p 428dafc292b8396fc7b3c7f692dc9dfe1196a40a
  tree cecae5b4c87ea21aef513fcfcd5c27fe87e0536f
  parent f6a0de35d12a3b58f12bf1af4ff629b8b004ad82
  author David Goldfarb <d...@degel.com> 1367834997 +0300
  committer David Goldfarb <d...@degel.com> 1367834997 +0300

  test b

David

-----Original Message-----
From: Thomas Rast [mailto:tr...@inf.ethz.ch]
Sent: Monday, May 06, 2013 1:08 PM
To: David Goldfarb
Cc: git@vger.kernel.org
Subject: Re: trouble on windows network share

[...]
Re: [PATCH 4/4] fast-import: only store commit objects
On 05/03/2013 08:23 PM, Felipe Contreras wrote:
> On Fri, May 3, 2013 at 12:56 PM, Thomas Rast tr...@inf.ethz.ch wrote:
>> Felipe Contreras felipe.contre...@gmail.com writes:
>>
>> How do we know that this doesn't break any users of fast-import?
>> Your comment isn't very reassuring:
>>> the vast majority of them will never be used again
>> So what's with the minority?
>
> Actually I don't think there's any minority. If the client program
> doesn't store blobs, the blob marks are not used anyway. So there's
> no change.

I haven't been following this conversation in detail, but your proposed change sounds like something that would break cvs2git [1]. Let me explain what cvs2git does and why:

CVS stores all of the revisions of a single file in a single "filename,v" file in rcsfile(5) format. The revisions are stored as deltas, ordered so that a single revision can be reconstructed from a single serial read of the file. cvs2git reads each of these files once, reconstructing *all* of the revisions for a file in a single go. It then pours them into a git-fast-import stream as blobs and sets a mark on each blob.

Only much later in the conversion does it have enough information to reconstruct tree-wide commits. At that time it outputs git-fast-import data (to a second file) defining the git commits and their ancestry. The contents are defined by referring to the marks of blobs from the first git-fast-import stream file. This strategy speeds up the conversion *enormously*.

So if I understand correctly that you are proposing to stop allowing marks on blob objects to be set and/or referred to later, then I object vociferously. If I've misunderstood, then I'll go back into my hole :-)

Michael

[1] http://cvs2svn.tigris.org/cvs2git.html

--
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
Re: [PATCH 4/4] fast-import: only store commit objects
Michael Haggerty <mhag...@alum.mit.edu> writes:

> [...]
>
> So if I understand correctly that you are proposing to stop allowing
> marks on blob objects to be set and/or referred to later, then I
> object vociferously.

The proposed patch wants to stop writing marks (in --export-marks) for anything but commits. Does cvs2git depend on that? I.e., are you using two separate fast-import processes for the blob and tree/commit phases you describe above?

--
Thomas Rast
trast@{inf,student}.ethz.ch
Re: [PATCH 4/4] fast-import: only store commit objects
On 05/06/2013 12:32 PM, Thomas Rast wrote:
> [...]
>
> The proposed patch wants to stop writing marks (in --export-marks)
> for anything but commits. Does cvs2git depend on that? I.e., are you
> using two separate fast-import processes for the blob and tree/commit
> phases you describe above?

Yes, it can be handy to start loading the first blobfile in parallel with the later stages of the conversion, before the second dumpfile is ready. In that case the user needs to pass --export-marks to the first fast-import process to export marks on blobs, so that the marks can be passed to the second fast-import via --import-marks. So the proposed change would break a documented use of cvs2git.

Making the export of blob marks optional would of course be OK, as long as the default is to export them.

Michael

--
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
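The two-process workflow Michael describes looks roughly like the following fast-import stream fragments. This is a hand-written sketch for illustration (the path, marks, and author are made up); the first stream defines blobs and exports their marks, and the second stream builds commits that refer to those marks:

```
# First stream: blobs only, marks exported for later reuse.
# Fed to: git fast-import --export-marks=blob.marks
blob
mark :1
data 13
hello, world

# Second stream: commits referring to the earlier blob marks.
# Fed to: git fast-import --import-marks=blob.marks
commit refs/heads/master
mark :1000
committer C V S <cvs@example.com> 1367834997 +0300
data 17
Initial revision

M 100644 :1 hello.txt
```

The `M 100644 :1 hello.txt` line in the second process only resolves because the blob mark `:1` was exported by the first process and imported by the second; dropping blob marks from --export-marks is exactly what breaks this.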
RE: trouble on windows network share
Oops, I earlier missed the second half of your message, where you suggested that I patch move_temp_to_file(). Sorry, I don't have the tool-chain or knowledge to recompile git in this win/Cygwin environment.

What I can do, realistically, is:
- Follow exact instructions.
- (with some obvious hesitation) run executables that you send me.
- Continue whatever tests you want with my existing git.

Note, too, that:
- This is occurring on a standard, off-the-shelf NAS box, the Western Digital My Book Live.
- I would guess that the craziness has more to do with the box's network stack than with its FS, since it behaves fine when mounted on Linux, but not on Windows.

David

-----Original Message-----
From: David Goldfarb
Sent: Monday, May 06, 2013 1:13 PM
To: 'Thomas Rast'
Cc: git@vger.kernel.org
Subject: RE: trouble on windows network share

[...]
Re: another packed-refs race
On 05/03/2013 10:38 AM, Jeff King wrote:
> I found another race related to the packed-refs code. Consider for a moment what happens when we are looking at refs and another process does a simultaneous "git pack-refs --all --prune", updating packed-refs and deleting the loose refs.
>
> If we are resolving a single ref, then we will either find its loose form or not. If we do, we're done. If not, we can fall back on what was in the packed-refs file. If we read the packed-refs file at that point, we're OK. If the loose ref existed before but was pruned before we could read it, then we know the packed-refs file is already in place, because pack-refs would not have deleted the loose ref until it had finished writing the new file. But imagine that we already loaded the packed-refs file into memory earlier. We may be looking at an _old_ version of it that has an irrelevant sha1 from the older packed-refs file. Or it might not even exist in the packed-refs file at all, and we think the ref does not resolve.
>
> We could fix this by making sure our packed-refs file is up to date

s/file/cache/

> before using it. E.g., resolving a ref with the following sequence:
>
>   1. Look for loose ref. If found, OK.
>
>   2. Compare inode/size/mtime/etc of on-disk packed-refs to their values from the last time we loaded it. If they're not the same, reload packed-refs from disk. Otherwise, continue.
>
>   3. Look for ref in in-memory packed-refs.
>
> Not too bad. We introduce one extra stat() for a ref that has been packed, and the scheme isn't very complicated.

Let me think out loud alongside your analysis...

By this mechanism the reader can ensure that it never uses a version of the packed-refs file that is older than its information that the corresponding loose ref is absent from the filesystem. This is all assuming that the filesystem accesses have a defined order; how is that guaranteed?

pack_refs() and commit_ref() both rely on commit_lock_file(), which

    close(fd) on the lockfile
    rename(lk->filename, result_file)

prune_ref() locks the ref, verifies that its SHA-1 is unchanged, then calls unlink(), then rollback_lock_file().

The atomicity of rename() guarantees that a reader sees either the old or the new version of the file in question. But what guarantees are there about accesses across two files? Suppose we start with ref "foo" that exists only as a loose ref, and we have a pack-refs process doing

    write packed-refs with "foo"
    commit_lock_file() for packed-refs
    read loose ref "foo" and verify that its SHA-1 is unchanged
    unlink() loose ref "foo"

while another process is trying to read the reference:

    look for loose ref "foo"
    read packed-refs

Is there any guarantee that the second process can't see the loose ref "foo" as being missing but nevertheless read the old version of packed-refs? I'm not strong enough on filesystem semantics to answer that question.

> But what about enumerating refs via for_each_ref? It's possible to have the same problem there, and the packed-refs file may be moved into place midway through the process of enumerating the loose refs. So we may see refs/heads/master, but when we get to refs/remotes/origin/master, it has now been packed and pruned.

Yes.

> I _think_ we can get by with:
>
>   1. Generate the complete sorted list of loose refs.
>
>   2. Check that packed-refs is stat-clean, and reload if necessary, as above.
>
>   3. Merge the sorted loose and packed lists, letting loose override packed (so even if we get repacked halfway through our loose traversal and get half of the refs there, it's OK to see duplicates in packed-refs, which we will ignore).
>
> This is not very far off of what we do now. Obviously we don't do the stat-clean check in step 2. But we also don't generate the complete list of loose refs before hitting the packed-refs file. Instead we lazily load the loose directories as needed. And of course we cache that information in memory, even though it may go out of date. I think the best we could do is keep a stat() for each individual directory we see, and check it before using the in-memory contents. That may be a lot of stats, but it's still better than actually opening each loose ref separately.

The loose refs cache is only used by the for_each_ref() functions, not for looking up individual references. Another approach would be to change the top-level for_each_ref() functions to re-stat() all of the loose references within the namespace that interests it, *then* verify that the packed-ref cache is not stale, *then* start the iteration. Then there would be no need to re-stat() files during the iteration. (This would mean that we have to prohibit a second reference iteration from being started while one is already in progress.)

Of course, clearing (part of) the loose reference cache invalidates any pointers that other callers might have retained to refnames in the old version of the cache.
Re: another packed-refs race
On 05/03/2013 07:28 PM, Jeff King wrote:
> On Fri, May 03, 2013 at 11:26:11AM +0200, Johan Herland wrote:
>
> > You don't really need to be sure that packed-refs is up-to-date. You only need to make sure that you don't rely on lazily loading loose refs _after_ you have loaded packed-refs.
>
> True. As long as you load them both together, and always make sure you do loose first, you'd be fine. But I think there will be corner cases where you have loaded _part_ of the loose ref namespace. I think part of the point of Michael's ref work is that if you call for_each_tag_ref, we would not waste time loading refs/remotes/ at all. If you then follow that with a call to for_each_ref, you would want to re-use the cached work from traversing refs/tags/, and then traverse refs/remotes/. You know that your cached packed-refs is good with respect to refs/tags/, but you don't with respect to refs/remotes/.

Correct.

[I wonder if there really are a lot of iterations over overlapping parts of the reference namespace within a single git process...]

> > The following solution might work in both the resolve-a-single-ref and enumerating-refs case:
> >
> > 0. Look for ref already cached in memory. If found, OK.
> >
> > 1. Look for loose ref. If found, OK.
> >
> > 2. If not found, load all loose refs and packed-refs from disk (in that order), and store in memory for remainder of this process. Never reload packed-refs from disk (unless you also reload all loose refs first).
>
> I think that would be correct (modulo that step 1 cannot happen for enumeration). But we would like to avoid loading all loose refs if we can. Especially on a cold cache, it can be quite slow, and you may not even care about those refs for the current operation (I do not recall the exact original motivation for the lazy loading, but it was something along those lines).

Lazy loading was first inspired by the observation that effectively every git invocation was loading *all* loose references to do an iteration over refs/replace/ (which I've never even used!) This was absolutely killing the performance of filter-branch, which creates a lot of loose references and invokes git many times, even though the cache was warm.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] fast-import: only store commit objects
Hi Thomas,

On Mon, 6 May 2013, Thomas Rast wrote:

> The proposed patch wants to stop writing marks (in --export-marks) for anything but commits.

Then it should not go in.

Ciao,
Dscho
Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
On Sun, May 05, 2013 at 05:38:53PM -0500, Felipe Contreras wrote: We don't care about blobs, or any object other than commits, but in order to find the type of object, we are parsing the whole thing, which is slow, specially in big repositories with lots of big files. I did a double-take on reading this subject line and first paragraph, thinking surely fast-export needs to actually output blobs?. Reading the patch, I see that this is only about not bothering to load blob marks from --import-marks. It might be nice to mention that in the commit message, which is otherwise quite confusing. I'm also not sure why your claim we don't care about blobs is true, because naively we would want future runs of fast-export to avoid having to write out the whole blob content when mentioning the blob again. I think one argument could be if we write a mark for blob X, we will also have written a mark for commit Y which contains it; on subsequent runs, we will just show the mark for Y in the first place, and not even care about showing X (as a part of Y) either way. We would only refer to the mark for X if it appears as part of a different commit, but that is a rare case not worth worrying about. Does that match your reasoning? Before this, loading the objects of a fresh emacs import, with 260598 blobs took 14 minutes, after this patch, it takes 3 seconds. Presumably most of that speed improvement comes from not parsing the blob objects. I wonder if you could get similar speedups by applying the do not bother parsing rule from your patch 3. You would still incur some cost to create a struct blob, but it may or may not be measurable. That would mean we get the case not worth worrying about from above for free. I doubt it would make that big a difference, though, given the rarity of it. So I am OK with it either way. 
-Peff
Segfault in git merge-tree (1.8.2.2)
Hi,

I'm getting a segfault in git merge-tree using v1.8.2.2 on MacOS 10.8.3. I can't share the repo, but I can build patches and check if they fix the problem :) Here's a bt:

(gdb) run
Starting program: /usr/local/Cellar/git/1.8.2.2/libexec/git-core/git-merge-tree 027058e6ac8d03e029c4e1455bf90f63cd20e65b FETCH_HEAD master
Reading symbols for shared libraries +.. done

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x
0x00010003e532 in threeway_callback ()
(gdb) bt
#0  0x00010003e532 in threeway_callback ()
#1  0x0001000da03b in traverse_trees ()
#2  0x00010003e46b in threeway_callback ()
#3  0x0001000da03b in traverse_trees ()
#4  0x00010003e46b in threeway_callback ()
#5  0x0001000da03b in traverse_trees ()
#6  0x00010003e46b in threeway_callback ()
#7  0x0001000da03b in traverse_trees ()
#8  0x00010003e46b in threeway_callback ()
#9  0x0001000da03b in traverse_trees ()
#10 0x00010003e46b in threeway_callback ()
#11 0x0001000da03b in traverse_trees ()
#12 0x00010003e46b in threeway_callback ()
#13 0x0001000da03b in traverse_trees ()
#14 0x00010003e46b in threeway_callback ()
#15 0x0001000da03b in traverse_trees ()
#16 0x00010003e46b in threeway_callback ()
#17 0x0001000da03b in traverse_trees ()
#18 0x00010003e46b in threeway_callback ()
#19 0x0001000da03b in traverse_trees ()
#20 0x00010003e46b in threeway_callback ()
#21 0x0001000da03b in traverse_trees ()
#22 0x00010003e46b in threeway_callback ()
#23 0x0001000da03b in traverse_trees ()
#24 0x00010003df02 in cmd_merge_tree ()
#25 0x00010e99 in handle_internal_command ()
#26 0x00010c59 in main ()

-Andreas
RE: trouble on windows network share
-----Original Message-----
From: Thomas Rast
Sent: Monday, May 06, 2013 5:42 AM

David Goldfarb <d...@degel.com> writes:

> Git works correctly under Linux (Ubuntu 12.04; git 1.7.9.5). I've attached the strace outputs. (Note: for reasons that are probably irrelevant, I needed to run the commands sudo'd. Shout back if this is an issue).
>
> Under Windows 7, Cygwin git 1.7.9, commit fails:
>
> U:\foo> git commit -m "added foo2"
> error: unable to find 0b89efdeef245ed6a0a7eacc5c578629a141f856
> fatal: 0b89efdeef245ed6a0a7eacc5c578629a141f856 is not a valid object
>
> For what it's worth, note that the file does exist.
>
> U:\foo> ls -l .git/objects/0b
> total 1024
> -rwxrw-r-- 1 74 May 5 01:15 89efdeef245ed6a0a7eacc5c578629a141f856
>
> (I'm not sure why the permissions are trashed. Seems to be a Cygwin thing, or maybe my Cygwin config. The ?? also appears on local files, and I believe also with files on the old Buffalo drive, so I don't think it is relevant to the problem).
>
> Just in case, here's the same dir, as seen from the Ubuntu VM:
>
> deg@ubuntu:/mnt/users/foo$ ls -l .git/objects/0b
> total 64
> -rwxr-xr-x 0 root root 74 May 5 01:15 89efdeef245ed6a0a7eacc5c578629a141f856
>
> Again, note that there is some user permissions lossage here. I don't know enough about Linux mount or CIFS, and apparently did the mount in a way that everything seems to appear to be stuck owned by root (same problem I hinted at above). Hope this is not relevant to the problem.
>
> Here's how the same directory looks, when I'm ssh'd into the NAS box itself:
>
> CentralPark:/shares/Users/foo# ls -l .git/objects/0b
> total 64
> -rwxrw-r-- 1 deg share 74 May 5 01:15 89efdeef245ed6a0a7eacc5c578629a141f856
>
> In any event, the symptoms don't seem to be a permissions problem, so all this extra info is probably just a red herring, I hope.

Hrm. What about what Jeff already asked of the OP (and AFAICS never got a reply)?

If referring to me, then yes but it was too big for the list.
} If it's a race condition between the write and the subsequent read in
} the same process, then it would be solved by looking at the object
} later. Does git cat-file -p 6838761d549cf76033d2e9faf5954e62839eb25d
} work, or is the object forever inaccessible?

In your case:

    git cat-file -p 0b89efdeef245ed6a0a7eacc5c578629a141f856
Re: Segfault in git merge-tree (1.8.2.2)
On Mon, May 06, 2013 at 03:02:10PM +0200, Andreas Jacobsen wrote:

> I'm getting a segfault in git merge-tree using v1.8.2.2 on MacOS 10.8.3. I can't share the repo, but I can build patches and check if they fix the problem :)

Can you rebuild with debugging information and try the backtrace again? Something like:

    make CFLAGS='-O0 -g'

Then use the git in bin-wrappers/.
Re: [PATCH v2] remove the impression of unexpectedness when access is denied
Jonathan Nieder <jrnie...@gmail.com> writes:

> I ran into this message for the first time today.
>
>     $ git fetch --all
>     Fetching origin
>     remote: Counting objects: 368, done.
>     [...]
>     Fetching gitk
>     fatal: Could not read from remote repository.
>
>     Please make sure you have the correct access rights
>     and the repository exists.
>
>     error: Could not fetch gitk
>     Fetching debian
>     Fetching pape
>     [...]
>
> The gitk remote refers to git://git.kernel.org/pub/scm/gitk/gitk. Using ls-remote to contact it produces the same result. The message is correct: the repository does not exist. Impressions:
>
> * Looking at "Could not read", it is not clear what could not read and why. GIT_TRACE_PACKET tells me the interaction was
>
>     me   git-upload-pack /pub/scm/gitk/gitk\0host=git.kernel.org\0
>     them (hangup)
>
>   Would it make sense for the server to send an ERR packet to give a more helpful diagnosis?

I think git-daemon does so (or at least attempts to do so); path_ok() uses enter_repo() to check if the given path is a repository, returns NULL to run_service(), which in turn calls daemon_error() that does the ERR thing.

> * The spacing and capitalization is odd and makes it not flow well with the rest of the output. I suspect it would be easier to read with the error separated from hints:
>
>     Fetching gitk
>     fatal: the remote server sent an empty response
>     hint: does the repository exist?
>     hint: do you have the correct access rights?
>     error: Could not fetch gitk
>     Fetching debian
>
> If a server is misconfigured and just decides to send an empty response for no good reason, the output would still be true.

It does sound better. Also s/Could not fetch/could not fetch/.

> * The error message is the same whether the server returned no response or an incomplete pkt-line. Maybe in the latter case it should print the "hung up unexpectedly" thing.

OK.
[PATCH] contrib/hooks/post-receive-email: get description from repo.git/config
Hello,

The included patch attempts to improve post-receive-email. It's a trivial change, but one that helps us Gitolite users. Comments are welcome, and this is my first attempt at contributing to the Git project. Please keep me on CC as I'm not on the list.

From 878a7af9088e2bcc3afc9b09b9023f1f188c844b Mon Sep 17 00:00:00 2001
From: Trond Hasle Amundsen <t.h.amund...@usit.uio.no>
Date: Mon, 6 May 2013 15:41:25 +0200
Subject: [PATCH] contrib/hooks/post-receive-email: get description from repo.git/config

When getting the project description, we first try the gitweb.description entry in repo.git/config, but repo.git/description takes precedence if it exists. This behaviour mimics that of Gitweb, and is what we want when using Gitolite, which deletes repo.git/description upon repo creation and rather maintains a gitweb.description entry in repo.git/config if a description is configured.

Signed-off-by: Trond Hasle Amundsen <t.h.amund...@usit.uio.no>
---
 contrib/hooks/post-receive-email | 9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email
index 0e5b72d..6ce046a 100755
--- a/contrib/hooks/post-receive-email
+++ b/contrib/hooks/post-receive-email
@@ -714,7 +714,14 @@ if [ -z "$GIT_DIR" ]; then
 	exit 1
 fi
 
-projectdesc=$(sed -ne '1p' "$GIT_DIR/description" 2>/dev/null)
+# Get the repo's description. First try the gitweb.description entry in
+# repo.git/config, but repo.git/description takes precedence if it
+# exists. This behaviour mimics that of Gitweb.
+projectdesc=$(git config gitweb.description)
+if [ -f "$GIT_DIR/description" ]; then
+	projectdesc=$(sed -ne '1p' "$GIT_DIR/description" 2>/dev/null)
+fi
+
 # Check if the description is unchanged from it's default, and shorten it to
 # a more manageable length if it is
 if expr "$projectdesc" : "Unnamed repository.*$" >/dev/null
-- 
1.7.1

Cheers,

-- 
Trond H. Amundsen <t.h.amund...@usit.uio.no>
Center for Information Technology Services, University of Oslo
Re: Segfault in git merge-tree (1.8.2.2)
Sure, here you go, this time built from the HEAD I found on github (7d3ccdffb5d28970dd7a4d177cfcca690ccd0c22) with:

    NO_GETTEXT=1 make prefix=/usr/local/Cellar/git/HEAD CC=cc CFLAGS='-O0 -g' install

(this is homebrew's setup, but I built it manually rather than using the recipe.)

And the gdb output:

(gdb) set args merge-tree 027058e6ac8d03e029c4e1455bf90f63cd20e65b FETCH_HEAD master
(gdb) run
Starting program: /usr/local/bin/git merge-tree 027058e6ac8d03e029c4e1455bf90f63cd20e65b FETCH_HEAD master
Reading symbols for shared libraries +.. done

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x
0x00010006f1a3 in add_merge_entry (entry=0x100432ba0) at builtin/merge-tree.c:24
24		*merge_result_end = entry;
(gdb) bt
#0  0x00010006f1a3 in add_merge_entry (entry=0x100432ba0) at builtin/merge-tree.c:24
#1  0x00010006ee24 in unresolved (info=0x7fff5fbfd508, n=0x1004367c0) at builtin/merge-tree.c:270
#2  0x00010006eaa2 in threeway_callback (n=3, mask=4, dirmask=0, entry=0x1004367c0, info=0x7fff5fbfd508) at builtin/merge-tree.c:330
#3  0x00010018eef2 in traverse_trees (n=3, t=0x7fff5fbfd5d0, info=0x7fff5fbfd508) at tree-walk.c:407
#4  0x00010006e8e8 in merge_trees_recursive (t=0x7fff5fbfd5d0, base=0x100436780 "bibliotek/httpservice/src/main/java/no/ok/it/lib/http/service", df_conflict=1) at builtin/merge-tree.c:341
#5  0x00010006f082 in unresolved_directory (info=0x7fff5fbfd7d8, n=0x100436660, df_conflict=1) at builtin/merge-tree.c:221
#6  0x00010006ed2c in unresolved (info=0x7fff5fbfd7d8, n=0x100436660) at builtin/merge-tree.c:258
#7  0x00010006eaa2 in threeway_callback (n=3, mask=4, dirmask=4, entry=0x100436660, info=0x7fff5fbfd7d8) at builtin/merge-tree.c:330
#8  0x00010018eef2 in traverse_trees (n=3, t=0x7fff5fbfd8a0, info=0x7fff5fbfd7d8) at tree-walk.c:407
#9  0x00010006e8e8 in merge_trees_recursive (t=0x7fff5fbfd8a0, base=0x100435e70 "bibliotek/httpservice/src/main/java/no/ok/it/lib/http", df_conflict=1) at builtin/merge-tree.c:341
#10 0x00010006f082 in unresolved_directory (info=0x7fff5fbfdaa8, n=0x100435d50, df_conflict=1) at builtin/merge-tree.c:221
#11 0x00010006ed2c in unresolved (info=0x7fff5fbfdaa8, n=0x100435d50) at builtin/merge-tree.c:258
#12 0x00010006eaa2 in threeway_callback (n=3, mask=4, dirmask=4, entry=0x100435d50, info=0x7fff5fbfdaa8) at builtin/merge-tree.c:330
#13 0x00010018eef2 in traverse_trees (n=3, t=0x7fff5fbfdb70, info=0x7fff5fbfdaa8) at tree-walk.c:407
#14 0x00010006e8e8 in merge_trees_recursive (t=0x7fff5fbfdb70, base=0x100435d10 "bibliotek/httpservice/src/main/java/no/ok/it/lib", df_conflict=1) at builtin/merge-tree.c:341
#15 0x00010006f082 in unresolved_directory (info=0x7fff5fbfdd78, n=0x100436450, df_conflict=1) at builtin/merge-tree.c:221
#16 0x00010006ed2c in unresolved (info=0x7fff5fbfdd78, n=0x100436450) at builtin/merge-tree.c:258
#17 0x00010006eaa2 in threeway_callback (n=3, mask=4, dirmask=4, entry=0x100436450, info=0x7fff5fbfdd78) at builtin/merge-tree.c:330
#18 0x00010018eef2 in traverse_trees (n=3, t=0x7fff5fbfde40, info=0x7fff5fbfdd78) at tree-walk.c:407
#19 0x00010006e8e8 in merge_trees_recursive (t=0x7fff5fbfde40, base=0x100436400 "bibliotek/httpservice/src/main/java/no/ok/it", df_conflict=1) at builtin/merge-tree.c:341
#20 0x00010006f082 in unresolved_directory (info=0x7fff5fbfe048, n=0x100433c80, df_conflict=1) at builtin/merge-tree.c:221
#21 0x00010006ed2c in unresolved (info=0x7fff5fbfe048, n=0x100433c80) at builtin/merge-tree.c:258
#22 0x00010006eaa2 in threeway_callback (n=3, mask=4, dirmask=4, entry=0x100433c80, info=0x7fff5fbfe048) at builtin/merge-tree.c:330
#23 0x00010018eef2 in traverse_trees (n=3, t=0x7fff5fbfe110, info=0x7fff5fbfe048) at tree-walk.c:407
#24 0x00010006e8e8 in merge_trees_recursive (t=0x7fff5fbfe110, base=0x100433c50 "bibliotek/httpservice/src/main/java/no/ok", df_conflict=1) at builtin/merge-tree.c:341
#25 0x00010006f082 in unresolved_directory (info=0x7fff5fbfe318, n=0x10042d2a0, df_conflict=1) at builtin/merge-tree.c:221
#26 0x00010006ed2c in unresolved (info=0x7fff5fbfe318, n=0x10042d2a0) at builtin/merge-tree.c:258
#27 0x00010006eaa2 in threeway_callback (n=3, mask=4, dirmask=4, entry=0x10042d2a0, info=0x7fff5fbfe318) at builtin/merge-tree.c:330
#28 0x00010018eef2 in traverse_trees (n=3, t=0x7fff5fbfe3e0, info=0x7fff5fbfe318) at tree-walk.c:407
#29 0x00010006e8e8 in merge_trees_recursive (t=0x7fff5fbfe3e0, base=0x10042d250 "bibliotek/httpservice/src/main/java/no", df_conflict=1) at builtin/merge-tree.c:341
#30 0x00010006f082 in unresolved_directory (info=0x7fff5fbfe5e8, n=0x100435750, df_conflict=1) at builtin/merge-tree.c:221
#31 0x00010006ed2c in unresolved (info=0x7fff5fbfe5e8, n=0x100435750) at builtin/merge-tree.c:258
#32 0x00010006eaa2 in threeway_callback (n=3, mask=4, dirmask=4, entry=0x100435750,
Re: Segfault in git merge-tree (1.8.2.2)
On Mon, May 06, 2013 at 04:13:28PM +0200, Andreas Jacobsen wrote: Sure, here you go, this time built from the HEAD I found on github (7d3ccdffb5d28970dd7a4d177cfcca690ccd0c22) with: NO_GETTEXT=1 make prefix=/usr/local/Cellar/git/HEAD CC=cc CFLAGS='-O0 -g' install (this is homebrew's setup, but I built it manually rather than using the recipe.) And the gdb output: (gdb) set args merge-tree 027058e6ac8d03e029c4e1455bf90f63cd20e65b FETCH_HEAD master (gdb) run Starting program: /usr/local/bin/git merge-tree 027058e6ac8d03e029c4e1455bf90f63cd20e65b FETCH_HEAD master Reading symbols for shared libraries +.. done Program received signal EXC_BAD_ACCESS, Could not access memory. Reason: KERN_INVALID_ADDRESS at address: 0x 0x00010006f1a3 in add_merge_entry (entry=0x100432ba0) at builtin/merge-tree.c:24 24 *merge_result_end = entry; Thanks. I have an idea of what's going on, but the set up is in an earlier pass and it only fails the next time it gets into add_merge_entry. Can you try adding the following patch on top? Hopefully the added debug is in the right caller, otherwise the new assert at the top will point us to the right one. 
-- >8 --
diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index ec49917..8eebab7 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -23,6 +23,7 @@ static void add_merge_entry(struct merge_list *entry)
 {
 	*merge_result_end = entry;
 	merge_result_end = &entry->next;
+	assert(merge_result_end);
 }
 
 static void merge_trees_recursive(struct tree_desc t[3], const char *base, int df_conflict);
@@ -267,6 +268,12 @@ static void unresolved(const struct traverse_info *info, struct name_entry n[3])
 	if (n[0].mode && !S_ISDIR(n[0].mode))
 		entry = link_entry(1, info, n + 0, entry);
 
+	if (!entry) {
+		fprintf(stderr, "n[0].mode = %d\nn[1].mode = %d\nn[2].mode = %d\n",
+			n[0].mode, n[1].mode, n[2].mode);
+		assert(FALSE);
+	}
+
 	add_merge_entry(entry);
 }
Re: Segfault in git merge-tree (1.8.2.2)
On Mon, May 06, 2013 at 03:29:23PM +0100, John Keeping wrote:

> On Mon, May 06, 2013 at 04:13:28PM +0200, Andreas Jacobsen wrote:
> > Sure, here you go, this time built from the HEAD I found on github
> > (7d3ccdffb5d28970dd7a4d177cfcca690ccd0c22) with:
> >
> >     NO_GETTEXT=1 make prefix=/usr/local/Cellar/git/HEAD CC=cc CFLAGS='-O0 -g' install
> >
> > (this is homebrew's setup, but I built it manually rather than using the recipe.)
> >
> > And the gdb output:
> >
> > (gdb) set args merge-tree 027058e6ac8d03e029c4e1455bf90f63cd20e65b FETCH_HEAD master
> > (gdb) run
> > Starting program: /usr/local/bin/git merge-tree 027058e6ac8d03e029c4e1455bf90f63cd20e65b FETCH_HEAD master
> > Reading symbols for shared libraries +.. done
> >
> > Program received signal EXC_BAD_ACCESS, Could not access memory.
> > Reason: KERN_INVALID_ADDRESS at address: 0x
> > 0x00010006f1a3 in add_merge_entry (entry=0x100432ba0) at builtin/merge-tree.c:24
> > 24		*merge_result_end = entry;
>
> Thanks. I have an idea of what's going on, but the set up is in an earlier pass and it only fails the next time it gets into add_merge_entry. Can you try adding the following patch on top? Hopefully the added debug is in the right caller, otherwise the new assert at the top will point us to the right one.

Actually, don't worry about it. I've tracked down the problem and have a failing test case for t4300. Patch to follow shortly.
Re: Add porcelain command to revert a single staged file to its HEAD state while preserving its staged state
Dimitar Bonev <dsbo...@gmail.com> writes:

[administrivia: please do not drop people out of the Cc list]

> Actually this is not the case as I tried to explain with the 'git commit' followed by 'git checkout HEAD~1 -- targetfile' followed by 'git commit --amend' example. The index and the working dir states are very well related.

That invites another question: if it is very well related, why isn't it an option to start from the state you have in the working tree (i.e. doing nothing), or in the index (i.e. "git checkout controller")? But the answer does not matter that much in the end.

> Lets imagine that I am adding an MVC feature X - it could be implemented with 3 or more files. If I stage the files and came up with an idea that requires a major rewrite of one of these files - lets say the controller one - then it is more convenient to checkout the file's HEAD state and build on top of it - I had been doing just that right before staging

So you have satisfactory changes for view and model in the index, and changes to the controller that maybe is OK but you think it could be improved, and you want to go back to the clean slate to rethink the approach only for the controller part?

I think the story is essentially the same. Let's say you did

    git diff HEAD controller | git apply -R
    edit controller

to get rid of the changes in the working tree, further worked on the controller part, and came up with a different implementation. Then you would presumably do

    git add controller

but at that point you would lose the "maybe OK" version you had in the index. What if you think the version in the working tree might end up being better than "maybe OK" but can still be improved further? You never "git add" until the working tree version gets into truly a better shape?
What happens fairly often is that you end up with more than _one_ version that is better than the version from HEAD, without being able to say immediately that one is clearly better than the others in all counts, and at some point you need more than one intermediate state (while the index can have only one) to keep these competing versions. The next thing you would want to do is to take a deep breath and pick better parts from these good versions to come up with the single final version. Keeping one good version in the index does not let you do so.

> ... commits - I treat them as finished work.

I think people should learn to take the biggest advantage of using a distributed source control system, which is that commits do not have to be finished work, until you choose to declare they are done and push them out for others to see. Fear of commitment is a disease we do not have to suffer once we graduated from centralized systems ;-).
Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
Jeff King <p...@peff.net> writes:

> On Sun, May 05, 2013 at 05:38:53PM -0500, Felipe Contreras wrote:
>
> > We don't care about blobs, or any object other than commits, but in order to find the type of object, we are parsing the whole thing, which is slow, specially in big repositories with lots of big files.
>
> I did a double-take on reading this subject line and first paragraph, thinking "surely fast-export needs to actually output blobs?". Reading the patch, I see that this is only about not bothering to load blob marks from --import-marks. It might be nice to mention that in the commit message, which is otherwise quite confusing.

I had the same reaction first, but not writing the blob _objects_ out to the output stream would not make any sense, so it was fairly easy to guess what the author wanted to say ;-).

> I'm also not sure why your claim "we don't care about blobs" is true, because naively we would want future runs of fast-export to avoid having to write out the whole blob content when mentioning the blob again.

The existing documentation is fairly clear that marks for objects other than commits are not exported, and the import-marks codepath discards anything but commits, so there is no mechanism for the existing fast-export users to leave blob marks in the marks file for later runs of fast-export to take advantage of. The second invocation cannot refer to such a blob in the first place. The story is different on the fast-import side, where we do say we dump the full table and a later run can depend on these marks.

By discarding marks on blobs, we may be robbing some optimization possibilities, and by discarding marks on tags, we may be robbing some features, from users of fast-export; we might want to add an option --use-object-marks={blob,commit,tag} or something to both fast-export and fast-import, so that the former can optionally write marks for non-commits out, and the latter can omit non-commit marks if the user does not need them. But that is a separate issue.
-- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
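As an aside to the thread above, the marks-file mechanics are easy to imitate outside of git. A marks file is just ":<mark> <object name>" lines with no type information, which is why filtering to commit-only marks needs a per-object type lookup (sha1_object_info() in the C code). The sketch below fakes that lookup with a table of made-up object names; it is an illustration, not git code:

```shell
#!/bin/sh
# Imitation of commit-only marks filtering. The object names and the
# type_of table are made up; git would consult the object database
# (sha1_object_info()) instead.
marks_file=$(mktemp)
cat >"$marks_file" <<EOF
:1 1111111111111111111111111111111111111111
:2 2222222222222222222222222222222222222222
:3 3333333333333333333333333333333333333333
EOF

type_of() {  # fake type lookup keyed on the first hex digit
	case $1 in
	1*) echo blob ;;
	2*) echo commit ;;
	3*) echo tag ;;
	esac
}

kept=
while read -r mark sha1; do
	if [ "$(type_of "$sha1")" = commit ]; then
		kept="${kept:+$kept }$mark"
	fi
done <"$marks_file"
rm -f "$marks_file"
echo "kept marks: $kept"
```

Only the mark whose (fake) object type is commit survives, mirroring what the thread says import_marks() keeps.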
Re: [PATCH 4/4] fast-import: only store commit objects
Michael Haggerty mhag...@alum.mit.edu writes: Yes, it can be handy to start loading the first blobfile in parallel with the later stages of the conversion, before the second dumpfile is ready. In that case the user needs to pass --export-marks to the first fast-import process to export marks on blobs so that the marks can be passed to the second fast-import via --import-marks. So the proposed change would break a documented use of cvs2git. Making the export of blob marks optional would of course be OK, as long as the default is to export them. Thanks for a concise summary. Your use case fits exactly what Felipe conjectured as "the nonexistent minority". An option that lets the caller say "I only care about marks on these types of objects" to be written to (and read from) the exported marks file would help Felipe's use case without harming your use case, and would be a sane and safe way to go.
[PATCH] merge-tree: handle directory/empty conflict correctly
git-merge-tree causes a null pointer dereference when a directory entry exists in only one or two of the three trees being compared with no corresponding entry in the other tree(s). When this happens, we want to handle the entry as a directory and not attempt to mark it as a file merge. Do this by setting the entry's bit in the directory mask when the entry is missing or when it is a directory, only performing the file comparison when we know that a file entry exists.

Reported-by: Andreas Jacobsen andr...@andreasjacobsen.com
Signed-off-by: John Keeping j...@keeping.me.uk
---
Andreas, can you try this patch and see if it fixes your test case?

 builtin/merge-tree.c  |  6 +++++-
 t/t4300-merge-tree.sh | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/builtin/merge-tree.c b/builtin/merge-tree.c
index ec49917..61cbde4 100644
--- a/builtin/merge-tree.c
+++ b/builtin/merge-tree.c
@@ -251,7 +251,11 @@ static void unresolved(const struct traverse_info *info, struct name_entry n[3])
 	for (i = 0; i < 3; i++) {
 		mask |= (1 << i);
-		if (n[i].mode && S_ISDIR(n[i].mode))
+		/*
+		 * Treat missing entries as directories so that we return
+		 * after unresolved_directory has handled this.
+		 */
+		if (!n[i].mode || S_ISDIR(n[i].mode))
 			dirmask |= (1 << i);
 	}

diff --git a/t/t4300-merge-tree.sh b/t/t4300-merge-tree.sh
index 2defb42..9015e47 100755
--- a/t/t4300-merge-tree.sh
+++ b/t/t4300-merge-tree.sh
@@ -259,6 +259,57 @@ EXPECTED
 	test_cmp expected actual
 '
 
+test_expect_success 'tree add A, B (same)' '
+	cat >expect <<-\EOF &&
+	EOF
+	git reset --hard initial &&
+	mkdir sub &&
+	test_commit "add sub/file" sub/file file add-tree-A &&
+	git merge-tree initial add-tree-A add-tree-A >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'tree add A, B (different)' '
+	cat >expect <<-\EOF &&
+	added in both
+	  our    100644 43d5a8ed6ef6c00ff775008633f95787d088285d sub/file
+	  their  100644 ba629238ca89489f2b350e196ca445e09d8bb834 sub/file
+	@@ -1 +1,5 @@
+	+<<<<<<< .our
+	 AAA
+	+=======
+	+BBB
+	+>>>>>>> .their
+	EOF
+	git reset --hard initial &&
+	mkdir sub &&
+	test_commit "add sub/file" sub/file AAA add-tree-a-b-A &&
+	git reset --hard initial &&
+	mkdir sub &&
+	test_commit "add sub/file" sub/file BBB add-tree-a-b-B &&
+	git merge-tree initial add-tree-a-b-A add-tree-a-b-B >actual &&
+	test_cmp expect actual
+'
+
+test_expect_success 'tree unchanged A, removed B' '
+	cat >expect <<-\EOF &&
+	removed in remote
+	  base   100644 43d5a8ed6ef6c00ff775008633f95787d088285d sub/file
+	  our    100644 43d5a8ed6ef6c00ff775008633f95787d088285d sub/file
+	@@ -1 +0,0 @@
+	-AAA
+	EOF
+	git reset --hard initial &&
+	mkdir sub &&
+	test_commit "add sub/file" sub/file AAA tree-remove-b-initial &&
+	git rm sub/file &&
+	test_tick &&
+	git commit -m "remove sub/file" &&
+	git tag tree-remove-b-B &&
+	git merge-tree tree-remove-b-initial tree-remove-b-initial tree-remove-b-B >actual &&
+	test_cmp expect actual
+'
+
 test_expect_success 'turn file to tree' '
 	git reset --hard initial &&
 	rm initial-file &&
-- 
1.8.3.rc0.149.g98a72f2.dirty
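For readers following along, the bitmask trick in the fix can be sketched stand-alone. This is a hypothetical POSIX-shell rendering, not the builtin/merge-tree.c code; the three sample modes (directory, regular file, missing entry) are invented for illustration:

```shell
#!/bin/sh
# Stand-alone model of the fixed unresolved() loop: a missing entry
# (mode 0) now sets the same dirmask bit a directory does.
# Sample modes for base/ours/theirs (invented): 040000 (directory),
# 0100644 (regular file), 0 (no entry) -- written in decimal below.
modes="16384 33188 0"

i=0 mask=0 dirmask=0
for mode in $modes; do
	mask=$(( mask | (1 << i) ))
	# S_IFMT is 0170000 (61440); S_IFDIR is 040000 (16384)
	if [ "$mode" -eq 0 ] || [ $(( mode & 61440 )) -eq 16384 ]; then
		dirmask=$(( dirmask | (1 << i) ))
	fi
	i=$(( i + 1 ))
done

echo "mask=$mask dirmask=$dirmask"
```

With these samples, both bit 0 (the directory) and bit 2 (the missing entry) end up in dirmask, which is the point of the fix: the missing entry is routed to the directory-handling path instead of the file-merge path.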
Re: [PATCH] contrib/hooks/post-receive-email: get description from repo.git/config
Trond Hasle Amundsen t.h.amund...@usit.uio.no writes: The included patch attempts to improve post-receive-email. It's a trivial change, but one that helps us Gitolite users. Comments are welcome, and this is my first attempt at contributing to the Git project. Please keep me on CC as I'm not on the list. Bah.. It seems I attached this as MIME, sorry about that. Here it is, properly inlined this time:

From 878a7af9088e2bcc3afc9b09b9023f1f188c844b Mon Sep 17 00:00:00 2001
From: Trond Hasle Amundsen t.h.amund...@usit.uio.no
Date: Mon, 6 May 2013 15:41:25 +0200
Subject: [PATCH] contrib/hooks/post-receive-email: get description from repo.git/config

When getting the project description, we first try the gitweb.description entry in repo.git/config, but repo.git/description takes precedence if it exists. This behaviour mimics that of Gitweb, and is what we want when using Gitolite, which deletes repo.git/description upon repo creation and rather maintains a gitweb.description entry in repo.git/config if a description is configured.

Signed-off-by: Trond Hasle Amundsen t.h.amund...@usit.uio.no
---
 contrib/hooks/post-receive-email |    9 ++++++++-
 1 files changed, 8 insertions(+), 1 deletions(-)

diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email
index 0e5b72d..6ce046a 100755
--- a/contrib/hooks/post-receive-email
+++ b/contrib/hooks/post-receive-email
@@ -714,7 +714,14 @@ if [ -z "$GIT_DIR" ]; then
 	exit 1
 fi
 
-projectdesc=$(sed -ne '1p' "$GIT_DIR/description" 2>/dev/null)
+# Get the repo's description. First try the gitweb.description entry in
+# repo.git/config, but repo.git/description takes precedence if it
+# exists. This behaviour mimics that of Gitweb.
+projectdesc=$(git config gitweb.description)
+if [ -f "$GIT_DIR/description" ]; then
+    projectdesc=$(sed -ne '1p' "$GIT_DIR/description" 2>/dev/null)
+fi
+
 # Check if the description is unchanged from it's default, and shorten it to
 # a more manageable length if it is
 if expr "$projectdesc" : "Unnamed repository.*$" >/dev/null
-- 
1.7.1

Cheers, -- Trond H. Amundsen t.h.amund...@usit.uio.no Center for Information Technology Services, University of Oslo
offtopic: ppg design decisions - encapsulation
[ Unashamedly offtopic... asking here because I like git design and coding style, and ppg is drawing plenty of inspiration from the old git shell scripts. Please kindly flame me privately... ] ppg is a wrapper around git to maintain and distribute Puppet configs, adding a few niceties. Now, ppg will actually manage two git repositories -- one for the puppet configs, the second one for ferrying back the puppet run results to the originating repo (where they get loaded in a puppet dashboard server for cute web-based reporting). The puppet config repo is a normally-behaved git repo. The reports repo is a bit of a hack -- never used directly by the user, it will follow a store-and-forward scheme, where I should trim old history or just use something other than git. So I see two possible repo layouts:

- Transparent, nested
    .git/              # holding puppet configs, allows direct use of git commands
    .git/reports.git   # nested inside puppet configs repo

- Mediated, parallel
    .ppg/puppet.git    # all git commands must be wrapped
    .ppg/reports.git

My laziness and laissez-faire take on things drives me to use the transparent/nested approach. Let the user do anything in there directly with git. OTOH, the mediated approach allows for more complete support, including sanity checks on commands that don't have hooks available. I already have a /usr/bin/ppg wrapper, which I could use to wrap all git commands, setting GIT_DIR=.ppg/puppet.git for all ops. It would force ops to be run from the top of the tree unless I write explicit support for that. And it would break related tools that are not mediated via /usr/bin/git (gitk!). Written this way, it seems to be a minimal lazy approach vs DTRT. Am I missing any important aspect or option? Thoughts?
cheers, m -- martin.langh...@gmail.com - ask interesting questions - don't get distracted with shiny stuff - working code first ~ http://docs.moodle.org/en/User:Martin_Langhoff
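A minimal sketch of the mediated option discussed above: a stand-in for the /usr/bin/ppg wrapper that pins GIT_DIR at the parallel repository. PPG_TOP, the .ppg/puppet.git path, and the PPG_DRY_RUN switch are all invented for this note (a real wrapper would locate the top of the tree itself), and the dry-run branch lets the sketch run without git installed:

```shell
#!/bin/sh
# Hypothetical stand-in for a /usr/bin/ppg wrapper ("mediated" layout).
# PPG_TOP, .ppg/puppet.git and PPG_DRY_RUN are invented names.
PPG_TOP=/srv/example

ppg() {
	GIT_DIR="$PPG_TOP/.ppg/puppet.git"
	export GIT_DIR
	if [ -n "$PPG_DRY_RUN" ]; then
		# dry-run so the sketch works even without git installed
		echo "would run: git $* (GIT_DIR=$GIT_DIR)"
	else
		git "$@"
	fi
}

PPG_DRY_RUN=1
result=$(ppg status)
echo "$result"
```

This also illustrates the downside raised in the mail: anything not funneled through the wrapper (gitk, IDE integrations) never sees the GIT_DIR setting.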
Re: [PATCH] contrib/hooks/post-receive-email: get description from repo.git/config
Trond Hasle Amundsen t.h.amund...@usit.uio.no writes: Hello, The included patch attempts to improve post-receive-email. It's a Please don't ;-) Eh, actually, thanks for the patch. But when you send a patch the next time around, please have the above and the next three lines (i.e. introductory text) _below_ the three-dash line. trivial change, but one that helps us Gitolite users. Comments are welcome, and this is my first attempt at contributing to the Git project. Please keep me on CC as I'm not on the list. From 878a7af9088e2bcc3afc9b09b9023f1f188c844b Mon Sep 17 00:00:00 2001 From: Trond Hasle Amundsen t.h.amund...@usit.uio.no Date: Mon, 6 May 2013 15:41:25 +0200 Subject: [PATCH] contrib/hooks/post-receive-email: get description from repo.git/config And remove these five lines above. We will read the authorship and subject from the e-mail header of your message. When getting the project description, we first try gitweb.description entry in repo.git/config, but repo.git/description takes precedence if it exists. This behaviour mimics that of Gitweb, and is what we want when using Gitolite, which deletes repo.git/description upon repo creation and rather maintains a gitweb.description entry in repo.git/config if a description is configured. Signed-off-by: Trond Hasle Amundsen t.h.amund...@usit.uio.no --- contrib/hooks/post-receive-email |9 - 1 files changed, 8 insertions(+), 1 deletions(-) diff --git a/contrib/hooks/post-receive-email b/contrib/hooks/post-receive-email index 0e5b72d..6ce046a 100755 --- a/contrib/hooks/post-receive-email +++ b/contrib/hooks/post-receive-email @@ -714,7 +714,14 @@ if [ -z $GIT_DIR ]; then exit 1 fi -projectdesc=$(sed -ne '1p' $GIT_DIR/description 2/dev/null) +# Get the repo's description. First try gitweb.description entry in +# repo.git/config, but repo.git/description takes precedence if it +# exists. This behaviour mimics that of Gitweb. 
+projectdesc=$(git config gitweb.description)
+if [ -f "$GIT_DIR/description" ]; then
+    projectdesc=$(sed -ne '1p' "$GIT_DIR/description" 2>/dev/null)
+fi
+
 # Check if the description is unchanged from it's default, and shorten it to
 # a more manageable length if it is
 if expr "$projectdesc" : "Unnamed repository.*$" >/dev/null

If the description file takes precedence, then the right order to do this would be

    projectdesc=$(sed -ne 1p "$GIT_DIR/description" 2>/dev/null)
    if expr "$projectdesc" : "Unnamed repository" >/dev/null
    then
        : use it as is
    elif projectdesc=$(git config gitweb.description)
    then
        : use it as is
    else
        projectdesc="UNNAMED PROJECT"
    fi

to avoid calling git config when it is not even necessary. We can obviously shorten it by making it less readable, e.g.

    projectdesc=$(sed -ne 1p "$GIT_DIR/description" 2>/dev/null)
    ! expr "$projectdesc" : "Unnamed repository" >/dev/null ||
    projectdesc=$(git config gitweb.description) ||
    projectdesc="UNNAMED PROJECT"

but I do not think we want to go in that direction ;-)
Re: [PATCH] contrib/hooks/post-receive-email: get description from repo.git/config
Junio C Hamano gits...@pobox.com writes: Trond Hasle Amundsen t.h.amund...@usit.uio.no writes: Hello, The included patch attempts to improve post-receive-email. It's a Please don't ;-) More precisely: you should have a look at git-multimail (in directory contrib/, in branch pu for now, or from https://github.com/mhagger/git-multimail) before spending time on post-receive-email. -- Matthieu Moy http://www-verimag.imag.fr/~moy/
Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
Junio C Hamano gits...@pobox.com writes: By discarding marks on blobs, we may be robbing some optimization possibilities, and by discarding marks on tags, we may be robbing some features, from users of fast-export; we might want to add an option --use-object-marks={blob,commit,tag} or something to both fast-export and fast-import, so that the former can optionally write marks for non-commits out, and the latter can omit non-commit marks if the user does not need them. But that is a separate issue. s/--use-object-marks/--persistent-object-marks/; I think that would express the issue better.
Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
On Mon, May 06, 2013 at 08:08:45AM -0700, Junio C Hamano wrote: I'm also not sure why your claim we don't care about blobs is true, because naively we would want future runs of fast-export to avoid having to write out the whole blob content when mentioning the blob again. The existing documentation is fairly clear that marks for objects other than commits are not exported, and the import-marks codepath discards anything but commits, so there is no mechanism for the existing fast-export users to leave blob marks in the marks file for later runs of fast-export to take advantage of. The second invocation cannot refer to such a blob in the first place. OK. If the argument is we do not write them, so do not bother reading them back in, I think that is reasonable. It could hurt anybody trying to run fast-export against a marks file created by somebody else, but that is also the same case that is being helped here (since otherwise, we would not be seeing blob entries at all). I do not offhand know enough about the internals of import/export-style remote-helpers to say whether the hurt case even exists, let alone how common it is. By discarding marks on blobs, we may be robbing some optimization possibilities, and by discarding marks on tags, we may be robbing some features, from users of fast-export; we might want to add an option --use-object-marks={blob,commit,tag} or something to both fast-export and fast-import, so that the former can optionally write marks for non-commits out, and the latter can omit non commit marks if the user do not need them. But that is a separate issue. Yeah, that would allow the old behavior (and more) if anybody is hurt by this. It is nice if the order of implementation is more features, then flip the default because it provides an immediate escape hatch for anybody who is hurt by the change in default. But again, I do not know enough to say whether such hurt cases even exist. 
-Peff
Re: [PATCH] contrib/hooks/post-receive-email: get description from repo.git/config
Junio C Hamano gits...@pobox.com writes: But when you send a patch the next time around, please have the above and the next three lines (i.e. introductory text) _below_ the three-dash line. Alright, noted. From 878a7af9088e2bcc3afc9b09b9023f1f188c844b Mon Sep 17 00:00:00 2001 From: Trond Hasle Amundsen t.h.amund...@usit.uio.no Date: Mon, 6 May 2013 15:41:25 +0200 Subject: [PATCH] contrib/hooks/post-receive-email: get description from repo.git/config And remove these five lines above. We will read the authorship and subject from the e-mail header of your message. So many rules.. ;) Also noted.

+projectdesc=$(git config gitweb.description)
+if [ -f "$GIT_DIR/description" ]; then
+    projectdesc=$(sed -ne '1p' "$GIT_DIR/description" 2>/dev/null)
+fi
+
 # Check if the description is unchanged from it's default, and shorten it to
 # a more manageable length if it is
 if expr "$projectdesc" : "Unnamed repository.*$" >/dev/null

If description file takes precedence, then the right order to do this would be

    projectdesc=$(sed -ne 1p "$GIT_DIR/description" 2>/dev/null)
    if expr "$projectdesc" : "Unnamed repository" >/dev/null
    then
        : use it as is
    elif projectdesc=$(git config gitweb.description)
    then
        : use it as is
    else
        projectdesc="UNNAMED PROJECT"
    fi

to avoid calling git config when it is not even necessary. That doesn't work: you'll always call git config unless the string matches "Unnamed repository". If you negate the expr line it still doesn't work. To avoid calling git config I'd rather suggest something like this:

    projectdesc=$(sed -ne 1p "$GIT_DIR/description" 2>/dev/null)
    if [ -z "$projectdesc" ]; then
        projectdesc=$(git config gitweb.description)
    fi

And let this block remain intact:

    if expr "$projectdesc" : "Unnamed repository.*$" >/dev/null
    then
        projectdesc="UNNAMED PROJECT"
    fi

The only change would then be the three added lines containing the if block that calls git config if the projectdesc variable is empty. The idea is just to get the description from config if the description file doesn't exist.
Just curious.. why would we avoid calling git config unless necessary? Regards, -- Trond H. Amundsen t.h.amund...@usit.uio.no Center for Information Technology Services, University of Oslo
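Trond's three-line fallback can be exercised without a repository: in the sketch below, git is shadowed by a shell function that returns a canned gitweb.description value (an assumption made purely so the example is self-contained and runnable):

```shell
#!/bin/sh
# Stub out git so the fallback logic runs without a repository;
# the canned config value is made up for illustration.
git() {
	if [ "$1" = config ] && [ "$2" = gitweb.description ]; then
		echo "My project (from config)"
	fi
}

GIT_DIR=$(mktemp -d)

lookup_desc() {
	projectdesc=$(sed -ne '1p' "$GIT_DIR/description" 2>/dev/null)
	if [ -z "$projectdesc" ]; then
		projectdesc=$(git config gitweb.description)
	fi
	echo "$projectdesc"
}

desc1=$(lookup_desc)    # no description file: falls back to config
echo "My project (from file)" >"$GIT_DIR/description"
desc2=$(lookup_desc)    # file exists: it wins
rm -rf "$GIT_DIR"

echo "without file: $desc1"
echo "with file:    $desc2"
```

Note this models only the fallback; the hook's separate "Unnamed repository" check would still run afterwards, unchanged.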
Re: [PATCH] contrib/hooks/post-receive-email: get description from repo.git/config
Matthieu Moy matthieu@grenoble-inp.fr writes: The included patch attempts to improve post-receive-email. It's a Please don't ;-) More precisely: you should have a look at git-multimail (in directory contrib/, in branch pu for now, or from https://github.com/mhagger/git-multimail) before spending time on post-receive-email. Thanks for the tip, I'll check it out. I used post-receive-email out of convenience, as it was already there. Regards, -- Trond H. Amundsen t.h.amund...@usit.uio.no Center for Information Technology Services, University of Oslo
Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
Jeff King p...@peff.net writes: On Mon, May 06, 2013 at 08:08:45AM -0700, Junio C Hamano wrote: I'm also not sure why your claim "we don't care about blobs" is true, because naively we would want future runs of fast-export to avoid having to write out the whole blob content when mentioning the blob again. The existing documentation is fairly clear that marks for objects other than commits are not exported, and the import-marks codepath discards anything but commits, so there is no mechanism for the existing fast-export users to leave blob marks in the marks file for later runs of fast-export to take advantage of. The second invocation cannot refer to such a blob in the first place. OK. If the argument is "we do not write them, so do not bother reading them back in", I think that is reasonable. The way I read builtin/fast-export.c::import_marks() is that it is more like "we do not write them, and we do not read them back in either" IN THE CURRENT CODE.
Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
On Mon, May 06, 2013 at 09:32:41AM -0700, Junio C Hamano wrote: OK. If the argument is "we do not write them, so do not bother reading them back in", I think that is reasonable. The way I read builtin/fast-export.c::import_marks() is that it is more like "we do not write them, and we do not read them back in either" IN THE CURRENT CODE. Ahh... I see now. It is not about skipping the blobs as a new behavior, but rather about skipping them _earlier_, before we have loaded the object contents from disk. I took the "we don't care about" as "the general use of fast-export does not care about", but it is "we will literally just drop them a few lines later". So yes, I think this is an obviously correct optimization. Thanks for clarifying, and sorry to be so slow. -Peff
[PATCH] l10n: de.po: translate 44 new messages
Translate 44 new messages came from git.pot update in c6bc7d4 (l10n: git.pot: v1.8.3 round 2 (44 new, 12 removed)). Signed-off-by: Ralf Thielow ralf.thie...@gmail.com --- po/de.po | 139 +++ 1 file changed, 77 insertions(+), 62 deletions(-) diff --git a/po/de.po b/po/de.po index c823661..4073b42 100644 --- a/po/de.po +++ b/po/de.po @@ -134,74 +134,76 @@ msgstr Benutzen Sie '\\!' für führende Ausrufezeichen. #: branch.c:60 -#, fuzzy, c-format +#, c-format msgid Not setting branch %s as its own upstream. -msgstr externer Zweig %s nicht im anderen Projektarchiv %s gefunden +msgstr Zweig %s kann nicht sein eigener Übernahmezweig sein. #: branch.c:82 #, c-format msgid Branch %s set up to track remote branch %s from %s by rebasing. -msgstr +msgstr Zweig %s konfiguriert zum Folgen von externem Zweig %s von %s durch +Neuaufbau. #: branch.c:83 #, c-format msgid Branch %s set up to track remote branch %s from %s. -msgstr +msgstr Zweig %s konfiguriert zum Folgen von externem Zweig %s von %s. #: branch.c:87 #, c-format msgid Branch %s set up to track local branch %s by rebasing. -msgstr +msgstr Zweig %s konfiguriert zum Folgen von lokalem Zweig %s durch Neuaufbau. #: branch.c:88 #, c-format msgid Branch %s set up to track local branch %s. -msgstr +msgstr Zweig %s konfiguriert zum Folgen von lokalem Zweig %s. #: branch.c:92 #, c-format msgid Branch %s set up to track remote ref %s by rebasing. -msgstr +msgstr Zweig %s konfiguriert zum Folgen von externer Referenz %s durch +Neuaufbau. #: branch.c:93 #, c-format msgid Branch %s set up to track remote ref %s. -msgstr +msgstr Zweig %s konfiguriert zum Folgen von externer Referenz %s. #: branch.c:97 #, c-format msgid Branch %s set up to track local ref %s by rebasing. -msgstr +msgstr Zweig %s konfiguriert zum Folgen von lokaler Referenz %s durch +Neuaufbau. #: branch.c:98 #, c-format msgid Branch %s set up to track local ref %s. -msgstr +msgstr Zweig %s konfiguriert zum Folgen von lokaler Referenz %s. 
#: branch.c:118 -#, fuzzy, c-format +#, c-format msgid Tracking not set up: name too long: %s -msgstr Markierungsname zu lang: %.*s... +msgstr Konfiguration zum Folgen von Zweig nicht eingerichtet. Name zu lang: %s #: branch.c:137 -#, fuzzy, c-format +#, c-format msgid Not tracking: ambiguous information for ref %s -msgstr Kein externer Übernahmezweig für %s von %s +msgstr Konfiguration zum Folgen von Zweig nicht eingerichtet. Referenz %s ist mehrdeutig. #: branch.c:182 -#, fuzzy, c-format +#, c-format msgid '%s' is not a valid branch name. -msgstr '%s' ist kein gültiger Markierungsname. +msgstr '%s' ist kein gültiger Zweigname. #: branch.c:187 -#, fuzzy, c-format +#, c-format msgid A branch named '%s' already exists. -msgstr Arbeitsbaum '%s' existiert bereits. +msgstr Zweig '%s' existiert bereits. #: branch.c:195 -#, fuzzy msgid Cannot force update the current branch. -msgstr bezieht den aktuellen Zweig ein +msgstr Kann Aktualisierung des aktuellen Zweiges nicht erzwingen. #: branch.c:201 #, c-format @@ -237,29 +239,27 @@ msgstr beim Versand zu konfigurieren. #: branch.c:250 -#, fuzzy, c-format +#, c-format msgid Not a valid object name: '%s'. -msgstr %s ist kein gültiger Objekt-Name +msgstr Ungültiger Objekt-Name: '%s' #: branch.c:270 -#, fuzzy, c-format +#, c-format msgid Ambiguous object name: '%s'. -msgstr fehlerhafter Objekt-Name '%s' +msgstr mehrdeutiger Objekt-Name: '%s' #: branch.c:275 -#, fuzzy, c-format +#, c-format msgid Not a valid branch point: '%s'. -msgstr Ungültiger Zweig-Name: '%s' +msgstr Ungültige Zweig-Version: '%s' #: branch.c:281 -#, fuzzy msgid Failed to lock ref for update -msgstr Fehler beim Erstellen der Ausgabedateien. +msgstr Fehler beim Sperren der Referenz zur Aktualisierung. #: branch.c:299 -#, fuzzy msgid Failed to write ref -msgstr konnte %s nicht schreiben +msgstr Fehler beim Schreiben der Referenz. #: bundle.c:36 #, c-format @@ -1161,14 +1161,14 @@ msgid Could not format %s. msgstr Konnte %s nicht formatieren. 
#: sequencer.c:1083 -#, fuzzy, c-format +#, c-format msgid %s: can't cherry-pick a %s -msgstr Kann \cherry-pick\ nicht in einem leeren Zweig ausführen. +msgstr %s: %s kann nicht in \cherry-pick\ benutzt werden #: sequencer.c:1085 #, c-format msgid %s: bad revision -msgstr +msgstr %s: ungültige Revision #: sequencer.c:1119 msgid Can't revert as initial commit @@ -1561,6 +1561,9 @@ msgid may speed it up, but you have to be careful not to forget to add\n new files yourself (see 'git help status'). msgstr +Es dauerte %.2f Sekunden die unbeobachteten Dateien zu bestimmen.\n +'status -uno' könnte das beschleunigen, aber Sie müssen darauf achten,\n +neue Dateien selbstständig hinzuzufügen (siehe 'git help status'). #: wt-status.c:1232 #, c-format @@ -1665,7 +1668,7 @@ msgstr git add [Optionen] [--] [Pfadspezifikation...] #. * eventually we can drop
Re: [PATCH 0/7] Make $remote/$branch work with unconventional refspecs
Johan Herland jo...@herland.net writes: Let me try to summarize my views on how refnames should work in Git, to see if we can identify where we differ on the principles (or if we, in fact, differ at all): Thanks; I think I already said where I think we differ in a separate message, but a short version is that the point of remote.$nick.fetch mapping is to solve The remote may call a ref $this, which is not the refname I want to or can use in my repository, so here is the rule to use when importing it in my local namespace. With the mapping, I can name the ref in my local namespace conveniently. E.g. their refs/heads/master cannot be our refs/heads/master at the same time, so use refs/heads/origin/master. The result of the above mapping, be it remotes/origin/master or remotes/origin/heads/master, should be designed to be useful for the local use of the ref in question. If you further need to remap it when using it locally, there is something wrong in the mapping you defined in your remote.$nick.fetch mapping in the first place. We do not force any structure under refs/remotes/; it is left entirely up to the user, even though we would like to suggest the best current practice by teaching clone and remote add to lay them out in a certain way. Another thing is that refs/remotes/ is not special at all. If notes hierarchies taken from a remote need to be somewhere other than refs/notes/, it is perfectly fine to introduce refs/remote-notes/ if that is the best layout when using them locally. What is special is refs/heads/ in that they are the _only_ refs you can check out to the working tree and directly advance them by working on the working tree files. I would support disallowing multi-level remote names, although I don't know if it is commonly used, and would break many existing users. I somewhat doubt it. We very much anticipated the use of multi-level branch names from the very beginning and have support (e.g. 
in for-each-ref and branch --list) to group/filter them according to prefixes, but I do not think there is anywhere we consciously try to give support for multi-level remote names to treat groups of remotes that share the same prefix. *2* Perhaps bar in the above is spelled topics, and the hierarchy may be used to collect non-integration single topic branches from more than one remote. An example that is more in line with such a usage might be:

    [remote "jh"]
        fetch = +refs/heads/*:refs/remotes/topics/heads/jh/*
    [remote "jk"]
        fetch = +refs/heads/*:refs/remotes/topics/heads/jk/*
    [remote "fc"]
        fetch = +refs/heads/*:refs/remotes/topics/heads/fc/*

and I would expect git merge topics/jh/rbranch to merge the refs/remotes/topics/heads/jh/rbranch topic branch. I like the use case, but not necessarily your expectation. ;-) With the above configuration, and my series as-is, you could simply do git merge jh/rbranch to merge the refs/remotes/topics/heads/jh/rbranch topic branch. That dropping of 'topics/' is the issue. The user wanted to group them under the 'topics/' hierarchy and made a conscious effort to set up the fetch refspec to map these refs there. These are all done for convenience when she deals with refs in her namespace in the repository. What justification do we have to second-guess the user and force her to drop it when naming these refs? Furthermore, I don't see why you want/need the extra heads/ level in the refspec. Just like you wanted to have separate kinds of refs under a single remote, the layout is grouping kinds of refs other than branch heads related to the topics (as opposed to integration branches).
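The single-glob mapping that refspecs like the ones above rely on can be sketched as plain string manipulation. map_ref below is a toy helper invented for this note (it only handles one trailing '*' on each side of the refspec), not anything git itself provides:

```shell
#!/bin/sh
# Toy model of how a fetch refspec like
#   +refs/heads/*:refs/remotes/topics/heads/jh/*
# maps a source refname into the local namespace.
map_ref() {
	refspec=$1 ref=$2
	refspec=${refspec#+}          # drop the "force" marker, if any
	src=${refspec%%:*}            # e.g. refs/heads/*
	dst=${refspec#*:}             # e.g. refs/remotes/topics/heads/jh/*
	src_prefix=${src%\*}          # e.g. refs/heads/
	case $ref in
	"$src_prefix"*)
		tail=${ref#"$src_prefix"}   # the part matched by the glob
		printf '%s%s\n' "${dst%\*}" "$tail"
		;;
	*)
		return 1                    # refname outside the glob
		;;
	esac
}

mapped=$(map_ref '+refs/heads/*:refs/remotes/topics/heads/jh/*' refs/heads/rbranch)
echo "$mapped"
```

With the 'topics/' refspec from the mail, refs/heads/rbranch lands at refs/remotes/topics/heads/jh/rbranch, which is exactly the name Junio argues the user should keep typing (modulo prefix shortening) rather than having 'topics/' dropped for her.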
Re: [PATCH 0/7] Make $remote/$branch work with unconventional refspecs
Santi Béjar sa...@agolina.net writes: The next question could be: why not make it work with $url too? As in: $ git merge git://git.kernel.org/pub/scm/git/git.git//master You need to remember that merge is a local operation, working in your local repository. Like it or not, the UI of Git makes a distinction between operations that are local and those that go to other repositories (e.g. git pull). That of course does _not_ prevent you from writing an alternative UI "git merge anything" that interprets the anything part and invokes git merge or git pull as its helper (after all, Git is designed to be scripted).
Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
Jeff King p...@peff.net writes: So yes, I think this is an obviously correct optimization. Thanks for clarifying, and sorry to be so slow. No need to be sorry. It just shows that the log message could have been more helpful. Here is what I tentatively queued.

commit 83582e91d22c66413b291d4d6d45bbeafddc2af9
Author: Felipe Contreras felipe.contre...@gmail.com
Date:   Sun May 5 17:38:53 2013 -0500

    fast-export: do not parse non-commit objects while reading marks file

    We read from the marks file and keep only marked commits, but in order
    to find the type of object, we are parsing the whole thing, which is
    slow, especially in big repositories with lots of big files.

    There's no need for that; we can query the object information with
    sha1_object_info(). Before this, loading the objects of a fresh emacs
    import, with 260598 blobs, took 14 minutes. After this patch, it takes
    3 seconds.

    This is the way fast-import does it. Also die if the object is not
    found (like fast-import).

    Signed-off-by: Felipe Contreras felipe.contre...@gmail.com
    Signed-off-by: Junio C Hamano gits...@pobox.com
Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
On Mon, May 06, 2013 at 10:17:41AM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: So yes, I think this is an obviously correct optimization. Thanks for clarifying, and sorry to be so slow. No need to be sorry. It just shows that the log message could have been more helpful. Here is what I tentatively queued. [...] Yeah, that is much for to understand (to me, at least). Thanks. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] Make $remote/$branch work with unconventional refspecs
Junio C Hamano gits...@pobox.com writes: Just a few typofixes... Johan Herland jo...@herland.net writes: Let me try to summarize my views on how refnames should work in Git, to see if we can identify where we differ on the principles (or if we, in fact, differ at all): Thanks; I think I already said where I think we differ in a separate message, but a short version is that the point of remote.$nick.fetch mapping is to solve The remote may call a ref $this, which is not the refname I want to or can use in my repository, so here is the rule to use when importing it in my local namespace. With the mapping, I can name the ref in my local namespace conveniently. E.g. their refs/heads/master cannot be our refs/heads/master at the same time, so use refs/heads/origin/master. The last should be refs/remotes/origin/master. The result of the above mapping, be it remotes/origin/master or remotes/origin/heads/master, should be designed to be useful for the local use of the ref in question. If you further need to remap it when using it locally, there is something wrong in the mapping you defined in your remote.$nick.fetch mapping in the first place. We do not force any structure under refs/remotes/; it is left entirely up to the user, even though we would like to suggest the best current practice by teaching clone and remote add to lay them out in a certain way. Another thing is that refs/remotes/ is not special at all. If notes hierarchies taken from a remote need to be somewhere other than refs/notes/, it is perfectly fine to introduce refs/remote-notes/ if that is the best layout when using them locally. What is special is s/the best/the most convenient/; refs/heads/ in that they are the _only_ refs you can check out to the working tree and directly advance them by working on the working tree files. 
Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
On Mon, May 06, 2013 at 01:19:35PM -0400, Jeff King wrote: Here is what I tentatively queued. [...] Yeah, that is much for to understand (to me, at least). Ugh. That was supposed to be much easier to understand. Perhaps I will learn to type one day. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/7] shorten_unambiguous_ref(): Allow shortening refs/remotes/origin/HEAD to origin
Johan Herland jo...@herland.net writes: ... there is AFAICS _no_ way for sscanf() - having already done one or more format extractions - to indicate to its caller that the input fails to match the trailing part of the format string. Yeah, we can detect when we did not have enough, but we cannot tell where it stopped matching. It is interesting that this bug has stayed so long with us, which may indicate that nobody actually uses the feature at all. Good eyes. Cc: Bert Wesarg bert.wes...@googlemail.com Signed-off-by: Johan Herland jo...@herland.net --- refs.c | 82 +++-- t/t6300-for-each-ref.sh | 12 2 files changed, 43 insertions(+), 51 deletions(-) diff --git a/refs.c b/refs.c index d17931a..7231f54 100644 --- a/refs.c +++ b/refs.c @@ -2945,80 +2945,60 @@ struct ref *find_ref_by_name(const struct ref *list, const char *name) return NULL; } +int shorten_ref(const char *refname, const char *pattern, char *short_name) Does this need to be an extern? { + /* + * pattern must be of the form [pre]%.*s[post]. Check if refname + * starts with [pre] and ends with [post]. If so, write the + * middle part into short_name, and return the number of chars + * written (not counting the added NUL-terminator). Otherwise, + * if refname does not match pattern, return 0. + */ + size_t pre_len, post_start, post_len, match_len; + size_t ref_len = strlen(refname); + char *sep = strstr(pattern, "%.*s"); + if (!sep || strstr(sep + 4, "%.*s")) + die("invalid pattern in ref_rev_parse_rules: %s", pattern); + pre_len = sep - pattern; + post_start = pre_len + 4; + post_len = strlen(pattern + post_start); + if (pre_len + post_len >= ref_len) + return 0; /* refname too short */ + match_len = ref_len - (pre_len + post_len); + if (strncmp(refname, pattern, pre_len) || + strncmp(refname + ref_len - post_len, pattern + post_start, post_len)) + return 0; /* refname does not match */ + memcpy(short_name, refname + pre_len, match_len); + short_name[match_len] = '\0'; + return match_len; } OK.
Looks correct, even though I suspect some people might come up with a more concise way to express the above. char *shorten_unambiguous_ref(const char *refname, int strict) { int i; char *short_name; /* skip first rule, it will always match */ - for (i = nr_rules - 1; i > 0 ; --i) { + for (i = ARRAY_SIZE(ref_rev_parse_rules) - 1; i > 0 ; --i) { int j; int rules_to_fail = i; int short_name_len; + if (!ref_rev_parse_rules[i] || What is this skippage about? Isn't it what you already compensated away by starting from ARRAY_SIZE() - 1? Ahh, no. But wait. Isn't there a larger issue here? + !(short_name_len = shorten_ref(refname, +ref_rev_parse_rules[i], +short_name))) continue; - short_name_len = strlen(short_name); - /* * in strict mode, all (except the matched one) rules * must fail to resolve to a valid non-ambiguous ref */ if (strict) - rules_to_fail = nr_rules; + rules_to_fail = ARRAY_SIZE(ref_rev_parse_rules); Isn't nr_rules in the original ARRAY_SIZE()-1? /* * check if the short name resolves to a valid ref, Could you add a test to trigger the strict codepath? Thanks. diff --git a/t/t6300-for-each-ref.sh b/t/t6300-for-each-ref.sh index 752f5cb..57e3109 100755 --- a/t/t6300-for-each-ref.sh +++ b/t/t6300-for-each-ref.sh @@ -466,4 +466,16 @@ test_expect_success 'Verify sort with multiple keys' ' refs/tags/bogo refs/tags/master actual test_cmp expected actual ' + +cat >expected <<\EOF +origin +origin/master +EOF + +test_expect_success 'Check refs/remotes/origin/HEAD shortens to origin' ' + git remote set-head origin master && + git for-each-ref --format="%(refname:short)" refs/remotes >actual && + test_cmp expected actual +' + test_done
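The `[pre]%.*s[post]` matching that shorten_ref() performs can be sketched in Python. The function name and the None-for-no-match convention mirror the C patch's intent, but this is only an illustrative sketch, not git's code:

```python
def shorten_ref(refname, pattern):
    """Match refname against a rule of the form [pre]%.*s[post].

    Returns the middle part if refname starts with [pre] and ends with
    [post]; returns None when it does not match (the C version returns
    0 in that case).  Illustrative sketch only.
    """
    sep = pattern.find("%.*s")
    if sep < 0 or "%.*s" in pattern[sep + 4:]:
        raise ValueError("invalid pattern: %s" % pattern)
    pre, post = pattern[:sep], pattern[sep + 4:]
    if len(pre) + len(post) >= len(refname):
        return None  # refname too short
    if not (refname.startswith(pre) and refname.endswith(post)):
        return None  # refname does not match
    return refname[len(pre):len(refname) - len(post)]

# A rule like "refs/remotes/%.*s" shortens a remote-tracking ref:
print(shorten_ref("refs/remotes/origin/master", "refs/remotes/%.*s"))
# -> origin/master
```

A rule with a non-empty [post], such as "refs/remotes/%.*s/HEAD", is what lets refs/remotes/origin/HEAD shorten to just "origin", which is exactly the case this patch series enables.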
Bug: git-p4: Sometimes p4 generates Windows-style output on OS X
I've observed that the p4 command that git-p4 delegates to occasionally outputs Windows-style line endings even on the OS X platform. When this happens, git-p4 gets very confused and crashes out. I've attached a patch which seems to fix the issue in my case. Now this patch is a pretty bad hack, and I don't recommend that it be accepted as-is. It is just a starting point. A real fix would determine in advance whether Perforce was going to emit Windows-style output. Since I don't know the circumstances under which this happens on non-Windows platforms, I can't provide a better patch. Someone who has intimate knowledge of p4's operating modes would be best to examine what's really going on with p4. P.S. In case it matters, I am not subscribed to this mailing list, so you will need to CC me for any replies to reach me. --- David Foster http://dafoster.net/ From aef963f0c45dea81f3e6f30d3b4185a0983ca4de Mon Sep 17 00:00:00 2001 From: David Foster davidf...@gmail.com Date: Mon, 6 May 2013 10:50:01 -0700 Subject: [PATCH] Compensate for Windows-style output from the p4 command on non-Windows systems. --- git-p4.py | 18 ++ 1 files changed, 18 insertions(+), 0 deletions(-) diff --git a/git-p4.py b/git-p4.py index 647f110..949d66d 100755 --- a/git-p4.py +++ b/git-p4.py @@ -1454,6 +1454,24 @@ class P4Submit(Command, P4UserMap): tmpFile = open(fileName, "rb") message = tmpFile.read() tmpFile.close() + +# HACK: If Perforce spontaneously generates Windows-style output, +# compensate by assuming the entire p4 command went into +# Windows mode. +if separatorLine not in message: +print "WARNING: Perforce has spontaneously decided to generate Windows-style output. Compensating."
+ +# Assume that Perforce is now inexplicably operating in Windows mode +self.isWindows = True + +# Retroactively rewrite expected output +submitTemplate = submitTemplate.replace("\n", "\r\n") +separatorLine = separatorLine.replace("\n", "\r\n") +newdiff = newdiff.replace("\n", "\r\n") + +if separatorLine not in message: +raise ValueError('Confused. Thought Perforce went into Windows mode but apparently something else is wrong.') + submitTemplate = message[:message.index(separatorLine)] if self.isWindows: submitTemplate = submitTemplate.replace("\r\n", "\n") -- 1.7.7.5 (Apple Git-26)
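A less drastic alternative to retroactively rewriting every expected string would be to normalize p4's output once, before any separator matching. The helper name and the sample template below are invented for illustration; this is not git-p4's actual code:

```python
def normalize_line_endings(text):
    # Collapse Windows-style CRLF (and stray CR) line endings to LF so
    # that substring checks like "separatorLine in message" behave the
    # same regardless of which mode p4 happened to use.
    # Hypothetical helper, not part of git-p4 itself.
    return text.replace("\r\n", "\n").replace("\r", "\n")

# Invented sample: a p4 change template that came back with CRLF endings.
message = "Change: new\r\nDescription:\r\n\tFix bug\r\n"
assert "Description:\n" not in message                      # raw check misses
assert "Description:\n" in normalize_line_endings(message)  # normalized check hits
```

With this approach git-p4 would only ever compare LF-terminated strings, and the "Windows mode" guess would not need to be made at all.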
Re: another packed-refs race
On Mon, May 06, 2013 at 02:03:40PM +0200, Michael Haggerty wrote: We could fix this by making sure our packed-refs file is up to date s/file/cache/ Yeah, I meant our view of the packed-refs file, but I think cache says that more succinctly. I'll be sure to make it clear when I start writing real commit messages. Let me think out loud alongside your analysis... By this mechanism the reader can ensure that it never uses a version of the packed-refs file that is older than its information that the corresponding loose ref is absent from the filesystem. Yes, that's a good way of saying it. This is all assuming that the filesystem accesses have a defined order; how is that guaranteed? pack_refs() and commit_ref() both rely on commit_lock_file(), which calls close(fd) on the lockfile, then rename(lk->filename, result_file). prune_ref() locks the ref, verifies that its SHA-1 is unchanged, then calls unlink(), then rollback_lock_file(). The atomicity of rename() guarantees that a reader sees either the old or the new version of the file in question. But what guarantees are there about accesses across two files? I don't know offhand if any standard makes such guarantees. But there are other parts of git that do depend on that ordering. For example, I create a loose object representing commit X. Then I update a ref to point to X. If the ordering of those operations is not preserved, then a simultaneous reader would think that the repository is corrupted (the ref points to a missing object). I would not be surprised if there are distributed or networked filesystems for which this property does not hold. But I suspect it does hold for operations on a single machine with a decent kernel (I would imagine the VFS layer takes care of this). I am just basing my suspicion on experimental results (the patch I sent does seem to work on my ext4 system). The loose refs cache is only used by the for_each_ref() functions, not for looking up individual references.
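The write-then-rename pattern that commit_lock_file() relies on can be sketched in a few lines of Python. The function name is invented for illustration, and git's real lockfile API works differently, but the ordering is the same:

```python
import os
import tempfile

def commit_file(path, data):
    """Write-then-rename update in the style of commit_lock_file():
    write the new content to a temporary "lock" file in the same
    directory, close it, then rename() it over the target.  rename()
    is atomic on POSIX filesystems, so a concurrent reader sees either
    the old or the new file, never a torn mix.  Illustrative sketch,
    assuming a local POSIX filesystem."""
    dirname = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=dirname, suffix=".lock")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)      # close(fd) happens when the with-block ends
        os.rename(tmp, path)   # atomically publish the new version
    except BaseException:
        os.unlink(tmp)         # roll back the lock file on failure
        raise
```

Note that this only makes each individual file update atomic; as the discussion above points out, it says nothing about the ordering of accesses *across* two files such as packed-refs and a loose ref.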
Another approach would be to change the top-level for_each_ref() functions to re-stat() all of the loose references within the namespace that interests it, *then* verify that the packed-ref cache is not stale, *then* start the iteration. Then there would be no need to re-stat() files during the iteration. (This would mean that we have to prohibit a second reference iteration from being started while one is already in progress.) Hmm. Thinking on this more, I'm not sure that we need to stat the loose references at all. We do not need to care if the loose refs are up to date or not. Well, we might care, but the point here is not to pretend that we have an up-to-date atomic view of the loose refs; it is only to make sure that the fallback-to-packed behavior does not lie to us about the existence or value of a ref. IOW, it is OK to come up with a value for ref X that was true at the beginning of the program, even if it has been simultaneously updated. Our program can operate as if it happened in the instant it started, even though in real life it takes longer. But it is _not_ OK to miss the existence of a ref, or to come up with a value that it did not hold at some point during the program (e.g., it is not OK to return some cruft we wrote into the packed-refs file when we packed it three weeks ago). That is a weaker guarantee, and I think we can provide it with: 1. Load all loose refs into cache for a particular enumeration. 2. Make sure the packed-refs cache is up-to-date (by checking its stat() information and reloading if necessary). 3. Run the usual iteration over the loose/packed ref caches. If a loose ref is updated after or during (1), that's OK. The ref hierarchy is not atomic, so it is possible for us to see a state that never existed (e.g., we read X, somebody updates X to X' and Y to Y', then we read Y'; we see X and Y', a state that never existed on disk). 
But either the ref was already loose, in which case we always see its loose value as it was when we read it, or it was previously only packed (or non-existent), in which case we get its value from the packed-refs (or assume it does not exist), which is its state at the start of our program. If the loose ref is deleted instead of updated, it's the same thing; we may see it as existing, as it was at the start of our program. If the packed-refs file is updated after we have read all of the loose refs, we may see it ahead of the loose refs state we have cached. And that may mean the packed-refs file even has more up-to-date values. But we don't have to care, as long as we return some value that was valid during the course of our program. And we do, either because we have the loose value cached (in which case it trumps the packed-refs version and we use it), or it was missing (in which case we will use the updated pack-refs value, which may not even be the most recent value, but is at least a value that the ref had to hold at some point during the
Re: another packed-refs race
On Mon, May 06, 2013 at 02:12:29PM +0200, Michael Haggerty wrote: I think that would be correct (modulo that step 1 cannot happen for enumeration). But we would like to avoid loading all loose refs if we can. Especially on a cold cache, it can be quite slow, and you may not even care about those refs for the current operation (I do not recall the exact original motivation for the lazy loading, but it was something along those lines). Lazy loading was first inspired by the observation that effectively every git invocation was loading *all* loose references to do an iteration over refs/replace/ (which I've never even used!) This was absolutely killing the performance of filter-branch, which creates a lot of loose references and invokes git many times--even though the cache was warm. Yeah, obviously we want to avoid that. I _think_ we can even keep the lazy loading, as long as keep the ordering as: 1. Load a chunk of loose refs (whatever we need for the current iteration request). 2. Double-check that our packed-refs cache is up to date, and reload if necessary. 3. Return the results to the caller. It would perhaps increase latency to getting results to the caller (since otherwise we can start feeding answers to the caller as we walk the tree), but should not increase the total amount of work (just the extra stat() of packed-refs, once per for_each_ref, which is not much). I'll see if I can cook up a patch. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
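The 1-2-3 ordering proposed above can be modeled with a toy cache. All the names here are invented for illustration (git's refs code is C and far more involved); the point is only the order of operations: gather the loose refs first, then validate the packed view against fresh stat() data, then answer.

```python
import os

class PackedRefsCache:
    """Toy model of the proposed iteration order.  Illustrative only."""

    def __init__(self, path):
        self.path = path   # path of the packed-refs file
        self.stat = None   # stat data the cache was loaded under
        self.refs = {}

    def _load(self):
        st = os.stat(self.path)
        self.stat = (st.st_mtime, st.st_size, st.st_ino)
        self.refs = {}
        with open(self.path) as f:
            for line in f:
                sha, name = line.split()
                self.refs[name] = sha

    def up_to_date(self):
        st = os.stat(self.path)
        return self.stat == (st.st_mtime, st.st_size, st.st_ino)

    def for_each_ref(self, loose_refs):
        # 1. the caller has already gathered the loose refs it needs
        # 2. make sure the packed view is not stale before answering
        if self.stat is None or not self.up_to_date():
            self._load()
        # 3. loose entries trump packed ones
        merged = dict(self.refs)
        merged.update(loose_refs)
        return sorted(merged.items())
```

The extra cost per for_each_ref is one stat() of packed-refs, matching the "not much" estimate above.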
Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
On Mon, May 6, 2013 at 7:31 AM, Jeff King p...@peff.net wrote: On Sun, May 05, 2013 at 05:38:53PM -0500, Felipe Contreras wrote: We don't care about blobs, or any object other than commits, but in order to find the type of object, we are parsing the whole thing, which is slow, specially in big repositories with lots of big files. I did a double-take on reading this subject line and first paragraph, thinking surely fast-export needs to actually output blobs?. If you think that, then you are not familiar with the code. --export-marks=<file>:: Dumps the internal marks table to <file> when complete. Marks are written one per line as `:markid SHA-1`. Only marks for revisions are dumped; marks for blobs are ignored. if (deco->base && deco->base->type == 1) { mark = ptr_to_mark(deco->decoration); if (fprintf(f, ":%"PRIu32" %s\n", mark, sha1_to_hex(deco->base->sha1)) < 0) { e = 1; break; } } Reading the patch, I see that this is only about not bothering to load blob marks from --import-marks. It might be nice to mention that in the commit message, which is otherwise quite confusing. The commit message says it exactly like it is: we don't care about blobs. If an object is not a commit, we *already* skip it. But as the commit message already says, we do so by parsing the whole thing. I'm also not sure why your claim we don't care about blobs is true, because naively we would want future runs of fast-export to avoid having to write out the whole blob content when mentioning the blob again. Because it's pointless to have hundreds and thousands of blob marks that are *never* going to be used, only for an extremely tiny minority that would. Does that match your reasoning? It doesn't matter, it has been that way since --export-marks was introduced. Before this, loading the objects of a fresh emacs import, with 260598 blobs took 14 minutes, after this patch, it takes 3 seconds. Presumably most of that speed improvement comes from not parsing the blob objects.
I wonder if you could get similar speedups by applying the do not bother parsing rule from your patch 3. You would still incur some cost to create a struct blob, but it may or may not be measurable. That would mean we get the case not worth worrying about from above for free. I doubt it would make that big a difference, though, given the rarity of it. So I am OK with it either way. How would I know if it's a blob or a commit, if not by the code this patch introduces? -- Felipe Contreras -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
On Mon, May 6, 2013 at 10:08 AM, Junio C Hamano gits...@pobox.com wrote: Jeff King p...@peff.net writes: On Sun, May 05, 2013 at 05:38:53PM -0500, Felipe Contreras wrote: We don't care about blobs, or any object other than commits, but in order to find the type of object, we are parsing the whole thing, which is slow, specially in big repositories with lots of big files. I did a double-take on reading this subject line and first paragraph, thinking surely fast-export needs to actually output blobs?. Reading the patch, I see that this is only about not bothering to load blob marks from --import-marks. It might be nice to mention that in the commit message, which is otherwise quite confusing. I had the same reaction first, but not writing the blob _objects_ out to the output stream would not make any sense, so it was fairly easy to guess what the author wanted to say ;-). That's how fast-export has worked since --export-marks was introduced. I'm also not sure why your claim we don't care about blobs is true, because naively we would want future runs of fast-export to avoid having to write out the whole blob content when mentioning the blob again. The existing documentation is fairly clear that marks for objects other than commits are not exported, and the import-marks codepath discards anything but commits, so there is no mechanism for the existing fast-export users to leave blob marks in the marks file for later runs of fast-export to take advantage of. The second invocation cannot refer to such a blob in the first place. The story is different on the fast-import side, where we do say we dump the full table and a later run can depend on these marks. Yes, and gaining nothing but increased disk-space. 
By discarding marks on blobs, we may be robbing some optimization possibilities, and by discarding marks on tags, we may be robbing some features, from users of fast-export; we might want to add an option --use-object-marks={blob,commit,tag} or something to both fast-export and fast-import, so that the former can optionally write marks for non-commits out, and the latter can omit non commit marks if the user do not need them. But that is a separate issue. How? The only way we might rob optimizations is if there's an obscene amount of files, otherwise the number of blob marks that we are *actually* going to use ever again is extremely tiny. -- Felipe Contreras
Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
On Mon, May 06, 2013 at 02:02:13PM -0500, Felipe Contreras wrote: I did a double-take on reading this subject line and first paragraph, thinking surely fast-export needs to actually output blobs?. If you think that, then you are not familiar with the code. --export-marks=file:: [...] My point was that nothing in the subject line nor that first paragraph (nor, for that matter, the entire commit message) says that we are talking about marks here. Reading the patch, I see that this is only about not bothering to load blob marks from --import-marks. It might be nice to mention that in the commit message, which is otherwise quite confusing. The commit message says it exactly like it is: we don't care about blobs. If you guess that "we" means the marks code and not all of fast-export, then yes. But I do not have any desire to get into another debate trying to convince you that there is value to having a clear commit message. Junio has already proposed a much more readable one. -Peff
Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
On Mon, May 6, 2013 at 11:20 AM, Jeff King p...@peff.net wrote: On Mon, May 06, 2013 at 08:08:45AM -0700, Junio C Hamano wrote: I'm also not sure why your claim we don't care about blobs is true, because naively we would want future runs of fast-export to avoid having to write out the whole blob content when mentioning the blob again. The existing documentation is fairly clear that marks for objects other than commits are not exported, and the import-marks codepath discards anything but commits, so there is no mechanism for the existing fast-export users to leave blob marks in the marks file for later runs of fast-export to take advantage of. The second invocation cannot refer to such a blob in the first place. OK. If the argument is we do not write them, so do not bother reading them back in, I think that is reasonable. We already do that: 5d3698f fast-export: avoid importing blob marks It could hurt anybody trying to run fast-export against a marks file created by somebody else, but that is also the same case that is being helped here (since otherwise, we would not be seeing blob entries at all). I do not offhand know enough about the internals of import/export-style remote-helpers to say whether the hurt case even exists, let alone how common it is. By discarding marks on blobs, we may be robbing some optimization possibilities, and by discarding marks on tags, we may be robbing some features, from users of fast-export; we might want to add an option --use-object-marks={blob,commit,tag} or something to both fast-export and fast-import, so that the former can optionally write marks for non-commits out, and the latter can omit non commit marks if the user do not need them. But that is a separate issue. Yeah, that would allow the old behavior (and more) if anybody is hurt by this. There is no behavior change in this patch. We do *exactly* the same as before. 
-- Felipe Contreras -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
On Mon, May 6, 2013 at 2:11 PM, Jeff King p...@peff.net wrote: On Mon, May 06, 2013 at 02:02:13PM -0500, Felipe Contreras wrote: I did a double-take on reading this subject line and first paragraph, thinking surely fast-export needs to actually output blobs?. If you think that, then you are not familiar with the code. --export-marks=file:: [...] My point was that nothing in the subject line nor that first paragraph (nor, for that matter, the entire commit message) says that we are talking about marks here. s/$/ while loading marks/. Fixed. -- Felipe Contreras -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Fwd: Uninit'ed submodules
Am 06.05.2013 02:19, schrieb Chris Packham: This did get me thinking. Why does an uninitialized submodule need to have an empty directory? If it didn't the maintainer in question probably would have realized that he needed to run git submodule update --init when his cd submodule command failed. I'm guessing there is a good reason for the empty directory - perhaps so that git can notice the fact that it exists in the worktree but is out of date? If it does need to have some presence in the worktree why not as a file? That way the cd command would still fail (albeit with a different error) providing the necessary indication to the user. The submodule update --init could then change from file - dir when it actually gets populated. Hmm, to me an empty directory is the natural representation of an unpopulated submodule, but I see why that made it hard for your maintainer to notice the fact that the submodule was uninitialized. I suspect changing an unpopulated submodule to be represented by a file will surprise quite some users (some of which will probably come up with perfectly valid use cases such a change will break). What about the following: Today's Git completely ignores empty submodule directories, but I think that when the recursive checkout support is there, the submodule.autoupdate flag - which I believe should control that behavior - could also make those empty submodule directories show up in git status as being unpopulated (after all they are configured to be updated automatically, so not having them populated is something Git should show). Would something like this have helped here? Until then I can only propose to establish a best practice of using git clone --recurse-submodules in these situations to avoid the problem you described. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] Make $remote/$branch work with unconventional refspecs
On 06/05/2013 19:11, Junio C Hamano gits...@pobox.com wrote: Santi Béjar sa...@agolina.net writes: The next question could be: why not make it work with $url too? As in: $ git merge git://git.kernel.org/pub/scm/git/git.git//master You need to remember merge is a local operation, working in your local repository. Like it or not, the UI of Git makes a distinction between operations that are local and those that go to other repositories (e.g. git pull). Of course! In fact I wanted to say git pull. Santi
[PATCH v6 0/7] interactive git clean
Implement a 'git add --interactive' style of interactive git-clean. It will show what would be done before starting to clean. See ``Interactive mode`` for details. Interactive mode When the command enters the interactive mode, it shows the files and directories to be cleaned, and goes into its interactive command loop. The command loop shows the list of subcommands available, and gives a prompt "What now> ". In general, when the prompt ends with a single '>', you can pick only one of the choices given and type return, like this: *** Commands *** 1: clean 2: edit by patterns 3: edit by numbers 4: rm -i 5: quit 6: help What now> 2 You also could say `c` or `clean` above as long as the choice is unique. The main command loop has 6 subcommands. clean:: Start cleaning files and directories, and then quit. edit by patterns:: This shows the files and directories to be deleted and issues an "Input ignore patterns>>" prompt. You can input space-separated patterns to exclude files and directories from deletion. E.g. "*.c *.h" will exclude files ending with ".c" and ".h" from deletion. When you are satisfied with the filtered result, press ENTER (empty) to go back to the main menu. edit by numbers:: This shows the files and directories to be deleted and issues a "Select items to delete>>" prompt. When the prompt ends with double '>>' like this, you can make more than one selection, concatenated with whitespace or comma. Also you can say ranges. E.g. "2-5 7,9" to choose 2,3,4,5,7,9 from the list. If the second number in a range is omitted, all remaining items are selected. E.g. "7-" to choose 7,8,9 from the list. You can say '*' to choose everything. Again, when you are satisfied with the filtered result, press ENTER (empty) to go back to the main menu. rm -i:: This will show a "rm -i" style cleaning, where you must confirm each item one by one in order to delete it. This action is not as efficient as the above two actions. quit:: This lets you quit without doing any cleaning. help:: Show brief usage of interactive git-clean.
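The "edit by numbers" selection grammar described above (single items, comma or space separation, open-ended ranges, and '*') can be sketched in Python. This is an illustrative sketch of the prompt grammar, not the C implementation in builtin/clean.c:

```python
def parse_selection(input_str, n):
    """Parse an "edit by numbers" selection such as "2-5 7,9" into a
    sorted list of 1-based indexes, where n is the number of items
    shown.  "7-" selects 7..n and "*" selects everything.  Assumes
    well-formed input; real code would report bad tokens to the user.
    """
    chosen = set()
    for tok in input_str.replace(",", " ").split():
        if tok == "*":
            return list(range(1, n + 1))
        if "-" in tok:
            lo, _, hi = tok.partition("-")
            # an omitted second number means "through the last item"
            chosen.update(range(int(lo), (int(hi) if hi else n) + 1))
        else:
            chosen.add(int(tok))
    return sorted(chosen)

print(parse_selection("2-5 7,9", 10))  # -> [2, 3, 4, 5, 7, 9]
```

This matches the examples in the cover letter: "2-5 7,9" picks 2,3,4,5,7,9 and "7-" picks everything from 7 through the end of the list.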
Jiang Xin (7): Add support for -i/--interactive to git-clean Show items of interactive git-clean in columns Add colors to interactive git-clean git-clean: use a git-add-interactive compatible UI git-clean: interactive cleaning by select numbers git-clean: rm -i style interactive cleaning git-clean: update document for interactive git-clean Documentation/config.txt| 4 + Documentation/git-clean.txt | 71 - builtin/clean.c | 700 ++-- 3 files changed, 752 insertions(+), 23 deletions(-) -- 1.8.3.rc1.338.gb35aa5d -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v6 1/7] Add support for -i/--interactive to git-clean
Show what would be done and the user must confirm before actually cleaning. In the confirmation dialog, the user has three choices: * y/yes: Start to do cleaning. * n/no: Nothing will be deleted. * e/edit: Exclude items from deletion using ignore patterns. When the user chooses the edit mode, the user can input space-separated patterns (the same syntax as gitignore), and each clean candidate that matches with one of the patterns will be excluded from cleaning. When the user feels it's OK, presses ENTER and back to the confirmation dialog. Signed-off-by: Jiang Xin worldhello@gmail.com Suggested-by: Junio C Hamano gits...@pobox.com Spelling-checked-by: Eric Sunshine sunsh...@sunshineco.com Comments-by: Matthieu Moy matthieu@imag.fr Suggested-by: Eric Sunshine sunsh...@sunshineco.com --- Documentation/git-clean.txt | 15 +++- builtin/clean.c | 195 2 files changed, 191 insertions(+), 19 deletions(-) diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt index bdc3a..f5572 100644 --- a/Documentation/git-clean.txt +++ b/Documentation/git-clean.txt @@ -8,7 +8,7 @@ git-clean - Remove untracked files from the working tree SYNOPSIS [verse] -'git clean' [-d] [-f] [-n] [-q] [-e <pattern>] [-x | -X] [--] <path>... +'git clean' [-d] [-f] [-i] [-n] [-q] [-e <pattern>] [-x | -X] [--] <path>... DESCRIPTION --- @@ -34,7 +34,18 @@ OPTIONS -f:: --force:: If the Git configuration variable clean.requireForce is not set - to false, 'git clean' will refuse to run unless given -f or -n. + to false, 'git clean' will refuse to run unless given -f, -n or + -i. +-i:: +--interactive:: + Show what would be done and the user must confirm before actually + cleaning. In the confirmation dialog, the user can choose to abort + the cleaning, or enter into an edit mode. In the edit mode, the + user can input space-separated patterns (the same syntax as + gitignore), and each clean candidate that matches with one of the + patterns will be excluded from cleaning.
When the user feels it's
+	OK, presses ENTER and back to the confirmation dialog.

 -n::
 --dry-run::

diff --git a/builtin/clean.c b/builtin/clean.c
index 04e39..29fbf 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -15,9 +15,12 @@
 #include "quote.h"

 static int force = -1; /* unset */
+static int interactive;
+static struct string_list del_list = STRING_LIST_INIT_DUP;
+static const char **the_prefix;

 static const char *const builtin_clean_usage[] = {
-	N_("git clean [-d] [-f] [-n] [-q] [-e <pattern>] [-x | -X] [--] <paths>..."),
+	N_("git clean [-d] [-f] [-i] [-n] [-q] [-e <pattern>] [-x | -X] [--] <paths>..."),
 	NULL
 };

@@ -142,6 +145,139 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag,
 	return ret;
 }

+void edit_by_patterns_cmd()
+{
+	struct dir_struct dir;
+	struct strbuf confirm = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT;
+	struct strbuf **ignore_list;
+	struct string_list_item *item;
+	struct exclude_list *el;
+	const char *qname;
+	int changed = -1, i;
+
+	while (1) {
+		/* dels list may become empty when we run string_list_remove_empty_items later */
+		if (!del_list.nr) {
+			printf_ln(_("No more files to clean, exiting."));
+			break;
+		}
+
+		if (changed) {
+			putchar('\n');
+
+			/* Display dels in "Would remove ..." format */
+			for_each_string_list_item(item, &del_list) {
+				qname = quote_path_relative(item->string, -1, &buf, *the_prefix);
+				printf(_(msg_would_remove), qname);
+			}
+			putchar('\n');
+		}
+
+		printf(_("Input ignore patterns>> "));
+		if (strbuf_getline(&confirm, stdin, '\n') != EOF) {
+			strbuf_trim(&confirm);
+		} else {
+			putchar('\n');
+			break;
+		}
+
+		/* Quit edit mode */
+		if (!confirm.len)
+			break;
+
+		memset(&dir, 0, sizeof(dir));
+		el = add_exclude_list(&dir, EXC_CMDL, "manual exclude");
+		ignore_list = strbuf_split_max(&confirm, ' ', 0);
+
+		for (i = 0; ignore_list[i]; i++) {
+			strbuf_trim(ignore_list[i]);
+			if (!ignore_list[i]->len)
+				continue;
+
+			add_exclude(ignore_list[i]->buf, "", 0, el, -(i+1));
+		}
+
+		changed = 0;
+		for_each_string_list_item(item, &del_list) {
+			int dtype = DT_UNKNOWN;
+			const
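The pattern-matching step is the crux of the edit mode above: each user-supplied pattern is registered with add_exclude(), and candidates that match are dropped from del_list. As a rough, self-contained sketch of the idea, using POSIX fnmatch(3) in place of git's real gitignore machinery (so `**`, negation, and directory rules are not handled; `matches_any` is a hypothetical helper, not a function in git):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fnmatch.h>
#include <string.h>

/* Hypothetical stand-in for add_exclude()/excluded(): return 1 if
 * `path` matches any of the space-separated glob `patterns`, i.e. the
 * path should be excluded from cleaning. */
static int matches_any(const char *patterns, const char *path)
{
	char buf[1024];
	char *tok, *save;

	strncpy(buf, patterns, sizeof(buf) - 1);
	buf[sizeof(buf) - 1] = '\0';
	for (tok = strtok_r(buf, " ", &save); tok;
	     tok = strtok_r(NULL, " ", &save))
		if (!fnmatch(tok, path, 0))
			return 1;	/* excluded from cleaning */
	return 0;
}
```

A candidate that matches any pattern is kept (excluded from deletion); everything else stays on the deletion list, mirroring the loop over del_list in the patch.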
[PATCH v6 2/7] Show items of interactive git-clean in columns
When there are lots of items to be cleaned, it is hard to see them all
in one screen. Showing them in columns instead of one per line solves
this problem. Since we no longer show the items to be cleaned in the
"Would remove ..." format (only plain filenames) in interactive mode,
we add instructions and warnings as a header above them.

Signed-off-by: Jiang Xin worldhello@gmail.com
Comments-by: Matthieu Moy matthieu@imag.fr
---
 Documentation/config.txt |  4
 builtin/clean.c          | 58 +---
 2 files changed, 44 insertions(+), 18 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 6e53f..98bfa 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -955,6 +955,10 @@ column.branch::
 	Specify whether to output branch listing in `git branch` in columns.
 	See `column.ui` for details.

+column.clean::
+	Specify whether to output cleaning files in `git clean -i` in columns.
+	See `column.ui` for details.
+
 column.status::
 	Specify whether to output untracked files in `git status` in columns.
 	See `column.ui` for details.
diff --git a/builtin/clean.c b/builtin/clean.c index 29fbf..43383 100644 --- a/builtin/clean.c +++ b/builtin/clean.c @@ -13,11 +13,13 @@ #include refs.h #include string-list.h #include quote.h +#include column.h static int force = -1; /* unset */ static int interactive; static struct string_list del_list = STRING_LIST_INIT_DUP; static const char **the_prefix; +static unsigned int colopts; static const char *const builtin_clean_usage[] = { N_(git clean [-d] [-f] [-i] [-n] [-q] [-e pattern] [-x | -X] [--] paths...), @@ -32,8 +34,13 @@ static const char *msg_warn_remove_failed = N_(failed to remove %s); static int git_clean_config(const char *var, const char *value, void *cb) { - if (!strcmp(var, clean.requireforce)) + if (!prefixcmp(var, column.)) + return git_column_config(var, value, clean, colopts); + + if (!strcmp(var, clean.requireforce)) { force = !git_config_bool(var, value); + return 0; + } return git_default_config(var, value, cb); } @@ -145,6 +152,33 @@ static int remove_dirs(struct strbuf *path, const char *prefix, int force_flag, return ret; } +void pretty_print_dels() +{ + struct string_list list = STRING_LIST_INIT_DUP; + struct string_list_item *item; + struct strbuf buf = STRBUF_INIT; + const char *qname; + struct column_options copts; + + for_each_string_list_item(item, del_list) { + qname = quote_path_relative(item-string, -1, buf, *the_prefix); + string_list_append(list, qname); + } + + /* +* always enable column display, we only consult column.* +* about layout strategy and stuff +*/ + colopts = (colopts ~COL_ENABLE_MASK) | COL_ENABLED; + memset(copts, 0, sizeof(copts)); + copts.indent = ; + copts.padding = 2; + print_columns(list, colopts, copts); + putchar('\n'); + strbuf_release(buf); + string_list_clear(list, 0); +} + void edit_by_patterns_cmd() { struct dir_struct dir; @@ -153,7 +187,6 @@ void edit_by_patterns_cmd() struct strbuf **ignore_list; struct string_list_item *item; struct exclude_list *el; - const char *qname; int changed = -1, i; 
while (1) { @@ -166,12 +199,8 @@ void edit_by_patterns_cmd() if (changed) { putchar('\n'); - /* Display dels in Would remove ... format */ - for_each_string_list_item(item, del_list) { - qname = quote_path_relative(item-string, -1, buf, *the_prefix); - printf(_(msg_would_remove), qname); - } - putchar('\n'); + /* Display dels in columns */ + pretty_print_dels(); } printf(_(Input ignore patterns )); @@ -228,23 +257,17 @@ void edit_by_patterns_cmd() void interactive_main_loop() { struct strbuf confirm = STRBUF_INIT; - struct strbuf buf = STRBUF_INIT; - struct string_list_item *item; - const char *qname; /* dels list may become empty after return back from edit mode */ while (del_list.nr) { + putchar('\n'); printf_ln(Q_(Would remove the following item:, Would remove the following items:, del_list.nr)); putchar('\n'); - /* Display dels in Would remove ... format */ - for_each_string_list_item(item, del_list) { - qname = quote_path_relative(item-string, -1, buf, *the_prefix); -
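pretty_print_dels() delegates the actual layout to print_columns() from column.c. The underlying arithmetic of a column-major layout is simple; the following is a minimal, self-contained sketch (illustrative only, not git's implementation):

```c
#include <assert.h>
#include <stdio.h>

/* Rows needed to lay `nr` items out in `ncols` columns: ceil(nr/ncols). */
static int layout_rows(int nr, int ncols)
{
	return (nr + ncols - 1) / ncols;
}

/* Print `nr` names in `ncols` columns, column-major, roughly the way
 * `git clean -i` shows its deletion candidates. */
static void print_in_columns(const char **names, int nr, int ncols)
{
	int rows = layout_rows(nr, ncols);

	for (int r = 0; r < rows; r++) {
		for (int c = 0; c < ncols; c++) {
			int i = c * rows + r;	/* column-major order */
			if (i < nr)
				printf("%-20s", names[i]);
		}
		putchar('\n');
	}
}
```

git's real print_columns() additionally measures the terminal width and item lengths to pick `ncols`; this sketch takes it as a parameter.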
[PATCH v6 3/7] Add colors to interactive git-clean
Show header, help, error messages, and prompt in colors for interactive git-clean. Re-use config variables for other git commands, such as git-add--interactive and git-stash: * color.interactive: When set to always, always use colors for interactive prompts and displays. When false (or never), never. When set to true or auto, use colors only when the output is to the terminal. * color.interactive.slot: Use customized color for interactive git-clean output (like git add --interactive). slot may be prompt, header, help or error. Signed-off-by: Jiang Xin worldhello@gmail.com Comments-by: Matthieu Moy matthieu@imag.fr --- builtin/clean.c | 78 - 1 file changed, 77 insertions(+), 1 deletion(-) diff --git a/builtin/clean.c b/builtin/clean.c index 43383..6bda3 100644 --- a/builtin/clean.c +++ b/builtin/clean.c @@ -14,6 +14,7 @@ #include string-list.h #include quote.h #include column.h +#include color.h static int force = -1; /* unset */ static int interactive; @@ -32,16 +33,81 @@ static const char *msg_skip_git_dir = N_(Skipping repository %s\n); static const char *msg_would_skip_git_dir = N_(Would skip repository %s\n); static const char *msg_warn_remove_failed = N_(failed to remove %s); +static int clean_use_color = -1; +static char clean_colors[][COLOR_MAXLEN] = { + GIT_COLOR_RESET, + GIT_COLOR_NORMAL, /* PLAIN */ + GIT_COLOR_BOLD_BLUE,/* PROMPT */ + GIT_COLOR_BOLD, /* HEADER */ + GIT_COLOR_BOLD_RED, /* HELP */ + GIT_COLOR_BOLD_RED, /* ERROR */ +}; +enum color_clean { + CLEAN_COLOR_RESET = 0, + CLEAN_COLOR_PLAIN = 1, + CLEAN_COLOR_PROMPT = 2, + CLEAN_COLOR_HEADER = 3, + CLEAN_COLOR_HELP = 4, + CLEAN_COLOR_ERROR = 5, +}; + +static int parse_clean_color_slot(const char *var, int ofs) +{ + if (!strcasecmp(var+ofs, reset)) + return CLEAN_COLOR_RESET; + if (!strcasecmp(var+ofs, plain)) + return CLEAN_COLOR_PLAIN; + if (!strcasecmp(var+ofs, prompt)) + return CLEAN_COLOR_PROMPT; + if (!strcasecmp(var+ofs, header)) + return CLEAN_COLOR_HEADER; + if (!strcasecmp(var+ofs, help)) 
+ return CLEAN_COLOR_HELP; + if (!strcasecmp(var+ofs, error)) + return CLEAN_COLOR_ERROR; + return -1; +} + static int git_clean_config(const char *var, const char *value, void *cb) { if (!prefixcmp(var, column.)) return git_column_config(var, value, clean, colopts); + /* honors the color.interactive* config variables which also + applied in git-add--interactive and git-stash */ + if (!strcmp(var, color.interactive)) { + clean_use_color = git_config_colorbool(var, value); + return 0; + } + if (!prefixcmp(var, color.interactive.)) { + int slot = parse_clean_color_slot(var, 18); + if (slot 0) + return 0; + if (!value) + return config_error_nonbool(var); + color_parse(value, var, clean_colors[slot]); + return 0; + } + if (!strcmp(var, clean.requireforce)) { force = !git_config_bool(var, value); return 0; } - return git_default_config(var, value, cb); + + /* inspect the color.ui config variable and others */ + return git_color_default_config(var, value, cb); +} + +static const char *clean_get_color(enum color_clean ix) +{ + if (want_color(clean_use_color)) + return clean_colors[ix]; + return ; +} + +static void clean_print_color(enum color_clean ix) +{ + printf(%s, clean_get_color(ix)); } static int exclude_cb(const struct option *opt, const char *arg, int unset) @@ -192,7 +258,9 @@ void edit_by_patterns_cmd() while (1) { /* dels list may become empty when we run string_list_remove_empty_items later */ if (!del_list.nr) { + clean_print_color(CLEAN_COLOR_ERROR); printf_ln(_(No more files to clean, exiting.)); + clean_print_color(CLEAN_COLOR_RESET); break; } @@ -203,7 +271,9 @@ void edit_by_patterns_cmd() pretty_print_dels(); } + clean_print_color(CLEAN_COLOR_PROMPT); printf(_(Input ignore patterns )); + clean_print_color(CLEAN_COLOR_RESET); if (strbuf_getline(confirm, stdin, '\n') != EOF) { strbuf_trim(confirm); } else { @@ -243,7 +313,9 @@ void edit_by_patterns_cmd() if (changed) { string_list_remove_empty_items(del_list, 0); } else { +
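parse_clean_color_slot() in the patch maps the tail of a color.interactive.slot key to an enum index with a chain of strcasecmp() calls. The same lookup can be written as a table scan; a self-contained sketch with the slot names and enum order from the patch (the table-driven form is an alternative, not what the patch does):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <strings.h>

enum color_clean {
	CLEAN_COLOR_RESET = 0,
	CLEAN_COLOR_PLAIN,
	CLEAN_COLOR_PROMPT,
	CLEAN_COLOR_HEADER,
	CLEAN_COLOR_HELP,
	CLEAN_COLOR_ERROR,
};

/* Table-driven equivalent of the patch's if/strcasecmp chain: the
 * table index doubles as the enum value. */
static int parse_clean_color_slot(const char *slot)
{
	static const char *names[] = {
		"reset", "plain", "prompt", "header", "help", "error",
	};

	for (int i = 0; i < (int)(sizeof(names) / sizeof(*names)); i++)
		if (!strcasecmp(slot, names[i]))
			return i;
	return -1;	/* unknown slot: caller silently ignores it */
}
```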
[PATCH v6 4/7] git-clean: use a git-add-interactive compatible UI
Rewrite the menu using a new method, `list_and_choose`, which is
borrowed from `git-add--interactive.perl`. We can reuse this method
later for more actions.

Please NOTE:

 * Method `list_and_choose` returns an array of integers, and it is up
   to you to free the allocated memory of the array.
 * The array ends with EOF.
 * If the user pressed CTRL-D (i.e. EOF), no selection is returned.

Signed-off-by: Jiang Xin worldhello@gmail.com
---
 builtin/clean.c | 410 ++--
 1 file changed, 367 insertions(+), 43 deletions(-)

diff --git a/builtin/clean.c b/builtin/clean.c
index 6bda3..3b9f3 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -16,6 +16,35 @@
 #include "column.h"
 #include "color.h"

+#define MENU_OPTS_SINGLETON	01
+#define MENU_OPTS_IMMEDIATE	02
+#define MENU_OPTS_LIST_ONLY	04
+
+#define MENU_RETURN_NO_LOOP	10
+
+struct menu_opts {
+	const char *header;
+	const char *prompt;
+	int flag;
+};
+
+enum menu_stuff_type {
+	MENU_STUFF_TYPE_STRING_LIST = 1,
+	MENU_STUFF_TYPE_MENU_ITEM
+};
+
+struct menu_stuff {
+	enum menu_stuff_type type;
+	int nr;
+	void *stuff;
+};
+
+struct menu_item {
+	char hotkey;
+	char *title;
+	int (*fn)();
+};
+
 static int force = -1; /* unset */
 static int interactive;
 static struct string_list del_list = STRING_LIST_INIT_DUP;
@@ -240,12 +269,284 @@ void pretty_print_dels()
 	copts.indent = "  ";
 	copts.padding = 2;
 	print_columns(&list, colopts, &copts);
-	putchar('\n');
 	strbuf_release(&buf);
 	string_list_clear(&list, 0);
 }

-void edit_by_patterns_cmd()
+void pretty_print_menus(struct string_list *menu_list)
+{
+	struct strbuf buf = STRBUF_INIT;
+	unsigned int local_colopts = 0;
+	struct column_options copts;
+
+	/*
+	 * always enable column display, we only consult column.*
+	 * about layout strategy and stuff
+	 */
+	local_colopts = COL_ENABLED | COL_ROW;
+	memset(&copts, 0, sizeof(copts));
+	copts.indent = "  ";
+	copts.padding = 2;
+	print_columns(menu_list, local_colopts, &copts);
+	strbuf_release(&buf);
+}
+
+void prompt_help_cmd(int singleton)
+{
+	clean_print_color(CLEAN_COLOR_HELP);
+ printf_ln(singleton ? + _(Prompt help:\n + 1 - select a numbered item\n + foo- select item based on unique prefix\n + - (empty) select nothing) : + _(Prompt help:\n + 1 - select a single item\n + 3-5- select a range of items\n + 2-3,6-9- select multiple ranges\n + foo- select item based on unique prefix\n + -... - unselect specified items\n + * - choose all items\n + - (empty) finish selecting)); + clean_print_color(CLEAN_COLOR_RESET); +} + +/* + * Implement a git-add-interactive compatible UI, which is borrowed + * from git-add--interactive.perl. + * + * Return value: + * + * - Return an array of integers + * - , and it is up to you to free the allocated memory. + * - The array ends with EOF. + * - If user pressed CTRL-D (i.e. EOF), no selection returned. + */ +int *list_and_choose(struct menu_opts *opts, struct menu_stuff *stuff) +{ + static struct string_list menu_list = STRING_LIST_INIT_DUP; + struct strbuf menu = STRBUF_INIT; + struct strbuf choice = STRBUF_INIT; + struct strbuf **choice_list; + int *chosen, *result; + char *p; + int nr = 0; + int i, j; + int eof = 0; + + chosen = xmalloc(sizeof(int) * stuff-nr); + memset(chosen, 0, sizeof(int) * stuff-nr); + + while (1) { + int i = 0, j = 0; + string_list_clear(menu_list, 0); + + if (opts-header) { + printf_ln(%s%s%s, + clean_get_color(CLEAN_COLOR_HEADER), + opts-header, + clean_get_color(CLEAN_COLOR_RESET)); + } + + /* highlight hotkey in menu */ + if (MENU_STUFF_TYPE_MENU_ITEM == stuff-type) { + struct menu_item *item; + + item = (struct menu_item *)stuff-stuff; + for (i = 0; i stuff-nr; i++, item++) { + p = item-title; + strbuf_addf(menu, %s%2d: , chosen[i] ? * : , i+1); + for (; *p; p++) { + if (*p == item-hotkey) { + strbuf_addstr(menu, clean_get_color(CLEAN_COLOR_PROMPT)); + strbuf_addch(menu,
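Besides numeric choices, list_and_choose() lets the user type a unique prefix of an item name ("foo - select item based on unique prefix" in the help text above). That rule is easy to isolate; the helper below is a hypothetical illustration of the idea, not the patch's code:

```c
#include <assert.h>
#include <string.h>

/* Return the index of the single title that `input` is a prefix of,
 * or -1 if no title matches or the prefix is ambiguous. */
static int match_unique_prefix(const char **titles, int nr,
			       const char *input)
{
	int found = -1;
	size_t len = strlen(input);

	for (int i = 0; i < nr; i++) {
		if (strncmp(titles[i], input, len))
			continue;
		if (found >= 0)
			return -1;	/* ambiguous prefix */
		found = i;
	}
	return found;
}
```

With the menu from this series, "c" uniquely selects "clean", while "e" is ambiguous between "edit by patterns" and "edit by numbers" (the real patch disambiguates those with the highlighted hotkeys 'p' and 'n').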
[PATCH v3 10/9] revision.c: treat A...B merge bases as if manually specified
The documentation assures users that A...B is defined as
'r1 r2 --not $(git merge-base --all r1 r2)'. This isn't in fact quite
true, because the calculated merge bases are not sent to
add_rev_cmdline(). Previously, the effect was pretty minor: all that I
can think of is that "git rev-list --ancestry-path A B ^AB_base" works,
but "git rev-list --ancestry-path A...B" fails with a "no bottom
commits" error. But now that all history walking cares about bottom
commits, this failure to note the merge bases as bottoms can lead to
worse results for A...B compared to "A B ^AB_base".

So ensure that the calculated merge bases are sent to
add_rev_cmdline(), flagged as 'REV_CMD_MERGE_BASE'.

Signed-off-by: Kevin Bracey ke...@bracey.fi
---
 revision.c                   | 9 +++--
 revision.h                   | 2 ++
 t/t6111-rev-list-treesame.sh | 4
 3 files changed, 13 insertions(+), 2 deletions(-)

diff --git a/revision.c b/revision.c
index 99a3432..aad16a6 100644
--- a/revision.c
+++ b/revision.c
@@ -1305,12 +1305,16 @@ void init_revisions(struct rev_info *revs, const char *prefix)

 static void add_pending_commit_list(struct rev_info *revs,
                                    struct commit_list *commit_list,
+                                   int whence,
                                    unsigned int flags)
 {
 	while (commit_list) {
 		struct object *object = &commit_list->item->object;
+		const char *sha1 = sha1_to_hex(object->sha1);
 		object->flags |= flags;
-		add_pending_object(revs, object, sha1_to_hex(object->sha1));
+		if (whence != REV_CMD_NONE)
+			add_rev_cmdline(revs, object, sha1, whence, flags);
+		add_pending_object(revs, object, sha1);
 		commit_list = commit_list->next;
 	}
 }
@@ -1332,7 +1336,7 @@ static void prepare_show_merge(struct rev_info *revs)
 	add_pending_object(revs, &head->object, "HEAD");
 	add_pending_object(revs, &other->object, "MERGE_HEAD");
 	bases = get_merge_bases(head, other, 1);
-	add_pending_commit_list(revs, bases, UNINTERESTING);
+	add_pending_commit_list(revs, bases, REV_CMD_MERGE_BASE, UNINTERESTING);
 	free_commit_list(bases);
 	head->object.flags |= SYMMETRIC_LEFT;
@@ -1420,6 +1424,7 @@ int handle_revision_arg(const char *arg_, struct rev_info *revs, int flags, unsi
 	if (symmetric) {
 		exclude = get_merge_bases(a, b, 1);
 		add_pending_commit_list(revs, exclude,
+					REV_CMD_MERGE_BASE,
 					flags_exclude);
 		free_commit_list(exclude);
 		a_flags = flags | SYMMETRIC_LEFT;
diff --git a/revision.h b/revision.h
index 765630a..b2ac5a3 100644
--- a/revision.h
+++ b/revision.h
@@ -32,10 +32,12 @@ struct rev_cmdline_info {
 		struct object *item;
 		const char *name;
 		enum {
+			REV_CMD_NONE,
 			REV_CMD_REF,
 			REV_CMD_PARENTS_ONLY,
 			REV_CMD_LEFT,
 			REV_CMD_RIGHT,
+			REV_CMD_MERGE_BASE,
 			REV_CMD_REV
 		} whence;
 		unsigned flags;
diff --git a/t/t6111-rev-list-treesame.sh b/t/t6111-rev-list-treesame.sh
index 689d357..ded0b1a 100755
--- a/t/t6111-rev-list-treesame.sh
+++ b/t/t6111-rev-list-treesame.sh
@@ -161,6 +161,10 @@
 check_result 'F' B..F --ancestry-path --simplify-merges -- file
 check_result 'F D' B..F --first-parent
 check_result 'F' B..F --first-parent -- file

+# E...F should be equivalent to E F ^B, and be able to drop D as above.
+check_result 'F' E F ^B -- file
+check_result 'F' E...F -- file
+
 # Any sort of full history of C..F should show D, as it's the connection to C,
 # and it differs from it.
 check_result 'F D B' C..F
-- 
1.8.3.rc0.28.g4b02ef5
-- 
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v6 5/7] git-clean: interactive cleaning by select numbers
Draw a multiple-choice menu using `list_and_choose` to select items to
be deleted by numbers. The user can input:

 * 1,5-7 : select items 1, 5, 6 and 7 to be deleted
 * *     : select all items to be deleted
 * -*    : unselect all; nothing will be deleted
 *       : (empty input) finish selecting, and return back to the main menu

Signed-off-by: Jiang Xin worldhello@gmail.com
---
 builtin/clean.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/builtin/clean.c b/builtin/clean.c
index 3b9f3..3b07f 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -622,6 +622,43 @@ int edit_by_patterns_cmd()
 	return 0;
 }

+int edit_by_numbers_cmd()
+{
+	struct menu_opts menu_opts;
+	struct menu_stuff menu_stuff;
+	struct string_list_item *items;
+	int *chosen;
+	int i, j;
+
+	menu_opts.header = NULL;
+	menu_opts.prompt = "Select items to delete";
+	menu_opts.flag = 0;
+
+	menu_stuff.type = MENU_STUFF_TYPE_STRING_LIST;
+	menu_stuff.stuff = &del_list;
+	menu_stuff.nr = del_list.nr;
+
+	chosen = list_and_choose(&menu_opts, &menu_stuff);
+	items = del_list.items;
+	for (i = 0, j = 0; i < del_list.nr; i++) {
+		if (i < chosen[j]) {
+			*(items[i].string) = '\0';
+		} else if (i == chosen[j]) {
+			/* delete selected item */
+			j++;
+			continue;
+		} else {
+			/* end of chosen (EOF), won't delete */
+			*(items[i].string) = '\0';
+		}
+	}
+
+	string_list_remove_empty_items(&del_list, 0);
+
+	free(chosen);
+	return 0;
+}
+
 int quit_cmd()
 {
 	string_list_clear(&del_list, 0);
@@ -635,6 +672,7 @@ int help_cmd(int x)
 	printf_ln(_(
 	    "clean               - start cleaning\n"
 	    "edit by patterns    - exclude items from deletion\n"
+	    "edit by numbers     - select items to be deleted by numbers\n"
 	    "quit                - stop cleaning\n"
 	    "help                - this screen\n"
 	    "?                   - help for prompt selection"
@@ -652,6 +690,7 @@ void interactive_main_loop()
 	struct menu_item menus[] = {
 		{'c', "clean",            clean_cmd},
 		{'p', "edit by patterns", edit_by_patterns_cmd},
+		{'n', "edit by numbers",  edit_by_numbers_cmd},
 		{'q', "quit",             quit_cmd},
 		{'h', "help",             help_cmd},
 	};
-- 
1.8.3.rc1.338.gb35aa5d
-- 
To unsubscribe from this list: send the line
unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
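The input forms accepted above ("1,5-7", "*", open-ended ranges like "7-") boil down to a small parser that marks chosen indices. Here is a self-contained sketch of such a parser; `parse_choice` and `count_chosen` are hypothetical names, and the patch's real parsing lives inside list_and_choose and additionally handles '-'-prefixed deselection, which this sketch omits:

```c
#include <assert.h>
#include <stdlib.h>

/* Mark chosen[i] = 1 for every 1-based index named by an input like
 * "1,5-7" or "2-5 7,9". "*" selects everything; "7-" runs to the end
 * of the list (nr items). */
static void parse_choice(const char *input, int *chosen, int nr)
{
	const char *p = input;

	while (*p) {
		long lo, hi;
		char *end;

		while (*p == ' ' || *p == ',')
			p++;
		if (!*p)
			break;
		if (*p == '*') {
			for (int i = 0; i < nr; i++)
				chosen[i] = 1;
			p++;
			continue;
		}
		lo = strtol(p, &end, 10);
		if (end == p)
			break;		/* not a number: stop parsing */
		hi = lo;
		if (*end == '-') {
			p = end + 1;
			hi = strtol(p, &end, 10);
			if (end == p)	/* open-ended range like "7-" */
				hi = nr;
		}
		for (long i = lo; i <= hi && i <= nr; i++)
			if (i >= 1)
				chosen[i - 1] = 1;
		p = end;
	}
}

/* Testing convenience: how many of `nr` items does `input` select? */
static int count_chosen(const char *input, int nr)
{
	int chosen[64] = { 0 };
	int n = 0;

	parse_choice(input, chosen, nr);
	for (int i = 0; i < nr; i++)
		n += chosen[i];
	return n;
}
```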
[PATCH v6 6/7] git-clean: rm -i style interactive cleaning
Add a "rm -i"-style interactive cleaning method: the user must confirm
each item before it is deleted.

Signed-off-by: Jiang Xin worldhello@gmail.com
---
 builtin/clean.c | 36
 1 file changed, 36 insertions(+)

diff --git a/builtin/clean.c b/builtin/clean.c
index 3b07f..f36ad 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -659,6 +659,40 @@ int edit_by_numbers_cmd()
 	return 0;
 }

+int rm_i_cmd()
+{
+	struct strbuf confirm = STRBUF_INIT;
+	struct strbuf buf = STRBUF_INIT;
+	struct string_list_item *item;
+	const char *qname;
+	int changed = 0, eof = 0;
+
+	for_each_string_list_item(item, &del_list) {
+		/* Ctrl-D should stop removing files */
+		if (!eof) {
+			qname = quote_path_relative(item->string, -1, &buf, *the_prefix);
+			printf(_("remove %s ? "), qname);
+			if (strbuf_getline(&confirm, stdin, '\n') != EOF) {
+				strbuf_trim(&confirm);
+			} else {
+				putchar('\n');
+				eof = 1;
+			}
+		}
+		if (!confirm.len || !strncasecmp(confirm.buf, "no", confirm.len)) {
+			*item->string = '\0';
+			changed++;
+		}
+	}
+
+	if (changed)
+		string_list_remove_empty_items(&del_list, 0);
+
+	strbuf_release(&buf);
+	strbuf_release(&confirm);
+	return MENU_RETURN_NO_LOOP;
+}
+
 int quit_cmd()
 {
 	string_list_clear(&del_list, 0);
@@ -673,6 +707,7 @@ int help_cmd(int x)
 	    "clean               - start cleaning\n"
 	    "edit by patterns    - exclude items from deletion\n"
 	    "edit by numbers     - select items to be deleted by numbers\n"
+	    "rm -i               - delete items one by one, like \"rm -i\"\n"
 	    "quit                - stop cleaning\n"
 	    "help                - this screen\n"
 	    "?                   - help for prompt selection"
@@ -691,6 +726,7 @@ void interactive_main_loop()
 		{'c', "clean",            clean_cmd},
 		{'p', "edit by patterns", edit_by_patterns_cmd},
 		{'n', "edit by numbers",  edit_by_numbers_cmd},
+		{'i', "rm -i",            rm_i_cmd},
 		{'q', "quit",             quit_cmd},
 		{'h', "help",             help_cmd},
 	};
-- 
1.8.3.rc1.338.gb35aa5d
-- 
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
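The keep-or-delete decision in the loop above is the single expression `!confirm.len || !strncasecmp(confirm.buf, "no", confirm.len)`: an empty answer or any case-insensitive prefix of "no" keeps the file, anything else lets it be deleted. Isolated as a tiny predicate (`keep_item` is a hypothetical name, not in the patch):

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <string.h>
#include <strings.h>

/* Mirror of the patch's keep test: keep the file (drop it from the
 * deletion list) when the answer is empty or a case-insensitive
 * prefix of "no", e.g. "", "n", "N", "no". */
static int keep_item(const char *answer)
{
	size_t len = strlen(answer);

	return len == 0 || !strncasecmp(answer, "no", len);
}
```

Note the defensive default: an ambiguous or empty answer keeps the file, so a stray ENTER never deletes anything.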
[PATCH v6 7/7] git-clean: update document for interactive git-clean
Signed-off-by: Jiang Xin worldhello@gmail.com
---
 Documentation/git-clean.txt | 70 -
 1 file changed, 63 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt
index f5572..56d60 100644
--- a/Documentation/git-clean.txt
+++ b/Documentation/git-clean.txt
@@ -39,13 +39,8 @@ OPTIONS
 -i::
 --interactive::
-	Show what would be done and the user must confirm before actually
-	cleaning. In the confirmation dialog, the user can choose to abort
-	the cleaning, or enter into an edit mode. In the edit mode, the
-	user can input space-separated patterns (the same syntax as
-	gitignore), and each clean candidate that matches with one of the
-	patterns will be excluded from cleaning. When the user feels it's
-	OK, presses ENTER and back to the confirmation dialog.
+	Show what would be done and clean files interactively. See
+	``Interactive mode`` for details.

 -n::
 --dry-run::
@@ -74,6 +69,67 @@ OPTIONS
 	Remove only files ignored by Git. This may be useful to rebuild
 	everything from scratch, but keep manually created files.

+Interactive mode
+~~~~~~~~~~~~~~~~
+
+When the command enters the interactive mode, it shows the
+files and directories to be cleaned, and goes into its
+interactive command loop.
+
+The command loop shows the list of subcommands available, and
+gives a prompt "What now> ". In general, when the prompt ends
+with a single '>', you can pick only one of the choices given
+and type return, like this:
+
+    *** Commands ***
+      1: clean        2: edit by patterns    3: edit by numbers
+      4: rm -i        5: quit                6: help
+    What now> 2
+
+You could also say `c` or `clean` above as long as the choice is unique.
+
+The main command loop has 6 subcommands.
+
+clean::
+
+	Start cleaning files and directories, and then quit.
+
+edit by patterns::
+
+	This shows the files and directories to be deleted and issues an
+	"Input ignore patterns>>" prompt. You can input space-separated
+	patterns to exclude files and directories from deletion.
+	E.g. "*.c *.h" will exclude files ending with ".c" and ".h" from
+	deletion. When you are satisfied with the filtered result, press
+	ENTER (empty) to go back to the main menu.
+
+edit by numbers::
+
+	This shows the files and directories to be deleted and issues a
+	"Select items to delete>>" prompt. When the prompt ends with
+	double '>>' like this, you can make more than one selection,
+	concatenated with whitespace or comma. You can also give ranges.
+	E.g. "2-5 7,9" chooses 2,3,4,5,7,9 from the list. If the second
+	number in a range is omitted, all remaining items are taken.
+	E.g. "7-" chooses 7,8,9 from the list. You can say '*' to choose
+	everything. Again, when you are satisfied with the filtered
+	result, press ENTER (empty) to go back to the main menu.
+
+rm -i::
+
+	This shows "rm -i"-style cleaning: you must confirm each item
+	before it is deleted. This action is not as efficient as the
+	above two actions.
+
+quit::
+
+	This lets you quit without doing any cleaning.
+
+help::
+
+	Show brief usage of interactive git-clean.
+
 SEE ALSO
 linkgit:gitignore[5]
-- 
1.8.3.rc1.338.gb35aa5d
-- 
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] l10n: de.po: translate 44 new messages
Ralf Thielow ralf.thie...@gmail.com writes: #: branch.c:275 -#, fuzzy, c-format +#, c-format msgid Not a valid branch point: '%s'. -msgstr Ungültiger Zweig-Name: '%s' +msgstr Ungültige Zweig-Version: '%s' That's a bit strange and calls for a proper translation of branch point. Abzweigung is really tempting, but probably does nothing to clarify the intent ;-) Perhaps Abzweigpunkt. dict.leo.org has some other reasonable ideas. #: builtin/checkout.c:1065 msgid do not limit pathspecs to sparse entries only -msgstr +msgstr keine Beschränkung von Pfadspezifikationen zu eingeschränkten +Einträgen Huh?! :-) Even if you translated sparse entry as eingeschränkte Einträge -- it doesn't have a wiki entry -- using Beschränkung in the same sentence is just confusing. How about Pfadspezifikationen ignorieren Einstellungen zum partiellen Auschecken All the ones I didn't quote here: Acked-by: Thomas Rast tr...@inf.ethz.ch Thanks for your work! -- Thomas Rast trast@{inf,student}.ethz.ch -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 3/9] t6111: new TREESAME test set
Kevin Bracey ke...@bracey.fi writes: Some side branching and odd merging to illustrate various flaws in revision list scans, particularly when limiting the list. Many expected failures, which will be gone by the end of the history traversal refinements series. Signed-off-by: Kevin Bracey ke...@bracey.fi --- t/t6111-rev-list-treesame.sh | 180 +++ 1 file changed, 180 insertions(+) create mode 100755 t/t6111-rev-list-treesame.sh diff --git a/t/t6111-rev-list-treesame.sh b/t/t6111-rev-list-treesame.sh new file mode 100755 index 000..602c02d --- /dev/null +++ b/t/t6111-rev-list-treesame.sh @@ -0,0 +1,180 @@ +#!/bin/sh +# +#,---E--. *H--. * marks !TREESAME parent paths +# /\ / \* +# *A--*B---D--*F-*G-K-*L-*M +# \ /* \ / +#`-C-' `-*I-*J +# +# A creates file, B and F change it. +# Odd merge G takes the old version from B. +# I changes it, but J reverts it. +# H and L both change it, and M merges those changes. ... and because J is a revert of I, K ends up merging the same and being the same with both G and J. +test_description='TREESAME and limiting' + +. 
./test-lib.sh + +note () { + git tag $1 +} + +unnote () { + git name-rev --tags --stdin | sed -e s|$_x40 (tags/\([^)]*\)) |\1 |g +} + +test_expect_success setup ' + test_commit Initial file file Hi there A + git branch other-branch + + test_commit file=Hello file Hello B + git branch third-branch + + git checkout other-branch + test_commit Added other other Hello C + + git checkout master + test_merge D other-branch + + git checkout third-branch + test_commit Third file third Nothing E + + git checkout master + test_commit file=Blah file Blah F + + test_tick git merge --no-commit third-branch + git checkout third-branch file + git commit + note G + git branch fiddler-branch + + git checkout -b part2-branch + test_commit file=Part 2 file Part 2 H + + git checkout fiddler-branch + test_commit Bad commit file Silly I + + test_tick git revert I note J + + git checkout master + test_tick git merge --no-ff fiddler-branch + note K + + test_commit file=Part 1 file Part 1 L + + test_tick test_must_fail git merge part2-branch + test_commit M file Parts 1+2 +' + +FMT='tformat:%P %H | %s' + +# could we soup this up to optionally check parents? So (BA)C would check +# that C is shown and has parents B A. Sounds like a good thing to do, especially given that some of the earlier issues you brought up triggered only when --parents is not used, so... Perhaps something like the attached. +check_outcome () { + outcome=$1 + shift + for c in $1 + do + echo $c + done expect Maybe printf %s\n $1 expect is more fashionable these days? +# Odd merge G drops a change in F. Important that G is listed in all +# except the most basic list. Achieving this means normal merge D will also be +# shown in normal full-history, as we can't distinguish unless we do a +# simplification pass. After simplification, D is dropped but G remains. +check_result 'M L K J I H G F E D C B A' I'll read the expectation and comment on them in a separate message. 
t/t6111-rev-list-treesame.sh | 30 +- 1 file changed, 21 insertions(+), 9 deletions(-) diff --git a/t/t6111-rev-list-treesame.sh b/t/t6111-rev-list-treesame.sh index 602c02d..f4c9cac 100755 --- a/t/t6111-rev-list-treesame.sh +++ b/t/t6111-rev-list-treesame.sh @@ -20,7 +20,7 @@ note () { } unnote () { - git name-rev --tags --stdin | sed -e s|$_x40 (tags/\([^)]*\)) |\1 |g + git name-rev --tags --stdin | sed -e s|$_x40 (tags/\([^)]*\))\([ ]\)|\1\2|g } test_expect_success setup ' @@ -66,23 +66,34 @@ test_expect_success setup ' test_commit M file Parts 1+2 ' -FMT='tformat:%P%H | %s' - # could we soup this up to optionally check parents? So (BA)C would check # that C is shown and has parents B A. check_outcome () { outcome=$1 shift - for c in $1 - do - echo $c - done expect - shift + + case $1 in + *(*) + FMT=%P %H | %s + munge_actual= + s/^\([^ ]*\)\([^ ]*\) .*/(\1)\2/ + s/ //g + s/()// + + ;; + *) + FMT=%H | %s + munge_actual=s/^\([^ ]*\) .*/\1/ + ;; + esac + printf %s\n $1 expect + shift + param=$* test_expect_$outcome log $param ' git log --format=$FMT $param | unnote actual - sed -e s/^.* \([^ ]*\) .*/\1/
Re: git rev-list | git cherry-pick --stdin is leaky
On 04/30/13 15:47, René Scharfe wrote:

On 30.04.2013 02:11, Stephen Boyd wrote:

(resending since the attachment seems to make vger sad)

Hi,

I'm running "git rev-list | git cherry-pick --stdin" on a range of
about 300 commits. Eventually the cherry-pick dies with:

error: cannot fork() for commit: Cannot allocate memory

Running valgrind shows me that the tree traversal code is leaking
gigabytes of memory (particularly unpack_callback). Since cherry-pick
is a very long-running process, all these allocations are never freed
and eventually I run out of memory. The worst offender and summary is:

==7986== 938,956,692 (929,961,582 direct, 8,995,110 indirect) bytes in 7,765,439 blocks are definitely lost in loss record 257 of 257
==7986==    at 0x4C267CC: calloc (vg_replace_malloc.c:467)
==7986==    by 0x4FAF57: xcalloc (wrapper.c:119)
==7986==    by 0x4F5281: unpack_callback (unpack-trees.c:539)
==7986==    by 0x4F40E5: traverse_trees (tree-walk.c:407)
==7986==    by 0x4F586C: unpack_callback (unpack-trees.c:467)
==7986==    by 0x4F40E5: traverse_trees (tree-walk.c:407)
==7986==    by 0x4F586C: unpack_callback (unpack-trees.c:467)
==7986==    by 0x4F40E5: traverse_trees (tree-walk.c:407)
==7986==    by 0x4F586C: unpack_callback (unpack-trees.c:467)
==7986==    by 0x4F40E5: traverse_trees (tree-walk.c:407)
==7986==    by 0x4F586C: unpack_callback (unpack-trees.c:467)
==7986==    by 0x4F40E5: traverse_trees (tree-walk.c:407)
==7986==
==7986== LEAK SUMMARY:
==7986==    definitely lost: 2,514,117,692 bytes in 21,210,861 blocks
==7986==    indirectly lost: 885,481,947 bytes in 10,165,801 blocks
==7986==      possibly lost: 650,712,395 bytes in 6,014,309 blocks
==7986==    still reachable: 7,734,870 bytes in 47,794 blocks
==7986==         suppressed: 0 bytes in 0 blocks

I looked at that particular leak a year ago but couldn't convince
myself to submit the patch below.
If the callback function we call through call_unpack_fn does something strange, like free()ing entries itself or adding them to some list without duplication, then the added free() can cause trouble. Looking at it again today, I don't understand that concern any more. The current callback functions don't do anything like that, in any case. Maybe I'm missing something. Anyway, could you please check if the patch helps with your use case?

Ok. I tested it and it definitely helps.

==10728== LEAK SUMMARY:
==10728==    definitely lost: 316,355,458 bytes in 8,652 blocks
==10728==    indirectly lost: 1,327,251,588 bytes in 16,180,628 blocks
==10728==      possibly lost: 677,049,918 bytes in 7,381,801 blocks
==10728==    still reachable: 9,238,039 bytes in 63,947 blocks
==10728==         suppressed: 0 bytes in 0 blocks

vs.

==27614== LEAK SUMMARY:
==27614==    definitely lost: 2,369,692,222 bytes in 20,005,707 blocks
==27614==    indirectly lost: 829,151,786 bytes in 9,594,715 blocks
==27614==      possibly lost: 658,069,373 bytes in 6,345,172 blocks
==27614==    still reachable: 8,806,386 bytes in 50,199 blocks
==27614==         suppressed: 0 bytes in 0 blocks

--
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
--
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3 3/9] t6111: new TREESAME test set
Kevin Bracey ke...@bracey.fi writes:

+#     ,---E--.   *H--.           * marks !TREESAME parent paths
+#    /        \ /     \*
+# *A--*B---D--*F-*G-K-*L-*M
+#    \     /*      \   /
+#     `-C-'         `-*I-*J
+#
+# A creates "file", B and F change it.
+# Odd merge G takes the old version from B.
+# I changes it, but J reverts it.
+# H and L both change it, and M merges those changes.
+
...
+# Odd merge G drops a change in F. Important that G is listed in all
+# except the most basic list. Achieving this means normal merge D will also be
+# shown in normal full-history, as we can't distinguish unless we do a
+# simplification pass. After simplification, D is dropped but G remains.

+check_result 'M L K J I H G F E D C B A'

OK.

+check_result 'M H L K J I G E F D C B A' --topo-order

OK.

+check_result 'M L H B A' -- file

OK.

+check_result 'M L H B A' --parents -- file

OK.

+check_outcome failure 'M L J I H G F D B A' --full-history -- file # drops G

This breaks the promise that full history at least gives a connected graph.

+check_result 'M L K J I H G F D B A' --full-history --parents -- file

And it is saved by --parents; the "(parents)commit" notation may make it easier to read the above two tests.

+check_outcome failure 'M H L J I G F B A' --simplify-merges -- file # drops G

The history --simplify-merges gives ought to be a full history that keeps any commit that locally touches the specified path, so keeping I and J even though they do not contribute to the end result is correct.

The path in question is the same between G and B, but again, F should be shown because it should be in the full history. Even though E simplifies down to B, making G a merge between B and F (the former is an ancestor of the latter), G is kept, because...? Because the path at B and F are different?

I have to wonder what should happen if D makes the path different from B, and then F makes the path the same as B. B and F should still be kept. So "because the path at B and F are different" is not the reason why we keep G. How do we decide why G must be kept?
+check_result 'M L K G F D B A' --first-parent
+check_result 'M L G F B A' --first-parent -- file

OK.

+# Check that odd merge G remains shown when F is the bottom.

I did not have a chance to say this when you responded in the previous round with that "bottom" thing before you sent this new round, but I am not sure if it is a good idea to pay special attention to the bottom-ness. Very often, instead of running "git log -- path" and stopping when seeing enough by killing the pager, people run "git log HEAD~200.. -- path" to pick recent-enough commits, but the number 200 is very much arbitrary.

+check_result 'M L K J I H G E' F..M

OK.

+check_result 'M H L K J I G E' F..M --topo-order

OK. But a note about style: options should come before revision ranges.

+check_result 'M L H' F..M -- file

OK. It is still one possible history to explain what we have in the path at the end.

+check_result 'M L H' F..M --parents -- file # L+H's parents rewritten to B, so more useful than it may seem

OK.

+check_outcome failure 'M L J I H G' F..M --full-history -- file # drops G

G is the same as E, which is in the graph (F and B are the boundaries of the history, and F is no more special than B in that regard). You want to show a merge that is different from at least one parent (as opposed to showing only when it is different from all parents), so you want to show G. OK.

+check_result 'M L K J I H G' F..M --full-history --parents -- file

OK. It almost makes me suspect that the --full-history option should flip revs->rewrite_parents on (but not revs->print_parents, unless the --parents option asks us to) when the option is parsed.

+check_outcome failure 'M H L J I G' F..M --simplify-merges -- file # drops G

The same as the --full-history without --parents before.

+check_result 'M L K J I H G' F..M --ancestry-path
+check_outcome failure 'M L J I H G' F..M --ancestry-path -- file # drops G

With --ancestry-path, what G does relative to E becomes irrelevant, because E is not in the F..M graph.
G changes the path relative to its remaining parent F, so it should be shown. OK.

+check_result 'M L K J I H G' F..M --ancestry-path --parents -- file

OK. Again this makes me suspect --ancestry-path should flip revs->rewrite_parents on.

+check_result 'M H L J I G' F..M --ancestry-path --simplify-merges -- file
+check_result 'M L K G' F..M --first-parent
+check_result 'M L G' F..M --first-parent -- file
+
+# Note that G is pruned when E is the bottom, even if it's the same commit list
+# If we want history since E, then we're quite happy to ignore G that took E.
+check_result 'M L K J I H G' E..M --ancestry-path

OK.

+check_result 'M L J I H' E..M --ancestry-path -- file

Because F is outside our graph with --ancestry-path, G does not bring anything new to the path relative to its remaining parent E, and should not be shown.
Re: [PATCH v3 5/9] revision.c: Make --full-history consider more merges
Kevin Bracey ke...@bracey.fi writes:

diff --git a/revision.c b/revision.c
index a67b615..c88ded8 100644
--- a/revision.c
+++ b/revision.c
@@ -429,10 +429,100 @@ static int rev_same_tree_as_empty(struct rev_info *revs, struct commit *commit)
 	return retval >= 0 && (tree_difference == REV_TREE_SAME);
 }

+struct treesame_state {
+	unsigned int nparents;
+	unsigned char treesame[FLEX_ARRAY];
+};

I have been wondering if we want to do one-bit (not one-byte) per parent, but no biggie ;-)

@@ -1971,6 +2083,70 @@ static struct merge_simplify_state *locate_simplify_state(struct rev_info *revs,
 	return st;
 }

+static int mark_redundant_parents(struct rev_info *revs, struct commit *commit)
+{
+...
+		po=po->next;

po = po->next;

+	}

This seems to be identical (modulo tests) from the previous round, which I found a reasonable thing to do.
Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
Felipe Contreras felipe.contre...@gmail.com writes:

The story is different on the fast-import side, where we do say we dump the full table, and a later run can depend on these marks.

Yes, and gaining nothing but increased disk-space.

I thought that the "gaining nothing" has already been refuted by the discussion several hours ago... cf. http://thread.gmane.org/gmane.comp.version-control.git/223275/focus=223440

Puzzled...

By discarding marks on blobs, we may be robbing some optimization possibilities, and by discarding marks on tags, we may be robbing some features, from users of fast-export; we might want to add an option --use-object-marks={blob,commit,tag} or something to both fast-export and fast-import, so that the former can optionally write marks for non-commits out, and the latter can omit non-commit marks if the user does not need them. But that is a separate issue.

How?

* If we teach fast-import to optionally not write marks for blobs and trees out, your remote-bzr can take advantage of it, because it does not reuse marks for non-commits in later runs, right? Existing users like cvs2git that do not ask to skip marks for non-commits will not be hurt and keep referring to blobs an earlier run wrote out.

* If we teach fast-export to optionally write marks for blobs and trees out, the users of fast-export could reuse marks for blobs and trees in later runs (perhaps they can drive fast-export from the output of "git log --raw", noticing blob object names they already saw). Existing users that do not ask for such a feature will not be hurt.
Re: [PATCH 4/4] fast-import: only store commit objects
On Mon, May 6, 2013 at 5:45 AM, Michael Haggerty mhag...@alum.mit.edu wrote:

On 05/06/2013 12:32 PM, Thomas Rast wrote: Michael Haggerty mhag...@alum.mit.edu writes: On 05/03/2013 08:23 PM, Felipe Contreras wrote: On Fri, May 3, 2013 at 12:56 PM, Thomas Rast tr...@inf.ethz.ch wrote: Felipe Contreras felipe.contre...@gmail.com writes:

How do we know that this doesn't break any users of fast-import? Your comment isn't very reassuring: "the vast majority of them will never be used again"

So what's with the minority?

Actually I don't think there's any minority. If the client program doesn't store blobs, the blob marks are not used anyway. So there's no change.

I haven't been following this conversation in detail, but your proposed change sounds like something that would break cvs2git [1]. Let me explain what cvs2git does and why:

CVS stores all of the revisions of a single file in a single "filename,v" file in rcsfile(5) format. The revisions are stored as deltas, ordered so that a single revision can be reconstructed from a single serial read of the file. cvs2git reads each of these files once, reconstructing *all* of the revisions for a file in a single go. It then pours them into a git-fast-import stream as blobs and sets a mark on each blob.

Only much later in the conversion does it have enough information to reconstruct tree-wide commits. At that time it outputs git-fast-import data (to a second file) defining the git commits and their ancestry. The contents are defined by referring to the marks of blobs from the first git-fast-import stream file. This strategy speeds up the conversion *enormously*.

So if I understand correctly that you are proposing to stop allowing marks on blob objects to be set and/or referred to later, then I object vociferously.

The proposed patch wants to stop writing marks (in --export-marks) for anything but commits. Does cvs2git depend on that?
I.e., are you using two separate fast-import processes for the blob and tree/commit phases you describe above?

Yes, it can be handy to start loading the first blobfile in parallel with the later stages of the conversion, before the second dumpfile is ready. In that case the user needs to pass --export-marks to the first fast-import process to export marks on blobs, so that the marks can be passed to the second fast-import via --import-marks.

It can be used that way, but it doesn't have to be.

So the proposed change would break a documented use of cvs2git.

It's documented as an alternative. How many people actually use this form over the other? Is there really any advantage? It's possible that basically nobody is using this form, and that there are no benefits.

Making the export of blob marks optional would of course be OK, as long as the default is to export them.

Nobody benefits from leaving the default as it is.

--
Felipe Contreras
Re: [PATCH 4/4] fast-import: only store commit objects
On Mon, May 6, 2013 at 7:20 AM, Johannes Schindelin johannes.schinde...@gmx.de wrote:

Hi Thomas,

On Mon, 6 May 2013, Thomas Rast wrote:

The proposed patch wants to stop writing marks (in --export-marks) for anything but commits.

Then it should not go in.

If that rationale were valid, no change in behavior should ever go in. Of course, that is not the case; we want to move forward, so changes in behavior do happen.

--
Felipe Contreras
Re: [PATCH 4/4] fast-import: only store commit objects
On Mon, May 6, 2013 at 10:18 AM, Junio C Hamano gits...@pobox.com wrote:

Michael Haggerty mhag...@alum.mit.edu writes:

Yes, it can be handy to start loading the first blobfile in parallel with the later stages of the conversion, before the second dumpfile is ready. In that case the user needs to pass --export-marks to the first fast-import process to export marks on blobs so that the marks can be passed to the second fast-import via --import-marks. So the proposed change would break a documented use of cvs2git. Making the export of blob marks optional would of course be OK, as long as the default is to export them.

Thanks for a concise summary. Your use case fits exactly what Felipe conjectured as the nonexistent minority.

Not true. cvs2git does *not* rely on the blobs being stored in a marks file, because cvs2git does not rely on mark files at all.

An option that lets the caller say "I only care about marks on these types of objects" to be written to (and read from) the exported marks file would help Felipe's use case without harming your use case, and would be a sane and safe way to go.

His case is not harmed at all. It's only the unfortunate command that is mentioned in the documentation that didn't need to be mentioned at all in the first place.

It should be the other way around: if it's only this documentation that is affected, we could add a switch for that particular command, and the documentation should be updated. But it's overkill to add a switch for one odd command in some documentation somewhere; it would be much better to update the odd command to avoid using marks at all, which is what the more appropriate command does, right below in the same documentation:

    cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import

Should the rest of the real world be punished because somebody added a command in some documentation somewhere, which wasn't actually needed in the first place?
--
Felipe Contreras
Re: [PATCH v3 10/9] revision.c: treat A...B merge bases as if manually specified
Kevin Bracey ke...@bracey.fi writes:

The documentation assures users that A...B is defined as 'r1 r2 --not $(git merge-base --all r1 r2)'. This isn't in fact quite true, because the calculated merge bases are not sent to add_rev_cmdline().

We want the commands to be able to tell which ones in revs->pending and revs->commits were specified by the end user, and how. While I think it makes sense to mark these negative merge bases with "these came from the command line with A...B syntax", I am not sure if it is the best way to do so in add_pending_commit_list().

By the way, why does this have anything to do with the history traversal series in the first place?

When there is anything marked UNINTERESTING in revs->pending before calling prepare_revision_walk(), you have a history with some bottom boundaries, and when there isn't, your bottom boundaries are root commits. If you want to behave differently depending on how the user gave us the revision range from the command line, e.g. acting differently between "A ^B" and "B..A", cmdline is a good place to learn the exact form, but at the history traversal level, I do not think you should even care. Why does the code even look at the cmdline, and not revs->pending?
Re: [PATCH 4/4] fast-import: only store commit objects
On Mon, May 6, 2013 at 4:19 PM, Felipe Contreras felipe.contre...@gmail.com wrote:

On Mon, May 6, 2013 at 10:18 AM, Junio C Hamano gits...@pobox.com wrote: Michael Haggerty mhag...@alum.mit.edu writes:

Yes, it can be handy to start loading the first blobfile in parallel with the later stages of the conversion, before the second dumpfile is ready. In that case the user needs to pass --export-marks to the first fast-import process to export marks on blobs so that the marks can be passed to the second fast-import via --import-marks. So the proposed change would break a documented use of cvs2git. Making the export of blob marks optional would of course be OK, as long as the default is to export them.

Thanks for a concise summary. Your use case fits exactly what Felipe conjectured as the nonexistent minority.

Not true. cvs2git does *not* rely on the blobs being stored in a marks file, because cvs2git does not rely on mark files at all.

An option that lets the caller say "I only care about marks on these types of objects" to be written to (and read from) the exported marks file would help Felipe's use case without harming your use case, and would be a sane and safe way to go.

His case is not harmed at all. It's only the unfortunate command that is mentioned in the documentation that didn't need to be mentioned at all in the first place. It should be the other way around: if it's only this documentation that is affected, we could add a switch for that particular command, and the documentation should be updated. But it's overkill to add a switch for one odd command in some documentation somewhere; it would be much better to update the odd command to avoid using marks at all, which is what the more appropriate command does, right below in the same documentation.
This would simplify the documentation, and obliterate the need to use mark files at all:

diff -ur cvs2svn-2.4.0/www/cvs2git.html cvs2svn-2.4.0-mod/www/cvs2git.html
--- cvs2svn-2.4.0/www/cvs2git.html	2012-09-22 01:49:55.0 -0500
+++ cvs2svn-2.4.0-mod/www/cvs2git.html	2013-05-06 16:33:12.070189985 -0500
@@ -355,14 +355,13 @@
 fast-import</tt>:</p>

 <pre>
-git fast-import --export-marks=../cvs2svn-tmp/git-marks.dat &lt; ../cvs2svn-tmp/git-blob.dat
-git fast-import --import-marks=../cvs2svn-tmp/git-marks.dat &lt; ../cvs2svn-tmp/git-dump.dat
+cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import
 </pre>

-<p>On Linux/Unix this can be shortened to:</p>
+<p>On Windows you should use "type" instead:</p>

 <pre>
-cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import
+type ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import
 </pre>
 </li>
Only in cvs2svn-2.4.0-mod/www: .cvs2git.html.swp

--
Felipe Contreras
Re: another packed-refs race
On Mon, May 06, 2013 at 02:41:22PM -0400, Jeff King wrote:

That is a weaker guarantee, and I think we can provide it with:

  1. Load all loose refs into cache for a particular enumeration.

  2. Make sure the packed-refs cache is up-to-date (by checking its
     stat() information and reloading if necessary).

  3. Run the usual iteration over the loose/packed ref caches.

This does seem to work in my experiments. With stock git, I can trigger the race reliably with:

  # base load, in one terminal, as before
  git init -q repo
  cd repo
  git commit -q --allow-empty -m one
  one=`git rev-parse HEAD`
  git commit -q --allow-empty -m two
  two=`git rev-parse HEAD`
  sha1=$one
  while true; do
    # this re-creates the loose ref in .git/refs/heads/master
    if test "$sha1" = "$one"; then
      sha1=$two
    else
      sha1=$one
    fi
    git update-ref refs/heads/master $sha1

    # we can remove packed-refs safely, as we know that
    # its only value is now stale. Real git would not do
    # this, but we are simulating the case that "master"
    # simply wasn't included in the last packed-refs file.
    rm -f .git/packed-refs

    # and now we repack, which will create an up-to-date
    # packed-refs file, and then delete the loose ref
    git pack-refs --all --prune
  done

  # in another terminal, enumerate and make sure we never miss the ref
  cd repo
  while true; do
    refs=`git.compile for-each-ref --format='%(refname)'`
    echo "== $refs"
    test -z "$refs" && break
  done

It usually takes about 30 seconds to hit a problem, though I measured failures at up to 90 seconds. With the patch below (on top of the one I posted the other day, which refreshes the packed-refs cache in get_packed_refs), it has been running fine for 15 minutes.

The noop_each_fn is a little gross. I could also just reimplement the recursion from do_for_each_ref_in_dir (except we don't care about flags, trim, base, etc, and would just be calling get_ref_dir recursively).
It's a slight repetition of code, but it would be less subtle than what I have written below (which uses a no-op callback for the side effect that it primes the loose ref cache). Which poison do you prefer?

diff --git a/refs.c b/refs.c
index 45a7ee6..59ae7e4 100644
--- a/refs.c
+++ b/refs.c
@@ -1363,19 +1363,38 @@ static int do_for_each_ref(const char *submodule, const char *base, each_ref_fn
 	for_each_rawref(warn_if_dangling_symref, &data);
 }

+static int noop_each_fn(const char *ref, const unsigned char *sha1, int flags,
+			void *data)
+{
+	return 0;
+}
+
 static int do_for_each_ref(const char *submodule, const char *base, each_ref_fn fn,
 			   int trim, int flags, void *cb_data)
 {
 	struct ref_cache *refs = get_ref_cache(submodule);
-	struct ref_dir *packed_dir = get_packed_refs(refs);
-	struct ref_dir *loose_dir = get_loose_refs(refs);
+	struct ref_dir *packed_dir;
+	struct ref_dir *loose_dir;
 	int retval = 0;

-	if (base && *base) {
-		packed_dir = find_containing_dir(packed_dir, base, 0);
+	/*
+	 * Prime the loose ref cache; we must make sure the packed ref cache is
+	 * uptodate after we read the loose refs in order to avoid race
+	 * conditions with a simultaneous pack-refs --prune.
+	 */
+	loose_dir = get_loose_refs(refs);
+	if (base && *base)
 		loose_dir = find_containing_dir(loose_dir, base, 0);
+	if (loose_dir) {
+		sort_ref_dir(loose_dir);
+		do_for_each_ref_in_dir(loose_dir, 0, base, noop_each_fn, 0, 0,
+				       NULL);
 	}
+
+	packed_dir = get_packed_refs(refs);
+	if (base && *base)
+		packed_dir = find_containing_dir(packed_dir, base, 0);
+
 	if (packed_dir && loose_dir) {
 		sort_ref_dir(packed_dir);
 		sort_ref_dir(loose_dir);
Re: [PATCH v3] Enable minimal stat checking
On Tue, Jan 22, 2013 at 08:49:22AM +0100, Robin Rosenberg wrote:

Specifically the fields uid, gid, ctime, ino and dev are set to zero by JGit. Other implementations, e.g. Git in Cygwin, are allegedly also somewhat incompatible with Git for Windows, and on *nix platforms the resolution of the timestamps may differ.

This is an old commit, but I noticed a bug today...

This change introduces a core.checkstat config option where the [...]

+core.checkstat::

[...]

+	if (!strcmp(var, "core.statinfo")) {

One of these is not like the others. I didn't prepare a patch, though, because I wasn't sure which it was supposed to be. A documentation bug or a code bug? :)

-Peff
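[Editor's note] The practical effect of the mismatch Peff points out: the documentation teaches one variable name while the quoted strcmp only matches another, so the documented spelling would be silently ignored. A hypothetical config fragment (the value "minimal" is taken from the patch title, not verified against the final code):

```ini
[core]
	# Spelled as documented -- but the code quoted above only matches
	# "core.statinfo", so this spelling would have no effect:
	checkstat = minimal
```

Whichever name is the bug, the fix has to pick one and make the other an alias or an error, or users will keep setting a no-op variable.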
Re: [PATCH 0/7] Make $remote/$branch work with unconventional refspecs
On Mon, May 6, 2013 at 7:06 PM, Junio C Hamano gits...@pobox.com wrote:

Johan Herland jo...@herland.net writes:

Let me try to summarize my views on how refnames should work in Git, to see if we can identify where we differ on the principles (or if we, in fact, differ at all):

Thanks; I think I already said where I think we differ in a separate message, but a short version is that the point of the remote.$nick.fetch mapping is to solve "the remote may call a ref $this, which is not the refname I want to or can use in my repository, so here is the rule to use when importing it into my local namespace". With the mapping, I can name the ref in my local namespace conveniently. E.g. their refs/heads/master cannot be our refs/heads/master at the same time, so use refs/remotes/origin/master.

The result of the above mapping, be it remotes/origin/master or remotes/origin/heads/master, should be designed to be useful for the local use of the ref in question. If you further need to remap it when using it locally, there is something wrong in the mapping you defined in your remote.$nick.fetch mapping in the first place.

Ok, so whereas I consider the refspec to be king, and the expansion from convenient shorthands to full remote-tracking refnames should be derived from the chosen refspec, you would (if I understand you correctly) rather have a constant (i.e. independent of remotes and refspecs) set of rules for expanding shorthands to full refnames; and if the user chooses refspecs that don't mesh well with those rules, then that is the user's problem, and not Git's.

We do not force any structure under refs/remotes/; it is left entirely up to the user, even though we would like to suggest the best current practice by teaching "clone" and "remote add" to lay them out in a certain way. If we were to suggest

    +refs/heads/*:refs/remotes/origin/heads/*

as the best practice, I assume you do want origin/master to keep working.
And since you do not want to use the configured refspec when expanding origin/master into refs/remotes/origin/heads/master, I assume you would rather add a hardcoded (what I call a "textual expansion" in my patches) rule that would map $nick/$name into refs/remotes/$nick/heads/$name. But isn't the existence of such a rule evidence of us trying to impose (or at least hint at) a certain structure for refs/remotes/*?

In light of this, I'm interested in your thoughts about the following related problem that I've just started looking at: "git branch -r" shows the remote-tracking branches in this repo. Currently, AFAICS, this just spits out all refs under refs/remotes/*. This behavior must clearly be modified if we are to allow remote-tracking tags at refs/remotes/$remote/tags/* (they currently show up in "git branch -r", but shouldn't). One could say that the filter should merely change from refs/remotes/* to refs/remotes/*/heads/*, but this would break for existing (old-style) remotes. Should we add a heuristic for detecting when to use refs/remotes/* vs. refs/remotes/*/heads/* as a filter?

My approach would be to iterate through the configured remotes, and for each remote list all refs that match the RHS of the refspec whose LHS is refs/heads/*. This would work for both old- and new-style remotes, with no heuristics.

If you agree that my approach is correct for enumerating remote-tracking branches, then what is different about using the refspec when expanding remote-tracking refs in general? In other words, given the following configuration:

[remote "origin"]
	fetch = +refs/heads/*:refs/foo/bar/baz/*
[remote "foo"]
	fetch = +refs/heads/*:refs/remotes/origin/heads/*

1. In your opinion, is refs/foo/bar/baz/master a remote-tracking branch?

2. Should refs/foo/bar/baz/master be listed by "git branch -r"?

3. Should the origin/master shorthand notation expand to refs/remotes/origin/heads/master (from remote "foo"), or refs/foo/bar/baz/master (from remote "origin")?
...Johan

--
Johan Herland, jo...@herland.net
www.herland.net
Re: [PATCH] revision.c: Fix a sparse warning
Kevin Bracey ke...@bracey.fi writes:

On 04/05/2013 20:25, Ramsay Jones wrote:

Sparse issues an "'sole_interesting' not declared. Should it be static?" warning. In order to suppress the warning, since this symbol does not need more than file visibility, we simply add the static modifier to its declaration.

Thanks! I'll include that fix.

Thanks.
Re: [PATCH 2/7] t7900: Start testing usability of namespaced remote refs
Johan Herland jo...@herland.net writes:

+test_expect_success 'work-around clone with namespaced remote refs' '
+	rm -rf client &&
+	git init client &&
+	(
+		cd client &&
+		git remote add origin ../server &&
+		git config --unset-all remote.origin.fetch &&
+		git config --add remote.origin.fetch "+refs/heads/*:refs/remotes/origin/heads/*" &&

If you were to do this, I think you should drop the "remote add origin" step and illustrate what configuration variables should be prepared (at least minimally---the final implementation of "git clone --separate-remote-layout" may add some other configuration variable as a hint to say "this remote is using the new layout" or somesuch) in this client repository. That would make the test more self-documenting.

I am not convinced that it is a good idea to reuse the remotes/origin hierarchy, which traditionally has been branches-only, like this, though. It may be better to use refs/$remotes_new_layout/origin/{heads,tags,...}/* for a value of $remotes_new_layout that is different from "remote", and teach the dwim_ref() machinery to pay attention to it, to avoid confusion. Otherwise, you wouldn't be able to tell between a topic branch that works on tags, named "tags/refactor", under the old layout, and a tag that marks a good point in a refactoring effort, "refactor", under the new layout.

+		git config --add remote.origin.fetch "+refs/tags/*:refs/remotes/origin/tags/*" &&
+		git config --add remote.origin.fetch "+refs/notes/*:refs/remotes/origin/notes/*" &&
+		git config --add remote.origin.fetch "+refs/replace/*:refs/remotes/origin/replace/*" &&
+		git config remote.origin.tagopt --no-tags &&
+		git fetch &&
+		git checkout master
+	) &&
+	test_clone client
+'
+
+test_done
Re: [PATCH 3/7] t7900: Demonstrate failure to expand $remote/$branch according to refspecs
Johan Herland jo...@herland.net writes:

This test verifies that the following expressions all evaluate to the full refname refs/remotes/origin/heads/master:

As I've already said, I am not convinced that local refname resolution should pay attention to refspec mapping, so I won't look at this step.
Re: [PATCH 4/7] refs.c: Refactor rules for expanding shorthand names into full refnames
Johan Herland jo...@herland.net writes:

diff --git a/refs.c b/refs.c
index 7231f54..8b02140 100644
--- a/refs.c
+++ b/refs.c
@@ -1724,7 +1724,24 @@ const char *prettify_refname(const char *name)
 		0);
 }

-const char *ref_rev_parse_rules[] = {
+static void ref_expand_txtly(const struct ref_expand_rule *rule,
+			     char *dst, size_t dst_len,
+			     const char *shortname, size_t shortname_len)
+{
+	mksnpath(dst, dst_len, rule->pattern, shortname_len, shortname);
+}
+
+const struct ref_expand_rule ref_expand_rules[] = {
+	{ ref_expand_txtly, "%.*s" },
+	{ ref_expand_txtly, "refs/%.*s" },
+	{ ref_expand_txtly, "refs/tags/%.*s" },
+	{ ref_expand_txtly, "refs/heads/%.*s" },
+	{ ref_expand_txtly, "refs/remotes/%.*s" },
+	{ ref_expand_txtly, "refs/remotes/%.*s/HEAD" },
+	{ NULL, NULL }
+};
+
+static const char *ref_rev_parse_rules[] = {
 	"%.*s",
 	"refs/%.*s",
 	"refs/tags/%.*s",
@@ -1734,15 +1751,17 @@ const char *ref_rev_parse_rules[] = {
 	NULL
 };

-int refname_match(const char *abbrev_name, const char *full_name, const char **rules)
+int refname_match(const char *abbrev_name, const char *full_name,
+		  const struct ref_expand_rule *rules)
 {
-	const char **p;
+	const struct ref_expand_rule *p;
 	const int abbrev_name_len = strlen(abbrev_name);
+	char n[PATH_MAX];

Hmmm, is it too much to ask to do this without a fixed-length buffer? I think we have long learned the value of using strbuf to avoid having to worry about buffer overruns.

I am OK with the idea to make ref_expand_rules[] customizable, and the overall strategy taken by this step to refactor the current code looks reasonably sensible.
Re: [PATCH 5/7] refs.c: Refactor code for shortening full refnames into shorthand names
Johan Herland jo...@herland.net writes: This patch removes the only remaining user of ref_rev_parse_rules. It has now been fully replaced by ref_expand_rules. Hence this patch also removes ref_rev_parse_rules.

Yeah, I was wondering when you would do this while reading [4/7], as removing the parse_rules[] would break the shortener side, and leaving it in for long would risk allowing parse_rules[] and expand_rules[] to drift apart.

-const struct ref_expand_rule ref_expand_rules[] = {
-	{ ref_expand_txtly, "%.*s" },
-	{ ref_expand_txtly, "refs/%.*s" },
-	{ ref_expand_txtly, "refs/tags/%.*s" },
-	{ ref_expand_txtly, "refs/heads/%.*s" },
-	{ ref_expand_txtly, "refs/remotes/%.*s" },
-	{ ref_expand_txtly, "refs/remotes/%.*s/HEAD" },
-	{ NULL, NULL }
-};

I wonder if you could have planned the previous step a bit better; this removal of a large block of text could have come next to the replacement of it we see after the addition of the ref_shorten_txtly() function.

+static char *ref_shorten_txtly(const struct ref_expand_rule *rule,
+			       const char *refname)
+{
+...
+}

-static const char *ref_rev_parse_rules[] = {
-	"%.*s",
-	"refs/%.*s",
-	"refs/tags/%.*s",
-	"refs/heads/%.*s",
-	"refs/remotes/%.*s",
-	"refs/remotes/%.*s/HEAD",
-	NULL
+const struct ref_expand_rule ref_expand_rules[] = {
+	{ ref_expand_txtly, NULL, "%.*s" },
+	{ ref_expand_txtly, ref_shorten_txtly, "refs/%.*s" },
+	{ ref_expand_txtly, ref_shorten_txtly, "refs/tags/%.*s" },
+	{ ref_expand_txtly, ref_shorten_txtly, "refs/heads/%.*s" },
+	{ ref_expand_txtly, ref_shorten_txtly, "refs/remotes/%.*s" },
+	{ ref_expand_txtly, ref_shorten_txtly, "refs/remotes/%.*s/HEAD" },
+	{ NULL, NULL, NULL }
 };

 int refname_match(const char *abbrev_name, const char *full_name,

 char *shorten_unambiguous_ref(const char *refname, int strict)
 {
 	int i;
 	char *short_name;

-	/* buffer for scanf result, at most refname must fit */
-	short_name = xstrdup(refname);
-
-	/* skip first rule, it will always match */
-	for (i = ARRAY_SIZE(ref_rev_parse_rules) - 1; i > 0 ; --i) {
+	for (i = ARRAY_SIZE(ref_expand_rules) - 1; i >= 0 ; --i) {
 ...
 	/*
 	 * in strict mode, all (except the matched one) rules
 	 * must fail to resolve to a valid non-ambiguous ref
 	 */
 	if (strict)
-		rules_to_fail = ARRAY_SIZE(ref_rev_parse_rules);
+		rules_to_fail = ARRAY_SIZE(ref_expand_rules);

This part obviously depends on 1/7; do we still have an off-by-one change from the original, or did I miscount when I reviewed 1/7?

Again, the overall strategy to refactor sounds sound. It may be a lot simpler if you have ref_expand/shorten_append() and ref_expand/shorten_append_with_HEAD() built-in helper functions. Then you can perform the expansion and contraction without "%.*s" at all.
Re: [PATCH 6/7] refname_match(): Caller must declare if we're matching local or remote refs
Johan Herland jo...@herland.net writes: refname_match() is used to check whether a given shorthand name matches a given full refname, but that full refname does not always belong in the local repo; rather, it is sometimes taken from a list of refs sent over from a remote repo.

That local vs. remote distinction is a wrong way to think about things. All refs you can feed to resolve_ref() and dwim_ref() are local, and remotes is not all that special. It is just a convention to store many different kinds of things per different hierarchy, and the only special hierarchy we have is refs/heads/* that can be updated by making new commits through the index and the working tree.
Re: [PATCH v2 2/3] fast-export: improve speed by skipping blobs
Felipe Contreras felipe.contre...@gmail.com writes: How? * if we teach fast-import to optionally not write marks for blobs and trees out, your remote-bzr can take advantage of it, I already said remote-bzr is irrelevant. *Everybody* benefits.

Everybody who does _not_ need to look at marks for non-commits from a previous run does. What about the others who do? Surely, some lucky ones may get the benefit of a new optimization for free if you turn it on unconditionally without an escape hatch, but that goes against our goal of not knowingly introducing any regression.

Michael's cvs2git might have a way to work the breakage around (I will let him comment on your suggested workaround), but as long as he has been coding it using the documented feature, why should he even change anything for no gain at all in the first place? Even if you have a workaround, that does not change the fact that a removal of a feature that somebody has been depending on is a regression.

What's so difficult to understand about the principle that the responsibility of making sure an optimization applies safely to a program that uses a new optimization lies on that program; in other words, that a new feature is by default an opt-in?
git apply --interactive
I'd love for git apply to have an equivalent to git add -p. (I'm guessing it would be called --interactive, since git apply -p is taken and --patch would be confusing.) Has anyone thought about this?

After taking a quick look at git-add--interactive.perl, it seems like the main steps would be:

1) making sure the patch comes from a file and not stdin, since stdin is required for the interactive input
2) collecting the input patch(es) and separating them by output file
3) modifying parse_diff to accept these pre-split patches
4) coloring the patch

Mark
Re: [PATCH 0/7] Make $remote/$branch work with unconventional refspecs
Johan Herland jo...@herland.net writes: Ok, so whereas I consider the refspec to be king, and that the expansion from convenient shorthands to full remote-tracking refnames should be derived from the chosen refspec, you would (if I understand you correctly) rather have a constant (i.e. independent of remotes and refspecs) set of rules for expanding shorthands to full refnames, and if the user chooses refspecs that don't mesh well with those rules, then that is the user's problem, and not Git's.

You need to dig your rhetoric from the other end of the tunnel. I would consider the local namespace for the refs to be the king, and we would design how they are resolved to be useful for the user's local usage pattern. How the remote refs are copied by fetch following refspecs is one of the many components (e.g. the dwimming machinery "checkout foo" uses to guess that the user may want to fork from and integrate later with one of the refs under refs/remotes is one of them, and it is not fetch) that complement the refname resolution rule to support the local usage of refs.

In light of this, I'm interested in your thoughts about the following related problem that I've just started looking at: git branch -r shows the remote-tracking branches in this repo. Currently, Should we add a heuristic for detecting when to use refs/remotes/* vs. refs/remotes/*/heads/* as a filter?

Didn't I already say that I do not think repurposing refs/remotes/ for these unified copies is the best approach? A change that I think is a good thing to add on top of your [45]/7 refactoring is to allow the user to add custom expansion/contraction rules. Then the user can group refs regardless of remotes and give meaningful shortening. As I said in a very early review, viewing the fetch refspec as king, and refspecs as the only way the user may want to group things locally, is myopic. The everyday usage of the local names ought to be the king, and everything else should serve to make it easier to use.
[PATCH 0/4] fix packed-refs races
This fixes the races I brought up in the surrounding thread: http://thread.gmane.org/gmane.comp.version-control.git/223299

The individual races are described in more detail in each commit, but for reference, here is the complete reproduction recipe (which I posted already in several parts throughout the thread, but is collected here):

  # base.sh
  # run this script forever in one terminal
  git init -q repo
  cd repo
  git commit -q --allow-empty -m one
  one=`git rev-parse HEAD`
  git commit -q --allow-empty -m two
  two=`git rev-parse HEAD`
  sha1=$one
  while true; do
          # this re-creates the loose ref in .git/refs/heads/master
          if test $sha1 = $one; then
                  sha1=$two
          else
                  sha1=$one
          fi
          git update-ref refs/heads/master $sha1

          # we can remove packed-refs safely, as we know that
          # its only value is now stale. Real git would not do
          # this, but we are simulating the case that master
          # simply wasn't included in the last packed-refs file.
          rm -f .git/packed-refs

          # and now we repack, which will create an up-to-date
          # packed-refs file, and then delete the loose ref
          git pack-refs --all --prune
  done

  # lookup.sh
  # run this script simultaneously in another terminal; when it exits,
  # git is broken (it erroneously told us that master did not exist).
  # It is fixed by applying up to patch 3/4 below.
  cd repo
  while true; do
          ref=`git rev-parse --verify master`
          echo == $ref
          test -z "$ref" && break
  done

  # enumerate.sh
  # run this script simultaneously in another terminal (either with
  # lookup.sh running, too, or just base.sh); when it exits,
  # for-each-ref has erroneously missed master. It is fixed by applying
  # up to patch 4/4 below.
  cd repo
  while true; do
          refs=`git.compile for-each-ref --format='%(refname)'`
          echo == $refs
          test -z "$refs" && break
  done

I don't think this is a purely hypothetical race. About once a month for the past six months or so, I'll run across a corrupted repository on github.com that is inexplicably missing a few objects near the tip of a ref.
This last time, I happened to catch something very odd in our ref audit-log. The log for the repo that became corrupted showed a small push to the tip of master, but the objects for that push were missing (the previous ref tip was fine). At almost the exact same time, our alternates repo was running a fetch --prune from the corrupted repo. But it didn't see the corrupted master ref; it saw that refs/heads/master didn't exist at all, and it deleted it. I believe that git-upload-pack hit the enumeration race above, and showed a ref advertisement in which refs/heads/master was missing[1], leading fetch to delete the ref. At the same time, a gc was running in the corrupted repo which hit the same issue, and ended up pruning[2] the newly pushed objects, which were not referenced from anywhere else. It may seem unlikely for two programs to hit the race at the same time (especially given the number of iterations you need to trigger the race with the scripts above), but I think it's exacerbated when you have a very large number of refs (because the loose enumeration takes much longer). So I think there's a good chance this was my problem. And if not, it is good to fix anyway. :) [1/4]: resolve_ref: close race condition for packed refs [2/4]: add a stat_validity struct [3/4]: get_packed_refs: reload packed-refs file when it changes [4/4]: for_each_ref: load all loose refs before packed refs -Peff [1] Usually the enumeration race would cause you to see the old value from the packed-refs file, not a completely missing ref (unless the ref was not packed at all previously, but that is almost certainly not the case here). But it does call into resolve_ref to actually read the ref, which has its own race that can cause refs to disappear (and is fixed by patch 1). As an aside, I think that calling resolve_ref is probably wrong. We know we have a fully qualified refname, because we keep track of which directory we are calling readdir() on. 
But if for whatever reason somebody deletes it at the exact moment between when we see it in readdir() and when we try to open it (not packs it, but actually deletes it), then we'll see it is missing and fall back to trying other ref-lookups. So in theory a race with a delete of refs/heads/foo could accidentally retrieve the value for refs/heads/refs/heads/foo. It's quite unlikely, though, and I'm not too concerned about it.

[2] Recently pushed objects should still be safe, even if they are unreachable. However, in our case the pruning was exacerbated by a gc codepath that could call git repack -ad (not in upstream git, but we do our own custom gc due to the alternates structure of our forks). I've also fixed that, as we should always be using -A to keep recent unreachable objects.
[PATCH 1/4] resolve_ref: close race condition for packed refs
When we attempt to resolve a ref, the first thing we do is call lstat() to see if it is a symlink or a real ref. If we find that the ref is missing, we fall back to looking it up in the packed-refs file. If we find the loose ref does exist (and is a regular file), we continue with trying to open it. However, we do not do the same fallback if our open() call fails; we just report the ref as missing.

A git pack-refs --prune process which is simultaneously running may remove the loose ref between our lstat() and open(). In this case, we would erroneously report the ref as missing, even though we could find it if we checked the packed-refs file.

This patch solves it by factoring out the fallback code from the lstat() case and calling it from both places. We do not need to do the same thing for the symlink/readlink code path, even though it might receive ENOENT, too, because symlinks cannot be packed (so if one disappears after the lstat, it is truly being deleted).

Note that this solves only the part of the race within resolve_ref_unsafe. In the situation described above, we may still be depending on a cached view of the packed-refs file; that race will be dealt with in a future patch.

Signed-off-by: Jeff King p...@peff.net
---
 refs.c | 63 ++-
 1 file changed, 42 insertions(+), 21 deletions(-)

diff --git a/refs.c b/refs.c
index de2d8eb..5a14703 100644
--- a/refs.c
+++ b/refs.c
@@ -1083,6 +1083,43 @@ static int get_packed_ref(const char *refname, unsigned char *sha1)
 	return -1;
 }

+/*
+ * This should be called from resolve_ref_unsafe when a loose ref cannot be
+ * accessed; err must represent the errno from the last attempt to access the
+ * loose ref, and the other options are forwarded from resolve_ref_unsafe.
+ */
+static const char *handle_loose_ref_failure(int err,
+					    const char *refname,
+					    unsigned char *sha1,
+					    int reading,
+					    int *flag)
+{
+	/*
+	 * If we didn't get ENOENT, something is broken
+	 * with the loose ref, and we should not fallback,
+	 * but instead pretend it doesn't exist.
+	 */
+	if (err != ENOENT)
+		return NULL;
+
+	/*
+	 * The loose reference file does not exist;
+	 * check for a packed reference.
+	 */
+	if (!get_packed_ref(refname, sha1)) {
+		if (flag)
+			*flag |= REF_ISPACKED;
+		return refname;
+	}
+
+	/* The reference is not a packed reference, either. */
+	if (reading)
+		return NULL;
+
+	hashclr(sha1);
+	return refname;
+}
+
 const char *resolve_ref_unsafe(const char *refname, unsigned char *sha1, int reading, int *flag)
 {
 	int depth = MAXDEPTH;
@@ -1107,26 +1144,9 @@ const char *resolve_ref_unsafe(const char *refname, unsigned char *sha1, int rea
 		git_snpath(path, sizeof(path), "%s", refname);

-		if (lstat(path, &st) < 0) {
-			if (errno != ENOENT)
-				return NULL;
-			/*
-			 * The loose reference file does not exist;
-			 * check for a packed reference.
-			 */
-			if (!get_packed_ref(refname, sha1)) {
-				if (flag)
-					*flag |= REF_ISPACKED;
-				return refname;
-			}
-			/* The reference is not a packed reference, either. */
-			if (reading) {
-				return NULL;
-			} else {
-				hashclr(sha1);
-				return refname;
-			}
-		}
+		if (lstat(path, &st) < 0)
+			return handle_loose_ref_failure(errno, refname, sha1,
+							reading, flag);

 		/* Follow "normalized" - ie "refs/.." symlinks by hand */
 		if (S_ISLNK(st.st_mode)) {
@@ -1156,7 +1176,8 @@ const char *resolve_ref_unsafe(const char *refname, unsigned char *sha1, int rea
 		 */
 		fd = open(path, O_RDONLY);
 		if (fd < 0)
-			return NULL;
+			return handle_loose_ref_failure(errno, refname, sha1,
+							reading, flag);
 		len = read_in_full(fd, buffer, sizeof(buffer)-1);
 		close(fd);
 		if (len < 0)
--
1.8.3.rc1.2.g12db477
[PATCH 3/4] get_packed_refs: reload packed-refs file when it changes
Once we read the packed-refs file into memory, we cache it to save work on future ref lookups. However, our cache may be out of date with respect to what is on disk if another process is simultaneously packing the refs. Normally it is acceptable for us to be a little out of date, since there is no guarantee whether we read the file before or after the simultaneous update. However, there is an important special case: our packed-refs file must be up to date with respect to any loose refs we read. Otherwise, we risk the following race condition:

  0. There exists a loose ref refs/heads/master.

  1. Process A starts and looks up the ref "master". It first checks $GIT_DIR/master, which does not exist. It then loads (and caches) the packed-refs file to see if "master" exists in it, which it does not.

  2. Meanwhile, process B runs "pack-refs --all --prune". It creates a new packed-refs file which contains refs/heads/master, and removes the loose copy at $GIT_DIR/refs/heads/master.

  3. Process A continues its lookup, and eventually tries $GIT_DIR/refs/heads/master. It sees that the loose ref is missing, and falls back to the packed-refs file. But it examines its cached version, which does not have refs/heads/master. After trying a few other prefixes, it reports "master" as a non-existent ref.

There are many variants (e.g., step 1 may involve process A looking up another ref entirely, so even a fully qualified refname can fail). One of the most interesting ones is if refs/heads/master is already packed. In that case process A will not see it as missing, but rather will report whatever value happened to be in the packed-refs file before process B repacked (which might be an arbitrarily old value).

We could fix this by always reloading the packed-refs file from disk after looking at any loose refs. That would be unacceptably slow, so instead we check its stat()-validity as a proxy, and re-read it only when it appears to have changed.
Reading the packed-refs file after performing any loose-ref system calls is sufficient because we know the ordering of the pack-refs process: it always makes sure the newly written packed-refs file is installed into place before pruning any loose refs. As long as those operations by B appear in their executed order to process A, by the time A sees the missing loose ref, the new packed-refs file must be in place. Signed-off-by: Jeff King p...@peff.net --- I hooked the refreshing into get_packed_refs, since then all callers get it for free. It makes me a little nervous, though, just in case some caller really cares about calling get_packed_refs but not having the list of packed-refs change during the call. peel_ref looks like such a function, but isn't, for reasons I'll explain in a followup patch. Clone also looks like such a caller, as it calls get_packed_refs once for each upstream ref it adds (it puts them in the packed-refs list, and then writes them all out at the end). But it's OK because there is no packed-refs file for it to refresh from. An alternative would be to move the refreshing to an explicit refresh_packed_refs() function, and call it from a few places (resolve_ref, and later from do_for_each_ref). refs.c | 20 +++- 1 file changed, 15 insertions(+), 5 deletions(-) diff --git a/refs.c b/refs.c index 5a14703..6afe8cc 100644 --- a/refs.c +++ b/refs.c @@ -708,6 +708,7 @@ static struct ref_cache { struct ref_cache *next; struct ref_entry *loose; struct ref_entry *packed; + struct stat_validity packed_validity; /* The submodule name, or for the main repo. 
*/ char name[FLEX_ARRAY]; } *ref_cache; @@ -717,6 +718,7 @@ static void clear_packed_ref_cache(struct ref_cache *refs) if (refs-packed) { free_ref_entry(refs-packed); refs-packed = NULL; + stat_validity_clear(refs-packed_validity); } } @@ -878,17 +880,25 @@ static struct ref_dir *get_packed_refs(struct ref_cache *refs) static struct ref_dir *get_packed_refs(struct ref_cache *refs) { + const char *packed_refs_file; + + if (*refs-name) + packed_refs_file = git_path_submodule(refs-name, packed-refs); + else + packed_refs_file = git_path(packed-refs); + + if (refs-packed + !stat_validity_check(refs-packed_validity, packed_refs_file)) + clear_packed_ref_cache(refs); + if (!refs-packed) { - const char *packed_refs_file; FILE *f; refs-packed = create_dir_entry(refs, , 0, 0); - if (*refs-name) - packed_refs_file = git_path_submodule(refs-name, packed-refs); - else - packed_refs_file = git_path(packed-refs); + f = fopen(packed_refs_file, r); if (f) { + stat_validity_update(refs-packed_validity,
[PATCH 4/4] for_each_ref: load all loose refs before packed refs
If we are iterating through the refs using for_each_ref (or any of its sister functions), we can get into a race condition with a simultaneous pack-refs --prune that looks like this:

  0. We have a large number of loose refs, and a few packed refs. refs/heads/z/foo is loose, with no matching entry in the packed-refs file.

  1. Process A starts iterating through the refs. It loads the packed-refs file from disk, then starts lazily traversing through the loose ref directories.

  2. Process B, running pack-refs --prune, writes out the new packed-refs file. It then deletes the newly packed refs, including refs/heads/z/foo.

  3. Meanwhile, process A has finally gotten to refs/heads/z (it traverses alphabetically). It descends, but finds nothing there. It checks its cached view of the packed-refs file, but it does not mention anything in refs/heads/z/ at all (it predates the new file written by B in step 2).

The traversal completes successfully without mentioning refs/heads/z/foo at all (the name, of course, isn't important; but the more refs you have and the farther down the alphabetical list a ref is, the more likely it is to hit the race).

If refs/heads/z/foo did exist in the packed-refs file at state 0, we would see an entry for it, but it would show whatever sha1 the ref had the last time it was packed (which could be an arbitrarily long time ago). This can be especially dangerous when process A is git prune, as it means our set of reachable tips will be incomplete, and we may erroneously prune objects reachable from that tip (the same thing can happen if repack -ad is used, as it simply drops unreachable objects that are packed).

This patch solves it by loading all of the loose refs for our traversal into our in-memory cache, and then refreshing the packed-refs cache.
Because a pack-refs writer will always put the new packed-refs file into place before starting the prune, we know that any loose refs we fail to see will either truly be missing, or will have already been put in the packed-refs file by the time we refresh.

Signed-off-by: Jeff King p...@peff.net
---
So this cache-priming is ugly, but I don't think it will have a performance impact, as it is hitting only directories and files that we were about to see in a second anyway during the real traversal. The only difference is that there is a slight latency before we would start producing any output, as we read the whole loose refs tree (well, the part we are interested in for this call) before processing any refs.

In theory we could do our lazy walk, and then at the end double-check that packed-refs had not changed. But if it does change, we couldn't output any newly found refs then; they would be out of sorted order. The only thing we could do is die(). Which might be OK, and is certainly preferable to silently dropping a ref and indicating that we processed the whole list. But I think the latency is probably not that big a deal.

 refs.c | 38 +-
 1 file changed, 33 insertions(+), 5 deletions(-)

diff --git a/refs.c b/refs.c
index 6afe8cc..c127baf 100644
--- a/refs.c
+++ b/refs.c
@@ -641,6 +641,21 @@ static int do_for_each_ref_in_dirs(struct ref_dir *dir1,
 }

 /*
+ * Load all of the refs from the dir into our in-memory cache. The hard work
+ * of loading loose refs is done by get_ref_dir(), so we just need to recurse
+ * through all of the sub-directories. We do not even need to care about
+ * sorting, as traversal order does not matter to us.
+ */
+static void prime_ref_dir(struct ref_dir *dir)
+{
+	int i;
+	for (i = 0; i < dir->nr; i++) {
+		struct ref_entry *entry = dir->entries[i];
+		if (entry->flag & REF_DIR)
+			prime_ref_dir(get_ref_dir(entry));
+	}
+}
+/*
  * Return true iff refname1 and refname2 conflict with each other.
* Two reference names conflict if one of them exactly matches the * leading components of the other; e.g., foo/bar conflicts with @@ -1351,14 +1366,27 @@ static int do_for_each_ref(const char *submodule, const char *base, each_ref_fn int trim, int flags, void *cb_data) { struct ref_cache *refs = get_ref_cache(submodule); - struct ref_dir *packed_dir = get_packed_refs(refs); - struct ref_dir *loose_dir = get_loose_refs(refs); + struct ref_dir *packed_dir; + struct ref_dir *loose_dir; int retval = 0; - if (base *base) { - packed_dir = find_containing_dir(packed_dir, base, 0); + /* +* We must make sure that all loose refs are read before accessing the +* packed-refs file; this avoids a race condition in which loose refs +* are migrated to the packed-refs file by a simultaneous process, but +* our in-memory view is from before the migration. get_packed_refs() +* takes care of making sure our view is up to date with what is on +
[PATCH 0/2] peel_ref cleanups changes
On Mon, May 06, 2013 at 10:43:13PM -0400, Jeff King wrote: I hooked the refreshing into get_packed_refs, since then all callers get it for free. It makes me a little nervous, though, just in case some caller really cares about calling get_packed_refs but not having the list of packed-refs change during the call. peel_ref looks like such a function, but isn't, for reasons I'll explain in a followup patch.

I thought I would need the cleanups below for peel_ref, but it turns out that the middle chunk of the function is never exercised. They conflict textually with Michael's mh/packed-refs-various topic, so I think it may make sense to just rebase them on top of that.

[1/2]: peel_ref: rename sha1 argument to peeled
[2/2]: peel_ref: refactor for safety with simultaneous update

-Peff
[PATCH 1/2] peel_ref: rename sha1 argument to peeled
The sha1 argument of peel_ref is meant to hold the peeled object name for output. Let's call it peeled which makes it more clear that it is not an input sha1 (especially as we will be adding such an input sha1 in the next patch). Signed-off-by: Jeff King p...@peff.net --- Simple cleanup for the next step. refs.c | 8 refs.h | 2 +- 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/refs.c b/refs.c index de2d8eb..89f8141 100644 --- a/refs.c +++ b/refs.c @@ -1231,7 +1231,7 @@ static int filter_refs(const char *refname, const unsigned char *sha1, int flags return filter-fn(refname, sha1, flags, filter-cb_data); } -int peel_ref(const char *refname, unsigned char *sha1) +int peel_ref(const char *refname, unsigned char *peeled) { int flag; unsigned char base[20]; @@ -1242,7 +1242,7 @@ int peel_ref(const char *refname, unsigned char *sha1) if (current_ref-flag REF_KNOWS_PEELED) { if (is_null_sha1(current_ref-u.value.peeled)) return -1; - hashcpy(sha1, current_ref-u.value.peeled); + hashcpy(peeled, current_ref-u.value.peeled); return 0; } hashcpy(base, current_ref-u.value.sha1); @@ -1257,7 +1257,7 @@ int peel_ref(const char *refname, unsigned char *sha1) struct ref_entry *r = find_ref(dir, refname); if (r != NULL r-flag REF_KNOWS_PEELED) { - hashcpy(sha1, r-u.value.peeled); + hashcpy(peeled, r-u.value.peeled); return 0; } } @@ -1274,7 +1274,7 @@ fallback: if (o-type == OBJ_TAG) { o = deref_tag_noverify(o); if (o) { - hashcpy(sha1, o-sha1); + hashcpy(peeled, o-sha1); return 0; } } diff --git a/refs.h b/refs.h index a35eafc..1e8b4e1 100644 --- a/refs.h +++ b/refs.h @@ -61,7 +61,7 @@ extern int ref_exists(const char *); extern int ref_exists(const char *); -extern int peel_ref(const char *refname, unsigned char *sha1); +extern int peel_ref(const char *refname, unsigned char *peeled); /** Locks a refs/ ref returning the lock on success and NULL on failure. 
**/ extern struct ref_lock *lock_ref_sha1(const char *refname, const unsigned char *old_sha1); -- 1.8.3.rc1.2.g12db477 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] fast-import: only store commit objects
On 05/06/2013 11:19 PM, Felipe Contreras wrote: On Mon, May 6, 2013 at 10:18 AM, Junio C Hamano gits...@pobox.com wrote: Michael Haggerty mhag...@alum.mit.edu writes: Yes, it can be handy to start loading the first blobfile in parallel with the later stages of the conversion, before the second dumpfile is ready. In that case the user needs to pass --export-marks to the first fast-import process to export marks on blobs so that the marks can be passed to the second fast-import via --import-marks. So the proposed change would break a documented use of cvs2git. Making the export of blob marks optional would of course be OK, as long as the default is to export them. Thanks for a concise summary. Your use case fits exactly what Felipe conjectured as the nonexistent minority. Not true. cvs2git does *not* rely on the blobs being stored in a marks file, because cvs2git does not rely on mark files at all. An option that lets the caller say I only care about marks on these types of objects to be written to (and read from) the exported marks file would help Felipe's use case without harming your use case, and would be a sane and safe way to go. His case is not harmed at all. It's only the unfortunate command that is mentioned in the documentation that didn't need to be mentioned at all in the first place. It should be the other way around, if it's only this documentation that is affected, we could add a switch for that particular command, and the documentation should be updated, but it's overkill to add a switch for one odd command in some documentation somewhere, it would be much better to update the odd command to avoid using marks at all, which is what the more appropriate command does, right below in the same documentation. cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import Should the rest of the real world be punished because somebody added a command in some documentation somewhere, which wasn't actually needed in the first place? 
Don't get too fixated on the documentation. The documentation just gives some examples of how cvs2git can be used.

The reason that cvs2git outputs two files is that the first file is emitted at the very beginning of the conversion and the second at the very end. These conversions can take a long time (> 1 day for very big repos), can be interrupted and restarted between passes, and passes can even be re-run with changed configurations. CVS write access has to be turned off before the start of the final conversion, so no VCS is possible until the conversion is over. So users are very interested in keeping the downtime minimal. The blobfile can also be unwieldy (its size is approximately the sum of the sizes of all revisions of all files in the project).

Being able to load the blobfile into one fast-import process and the dumpfile into a different process (which relies on the feature that you propose removing) opens up a lot of possibilities:

* The first fast-import of the blobfile can be started as soon as the blobfile is complete and run in parallel with the rest of the conversion.

* If the blobfile needs to be transferred over the network (e.g., because Git will be served from a different server than the one doing the conversion) the network transfer can also be done in parallel with the rest of the conversion.

* The blobfile could be written to a named pipe that is being read by a git-fast-import process, to avoid having to write the blobfile to disk in the first place.

* The user could run git repack between loading the blobfile and loading the dumpfile.

These are just the ways that cvs2git does and/or could benefit from the flexibility that is now in git-fast-import. Other tools might also be using git-fast-import in ways that would be broken by your proposed change.
Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/
[PATCH 2/2] peel_ref: refactor for safety with simultaneous update
The peel_ref function lets a caller peel an object, taking advantage of the fact that we may have the peeled value cached for a packed ref. However, this function is not necessarily safe in the face of simultaneous ref updates. The caller has typically already loaded the ref from disk (e.g., as part of a for_each_ref enumeration), and therefore knows what the ref's base sha1 is. But if the asked-for ref is not the current ref, we will load the ref from disk ourselves. If another process is simultaneously updating the ref, we may not get the same sha1 that the caller thinks is in the ref, and as a result may return a peeled value that does not match the sha1 that the caller has. To make this safer, we can have the caller pass in its idea of the sha1 of the ref, and use that as the base for peeling (and we can also double-check that it matches the current_ref pointer when we use that). This makes the function harder to call (it is not really "peel this ref", but "peel this object, and by the way, it is called by this refname"). However, none of the current callers care, because they always call it from a for_each_ref enumeration, meaning it is no extra work to pass in the sha1. Furthermore, that means that none of the current callers hit the read_ref_full codepath at all (so this bug is not currently triggerable). They either follow the current_ref optimization, or if the ref is not packed, they go straight to the deref_tag fallback. Let's just rip out the unused code path entirely. Not only is it not used now, but it would not even make sense to use: you would get the peeled value of the ref, but you would have no clue which base object led to that peel. Signed-off-by: Jeff King p...@peff.net --- So I think peel_ref is buggy as-is (in both master and in mh/packed-refs-various), it's just not triggerable because of the dead code path. And it would get buggier again with my other patch series, as then get_packed_ref could actually re-read the packed-refs file.
But I really think peel_ref is a badly designed function. You do not peel a ref; you peel an object. You might think it is convenient to have a function do the ref lookup and the peeling together, but in practice nobody wants that, because you would not know what you had just peeled. The real point is to hit the current_ref optimization. Michael's series already has factored out a peel_object function with the relevant bits. I think we can simply do away with peel_ref entirely, and move the current_ref optimization there. I think it should be perfectly cromulent to do:

int peel_object(const unsigned char *base, unsigned char *peeled)
{
	if (current_ref && (current_ref->flag & REF_KNOWS_PEELED) &&
	    !hashcmp(base, current_ref->u.value.sha1)) {
		hashcpy(peeled, current_ref->u.value.peeled);
		return 0;
	}
	/* peel as usual */
}

That is, we do not need to care that it is _our_ ref that peels to that. We happen to have a ref whose peeled value we know, and its object matches the one we want to peel. Whether it is the same ref or not, the same object sha1 should peel to the same peeled sha1. Of course, in practice it will be our ref, because that is how the current callers work. But it is also a safe function for callers who do not know or care about the optimization. My original patch on master is below for illustration, but do not look too closely. I think the path I outlined above makes more sense, but I am not going to work on it tonight.

 builtin/describe.c     |  2 +-
 builtin/pack-objects.c |  4 ++--
 builtin/show-ref.c     |  2 +-
 refs.c                 | 24
 refs.h                 |  3 ++-
 upload-pack.c          |  2 +-
 6 files changed, 11 insertions(+), 26 deletions(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 6636a68..23d7f1a 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -150,7 +150,7 @@ static int get_name(const char *path, const unsigned char *sha1, int flag, void
 		return 0;

 	/* Is it annotated? */
-	if (!peel_ref(path, peeled)) {
+	if (!peel_ref(path, sha1, peeled)) {
 		is_annotated = !!hashcmp(sha1, peeled);
 	} else {
 		hashcpy(peeled, sha1);

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f069462..76352df 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -556,7 +556,7 @@ static int mark_tagged(const char *path, const unsigned char *sha1, int flag,
 	if (entry)
 		entry->tagged = 1;
-	if (!peel_ref(path, peeled)) {
+	if (!peel_ref(path, sha1, peeled)) {
 		entry = locate_object_entry(peeled);
 		if (entry)
 			entry->tagged = 1;
@@ -2032,7 +2032,7 @@ static int add_ref_tag(const char *path, const unsigned char *sha1, int flag, vo
 	unsigned char peeled[20];

 	if (!prefixcmp(path, "refs/tags/") && /* is a
Re: [PATCH 4/4] fast-import: only store commit objects
On 05/06/2013 11:36 PM, Felipe Contreras wrote: This would simplify the documentation, and obliterate the need to use mark files at all: As explained in my other email, this documentation change does not remove all of the reasons that users might want to use mark files. I would still like to show users how they can load the files into Git as two separate steps.

diff -ur cvs2svn-2.4.0/www/cvs2git.html cvs2svn-2.4.0-mod/www/cvs2git.html
--- cvs2svn-2.4.0/www/cvs2git.html	2012-09-22 01:49:55.0 -0500
+++ cvs2svn-2.4.0-mod/www/cvs2git.html	2013-05-06 16:33:12.070189985 -0500
@@ -355,14 +355,13 @@
 fast-import</tt>:</p>

 <pre>
-git fast-import --export-marks=../cvs2svn-tmp/git-marks.dat &lt; ../cvs2svn-tmp/git-blob.dat
-git fast-import --import-marks=../cvs2svn-tmp/git-marks.dat &lt; ../cvs2svn-tmp/git-dump.dat
+cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import
 </pre>

-<p>On Linux/Unix this can be shortened to:</p>
+<p>On Windows you should use type instead:</p>

 <pre>
-cat ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import
+type ../cvs2svn-tmp/git-blob.dat ../cvs2svn-tmp/git-dump.dat | git fast-import
 </pre>
 </li>
Only in cvs2svn-2.4.0-mod/www: .cvs2git.html.swp

Nevertheless, it *would* be nice to add the Windows equivalent of the cat pipeline. I knew about the type command but I was under the impression that it is intended for text files and can corrupt binary files. Are you sure that using type as you suggest is binary-clean? Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/
Re: [PATCH 4/4] fast-import: only store commit objects
On 05/06/2013 11:04 PM, Felipe Contreras wrote: On Mon, May 6, 2013 at 5:45 AM, Michael Haggerty mhag...@alum.mit.edu wrote: On 05/06/2013 12:32 PM, Thomas Rast wrote: Michael Haggerty mhag...@alum.mit.edu writes: On 05/03/2013 08:23 PM, Felipe Contreras wrote: On Fri, May 3, 2013 at 12:56 PM, Thomas Rast tr...@inf.ethz.ch wrote: Felipe Contreras felipe.contre...@gmail.com writes: How do we know that this doesn't break any users of fast-import? Your comment isn't very reassuring: the vast majority of them will never be used again So what's with the minority? Actually I don't think there's any minority. If the client program doesn't store blobs, the blob marks are not used anyway. So there's no change. I haven't been following this conversation in detail, but your proposed change sounds like something that would break cvs2git [1]. Let me explain what cvs2git does and why: CVS stores all of the revisions of a single file in a single filename,v file in rcsfile(5) format. The revisions are stored as deltas ordered so that a single revision can be reconstructed from a single serial read of the file. cvs2git reads each of these files once, reconstructing *all* of the revisions for a file in a single go. It then pours them into a git-fast-import stream as blobs and sets a mark on each blob. Only much later in the conversion does it have enough information to reconstruct tree-wide commits. At that time it outputs git-fast-import data (to a second file) defining the git commits and their ancestry. The contents are defined by referring to the marks of blobs from the first git-fast-import stream file. This strategy speeds up the conversion *enormously*. So if I understand correctly that you are proposing to stop allowing marks on blob objects to be set and/or referred to later, then I object vociferously. The proposed patch wants to stop writing marks (in --export-marks) for anything but commits. Does cvs2git depend on that? 
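The blob-then-commit structure described here can be illustrated with simplified fast-import stream fragments (file names, marks, and contents are invented; real cvs2git output has one marked blob per file revision and full commit metadata). The first file holds only marked blobs:

```
blob
mark :1
data 15
contents of r1
```

The second file, produced much later in the conversion, defines commits whose trees refer back to those marks:

```
commit refs/heads/master
committer cvs2git <cvs2git@example.com> 1234567890 +0000
data 22
Converted from CVS r1
M 100644 :1 path/to/file.txt
```

Loading these two files in two separate fast-import processes only works if the first process can export its blob marks for the second to import, which is the behavior the proposed patch would remove.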
I.e., are you using two separate fast-import processes for the blob and tree/commit phases you describe above? Yes, it can be handy to start loading the first blobfile in parallel with the later stages of the conversion, before the second dumpfile is ready. In that case the user needs to pass --export-marks to the first fast-import process to export marks on blobs so that the marks can be passed to the second fast-import via --import-marks. It can be used that way, but it doesn't have to be. Please see my other email for an explanation of how the flexibility of loading the two files separately can bring concrete benefits. So the proposed change would break a documented use of cvs2git. It's documented as an alternative. How many people actually use this form over the other? Is there really any advantage? It's possible that basically nobody is using this form, and there are no benefits. You conjectured earlier that nobody uses blob marks, and I provided a counterexample. Then you proposed a workaround that would require changes to the cvs2git documentation, and I even explained how your proposed workaround is not as flexible as the status quo. Do you want to go through the same argument with every possible user of git-fast-import? It would be insanity to change the default behavior when a workaround on the Git side (namely adding an option that tells git-fast-import *not* to emit blob marks) would be quite straightforward to implement and have little maintenance cost. Making the export of blob marks optional would of course be OK, as long as the default is to export them. Nobody benefits from leaving the default as it is. Sure they do. Any tool that *knows* that it doesn't need blob marks can pass the new option and benefit. Other tools benefit from not being broken by your change.
Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/