Re: erratic behavior commit --allow-empty
Hi PJ and Hannes,

try to run the last script that I posted, with and without a sleep 1 before the last commit:

    git init
    echo aaa >f1
    git add f1
    git commit -m A
    git checkout --orphan sources
    git commit -m A --allow-empty

and

    git init
    echo aaa >f1
    git add f1
    git commit -m A
    git checkout --orphan sources
    sleep 1
    git commit -m A --allow-empty

In the first one, no new commit is created, and the sources branch is not an orphan (you can easily see it with git gui). In the second one, a new commit is created, and the sources branch is an orphan, as expected.

-Angelo
--
To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: erratic behavior commit --allow-empty
Not answering questions does not help anyone. My question was:

What is the point in insisting that there is a *really* new commit when the one commit that already existed has exactly the content that you wanted?

-- Hannes
Re: erratic behavior commit --allow-empty
Cc restored; please reply to all.

On 10/3/2012 8:32, Angelo Borsotti wrote:
> Hi Hannes,
>
> well, I thought I replied to your question:
>
>> What is the point in insisting that there is a *really* new commit when the one commit that already existed has exactly the content that you wanted?
>
> I wanted to create an orphan branch. I did it with a git checkout --orphan sources. This command alone does not create a branch; it needs a commit to be done on it, but a real one. If it is not a real one, the branch is created, but it is not an orphan one.

When you do 'git checkout --orphan sources', you request (nothing more and nothing less than) that the next commit you make on the new branch sources does not have a parent.

But this is exactly what happens: The next commit you make does not have a parent.

Perhaps you are confused by the fact that the commit you made first does not have a parent, either. But that is just a side effect of the fact that it happened to be the very first commit that you made after 'git init'.

IOW, the second commit that you made has all properties that you requested. (It just so happens that it is exactly identical to the first commit you made.)

Your case does not demonstrate a bug in git. Why don't you use a different commit message to ensure that there is a difference between the commits?

-- Hannes
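Hannes's remark that the second commit is "exactly identical" to the first follows from how Git names objects: a commit's ID is the SHA-1 of a short header plus the commit text, and that text holds the tree, any parents, the author and committer lines (timestamps at one-second resolution) and the message. The lines below are a minimal sketch of that rule, not Git's own code. The helper name commit_id, the fixed identity and the use of the well-known empty-tree ID are assumptions made for illustration, and the sketch expects sha1sum and ASCII-only input.

```shell
# Sketch only: compute the ID of a parentless commit the way Git hashes it.
# commit_id is a hypothetical helper; the identity below is made up.
commit_id() {
    # $1 = tree ID, $2 = timestamp (seconds since the epoch), $3 = message
    ident='A U Thor <author@example.com>'
    body=$(printf 'tree %s\nauthor %s %s +0000\ncommitter %s %s +0000\n\n%s' \
        "$1" "$ident" "$2" "$ident" "$2" "$3")
    # +1 for the newline that ends the message (ASCII input assumed, so
    # character count equals byte count).
    len=$(( ${#body} + 1 ))
    # Object ID = SHA-1 of "commit <size>" + NUL + commit text.
    printf 'commit %d\0%s\n' "$len" "$body" | sha1sum | cut -d' ' -f1
}
```

Calling commit_id twice with the same tree, timestamp and message prints the same ID both times; changing the timestamp by one second (which is all that "sleep 1" does) prints a different one.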
[PATCH] gitk: Update Swedish translation (296t)
This patch updates the Swedish translation for gitk. To keep my email software from mangling the file's UTF-8 encoding, the patch is attached gzip'ed.

--
\\// Peter - http://www.softwolves.pp.se/

0001-gitk-Update-Swedish-translation-296t.patch.gz
Description: Binary data
Re: [PATCH] l10n: Fix to Swedish translation
Junio C Hamano:
> I do not think there is any issue with conflicting patch or merge caused by applying this to maint, but I CC'ed Jiang to let him know what is going on.

You might get a conflict in the header (in the PO-Revision-Date line). The fixed message itself is already in the 1.8.0.rc0 release. Thank you.

--
\\// Peter - http://www.softwolves.pp.se/
Re: erratic behavior commit --allow-empty
From: Angelo Borsotti angelo.borso...@gmail.com Hi Junio, It does create one; it just is the same one you already happen to have, when you record the same state on top of the same history as the same person at the same time. No, it does not create one: Angelo This is a semantics problem. It is like the confusion as to whether zero is a natural number that can be used in counting. In this case we have created two commits. However they are, by design and definition, identical to each other for this case of identical content and identical administration fields. They cannot be distinguished. So when the file system is asked to 'write' the second commit, it (the file system in conjunction with the git code) does a no-op, and reports 'done'. It is a common (systems) engineering problem. Software engineering usually allows an empty subroutine to exist, while physical engineering wouldn't. Git cannot have two unique but identical commits (a contradiction in terms). Normally git will create a new (different unique) commit for each and every commit, but in this special case a second identical commit was 'created', but the uniqueness requirement means it _is_ the same as the first commit. as you can see from the trace of the execution of my script, the sha of the commit is the same as that of the other, which means that in the .git/objects there is only one such commit object, and not two with the same sha. The meaning of the word create is to bring into being something that did not exist before. There is no creation if the object already exists. And how would it help what to insert a sleep for 1 second (or 1 year for that matter)? As you said, it reads from the system clock, and there are millions of systems in the world that have Git installed. You may record the same state on top of the same history as the same person on two different machines 5 minutes in wallclock time in between doing so. 
These two machines may end up creating the same commit because one of them had a clock skewed by 5 minutes. I understood that the command does not create a new commit if all its data, i.e. tree, committer, ... and date are the same, representing the date with 1 second precision. Sleeping for 1 second guarantees that there is no commit in the repo that has the same time as the time after the sleep, i.e. that the command creates a (new) commit. What problem are you really trying to solve? You mentioned importing from the foreign SCM, I quoted a piece of the man page of git commit, that states that --allow-empty bypasses the safety check that prevents to make a new commit. That piece incidentally states that it is primarily used by foreign SCM interface scripts. But of course it can be used in any script that needs to build a commit on top of another. You also did not seem to have read what I wrote, or deliberately ignored it (in which case I am wasting even more time writing this, so I'll stop). I did not deliberately ignore what you wrote. I might have missed some point though. This does not have anything to do with --allow-empty; removing the option would not help anything, either. I am reporting a problem with --allow-empty, so why you say that this does not have anything to do with it? Removing the option removes a behavior that is not predictable. Often it is better to remove a feature that turns out to be inconsistent than to leave it in the software. Of course a much better avenue is to make it consistent. Run the following on a fast-enough machine. I did, and obtained most of the times I was quick enough and sometimes I was not quick enough, which is the same kind of behavior of my script. The problem I am trying to solve is to push to a remote server the source files only, while keeping in the local repo both sources and binaries. To do it, I keep an orphan branch, say sources. 
When I make a commit on the master branch, I also make a commit on the sources one after having un-staged (git rm --cached) the binaries. The script that does this must also cope with the particular case in which there are no sources in the commit on the master branch. Basically the script does:

    # this is the commit on the master branch
    git init
    echo aaa >f1
    git add f1
    git commit -m A
    # this is the piece of the script that builds the sources branch
    git checkout --orphan sources
    # git rm --cached ... remove binaries, if any
    git commit -m A --allow-empty
    git rev-list --all --pretty=oneline

When there are binaries in the commit A, they are removed, and the tree for the second git commit is then different, and the commit is actually created. When there are no binaries (as in the script above, in which the removal is commented out), the second git commit would not create any new commit, and I would not have an orphan branch. Hence the --allow-empty to force it to create a new commit. Unfortunately, it creates a new commit only if the system clock changes the seconds of the
Re: erratic behavior commit --allow-empty
Hi Hannes,

> Perhaps you are confused by the fact that the commit you made first does not have a parent, either. But that is just a side effect that it happened to be the very first commit that you made after 'git init'.

Well, I know that, and this is why I added --allow-empty. The man page of git commit says "This option bypasses the safety, ...", so I thought that it would unconditionally create a brand new commit.

> Your case does not demonstrate a bug in git.

The bug is that git commit --allow-empty performs a different action depending on whether the system clock has changed its seconds right before the command. This is a time-dependent behavior, and it is very harmful. Our applications must never behave differently depending on the time they are run or on the processor speed. It is an issue of correctness and robustness of software.

To have a predictable behavior, i.e. to create a brand new commit with git commit --allow-empty, the command in a script must ALWAYS be preceded by a sleep 1 so as to make sure that the date and time it will use are different from those of any other commit. But then it would be a lot better to embed such a sleep in the command. If that is not possible, then the users must be warned in the man page that the command sometimes may not create a brand new commit, and that a user who wants one should change something in the commit, e.g. the message.

> Why don't you use a different commit message to ensure that there is a difference between the commits?

This is what I eventually did to force the creation of a brand new commit.

-Angelo
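The time dependence Angelo reports can be reproduced on demand by pinning the timestamps instead of racing the clock. The sketch below replays his script with both commit dates supplied through the GIT_AUTHOR_DATE and GIT_COMMITTER_DATE environment variables. The function name orphan_demo, the throw-away identity and the temporary directory are assumptions made for illustration; nothing here is taken from the thread beyond the command sequence itself.

```shell
# Sketch only: replay the script from the thread with pinned timestamps.
# orphan_demo is a hypothetical helper; it prints "same" or "different".
orphan_demo() {
    # $1 = date of the first commit, $2 = date of the second commit,
    # both in Git's raw "<seconds> <timezone>" format.
    dir=$(mktemp -d) || return 1
    (
        cd "$dir" || exit 1
        # Ignore the user's own Git configuration and use a made-up identity.
        export HOME="$dir" GIT_CONFIG_NOSYSTEM=1
        export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
        export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
        git init -q .
        echo aaa >f1
        git add f1
        GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" git commit -q -m A
        first=$(git rev-parse HEAD)
        git checkout -q --orphan sources
        GIT_AUTHOR_DATE="$2" GIT_COMMITTER_DATE="$2" \
            git commit -q -m A --allow-empty
        second=$(git rev-parse HEAD)
        if [ "$first" = "$second" ]; then echo same; else echo different; fi
    )
    rm -rf "$dir"
}
```

With equal dates, for example orphan_demo '1349251200 +0000' '1349251200 +0000', both branches should end up at one commit; with dates one second apart they should not. Either way the sources branch exists and its tip has no parent, which is the point Hannes makes.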
Re: erratic behavior commit --allow-empty
In reply to Philip,

I understand what the implementation does, but I am stating that it is not what the user (by reading the man page) expects. The user adds --allow-empty to get a different, unique commit; that seems to be the purpose of the option. Unfortunately, the user gets that only sometimes, depending on the exact instant in time the command is executed, which is out of his/her control.

I think that you would agree with me that this is not a nice behaviour. How could a user ever use a command that is not predictable? If it is not possible to change the implementation, at least warn the user in the man page.

-Angelo
Re: push.default documented in man git-push?
Junio C Hamano <gits...@pobox.com> writes:

> I'll queue this instead. Thanks.

Even better, perfect!

--
Matthieu Moy
http://www-verimag.imag.fr/~moy/
Re: Git diff-file bug?
Many thanks to all who have responded to my question. I have found that something is, indeed, modifying the inodes for all the files in my repository. Our systems administrator executes a backup using tar with the --atime-preserve flag. It is this flag that modifies the changed time in the inode, and causes gitk to show that all my files have changed.

Thanks,
Scott.

On 28 September 2012 21:40, Junio C Hamano <gits...@pobox.com> wrote:
> Scott Batchelor <scott.batche...@gmail.com> writes:
>
>> I'm fairly new to git and am witnessing some strange behavior with git that I suspect may be a bug. Can anyone set my mind at rest.
>>
>> Every so often (I've not quite figured out the exact set of circumstances yet)
>
> Figure that circumstances out. That is the key to the issue. Something in your workflow is futzing with the inode data of the files in your working tree behind your back. It sometimes is a virus scanner.
>
> git diff-* plumbing commands are meant to be used after running git update-index --refresh once in the program and when the caller of these commands (in your case, gitk) knows that any change in the information returned by lstat(2) on the paths in the working tree files since that call indicate real changes to the files.
>
> git status internally runs an equivalent of --refresh before it goes to find changes, so after running it, until that something smudges the inode data behind your back, gitk will not be confused.
Re: erratic behavior commit --allow-empty
Angelo Borsotti <angelo.borso...@gmail.com> writes:

> I think that you would agree with me that this is not a nice behaviour.

This is fundamentally how Git works. You probably didn't notice it, but if you do

    echo 'some content' > file1.txt
    git add file1.txt
    git commit -m file1
    echo 'some content' > file2.txt
    git add file2.txt
    git commit -m file2

then the second commit does not create a new blob object for file2.txt, because it has the same content as an existing one. But the point is: you really don't care, or indeed, you care about sharing the blob objects to save disk space.

> How could a user ever use a command that is not predictable?

It is predictable: give it twice the same inputs in the same conditions, and it will yield the same output.

You still didn't tell us where the problem was. You are unhappy with having twice the same sha1 for the same object, but what concrete bad consequence does this have? (except for saving bandwidth in addition to disk space when trying to push your commit)

--
Matthieu Moy
http://www-verimag.imag.fr/~moy/
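Matthieu's blob example can be checked without creating a repository, because a blob's ID depends only on the file content: it is the SHA-1 of the string "blob", the size in bytes, a NUL byte, and the content. The following is a small sketch under that assumption; blob_id is a hypothetical helper name and sha1sum is assumed to be installed.

```shell
# Sketch only: print the ID Git would assign to a blob with this content.
# blob_id is a hypothetical helper, not a Git command.
blob_id() {
    # $1 = path to a file
    size=$(( $(wc -c < "$1") ))
    # Object ID = SHA-1 of "blob <size>" + NUL + raw file content.
    { printf 'blob %d\0' "$size"; cat "$1"; } | sha1sum | cut -d' ' -f1
}
```

Running blob_id on file1.txt and file2.txt from the example above prints the same ID twice, which is why the second commit has nothing new to store. For a real repository, git hash-object --no-filters on the same file should print the same value.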
Re: push.default documented in man git-push?
On Tue, Oct 2, 2012 at 10:09 PM, Ramkumar Ramachandra <artag...@gmail.com> wrote:
> David Glasser wrote:
>> Is the newish push.default documented in the git push manpage anywhere? I don't see it mentioned (and there are several references to the default behavior), but maybe I'm missing something. Is it left out on purpose (ie, config values aren't supposed to be mentioned in command manpages)?
>
> You're right. It's documented in `man git-config`, but we should probably mention it in the `git-push` manpage.

Your patch is fine. I'm just wondering whether it's a good idea to add a section at the end of each command's man page that lists all config keys relevant to that command, somewhat similar to the "see also" section. It may help in finding useful config keys that are otherwise hard to find.

--
Duy
Re: erratic behavior commit --allow-empty
Hi Matthieu,

> Then the second commit does not create a new blob object for file2.txt, because it has the same content as an existing one. But the point is: you really don't care, or indeed, you care about sharing the blob objects to save disk space.

That is fine, and it is well documented.

> It is predictable: give it twice the same inputs in the same conditions, and it will yield the same output.

Well, I have some difficulty hitting the return key while watching the system clock at the same time so as to make sure that the command is executed before the seconds change. So, in theory it would be predictable, but not in practice. Note that commands must be predictable for the user that writes them, i.e. the user must be able to figure out what the result is. Which is certainly not the case here.

> You still didn't tell us where the problem was.

I described it a few mails above. I wanted to create an orphan branch. The command to create it is git checkout --orphan. However, the branch is not actually created until a commit is done on it. Then I did such a commit (all this is placed in a script to be used by my developers), but if there are no changes, git commit does not create a new one. To force it to create a brand new one I added --allow-empty to it, because the man page stated that it would bypass the check that prevents making a new one. Then I discovered that sometimes --allow-empty does not behave as expected.

-Angelo
Re: push.default documented in man git-push?
On Wed, Oct 3, 2012 at 3:17 PM, Ramkumar Ramachandra <artag...@gmail.com> wrote:
> Hi Duy,
>
> Nguyen Thai Ngoc Duy wrote:
>> Your patch is fine. I'm just thinking whether it's a good idea to add a section in the end of each command's man page to list all relevant config keys to that command, somewhat similar to see also section. It may help finding useful config keys that are otherwise hard to find.
>
> That sounds like a good idea. Would you like to write the first patch (for git-push)?

I won't be able to do it in the next 4 hours. If you want a stab at it, go ahead.

--
Duy
Re: Fixing the p4merge launch shell script
Junio C Hamano <gitster at pobox.com> writes:
> Jeremy Morton <admin at game-point.net> writes:
>
>> I've noticed that the p4merge shell script could do with some improvement when it comes to merging. Because p4merge throws up an error when one of the files it's given to diff is /dev/null, git needs to create a temporary empty file and pass that to p4merge when diffing a file that has been created/deleted (eg. create file, git add ., git diff --cached).
>> ...
>> Thoughts? Is there an easier way to do this?
>
> Which version of git? Perhaps you do not have ec245ba (mergetool: Provide an empty file when needed, 2012-01-19) yet?

That patch fixes the mergetool part, but the part I was referring to was the difftool part, which still has this problem.

Best regards,
Jeremy Morton (Jez)
Re: erratic behavior commit --allow-empty
Angelo Borsotti <angelo.borso...@gmail.com> writes:

> The user adds --allow-empty to have a different unique commit

Where does the manual say that --allow-empty implies a different and unique commit?

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: push.default documented in man git-push?
On Wed, Oct 3, 2012 at 3:46 PM, Ramkumar Ramachandra <artag...@gmail.com> wrote:
> On second thought, it might not be such a good idea. There are *lots* of variables that control the operation of each command, and it's hard to decide which ones to list and which ones to omit. I've listed all the relevant variables for git-push, except the advice.* variables. I don't know how useful such a long list might be.

I think listing receive.* and advice.* (and maybe even remote.<name>.*) is still ok. The goal is to give users a clue. They'll need to look up the explanation in config.txt anyway. If we name the config keys (and groups) well, then users should be able to guess what those keys may be for before deciding whether to look into the details.

--
Duy
Re: erratic behavior commit --allow-empty
Angelo Borsotti <angelo.borso...@gmail.com> writes:

>> You still didn't tell us where the problem was.
>
> I described it few mails above. I wanted to create an orphan branch.

And you did. The branch happens to point to the same commit as another existing commit, but this is a very common situation. Try this:

    # do arbitrary hacking and commit on branch master
    git checkout -b new-branch
    gitk

You will see branches master and new-branch pointing to the same commit (but your HEAD points to new-branch, as git branch will tell you).

You still did not describe a _problem_. Up to now, the only problem I see is that you have twice the same sha1 showing up, but you did not describe something concrete that you wanted to do and that did not work.

> However, the branch is not actually created until a commit is done on it.

Right, but the definition of "done" in your sentence includes reusing an object in the object database.

I just tried this:

    rm -fr test
    git init test
    cd test
    date > foo.txt
    git add .
    git commit --allow-empty -m foo
    git checkout --orphan new-branch
    git commit --allow-empty -m foo

I ended up with a branch master and a branch new-branch, both pointing to the same commit. The new branch _is_ created.

(BTW, --allow-empty is useless here as you have no parent)

--
Matthieu Moy
http://www-verimag.imag.fr/~moy/
Re: [ENHANCEMENT] Allow '**' pattern in .gitignore
On Tue, Oct 2, 2012 at 3:24 PM, Ramkumar Ramachandra <artag...@gmail.com> wrote:
> Stefano Lattarini wrote:
>> On 10/02/2012 09:21 AM, Ramkumar Ramachandra wrote:
>>> Hi,
>>>
>>> I've often found the '**' (extended) shell glob useful for matching any string crossing directory boundaries: it's especially useful if you only have a toplevel .gitignore, as opposed to a per-directory .gitignore. Unfortunately, .gitignore currently uses fnmatch(3), and doesn't recognize '**'. Would extending the .gitignore format to accept this be a useful feature? Would it involve re-implementing and extending fnmatch, or is there some other way?
>>
>> I think there is a topic in flight about this:
>> http://thread.gmane.org/gmane.comp.version-control.git/206406

While I'm behind this series, I have no use cases for it in my repositories. It's tested in the git test suite but that's about it. Some feedback would be nice, especially on the performance side if you do a lot of ignores.

--
Duy
Re: erratic behavior commit --allow-empty
Hi Andreas,

> Where does the manual say that --allow-empty implies a different and unique commit?

In the git commit man page:

    --allow-empty
        Usually recording a commit that has the exact same tree as its sole parent commit is a mistake, and the command prevents you from making such a commit. This option bypasses the safety, and is primarily for use by foreign SCM interface scripts.

By reading "the command prevents" I understand that a new commit is not created, and by reading "This option bypasses" that it is instead created. Perhaps my reading was a bit too literal, but a man page is not some ancient holy text in which the reader has to sift every word for hidden meanings; it should be clear and plain.

-Angelo
Re: Git diff-file bug?
On Wed, Oct 3, 2012 at 4:04 AM, Scott Batchelor <scott.batche...@gmail.com> wrote:
> Many thanks to all who have responded to my question. I have found that something is, indeed, modifying the inodes for all the files in my repository. Our systems administrator executes a backup using tar with the --atime-preserve flag. It is this flag that modifies the changed time in the inode, and causes gitk to show that all my files have changed.
>
> Thanks,
> Scott.

Scott,
We do that in our office @work. Perhaps this will help:

    core.trustctime
        If false, the ctime differences between the index and the working tree are ignored; useful when the inode change time is regularly modified by something outside Git (file system crawlers and some backup systems). See git-update-index(1). True by default.

(Quoted via http://git-scm.com/docs/git-config )

When/if I have problems I set that false.

(CC list reconstructed)

> On 28 September 2012 21:40, Junio C Hamano <gits...@pobox.com> wrote:
>> Scott Batchelor <scott.batche...@gmail.com> writes:
>>
>>> I'm fairly new to git and am witnessing some strange behavior with git that I suspect may be a bug. Can anyone set my mind at rest.
>>>
>>> Every so often (I've not quite figured out the exact set of circumstances yet)
>>
>> Figure that circumstances out. That is the key to the issue. Something in your workflow is futzing with the inode data of the files in your working tree behind your back. It sometimes is a virus scanner.
>>
>> git diff-* plumbing commands are meant to be used after running git update-index --refresh once in the program and when the caller of these commands (in your case, gitk) knows that any change in the information returned by lstat(2) on the paths in the working tree files since that call indicate real changes to the files.
>>
>> git status internally runs an equivalent of --refresh before it goes to find changes, so after running it, until that something smudges the inode data behind your back, gitk will not be confused.

--
-Drew Northup
"As opposed to vegetable or mineral error?"
-John Pescatore, SANS NewsBites Vol. 12 Num. 59
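Drew's suggestion and Junio's earlier advice about refreshing the index can be combined in one step. This is a minimal sketch, assuming a Git new enough to accept the -C option; the function name ignore_ctime_changes is hypothetical, while core.trustctime and update-index --refresh are the setting and command named in the thread.

```shell
# Sketch only: make one repository tolerate ctime changes made by backup
# tools. ignore_ctime_changes is a hypothetical helper name.
ignore_ctime_changes() {
    # $1 = path to the working tree
    # Stop treating a changed inode ctime as a sign of modification ...
    git -C "$1" config core.trustctime false &&
    # ... and re-sync the stat data cached in the index (-q: keep going
    # even if some paths really do need an update).
    git -C "$1" update-index -q --refresh
}
```

The setting is written to the repository's own configuration, so other repositories on the same machine are unaffected.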
Re: push.default documented in man git-push?
Nguyen Thai Ngoc Duy:
> I'm just thinking whether it's a good idea to add a section in the end of each command's man page to list all relevant config keys to that command, somewhat similar to see also section.

Yes, please. Discoverability of configuration settings is not very good at the moment.

--
\\// Peter - http://www.softwolves.pp.se/
Re: erratic behavior commit --allow-empty
On Wed, 03 Oct 2012 10:24:00 +0200, Angelo Borsotti <angelo.borso...@gmail.com> wrote:
> create a new one. To force it to create a brand new one I added --allow-empty to it because the man page stated that it would bypass the check that prevents to make a new one. The I discovered that sometimes --allow-empty does not behave as expected.

The documentation only states that it will skip the 'same tree as parent' check, not that it will *always* create a new commit.
Re: Merging/joining two repos (repo2 should be a subdirectory of repo1)
On 30.09.2012 22:44, David Aguilar wrote:
> On Sun, Sep 30, 2012 at 8:32 AM, Dirk Süsserott <newslet...@dirk.my1.cc> wrote:
>> On 30.09.2012 17:24, Tomas Carnecky wrote:
>>> On Sun, 30 Sep 2012 17:17:53 +0200, Dirk Süsserott <newslet...@dirk.my1.cc> wrote:
>>>> Hi!
>>>>
>>>> I have repo1 with ~4 years of history and another repo2 with ~1 year of history, both of which I don't want to lose. Now I want to join them so that repo2 becomes a subdirectory within repo1, including all the history of repo2.
>>>>
>>>> A simple git-merge won't do because both repos have some same files (at least e.g. .gitignore) in their root directories. Of course I could resolve the conflicts, but I don't want that.
>>>>
>>>> My naive approach is move everything in $repo2 one directory below and then merge $repo2 into $repo1. Actually I wouldn't call that a merge but an import. I know of git filter-branch --subdirectory-filter foodir but that's just the opposite of what I need.
>>>>
>>>> Is there a nifty trick to get this? Or will I have to do git filter-branch --tree-filter 'mkdir subdir git mv * subdir' --all on $repo2 and then git merge $repo2 in $repo1?
>>>
>>> http://www.kernel.org/pub/software/scm/git/docs/howto/using-merge-subtree.html
>>
>> Wow! Thanks for that quick and *very* helpful answer! :-)
>
> Hi Dirk,
>
> You should also take a look at contrib/subtree/ in the git source tree. git subtree does pretty much exactly what you're looking to do, and it is a bit more user-friendly than the plumbing commands.
>
> https://github.com/git/git/blob/master/contrib/subtree/git-subtree.txt

Hi David,

thanks for the pointer. I know of subtree and like it. But for my case I'll stick to the plumbing commands because I really want to *import* $repo2 into $repo1 and then delete $repo2. One shot.

(Actually I re-wrote a part of our project just for fun and didn't do it in the main project's repo in a separate branch (as I normally do) but in a totally separate repo. And now it turned out that my rewritten part is really cool and we want to include it in the main $repo1 and drop my private $repo2.)

Dirk
Re: erratic behavior commit --allow-empty
On Tue, Oct 2, 2012 at 3:34 PM, Angelo Borsotti <angelo.borso...@gmail.com> wrote:
>     Usually recording a commit that has the exact same tree as its sole parent commit is a mistake, and the command prevents you from making such a commit. This option bypasses the safety, and is primarily for use by foreign SCM interface scripts.

Perhaps the confusion arises from the meaning of "the safety".

In this case, the safety mechanism in place is to prevent you from creating a child commit which has the same tree contents (working directory) as the parent commit. It will not be the same commit because it has different parent(s) than its parent commit; but the tree (working directory) is the same, and git normally prevents you from doing this because normally this is an accident, a mistake.

--allow-empty tells git you intend to do this and so it should bypass this "no changed files" safety mechanism.

It is not a safety to prevent you from creating a new commit with the exact same sha1; the safety is concerned only with the exact same working directory file contents.

Can you suggest a rewrite of this description which would make it more clear?

Phil
Re: upload-pack is slow with lots of refs
On Wed, Oct 3, 2012 at 7:36 PM, Ævar Arnfjörð Bjarmason <ava...@gmail.com> wrote:
> I'm creating a system where a lot of remotes constantly fetch from a central repository for deployment purposes, but I've noticed that even with a remote.$name.fetch configuration to only get certain refs a git fetch will still call git-upload-pack which will provide a list of all references.
>
> This is being done against a repository with tens of thousands of refs (it has a tag for each deployment), so it ends up burning a lot of CPU time on the uploader/receiver side.

If all refs are packed, will it still burn lots of CPU on the server side?

> Has there been any work on extending the protocol so that the client tells the server what refs it's interested in?

It'll be a new protocol, not an extension for the git protocol. Ref advertising is step 1. Capabilities are advertised much later. The client has no time to tell the server what protocol version it would like to use for step 1. (I looked at this protocol extension from a different angle. I wanted to compress the ref list for the git protocol. But git over http compresses well so I don't care much.)

On that git-over-http, I don't know, maybe the git client can send something as http headers, which are recognized by the server end, to negotiate interested ref patterns?

--
Duy
Re: erratic behavior commit --allow-empty
Hi Thomas,

The documentation only states that it will skip the 'same tree as parent' check, not that it will *always* create a new commit.

Ok, understood: you believe that the documentation is clear, and I that it is somehow not. I would prefer to have it more plain. But that is not the whole story. The behavior of the command remains time-dependent, so that a user cannot reliably predict its result. I think that this is an ill-specified option. I would not insist on removing it (although that would be the correct solution), but at least on warning the user about this possibly unexpected behavior.

-Angelo
Re: erratic behavior commit --allow-empty
Hi Phil I think what you are missing here is that the script does _not_ have to take care for this special case. The script can do the same thing it does for all the other cases and it will work just fine. This is because your goal, as I understand it, is this: A. Take this branch, B. Copy it but remove the binaries, C. Push it to the remote (with no binaries) If the branch has no binaries to begin with, then B is a no-op. Your insistence that the new commits get unique SHA1's is unnecessary and is what is causing your trouble. Suppose the branch has binaries. Then the only way to avoid to push them is to create an orphan branch (one that has no parents), otherwise git push will upload also the parent with its binaries. This is why there is a need to make the script perform different actions depending on the presence of the binaries. In the attempt to make the script handle both cases in a simple way I tried to make an empty commit, and discovered the time-dependent behavior of it. Consider this analogous operation: A. Take this file, B. Remove every line that does not contain foo, C. Cat the result to the console (with only foo lines) This example differs from the commit one in that the user has to cope with data that s/he can fully control (the contents of files), while in the other s/he has to cope with the passing of time, which s/he cannot control. So, taking the files I can predict the result, but taking the commits, I cannot because I do not know exactly when they will actually be run. Time is a sort of independent variable that I know only approximately (or very approximately when the commands are embedded in scripts). It seems to those more familiar with git that you are saying that this is the problem, that the operation did not work because the results are not unique each time. Exactly. But if you ignore the SHA1 of the commits and just rely on the branch names, I think you will be happier. 
This is because two branches can refer to the same SHA1 commit without causing any problem. You may find that sometimes when you push there is no update applied to the server. But this is not a mistake. It is simply that the server already has the same contents as you are pushing, even though your local branch name is different than it was before. Actually I ignore the SHA1 of the commits, and rely on the branch names I have topic branches and /src/topic branches. Developers push when they have something new. Of course the scripts must take care of when they are called and there is nothing to push, but that is not a big problem. I eventually found a workaround, which is to change the commit message, forcing then git commit to create a brand new commit. I think when you say orphan you mean it has a different SHA1 than any other commit. But this is not what orphan means. No, I mean that it has no parents. Actually, in the special case in which there are no binaries, I could create a branch that points to the same commit as the branch that it is mirroring, and push it. However, this has two disadvantages: 1. that it will not be an orphan while in the more general case it is, and 2, that the history of commits will be pushed to the remote server, while in the general case (with an orphan) it will not. I preferred to have a unique branch topology so as to make the picture as simple as possible for the developers. Note that eventually I solved the problem with a tweak. I still believe that the git commit command does not behave properly, and that changing nothing (implementation or documentation) leaves a drifting mine on which someone (or even myself) will stumble sooner or later. I am spending time to write all this because I care for git and I would really see it improving over time removing weak spots, and believe that you do the same. 
-Angelo
Re: [ENHANCEMENT] Allow '**' pattern in .gitignore
On 03.10.2012 13:35, Nguyen Thai Ngoc Duy wrote:

On Tue, Oct 2, 2012 at 3:24 PM, Ramkumar Ramachandra artag...@gmail.com wrote:

Stefano Lattarini wrote:

On 10/02/2012 09:21 AM, Ramkumar Ramachandra wrote:

Hi, I've often found the '**' (extended) shell glob useful for matching any string crossing directory boundaries: it's especially useful if you only have a toplevel .gitignore, as opposed to a per-directory .gitignore. Unfortunately, .gitignore currently uses fnmatch(3), and doesn't recognize '**'. Would extending the .gitignore format to accept this be a useful feature? Would it involve re-implementing and extending fnmatch, or is there some other way?

I think there is a topic in flight about this: http://thread.gmane.org/gmane.comp.version-control.git/206406

While I'm behind this series, I have no use cases for it in my repositories. It's tested in git test suite but that's about it. Some feedback would be nice, especially on the performance side if you do a lot of ignores.

I really like it as we do have use cases at my dayjob. Due to our naming conventions in subdirectories we have stuff like this in our .gitignore files:

    */foo/bar
    */*/foo/bar
    */*/*/foo/bar

Using **/foo/bar instead would be a great improvement (I looked into adding that myself some time ago, but decided it wasn't a low hanging fruit).

Maybe I'll find time to do some performance measurements by the weekend; what numbers are you interested in? Will a hot cache timing of git status be sufficient or are you interested in other numbers too?
Re: [ENHANCEMENT] Allow '**' pattern in .gitignore
On Wed, Oct 3, 2012 at 8:35 PM, Jens Lehmann jens.lehm...@web.de wrote:

    */foo/bar
    */*/foo/bar
    */*/*/foo/bar

Using **/foo/bar instead would be a great improvement

If this **/foo/bar (i.e. no wildcards except one ** at the beginning) is popular, we could optimize this case, turning fnmatch() into strncmp(), just like what we do for foobar*

-- Duy
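For readers who want to try the pattern from this exchange: the sketch below assumes a Git release in which the '**/' form ended up being supported in .gitignore and in which git check-ignore exists (1.8.2 or newer, as far as I know; neither was available when this thread was written). The directory names are invented for the example.

```shell
# Throwaway repository with foo/bar at three different depths.
repo=$(mktemp -d) && cd "$repo" && git init -q
mkdir -p a/foo b/c/foo d/e/f/foo
touch a/foo/bar b/c/foo/bar d/e/f/foo/bar a/foo/baz

# One pattern instead of */foo/bar, */*/foo/bar, */*/*/foo/bar.
printf '%s\n' '**/foo/bar' > .gitignore

# check-ignore prints each given path that is ignored, one per line.
matches=$(git check-ignore a/foo/bar b/c/foo/bar d/e/f/foo/bar a/foo/baz)
ignored=$(printf '%s\n' "$matches" | wc -l | tr -d ' ')
echo "$matches"
echo "ignored paths: $ignored"
```

Only the three foo/bar paths should be reported; a/foo/baz is not matched by the pattern.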
Re: erratic behavior commit --allow-empty
Angelo Borsotti angelo.borso...@gmail.com writes:

By reading "the command prevents" I understand that a new commit is not created, and "This option bypasses" that it is instead created.

But where does it say "different and unique"?

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: [PATCH 4/5] diff: introduce diff.submoduleFormat configuration variable
On 02.10.2012 21:44, Jens Lehmann wrote:

On 02.10.2012 18:51, Ramkumar Ramachandra wrote:

Introduce a diff.submoduleFormat configuration variable corresponding to the '--submodule' command-line option of 'git diff'.

Nice. Maybe a better name would be diff.submodule, as this sets the default for the --submodule option of diff? And I think you should also test in t4041 that --submodule=short overrides the config setting.

We also need tests which show that setting that config to log does not break one of the many users of git diff (stash, rebase and format-patch come to mind, most probably I missed some others). I suspect we'll have to add --submodule=short options to some call sites to keep them working with submodule changes.
Re: erratic behavior commit --allow-empty
Angelo Borsotti angelo.borso...@gmail.com writes:

[...] making then the orphan branch point to the master one, i.e. becoming a non-orphan one.

I understand both parts of the sentence, but not the "i.e.".

And I still don't see a concrete problem. "Two branches point to the same commit" is not a problem, it's an observation. I have branches pointing to the same commit all the time.

I ended up with a branch master and a branch new-branch, both pointing to the same commit. The new branch _is_ created.

Exactly, it is created, but it is not an orphan ... or more precisely, it is sometimes, depending on how fast you are to enter the second commit command. This time-dependent behaviour is what I am talking about.

You don't understand what an orphan branch is. What git checkout --orphan followed by git commit does is that it creates a commit that doesn't have a parent (hence the name orphan, btw). It does in your case. You _do_ create an orphan commit regardless of the timing. The fact that another branch points to the same commit is a different matter, and you still didn't explain why this was problematic.

-- Matthieu Moy http://www-verimag.imag.fr/~moy/
Re: erratic behavior commit --allow-empty
On Wed, Oct 3, 2012 at 9:35 AM, Angelo Borsotti angelo.borso...@gmail.com wrote:

Hi Phil

I think what you are missing here is that the script does _not_ have to take care for this special case. The script can do the same thing it does for all the other cases and it will work just fine. This is because your goal, as I understand it, is this:

A. Take this branch,
B. Copy it but remove the binaries,
C. Push it to the remote (with no binaries)

If the branch has no binaries to begin with, then B is a no-op. Your insistence that the new commits get unique SHA1's is unnecessary and is what is causing your trouble.

Suppose the branch has binaries. Then the only way to avoid pushing them is to create an orphan branch (one that has no parents), otherwise git push will upload also the parent with its binaries.

This is true only if the root commit also has binaries. Otherwise it is fine to push a branch with the common ancestor. Suppose A does not have binaries but B and C do.

    A---B---C

Now we need to make a new branch ending at C' which has no binaries:

    A---B---C
     \
      ---B'---C'

A already has no binaries, so we did not need to make an A'. Now we can push C' to the server and no binaries will be pushed. That is because the server will receive only these commits:

    A---B'---C'

This is why there is a need to make the script perform different actions depending on the presence of the binaries. In the attempt to make the script handle both cases in a simple way I tried to make an empty commit, and discovered the time-dependent behavior of it.

Every commit is time-dependent. You tried to make a _unique_ empty commit, and this is where you ran into trouble. I think your uniqueness constraint is overkill.

Consider this analogous operation: A. Take this file, B. Remove every line that does not contain foo, C.
Cat the result to the console (with only foo lines) This example differs from the commit one in that the user has to cope with data that s/he can fully control (the contents of files), while in the other s/he has to cope with the passing of time, which s/he cannot control. So, taking the files I can predict the result, but taking the commits, I cannot because I do not know exactly when they will actually be run. Time is a sort of independent variable that I know only approximately (or very approximately when the commands are embedded in scripts). You need not be concerned with the time on the commit, nor the uniqueness of the SHA1. It seems to those more familiar with git that you are saying that this is the problem, that the operation did not work because the results are not unique each time. Exactly. But if you ignore the SHA1 of the commits and just rely on the branch names, I think you will be happier. This is because two branches can refer to the same SHA1 commit without causing any problem. You may find that sometimes when you push there is no update applied to the server. But this is not a mistake. It is simply that the server already has the same contents as you are pushing, even though your local branch name is different than it was before. Actually I ignore the SHA1 of the commits, and rely on the branch names I have topic branches and /src/topic branches. Developers push when they have something new. Of course the scripts must take care of when they are called and there is nothing to push, but that is not a big problem. I eventually found a workaround, which is to change the commit message, forcing then git commit to create a brand new commit. Doesn't this force git always to push new commits even though the contents match commits already on the server? I think when you say orphan you mean it has a different SHA1 than any other commit. But this is not what orphan means. No, I mean that it has no parents. 
Actually, in the special case in which there are no binaries, I could create a branch that points to the same commit as the branch that it is mirroring, and push it. However, this has two disadvantages: 1. that it will not be an orphan while in the more general case it is, and 2, that the history of commits will be pushed to the remote server, while in the general case (with an orphan) it will not. I preferred to have a unique branch topology so as to make the picture as simple as possible for the developers. It seems to me that you are creating unnecessary work for the server and for your scripts. But perhaps I do not fully understand your use case. Note that eventually I solved the problem with a tweak. I still believe that the git commit command does not behave properly, and that changing nothing (implementation or documentation) leaves a drifting mine on which someone (or even myself) will stumble sooner or later. I am spending time to write all this because I care for git and I would really see it improving over time removing weak spots, and believe that you do the same. You may suggest
Re: erratic behavior commit --allow-empty
Hi Phil,

Perhaps the confusion arises from the meaning of "the safety". In this case, the safety mechanism in place is to prevent you from creating a child commit which has the same tree contents (working directory) as the parent commit. It will not be the same commit because it has different parent(s) than its parent commit; but the tree (working directory) is the same and git normally prevents you from doing this because normally this is an accident, a mistake. --allow-empty tells git you intend to do this and so it should bypass this "no changed files" safety mechanism. It is not a safety to prevent you creating a new commit with the exact same sha1; the safety is concerned only with the exact same working directory file contents.

Can you suggest a rewrite of this description which would make it more clear?

Instead of:

Usually recording a commit that has the exact same tree as its sole parent commit is a mistake, and the command prevents you from making such a commit. This option bypasses the safety, and is primarily for use by foreign SCM interface scripts.

I would suggest:

Usually recording a commit that has the exact same tree as its sole parent commit is not allowed, and the command prevents you from making such a commit. This option allows you to disregard this condition, thereby making a commit even when the trees are the same. Note that when the tree, author, parents, message and date (with the precision of one second) are the same as those of an existing commit object, no new commit object is created, and the identity of the existing one is returned.
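The one-second dependence discussed here can be made reproducible by pinning the timestamps. The sketch below is an illustration added to the archive, not something posted in the thread: the identity values, the fixed timestamp, and the branch names sources and sources2 are assumptions chosen for the example. It sets GIT_AUTHOR_DATE and GIT_COMMITTER_DATE so that both commits are built from identical inputs, independent of how fast the script runs.

```shell
# Throwaway repository; identity and timestamp are fixed, made-up values.
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME='A U Thor' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='A U Thor' GIT_COMMITTER_EMAIL='author@example.com'
export GIT_AUTHOR_DATE='1349265600 +0000' GIT_COMMITTER_DATE='1349265600 +0000'

echo aaa > f1
git add f1
git commit -q -m A
first=$(git rev-parse HEAD)

# Same tree, no parent, same author, date and message: same object name.
git checkout -q --orphan sources
git commit -q --allow-empty -m A
second=$(git rev-parse HEAD)
# rev-list --parents prints the commit followed by its parents; 1 word = root.
second_words=$(git rev-list --parents -n 1 HEAD | wc -w | tr -d ' ')

# The workaround from the thread: a different message gives a different name.
git checkout -q --orphan sources2
git commit -q --allow-empty -m 'A (sources)'
third=$(git rev-parse HEAD)

echo "first:  $first"
echo "second: $second (words in rev-list --parents: $second_words)"
echo "third:  $third"
```

With these inputs the first two IDs should coincide, and in both cases the commit on the new branch is a root commit, which is the point several replies in the thread make.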
Re: erratic behavior commit --allow-empty
Hi Andreas,

But where does it say "different and unique"?

It does not, but it says: "Usually recording a commit that has the exact same tree as its sole parent commit is a mistake, and the command prevents you from making such a commit.", followed by "This option bypasses the safety ...", leading one to think that the option negates that "prevents" above. I do understand that by reading very carefully each word of these sentences one can eventually figure out that the option removes the check on the tree only, and that all the others remain, including the one on the identity of the time. However, it does not say that the time must be equal with the approximation of one second. Apart from this detail, it does not state plainly that no commit object is created.

-Angelo
Re: erratic behavior commit --allow-empty
Hi Matthieu,

You don't understand what an orphan branch is.

I do not think so. I wanted to create a branch with a commit that has no parent, and I think that this is called an orphan branch. I also wanted to have another branch, pointing to a different commit, the difference being that this one contains binaries, and the other does not. So, having two references pointing to the same commit is not a problem for me, but it is not the solution either.

-Angelo
git reset respect remote repo (make git idiot proof)
Suppose this case:

    git clone .../blessedRepo.git
    // do changes
    git commit -mbad1
    // do changes
    git commit -mbad2
    git reset --hard HEAD^4 // Why does it let me do this?
    // I just broke my local repository, because if I continue
    // do changes
    git commit -mgood1
    git push origin master // fails because the history disrespects the remote repo's history

The following commands are ok to do (because I have 2 unpushed commits):

    git reset --hard^1
    git reset --hard^2

but these are not and should be prevented (unless forced):

    git reset --hard^3
    git reset --hard^4

Is there any way to make git idiot proof by enabling that the local repo should always respect the history of the remote repo (unless forced)? Is there any way to make this a default for anyone who clones our blessed repo? No one that clones our blessed repo wants to come into the situation above. And if they do, they can always force it.
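Git has no built-in option for this, but the check being asked for can be approximated with a small wrapper. The sketch below is only an illustration: safe_reset_hard is a hypothetical helper name, not a git command, and it assumes a Git new enough to have git merge-base --is-ancestor. It writes HEAD~2 for "two commits back along first parents", on the assumption that this is what the ^N forms in the message were meant to express. The reset is allowed only if the upstream tip is still contained in the target.

```shell
# Hypothetical helper: refuse a hard reset that would drop commits
# already present in the given upstream ref.
safe_reset_hard() {
    target=$1
    upstream=$2
    if git merge-base --is-ancestor "$upstream" "$target"; then
        git reset -q --hard "$target"
    else
        echo "refusing: $target would drop commits contained in $upstream" >&2
        return 1
    fi
}

# Demo setup: a "blessed" repository with two commits and a clone of it.
export GIT_AUTHOR_NAME='A U Thor' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='A U Thor' GIT_COMMITTER_EMAIL='author@example.com'
base=$(mktemp -d)
git init -q "$base/blessed" && cd "$base/blessed"
git commit -q --allow-empty -m pushed1
git commit -q --allow-empty -m pushed2
git clone -q "$base/blessed" "$base/work" && cd "$base/work"
git commit -q --allow-empty -m bad1
git commit -q --allow-empty -m bad2

# Dropping the two unpushed commits is fine ...
safe_reset_hard HEAD~2 '@{u}' && shallow=allowed || shallow=refused
# ... but going behind the upstream tip is refused.
safe_reset_hard HEAD~1 '@{u}' 2>/dev/null && deep=allowed || deep=refused
head_after=$(git log -1 --format=%s)
echo "shallow reset: $shallow, deep reset: $deep, HEAD is now: $head_after"
```

This does not cover rebase or other history-rewriting commands, and nothing stops a user from calling git reset directly.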
Re: erratic behavior commit --allow-empty
Angelo Borsotti angelo.borso...@gmail.com writes:

Hi Matthiew, You don't understand what an orphan branch is. I do not think so. I wanted to create a branch with a commit that has no parent, and I think that this is called orphan branch.

Yes, and this is what you did.

I wanted also to have another branch, pointing to a different commit, the difference being that this contains binaries, and the other does not.

If they contain different content, they will be different commits, with different sha1.

So, having two references pointing to the same commit is not a problem for me,

So, you have no problem. End of discussion for me, sorry.

-- Matthieu Moy http://www-verimag.imag.fr/~moy/
Re: git reset respect remote repo (make git idiot proof)
On 03-10-12 16:59, Ramkumar Ramachandra wrote:

Hi Geoffrey,

Geoffrey De Smet wrote:

[...] The following commands are ok to do (because I have 2 unpushed commits):

    git reset --hard^1
    git reset --hard^2

but these are not and should be prevented (unless forced):

    git reset --hard^3
    git reset --hard^4

Is there any way to make git idiot proof by enabling that the local repo should always respect the history of the remote repo (unless forced)? Is there any way to make this a default for anyone who clones our blessed repo? No one that clones our blessed repo wants to come into the situation above. And if they do, they can always force it.

This makes little sense. Which remote?

To all remotes which have a relationship to this repo with the -respectRemote flag. Normally, only the blessed remote will have this.

What if I have multiple remotes? Which branch? (Many of my branches are behind `master`).

Every time a branch merges or rebases with a remote repository, it flags the last commit of that remote repository as the pointOfNoReset commit. If local branches merge or rebase with a local branch, the pointOfNoReset commit is transitively applied (only the last one wins). git reset will fail to reset beyond the pointOfNoReset commit, unless forced. Branches that are behind master will have a pointOfNoReset commit in their history; if master goes forward afterwards, that won't affect those branches, not until they are merged.

What if I want different histories on different remotes?

Don't use the -respectRemote flag in the relationship between those 2 repos.

What about more advanced operations which implicitly 'reset' like rebase?

Them too. All operations would need to follow the -respectRemote flag's limitations, unless forced.

What if I want to rewrite history?

Don't use the -respectRemote flag in the relationship between this repo and any other repo. Or force it.

Ram
Re: git reset respect remote repo (make git idiot proof)
On Wed, Oct 3, 2012 at 10:49 AM, Geoffrey De Smet ge0ffrey.s...@gmail.com wrote:

Suppose this case:

    git clone .../blessedRepo.git
    // do changes
    git commit -mbad1
    // do changes
    git commit -mbad2
    git reset --hard HEAD^4 // Why does it let me do this?
    // I just broke my local repository, because if I continue
    // do changes
    git commit -mgood1
    git push origin master // fails because the history disrespects the remote repo's history

The following commands are ok to do (because I have 2 unpushed commits):

    git reset --hard^1
    git reset --hard^2

but these are not and should be prevented (unless forced):

    git reset --hard^3
    git reset --hard^4

Is there any way to make git idiot proof by enabling that the local repo should always respect the history of the remote repo (unless forced)? Is there any way to make this a default for anyone who clones our blessed repo?

I suppose if we go down this path we must also prevent users from having any local branches whose names match those used on the remote unless the remote branches are also ancestors of our local branch. But then we may get into trouble when we pull new branches which now conflict but previously did not. I'm afraid this is a Pandora's box of woes.

But I feel your pain. I think the solution lies in relegating 'reset' to the plumbing or the power-user realm of commands since I feel it is quite overloaded and sometimes dangerous. There was a thread some months back heading in this direction, but I failed to keep it going.

http://comments.gmane.org/gmane.comp.version-control.git/185825

Phil
Re: erratic behavior commit --allow-empty
Angelo Borsotti angelo.borso...@gmail.com writes:

it does not state plainly that no commit object is created.

But the commit object _is_ created, it just doesn't have a unique name.

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: git reset respect remote repo (make git idiot proof)
Geoffrey De Smet ge0ffrey.s...@gmail.com writes:

Suppose this case:

    git clone .../blessedRepo.git
    // do changes
    git commit -mbad1
    // do changes
    git commit -mbad2
    git reset --hard HEAD^4 // Why does it let me do this?

Because there is nothing wrong with that.

    // I just broke my local repository, because if I continue

No you didn't.

    // do changes
    git commit -mgood1
    git push origin master // fails because the history disrespects the remote repo's history

You may just as well want to push it to a different branch (or even a different repository).

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: erratic behavior commit --allow-empty
Hi PJ,

take a git commit without --allow-empty: if the trees are equal, it creates no commit, and if the trees are different it creates one. Take then a git commit --allow-empty: if the trees are equal it may create a commit or not depending on the parent, message, author and date; if the trees are different it creates a commit. So, the statement does not apply to commits in general.

-Angelo
Re: upload-pack is slow with lots of refs
On Wed, Oct 03, 2012 at 02:36:00PM +0200, Ævar Arnfjörð Bjarmason wrote:

I'm creating a system where a lot of remotes constantly fetch from a central repository for deployment purposes, but I've noticed that even with a remote.$name.fetch configuration to only get certain refs a git fetch will still call git-upload-pack which will provide a list of all references. This is being done against a repository with tens of thousands of refs (it has a tag for each deployment), so it ends up burning a lot of CPU time on the uploader/receiver side.

Where is the CPU being burned? Are your refs packed (that's a huge savings)? What are the refs like? Are they .have refs from an alternates repository, or real refs? Are they pointing to commits or tag objects?

What version of git are you using? In the past year or so, I've made several tweaks to speed up large numbers of refs, including:

  - cff38a5 (receive-pack: eliminate duplicate .have refs, v1.7.6); note that this only helps if they are being pulled in by an alternates repo. And even then, it only helps if they are mostly duplicates; distinct ones are still O(n^2).

  - 7db8d53 (fetch-pack: avoid quadratic behavior in remove_duplicates)
    a0de288 (fetch-pack: avoid quadratic loop in filter_refs)
    Both in v1.7.11. I think there is still a potential quadratic loop in mark_complete()

  - 90108a2 (upload-pack: avoid parsing tag destinations)
    926f1dd (upload-pack: avoid parsing objects during ref advertisement)
    Both in v1.7.10. Note that tag objects are more expensive to advertise than commits, because we have to load and peel them.

Even with those patches, though, I found that it was something like ~2s to advertise 100,000 refs.

Has there been any work on extending the protocol so that the client tells the server what refs it's interested in?

I don't think so. It would be hard to do in a backwards-compatible way, because the advertisement is the first thing the server says, before it has negotiated any capabilities with the client at all.

-Peff
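To answer the "are your refs packed?" question on a given repository, something like the following can be used. It is a rough sketch added for illustration, not a benchmark: the tag names and the count of 50 are made up, and it assumes the classic files-based ref storage, where loose refs live as individual files under .git/refs and packed ones in .git/packed-refs.

```shell
# Throwaway repository with one commit and a tag per simulated deployment.
repo=$(mktemp -d) && cd "$repo" && git init -q
export GIT_AUTHOR_NAME='A U Thor' GIT_AUTHOR_EMAIL='author@example.com'
export GIT_COMMITTER_NAME='A U Thor' GIT_COMMITTER_EMAIL='author@example.com'
git commit -q --allow-empty -m deploy

i=1
while [ "$i" -le 50 ]; do
    git tag "deploy-$i"
    i=$((i + 1))
done

# Loose refs are individual files; count them before and after packing.
loose_before=$(find .git/refs/tags -type f 2>/dev/null | wc -l | tr -d ' ')
git pack-refs --all
loose_after=$(find .git/refs/tags -type f 2>/dev/null | wc -l | tr -d ' ')
packed=$(grep -c 'refs/tags/deploy-' .git/packed-refs)

# Every ref is still advertised to a fetching client, packed or not.
advertised=$(git ls-remote . | wc -l | tr -d ' ')
echo "loose before: $loose_before, loose after: $loose_after"
echo "packed: $packed, advertised: $advertised"
```

Packing changes how cheaply the server can read the refs; it does not shrink the advertisement itself, which is the protocol limitation discussed in this message.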
Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)
Nguyen Thai Ngoc Duy pclo...@gmail.com writes: There's an interesting case: **foo. According to our rules, that pattern does not contain slashes therefore is basename match. But some might find that confusing because ** can match slashes,... By our rules, if you mean if a pattern has slash, it is anchored, that obviously need to be updated with this series, if ** is meant to match multiple hierarchies. I think the latter makes more sense. When users put ** they expect to match some slashes. But that may call for a refactoring in path_matches() in attr.c. Putting strstr(pattern, **) in that matching function may increase overhead unnecessarily. The third option is just die() and let users decide either *foo, **/foo or /**foo, never **foo. For the double-star at the beginning, you should just turn it into **/ if it is not followed by a slash internally, I think. What is the semantics of ** in the first place? Is it described to a reasonable level of detail in the documentation updates? For example does **foo match afoo, a/b/foo, a/bfoo, a/foo/b, a/bfoo/c? Does x**y match xy, xay, xa/by, x/a/y? I am guessing that the only sensible definition is that ** requires anything that comes before it (if exists) is at a proper hierarchy boundary, and anything matches it is also at a proper hierarchy boundary, so x**y matches x/a/y and not xy, xay, nor xa/by in the above example. If x**y can match xy or xay (or **foo can match afoo), it would be unreasonable to say it implies the pattern is anchored at any level, no? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: push.default documented in man git-push?
Nguyen Thai Ngoc Duy pclo...@gmail.com writes:

On Wed, Oct 3, 2012 at 3:46 PM, Ramkumar Ramachandra artag...@gmail.com wrote:

On second thought, it might not be such a good idea. There are *lots* of variables that control the operation of each command, and it's hard to decide which ones to list and which ones to omit. I've listed all the relevant variables for git-push, except the advice.* variables. I don't know how useful such a long list might be.

I think listing receive.* and advice.* (and maybe even remote.name.*) is still ok. The goal is to give users a clue. They'll need to look up in config.txt anyway for explanation. If we name the config keys (and groups) well then users should be able to guess what those keys may be for before deciding whether to look into details.

Please do not label the list as "These variables affect this command" to give a false impression that it is the complete list if it isn't. Unless somebody promises to keep an up-to-date complete list there (or even better, come up with a mechanism to help us keep that promise automatically, perhaps by annotating pieces with structured comments in config.txt and automatically appending such a section to manual pages of relevant commands), that is.

With a weaker phrase, e.g. "These configuration variables may be of interest", such a list may not hurt readers, but personally I do not think it adds much value to have a list of variables without even a single line description of what each is used for.
Re: Proposal: create meaningful aliases for git reset's hard/soft/mixed
Phil Hord phil.h...@gmail.com writes:

> I flagged this for followup in my MUA, but I failed to follow-up after the holidays. I apologize for that, and I really regret it because I liked where this was going.

I really regret to see you remembered it, actually.

>> 1) Newbie user clones/pulls a repository from somewhere. He hacks around and then things go bad, and he decides to scratch away everything he did to make sure things are like they're supposed to be. He'd then type "git checkout --force --clean master". If he didn't introduce new files, he would simply type "git checkout --force master"
>
> I like this just fine. I think we can explicitly say that HEAD is the implied default refspec, yes?
>
>     git checkout --force --clean

That depends on what the "hacks around" involved. Where is he now, what damage did he cause, and what can you depend on to take him to a clean state, where the definition of clean happens to match this hypothetical Newbie user?

Did he do "git checkout" of another branch? Did he commit? Did he reset to another commit while on the 'master' branch? Is he still on the master branch when he says "git checkout --force --clean master"?

Can he say "git checkout --force --clean master~4", and what does that even mean? Is he trying to go into the detached HEAD state, or is he somehow trying to rewind master?
Re: push.default documented in man git-push?
Nguyen Thai Ngoc Duy pclo...@gmail.com writes:

> On Wed, Oct 3, 2012 at 3:46 PM, Ramkumar Ramachandra artag...@gmail.com wrote:
>> On second thought, it might not be such a good idea. There are *lots* of variables that control the operation of each command, and it's hard to decide which ones to list and which ones to omit. I've listed all the relevant variables for git-push, except the advice.* variables - I don't know how useful such a long list might be.
>
> I think listing receive.* and advice.* (and maybe even remote.<name>.*) is still ok. The goal is to give users a clue. They'll need to look up in config.txt anyway for explanation. If we name the config keys (and groups) well then users should be able to guess what those keys may be for before deciding whether to look into details.

I would recommend against listing any advice.* in the command manual pages. They are meant to give advice in cases that are often confusing to new people, and the messages themselves are supposed to say how to turn the advice off. We want users to see them and understand the situation, and more importantly, we want to strongly discourage users from declining them before they have seen them and understood what they advise for or against.
Re: upload-pack is slow with lots of refs
Jeff King p...@peff.net writes:

>> Has there been any work on extending the protocol so that the client tells the server what refs it's interested in?
>
> I don't think so. It would be hard to do in a backwards-compatible way, because the advertisement is the first thing the server says, before it has negotiated any capabilities with the client at all.

That is being discussed but hasn't surfaced on the list.
Re: upload-pack is slow with lots of refs
On Wed, Oct 03, 2012 at 11:53:35AM -0700, Junio C Hamano wrote:

> Jeff King p...@peff.net writes:
>
>>> Has there been any work on extending the protocol so that the client tells the server what refs it's interested in?
>>
>> I don't think so. It would be hard to do in a backwards-compatible way, because the advertisement is the first thing the server says, before it has negotiated any capabilities with the client at all.
>
> That is being discussed but hasn't surfaced on the list.

Out of curiosity, how are you thinking about triggering such a new behavior in a backwards-compatible way? Invoke git-upload-pack2, and fall back to reconnecting to start git-upload-pack if it fails?

-Peff
Re: Proposal: create meaningful aliases for git reset's hard/soft/mixed
Junio C Hamano gits...@pobox.com writes:

> Phil Hord phil.h...@gmail.com writes:
>
>> I flagged this for followup in my MUA, but I failed to follow-up after the holidays. I apologize for that, and I really regret it because I liked where this was going.
>
> I really regret to see you remembered it, actually.

Having said that, I am glad that you brought the old discussion thread to our attention. In http://thread.gmane.org/gmane.comp.version-control.git/185825/focus=185863, I said that "git reset --keep" started out as an ugly workaround for the lack of "git checkout -B $current_branch". Now we have it, so we can afford to make "reset --keep" less prominently advertised in our tool set.

As I already said back then, "reset --soft" also has outlived its usefulness when "commit --amend" came, so that leaves only these modes of reset:

    reset --hard [$commit]
    reset [$commit]
    reset --merge

I am not sure if it makes sense to give a commit different from HEAD to "reset --merge", and to a lesser degree, "reset --mixed" to flip the HEAD to another commit while retaining the working tree contents does not make much sense, either, in a common workflow.

It _might_ be possible to merge the "--mixed" and "--merge" if we think things through to reduce the often-used options even further, but I haven't done so, and I suspect nobody has (yet).
Re: erratic behavior commit --allow-empty
Angelo Borsotti angelo.borso...@gmail.com writes:

> that after the command there are no new objects.

That is an uninteresting implementation detail.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: erratic behavior commit --allow-empty
Angelo Borsotti angelo.borso...@gmail.com writes:

> Take then a git commit --allow-empty: if the trees are equal it may create a commit or not depending on the parent, message, author and date; if the trees are different it creates a commit.

The commit is _always_ created, with a name depending on the parent, message, author and date.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: erratic behavior commit --allow-empty
Hi Andreas,

as a user, and owner of a repository, I do care about the objects that are in it. I do not care about the way they are named, be it numbers or sha's, but I do care about their existence. So, for me it is important whether a command creates a new commit or not.

> The commit is _always_ created, with a name depending on the parent, message, author and date.

I do not understand this: I have produced several examples that show that it is not created, i.e. that the very same objects are present in the repository after the command execution as were there before it. It is possible, though, that you use the word "create" with a different meaning. Most dictionaries state: "to cause to come into existence", i.e. before creation the thing does not exist, and after creation it does.

-Angelo
Re: upload-pack is slow with lots of refs
Ævar Arnfjörð Bjarmason ava...@gmail.com writes:

> I'm creating a system where a lot of remotes constantly fetch from a central repository for deployment purposes, but I've noticed that even with a remote.$name.fetch configuration to only get certain refs a "git fetch" will still call git-upload-pack which will provide a list of all references.

It has been observed that the sender has to advertise megabytes of refs because it has to speak first before knowing what the receiver wants, even when the receiver is interested in getting updates from only one of them, or worse yet, when the receiver is only trying to peek whether the ref it is interested in has been updated.

I do not think upload-pack that runs on the sender side with millions of refs, when asked for a single "want", feeds all the refs that it has to the revision machinery, and if you observed that it does, I cannot explain why it happens.
Re: t1450-fsck (sometimes/often) failes on Mac OS X
On 03.10.12 00:21, Junio C Hamano wrote:

> I think this should suffice.
[snip]
> -	test_must_fail git fsck --tags 2>out &&
> -	cat out &&
> -	grep "error in tag.*broken links" out
> +	test_must_fail git fsck --tags
[snip]

Thanks, and all TC passed in pu.

/Torsten
Re: upload-pack is slow with lots of refs
On Wed, Oct 3, 2012 at 11:55 AM, Jeff King p...@peff.net wrote:

> On Wed, Oct 03, 2012 at 11:53:35AM -0700, Junio C Hamano wrote:
>
>> Jeff King p...@peff.net writes:
>>
>>>> Has there been any work on extending the protocol so that the client tells the server what refs it's interested in?
>>>
>>> I don't think so. It would be hard to do in a backwards-compatible way, because the advertisement is the first thing the server says, before it has negotiated any capabilities with the client at all.
>>
>> That is being discussed but hasn't surfaced on the list.
>
> Out of curiosity, how are you thinking about triggering such a new behavior in a backwards-compatible way? Invoke git-upload-pack2, and fall back to reconnecting to start git-upload-pack if it fails?

Basically, yes. New clients connect for git-upload-pack2. Over git:// the remote peer will just close the TCP socket with no messages. The client can fall back to git-upload-pack and try again. Over SSH a similar thing will happen in the sense there is no data output from the remote side, so the client can try again. This has the downside of authenticating twice over SSH, which may prompt for a password twice. But the user can get out of this by setting remote.NAME.uploadpack = git-upload-pack and thus force the Git client to use the current protocol if they have a new client and must continue to work over SSH with an old server, and don't use an ssh-agent.

Over HTTP we can request ?service=git-upload-pack2 and retry just like git:// would, or be a bit smarter and say ?service=git-upload-pack&v=2, and determine the protocol support of the remote peer based on the response we get. If we see an immediate advertisement it's still the v1 protocol; if we get back the "yes I speak v2" response like git:// would see, we can continue the conversation from there.
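The probe-and-retry scheme described above can be sketched as a small client-side loop. Nothing here is git's implementation: `negotiate`, the `connect` callable, and the `ConnectionClosed` exception are hypothetical stand-ins for a real transport, and the service names simply follow the proposal in the message.

```python
class ConnectionClosed(Exception):
    """Raised by the (hypothetical) transport when the peer hangs up
    without sending any data."""

def negotiate(connect, services=("git-upload-pack2", "git-upload-pack")):
    """Try each service name in order, reconnecting after a silent close.

    connect(service) must return a session object or raise
    ConnectionClosed. Returns (service, session) for the first service
    the peer accepts."""
    last_error = None
    for service in services:
        try:
            return service, connect(service)
        except ConnectionClosed as err:
            last_error = err  # old server: reconnect using the next name
    raise last_error or ConnectionClosed("no services to try")

def old_server(service):
    # Simulates a v1-only peer: unknown services get a silent hang-up.
    if service != "git-upload-pack":
        raise ConnectionClosed(service)
    return "v1 session"

print(negotiate(old_server))
```

Each failed probe costs a fresh connection, which is where the double SSH authentication mentioned in the message comes from. Pinning `services` to the old name alone plays the role of the remote.NAME.uploadpack override.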
Re: erratic behavior commit --allow-empty
On Wed, Oct 3, 2012 at 10:34 AM, Angelo Borsotti angelo.borso...@gmail.com wrote:

> Hi PJ,
>
> take a git commit without --allow-empty: if the trees are equal, it creates no commit, and if the trees are different it creates one. Take then a git commit --allow-empty: if the trees are equal it may create a commit or not depending on the parent, message, author and date; if the trees are different it creates a commit. So, the statement does not apply to commits in general.

But that same thing applies to git commit without --allow-empty. If you create the same object twice then only one copy is stored, regardless of how you create it. In fact, the commits you were creating in your example were orphans, so --allow-empty couldn't have had an effect on them in any case.

-PJ

Gehm's Corollary to Clark's Law: Any technology distinguishable from magic is insufficiently advanced.
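The point that identical inputs name the same object can be illustrated outside git. A commit's name is the SHA-1 of a "commit <size>" header, a NUL byte, and the commit text, so two commits with the same tree, parents, author, committer, timestamps and message are one and the same object, while a timestamp one second later yields a different name. In the sketch below the helper `commit_id`, the identity string and the timestamps are made up for illustration; the tree id is git's well-known empty-tree id, standing in for whatever tree the real commits pointed at.

```python
import hashlib

# Git's id for the empty tree, used here only as a placeholder tree.
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"

def commit_id(tree, message, ident, timestamp, parents=()):
    """Compute the object name git would give a commit with this content."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines.append(f"author {ident} {timestamp} +0000")
    lines.append(f"committer {ident} {timestamp} +0000")
    data = ("\n".join(lines) + "\n\n" + message + "\n").encode("utf-8")
    header = b"commit " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

ident = "A U Thor <author@example.com>"  # hypothetical identity
first = commit_id(EMPTY_TREE, "A", ident, 1349250000)
same_second = commit_id(EMPTY_TREE, "A", ident, 1349250000)
one_second_later = commit_id(EMPTY_TREE, "A", ident, 1349250001)

print(first == same_second)       # identical inputs, identical name
print(first == one_second_later)  # a later timestamp gives a new name
```

This matches the behavior reported at the start of the thread: two parentless commits with message "A" made within the same second coincide, and a "sleep 1" between them changes the timestamp and therefore the name.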
Re: upload-pack is slow with lots of refs
On Wed, Oct 03, 2012 at 12:41:38PM -0700, Shawn O. Pearce wrote:

>> Out of curiosity, how are you thinking about triggering such a new behavior in a backwards-compatible way? Invoke git-upload-pack2, and fall back to reconnecting to start git-upload-pack if it fails?
>
> Basically, yes. New clients connect for git-upload-pack2. Over git:// the remote peer will just close the TCP socket with no messages. The client can fallback to git-upload-pack and try again. Over SSH a similar thing will happen in the sense there is no data output from the remote side, so the client can try again. This has the downside of authentication twice over SSH, which may prompt for a password twice. But the user can get out of this by setting remote.NAME.uploadpack = git-upload-pack and thus force the Git client to use the current protocol if they have a new client and must continue to work over SSH with an old server, and don't use an ssh-agent.

It's a shame that we have to reestablish the TCP or ssh connection to do the retry. The password thing is annoying, but also it just wastes a round-trip. It means we'd probably want to default the v2 probe to off (and let the user turn it on for a specific remote) until v2 is much more common than v1. Otherwise everyone pays the price.

It may also be worth designing v2 to handle more graceful capability negotiation so this doesn't come up again.

Another alternative would be to tweak git-daemon to allow more graceful fallback. That wouldn't help us now, but it would if we ever wanted a v3.

For stock ssh, you could send:

    sh -c 'git upload-pack2; test $? = 127 && git-upload-pack'

which would work if you have an unrestricted shell on the other side. But it would break for a restricted shell or other fake ssh environment. It's probably too ugly to have restricted shells recognize that as a magic token (well, I could maybe even live with the ugliness, but it is not strictly backwards compatible).

I was hoping we could do something like "git upload-pack --v2", but I'm pretty sure current git-daemon would reject that.

> Over HTTP we can request ?service=git-upload-pack2 and retry just like git:// would, or be a bit smarter and say ?service=git-upload-pack&v=2, and determine the protocol support of the remote peer based on the response we get. If we see an immediate advertisement its still the v1 protocol, if we get back the "yes I speak v2" response like git:// would see, we can continue the conversation from there.

Yeah, I would think "v=2" would be better simply to avoid the round-trip if we fail. It should be safe to turn the new protocol on by default for http, then.

-Peff
Re: upload-pack is slow with lots of refs
On Wed, Oct 3, 2012 at 8:03 PM, Jeff King p...@peff.net wrote:

> On Wed, Oct 03, 2012 at 02:36:00PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> I'm creating a system where a lot of remotes constantly fetch from a central repository for deployment purposes, but I've noticed that even with a remote.$name.fetch configuration to only get certain refs a "git fetch" will still call git-upload-pack which will provide a list of all references.
>>
>> This is being done against a repository with tens of thousands of refs (it has a tag for each deployment), so it ends up burning a lot of CPU time on the uploader/receiver side.
>
> Where is the CPU being burned? Are your refs packed (that's a huge savings)? What are the refs like? Are they .have refs from an alternates repository, or real refs? Are they pointing to commits or tag objects?
>
> What version of git are you using? In the past year or so, I've made several tweaks to speed up large numbers of refs, including:
>
>   - cff38a5 (receive-pack: eliminate duplicate .have refs, v1.7.6); note that this only helps if they are being pulled in by an alternates repo. And even then, it only helps if they are mostly duplicates; distinct ones are still O(n^2).
>
>   - 7db8d53 (fetch-pack: avoid quadratic behavior in remove_duplicates)
>     a0de288 (fetch-pack: avoid quadratic loop in filter_refs)
>     Both in v1.7.11. I think there is still a potential quadratic loop in mark_complete()
>
>   - 90108a2 (upload-pack: avoid parsing tag destinations)
>     926f1dd (upload-pack: avoid parsing objects during ref advertisement)
>     Both in v1.7.10. Note that tag objects are more expensive to advertise than commits, because we have to load and peel them.
>
> Even with those patches, though, I found that it was something like ~2s to advertise 100,000 refs.

I can't provide all the details now (not with access to that machine now), but briefly:

 * The git client/server version is 1.7.8

 * The repository has around 50k refs, they're "real" refs, almost all of them (say all but 0.5k-1k) are annotated tags, the rest are branches.

 * 99% of them are packed, there's a weekly cronjob that packs them all up, there were a few newly pushed branches and tags outside of the

 * I tried "echo -n | git upload-pack repo" on both that 50k repository and a repository with 100 refs, the former took around ~1-2s to run on a 24 core box and the latter ~500ms.

 * When I ran git-upload-pack with GNU parallel I managed around 20/s packs on the 24 core box on the 50k ref one, 40/s on the 100 ref one.

 * A co-worker who was working on this today tried it on 1.7.12 and claimed that it had the same performance characteristics.

 * I tried to profile it under gcc -pg echo -n | ./git-upload-pack repo but it doesn't produce a profile like that, presumably because the process exits unsuccessfully.

Maybe someone here knows offhand what mock data I could feed git-upload-pack to make it happy to just list the refs, or better yet do a bit more work which it would do if it were actually doing the fetch (I suppose I could just do a fetch, but I wanted to do this from a locally compiled checkout).

>> Has there been any work on extending the protocol so that the client tells the server what refs it's interested in?
>
> I don't think so. It would be hard to do in a backwards-compatible way, because the advertisement is the first thing the server says, before it has negotiated any capabilities with the client at all.

I suppose at least for the ssh protocol we could just do:

    ssh server (git upload-pack repo --refs=* || git upload-pack repo)

And something similar with HTTP headers, but that of course leaves the git:// protocol.
grep.patternType (was: Re: [ANNOUNCE] Git v1.8.0-rc0)
Junio C Hamano gits...@pobox.com writes:

> * "git grep" learned to use a non-standard pattern type by default if a configuration variable tells it to.

This addition makes "git grep -e '(integer|buffer)'" work as expected, when grep.patternType is set to "extended". Should this

    git log --grep='(integer|buffer)'

also honor the same configuration variable? If not, why not?

One more thing. Currently you can say

    git log -E --grep='(integer|buffer)'

to ask for the ERE. Should we also support -P to ask for pcre? If not, why not?
Re: erratic behavior commit --allow-empty
Angelo Borsotti angelo.borso...@gmail.com writes:

> as a user, and owner of a repository I do care about the objects that are in it.

There is no need to care.

> I do not understand this: I have produced several examples that show that it is not created, i.e. that the very same objects are present in the repository after the command execution as they were before it.

That is just an implementation detail. All you need to know is that a ref has been created or modified.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
Re: erratic behavior commit --allow-empty
Johannes Sixt j.s...@viscovery.net writes:

> Why don't you use a different commit message to ensure that there is a difference between the commits?

That sounds like a workaround, and an unnecessary one at that, as it is entirely unclear why there _needs_ to be a different commit.

Perhaps OP fears that the orphan branch "foo" in his example, because it happens to point at the same commit object as the "master", will not stay the same and will follow along the advancement of "master" if some new commits are added to it, and that is the reason he wants a different commit?

Of course, starting from "master" and "foo" pointing at the same commit (or different commits, for that matter), "foo" won't change if you commit on "master", so that fear is unnecessary.
Re: upload-pack is slow with lots of refs
On Wed, Oct 03, 2012 at 10:16:56PM +0200, Ævar Arnfjörð Bjarmason wrote:

> I can't provide all the details now (not with access to that machine now), but briefly:
>
>  * The git client/server version is 1.7.8
>
>  * The repository has around 50k refs, they're "real" refs, almost all of them (say all but 0.5k-1k) are annotated tags, the rest are branches.

I'd definitely try upgrading, then; I got measurable speedups from this exact case using the patches in v1.7.10.

>  * 99% of them are packed, there's a weekly cronjob that packs them all up, there were a few newly pushed branches and tags outside of the

A few strays shouldn't make a big difference. The killer is calling open(2) 50,000 times, but having most of it packed should prevent that. I suspect Michael Haggerty's work on the ref cache may help, too (otherwise we have to try each packed ref in the filesystem to make sure nobody has written it since we packed).

>  * I tried "echo -n | git upload-pack repo" on both that 50k repository and a repository with 100 refs, the former took around ~1-2s to run on a 24 core box and the latter ~500ms.

More cores won't help, of course, as dumping the refs is single-threaded.

With v1.7.12, my ~400K test repository takes about 0.8s to run (on my 2-year-old 1.8 GHz i7, though it is probably turbo-boosting to 3 GHz). So I'm surprised it is so slow.

Your 100-ref case is slow, too. Upload-pack's initial advertisement on my linux-2.6 repository (with about 900 refs) is more like 20ms. I'd

>  * A co-worker who was working on this today tried it on 1.7.12 and claimed that it had the same performance characteristics.

That's surprising to me. Can you try to verify those numbers?

>  * I tried to profile it under gcc -pg echo -n | ./git-upload-pack repo but it doesn't produce a profile like that, presumably because the process exits unsuccessfully.

If it's a recent version of Linux, you'll get much nicer results with perf. Here's what my 400K-ref case looks like:

    $ time echo 0000 | perf record git-upload-pack . >/dev/null
    real    0m0.808s
    user    0m0.660s
    sys     0m0.136s

    $ perf report | grep -v ^# | head
        11.40%  git-upload-pack  libc-2.13.so     [.] vfprintf
         9.70%  git-upload-pack  git-upload-pack  [.] find_pack_entry_one
         7.64%  git-upload-pack  git-upload-pack  [.] check_refname_format
         6.81%  git-upload-pack  libc-2.13.so     [.] __memcmp_sse4_1
         5.79%  git-upload-pack  libc-2.13.so     [.] getenv
         4.20%  git-upload-pack  libc-2.13.so     [.] __strlen_sse42
         3.72%  git-upload-pack  git-upload-pack  [.] ref_entry_cmp_sslice
         3.15%  git-upload-pack  git-upload-pack  [.] read_packed_refs
         2.65%  git-upload-pack  git-upload-pack  [.] sha1_to_hex
         2.44%  git-upload-pack  libc-2.13.so     [.] _IO_default_xsputn

So nothing too surprising, though there is some room for improvement (e.g., it looks like we are calling getenv in a tight loop, which could be hoisted out to a single call).

Do note that this version of git was compiled with -O3. Compiling with -O0 produces very different results (it's more like 1.3s, and the hotspots are check_refname_component and sha1_to_hex).

> Maybe someone here knows offhand what mock data I could feed git-upload-pack to make it happy to just list the refs, or better yet do a bit more work which it would do if it were actually doing the fetch (I suppose I could just do a fetch, but I wanted to do this from a locally compiled checkout).

If you feed "0000" as I did above, that is the flush signal for "I have no more lines to send you", which means that we are not actually fetching anything. I.e., this is the exact same conversation a no-op "git fetch" would produce.

-Peff
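For reference, the flush signal mentioned above comes from git's pkt-line framing: each line on the wire is prefixed with its total length (payload plus the four length digits) written as four hexadecimal digits, and the special length 0000 is the flush packet. A minimal sketch, with hypothetical helper names `pkt_line` and `FLUSH_PKT`:

```python
FLUSH_PKT = b"0000"  # "I have no more lines to send you"

def pkt_line(payload: bytes) -> bytes:
    """Frame one payload as a pkt-line: 4 hex digits of total length,
    then the payload itself."""
    length = len(payload) + 4  # the length counts its own 4 digits
    if length > 0xFFFF:
        raise ValueError("payload does not fit in a four-digit hex length")
    return b"%04x" % length + payload

# A client that wants nothing sends only the flush packet.
no_op_request = FLUSH_PKT
print(pkt_line(b"hello\n"))
print(no_op_request)
```

Sending only the flush packet after the server's advertisement is what makes the measurement above cover ref advertisement alone.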
Re: push.default documented in man git-push?
On Wed, Oct 03, 2012 at 11:26:55AM -0700, Junio C Hamano wrote:

> Please do not label the list as "These variables affect this command" to give a false impression that it is the complete list if it isn't. Unless somebody promises to keep an up-to-date complete list there (or even better, come up with a mechanism to help us keep that promise automatically, perhaps by annotating pieces with structured comments in config.txt and automatically appending such a section to manual pages of relevant commands), that is.
>
> With a weaker phrase, e.g. "These configuration variables may be of interest", such a list may not hurt readers, but personally I do not think it adds much value to have a list of variables without even a single line description of what each is used for.

We talked a while ago about actually moving the config options into the individual manpages, and generating config.txt to simply contain an index of keys and where their definitions may be found. That also has the "list without description" characteristic. But presumably you would be looking for keys in the manual of the command you want to affect, and the master list would mostly be for redirecting you to the right manpage.

It does break down a little when you have keys that could go in multiple pages. In many cases, this can be solved by a canonical location that describes the shared concepts. For example, `diff.*` should probably go into a `gitdiff(7)` that talks about the various diff options and formats.

Of course, that only works if you think pulling out the shared diff bits from git-diff*, git-log, etc into a separate manpage is a good idea. I do, because I think it makes it more clear to the reader how the concepts connect (as opposed to simply including shared bits inline in the manpages, as we do now, with no indication that the same content is going to apply in many places).

But it does have a downside that individual manpages are not as easily searchable via the pager, as you may have to follow a cross-reference to find what you want.

-Peff
Re: Rebase doesn't restore branch pointer back on out of memory
On 10/03/2012 03:47 PM, Alexander Kostikov wrote:

> Expected behaviour:
> - restore branch to pre-rebase location on out of memory exception
> - not to fall with out of memory in the first place. But for our repository that could be fixed only after either:
> --- a) msysgit would have x64 binary (currently it's not available)
> --- b) rebase -m option could be used by default somehow (currently it's not possible so specify default -m)

There is already some logic in rebase that handles failures. In the case of a failure, rebase just stops and does not modify the branch. That allows you to go back to the pre-rebase state with rebase --abort.

In your case, it's possible that rebase is failing at an unexpected place, and the error wasn't caught. I tried a few simple cases by forcing some commands to fail during a rebase, but I couldn't reproduce the behavior that you're seeing.

It might help if we can figure out which part of rebase or git is failing (or running out of memory). And since you're using msysgit, I guess another possible source of the problem is that msysgit is not catching the error properly, or not relaying the error back to git properly.
Re: Unstaging during a merge conflict
On Mon, Oct 01, 2012 at 08:13:21PM -0500, Matt McClellan wrote:

> We had an issue at our organization where changes were reverted when a user was merging his local repo with the remote repo changes. The merge conflicted and he unstaged all the changes that were not a conflict, he then resolved the conflict and added just the conflicted file and committed. The end result was that he reverted every change from his last pull of the remote to his merge point.
>
> The problem I'm having is how hard it is to see this problem, as both git show and git log on the merge commit do not show any reverted files. It was found by diffing his commit to each of the parents and seeing the opposite of what we expected in the patch output.
>
> Anybody have ideas how we can prevent these mistakes? While we are going to do more training, a hard stop that wouldn't even let these make it to remote would be preferred.

The problem is that from the remote's perspective, it is too late. These unstaged paths look exactly as if they were simply resolved in favor of the "ours" side of the merge. The remote does not even see that they had conflicts, but only that the path from one side was taken over the other. If it wanted to be careful, it could recreate the merge and notice that they did not, but even that will lead to some errors. For example, resolving another conflict may result in a situation where a change in another file becomes unwanted and is dropped.

So I think any kind of receive hook to prevent these mistakes from being propagated is going to run afoul of legitimate cases. You'd do much better to work on the UI of the resolution workflow to prevent the mistake from happening in the first place.

> I've done this using git add --interactive then reverting a file's changes, though the actual crime was done using egit staging tool. It seems the command line won't let you unstage changes but gui tools and interactive tools seem to allow it.

You can do "git checkout HEAD path" from the command-line. But I would hope that the results of doing so are sufficiently obvious. Doing something like "git reset" is much more subtle, but it is reasonably well-known as a dangerous command, and I hope we are not encouraging it to new people.

Doing a full revert of a path during merge resolution is probably fishy. It might make sense for "git add -i" to warn about it (I haven't used egit's staging tool, but presumably the same thing would apply).

Another place to catch your issue (assuming that an unstaging tool was used, leaving the modified contents in the working tree) would be to notice unstaged changes in the working tree when committing a merge. The problem with forbidding it is that it is also a legitimate thing to do (e.g., because you carry some local modification to a file that you do not want to commit), so it is not necessarily indicative of an error. But in theory you would see a giant list of unstaged changes in the commit message template. I wonder if that is less obvious when committing via egit.

-Peff
[PATCH 2/2] git-send-email: use locale encoding for compose
The introduction email (--compose option) uses UTF-8 as the default encoding. The current locale encoding is a much better default value.

Signed-off-by: Krzysztof Mazur <krzys...@podlesie.net>
---
 git-send-email.perl | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 107e814..139bb35 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -590,6 +590,16 @@ sub get_patch_subject {
 	die "No subject line in $fn ?";
 }
 
+sub locale_encoding {
+	my $encoding = "UTF-8";
+	eval {
+		require I18N::Langinfo;
+		I18N::Langinfo->import(qw(langinfo CODESET));
+		$encoding = langinfo(CODESET());
+	};
+	return $encoding;
+}
+
 if ($compose) {
 	# Note that this does not need to be secure, but we will make a small
 	# effort to have it be unique
@@ -643,7 +653,7 @@ EOT
 		} elsif (/^\n$/) {
 			$in_body = 1;
 			if (!defined $compose_encoding) {
-				$compose_encoding = "UTF-8";
+				$compose_encoding = locale_encoding();
 			}
 			if ($need_8bit_cte) {
 				print $c2 "MIME-Version: 1.0\n",
-- 
1.7.12.2.2.g1c3c581
[PATCH 1/2] git-send-email: introduce compose-encoding
The introduction email (--compose option) has its encoding hardcoded to UTF-8, but the invoked editor may not use UTF-8. The encoding used by patches can be changed with the 8bit-encoding option, but that option has no effect on the introduction email, and an equivalent for the introduction email is missing.

Add a compose-encoding command line option and a sendemail.composeencoding configuration option to specify the encoding of the introduction email.

Signed-off-by: Krzysztof Mazur <krzys...@podlesie.net>
---
 Documentation/git-send-email.txt | 5 +
 git-send-email.perl              | 9 -
 2 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-send-email.txt b/Documentation/git-send-email.txt
index 3241170..9f09e92 100644
--- a/Documentation/git-send-email.txt
+++ b/Documentation/git-send-email.txt
@@ -126,6 +126,11 @@ The --to option must be repeated for each user you want on the to list.
 +
 Note that no attempts whatsoever are made to validate the encoding.
 
+--compose-encoding=<encoding>::
+	Specify encoding of compose message. Default is the value of the
+	'sendemail.composeencoding'; if that is unspecified, UTF-8 is assumed.
++
+
 
 Sending
 ~~~
diff --git a/git-send-email.perl b/git-send-email.perl
index aea66a0..107e814 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -56,6 +56,7 @@ git send-email [options] <file | directory | rev-list options >
     --in-reply-to           <str>  * Email "In-Reply-To:"
     --annotate                     * Review each patch that will be sent in an editor.
     --compose                      * Open an editor for introduction.
+    --compose-encoding      <str>  * Encoding to assume for introduction.
     --8bit-encoding         <str>  * Encoding to assume 8bit mails if undeclared
 
   Sending:
@@ -198,6 +199,7 @@ my ($identity, $aliasfiletype, @alias_files, $smtp_domain);
 my ($validate, $confirm);
 my (@suppress_cc);
 my ($auto_8bit_encoding);
+my ($compose_encoding);
 
 my ($debug_net_smtp) = 0;		# Net::SMTP, see send_message()
 
@@ -231,6 +233,7 @@ my %config_settings = (
     "confirm" => \$confirm,
     "from" => \$sender,
     "assume8bitencoding" => \$auto_8bit_encoding,
+    "composeencoding" => \$compose_encoding,
 );
 
 my %config_path_settings = (
@@ -315,6 +318,7 @@ my $rc = GetOptions("h" => \$help,
 		    "validate!" => \$validate,
 		    "format-patch!" => \$format_patch,
 		    "8bit-encoding=s" => \$auto_8bit_encoding,
+		    "compose-encoding=s" => \$compose_encoding,
 		    "force" => \$force,
 	 );
 
@@ -638,10 +642,13 @@ EOT
 			$summary_empty = 0 unless (/^\n$/);
 		} elsif (/^\n$/) {
 			$in_body = 1;
+			if (!defined $compose_encoding) {
+				$compose_encoding = "UTF-8";
+			}
 			if ($need_8bit_cte) {
 				print $c2 "MIME-Version: 1.0\n",
 					  "Content-Type: text/plain; ",
-					    "charset=UTF-8\n",
+					    "charset=$compose_encoding\n",
 					  "Content-Transfer-Encoding: 8bit\n";
 			}
 		} elsif (/^MIME-Version:/i) {
-- 
1.7.12.2.2.g1c3c581
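A usage sketch for the two knobs this patch introduces. The encoding value ISO-8859-2, the throwaway repository, and the recipient address in the comment are assumptions chosen for illustration; only the configuration part is actually executed, since sending mail needs patches and an SMTP setup.

```shell
#!/bin/sh
set -eu

# Throwaway repository so the setting does not touch any real configuration.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Persistent default for the introduction email written with --compose.
git config sendemail.composeencoding ISO-8859-2
configured=$(git config --get sendemail.composeencoding)
echo "compose encoding from config: $configured"

# One-off override on the command line (not run here):
#   git send-email --compose --compose-encoding=ISO-8859-2 \
#       --to=list@example.invalid 0001-*.patch
```

With both set, the command line option would be expected to take precedence, because the patch only falls back to the default when $compose_encoding is still undefined after option parsing.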
Re: grep.patternType
Junio C Hamano <gits...@pobox.com> writes:

> Junio C Hamano <gits...@pobox.com> writes:
>
>> * "git grep" learned to use a non-standard pattern type by default if a configuration variable tells it to.
>
> This addition makes "git grep -e '(integer|buffer)'" work as expected, when grep.patternType is set to "extended". Should this
>
>     git log --grep='(integer|buffer)'
>
> also honor the same configuration variable? If not, why not?
>
> One more thing. Currently you can say "git log -E --grep='(integer|buffer)'" to ask for the ERE. Should we also support -P to ask for pcre? If not, why not?

Answering to myself, who has been in "tying-loose-ends" mode.

My answers to these questions are both "yes", and I have a neatly lined up series that begins with a small bugfix and then an enhancement, but I do not think these deserve to be in the upcoming release. The topic came too late, and even the fix is for a bug that has been with us for a long time.
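The effect of grep.patternType on "git grep" can be tried in a scratch repository. This is a minimal sketch; the file name sample.c and its contents are made up, and the setting is passed per invocation with "git -c" so that any user-level configuration does not change the outcome.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q

# A tracked file with two lines that the alternation should hit.
printf 'int integer_value;\nchar *buffer;\nfloat ratio;\n' > sample.c
git add sample.c

# Basic regexps treat "(", "|" and ")" as literal characters, so nothing
# matches and git grep exits non-zero (hence the "|| true").
basic=$(git -c grep.patternType=basic grep --count -e '(integer|buffer)' || true)

# Extended regexps treat the same pattern as an alternation.
extended=$(git -c grep.patternType=extended grep --count -e '(integer|buffer)')

echo "basic:    ${basic:-no matches}"
echo "extended: $extended"
```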
Re: upload-pack is slow with lots of refs
On Wed, Oct 3, 2012 at 11:20 PM, Jeff King <p...@peff.net> wrote:

Thanks for all that info, it's really useful.

>> * A co-worker who was working on this today tried it on 1.7.12 and claimed that it had the same performance characteristics.
>
> That's surprising to me. Can you try to verify those numbers?

I think he was wrong, I tested this on git.git by first creating a lot of tags:

    parallel --eta git tag -a -m{} test-again-{} ::: $(git rev-list HEAD)

Then doing:

    git pack-refs --all
    git repack -A -d

And compiled with -g -O3 I get around 1.55 runs/s of git-upload-pack on 1.7.8 and 2.59/s on the master branch.

>> * I tried to profile it under gcc -pg echo -n | ./git-upload-pack repo but it doesn't produce a profile like that, presumably because the process exits unsuccessfully.
>
> If it's a recent version of Linux, you'll get much nicer results with perf. Here's what my 400K-ref case looks like:
>
>     $ time echo | perf record git-upload-pack . >/dev/null
>     real 0m0.808s
>     user 0m0.660s
>     sys  0m0.136s
>
>     $ perf report | grep -v ^# | head
>     11.40%  git-upload-pack  libc-2.13.so     [.] vfprintf
>      9.70%  git-upload-pack  git-upload-pack  [.] find_pack_entry_one
>      7.64%  git-upload-pack  git-upload-pack  [.] check_refname_format
>      6.81%  git-upload-pack  libc-2.13.so     [.] __memcmp_sse4_1
>      5.79%  git-upload-pack  libc-2.13.so     [.] getenv
>      4.20%  git-upload-pack  libc-2.13.so     [.] __strlen_sse42
>      3.72%  git-upload-pack  git-upload-pack  [.] ref_entry_cmp_sslice
>      3.15%  git-upload-pack  git-upload-pack  [.] read_packed_refs
>      2.65%  git-upload-pack  git-upload-pack  [.] sha1_to_hex
>      2.44%  git-upload-pack  libc-2.13.so     [.] _IO_default_xsputn

FWIW here are my results on the above pathological git.git:

    $ uname -r; perf --version; echo | perf record ./git-upload-pack . >/dev/null; perf report | grep -v ^# | head
    3.2.0-2-amd64
    perf version 3.2.17
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.026 MB perf.data (~1131 samples) ]
    29.08%  git-upload-pack  libz.so.1.2.7       [.] inflate
    17.99%  git-upload-pack  libz.so.1.2.7       [.] 0xaec1
     6.21%  git-upload-pack  libc-2.13.so        [.] 0x117503
     5.69%  git-upload-pack  libcrypto.so.1.0.0  [.] 0x82c3d
     4.87%  git-upload-pack  git-upload-pack     [.] find_pack_entry_one
     3.18%  git-upload-pack  ld-2.13.so          [.] 0x886e
     2.96%  git-upload-pack  libc-2.13.so        [.] vfprintf
     2.83%  git-upload-pack  git-upload-pack     [.] search_for_subdir
     1.56%  git-upload-pack  [kernel.kallsyms]   [k] do_raw_spin_lock
     1.36%  git-upload-pack  libc-2.13.so        [.] vsnprintf

I wonder why your report doesn't note any time in libz. This is on Debian testing, maybe your OS uses different strip settings so it doesn't show up?

    $ ldd -r ./git-upload-pack
    linux-vdso.so.1 => (0x7fff621ff000)
    libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x7f768feee000)
    libcrypto.so.1.0.0 => /usr/lib/x86_64-linux-gnu/libcrypto.so.1.0.0 (0x7f768fb0a000)
    libpthread.so.0 => /lib/x86_64-linux-gnu/libpthread.so.0 (0x7f768f8ed000)
    libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x7f768f566000)
    libdl.so.2 => /lib/x86_64-linux-gnu/libdl.so.2 (0x7f768f362000)
    /lib64/ld-linux-x86-64.so.2 (0x7f7690117000)
Re: erratic behavior commit --allow-empty
From: Angelo Borsotti <angelo.borso...@gmail.com>
Sent: Wednesday, October 03, 2012 12:52 PM

> Hi
>
>> You still didn't tell us where the problem was.

I've split up the explanation of the problem you have seen, to see if I can understand where the 'missing' aspect is within the extended discussions.

> I thought I did, but here it is: I have private and public repositories. In the private ones the developers keep both the sources and the binaries. In the public ones they keep only the sources. They do not want the binaries there because binaries are very large and require much time to be pushed. Besides that, they are not even needed because they must be rebuilt anyway.
>
> To push the sources only, they keep in the private repositories an orphan branch in which commits are done taking the relevant commits in the (say) master branch and removing the binaries from the index. Pushing the master branch directly would also push the binaries even if they were removed from its index (the history gets pushed): hence the need for an orphan branch. Scripts have been provided to do this easily and safely.
>
> Now, it could happen that a developer does not have binaries (yet), but wants to push all the same. The script has to take care of this special case, in which no binaries are removed, but a commit on the orphan branch is done all the same. And here is the problem, since git commit does not produce a brand new, different, unique commit every time, making the orphan branch point to the master one, i.e. becoming a non-orphan one.

What isn't clear is how the master branch is created and maintained at this point. Does the script create it afresh each time, so that it is also, implicitly, an --orphan branch?

>> I ended up with a branch master and a branch new-branch, both pointing to the same commit. The new branch _is_ created.

In such a case (a new master being created every time the script runs), you can suffer the situation you describe, where a common sentinel commit is used for both branches even though you thought they were orphaned from each other: a very special case. However, one has to ask how the rest of the script would work in such situations with such a truncated master branch.

If the master branch has a true history, then you would get different commits being created on the two branches because the parents would be different.

Or finally, you have a truly special test (initialisation) case when you are starting master (which will later grow) and comparing it to the very first test case of the --orphan branch, and in that special case you could get a common commit. But that is a one-off special case, and would not recur in practice.

Can you say more about the script?

> Exactly, it is created, but it is not an orphan ... or more precisely, it is sometimes, depending on how fast you are to enter the second commit command. This time-dependent behaviour is what I am talking about.
>
> -Angelo
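The time dependence discussed in this thread follows from how a commit is named: the object ID is a hash over the tree, the parents, the author and committer identities with their one-second timestamps, and the message. A minimal sketch of that, assuming a scratch repository, a made-up identity, and arbitrary fixed timestamps; it uses the plumbing command "git commit-tree" on the empty tree so that everything except the timestamp and message is held constant.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

# With nothing staged, write-tree yields the empty tree.
tree=$(git write-tree)

# Create a parentless commit with a fixed timestamp ($1) and message ($2).
make_root() {
    GIT_AUTHOR_DATE="$1" GIT_COMMITTER_DATE="$1" \
        git commit-tree -m "$2" "$tree"
}

first=$(make_root "1349265120 +0000" "A")
second=$(make_root "1349265120 +0000" "A")   # same second, same message
later=$(make_root "1349265121 +0000" "A")    # one second later
renamed=$(make_root "1349265120 +0000" "B")  # same second, other message

echo "first:   $first"
echo "second:  $second"
echo "later:   $later"
echo "renamed: $renamed"
```

The first two IDs are identical because every input to the hash is identical, which is the situation of two root commits made within the same second. Waiting a second or changing the message changes the ID, which matches both the "sleep 1" observation and the suggestion to use a different commit message.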
Re: upload-pack is slow with lots of refs
On Wed, Oct 3, 2012 at 8:03 PM, Jeff King <p...@peff.net> wrote:

> What version of git are you using? In the past year or so, I've made several tweaks to speed up large numbers of refs, including:
>
>   - cff38a5 (receive-pack: eliminate duplicate .have refs, v1.7.6); note that this only helps if they are being pulled in by an alternates repo. And even then, it only helps if they are mostly duplicates; distinct ones are still O(n^2).
>
>   - 7db8d53 (fetch-pack: avoid quadratic behavior in remove_duplicates)
>     a0de288 (fetch-pack: avoid quadratic loop in filter_refs)
>     Both in v1.7.11. I think there is still a potential quadratic loop in mark_complete()
>
>   - 90108a2 (upload-pack: avoid parsing tag destinations)
>     926f1dd (upload-pack: avoid parsing objects during ref advertisement)
>     Both in v1.7.10. Note that tag objects are more expensive to advertise than commits, because we have to load and peel them.
>
> Even with those patches, though, I found that it was something like ~2s to advertise 100,000 refs.

FWIW I bisected between 1.7.9 and 1.7.10 and found that the point at which it went from 1.5/s to 2.5/s upload-pack runs on the pathological git.git repository was none of those, but:

    ccdc6037fe - parse_object: try internal cache before reading object db
Re: upload-pack is slow with lots of refs
On Thu, Oct 04, 2012 at 12:15:47AM +0200, Ævar Arnfjörð Bjarmason wrote:

> I think he was wrong, I tested this on git.git by first creating a lot of tags:
>
>     parallel --eta git tag -a -m{} test-again-{} ::: $(git rev-list HEAD)
>
> Then doing:
>
>     git pack-refs --all
>     git repack -A -d
>
> And compiled with -g -O3 I get around 1.55 runs/s of git-upload-pack on 1.7.8 and 2.59/s on the master branch.

Thanks for the update, that's more like what I expected.

> FWIW here are my results on the above pathological git.git:
>
>     $ uname -r; perf --version; echo | perf record ./git-upload-pack . >/dev/null; perf report | grep -v ^# | head
>     3.2.0-2-amd64
>     perf version 3.2.17
>     [ perf record: Woken up 1 times to write data ]
>     [ perf record: Captured and wrote 0.026 MB perf.data (~1131 samples) ]
>     29.08%  git-upload-pack  libz.so.1.2.7       [.] inflate
>     17.99%  git-upload-pack  libz.so.1.2.7       [.] 0xaec1
>      6.21%  git-upload-pack  libc-2.13.so        [.] 0x117503
>      5.69%  git-upload-pack  libcrypto.so.1.0.0  [.] 0x82c3d
>      4.87%  git-upload-pack  git-upload-pack     [.] find_pack_entry_one
>      3.18%  git-upload-pack  ld-2.13.so          [.] 0x886e
>      2.96%  git-upload-pack  libc-2.13.so        [.] vfprintf
>      2.83%  git-upload-pack  git-upload-pack     [.] search_for_subdir
>      1.56%  git-upload-pack  [kernel.kallsyms]   [k] do_raw_spin_lock
>      1.36%  git-upload-pack  libc-2.13.so        [.] vsnprintf
>
> I wonder why your report doesn't note any time in libz. This is on Debian testing, maybe your OS uses different strip settings so it doesn't show up?

Mine was on Debian unstable. The difference is probably that I have 400K refs, but only 12K unique ones (this is the master alternates repo containing every ref from every fork of rails/rails on GitHub). So I spend proportionally more time fiddling with refs and outputting than I do actually inflating tag objects.

Hmm. It seems like we should not need to open the tags at all. The main reason is to produce the "peeled" advertisement just after it. But for a packed ref with a modern version of git that supports the "peeled" extension, we should already have that information.

The hack-ish patch below tries to reuse that. The interface is terrible, and we should probably just pass the peel information via for_each_ref (peel_ref tries to do a similar thing, but it also has a bad interface; if we don't have the information already, it will redo the ref lookup. We could probably get away with a peel_sha1 which uses the same optimization trick as peel_ref).

With this patch my 800ms upload-pack drops to 600ms. I suspect it will have an even greater impact for you, since you are spending much more of your time on object loading than I am.

And note of course that while these micro-optimizations are neat, we're still going to end up shipping quite a lot of data over the wire. Moving to a protocol where we are advertising fewer refs would solve a lot more problems in the long run.

---
diff --git a/refs.c b/refs.c
index 551a0f9..68eca3a 100644
--- a/refs.c
+++ b/refs.c
@@ -510,6 +510,14 @@ static struct ref_entry *current_ref;
 
 static struct ref_entry *current_ref;
 
+/* XXX horrible interface due to implied argument. not for real use */
+const unsigned char *peel_current_ref(void)
+{
+	if (!current_ref || !(current_ref->flag & REF_KNOWS_PEELED))
+		return NULL;
+	return current_ref->u.value.peeled;
+}
+
 static int do_one_ref(const char *base, each_ref_fn fn, int trim,
 		      int flags, void *cb_data, struct ref_entry *entry)
 {
diff --git a/refs.h b/refs.h
index 9d14558..88c5445 100644
--- a/refs.h
+++ b/refs.h
@@ -14,6 +14,8 @@ struct ref_lock {
 #define REF_ISPACKED 0x02
 #define REF_ISBROKEN 0x04
 
+const unsigned char *peel_current_ref(void);
+
 /*
  * Calls the specified function for each ref file until it returns
  * nonzero, and returns the value. Please note that it is not safe to
diff --git a/upload-pack.c b/upload-pack.c
index 8f4703b..cdf43b0 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -736,8 +736,9 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo
 		" include-tag multi_ack_detailed";
 	struct object *o = lookup_unknown_object(sha1);
 	const char *refname_nons = strip_namespace(refname);
+	const unsigned char *peeled = peel_current_ref();
 
-	if (o->type == OBJ_NONE) {
+	if (!peeled && o->type == OBJ_NONE) {
 		o->type = sha1_object_info(sha1, NULL);
 		if (o->type < 0)
 			die("git upload-pack: cannot find object %s:", sha1_to_hex(sha1));
@@ -756,11 +757,13 @@ static int send_ref(const char *refname, const unsigned char *sha1, int flag, vo
 		o->flags |= OUR_REF;
 		nr_our_refs++;
 	}
-	if (o->type == OBJ_TAG) {
+	if (!peeled && o->type == OBJ_TAG) {
 		o
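The "peeled" information that the patch above reuses can be inspected directly. This is a small sketch under stated assumptions: a scratch repository with made-up tag names, and a git that stores refs in the files backend so that .git/packed-refs exists (a repository using a different ref storage format would not have that file). It compares the peeled entries recorded by "git pack-refs" with the "^{}" lines that show up in the ref advertisement as listed by "git ls-remote".

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

git commit -q --allow-empty -m "initial"

# Three annotated tags: each is a tag object pointing at the commit.
for n in 1 2 3; do
    git tag -a -m "tag $n" "test-$n" HEAD
done

# Packing writes each tag ref followed by a "^<sha1>" line holding the
# object the tag peels to.
git pack-refs --all
peeled_in_file=$(grep -c '^\^' .git/packed-refs)

# The advertisement lists the same peeled values as "<ref>^{}" entries.
peeled_on_wire=$(git ls-remote . | grep -c -F '^{}')

echo "peeled entries in packed-refs:   $peeled_in_file"
echo "peeled entries in advertisement: $peeled_on_wire"
```

Since the peeled object IDs are already sitting in packed-refs, the advertisement does not strictly need to open each tag object to produce the "^{}" lines, which is the saving the hack aims for.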
Re: upload-pack is slow with lots of refs
On Thu, Oct 04, 2012 at 12:32:35AM +0200, Ævar Arnfjörð Bjarmason wrote:

> On Wed, Oct 3, 2012 at 8:03 PM, Jeff King <p...@peff.net> wrote:
>
>> What version of git are you using? In the past year or so, I've made several tweaks to speed up large numbers of refs, including:
>>
>>   - cff38a5 (receive-pack: eliminate duplicate .have refs, v1.7.6); note that this only helps if they are being pulled in by an alternates repo. And even then, it only helps if they are mostly duplicates; distinct ones are still O(n^2).
>>
>>   - 7db8d53 (fetch-pack: avoid quadratic behavior in remove_duplicates)
>>     a0de288 (fetch-pack: avoid quadratic loop in filter_refs)
>>     Both in v1.7.11. I think there is still a potential quadratic loop in mark_complete()
>>
>>   - 90108a2 (upload-pack: avoid parsing tag destinations)
>>     926f1dd (upload-pack: avoid parsing objects during ref advertisement)
>>     Both in v1.7.10. Note that tag objects are more expensive to advertise than commits, because we have to load and peel them.
>>
>> Even with those patches, though, I found that it was something like ~2s to advertise 100,000 refs.
>
> FWIW I bisected between 1.7.9 and 1.7.10 and found that the point at which it went from 1.5/s to 2.5/s upload-pack runs on the pathological git.git repository was none of those, but:
>
>     ccdc6037fe - parse_object: try internal cache before reading object db

Ah, yeah, I forgot about that one. That implies that you have a lot of refs pointing to the same objects (since the benefit of that commit is to avoid reading from disk when we have already seen it).

Out of curiosity, what does your repo contain? I saw a lot of speedup with that commit because my repos are big object stores, where we have the same duplicated tag refs for every fork of the repo.

-Peff
Re: upload-pack is slow with lots of refs
On Thu, Oct 4, 2012 at 1:21 AM, Jeff King <p...@peff.net> wrote:

> On Thu, Oct 04, 2012 at 12:32:35AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> On Wed, Oct 3, 2012 at 8:03 PM, Jeff King <p...@peff.net> wrote:
>>
>>> What version of git are you using? In the past year or so, I've made several tweaks to speed up large numbers of refs, including:
>>>
>>>   - cff38a5 (receive-pack: eliminate duplicate .have refs, v1.7.6); note that this only helps if they are being pulled in by an alternates repo. And even then, it only helps if they are mostly duplicates; distinct ones are still O(n^2).
>>>
>>>   - 7db8d53 (fetch-pack: avoid quadratic behavior in remove_duplicates)
>>>     a0de288 (fetch-pack: avoid quadratic loop in filter_refs)
>>>     Both in v1.7.11. I think there is still a potential quadratic loop in mark_complete()
>>>
>>>   - 90108a2 (upload-pack: avoid parsing tag destinations)
>>>     926f1dd (upload-pack: avoid parsing objects during ref advertisement)
>>>     Both in v1.7.10. Note that tag objects are more expensive to advertise than commits, because we have to load and peel them.
>>>
>>> Even with those patches, though, I found that it was something like ~2s to advertise 100,000 refs.
>>
>> FWIW I bisected between 1.7.9 and 1.7.10 and found that the point at which it went from 1.5/s to 2.5/s upload-pack runs on the pathological git.git repository was none of those, but:
>>
>>     ccdc6037fe - parse_object: try internal cache before reading object db
>
> Ah, yeah, I forgot about that one. That implies that you have a lot of refs pointing to the same objects (since the benefit of that commit is to avoid reading from disk when we have already seen it).
>
> Out of curiosity, what does your repo contain? I saw a lot of speedup with that commit because my repos are big object stores, where we have the same duplicated tag refs for every fork of the repo.

Things are much faster with your monkeypatch; I got up to around 10 runs/s.

The repository mainly contains a lot of git-deploy[1] generated tags, which are added for every rollout to several subsystems. Of the ~50k references in the repo, 75% point to a commit that no other reference points to. Around 98% of the references are annotated tags; the rest are branches.

1. https://github.com/git-deploy/git-deploy
Re: upload-pack is slow with lots of refs
On Thu, Oct 4, 2012 at 1:15 AM, Jeff King <p...@peff.net> wrote:

> On Thu, Oct 04, 2012 at 12:15:47AM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> I think he was wrong, I tested this on git.git by first creating a lot of tags:
>>
>>     parallel --eta git tag -a -m{} test-again-{} ::: $(git rev-list HEAD)
>>
>> Then doing:
>>
>>     git pack-refs --all
>>     git repack -A -d
>>
>> And compiled with -g -O3 I get around 1.55 runs/s of git-upload-pack on 1.7.8 and 2.59/s on the master branch.
>
> Thanks for the update, that's more like what I expected.
>
>> FWIW here are my results on the above pathological git.git:
>>
>>     $ uname -r; perf --version; echo | perf record ./git-upload-pack . >/dev/null; perf report | grep -v ^# | head
>>     3.2.0-2-amd64
>>     perf version 3.2.17
>>     [ perf record: Woken up 1 times to write data ]
>>     [ perf record: Captured and wrote 0.026 MB perf.data (~1131 samples) ]
>>     29.08%  git-upload-pack  libz.so.1.2.7       [.] inflate
>>     17.99%  git-upload-pack  libz.so.1.2.7       [.] 0xaec1
>>      6.21%  git-upload-pack  libc-2.13.so        [.] 0x117503
>>      5.69%  git-upload-pack  libcrypto.so.1.0.0  [.] 0x82c3d
>>      4.87%  git-upload-pack  git-upload-pack     [.] find_pack_entry_one
>>      3.18%  git-upload-pack  ld-2.13.so          [.] 0x886e
>>      2.96%  git-upload-pack  libc-2.13.so        [.] vfprintf
>>      2.83%  git-upload-pack  git-upload-pack     [.] search_for_subdir
>>      1.56%  git-upload-pack  [kernel.kallsyms]   [k] do_raw_spin_lock
>>      1.36%  git-upload-pack  libc-2.13.so        [.] vsnprintf
>>
>> I wonder why your report doesn't note any time in libz. This is on Debian testing, maybe your OS uses different strip settings so it doesn't show up?
>
> Mine was on Debian unstable. The difference is probably that I have 400K refs, but only 12K unique ones (this is the master alternates repo containing every ref from every fork of rails/rails on GitHub). So I spend proportionally more time fiddling with refs and outputting than I do actually inflating tag objects.

An updated profile with your patch:

    $ uname -r; perf --version; echo | perf record ./git-upload-pack . >/dev/null; perf report | grep -v ^# | head
    3.2.0-2-amd64
    perf version 3.2.17
    [ perf record: Woken up 1 times to write data ]
    [ perf record: Captured and wrote 0.015 MB perf.data (~662 samples) ]
    14.45%  git-upload-pack  libc-2.13.so       [.] 0x78140
    12.13%  git-upload-pack  [kernel.kallsyms]  [k] walk_component
    11.01%  git-upload-pack  libc-2.13.so       [.] _IO_getline_info
    10.74%  git-upload-pack  git-upload-pack    [.] find_pack_entry_one
     8.96%  git-upload-pack  [kernel.kallsyms]  [k] __mmdrop
     8.64%  git-upload-pack  git-upload-pack    [.] sha1_to_hex
     6.73%  git-upload-pack  libc-2.13.so       [.] vfprintf
     4.07%  git-upload-pack  libc-2.13.so       [.] strchrnul
     4.00%  git-upload-pack  libc-2.13.so       [.] getenv
     3.37%  git-upload-pack  git-upload-pack    [.] packet_write

> Hmm. It seems like we should not need to open the tags at all. The main reason is to produce the "peeled" advertisement just after it. But for a packed ref with a modern version of git that supports the "peeled" extension, we should already have that information.

B.t.w. do you plan to submit this as a non-hack? I'd like to have it in git.git, so if you're not going to, I could pick it up and clean it up a bit. But I think it would be better coming from you.
[PATCH 0/6] Tying loose ends of extended grep
Over time we have added a few things to our "git grep" front-end, such as

 - grep.extendedregexp configuration (v1.7.5)
 - use of pcre (v1.7.6)
 - grep.patterntype configuration (v1.8.0)

But all the time, we forgot that "git log --grep" would need to honor them.

The first three patches should be uncontroversial. We move helpers out of builtin/grep.c to a more generic place, and fix a bug in the command line parser for "git log -F -E --grep='ere'" (this did not correctly enable the extended regular expression).

The fourth patch adds "git log --perl-regexp --grep='pcre'".

The last two teach "log --grep" to honor the same grep.* configuration variables. color.grep and grep.linenumber should not matter, as the use of the grep mechanism in "log --grep" is about the boolean result "do we have hits?" and not about actually showing the hits in the output, but users would expect grep.extendedregexp and its more generalized version grep.patterntype to be honored, which was not the case.

Junio C Hamano (6):
  grep: move configuration support to top-level grep.[ch]
  grep: move pattern-type bits support to top-level grep.[ch]
  log --grep: use the same helper to set -E/-F options as "git grep"
  log --grep: accept --basic-regexp and --perl-regexp
  log: pass rev_info to git_log_config()
  log --grep: honor grep.patterntype etc. configuration variables

 builtin/grep.c | 105 ++---
 builtin/log.c  |  19 +--
 grep.c         |  99 +
 grep.h         |   3 ++
 revision.c     |   8 +++--
 t/t4202-log.sh |   6
 6 files changed, 126 insertions(+), 114 deletions(-)

-- 
1.8.0.rc0.57.g712528f
[PATCH 4/6] log --grep: accept --basic-regexp and --perl-regexp
When we added the "--perl-regexp" option (or "-P") to "git grep", we should have done the same for the commands in the "git log" family, but somehow we forgot to do so. This corrects it.

Also introduce the "--basic-regexp" option for completeness, so that the "last one wins" principle can be used to defeat an earlier -E option, e.g. "git log -E --basic-regexp --grep='bre'". Note that it cannot have the short "-G" option, as that option means "grep in the patch text" in the context of the "log" family.

Signed-off-by: Junio C Hamano <gits...@pobox.com>
---
 revision.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/revision.c b/revision.c
index 7f5e53b..0f73512 100644
--- a/revision.c
+++ b/revision.c
@@ -1603,6 +1603,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		return argcount;
 	} else if (!strcmp(arg, "--grep-debug")) {
 		revs->grep_filter.debug = 1;
+	} else if (!strcmp(arg, "--basic-regexp")) {
+		grep_set_pattern_type_option(GREP_PATTERN_TYPE_BRE, &revs->grep_filter);
 	} else if (!strcmp(arg, "--extended-regexp") || !strcmp(arg, "-E")) {
 		grep_set_pattern_type_option(GREP_PATTERN_TYPE_ERE, &revs->grep_filter);
 	} else if (!strcmp(arg, "--regexp-ignore-case") || !strcmp(arg, "-i")) {
@@ -1610,6 +1612,8 @@ static int handle_revision_opt(struct rev_info *revs, int argc, const char **arg
 		DIFF_OPT_SET(&revs->diffopt, PICKAXE_IGNORE_CASE);
 	} else if (!strcmp(arg, "--fixed-strings") || !strcmp(arg, "-F")) {
 		grep_set_pattern_type_option(GREP_PATTERN_TYPE_FIXED, &revs->grep_filter);
+	} else if (!strcmp(arg, "--perl-regexp") || !strcmp(arg, "-P")) {
+		grep_set_pattern_type_option(GREP_PATTERN_TYPE_PCRE, &revs->grep_filter);
 	} else if (!strcmp(arg, "--all-match")) {
 		revs->grep_filter.all_match = 1;
 	} else if ((argcount = parse_long_opt("encoding", argv, &optarg))) {
-- 
1.8.0.rc0.57.g712528f
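The "last one wins" behaviour described in this patch can be observed with a few empty commits. A minimal sketch, assuming a scratch repository and made-up commit subjects; "-P" is left out because it only works when git is built with PCRE support.

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

git commit -q --allow-empty -m "fix integer overflow"
git commit -q --allow-empty -m "grow buffer"
git commit -q --allow-empty -m "update docs"

# Count the commits whose message matches under the given options.
count_hits() {
    git log "$@" --grep='(integer|buffer)' --format=%s | wc -l | tr -d ' '
}

# -E: the pattern is an alternation, so two commits match.
ere=$(count_hits -E)

# -E followed by --basic-regexp: the later option wins, the parentheses
# and the bar become literal characters, and nothing matches.
bre_after_ere=$(count_hits -E --basic-regexp)

echo "with -E:                $ere"
echo "with -E --basic-regexp: $bre_after_ere"
```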
[PATCH 5/6] log: pass rev_info to git_log_config()
Call init_revisions() first to prepare the revision traversal parameters and pass it to git_log_config(), so that necessary bits in the traversal parameters can be tweaked before we call the command line parsing infrastructure setup_revisions() from the cmd_log_init_finish() function.

Signed-off-by: Junio C Hamano <gits...@pobox.com>
---

 * This is made separate from the next one that touches the contents of "rev" to make sure the existing code does not depend on the current initialization order. I do not think it does, but it is better to be careful and keep the history easier to bisect than to be sorry when an issue does appear.

 builtin/log.c | 14 +-
 1 file changed, 5 insertions(+), 9 deletions(-)

diff --git a/builtin/log.c b/builtin/log.c
index 09cf43e..07a0078 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -360,9 +360,8 @@ int cmd_whatchanged(int argc, const char **argv, const char *prefix)
 	struct rev_info rev;
 	struct setup_revision_opt opt;
 
-	git_config(git_log_config, NULL);
-
 	init_revisions(&rev, prefix);
+	git_config(git_log_config, &rev);
 	rev.diff = 1;
 	rev.simplify_history = 0;
 	memset(&opt, 0, sizeof(opt));
@@ -450,10 +449,9 @@ int cmd_show(int argc, const char **argv, const char *prefix)
 	struct pathspec match_all;
 	int i, count, ret = 0;
 
-	git_config(git_log_config, NULL);
-
 	init_pathspec(&match_all, NULL);
 	init_revisions(&rev, prefix);
+	git_config(git_log_config, &rev);
 	rev.diff = 1;
 	rev.always_show_header = 1;
 	rev.no_walk = REVISION_WALK_NO_WALK_SORTED;
@@ -530,9 +528,8 @@ int cmd_log_reflog(int argc, const char **argv, const char *prefix)
 	struct rev_info rev;
 	struct setup_revision_opt opt;
 
-	git_config(git_log_config, NULL);
-
 	init_revisions(&rev, prefix);
+	git_config(git_log_config, &rev);
 	init_reflog_walk(&rev.reflog_info);
 	rev.verbose_header = 1;
 	memset(&opt, 0, sizeof(opt));
@@ -552,9 +549,8 @@ int cmd_log(int argc, const char **argv, const char *prefix)
 	struct rev_info rev;
 	struct setup_revision_opt opt;
 
-	git_config(git_log_config, NULL);
-
 	init_revisions(&rev, prefix);
+	git_config(git_log_config, &rev);
 	rev.always_show_header = 1;
 	memset(&opt, 0, sizeof(opt));
 	opt.def = "HEAD";
@@ -1121,8 +1117,8 @@ int cmd_format_patch(int argc, const char **argv, const char *prefix)
 	extra_hdr.strdup_strings = 1;
 	extra_to.strdup_strings = 1;
 	extra_cc.strdup_strings = 1;
-	git_config(git_format_config, NULL);
 	init_revisions(&rev, prefix);
+	git_config(git_format_config, &rev);
 	rev.commit_format = CMIT_FMT_EMAIL;
 	rev.verbose_header = 1;
 	rev.diff = 1;
-- 
1.8.0.rc0.57.g712528f
[PATCH 6/6] log --grep: honor grep.patterntype etc. configuration variables
Read grep.extendedregexp, grep.patterntype, etc. from the configuration so that "log --grep='pcre'" honors the user preference without an explicit "-P" from the command line.

Now that the callback parameter to git_log_config(), which was so far unused, has to be of type "struct rev_info *", stop passing it down to git_diff_ui_config(). The latter does not currently take any callback parameter, and when it does, we would need to make a structure that has rev_info and that parameter and pass it to git_log_config() anyway, and until that happens, passing NULL will be less error prone.

Signed-off-by: Junio C Hamano <gits...@pobox.com>
---
 builtin/log.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/log.c b/builtin/log.c
index 07a0078..a38a6dd 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -329,6 +329,8 @@ static int cmd_log_walk(struct rev_info *rev)
 
 static int git_log_config(const char *var, const char *value, void *cb)
 {
+	struct rev_info *revs = cb;
+
 	if (!strcmp(var, "format.pretty"))
 		return git_config_string(&fmt_pretty, var, value);
 	if (!strcmp(var, "format.subjectprefix"))
@@ -352,7 +354,8 @@ static int git_log_config(const char *var, const char *value, void *cb)
 	if (!prefixcmp(var, "color.decorate."))
 		return parse_decorate_color_config(var, 15, value);
 
-	return git_diff_ui_config(var, value, cb);
+	grep_config(var, value, &revs->grep_filter);
+	return git_diff_ui_config(var, value, NULL);
 }
 
 int cmd_whatchanged(int argc, const char **argv, const char *prefix)
-- 
1.8.0.rc0.57.g712528f
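The user-visible effect of this patch is that a configured pattern type becomes the default for "git log --grep", while an explicit command line option still overrides it. A minimal sketch of that interplay, assuming a git recent enough to include this change, a scratch repository, and made-up commit subjects:

```shell
#!/bin/sh
set -eu

repo=$(mktemp -d)
cd "$repo"
git init -q
export GIT_AUTHOR_NAME="Example" GIT_AUTHOR_EMAIL="example@example.invalid"
export GIT_COMMITTER_NAME="Example" GIT_COMMITTER_EMAIL="example@example.invalid"

git commit -q --allow-empty -m "fix integer overflow"
git commit -q --allow-empty -m "grow buffer"
git commit -q --allow-empty -m "document the (integer|buffer) pattern"

# Repository-level preference: treat --grep patterns as extended regexps.
git config grep.patternType extended

# No -E on the command line: the configuration supplies the pattern type,
# so the alternation matches all three subjects.
configured=$(git log --grep='(integer|buffer)' --format=%s | wc -l | tr -d ' ')

# An explicit -F overrides the configuration: only the subject that
# contains the literal text "(integer|buffer)" matches.
fixed=$(git log -F --grep='(integer|buffer)' --format=%s | wc -l | tr -d ' ')

echo "from configuration: $configured"
echo "with -F override:   $fixed"
```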
Re: What's cooking in git.git (Oct 2012, #01; Tue, 2)
On Thu, Oct 4, 2012 at 1:17 AM, Junio C Hamano <gits...@pobox.com> wrote:
> For the double-star at the beginning, you should just turn it into **/ if it is not followed by a slash internally, I think.
>
> What is the semantics of ** in the first place? Is it described to a reasonable level of detail in the documentation updates? For example does **foo match afoo, a/b/foo, a/bfoo, a/foo/b, a/bfoo/c? Does x**y match xy, xay, xa/by, x/a/y?

It's basically what rsync describes: use ’**’ to match anything, including slashes. Reading rsync's man page again, I notice I missed two other rules related to **:

 - If the pattern contains a / (not counting a trailing /) or a **, then it is matched against the full pathname, including any leading directories. If the pattern doesn't contain a / or a **, then it is matched only against the final component of the filename. (Remember that the algorithm is applied recursively so full filename can actually be any portion of a path from the starting directory on down.)

 - A trailing dir_name/*** will match both the directory (as if dir_name/ had been specified) and everything in the directory (as if dir_name/** had been specified). This behavior was added in version 2.6.7.

From what you wrote, I think we'll go with the first rule. The second rule looks irrelevant to what git's doing.

> I am guessing that the only sensible definition is that ** requires anything that comes before it (if exists) is at a proper hierarchy boundary, and anything matches it is also at a proper hierarchy boundary, so x**y matches x/a/y and x/y too? (As opposed to x/**/y which does not) and not xy, xay, nor xa/by in the above example.
>
> If x**y can match xy or xay (or **foo can match afoo), it would be unreasonable to say it implies the pattern is anchored at any level, no?

Yeah. That makes things easier to reason, though not exactly what we're having.
--
Duy
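To make the rsync reading quoted above concrete, here is a minimal sketch of it in C. It is not git's matcher and not rsync's code: wild_match() and matches_path() are hypothetical names, only literal characters, * and ** are supported, and patterns are assumed to have no trailing slash. It implements "a * stops at a slash, a ** does not", plus the first rule (a pattern containing a / or a ** is matched against the full path, otherwise only against the final component). It does not implement the hierarchy-boundary definition proposed in the quoted text.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: match the whole of `text` against `pat`.
 * '*' matches any run of non-slash characters (possibly empty);
 * '**' matches any run of characters, slashes included.
 */
bool wild_match(const char *pat, const char *text)
{
	if (*pat == '\0')
		return *text == '\0';

	if (*pat == '*') {
		bool cross_slash = (pat[1] == '*');
		const char *rest = pat + (cross_slash ? 2 : 1);
		const char *t;

		/* Try every possible length for the run the star(s) consume. */
		for (t = text; ; t++) {
			if (wild_match(rest, t))
				return true;
			if (*t == '\0' || (!cross_slash && *t == '/'))
				return false;
		}
	}

	/* Literal character: must match exactly (fails at end of text). */
	if (*pat == *text)
		return wild_match(pat + 1, text + 1);
	return false;
}

/*
 * First rule from the quoted man page, under the assumption that
 * `pat` has no trailing slash: a pattern with a '/' or a '**' is
 * matched against the full path, any other pattern only against
 * the final path component.
 */
bool matches_path(const char *pat, const char *path)
{
	bool full = strchr(pat, '/') != NULL || strstr(pat, "**") != NULL;

	if (!full) {
		const char *base = strrchr(path, '/');
		if (base)
			path = base + 1;
	}
	return wild_match(pat, path);
}
```

Under this reading, wild_match("**foo", "a/b/foo") is true and wild_match("*foo", "a/b/foo") is false, while matches_path("*foo", "a/b/foo") is true because the pattern is compared against the final component only. It also shows the concern raised in the quoted text: matches_path("x**y", "xy") and matches_path("x**y", "xa/by") both succeed, so ** here says nothing about hierarchy boundaries.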
Bug report
Hi all!

I just ran into a problem that I'm pretty sure is a bug in git. Just read and run this (fairly trivial) shell script to replicate.

Thanks!
---John Whitney

[Attachment: git_failure.sh (Bourne shell script)]
Re: push.default documented in man git-push?
Nguyen Thai Ngoc Duy <pclo...@gmail.com> writes:

> On Thu, Oct 4, 2012 at 1:49 AM, Junio C Hamano <gits...@pobox.com> wrote:
>> I would recommend against listing any advice.* in the command manual pages. They are meant to give an advice in cases that are often confusing to new people and are supposed to advise how to turn it off in the message.
>
> OK. I think I was surprised that some messages were controlled by advice.* but gave no hints about that and I found that out by other means. I'll check all the advice messages.

As far as I can tell,

    $ git grep -e 'advice\.' Documentation

shows the list in config.txt and nothing else, and they do talk about when they are issued, but the reasoning behind them may not be described to a sufficient degree (that is, unless a reader carefully thinks things through, s/he may not be able to figure out why). But I think what we have there is more or less OK.