'git describe' is very slow on development trees with lots of commits
(Cc:-ed the Git development list.) * David Ahern wrote: > PERF-VERSION-GEN and specifically the git commands are the > cause of more delay than the config checks, especially when > doing the build in a VM with the kernel source on an NFS > mount. Yes, I have noticed that too. So, the problem is that we use 'git describe' on the kernel tree to generate the version string, which is very, very slow if we are far away from any tagged release - which is the case for the -tip tree: comet:~/tip> perf stat --null --repeat 3 git describe v3.7-rc2-2007-g83e8223 v3.7-rc2-2007-g83e8223 v3.7-rc2-2007-g83e8223 'git describe' is much faster if we are on or near to a tag: $ git checkout v3.6 $ perf stat --null --repeat 3 git describe v3.6 v3.6 v3.6 Performance counter stats for 'git describe' (3 runs): 0.020171640 seconds time elapsed ( +- 3.64% ) $ git checkout b34e5f55a1e6 $ perf stat --null --repeat 3 git describe v3.6-41-gb34e5f5 v3.6-41-gb34e5f5 v3.6-41-gb34e5f5 Performance counter stats for 'git describe' (3 runs): 0.155603676 seconds time elapsed ( +- 0.23% ) The cost on this pretty fast machine is about 1 msecs per commit - which adds up to about 2.5 seconds during much of the development cycle. So maybe we should be using a different version string, for example, instead of: v3.7-rc2-2007-g83e8223 this would be perfectly fine: v3.7-rc2-g83e8223 the 'commit count' is informative but not essential - and in counting the number of off-tag commits is where much of the overhead is: # # Overhead Command Shared Object Symbol # ... .. .. # 39.79% git libz.so.1.2.5 [.] 0xc1fe 26.39% git libz.so.1.2.5 [.] inflate 22.42% git git [.] 0x0009bd1e 2.99% git libz.so.1.2.5 [.] adler32 1.23% git libc-2.15.so[.] _int_malloc 0.72% git libc-2.15.so[.] __GI_strtoull_l_internal 0.67% git libc-2.15.so[.] _int_free 0.62% git libc-2.15.so[.] malloc_consolidate 0.54% git [kernel.kallsyms] [k] clear_page_c 0.32% git [kernel.kallsyms] [k] page_fault So by switching to the shorter version string that still embedds the tag and the exact sha1 we'd be able to run this script a *lot* faster. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] tile: support GENERIC_KERNEL_THREAD and GENERIC_KERNEL_EXECVE
* Linus Torvalds wrote: > On Wed, Oct 24, 2012 at 12:25 AM, Thomas Gleixner wrote: > >> > >> It is spelled: > >> > >> git notes add -m SHA1 > > > > Cool! > > Don't use them for anything global. > > Use them for local codeflow, but don't expect them to be > distributed. It's a separate "flow", and while it *can* be > distributed, it's not going to be for the kernel, for example. > So no, don't start using this to ack things, because the acks > *will* get lost. I'd also add a small meta argument: that it would be actively wrong to *allow* 'belated' acks to be added. In practice acks are most useful *before* a commit gets created and they often have a mostly buerocratic role afterwards. So we should encourage timely acks (which actually help development), and accept ack-less patches as long as they are correct and create no problems. More utility, less buerocracy. Incorrect, ack-less patches causing problems will get all the flames they deserve. Thanks, Ingo -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qgit-0.7
* Marco Costalba <[EMAIL PROTECTED]> wrote: > So I cannot reproduce the bug. [...] weird - i cannot reproduce it either anymore, and annotate works now as advertised - it's fast and accurate as far as i checked. But i synced to the latest tree meanwhile. Perhaps i had an inconsistent tree? (i'll keep an eye on this, i've uncommented that printout, so if it ever happens again i'll have the data.) now that everything is working fine, may i suggest improvements? :-) Firstly, now i'm listed as the author for most portions of sched.c, which is accurate for a fair portion of that, but is only done by qgit because i happened to be the author of the first commit. So it would be more accurate to denote version 1's author as empty (or with some other, nonintrusive string that shows that this file came here due to the initial commit)? We dont know the full history yet, because the current DB's history starts at 2.6.12-rc2, with a full sched.c file. (I think an empty author field would reflect version #1's authorship most accurately, and would be the visually least intrusive.) Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: qgit-0.7
* Marco Costalba <[EMAIL PROTECTED]> wrote: > Here is qgit-0.7, a GUI git viewer. > > you can download from: > > http://prdownloads.sourceforge.net/qgit/qgit-0.7.tar.gz?download > > > This time a small changelog, but a lot of work ;-) > > - rewrite of graph drawing > - start-up loading: switch to use git-rev-list --topo-order > - final fixes to annotation > - cache of file lists to speed-up loading of file names > - added color background on heads the good news: it's really fast now and very usable for browsing changes. Kudos! the bad news: except for annotations. I started qgit in the current kernel GIT repository, and clicked on the following commit: 5bbcfd9000887c0da7d57cc7b3ac869fc0dd5aa9 then i clicked on sched.c to see the annotated file. Firstly, it took roughly 2 minutes (!) for the annotated sched.c to show up. All the qgit windows were fully frozen during that time, no refreshes or anything. My kernel tree was fully cached in RAM, so it was pure CPU overhead (qgit was taking 99% of CPU time). It is clearly not usable in this form. then the annotations were plain wrong. Almost all lines are attributed to Tony Luck, while much of the file comes from the initial repository. So something's quite fishy here. Also, a number of lines were attributed to 'merge', which isnt very informative. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git-viz tool for visualising commit trees
* Olivier Andrieu <[EMAIL PROTECTED]> wrote: > > - naming the boxes by key is quite meaningless. It would be more > > informative to see the author's email shortcuts in the boxes. Also, it > > would be nice to see some simple graphical feedback about the size and > > scope of a changeset, without having to zoom on it. > > That's interesting. What do you mean exactly by scope ? usually there are two interesting things about a patchset: how many lines does it change, and how many files. Patches that change lots of files (but change only a couple of lines in every file) have a 'large' scope. Patches that change 1-2 files have a 'small' scope. A pure 'number of lines changed' metric is useful too. Generally patches that have either a large linecount or a large scope are more interesting. (I'm not sure how this could be displayed - perhaps the size of the rectangle could vary to a certain degree? Perhaps the shape too? Something non-numeric, so that one gets immediate visual feedback.) > > i guess you know it, and i'm definitely not complaining about prototype > > code, but rendering is quite slow: drawing the 340 changesets in the > > current kernel repository takes 15 seconds on a 2 GHz P4. Drawing the > > full kernel history (63,000 changesets) would take more than 45 minutes > > on this box. > > > > the current rate of kernel development is ~2000 changesets per month, so > > drawing the kernel history will get 3 seconds slower every day - it will > > exceed 1 minute in 20 days, so this will become a pressing issue quite > > soon i suspect. > > Right, it is slow. From what I could understand with a bit of > profiling, the problem is with the "text" canvas item for the boxes' > labels. I guess libgnomecanvas isn't using Pango properly or > something: it lookups the font with fontconfig each time I create such > an item. I'm not sure what I can do about this. when the redrawing happens in the visible area then one can see how really slow it is: 100-200 msec per rectangle (!). > It works with Petr Baudis' git-pasky (it calls `git diff'). I don't > know how to do that with the canonical git. ah, ok. I guess it will start working once Petr's changes are merged into Linus' tree. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git-viz tool for visualising commit trees
is the 'diff with ancestor' feature supposed to work at this early stage? (it just does nothing when i click on it. It correctly offers two ancestors for merge points, but does nothing there either.) Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git-viz tool for visualising commit trees
another thing, when i 'zoom out' of the graph far away (so that the whole graph becomes visible on the screen), i'm getting lots of such error messages: *** attempt to put segment in horiz list twice *** attempt to put segment in horiz list twice *** attempt to put segment in horiz list twice *** attempt to put segment in horiz list twice *** attempt to put segment in horiz list twice this doesnt seem to impact anything though. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git-viz tool for visualising commit trees
* Olivier Andrieu <[EMAIL PROTECTED]> wrote: > Yes, git-viz uses the `dot' program from the graphviz package (it's in > Fedora Extra too I believe). ah - that resolved all issues and i'm now running git-viz without any problems. I just checked how the kernel repository looks like with it, and i'm impressed! The GUI is top-notch, and the whole graph output and navigation is very mature visually. Kudos! a couple of suggestions that are in the 'taste' category: - isnt left-to-right layout the more natural thing instead of top-down (as it aligns with the reading direction)? It's selectable in the preferences, but you might want to make it default. OTOH, top-down creates a more compressed view of the graph. - there doesnt seem to be any performance difference between non-colored and colored rendering - so you might as well want to make 'color by author' (or color by branch) the default coloring, instead of uncolored? - naming the boxes by key is quite meaningless. It would be more informative to see the author's email shortcuts in the boxes. Also, it would be nice to see some simple graphical feedback about the size and scope of a changeset, without having to zoom on it. i guess you know it, and i'm definitely not complaining about prototype code, but rendering is quite slow: drawing the 340 changesets in the current kernel repository takes 15 seconds on a 2 GHz P4. Drawing the full kernel history (63,000 changesets) would take more than 45 minutes on this box. the current rate of kernel development is ~2000 changesets per month, so drawing the kernel history will get 3 seconds slower every day - it will exceed 1 minute in 20 days, so this will become a pressing issue quite soon i suspect. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git-viz tool for visualising commit trees
* Olivier Andrieu <[EMAIL PROTECTED]> wrote: > > Preprocessor error > > make: *** [viz_style.cmx] Error 2 > > That's probably because the configure script didn't find camlp4. > Camlp4 is a preprocessor for ocaml, it's needed for compiling this > file (viz_style.ml). Camlp4 is built with the ocaml compilers but some > package it separately. Try to find and install some ocaml-camlp4 (or > camlp4) package and then re-run configure. ah, ok. I installed camlp4-3.08.3-1, and this also solved the other build problem as well that looked like to be a PATH problem. when i run git-viz in a git-controlled directory, it seems to start up fine, but i get an error message: "Could not execute dot". Closing that window gives me the ability to do an 'Open', but git-viz does not seem to recognize any of my GIT repositories as such. (perhaps there's some GIT version dependency? I've got Linus' latest & greatest installed.) > The configure script should signal an error when it doesn't find > camlp4, I'll change that. fyi, it also didnt check for ocaml and lablgtk. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git-viz tool for visualising commit trees
* Olivier Andrieu <[EMAIL PROTECTED]> wrote: > There, here's a tarball : > http://oandrieu.nerim.net/monotone-viz/git-viz-0.1.tar.gz i'm trying to build it under Fedora Core 4 (devel), and there are two problems: - the build scripts seem to assume that "." is in PATH (or that the needed viz_style.ml/etc. scripts are in the PATH). adding "." to the PATH worked around this issue, a number of files built fine, but then it hit the next problem: ocamlopt.opt -I /usr/lib/ocaml/lablgtk2 -I glib -I crypto -pp -c viz_style.ml sh: - : invalid option Usage: sh [GNU long option] [option] ... sh [GNU long option] [option] script-file ... GNU long options: [...] Preprocessor error make: *** [viz_style.cmx] Error 2 i've straced the build, and the bug comes from ocamlopt.opt executing: sh -c "-c 'viz_style.ml'" which confuses the shell. i've got ocaml-3.08.3-1, lablgtk-2.4.0-2 installed. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE] git-pasky-0.6.2 && heads-up on upcoming changes
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > Yeah, yeah, it looks different from "cvs update", but dammit, wouldn't > it be cool to just write "cg-" and see the command choices? > Or "cg-up" and get cg-update done for you.. add this line to your ~/.bashrc: complete -W "add addremote apply cancel ci commit diff export fork help init log ls lsobj lsremote merge pull rm seek status tag track version" git and you'll get all the argument completions, after "git " (it works on arguments, after the space). (you can even customize it to list only the ones you typically use, to make the completion faster) This first showed up in zsh but now bash knows it too. (see 'Programmable Completions' in man bash) Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Change "pull" to _only_ download, and "git update"=pull+merge?
* Petr Baudis <[EMAIL PROTECTED]> wrote: > Dear diary, on Wed, Apr 20, 2005 at 09:01:57AM CEST, I got a letter > where Ingo Molnar <[EMAIL PROTECTED]> told me that... > > [...] > > fatal: unable to execute 'gitmerge-file.sh' > > fatal: merge program failed > > Pure stupidity of mine, I forgot to add gitmerge-file.sh to the list of > scripts which get installed. another thing is this annoying message: rsync: link_stat "/linux/kernel/people/torvalds/git.git/tags" (in pub) failed: No such file or directory (2) rsync error: some files could not be transferred (code 23) at main.c(812) client: nothing to do: perhaps you need to specify some filenames or the --recursive option? you said before that it's "harmless", but it's annoying nevertheless as one doesnt know for sure whether the pull went fine. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Change "pull" to _only_ download, and "git update"=pull+merge?
* Petr Baudis <[EMAIL PROTECTED]> wrote: > > yet another thing: what is the canonical 'pasky way' of simply nuking > > the current files and checking out the latest tree (according to > > .git/HEAD). Right now i'm using a script to: > > > > read-tree $(tree-id $(cat .git/HEAD)) > > checkout-cache -a > > > > (i first do an 'rm -f *' in the working directory) > > > > i guess there's an existing command for this already? > > git cancel hm, that's a pretty unintuitive name though. How about making it 'git checkout' and providing a 'git checkout -f' option to force the checkout? (or something like this) Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: git-viz tool for visualising commit trees
* Petr Baudis <[EMAIL PROTECTED]> wrote: > Hi, > > just FYI, Olivier Andrieu was kind enough to port his monotone-viz > tool to git (http://oandrieu.nerim.net/monotone-viz/ - use the one > from the monotone repository). The tool visualizes the history flow > nicely; see > > http://rover.dkm.cz/~pasky/gitviz1.png > http://rover.dkm.cz/~pasky/gitviz2.png > http://rover.dkm.cz/~pasky/gitviz3.png > http://rover.dkm.cz/~pasky/gitviz4.png > http://rover.dkm.cz/~pasky/gitviz5.png > http://rover.dkm.cz/~pasky/gitviz6.png > http://rover.dkm.cz/~pasky/gitviz7.png > > for some screenshots. really nice stuff! Any plans to include it in git-pasky, via 'git gui' option or so? Also, which particular version has this included - the freshest tarball on the monotone-viz download site doesnt seem to include it. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > So to convert your old git setup to a new git setup, do the following: > [...] did this for two repositories (git and kernel-git), it works as advertised. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: enforcing DB immutability
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > > perhaps having a new 'immutable hardlink' feature in the Linux VFS > > would help? I.e. a hardlink that can only be readonly followed, and > > can be removed, but cannot be chmod-ed to a writeable hardlink. That i > > think would be a large enough barrier for editors/build-tools not to > > play the tricks they already do that makes 'readonly' files virtually > > meaningless. > > immutable hardlinks have the following advantage: a hardlink by design > hides the information where the link comes from. So even if an editor > wanted to play stupid games and override the immutability - it doesnt > know where the DB object is. (sure, it could find it if it wants to, > but that needs real messing around - editors wont do _that_) so the only sensible thing the editor/tool can do when it wants to change the file is precisely what we want: it will copy the hardlinked files's contents to a new file, and will replace the old file with the new file - a copy on write. No accidental corruption of the DB's contents. (another in-kernel VFS solution would be to enforce that the files's name always matches the sha1 hash. So if someone edits a DB object it will automatically change its name. But this is complex, probably cannot be done atomically, and brings up other problems as well.) Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: enforcing DB immutability
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > perhaps having a new 'immutable hardlink' feature in the Linux VFS > would help? I.e. a hardlink that can only be readonly followed, and > can be removed, but cannot be chmod-ed to a writeable hardlink. That i > think would be a large enough barrier for editors/build-tools not to > play the tricks they already do that makes 'readonly' files virtually > meaningless. immutable hardlinks have the following advantage: a hardlink by design hides the information where the link comes from. So even if an editor wanted to play stupid games and override the immutability - it doesnt know where the DB object is. (sure, it could find it if it wants to, but that needs real messing around - editors wont do _that_) i think this might work. (the current chattr +i flag isnt quite what we need though because it works on the inode, and it's also a root-only feature so it puts us back to square one. What would be needed is an immutability flag on hardlinks, settable by unprivileged users.) Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
enforcing DB immutability
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > On Wed, 13 Apr 2005, Ingo Molnar wrote: > > > > well, the 'owned by another user' solution is valid though, and doesnt > > have this particular problem. (We've got a secure multiuser OS, so can > > as well use it to protect the DB against corruption.) > > So now you need root to set up new repositories? No thanks. yeah, it's a bit awkward to protect uncompressed repositories - but it will need some sort of kernel enforcement. (if userspace finds out the DB contains uncompressed blobs, it _will_ try to use them.) (perhaps having an in-kernel GIT-alike versioned filesystem will help - but that brings up the same 'I have to be root' issues. The FS will enforce the true immutability of objects.) perhaps having a new 'immutable hardlink' feature in the Linux VFS would help? I.e. a hardlink that can only be readonly followed, and can be removed, but cannot be chmod-ed to a writeable hardlink. That i think would be a large enough barrier for editors/build-tools not to play the tricks they already do that makes 'readonly' files virtually meaningless. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Change "pull" to _only_ download, and "git update"=pull+merge?
* Petr Baudis <[EMAIL PROTECTED]> wrote: > > I think pull is pull. If you are doing lots of local stuff and do not > > want it overwritten, it should have been in a forked branch. > > I disagree. This already forces you to have two branches (one to pull > from to get the data, mirroring the remote branch, one for your real > work) uselessly and needlessly. > > I think there is just no good name for what pull is doing now, and > update seems like a great name for what pull-and-merge really is. Pull > really is pull - it _pulls_ the data, while update also updates the > given tree. No surprises. yeah. In fact most of the times i did 'git pull pasky' in the past, the 'merge' phase was unsuccessful, and i had to nuke the tree and recreate it. All i did with the snapshots was to build them, so there were no local changes. Waiting a couple of days with doing a 'git pull pasky', or installing Linus' tree is a sure way to break the merging. e.g. to reproduce the last such failure i had today, do: cd git-pasky-base echo 8568e1a88c086d1b72b0e84ab24fa6888b5861b9 > .git/HEAD read-tree $(tree-id $(cat .git/HEAD)) checkout-cache -a -f make make install # make sure to use the older tools rm -rf .git/objects git pull pasky and i get: [...] fatal: unable to execute 'gitmerge-file.sh' fatal: merge program failed Conflicts during merge. Do git commit after resolving them. note that with earlier versions of pasky, i had other merge conflicts. Sometimes there were .rej files, sometimes some sort of script failure. So it seems rather unrobust at the moment. Especially if i happen to install Linus' tree and try to sync the pasky tree with those tools. another thing: it's confusing that during 'git pull', the rsync output is not visible. Especially during large rsyncs, it would be nice to see some progress. So i usually use a raw rsync not 'git pull', due to this. yet another thing: what is the canonical 'pasky way' of simply nuking the current files and checking out the latest tree (according to .git/HEAD). Right now i'm using a script to: read-tree $(tree-id $(cat .git/HEAD)) checkout-cache -a (i first do an 'rm -f *' in the working directory) i guess there's an existing command for this already? Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [script] ge: export commits as patches
* Petr Baudis <[EMAIL PROTECTED]> wrote: > Dear diary, on Tue, Apr 19, 2005 at 03:48:43PM CEST, I got a letter > where Ingo Molnar <[EMAIL PROTECTED]> told me that... > > is there any 'export commit as patch' support in git-pasky? I didnt find > > any such command (maybe it got added meanwhile), so i'm using the 'ge' > > hack below. > > > > e.g. i typically look at commits via 'git log', and then when i see > > something interesting, i look at the commit via the 'ge' script. E.g. > > "ge 834f6209b22af2941a8640f1e32b0f123c833061" done in the kernel tree > > will output a particular commit's header and the patch. > > Nice idea. I will add it, probably as 'git patch'. > > > TREE1=$(cat-file commit 2>/dev/null $1 | head -4 | grep ^tree | cut -d' ' > > -f2) > > if [ "$TREE1" = "" ]; then echo 'ge '; exit -1; fi > > PARENT=$(cat-file commit 2>/dev/null $1 | head -4 | grep ^parent | cut -d' > > ' -f2) > > if [ "$PARENT" = "" ]; then echo 'ge '; exit -1; fi > > TREE2=$(cat-file commit 2>/dev/null $PARENT | head -4 | grep ^tree | cut > > -d' ' -f2) > > if [ "$TREE2" = "" ]; then echo 'ge '; exit -1; fi > > commit-id and parent-id tools might be useful. ;-) find a cleaned up 'ge' script below. and please fix gitXnormid.sh to simply echo nothing and return with a -1 exit value when a nonsensical ID is passed to it. Right now the output is quite ugly if you do 'ge blah'. Ingo #!/bin/bash usage () { echo 'usage: ge ' exit -1 } if [ $# != 1 ]; then usage fi ME=$(commit-id $1); [ "$ME" = "" ] && usage PARENT=$(parent-id $ME); [ "$PARENT" = "" ] && usage TREE1= $(tree-id $ME); [ "$TREE1" = "" ] && usage TREE2= $(tree-id $PARENT); [ "$TREE2" = "" ] && usage cat-file commit $ME echo git diff -r $TREE2:$TREE1 - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: missing: git api, reference, user manual and mission statement
* Kevin Smith <[EMAIL PROTECTED]> wrote: > Klaus Robert Suetterlin wrote: > > 1) There is no clear (e.g. by name) distinction between ``git as done > > by Linus'', which is a kind of content addressable database with added > > semantics, and ``git as done by the rest of You'', which is a kind of > > SCM on top of Linuses stuff. > > I also see this as one of the biggest obstacles right now. It would be > very helpful if we could achieve the clear separation between git and > non-git that has been part of the design since the beginning. > > Git is very immature, and currently should only be used by brave > pioneers. About the only way for a mortal to even try git is to stick > to git-pasky releases, and not try to track all the patches flying > around. hey, it's a 2 weeks old project, but it's certainly one of the fastest-growing projects i've ever seen: it has so much steam that it's scary :) It seems that a true emergency focused a massive, spontaneous concentration of OSS development power. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: naive question
* David Woodhouse <[EMAIL PROTECTED]> wrote: > On Tue, 2005-04-19 at 23:00 +1000, Paul Mackerras wrote: > > Is there a way to check out a tree without changing the mtime of any > > files that you have already checked out and which are the same as the > > version you are checking out? It seems that checkout-cache -a doesn't > > overwrite any existing files, and checkout-cache -f -a overwrites all > > files and gives them the current mtime. This is a pain if you are > > using make and your tree is large (like, for instance, the linux > > kernel :), because it means that after a checkout-cache -f -a you get > > to recompile everything. > > Corollary: why aren't we storing mtime in the tree objects? Check the "[bug] git: check-files mtime problem?" thread - i noticed this problem before and gave a few suggestions but the discussion got nowhere. But the problem is still very much present. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[script] ge: export commits as patches
is there any 'export commit as patch' support in git-pasky? I didnt find any such command (maybe it got added meanwhile), so i'm using the 'ge' hack below. e.g. i typically look at commits via 'git log', and then when i see something interesting, i look at the commit via the 'ge' script. E.g. "ge 834f6209b22af2941a8640f1e32b0f123c833061" done in the kernel tree will output a particular commit's header and the patch. Ingo #!/bin/bash if [ $# != 1 ]; then echo 'ge ' exit -1 fi TREE1=$(cat-file commit 2>/dev/null $1 | head -4 | grep ^tree | cut -d' ' -f2) if [ "$TREE1" = "" ]; then echo 'ge '; exit -1; fi PARENT=$(cat-file commit 2>/dev/null $1 | head -4 | grep ^parent | cut -d' ' -f2) if [ "$PARENT" = "" ]; then echo 'ge '; exit -1; fi TREE2=$(cat-file commit 2>/dev/null $PARENT | head -4 | grep ^tree | cut -d' ' -f2) if [ "$TREE2" = "" ]; then echo 'ge '; exit -1; fi cat-file commit $1 echo git diff -r $TREE2:$TREE1 - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] git: fix 1-byte overflow in show-files.c
* Petr Baudis <[EMAIL PROTECTED]> wrote: > > will attempt to append a "/" string to the directory name - resulting in > > a 1-byte overflow (a zero byte is written to offset 4097, which is > > outside the array). > > The name ends precisely at offset 4095 with its NUL character: > > {PATH_MAX} > Maximum number of bytes in a pathname, including the terminating > null character. > [ http://www.opengroup.org/onlinepubs/009695399/basedefs/limits.h.html ] > > So, if I'm not mistaken, '/' will be written at offset 4095 instead of > the NUL and the NUL will be written at 4096. Everything's fine, right? yeah, you are right - ignore this patch. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > On Sun, 17 Apr 2005, Ingo Molnar wrote: > > > > in fact, this attack cannot even be proven to be malicious, purely via > > the email from Malice: it could be incredible bad luck that caused that > > good-looking patch to be mistakenly matching a dangerous object. > > I really hate theoretical discussions. i was only replying to your earlier point: > > > Almost all attacks on sha1 will depend on _replacing_ a file with > > > a bogus new one. So guys, instead of using sha256 or going > > > overboard, just make sure that when you synchronize, you NEVER > > > import a file you already have. which point i still believe is subtly wrong. You were suggesting to concentrate on file replacement to counter most of the practical attacks, while i pointed out an attack _using the same basic mechanism that your point above supposed_. [ if you can replace a file with a known hash, with a bogus new one, and you still have enough control over the contents of your bogus new file that it is 1) a valid file that builds 2) compromises the kernel, then you likely have the same amount of control my 'theoretical' attack requires. ] > And the thing is, _if_ somebody finds a way to make sha1 act as just a > complex parity bit, and comes up with generating a clashing object > that actually makes sense, then going to sha256 is likely pointless > too [...] yes, that's why i suggested to not actually trust the hash to be cryptographically secure, but to just assume it's a good generic hash we can design a DB around, and to turn -DCOLLISION_CHECK on and enforce consistency rules on boundaries. [ it's not bad to keep sha1 because even my suggested enhancement still leaves 'content-less trust-pointers to untrusted content via email' vectors open against attack (maintainer sends you an email that commit X in Malice's repository Y is fine to pull, and you pull it blindly, while the attacker has replaced his content with the compromised one meanwhile), but it at least validates the bulk traffic that goes into the DB: patches via emails and trusted repositories. ] so all i was suggesting was to extend your suggested 'overwrite collision check' to a stricter 'content we throw away and use the sha1 shortcut for needs to be checked against the in-DB content as well'. in other words, your suggested 'rename check' is checking for 'positive duplicate content', while my addition would also check for 'negative duplicate content' as well. but as usual, i could be wrong, so dont take this too serious :-) Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > The compromise relies on you having reviewed something harmless, while > in reality what happened within the DB was far less harmless. And the > DB remains self-consistent: neither fsck, nor others importing your > tree will be able to detect the compromise. This attack can only be > detected when you apply the patch, after that point all the > information (except Malice's message in your inbox) is gone. in fact, this attack cannot even be proven to be malicious, purely via the email from Malice: it could be incredible bad luck that caused that good-looking patch to be mistakenly matching a dangerous object. In fact this could happen even today, _accidentally_. (but i'm willing to bet that hell will be freezing over first, and i'll have some really good odds ;) There's probably a much higher likelyhood of Linus' tree getting corrupted in some old fashioned way and introducing a security hole by accident) Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
* Brad Roberts <[EMAIL PROTECTED]> wrote: > While I agree that a hash collision is bad and certainly worth > preventing during new object creation, for it to actually implant a > trojan in a build successfully it'd have to meet even more criteria > than you've layed out. It'd have to... > - provide all the public symbols the shadowed object provided so that it > would still build and link successfully that's not a problem. Most modules dont provide public symbols. Especially not drivers. Generally it's the modules that _dont_ have any global impact that get reviewed less stringently - an attacker would thus choose them for psychological reasons anyway. > - be shadowing an object that's part of an active tree > > Shadowing an object that's not part of the working tree means > something on another branch or obsoleted some time in the past is > still db corruption, but not nearly as big an issue from a trojan > standpoint. it's not DB corruption, it's a feature of GIT: it's a content _cache_, new and old alike. Nothing in GIT says that old objects in the repository (which are still very much part of history) cannot be revived in newer trees. (in fact it regularly happens - e.g. if a fix is undone manually.) Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Re: Merge with git-pasky II.
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > Almost all attacks on sha1 will depend on _replacing_ a file with a > bogus new one. So guys, instead of using sha256 or going overboard, > just make sure that when you synchronize, you NEVER import a file you > already have. here is a bit complex, but still practical attack that doesnt rely on replacement and which can only be detected if we check the sha1 uniqueness assumptions. If you can generate a duplicate sha1 key for an arbitrary 'target' file, and Malice sends you a GIT-generated patch that introduces a new file (which doesnt exist in the current tree) which you review (in the email) and which looks safe to apply & harmless. Maybe the patch has a bit weird formatting and some weird comments (which in reality Malice used to generate the proper sha1 key) but otherwise the patch is for some seldom used arcane driver that no-one used for quite some time and no-one really cares about, so you are happy to apply the patch. The compromise occurs when you apply the patch: the seemingly harmless patch has an sha1 key that Malice manufacured to match that of an already existing, 'dangerous' object in your database. With tens of thousands (or hundreds of thousands) of objects expected in the repository sooner or later, there's quite a selection to pick from. Once you apply the patch, instead of the expected new file that you reviewed and found safe, the attacker has the other object included in the official kernel. A dangerous object can be anything: e.g. a debugging hack that allows arbitrary kernel-space writes. Or a known-insecure module (which since then got fixed, but the buggy code still exists in the DB). The module is in a single file and is self-installing (e.g. it has __init code to register itself as some driver.) Malice might even previously plant a dangerous object as some 'firmware module' in another arcane driver, which doesnt get compiled by default, but still shows up in the DB. Or Malice might plant a dangerous object via an innocent-looking documentation file. (which contains some sample code and is called sample.txt) this type of 'false sharing attack' can only be prevented if an object is only 'shared' with another object if it has been memcmp-ed with the object in the repository. I.e. if we trust the sharing decision! Once the attack has occured it cannot be detected automatically: only people will notice it. (why did that weird unrelated module show up in that old driver?) The compromise relies on you having reviewed something harmless, while in reality what happened within the DB was far less harmless. And the DB remains self-consistent: neither fsck, nor others importing your tree will be able to detect the compromise. This attack can only be detected when you apply the patch, after that point all the information (except Malice's message in your inbox) is gone. so unless we actively check for collisions, once an sha1 key can be generated at will on near-arbitrary input, it's not a secure system anymore. We might be lucky and safe, but we wont be secure. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Use libcurl to use HTTP to get repositories
* Daniel Barkalow <[EMAIL PROTECTED]> wrote: > Still leaks a bit of memory due to bug copied from read-tree. Linus, should i resend the 18 fixes i sent the other day? (as a GIT repository perhaps?) I found roughly 6 common memory leaks, 8 theoretical memory leaks, 2 overflows and did a couple of cleanups. One of the patches [the cache collision related thing] we agreed was not needed, the rest is still very much valid i think. I did some basic testing with the fixes applied, nothing seemed to break in any visible way in these tests. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: full kernel history, in patchset format
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > > the history data starts at 2.4.0 and ends at 2.6.12-rc2. I've included a > > script that will apply all the patches in order and will create a > > pristine 2.6.12-rc2 tree. > > Hey, that's great. I got the CVS repo too, and I was looking at it, > but the more I looked at it, the more I felt that the main reason I > want to import it into git ends up being to validate that my size > estimates are at all realistic. > > I see that Thomas Gleixner seems to have done that already, and come > to a figure of 3.2GB for the last three years, which I'm very happy > with, mainly because it seems to match my estimates to a tee. [...] (yeah, we apparently worked in parallel - i only learned about his efforts after i sent my mail. He was using BK to extract info, i was using the CVS tree alone and no BK code whatsoever. (I dont think there will be any argument about who owns what, but i wanted to be on the safe side, and i also wanted to see how complete and usable the CVS metadata is - it's close to perfect i'd say, for the purposes i care about.)) > But I wonder if we actually want to actually populate the whole > history.. yeah, it definitely feels a bit brave to import 28,000 changesets into a source-code database project that will be a whopping 2 weeks old in 2 days ;) Even if we felt 100% confident about all the basics (which we do of course ;), it's just simply too young to tie things down via a 3.2GB database. It feels much more natural to grow it gradually, 28,000 changesets i'm afraid would just suffocate the 'project growth dynamics'. Not going too fast is just as important as not going too slow. I didnt generate the patchset to get it added into some central repository right now, i generated it to check that we _do_ have all the revision history in an easy to understand format which does generate today's kernel tree, so that we can lean back and worry about the full database once things get a bit more settled down (in a couple of months or so). It's also an easy testbed for GIT itself. but the revision history was one of the main reasons i used BK myself, so we'll need a merged database eventually. Occasionally i needed to check who was the one who touched a particular piece of code - was that fantastic new line of code written by me, or was that buggy piece of crap written by someone else? ;) Also, looking at a change and then going to the changeset that did it, and then looking at the full picture was pretty useful too. So that sort of annotation, and generally navigating around _quickly_ and looking at the 'flow' of changes going into a particular file was really useful (for me). Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: full kernel history, in patchset format
* David Mansfield <[EMAIL PROTECTED]> wrote: > Ingo Molnar wrote: > >* Ingo Molnar <[EMAIL PROTECTED]> wrote: > > > > > >>the patches contain all the existing metadata, dates, log messages and > >>revision history. (What i think is missing is the BK tree merge > >>information, but i'm not sure we want/need to convert them to GIT.) > > > > > >author names are abbreviated, e.g. 'viro' instead of > >[EMAIL PROTECTED], and no committer information is > >included (albeit commiter ought to be Linus in most cases). These are > >limitations of the BK->CVS gateway i think. > > > > Glad to hear cvsps made it through! I'm curious what the manual > fixups required were, except for the binary file issue (logo.gif). --cvs-direct was needed to speed it up from 'several days to finish' to 'several hours to finish', but it crashed on a handful of patches [i used the latest devel snapshot so this isnt a complaint]. (one of the crashes was when generating 1860.patch.) Also, 'cvs rdiff' apparently emits an empty patch for diffs that remove a file that end without having a newline character - but this isnt cvsps's problem. (grep for +++ in the patchset to find those cases.) > As to the actual email addresses, for more recent patches, the > Signed-off should help. For earlier ones, isn't their some script > which 'knows' a bunch of canonical author->email mappings? (the > shortlog script or something)? yeah, that's not that much of a problem, most of the names are unique, and the rest can be fixed up too. > Is the full committer email address actually in the changeset in BK? > If so, given that we have the unique id (immutable I believe) of the > changset, could it be extracted directly from BK? i think it's included in BK. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: full kernel history, in patchset format
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > the patches contain all the existing metadata, dates, log messages and > revision history. (What i think is missing is the BK tree merge > information, but i'm not sure we want/need to convert them to GIT.) author names are abbreviated, e.g. 'viro' instead of [EMAIL PROTECTED], and no committer information is included (albeit commiter ought to be Linus in most cases). These are limitations of the BK->CVS gateway i think. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: using git directory cache code in darcs?
* David Roundy <[EMAIL PROTECTED]> wrote: > 2) Will a license be chosen soon for git? Or has one been chosen, and > I missed it? I can't really include git code in darcs without a > license. I'd prefer GPLv2 or later (since that's how darcs is > licensed), but as long as it's at least compabible with GPLv2, I'll be > all right. there's a license in the latest code, it's GPLv2. Here's the COPYING file: --- Note that the only valid version of the GPL as far as this project is concerned is _this_ particular version of the license (ie v2, not v2.2 or v3.x or whatever), unless explicitly otherwise stated. HOWEVER, in order to allow a migration to GPLv3 if that seems like a good idea, I also ask that people involved with the project make their preferences known. In particular, if you trust me to make that decision, you might note so in your copyright message, ie something like This file is licensed under the GPL v2, or a later version at the discretion of Linus. might avoid issues. But we can also just decide to synchronize and contact all copyright holders on record if/when the occasion arises. Linus Torvalds GNU GENERAL PUBLIC LICENSE Version 2, June 1991 Copyright (C) 1989, 1991 Free Software Foundation, Inc. 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed. Preamble The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public License is intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users. This General Public License applies to most of the Free Software Foundation's software and to any other program whose authors commit to using it. (Some other Free Software Foundation software is covered by the GNU Library General Public License instead.) You can apply it to your programs, too. When we speak of free software, we are referring to freedom, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish), that you receive source code or can get it if you want it, that you can change the software or use pieces of it in new free programs; and that you know you can do these things. To protect your rights, we need to make restrictions that forbid anyone to deny you these rights or to ask you to surrender the rights. These restrictions translate to certain responsibilities for you if you distribute copies of the software, or if you modify it. For example, if you distribute copies of such a program, whether gratis or for a fee, you must give the recipients all the rights that you have. You must make sure that they, too, receive or can get the source code. And you must show them these terms so they know their rights. We protect your rights with two steps: (1) copyright the software, and (2) offer you this license which gives you legal permission to copy, distribute and/or modify the software. Also, for each author's protection and ours, we want to make certain that everyone understands that there is no warranty for this free software. If the software is modified by someone else and passed on, we want its recipients to know that what they have is not the original, so that any problems introduced by others will not reflect on the original authors' reputations. Finally, any free program is threatened constantly by software patents. We wish to avoid the danger that redistributors of a free program will individually obtain patent licenses, in effect making the program proprietary. To prevent this, we have made it clear that any patent must be licensed for everyone's free use or not licensed at all. The precise terms and conditions for copying, distribution and modification follow. GNU GENERAL PUBLIC LICENSE TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION 0. This License applies to any program or other work which contains a notice placed by the copyright holder saying it may be distributed under the terms of this General Public License. The "Program", below, refers to any such program or work, and a "work based on the Program" means either the Program or any derivative work under copyright law: that is to say, a work containing the Program or a portion of it, either verbatim or with modifications and/or translated into another language. (Hereinafter, translation is included without limitation in the term "modification".) Each licensee is addressed as "you". Activities other than copying, distribution and modification are not covered by this License; they are outsid
full kernel history, in patchset format
i've converted the Linux kernel CVS tree into 'flat patchset' format, which gave a series of 28237 separate patches. (Each patch represents a changeset, in the order they were applied. I've used the cvsps utility.) the history data starts at 2.4.0 and ends at 2.6.12-rc2. I've included a script that will apply all the patches in order and will create a pristine 2.6.12-rc2 tree. it needed many hours to finish, on a very fast server with tons of RAM, and it also needed a fair amount of manual work to extract it and to make it usable, so i guessed others might want to use the end result as well, to try and generate large GIT repositories from them (or to run analysis over the patches, etc.). the patches contain all the existing metadata, dates, log messages and revision history. (What i think is missing is the BK tree merge information, but i'm not sure we want/need to convert them to GIT.) it's a 136 MB tarball, which can be downloaded from: http://kernel.org/pub/linux/kernel/people/mingo/Linux-2.6-patchset/ the ./generate-2.6.12-rc2 script generates the 2.6.12-rc2 tree into linux/, from scratch. (No pre-existing kernel is needed, as 2.patch generates the full 2.4.0 kernel tree.) The patching takes a couple of minutes to finish, on a fast box. below i've attached a sample patch from the series. note: i kept the patches the cvsps utility generated as-is, to have a verifiable base to work on. There were a very small amount of deltas missed (about a dozen), probably resulting from CVS related errors, these are included in the diff-CVS-to-real patch. Also, the patch format cannot create the Documentation/logo.gif file, so the script does this too - just to be able to generate a complete 2.6.12-rc2 tree that is byte-for-byte identical to the real thing. Ingo - PatchSet 1234 Date: 2002/04/11 18:29:07 Author: viro Branch: HEAD Tag: (none) Log: [PATCH] crapectomy in include/linux/nfsd/syscall.h Removes an atavism in declaration of sys_nfsservctl() - sorry, I should've remove that junk when cond_syscall() thing was done. BKrev: 3cb5c7e3phTYgiz1YLsjQ_McTo9pOQ Members: ChangeSet:1.1234->1.1235 include/linux/nfsd/syscall.h:1.3->1.4 Index: linux/include/linux/nfsd/syscall.h === RCS file: /home/mingo/linux-CVS/linux/include/linux/nfsd/syscall.h,v retrieving revision 1.3 retrieving revision 1.4 diff -u -r1.3 -r1.4 --- linux/include/linux/nfsd/syscall.h 15 Mar 2002 23:06:06 - 1.3 +++ linux/include/linux/nfsd/syscall.h 11 Apr 2002 17:29:07 - 1.4 @@ -132,11 +132,7 @@ /* * Kernel syscall implementation. */ -#if defined(CONFIG_NFSD) || defined(CONFIG_NFSD_MODULE) extern asmlinkage long sys_nfsservctl(int, struct nfsctl_arg *, void *); -#else -#define sys_nfsservctl sys_ni_syscall -#endif extern int exp_addclient(struct nfsctl_client *ncp); extern int exp_delclient(struct nfsctl_client *ncp); extern int exp_export(struct nfsctl_export *nxp); - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: SHA1 hash safety
* David Lang <[EMAIL PROTECTED]> wrote: > this issue was raised a few days ago in the context of someone > tampering with the files and it was decided that the extra checks were > good enough to prevent this (at least for now), but what about > accidental collisions? > > if I am understanding things right the objects get saved in the > filesystem in filenames that are the SHA1 hash. of two legitimate > files have the same hash I don't see any way for both of them to > exist. > > yes the risk of any two files having the same has is low, but in the > earlier thread someone chimed in and said that they had two files on > their system that had the same hash.. you can add -DCOLLISION_CHECK to Makefile:CFLAGS to turn on collision checking (disabled currently). If there indeed exist two files that have different content but the same hash, could someone send those two files? Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Merge with git-pasky II.
* David Woodhouse <[EMAIL PROTECTED]> wrote: > On Fri, 2005-04-15 at 11:36 +0200, Ingo Molnar wrote: > > do such cases occur frequently? In the kernel at least it's not too > > typical. > > Isn't it? I thought it was a fairly accurate representation of the > process "I make a whole bunch of changes to files I maintain, pulling > from Linus while occasionally asking him to pull from my tree. > Sometimes my files are changed by someone else in Linus' tree, and > sometimes I change files that I don't actually own.". but the specific scenario you described would require _Linus'_ tree to be in limbo for a long time, and have uncommitted half-done edits. I.e.: (A1B2)--(A2B2)--(A2'B3) / \ /\ /\ / \ (A1B1) X (...) \/ \ / \ / \/ (A2B1)--(A2B2)--(A3B2') in the above scenario Linus' tree needs to 'cross' with a maintainer's tree. (maintainer's tree wont cross with another maintainer's tree, as maintainer-to-maintainer merges rare.) but for the scenario to occur, i think there needs to be a prolongued "limbo" period in Linus' tree for a 'crossing' to happen. But Linus' merges are typically almost atomic: they are done then they are pushed out. It's definitely not in the 'days, sometimes weeks' timescale as maintainer trees are. so for the scenario to occur, a maintainer, from whom Linus has just pulled an update and Linus is merging the tree manually without comitting, has to pull a file from the earlier Linus tree, and then Linus has to modify that same file again. This does not seem to be a common scenario. so i think to avoid the scenario, maintainers should not pull from each other - they should only pull/push to/from Linus' tree. Maybe this is an unacceptable limitation? Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Merge with git-pasky II.
* David Woodhouse <[EMAIL PROTECTED]> wrote: > Consider a simple repository which contains two files A and B. We > start off with the first version of each ('A1B1'), and the owner of > each file takes a branch and modifies their own file. There is > cross-pulling between the two, and then each modifies the _other's_ > file as well as their own... > >(A1B2)--(A2B2)--(A2'B3) > / \ /\ >/\ / \ > (A1B1) X (...) >\/ \ / > \ / \/ >(A2B1)--(A2B2)--(A3B2') > > Now, we're trying to merge the two branches. It appears that the most > useful common ancestor to use for a three-way merge of file A is the > version from tree 'A2B1', while the most useful common ancestor for > merging file B is that in 'A1B2'. do such cases occur frequently? In the kernel at least it's not too typical. Would it be a problem to go for the simple solution of using (A1B1) as the common ancestor (based on the tree graph), and then to do a 3-way merge of all changes from that point on? Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: another perspective on renames.
* Paul Jackson <[EMAIL PROTECTED]> wrote: > Scott wrote: > > Anyway, maybe it's worth thinking a little about an SCM in which this is a > > feature, instead of (or in addition to) automatically assuming this is a > > bug we need to add infrastructure to work around. > > Agreed. > > To me, the main purpose in tracking renames is to obtain a deeper > history of the line-by-line changes in a file. > > ==> But that doesn't seem relevant here. > > Last I looked, git has no such history. A given file contents is the > indivisable atom of the git world, with no fine structure. > > This is quite unlike classic SCM's, built on file formats that track > source lines, not files, as the atomic unit. i believe the fundamental thing to think about is not file or line or namespace, but 'tracking developer intent'. While keeping in mind that GIT is not an SCM, all SCMs boil down to this single thing: being able to track what the developer did and why he did it - to be a useful tool later on. (SCMs are for humans with bad limitations, who have this fundamental design bug and keep forgetting things.) the basic question is, how much to track. The most extreme form of tracking (just for the sake of visualizing it) would be to have an eye-position recognizing software attached to a webcam looking at the developer, and then exactly mapping what he did, how long did he look at one particular line of code and exactly what did he type when doing that. [ Perhaps also a thought-reader module in addition, once one is available. (combined with another module that removes all the swearing)] but i think Linus is on the right track to suggest that "the file names dont matter all that much, it's all about the content". Global diffs might track most types of plain renames, and if it gets it wrong - do we care? Misdetection of renames can happen, but realistically only with small files and trivial code, which wont have alot of history. The only serious type of misdetection would be if two large modules in two different places in the namespace happen to have exactly the same content but have a different history (because e.g. they were merged in via two separate trees, one came from one tree, the other from the other tree), and the developer renamed both of them in the same commit: in such a case the global diff would have no way to figure out what the proper thread of history is. But is this a realistic scenario? If the two files are nontrivial and have the same content, why werent they merged in the namespace in the first place? the moment we allow 'namespace' into the picture, things get complex and ugly. Directory recursion is already a complexity that would have been nice to avoid. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Handling renames.
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > [...] Ie if you notice a rename, you first commit the rename (and you > can _see_ it's a rename, since the object didn't change, and the sha1 > stayed the same, which in git-speak means that it is the same object, > ie that _is_ a rename as far as git is concerned), and then you create > the "this is the data that changed" as a _second_ commit. ok, i accept your point of not putting this into such a low level as the object abstraction. Was a bad idea. but i dont think the above would be enough: there can be renames of objects that have the same sha1 hash as other objects in the same tree, and developers want to track individual objects, regardless of whether other files share the same content. So some formal operation would be needed to signal renames - e.g. to embedd it in the commit object, per David's suggestion. The thing i tried to avoid was to list long filenames in the commit (because of the tree hierarchy we'd need to do tree-absolute pathnames or something like that, and escape things, and do lookups - duplicating a VFS which is quite bad) - it would be better to identify the rename source and target via its tree object hash and its offset within that tree. Such information could be embedded in the commit object just fine. Something like: me bb95843a5a0f397270819462812735ee29796fb4 tree 1756b578489f93999ded68ae347bef7d6063101c parent 9f02d4d233223462d3f6217b5837b786e6286ba4 author committer rename 39021759c903a943a33a28cfbd5070d36d851581 15234 9f02d4d233223462d3f6217b5837b786e6286ba4 16163 ? Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Handling renames.
* David Woodhouse <[EMAIL PROTECTED]> wrote: > I've been looking at tracking file revisions. One proposed solution > was to have a separate revision history for individual files, with a > new kind of 'filecommit' object which parallels the existing 'commit', > referencing a blob instead of a tree. Then trees would reference such > objects instead of referencing blobs directly. > > I think that introduces a lot of redundancy though, because 99% of the > time, the revision history of the individual file is entirely > reproducible from the revision history of the tree. It's only when > files are renamed that we fall over -- and I think we can handle > renames fairly well if we just log them in the commit object. how about the following structure: - tree_new ---> - tree_old ---> rename_commit -> blob the rename_commit object just contains a pointer to the file content blob. If a rename happens then the old tree references the rename_commit object (instead of the blob), and the new tree references it too. This way there's no need to list the rename via namespace means: if a tree entry points to a rename_commit object then a rename happened and the rename_commit object is looked up in the old tree to get the old name. there's no redundancy caused by this method: only renames (which are rare) go through the rename_commit redirection. (to speed up the lookup the rename_commit object could cache the offset of the two names within their tree objects.) Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] git: fix memory leak #2 in read-cache.c
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > In other words, if the common case is that we update a couple of > entries in the active cache, we actually saved 1.6MB (+ malloc > overhead for the 17 _thousand_ allocations) by my approach. > > And the leak? There's none. We never actually update an existing entry > that was allocated with malloc(), unless the user does something > stupid. In other words, the only case where there is a "leak" is when > the user does something like > > update-cache file file file file file file .. > > with the same file listed several times. fair enough - as long as this is only used in a scripted environment, and not via some library and not within a repository server, web backend, etc. Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] git: fix memory leak #3 in update-cache.c
the patch below fixes a third memory leak in update-cache.c. (the mmap-ed file needs to be unmapped too) Ontop of the previous update-cache.c patches. Ingo Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- update-cache.c.orig +++ update-cache.c @@ -32,6 +32,8 @@ static int index_fd(const char *path, in if (!out || (int)(long)in == -1) { free(out); free(metadata); + if ((int)(long)in != -1 && size) + munmap(in, size); return -1; } @@ -66,6 +68,8 @@ static int index_fd(const char *path, in free(out); free(metadata); + if (size) + munmap(in, size); return ret; } - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [patch] git: fix memory leak #2 in read-cache.c
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > this patch fixes a memory leak in read-cache.c: when there's cache > entry collision we should free the previous one. > + free(active_cache[pos]); > active_cache[pos] = ce; i'm having second thoughs about this one: active_cache entries are not always malloc()-ed - e.g. read_cache() will construct them from the mmap() of the index file. Which must not be free()d! one safe solution would be to malloc() all these entries and copy them over from the index file? Slightly slower but safer and free()-able when update-cache.c notices a collision. The (tested) patch below does this. this would also make Martin Schlemmer's update-cache.c fix safe. (without this second patch, free(active_cache[pos]) might crash, and that crash is would possibly be remote exploitable via a special repository that tricks the index file to look in a certain way.) Ingo Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- read-cache.c.orig +++ read-cache.c @@ -453,10 +453,17 @@ int read_cache(void) offset = sizeof(*hdr); for (i = 0; i < hdr->entries; i++) { - struct cache_entry *ce = map + offset; + struct cache_entry *ce = map + offset, *tmp; offset = offset + ce_size(ce); - active_cache[i] = ce; + + tmp = malloc(ce_size(ce)); + if (!tmp) + return error("malloc failed"); + memcpy(tmp, ce, ce_size(ce)); + active_cache[i] = tmp; } + munmap(map, size); + return active_nr; unmap: - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] git: fix memory leak in write-tree.c
this patch fixes a memory leak in write-tree.c's write_tree() function - we didnt free the temporary output buffer. Depending on the size of the tree written out this could leak a significant amount of RAM. Ingo Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- write-tree.c.orig +++ write-tree.c @@ -33,12 +33,12 @@ static int write_tree(struct cache_entry { unsigned char subdir_sha1[20]; unsigned long size, offset; - char *buffer; + char *buffer0, *buffer; int i, nr; /* Guess at some random initial size */ size = 8192; - buffer = malloc(size); + buffer0 = buffer = malloc(size); offset = ORIG_OFFSET; nr = 0; @@ -97,6 +97,8 @@ static int write_tree(struct cache_entry offset -= i; write_sha1_file(buffer, offset, returnsha1); + free(buffer0); + return nr; } - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] git: clean up add_file_to_cache() in update-cache.c
this patch cleans up add_file_to_cache() to free up all memory it allocates. This has no significance right now as the only user of add_file_to_cache() die()s immediately in the 'leak' paths - but if the function is ever used without dying then this uncleanliness could lead to a memory leak. Ingo Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- update-cache.c.orig +++ update-cache.c @@ -120,10 +120,17 @@ static int add_file_to_cache(char *path) ce->st_size = st.st_size; ce->namelen = namelen; - if (index_fd(path, namelen, ce, fd, &st) < 0) + if (index_fd(path, namelen, ce, fd, &st) < 0) { + free(ce); return -1; + } - return add_cache_entry(ce, allow_add); + if (add_cache_entry(ce, allow_add)) { + free(ce); + return -1; + } + + return 0; } static int match_data(int fd, void *buffer, unsigned long size) - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] git: fix memory leaks in update-cache.c
this patch fixes two memory leaks in update-cache.c: we didnt free the temporary input and output buffers used for compression. Ingo Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- update-cache.c.orig +++ update-cache.c @@ -23,13 +23,17 @@ static int index_fd(const char *path, in void *metadata = malloc(namelen + 200); void *in; SHA_CTX c; + int ret; in = ""; if (size) in = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0); close(fd); - if (!out || (int)(long)in == -1) + if (!out || (int)(long)in == -1) { + free(out); + free(metadata); return -1; + } memset(&stream, 0, sizeof(stream)); deflateInit(&stream, Z_BEST_COMPRESSION); @@ -58,7 +62,12 @@ static int index_fd(const char *path, in SHA1_Update(&c, out, stream.total_out); SHA1_Final(ce->sha1, &c); - return write_sha1_buffer(ce->sha1, out, stream.total_out); + ret = write_sha1_buffer(ce->sha1, out, stream.total_out); + + free(out); + free(metadata); + + return ret; } /* - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] git: fix 1-byte overflow in show-files.c
this patch fixes a 1-byte overflow in show-files.c (looks narrow is is probably not exploitable). A specially crafted db object (tree) might trigger this overflow. 'fullname' is an array of 4096+1 bytes, and we do readdir(), which produces entries that have strings with a length of 0-255 bytes. With a long enough 'base', it's possible to construct a tree with a name in it that has directory whose name ends precisely at offset 4095. At that point this code: case DT_DIR: memcpy(fullname + baselen + len, "/", 2); will attempt to append a "/" string to the directory name - resulting in a 1-byte overflow (a zero byte is written to offset 4097, which is outside the array). Ingo Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- show-files.c.orig +++ show-files.c @@ -49,7 +49,7 @@ static void read_directory(const char *p if (dir) { struct dirent *de; - char fullname[MAXPATHLEN + 1]; + char fullname[MAXPATHLEN + 2]; // +1 byte for trailing slash memcpy(fullname, base, baselen); while ((de = readdir(dir)) != NULL) { - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] git: fix memory leak #2 in read-cache.c
this patch fixes a memory leak in read-cache.c: when there's cache entry collision we should free the previous one. Ingo Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- read-cache.c.orig +++ read-cache.c @@ -369,6 +369,7 @@ int add_cache_entry(struct cache_entry * /* existing match? Just replace it */ if (pos >= 0) { + free(active_cache[pos]); active_cache[pos] = ce; return 0; } - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] git: fix memory leaks in read-cache.c
this patch fixes a common and a rare memory leak in read-cache.c. Ingo Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- read-cache.c.orig +++ read-cache.c @@ -226,8 +226,11 @@ int write_sha1_file(char *buf, unsigned SHA1_Update(&c, compressed, size); SHA1_Final(sha1, &c); - if (write_sha1_buffer(sha1, compressed, size) < 0) + if (write_sha1_buffer(sha1, compressed, size) < 0) { + free(compressed); return -1; + } + free(compressed); if (returnsha1) memcpy(returnsha1, sha1, 20); return 0; - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] cleanup: read_sha1_file() -> malloc_read_sha1_file()
this patch renames read_sha1_file() to malloc_read_sha1_file(). There were a handful of memory-leaks related to read_sha1_file(), and some of those could possibly have been found sooner if the name indicated that an implicit malloc() is occurs within read_sha1_file(). This patch is ontop of the previous patches i sent. Ingo Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- cat-file.c.orig +++ cat-file.c @@ -14,7 +14,7 @@ int main(int argc, char **argv) if (argc != 3 || get_sha1_hex(argv[2], sha1)) usage("cat-file [-t | tagname] "); - buf = read_sha1_file(sha1, type, &size); + buf = malloc_read_sha1_file(sha1, type, &size); if (!buf) die("cat-file %s: bad file", argv[2]); if (!strcmp("-t", argv[1])) { --- merge-tree.c.orig +++ merge-tree.c @@ -11,7 +11,7 @@ static struct tree_entry *read_tree(unsi { char type[20]; unsigned long size; - void *buf = read_sha1_file(sha1, type, &size); + void *buf = malloc_read_sha1_file(sha1, type, &size); struct tree_entry *ret = NULL, **tp = &ret; if (!buf || strcmp(type, "tree")) --- diff-tree.c.orig +++ diff-tree.c @@ -62,7 +62,7 @@ static void show_file(const char *prefix char *newbase = malloc_base(base, path, strlen(path)); void *tree; - tree = read_sha1_file(sha1, type, &size); + tree = malloc_read_sha1_file(sha1, type, &size); if (!tree || strcmp(type, "tree")) die("corrupt tree sha %s", sha1_to_hex(sha1)); @@ -164,10 +164,10 @@ static int diff_tree_sha1(const unsigned char type[20]; int retval; - tree1 = read_sha1_file(old, type, &size1); + tree1 = malloc_read_sha1_file(old, type, &size1); if (!tree1 || strcmp(type, "tree")) die("unable to read source tree (%s)", sha1_to_hex(old)); - tree2 = read_sha1_file(new, type, &size2); + tree2 = malloc_read_sha1_file(new, type, &size2); if (!tree2 || strcmp(type, "tree")) die("unable to read destination tree (%s)", sha1_to_hex(new)); retval = diff_tree(tree1, size1, tree2, size2, base); --- show-diff.c.orig +++ show-diff.c @@ -25,7 +25,7 @@ static void show_diff_empty(struct cache int lines=0; unsigned char type[20], *p, *end; - old = read_sha1_file(ce->sha1, type, &size); + old = malloc_read_sha1_file(ce->sha1, type, &size); if (size > 0) { int startline = 1; int c = 0; @@ -99,7 +99,7 @@ int main(int argc, char **argv) if (silent) continue; - new = read_sha1_file(ce->sha1, type, &size); + new = malloc_read_sha1_file(ce->sha1, type, &size); show_differences(ce->name, new, size); free(new); } --- read-cache.c.orig +++ read-cache.c @@ -183,7 +183,7 @@ void * unpack_sha1_file(void *map, unsig return buf; } -void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size) +void * malloc_read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size) { unsigned long mapsize; void *map, *buf; --- update-cache.c.orig +++ update-cache.c @@ -141,7 +141,7 @@ static int compare_data(struct cache_ent unsigned long size; char type[20]; - buffer = read_sha1_file(ce->sha1, type, &size); + buffer = malloc_read_sha1_file(ce->sha1, type, &size); if (buffer) { if (size == expected_size && !strcmp(type, "blob")) match = match_data(fd, buffer, size); --- ls-tree.c.orig +++ ls-tree.c @@ -11,7 +11,7 @@ static int list(unsigned char *sha1) unsigned long size; char type[20]; - buffer = read_sha1_file(sha1, type, &size); + buffer = malloc_read_sha1_file(sha1, type, &size); if (!buffer) die("unable to read sha1 file"); if (strcmp(type, "tree")) --- checkout-cache.c.orig +++ checkout-cache.c @@ -72,7 +72,7 @@ static int write_entry(struct cache_entr long wrote; char type[20]; - new = read_sha1_file(ce->sha1, type, &size); + new = malloc_read_sha1_file(ce->sha1, type, &size); if (!new || strcmp(type, "blob")) { if (new) free(new); --- cache.h.orig +++ cache.h @@ -95,7 +95,7 @@ extern int write_sha1_buffer(const unsig /* Read and unpack a sha1 file into memory, write memory to a sha1 file */ extern void * map_sha1_file(const unsigned char *sha1, unsigned
[patch] git: fix overflow in update-cache.c
this patch fixes a 1-byte overflow in update-cache.c (probably not exploitable). A specially crafted db object might trigger this overflow. the bug is that normally the 'type' field is parsed by read_sha1_file(), via: if (sscanf(buffer, "%10s %lu", type, size) != 2) i.e. 0-10 long strings, which take 1-11 bytes of space. Normally the type strings are stored in char [20] arrays, but in update-cache.c that is char [10], so a 1 byte overflow might occur. This should not happen with a 'friendly' DB, as the longest type string ("commit") is 7 bytes long. The fix is to use the customary char [20]. (someone might want to clean those open-coded constants up with a TYPE_LEN define, they do tend to cause problems like this. I'm not against open-coded constants (they make code much more readable), but for fields that get filled in from possibly hostile objects this is playing with fire.) hey, this might be the first true security fix for GIT? ;-) Ingo Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- update-cache.c.orig +++ update-cache.c @@ -139,7 +139,7 @@ static int compare_data(struct cache_ent if (fd >= 0) { void *buffer; unsigned long size; - char type[10]; + char type[20]; buffer = read_sha1_file(ce->sha1, type, &size); if (buffer) { - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] git: fix memory leak in show-diff.c
this patch fixes a memory leak in show-diff.c. Ingo Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- show-diff.c.orig +++ show-diff.c @@ -53,6 +53,7 @@ static void show_diff_empty(struct cache printf("\n"); fflush(stdout); } + free(old); } int main(int argc, char **argv) - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] git: report parse_commit() errors in rev-tree.c
make actual use of the parse_commit() return value and print a warning, instead of silently ignoring it. Should never trigger on a valid DB. (alternatively we might use a die() in the sanity check place and could remove all the return code handling?) Ingo Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- rev-tree.c.orig +++ rev-tree.c @@ -64,6 +64,7 @@ static unsigned long parse_commit_date(c static int parse_commit(unsigned char *sha1) { struct revision *rev = lookup_rev(sha1); + int ret = 0; if (!(rev->flags & SEEN)) { void *buffer, *bufptr; @@ -81,13 +82,13 @@ static int parse_commit(unsigned char *s bufptr += 46; /* "tree " + "hex sha1" + "\n" */ while (!memcmp(bufptr, "parent ", 7) && !get_sha1_hex(bufptr+7, parent)) { add_relationship(rev, parent); - parse_commit(parent); + ret |= parse_commit(parent); bufptr += 48; /* "parent " + "hex sha1" + "\n" */ } rev->date = parse_commit_date(bufptr); free(buffer); } - return 0; + return ret; } static void read_cache_file(const char *path) @@ -208,7 +209,8 @@ int main(int argc, char **argv) } if (nr >= MAX_COMMITS || get_sha1_hex(arg, sha1[nr])) usage("rev-tree [--edges] [--cache ] []"); - parse_commit(sha1[nr]); + if (parse_commit(sha1[nr])) + fprintf(stderr, "warning: rev-tree: bad commit!\n"); nr++; } - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] git: fix rare memory leak in rev-tree.c
this patch fixes a rare memory leak in rev-tree.c. Ingo Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- rev-tree.c.orig +++ rev-tree.c @@ -73,8 +73,11 @@ static int parse_commit(unsigned char *s rev->flags |= SEEN; buffer = bufptr = read_sha1_file(sha1, type, &size); - if (!buffer || strcmp(type, "commit")) + if (!buffer || strcmp(type, "commit")) { + if (buffer) + free(buffer); return -1; + } bufptr += 46; /* "tree " + "hex sha1" + "\n" */ while (!memcmp(bufptr, "parent ", 7) && !get_sha1_hex(bufptr+7, parent)) { add_relationship(rev, parent); - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] git: fix memory leaks in read-tree.c
this patch fixes one common, and 4 rare memory leaks in read-tree.c. Ingo Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- read-tree.c.orig +++ read-tree.c @@ -23,23 +23,27 @@ static int read_one_entry(unsigned char static int read_tree(unsigned char *sha1, const char *base, int baselen) { - void *buffer; + void *buffer0, *buffer; unsigned long size; char type[20]; - buffer = read_sha1_file(sha1, type, &size); + buffer0 = buffer = read_sha1_file(sha1, type, &size); if (!buffer) return -1; - if (strcmp(type, "tree")) + if (strcmp(type, "tree")) { + free(buffer); return -1; + } while (size) { int len = strlen(buffer)+1; unsigned char *sha1 = buffer + len; char *path = strchr(buffer, ' ')+1; unsigned int mode; - if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1) + if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1) { + free(buffer0); return -1; + } buffer = sha1 + 20; size -= len + 20; @@ -53,13 +57,19 @@ static int read_tree(unsigned char *sha1 newbase[baselen + pathlen] = '/'; retval = read_tree(sha1, newbase, baselen + pathlen + 1); free(newbase); - if (retval) + if (retval) { + free(buffer0); return -1; + } continue; } - if (read_one_entry(sha1, base, baselen, path, mode) < 0) + if (read_one_entry(sha1, base, baselen, path, mode) < 0) { + free(buffer0); return -1; + } } + free(buffer0); + return 0; } - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] git: cleanup in ls-tree.c
cleanup: this patch adds a free() to ls-tree.c. (Technically it's not a memory leak yet because the buffer is allocated once by the function and then the utility exits - but it's a tad cleaner to not leave such assumptions in the code, so that if someone reuses the function (or extends the utility to include a loop) the uncleanliness doesnt develop into a real memory leak.) Ingo Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- ls-tree.c.orig +++ ls-tree.c @@ -33,6 +33,8 @@ static int list(unsigned char *sha1) type = S_ISDIR(mode) ? "tree" : "blob"; printf("%03o\t%s\t%s\t%s\n", mode, type, sha1_to_hex(sha1), path); } + free(buffer); + return 0; } - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] git: fix memory leak #2 in checkout-cache.c
this patch fixes another (very rare) memory leak in checkout-cache. Ingo Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- checkout-cache.c.orig +++ checkout-cache.c @@ -74,6 +74,8 @@ static int write_entry(struct cache_entr new = read_sha1_file(ce->sha1, type, &size); if (!new || strcmp(type, "blob")) { + if (new) + free(new); return error("checkout-cache: unable to read sha1 file of %s (%s)", ce->name, sha1_to_hex(ce->sha1)); } - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
[patch] git: fix memory leak in checkout-cache.c
this patch fixes a memory leak in checkout-cache. Ingo Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]> --- checkout-cache.c.orig +++ checkout-cache.c @@ -48,6 +48,7 @@ static void create_directories(const cha buf[len] = 0; mkdir(buf, 0755); } + free(buf); } static int create_file(const char *path, unsigned int mode) - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
cache-cold repository performance
* Ingo Molnar <[EMAIL PROTECTED]> wrote: > i'd be surprised if it was twice as fast - cache-cold linear checkouts > are _seek_ limited, and it doesnt matter whether after a 1-2 msec > track-to-track disk seek the DMA engine spends another 30 microseconds > DMA-ing 60K uncompressed data instead of 30K compressed... (there are > other factors, but this is the main thing.) i've benchmarked cache-cold compressed vs. uncompressed performance, to shed some more light on the performance differences between flat and compressed repositories. i did alot of testing, and i primarily concentrated on being able to _trust_ the benchmark results, not to generate some quick numbers. The major problem was that the timing of the reads associated with 'checking out a large tree' is very unstable, even on a completely isolated testsystem with very common (and predictable) IO hardware. the content i tested was a vanilla 2.6.10 kernel tree, with 19042 files in it, taking 246 MB uncompressed, and 110 MB compressed (via gzip -9). Average file size is 13.2 KB uncompressed, 5.9 KB compressed. Firstly, the timings are very sensitive to the way the tree was created. To have a 'fair' on-disk layout the trees have to be created in an identical fashion: e.g. it is not valid to copy the uncompressed tree and run gzip over it - that will create a 'sparse' on-disk layout penalizing the compressed layout and making it 30% slower than the uncompressed layout! I first created the two trees, then i "cp -a"-ed them over into a new directory one after each other, so that they get on similar on-disk positions as well. I also created 2 more pairs of such trees to make sure disk layout is fair. all timings were taken fresh after reboot, on a UP 1 GB RAM Athlon64 3200+, using a large, top of the line IDE disk. The kernel was 2.6.12-rc2, the filesystem was ext3 with enough free space to not be fragmented, both noatime and nodiratime was specified so that no write activities whatever occur during the 'checkout'. the operation timed was a simple: time find . -type f | xargs cat > /dev/null done in the root of the given tree. This generates the very same readonly IO pattern for each test. I've run the tests 10 times (i.e. have done 10 fresh reboots), but after every reboot i permutated the order of trees tested - to make sure there is no interaction between trees. (there was no interaction) here are the raw numbers, elapsed real time in seconds: flat-1: 29.7 29.5 29.4 29.4 29.5 29.5 29.7 29.6 29.4 29.6 29.5 29.4: 29.5 gzip-1: 41.2 40.9 40.7 40.7 40.5 41.7 41.0 40.3 40.6 40.8 40.8 40.9: 40.8 flat-2: 28.0 28.2 27.7 27.9 27.8 27.9 27.7 27.9 27.9 28.1 27.9 28.0: 27.9 gzip-2: 27.2 27.4 27.4 27.2 27.2 27.2 27.2 27.2 27.1 27.3 27.2 27.4: 27.2 flat-3: 27.0 27.8 27.6 27.7 27.8 27.8 27.8 27.7 27.8 27.6 27.8 27.8: 27.6 gzip-3: 25.8 26.8 26.6 26.5 26.5 26.5 26.6 26.4 26.5 26.7 26.6 26.7: 26.5 The final column is the average. (Standard deviation is below 0.1 sec, less than 0.3%.) flat-1 is the original tree, created via tar. gzip-1 is a cp -a copy of it, per-file compressed afterwards. flat-2 is a cp -a copy of flat-1, gzip-2 is a cp -a copy of gzip-1. flat-3/gzip-3 are cp -a copies of flat-2/gzip-2. note that gzip-1 is ~40% slower due to the 'sparse layout', so its results approximate a repository with 'bad' file layout. I'd not expect GIT repositories to have such a layout normally, so we can disregard it. flat-2/3 and gzip-2/3 can be directly compared. Firstly, the results show that the on-disk layout cannot be constructed reliably - there's a 1% systematic difference between flat-2 and flat-3, and a 3% systematic difference between gzip-2 and gzip-3 - both systematic errors are larger than the 0.5% standard deviation, so they are not measurement errors but real layout properties of these trees. the most interesting result is that gzip-2 is 2.5% faster than flat-2, and gzip-3 is 4% faster than flat-3. These differences are close to the layout-related systematic error, but slightly above it, so i'd conclude that a compressed repository is 3% faster on this hardware. (since these results were in line with my expectations i double-checked everything again and did another 10 reboot tests - same results.) conclusion [*]: there's a negligible cache-cold performance hit from using an uncompressed repository, because cache-cold performance is dominated by number of seeks, which is almost identical in the two cases. Ingo [*] lots of conditionals apply: these werent flat/compressed GIT repositories (although they were quite similar to it), nor was the GIT workload measured (although the one measured should be quite close to it). - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Index/hash order
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > > I've run a few tests, just to get a few numbers of the overhead > > involved. I used the last ~8,000 changesets from the BKCVS kernel > > repository. With cold cache, a checkout from cold cache takes about > > 250 seconds on my laptop. I don't have git numbers, but a mere copy > > of the kernel tree needs 40 seconds. > > I will bet you that a git checkout is _faster_ than a kernel source > tree copy. The time will be dominated by the IO costs (in particular > the read costs), and the IO costs are lower thanks to compression. So > I think that the cold-cache case will beat your 40 seconds by a clear > margin. It generally compresses to half the size, so 20 seconds is not > impossible (although seek costs would tend to stay constant, so I'd > expect it to be somewhere in between the two). i'd be surprised if it was twice as fast - cache-cold linear checkouts are _seek_ limited, and it doesnt matter whether after a 1-2 msec track-to-track disk seek the DMA engine spends another 30 microseconds DMA-ing 60K uncompressed data instead of 30K compressed... (there are other factors, but this is the main thing.) Ingo - To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html