'git describe' is very slow on development trees with lots of commits

2012-10-27 Thread Ingo Molnar

(Cc:-ed the Git development list.)

* David Ahern  wrote:

> PERF-VERSION-GEN and specifically the git commands are the 
> cause of more delay than the config checks, especially when 
> doing the build in a VM with the kernel source on an NFS 
> mount.

Yes, I have noticed that too.

So, the problem is that we use 'git describe' on the kernel tree 
to generate the version string, which is very, very slow if we 
are far away from any tagged release - which is the case for the 
-tip tree:

 comet:~/tip> perf stat --null --repeat 3 git describe
 v3.7-rc2-2007-g83e8223
 v3.7-rc2-2007-g83e8223
 v3.7-rc2-2007-g83e8223

'git describe' is much faster if we are on or near to a tag:

 $ git checkout v3.6
 $ perf stat --null --repeat 3 git describe
 v3.6
 v3.6
 v3.6

 Performance counter stats for 'git describe' (3 runs):

   0.020171640 seconds time elapsed 
 ( +-  3.64% )

 $ git checkout b34e5f55a1e6

 $ perf stat --null --repeat 3 git describe
 v3.6-41-gb34e5f5
 v3.6-41-gb34e5f5
 v3.6-41-gb34e5f5

 Performance counter stats for 'git describe' (3 runs):

   0.155603676 seconds time elapsed 
 ( +-  0.23% )

The cost on this pretty fast machine is about 1 msecs per commit 
- which adds up to about 2.5 seconds during much of the 
development cycle.

So maybe we should be using a different version string, for 
example, instead of:

 v3.7-rc2-2007-g83e8223

this would be perfectly fine:

 v3.7-rc2-g83e8223

the 'commit count' is informative but not essential - and in 
counting the number of off-tag commits is where much of the 
overhead is:

#
# Overhead  Command   Shared Object  
Symbol
#   ...  ..  
..
#
39.79%  git  libz.so.1.2.5   [.] 0xc1fe 
   
26.39%  git  libz.so.1.2.5   [.] inflate
   
22.42%  git  git [.] 0x0009bd1e 
   
 2.99%  git  libz.so.1.2.5   [.] adler32
   
 1.23%  git  libc-2.15.so[.] _int_malloc
   
 0.72%  git  libc-2.15.so[.] __GI_strtoull_l_internal   
   
 0.67%  git  libc-2.15.so[.] _int_free  
   
 0.62%  git  libc-2.15.so[.] malloc_consolidate 
   
 0.54%  git  [kernel.kallsyms]   [k] clear_page_c   
   
 0.32%  git  [kernel.kallsyms]   [k] page_fault 
   

So by switching to the shorter version string that still embedds 
the tag and the exact sha1 we'd be able to run this script a 
*lot* faster.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] tile: support GENERIC_KERNEL_THREAD and GENERIC_KERNEL_EXECVE

2012-10-23 Thread Ingo Molnar

* Linus Torvalds  wrote:

> On Wed, Oct 24, 2012 at 12:25 AM, Thomas Gleixner  wrote:
> >>
> >> It is spelled:
> >>
> >>   git notes add -m  SHA1
> >
> > Cool!
> 
> Don't use them for anything global.
> 
> Use them for local codeflow, but don't expect them to be 
> distributed. It's a separate "flow", and while it *can* be 
> distributed, it's not going to be for the kernel, for example. 
> So no, don't start using this to ack things, because the acks 
> *will* get lost.

I'd also add a small meta argument: that it would be actively 
wrong to *allow* 'belated' acks to be added. In practice acks 
are most useful *before* a commit gets created and they often 
have a mostly buerocratic role afterwards.

So we should encourage timely acks (which actually help 
development), and accept ack-less patches as long as they are 
correct and create no problems. More utility, less buerocracy. 
Incorrect, ack-less patches causing problems will get all the 
flames they deserve.

Thanks,

Ingo
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qgit-0.7

2005-07-10 Thread Ingo Molnar

* Marco Costalba <[EMAIL PROTECTED]> wrote:

> So I cannot reproduce the bug. [...]

weird - i cannot reproduce it either anymore, and annotate works now as 
advertised - it's fast and accurate as far as i checked. But i synced to 
the latest tree meanwhile. Perhaps i had an inconsistent tree?

(i'll keep an eye on this, i've uncommented that printout, so if it ever 
happens again i'll have the data.)

now that everything is working fine, may i suggest improvements? :-) 
Firstly, now i'm listed as the author for most portions of sched.c, 
which is accurate for a fair portion of that, but is only done by qgit 
because i happened to be the author of the first commit. So it would be 
more accurate to denote version 1's author as empty (or with some other, 
nonintrusive string that shows that this file came here due to the 
initial commit)? We dont know the full history yet, because the current 
DB's history starts at 2.6.12-rc2, with a full sched.c file. (I think an 
empty author field would reflect version #1's authorship most 
accurately, and would be the visually least intrusive.)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: qgit-0.7

2005-07-10 Thread Ingo Molnar

* Marco Costalba <[EMAIL PROTECTED]> wrote:

> Here is qgit-0.7, a GUI git viewer.
> 
> you can download from:
> 
> http://prdownloads.sourceforge.net/qgit/qgit-0.7.tar.gz?download
> 
> 
> This time a small changelog, but a lot of work ;-)
> 
> - rewrite of graph drawing
> - start-up loading: switch to use git-rev-list --topo-order
> - final fixes to annotation
> - cache of file lists to speed-up loading of file names
> - added color background on heads

the good news: it's really fast now and very usable for browsing 
changes.  Kudos!

the bad news: except for annotations. I started qgit in the current 
kernel GIT repository, and clicked on the following commit:

  5bbcfd9000887c0da7d57cc7b3ac869fc0dd5aa9

then i clicked on sched.c to see the annotated file. Firstly, it took 
roughly 2 minutes (!) for the annotated sched.c to show up. All the qgit 
windows were fully frozen during that time, no refreshes or anything.  
My kernel tree was fully cached in RAM, so it was pure CPU overhead 
(qgit was taking 99% of CPU time). It is clearly not usable in this 
form.

then the annotations were plain wrong. Almost all lines are attributed 
to Tony Luck, while much of the file comes from the initial repository.  
So something's quite fishy here. Also, a number of lines were attributed 
to 'merge', which isnt very informative.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-viz tool for visualising commit trees

2005-04-21 Thread Ingo Molnar

* Olivier Andrieu <[EMAIL PROTECTED]> wrote:

>  > - naming the boxes by key is quite meaningless. It would be more 
>  >   informative to see the author's email shortcuts in the boxes. Also, it 
>  >   would be nice to see some simple graphical feedback about the size and 
>  >   scope of a changeset, without having to zoom on it.
> 
> That's interesting. What do you mean exactly by scope ?

usually there are two interesting things about a patchset: how many 
lines does it change, and how many files. Patches that change lots of 
files (but change only a couple of lines in every file) have a 'large' 
scope.  Patches that change 1-2 files have a 'small' scope. A pure 
'number of lines changed' metric is useful too. Generally patches that 
have either a large linecount or a large scope are more interesting.  

(I'm not sure how this could be displayed - perhaps the size of the 
rectangle could vary to a certain degree? Perhaps the shape too?  
Something non-numeric, so that one gets immediate visual feedback.)

>  > i guess you know it, and i'm definitely not complaining about prototype 
>  > code, but rendering is quite slow: drawing the 340 changesets in the 
>  > current kernel repository takes 15 seconds on a 2 GHz P4. Drawing the 
>  > full kernel history (63,000 changesets) would take more than 45 minutes 
>  > on this box.
>  > 
>  > the current rate of kernel development is ~2000 changesets per month, so 
>  > drawing the kernel history will get 3 seconds slower every day - it will 
>  > exceed 1 minute in 20 days, so this will become a pressing issue quite 
>  > soon i suspect.
> 
> Right, it is slow. From what I could understand with a bit of 
> profiling, the problem is with the "text" canvas item for the boxes' 
> labels. I guess libgnomecanvas isn't using Pango properly or 
> something: it lookups the font with fontconfig each time I create such 
> an item. I'm not sure what I can do about this.

when the redrawing happens in the visible area then one can see how 
really slow it is: 100-200 msec per rectangle (!).

> It works with Petr Baudis' git-pasky (it calls `git diff'). I don't 
> know how to do that with the canonical git.

ah, ok. I guess it will start working once Petr's changes are merged 
into Linus' tree.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-viz tool for visualising commit trees

2005-04-21 Thread Ingo Molnar

is the 'diff with ancestor' feature supposed to work at this early 
stage? (it just does nothing when i click on it. It correctly offers two 
ancestors for merge points, but does nothing there either.)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-viz tool for visualising commit trees

2005-04-21 Thread Ingo Molnar

another thing, when i 'zoom out' of the graph far away (so that the 
whole graph becomes visible on the screen), i'm getting lots of such 
error messages:

 *** attempt to put segment in horiz list twice
 *** attempt to put segment in horiz list twice
 *** attempt to put segment in horiz list twice
 *** attempt to put segment in horiz list twice
 *** attempt to put segment in horiz list twice

this doesnt seem to impact anything though.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-viz tool for visualising commit trees

2005-04-21 Thread Ingo Molnar

* Olivier Andrieu <[EMAIL PROTECTED]> wrote:

> Yes, git-viz uses the `dot' program from the graphviz package (it's in 
> Fedora Extra too I believe).

ah - that resolved all issues and i'm now running git-viz without any 
problems.

I just checked how the kernel repository looks like with it, and i'm 
impressed! The GUI is top-notch, and the whole graph output and 
navigation is very mature visually. Kudos!

a couple of suggestions that are in the 'taste' category:

- isnt left-to-right layout the more natural thing instead of top-down 
  (as it aligns with the reading direction)? It's selectable in the 
  preferences, but you might want to make it default. OTOH, top-down
  creates a more compressed view of the graph.

- there doesnt seem to be any performance difference between non-colored 
  and colored rendering - so you might as well want to make 'color by 
  author' (or color by branch) the default coloring, instead of 
  uncolored?

- naming the boxes by key is quite meaningless. It would be more 
  informative to see the author's email shortcuts in the boxes. Also, it 
  would be nice to see some simple graphical feedback about the size and 
  scope of a changeset, without having to zoom on it.

i guess you know it, and i'm definitely not complaining about prototype 
code, but rendering is quite slow: drawing the 340 changesets in the 
current kernel repository takes 15 seconds on a 2 GHz P4. Drawing the 
full kernel history (63,000 changesets) would take more than 45 minutes 
on this box.

the current rate of kernel development is ~2000 changesets per month, so 
drawing the kernel history will get 3 seconds slower every day - it will 
exceed 1 minute in 20 days, so this will become a pressing issue quite 
soon i suspect.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-viz tool for visualising commit trees

2005-04-21 Thread Ingo Molnar

* Olivier Andrieu <[EMAIL PROTECTED]> wrote:

>  >  Preprocessor error
>  >  make: *** [viz_style.cmx] Error 2
> 
> That's probably because the configure script didn't find camlp4. 
> Camlp4 is a preprocessor for ocaml, it's needed for compiling this 
> file (viz_style.ml). Camlp4 is built with the ocaml compilers but some 
> package it separately. Try to find and install some ocaml-camlp4 (or 
> camlp4) package and then re-run configure.

ah, ok. I installed camlp4-3.08.3-1, and this also solved the other 
build problem as well that looked like to be a PATH problem.

when i run git-viz in a git-controlled directory, it seems to start up 
fine, but i get an error message: "Could not execute dot". Closing that 
window gives me the ability to do an 'Open', but git-viz does not seem 
to recognize any of my GIT repositories as such. (perhaps there's some 
GIT version dependency? I've got Linus' latest & greatest installed.)
 
> The configure script should signal an error when it doesn't find 
> camlp4, I'll change that.

fyi, it also didnt check for ocaml and lablgtk.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-viz tool for visualising commit trees

2005-04-21 Thread Ingo Molnar

* Olivier Andrieu <[EMAIL PROTECTED]> wrote:

> There, here's a tarball :
>   http://oandrieu.nerim.net/monotone-viz/git-viz-0.1.tar.gz

i'm trying to build it under Fedora Core 4 (devel), and there are two 
problems:

 - the build scripts seem to assume that "." is in PATH (or that the 
   needed viz_style.ml/etc. scripts are in the PATH).

adding "." to the PATH worked around this issue, a number of files built 
fine, but then it hit the next problem:

 ocamlopt.opt  -I /usr/lib/ocaml/lablgtk2 -I glib -I crypto -pp  -c viz_style.ml
 sh: - : invalid option
 Usage:  sh [GNU long option] [option] ...
 sh [GNU long option] [option] script-file ...
 GNU long options:
 [...]
 Preprocessor error
 make: *** [viz_style.cmx] Error 2

i've straced the build, and the bug comes from ocamlopt.opt executing:

  sh -c "-c 'viz_style.ml'"

which confuses the shell.

i've got ocaml-3.08.3-1, lablgtk-2.4.0-2 installed.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [ANNOUNCE] git-pasky-0.6.2 && heads-up on upcoming changes

2005-04-21 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> Yeah, yeah, it looks different from "cvs update", but dammit, wouldn't 
> it be cool to just write "cg-" and see the command choices?  
> Or "cg-up" and get cg-update done for you..

add this line to your ~/.bashrc:

 complete -W "add addremote apply cancel ci commit diff export fork help init 
log ls lsobj lsremote merge pull rm seek status tag track version" git

and you'll get all the argument completions, after "git " (it works on 
arguments, after the space). (you can even customize it to list only the 
ones you typically use, to make the completion faster) This first showed 
up in zsh but now bash knows it too. (see 'Programmable Completions' in 
man bash)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Change "pull" to _only_ download, and "git update"=pull+merge?

2005-04-20 Thread Ingo Molnar

* Petr Baudis <[EMAIL PROTECTED]> wrote:

> Dear diary, on Wed, Apr 20, 2005 at 09:01:57AM CEST, I got a letter
> where Ingo Molnar <[EMAIL PROTECTED]> told me that...
> >  [...]
> >  fatal: unable to execute 'gitmerge-file.sh'
> >  fatal: merge program failed
> 
> Pure stupidity of mine, I forgot to add gitmerge-file.sh to the list of
> scripts which get installed.

another thing is this annoying message:

 rsync: link_stat "/linux/kernel/people/torvalds/git.git/tags" (in pub) 
 failed: No such file or directory (2)
 rsync error: some files could not be transferred (code 23) at 
 main.c(812)
 client: nothing to do: perhaps you need to specify some filenames or 
 the --recursive option?

you said before that it's "harmless", but it's annoying nevertheless as 
one doesnt know for sure whether the pull went fine.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Change "pull" to _only_ download, and "git update"=pull+merge?

2005-04-20 Thread Ingo Molnar

* Petr Baudis <[EMAIL PROTECTED]> wrote:

> > yet another thing: what is the canonical 'pasky way' of simply nuking 
> > the current files and checking out the latest tree (according to 
> > .git/HEAD). Right now i'm using a script to:
> > 
> >   read-tree $(tree-id $(cat .git/HEAD))
> >   checkout-cache -a
> > 
> > (i first do an 'rm -f *' in the working directory)
> > 
> > i guess there's an existing command for this already?
> 
> git cancel

hm, that's a pretty unintuitive name though. How about making it 'git 
checkout' and providing a 'git checkout -f' option to force the 
checkout? (or something like this)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git-viz tool for visualising commit trees

2005-04-20 Thread Ingo Molnar

* Petr Baudis <[EMAIL PROTECTED]> wrote:

>   Hi,
> 
>   just FYI, Olivier Andrieu was kind enough to port his monotone-viz 
> tool to git (http://oandrieu.nerim.net/monotone-viz/ - use the one 
> from the monotone repository). The tool visualizes the history flow 
> nicely; see
> 
>   http://rover.dkm.cz/~pasky/gitviz1.png
>   http://rover.dkm.cz/~pasky/gitviz2.png
>   http://rover.dkm.cz/~pasky/gitviz3.png
>   http://rover.dkm.cz/~pasky/gitviz4.png
>   http://rover.dkm.cz/~pasky/gitviz5.png
>   http://rover.dkm.cz/~pasky/gitviz6.png
>   http://rover.dkm.cz/~pasky/gitviz7.png
> 
> for some screenshots.

really nice stuff! Any plans to include it in git-pasky, via 'git gui' 
option or so? Also, which particular version has this included - the 
freshest tarball on the monotone-viz download site doesnt seem to 
include it.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> So to convert your old git setup to a new git setup, do the following:
> [...]

did this for two repositories (git and kernel-git), it works as 
advertised.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: enforcing DB immutability

2005-04-20 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> > perhaps having a new 'immutable hardlink' feature in the Linux VFS 
> > would help? I.e. a hardlink that can only be readonly followed, and 
> > can be removed, but cannot be chmod-ed to a writeable hardlink. That i 
> > think would be a large enough barrier for editors/build-tools not to 
> > play the tricks they already do that makes 'readonly' files virtually 
> > meaningless.
> 
> immutable hardlinks have the following advantage: a hardlink by design 
> hides the information where the link comes from. So even if an editor 
> wanted to play stupid games and override the immutability - it doesnt 
> know where the DB object is. (sure, it could find it if it wants to, 
> but that needs real messing around - editors wont do _that_)

so the only sensible thing the editor/tool can do when it wants to 
change the file is precisely what we want: it will copy the hardlinked 
files's contents to a new file, and will replace the old file with the 
new file - a copy on write. No accidental corruption of the DB's 
contents.

(another in-kernel VFS solution would be to enforce that the files's 
name always matches the sha1 hash. So if someone edits a DB object it 
will automatically change its name. But this is complex, probably cannot 
be done atomically, and brings up other problems as well.)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: enforcing DB immutability

2005-04-20 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> perhaps having a new 'immutable hardlink' feature in the Linux VFS 
> would help? I.e. a hardlink that can only be readonly followed, and 
> can be removed, but cannot be chmod-ed to a writeable hardlink. That i 
> think would be a large enough barrier for editors/build-tools not to 
> play the tricks they already do that makes 'readonly' files virtually 
> meaningless.

immutable hardlinks have the following advantage: a hardlink by design 
hides the information where the link comes from. So even if an editor 
wanted to play stupid games and override the immutability - it doesnt 
know where the DB object is. (sure, it could find it if it wants to, but 
that needs real messing around - editors wont do _that_)

i think this might work.

(the current chattr +i flag isnt quite what we need though because it 
works on the inode, and it's also a root-only feature so it puts us back 
to square one. What would be needed is an immutability flag on 
hardlinks, settable by unprivileged users.)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


enforcing DB immutability

2005-04-20 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> On Wed, 13 Apr 2005, Ingo Molnar wrote:
> > 
> > well, the 'owned by another user' solution is valid though, and doesnt 
> > have this particular problem. (We've got a secure multiuser OS, so can 
> > as well use it to protect the DB against corruption.)
> 
> So now you need root to set up new repositories? No thanks.

yeah, it's a bit awkward to protect uncompressed repositories - but it 
will need some sort of kernel enforcement. (if userspace finds out the 
DB contains uncompressed blobs, it _will_ try to use them.)

(perhaps having an in-kernel GIT-alike versioned filesystem will help - 
but that brings up the same 'I have to be root' issues. The FS will 
enforce the true immutability of objects.)

perhaps having a new 'immutable hardlink' feature in the Linux VFS would 
help? I.e. a hardlink that can only be readonly followed, and can be 
removed, but cannot be chmod-ed to a writeable hardlink. That i think 
would be a large enough barrier for editors/build-tools not to play the 
tricks they already do that makes 'readonly' files virtually 
meaningless.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Change "pull" to _only_ download, and "git update"=pull+merge?

2005-04-20 Thread Ingo Molnar

* Petr Baudis <[EMAIL PROTECTED]> wrote:

> > I think pull is pull.  If you are doing lots of local stuff and do not
> > want it overwritten, it should have been in a forked branch.
> 
> I disagree. This already forces you to have two branches (one to pull 
> from to get the data, mirroring the remote branch, one for your real 
> work) uselessly and needlessly.
> 
> I think there is just no good name for what pull is doing now, and 
> update seems like a great name for what pull-and-merge really is. Pull 
> really is pull - it _pulls_ the data, while update also updates the 
> given tree. No surprises.

yeah. In fact most of the times i did 'git pull pasky' in the past, the 
'merge' phase was unsuccessful, and i had to nuke the tree and recreate 
it.  All i did with the snapshots was to build them, so there were no 
local changes. Waiting a couple of days with doing a 'git pull pasky', 
or installing Linus' tree is a sure way to break the merging.

e.g. to reproduce the last such failure i had today, do:

 cd git-pasky-base
 echo 8568e1a88c086d1b72b0e84ab24fa6888b5861b9 > .git/HEAD
 read-tree $(tree-id $(cat .git/HEAD))
 checkout-cache -a -f
 make
 make install   # make sure to use the older tools
 rm -rf .git/objects
 git pull pasky

and i get:

 [...]
 fatal: unable to execute 'gitmerge-file.sh'
 fatal: merge program failed

Conflicts during merge. Do git commit after resolving them.

note that with earlier versions of pasky, i had other merge conflicts.  
Sometimes there were .rej files, sometimes some sort of script failure.  
So it seems rather unrobust at the moment. Especially if i happen to 
install Linus' tree and try to sync the pasky tree with those tools.

another thing: it's confusing that during 'git pull', the rsync output 
is not visible. Especially during large rsyncs, it would be nice to see 
some progress. So i usually use a raw rsync not 'git pull', due to this.

yet another thing: what is the canonical 'pasky way' of simply nuking 
the current files and checking out the latest tree (according to 
.git/HEAD). Right now i'm using a script to:

  read-tree $(tree-id $(cat .git/HEAD))
  checkout-cache -a

(i first do an 'rm -f *' in the working directory)

i guess there's an existing command for this already?

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [script] ge: export commits as patches

2005-04-19 Thread Ingo Molnar

* Petr Baudis <[EMAIL PROTECTED]> wrote:

> Dear diary, on Tue, Apr 19, 2005 at 03:48:43PM CEST, I got a letter
> where Ingo Molnar <[EMAIL PROTECTED]> told me that...
> > is there any 'export commit as patch' support in git-pasky? I didnt find 
> > any such command (maybe it got added meanwhile), so i'm using the 'ge' 
> > hack below.
> > 
> > e.g. i typically look at commits via 'git log', and then when i see 
> > something interesting, i look at the commit via the 'ge' script. E.g.  
> > "ge 834f6209b22af2941a8640f1e32b0f123c833061" done in the kernel tree 
> > will output a particular commit's header and the patch.
> 
> Nice idea. I will add it, probably as 'git patch'.
> 
> > TREE1=$(cat-file commit 2>/dev/null $1 | head -4 | grep ^tree | cut -d' ' 
> > -f2)
> > if [ "$TREE1" = "" ]; then echo 'ge '; exit -1; fi
> > PARENT=$(cat-file commit 2>/dev/null $1 | head -4 | grep ^parent | cut -d' 
> > ' -f2)
> > if [ "$PARENT" = "" ]; then echo 'ge '; exit -1; fi
> > TREE2=$(cat-file commit 2>/dev/null $PARENT | head -4 | grep ^tree | cut 
> > -d' ' -f2)
> > if [ "$TREE2" = "" ]; then echo 'ge '; exit -1; fi
> 
> commit-id and parent-id tools might be useful. ;-)

find a cleaned up 'ge' script below.

and please fix gitXnormid.sh to simply echo nothing and return with a -1 
exit value when a nonsensical ID is passed to it. Right now the output 
is quite ugly if you do 'ge blah'.

Ingo

#!/bin/bash

usage ()
{
 echo 'usage: ge '
 exit -1
}

if [ $# != 1 ]; then
 usage
fi

ME=$(commit-id $1);  [ "$ME" = "" ] && usage
PARENT=$(parent-id $ME); [ "$PARENT" = "" ] && usage
TREE1= $(tree-id   $ME); [ "$TREE1"  = "" ] && usage
TREE2= $(tree-id   $PARENT); [ "$TREE2"  = "" ] && usage

cat-file commit $ME
echo
git diff -r $TREE2:$TREE1

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: missing: git api, reference, user manual and mission statement

2005-04-19 Thread Ingo Molnar

* Kevin Smith <[EMAIL PROTECTED]> wrote:

> Klaus Robert Suetterlin wrote:
> > 1) There is no clear (e.g. by name) distinction between ``git as done
> > by Linus'', which is a kind of content addressable database with added
> > semantics, and ``git as done by the rest of You'', which is a kind of
> > SCM on top of Linuses stuff.
> 
> I also see this as one of the biggest obstacles right now. It would be 
> very helpful if we could achieve the clear separation between git and 
> non-git that has been part of the design since the beginning.
> 
> Git is very immature, and currently should only be used by brave 
> pioneers. About the only way for a mortal to even try git is to stick 
> to git-pasky releases, and not try to track all the patches flying 
> around.

hey, it's a 2 weeks old project, but it's certainly one of the 
fastest-growing projects i've ever seen: it has so much steam that it's 
scary :) It seems that a true emergency focused a massive, spontaneous 
concentration of OSS development power.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: naive question

2005-04-19 Thread Ingo Molnar

* David Woodhouse <[EMAIL PROTECTED]> wrote:

> On Tue, 2005-04-19 at 23:00 +1000, Paul Mackerras wrote:
> > Is there a way to check out a tree without changing the mtime of any
> > files that you have already checked out and which are the same as the
> > version you are checking out?  It seems that checkout-cache -a doesn't
> > overwrite any existing files, and checkout-cache -f -a overwrites all
> > files and gives them the current mtime.  This is a pain if you are
> > using make and your tree is large (like, for instance, the linux
> > kernel :), because it means that after a checkout-cache -f -a you get
> > to recompile everything.
> 
> Corollary: why aren't we storing mtime in the tree objects?

Check the "[bug] git: check-files mtime problem?" thread - i noticed 
this problem before and gave a few suggestions but the discussion got 
nowhere. But the problem is still very much present.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[script] ge: export commits as patches

2005-04-19 Thread Ingo Molnar

is there any 'export commit as patch' support in git-pasky? I didnt find 
any such command (maybe it got added meanwhile), so i'm using the 'ge' 
hack below.

e.g. i typically look at commits via 'git log', and then when i see 
something interesting, i look at the commit via the 'ge' script. E.g.  
"ge 834f6209b22af2941a8640f1e32b0f123c833061" done in the kernel tree 
will output a particular commit's header and the patch.

Ingo

#!/bin/bash

if [ $# != 1 ]; then
 echo 'ge '
 exit -1
fi
TREE1=$(cat-file commit 2>/dev/null $1 | head -4 | grep ^tree | cut -d' ' -f2)
if [ "$TREE1" = "" ]; then echo 'ge '; exit -1; fi
PARENT=$(cat-file commit 2>/dev/null $1 | head -4 | grep ^parent | cut -d' ' 
-f2)
if [ "$PARENT" = "" ]; then echo 'ge '; exit -1; fi
TREE2=$(cat-file commit 2>/dev/null $PARENT | head -4 | grep ^tree | cut -d' ' 
-f2)
if [ "$TREE2" = "" ]; then echo 'ge '; exit -1; fi

cat-file commit $1
echo
git diff -r $TREE2:$TREE1

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] git: fix 1-byte overflow in show-files.c

2005-04-18 Thread Ingo Molnar

* Petr Baudis <[EMAIL PROTECTED]> wrote:

> > will attempt to append a "/" string to the directory name - resulting in 
> > a 1-byte overflow (a zero byte is written to offset 4097, which is 
> > outside the array).
> 
> The name ends precisely at offset 4095 with its NUL character:
> 
>  {PATH_MAX}
>  Maximum number of bytes in a pathname, including the terminating
> null character.
> [ http://www.opengroup.org/onlinepubs/009695399/basedefs/limits.h.html ]
> 
> So, if I'm not mistaken, '/' will be written at offset 4095 instead of 
> the NUL and the NUL will be written at 4096. Everything's fine, right?

yeah, you are right - ignore this patch.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Re: Merge with git-pasky II.

2005-04-18 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> On Sun, 17 Apr 2005, Ingo Molnar wrote:
> > 
> > in fact, this attack cannot even be proven to be malicious, purely via 
> > the email from Malice: it could be incredible bad luck that caused that 
> > good-looking patch to be mistakenly matching a dangerous object.
> 
> I really hate theoretical discussions.

i was only replying to your earlier point:

> > > Almost all attacks on sha1 will depend on _replacing_ a file with 
> > > a bogus new one. So guys, instead of using sha256 or going 
> > > overboard, just make sure that when you synchronize, you NEVER 
> > > import a file you already have.

which point i still believe is subtly wrong. You were suggesting to 
concentrate on file replacement to counter most of the practical 
attacks, while i pointed out an attack _using the same basic mechanism 
that your point above supposed_.

[ if you can replace a file with a known hash, with a bogus new one, and 
  you still have enough control over the contents of your bogus new file 
  that it is 1) a valid file that builds 2) compromises the kernel, then 
  you likely have the same amount of control my 'theoretical' attack
  requires. ]

> And the thing is, _if_ somebody finds a way to make sha1 act as just a 
> complex parity bit, and comes up with generating a clashing object 
> that actually makes sense, then going to sha256 is likely pointless 
> too [...]

yes, that's why i suggested to not actually trust the hash to be 
cryptographically secure, but to just assume it's a good generic hash we 
can design a DB around, and to turn -DCOLLISION_CHECK on and enforce 
consistency rules on boundaries.

[ it's not bad to keep sha1 because even my suggested enhancement still
  leaves 'content-less trust-pointers to untrusted content via email'
  vectors open against attack (maintainer sends you an email that commit
  X in Malice's repository Y is fine to pull, and you pull it blindly,
  while the attacker has replaced his content with the compromised one
  meanwhile), but it at least validates the bulk traffic that goes into
  the DB: patches via emails and trusted repositories. ]

so all i was suggesting was to extend your suggested 'overwrite 
collision check' to a stricter 'content we throw away and use the sha1 
shortcut for needs to be checked against the in-DB content as well'.

in other words, your suggested 'rename check' is checking for 'positive 
duplicate content', while my addition would also check for 'negative 
duplicate content' as well.

but as usual, i could be wrong, so dont take this too serious :-)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Re: Merge with git-pasky II.

2005-04-17 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> The compromise relies on you having reviewed something harmless, while 
> in reality what happened within the DB was far less harmless. And the 
> DB remains self-consistent: neither fsck, nor others importing your 
> tree will be able to detect the compromise. This attack can only be 
> detected when you apply the patch, after that point all the 
> information (except Malice's message in your inbox) is gone.

in fact, this attack cannot even be proven to be malicious, purely via 
the email from Malice: it could be incredible bad luck that caused that 
good-looking patch to be mistakenly matching a dangerous object.

In fact this could happen even today, _accidentally_. (but i'm willing 
to bet that hell will be freezing over first, and i'll have some really 
good odds ;) There's probably a much higher likelyhood of Linus' tree 
getting corrupted in some old fashioned way and introducing a security 
hole by accident)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Re: Merge with git-pasky II.

2005-04-17 Thread Ingo Molnar

* Brad Roberts <[EMAIL PROTECTED]> wrote:

> While I agree that a hash collision is bad and certainly worth 
> preventing during new object creation, for it to actually implant a 
> trojan in a build successfully it'd have to meet even more criteria 
> than you've layed out. It'd have to...

>   - provide all the public symbols the shadowed object provided so that it
> would still build and link successfully

that's not a problem. Most modules dont provide public symbols.  
Especially not drivers. Generally it's the modules that _dont_ have any 
global impact that get reviewed less stringently - an attacker would 
thus choose them for psychological reasons anyway.

>   - be shadowing an object that's part of an active tree
>
> Shadowing an object that's not part of the working tree means 
> something on another branch or obsoleted some time in the past is 
> still db corruption, but not nearly as big an issue from a trojan 
> standpoint.

it's not DB corruption, it's a feature of GIT: it's a content _cache_, 
new and old alike. Nothing in GIT says that old objects in the 
repository (which are still very much part of history) cannot be revived 
in newer trees. (in fact it regularly happens - e.g. if a fix is undone 
manually.)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Re: Merge with git-pasky II.

2005-04-17 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> Almost all attacks on sha1 will depend on _replacing_ a file with a 
> bogus new one. So guys, instead of using sha256 or going overboard, 
> just make sure that when you synchronize, you NEVER import a file you 
> already have.

here is a bit complex, but still practical attack that doesnt rely on 
replacement and which can only be detected if we check the sha1 
uniqueness assumptions.

If you can generate a duplicate sha1 key for an arbitrary 'target' file, 
and Malice sends you a GIT-generated patch that introduces a new file 
(which doesnt exist in the current tree) which you review (in the email) 
and which looks safe to apply & harmless. Maybe the patch has a bit 
weird formatting and some weird comments (which in reality Malice used 
to generate the proper sha1 key) but otherwise the patch is for some 
seldom used arcane driver that no-one used for quite some time and 
no-one really cares about, so you are happy to apply the patch.

The compromise occurs when you apply the patch: the seemingly harmless 
patch has an sha1 key that Malice manufacured to match that of an 
already existing, 'dangerous' object in your database.

With tens of thousands (or hundreds of thousands) of objects expected in 
the repository sooner or later, there's quite a selection to pick from.  
Once you apply the patch, instead of the expected new file that you 
reviewed and found safe, the attacker has the other object included in 
the official kernel.

A dangerous object can be anything: e.g. a debugging hack that allows 
arbitrary kernel-space writes. Or a known-insecure module (which since 
then got fixed, but the buggy code still exists in the DB). The module 
is in a single file and is self-installing (e.g. it has __init code to 
register itself as some driver.)

Malice might even previously plant a dangerous object as some 'firmware 
module' in another arcane driver, which doesnt get compiled by default, 
but still shows up in the DB. Or Malice might plant a dangerous object 
via an innocent-looking documentation file.  (which contains some sample 
code and is called sample.txt)

this type of 'false sharing attack' can only be prevented if an object 
is only 'shared' with another object if it has been memcmp-ed with the 
object in the repository. I.e. if we trust the sharing decision! Once 
the attack has occured it cannot be detected automatically: only people 
will notice it. (why did that weird unrelated module show up in that old 
driver?)

The compromise relies on you having reviewed something harmless, while 
in reality what happened within the DB was far less harmless. And the DB 
remains self-consistent: neither fsck, nor others importing your tree 
will be able to detect the compromise. This attack can only be detected 
when you apply the patch, after that point all the information (except 
Malice's message in your inbox) is gone.

so unless we actively check for collisions, once an sha1 key can be 
generated at will on near-arbitrary input, it's not a secure system 
anymore. We might be lucky and safe, but we wont be secure.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] Use libcurl to use HTTP to get repositories

2005-04-16 Thread Ingo Molnar

* Daniel Barkalow <[EMAIL PROTECTED]> wrote:

> Still leaks a bit of memory due to bug copied from read-tree.

Linus, should i resend the 18 fixes i sent the other day? (as a GIT 
repository perhaps?) I found roughly 6 common memory leaks, 8 
theoretical memory leaks, 2 overflows and did a couple of cleanups. One 
of the patches [the cache collision related thing] we agreed was not 
needed, the rest is still very much valid i think. I did some basic 
testing with the fixes applied, nothing seemed to break in any visible 
way in these tests.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: full kernel history, in patchset format

2005-04-16 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> > the history data starts at 2.4.0 and ends at 2.6.12-rc2. I've included a 
> > script that will apply all the patches in order and will create a 
> > pristine 2.6.12-rc2 tree.
> 
> Hey, that's great. I got the CVS repo too, and I was looking at it, 
> but the more I looked at it, the more I felt that the main reason I 
> want to import it into git ends up being to validate that my size 
> estimates are at all realistic.
> 
> I see that Thomas Gleixner seems to have done that already, and come 
> to a figure of 3.2GB for the last three years, which I'm very happy 
> with, mainly because it seems to match my estimates to a tee. [...]

(yeah, we apparently worked in parallel - i only learned about his 
efforts after i sent my mail. He was using BK to extract info, i was 
using the CVS tree alone and no BK code whatsoever. (I dont think there 
will be any argument about who owns what, but i wanted to be on the safe 
side, and i also wanted to see how complete and usable the CVS metadata 
is - it's close to perfect i'd say, for the purposes i care about.))

> But I wonder if we actually want to actually populate the whole 
> history..

yeah, it definitely feels a bit brave to import 28,000 changesets into a 
source-code database project that will be a whopping 2 weeks old in 2 
days ;) Even if we felt 100% confident about all the basics (which we do 
of course ;), it's just simply too young to tie things down via a 3.2GB 
database. It feels much more natural to grow it gradually, 28,000 
changesets i'm afraid would just suffocate the 'project growth 
dynamics'. Not going too fast is just as important as not going too 
slow.

I didnt generate the patchset to get it added into some central 
repository right now, i generated it to check that we _do_ have all the 
revision history in an easy to understand format which does generate 
today's kernel tree, so that we can lean back and worry about the full 
database once things get a bit more settled down (in a couple of months 
or so). It's also an easy testbed for GIT itself.

but the revision history was one of the main reasons i used BK myself, 
so we'll need a merged database eventually. Occasionally i needed to 
check who was the one who touched a particular piece of code - was that 
fantastic new line of code written by me, or was that buggy piece of 
crap written by someone else? ;) Also, looking at a change and then 
going to the changeset that did it, and then looking at the full picture 
was pretty useful too. So that sort of annotation, and generally 
navigating around _quickly_ and looking at the 'flow' of changes going 
into a particular file was really useful (for me).

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: full kernel history, in patchset format

2005-04-16 Thread Ingo Molnar

* David Mansfield <[EMAIL PROTECTED]> wrote:

> Ingo Molnar wrote:
> >* Ingo Molnar <[EMAIL PROTECTED]> wrote:
> >
> >
> >>the patches contain all the existing metadata, dates, log messages and 
> >>revision history. (What i think is missing is the BK tree merge 
> >>information, but i'm not sure we want/need to convert them to GIT.)
> >
> >
> >author names are abbreviated, e.g. 'viro' instead of 
> >[EMAIL PROTECTED], and no committer information is 
> >included (albeit commiter ought to be Linus in most cases). These are 
> >limitations of the BK->CVS gateway i think.
> >
> 
> Glad to hear cvsps made it through!  I'm curious what the manual 
> fixups required were, except for the binary file issue (logo.gif).

--cvs-direct was needed to speed it up from 'several days to finish' to 
'several hours to finish', but it crashed on a handful of patches [i 
used the latest devel snapshot so this isnt a complaint]. (one of the 
crashes was when generating 1860.patch.) Also, 'cvs rdiff' apparently 
emits an empty patch for diffs that remove a file that end without 
having a newline character - but this isnt cvsps's problem.  (grep for 
+++ in the patchset to find those cases.)

> As to the actual email addresses, for more recent patches, the 
> Signed-off should help.  For earlier ones, isn't their some script 
> which 'knows' a bunch of canonical author->email mappings? (the 
> shortlog script or something)?

yeah, that's not that much of a problem, most of the names are unique, 
and the rest can be fixed up too.

> Is the full committer email address actually in the changeset in BK?  
> If so, given that we have the unique id (immutable I believe) of the 
> changset, could it be extracted directly from BK?

i think it's included in BK.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: full kernel history, in patchset format

2005-04-16 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> the patches contain all the existing metadata, dates, log messages and 
> revision history. (What i think is missing is the BK tree merge 
> information, but i'm not sure we want/need to convert them to GIT.)

author names are abbreviated, e.g. 'viro' instead of 
[EMAIL PROTECTED], and no committer information is 
included (albeit commiter ought to be Linus in most cases). These are 
limitations of the BK->CVS gateway i think.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: using git directory cache code in darcs?

2005-04-16 Thread Ingo Molnar

* David Roundy <[EMAIL PROTECTED]> wrote:

> 2) Will a license be chosen soon for git? Or has one been chosen, and 
> I missed it? I can't really include git code in darcs without a 
> license.  I'd prefer GPLv2 or later (since that's how darcs is 
> licensed), but as long as it's at least compabible with GPLv2, I'll be 
> all right.

there's a license in the latest code, it's GPLv2. Here's the COPYING 
file:

---
 Note that the only valid version of the GPL as far as this project
 is concerned is _this_ particular version of the license (ie v2, not
 v2.2 or v3.x or whatever), unless explicitly otherwise stated.

 HOWEVER, in order to allow a migration to GPLv3 if that seems like
 a good idea, I also ask that people involved with the project make
 their preferences known. In particular, if you trust me to make that
 decision, you might note so in your copyright message, ie something
 like

This file is licensed under the GPL v2, or a later version
at the discretion of Linus.

  might avoid issues. But we can also just decide to synchronize and
  contact all copyright holders on record if/when the occasion arises.

Linus Torvalds



GNU GENERAL PUBLIC LICENSE
   Version 2, June 1991

 Copyright (C) 1989, 1991 Free Software Foundation, Inc.
   59 Temple Place, Suite 330, Boston, MA  02111-1307  USA
 Everyone is permitted to copy and distribute verbatim copies
 of this license document, but changing it is not allowed.

Preamble

  The licenses for most software are designed to take away your
freedom to share and change it.  By contrast, the GNU General Public
License is intended to guarantee your freedom to share and change free
software--to make sure the software is free for all its users.  This
General Public License applies to most of the Free Software
Foundation's software and to any other program whose authors commit to
using it.  (Some other Free Software Foundation software is covered by
the GNU Library General Public License instead.)  You can apply it to
your programs, too.

  When we speak of free software, we are referring to freedom, not
price.  Our General Public Licenses are designed to make sure that you
have the freedom to distribute copies of free software (and charge for
this service if you wish), that you receive source code or can get it
if you want it, that you can change the software or use pieces of it
in new free programs; and that you know you can do these things.

  To protect your rights, we need to make restrictions that forbid
anyone to deny you these rights or to ask you to surrender the rights.
These restrictions translate to certain responsibilities for you if you
distribute copies of the software, or if you modify it.

  For example, if you distribute copies of such a program, whether
gratis or for a fee, you must give the recipients all the rights that
you have.  You must make sure that they, too, receive or can get the
source code.  And you must show them these terms so they know their
rights.

  We protect your rights with two steps: (1) copyright the software, and
(2) offer you this license which gives you legal permission to copy,
distribute and/or modify the software.

  Also, for each author's protection and ours, we want to make certain
that everyone understands that there is no warranty for this free
software.  If the software is modified by someone else and passed on, we
want its recipients to know that what they have is not the original, so
that any problems introduced by others will not reflect on the original
authors' reputations.

  Finally, any free program is threatened constantly by software
patents.  We wish to avoid the danger that redistributors of a free
program will individually obtain patent licenses, in effect making the
program proprietary.  To prevent this, we have made it clear that any
patent must be licensed for everyone's free use or not licensed at all.

  The precise terms and conditions for copying, distribution and
modification follow.

GNU GENERAL PUBLIC LICENSE
   TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION

  0. This License applies to any program or other work which contains
a notice placed by the copyright holder saying it may be distributed
under the terms of this General Public License.  The "Program", below,
refers to any such program or work, and a "work based on the Program"
means either the Program or any derivative work under copyright law:
that is to say, a work containing the Program or a portion of it,
either verbatim or with modifications and/or translated into another
language.  (Hereinafter, translation is included without limitation in
the term "modification".)  Each licensee is addressed as "you".

Activities other than copying, distribution and modification are not
covered by this License; they are outsid

full kernel history, in patchset format

2005-04-16 Thread Ingo Molnar

i've converted the Linux kernel CVS tree into 'flat patchset' format, 
which gave a series of 28237 separate patches. (Each patch represents a 
changeset, in the order they were applied. I've used the cvsps utility.)

the history data starts at 2.4.0 and ends at 2.6.12-rc2. I've included a 
script that will apply all the patches in order and will create a 
pristine 2.6.12-rc2 tree.

it needed many hours to finish, on a very fast server with tons of RAM, 
and it also needed a fair amount of manual work to extract it and to 
make it usable, so i guessed others might want to use the end result as 
well, to try and generate large GIT repositories from them (or to run 
analysis over the patches, etc.).

the patches contain all the existing metadata, dates, log messages and 
revision history. (What i think is missing is the BK tree merge 
information, but i'm not sure we want/need to convert them to GIT.)

it's a 136 MB tarball, which can be downloaded from:

   http://kernel.org/pub/linux/kernel/people/mingo/Linux-2.6-patchset/

the ./generate-2.6.12-rc2 script generates the 2.6.12-rc2 tree into 
linux/, from scratch. (No pre-existing kernel is needed, as 2.patch 
generates the full 2.4.0 kernel tree.) The patching takes a couple of 
minutes to finish, on a fast box.

below i've attached a sample patch from the series.

note: i kept the patches the cvsps utility generated as-is, to have a 
verifiable base to work on. There were a very small amount of deltas 
missed (about a dozen), probably resulting from CVS related errors, 
these are included in the diff-CVS-to-real patch. Also, the patch format 
cannot create the Documentation/logo.gif file, so the script does this 
too - just to be able to generate a complete 2.6.12-rc2 tree that is 
byte-for-byte identical to the real thing.

Ingo

-
PatchSet 1234 
Date: 2002/04/11 18:29:07
Author: viro
Branch: HEAD
Tag: (none) 
Log:
[PATCH] crapectomy in include/linux/nfsd/syscall.h

Removes an atavism in declaration of sys_nfsservctl() - sorry, I should've
remove that junk when cond_syscall() thing was done.

BKrev: 3cb5c7e3phTYgiz1YLsjQ_McTo9pOQ

Members: 
ChangeSet:1.1234->1.1235 
include/linux/nfsd/syscall.h:1.3->1.4 

Index: linux/include/linux/nfsd/syscall.h
===
RCS file: /home/mingo/linux-CVS/linux/include/linux/nfsd/syscall.h,v
retrieving revision 1.3
retrieving revision 1.4
diff -u -r1.3 -r1.4
--- linux/include/linux/nfsd/syscall.h  15 Mar 2002 23:06:06 -  1.3
+++ linux/include/linux/nfsd/syscall.h  11 Apr 2002 17:29:07 -  1.4
@@ -132,11 +132,7 @@
 /*
  * Kernel syscall implementation.
  */
-#if defined(CONFIG_NFSD) || defined(CONFIG_NFSD_MODULE)
 extern asmlinkage long sys_nfsservctl(int, struct nfsctl_arg *, void *);
-#else
-#define sys_nfsservctl sys_ni_syscall
-#endif
 extern int exp_addclient(struct nfsctl_client *ncp);
 extern int exp_delclient(struct nfsctl_client *ncp);
 extern int exp_export(struct nfsctl_export *nxp);
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: SHA1 hash safety

2005-04-16 Thread Ingo Molnar

* David Lang <[EMAIL PROTECTED]> wrote:

> this issue was raised a few days ago in the context of someone 
> tampering with the files and it was decided that the extra checks were 
> good enough to prevent this (at least for now), but what about 
> accidental collisions?
> 
> if I am understanding things right the objects get saved in the 
> filesystem in filenames that are the SHA1 hash. of two legitimate 
> files have the same hash I don't see any way for both of them to 
> exist.
> 
> yes the risk of any two files having the same has is low, but in the 
> earlier thread someone chimed in and said that they had two files on 
> their system that had the same hash..

you can add -DCOLLISION_CHECK to Makefile:CFLAGS to turn on collision 
checking (disabled currently). If there indeed exist two files that have 
different content but the same hash, could someone send those two files?

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Merge with git-pasky II.

2005-04-15 Thread Ingo Molnar

* David Woodhouse <[EMAIL PROTECTED]> wrote:

> On Fri, 2005-04-15 at 11:36 +0200, Ingo Molnar wrote:
> > do such cases occur frequently? In the kernel at least it's not too 
> > typical. 
> 
> Isn't it? I thought it was a fairly accurate representation of the 
> process "I make a whole bunch of changes to files I maintain, pulling 
> from Linus while occasionally asking him to pull from my tree. 
> Sometimes my files are changed by someone else in Linus' tree, and 
> sometimes I change files that I don't actually own.".

but the specific scenario you described would require _Linus'_ tree to 
be in limbo for a long time, and have uncommitted half-done edits. I.e.:

   (A1B2)--(A2B2)--(A2'B3)
/  \   /\
   /\ /  \
 (A1B1)  X   (...)
   \/ \  /
\  /   \/
   (A2B1)--(A2B2)--(A3B2')

in the above scenario Linus' tree needs to 'cross' with a maintainer's 
tree.  (maintainer's tree wont cross with another maintainer's tree, as 
maintainer-to-maintainer merges rare.)

but for the scenario to occur, i think there needs to be a prolongued 
"limbo" period in Linus' tree for a 'crossing' to happen. But Linus' 
merges are typically almost atomic: they are done then they are pushed 
out. It's definitely not in the 'days, sometimes weeks' timescale as 
maintainer trees are.

so for the scenario to occur, a maintainer, from whom Linus has just 
pulled an update and Linus is merging the tree manually without 
comitting, has to pull a file from the earlier Linus tree, and then 
Linus has to modify that same file again. This does not seem to be a 
common scenario.

so i think to avoid the scenario, maintainers should not pull from each 
other - they should only pull/push to/from Linus' tree. Maybe this is an 
unacceptable limitation?

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Merge with git-pasky II.

2005-04-15 Thread Ingo Molnar

* David Woodhouse <[EMAIL PROTECTED]> wrote:

> Consider a simple repository which contains two files A and B. We 
> start off with the first version of each ('A1B1'), and the owner of 
> each file takes a branch and modifies their own file. There is 
> cross-pulling between the two, and then each modifies the _other's_ 
> file as well as their own...
> 
>(A1B2)--(A2B2)--(A2'B3)
> /  \   /\
>/\ /  \
>  (A1B1)  X   (...)
>\/ \  /
> \  /   \/
>(A2B1)--(A2B2)--(A3B2')
> 
> Now, we're trying to merge the two branches. It appears that the most 
> useful common ancestor to use for a three-way merge of file A is the 
> version from tree 'A2B1', while the most useful common ancestor for 
> merging file B is that in 'A1B2'.

do such cases occur frequently? In the kernel at least it's not too 
typical. Would it be a problem to go for the simple solution of using 
(A1B1) as the common ancestor (based on the tree graph), and then to do 
a 3-way merge of all changes from that point on?

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: another perspective on renames.

2005-04-15 Thread Ingo Molnar

* Paul Jackson <[EMAIL PROTECTED]> wrote:

> Scott wrote:
> > Anyway, maybe it's worth thinking a little about an SCM in which this is a 
> > feature, instead of (or in addition to) automatically assuming this is a 
> > bug we need to add infrastructure to work around.
> 
> Agreed.
> 
> To me, the main purpose in tracking renames is to obtain a deeper
> history of the line-by-line changes in a file.
> 
>   ==> But that doesn't seem relevant here.
> 
> Last I looked, git has no such history.  A given file contents is the 
> indivisable atom of the git world, with no fine structure.
> 
> This is quite unlike classic SCM's, built on file formats that track 
> source lines, not files, as the atomic unit.

i believe the fundamental thing to think about is not file or line or 
namespace, but 'tracking developer intent'. While keeping in mind that 
GIT is not an SCM, all SCMs boil down to this single thing: being able 
to track what the developer did and why he did it - to be a useful tool 
later on. (SCMs are for humans with bad limitations, who have this 
fundamental design bug and keep forgetting things.)

the basic question is, how much to track. The most extreme form of 
tracking (just for the sake of visualizing it) would be to have an 
eye-position recognizing software attached to a webcam looking at the 
developer, and then exactly mapping what he did, how long did he look at 
one particular line of code and exactly what did he type when doing 
that. [ Perhaps also a thought-reader module in addition, once one is 
available. (combined with another module that removes all the swearing)]

but i think Linus is on the right track to suggest that "the file names 
dont matter all that much, it's all about the content". Global diffs 
might track most types of plain renames, and if it gets it wrong - do we 
care? Misdetection of renames can happen, but realistically only with 
small files and trivial code, which wont have alot of history.

The only serious type of misdetection would be if two large modules in 
two different places in the namespace happen to have exactly the same 
content but have a different history (because e.g. they were merged in 
via two separate trees, one came from one tree, the other from the other 
tree), and the developer renamed both of them in the same commit: in 
such a case the global diff would have no way to figure out what the 
proper thread of history is. But is this a realistic scenario?  If the 
two files are nontrivial and have the same content, why werent they 
merged in the namespace in the first place?

the moment we allow 'namespace' into the picture, things get complex and 
ugly. Directory recursion is already a complexity that would have been 
nice to avoid.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Handling renames.

2005-04-14 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> [...] Ie if you notice a rename, you first commit the rename (and you 
> can _see_ it's a rename, since the object didn't change, and the sha1 
> stayed the same, which in git-speak means that it is the same object, 
> ie that _is_ a rename as far as git is concerned), and then you create 
> the "this is the data that changed" as a _second_ commit.

ok, i accept your point of not putting this into such a low level as the 
object abstraction. Was a bad idea.

but i dont think the above would be enough: there can be renames of 
objects that have the same sha1 hash as other objects in the same tree, 
and developers want to track individual objects, regardless of whether 
other files share the same content. So some formal operation would be 
needed to signal renames - e.g. to embedd it in the commit object, per 
David's suggestion.

The thing i tried to avoid was to list long filenames in the commit 
(because of the tree hierarchy we'd need to do tree-absolute pathnames 
or something like that, and escape things, and do lookups - duplicating 
a VFS which is quite bad) - it would be better to identify the rename 
source and target via its tree object hash and its offset within that 
tree. Such information could be embedded in the commit object just fine.  
Something like:

me bb95843a5a0f397270819462812735ee29796fb4
tree 1756b578489f93999ded68ae347bef7d6063101c
parent 9f02d4d233223462d3f6217b5837b786e6286ba4
author
committer
rename 39021759c903a943a33a28cfbd5070d36d851581 15234 
9f02d4d233223462d3f6217b5837b786e6286ba4 16163

?

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Handling renames.

2005-04-14 Thread Ingo Molnar

* David Woodhouse <[EMAIL PROTECTED]> wrote:

> I've been looking at tracking file revisions. One proposed solution 
> was to have a separate revision history for individual files, with a 
> new kind of 'filecommit' object which parallels the existing 'commit', 
> referencing a blob instead of a tree. Then trees would reference such 
> objects instead of referencing blobs directly.
> 
> I think that introduces a lot of redundancy though, because 99% of the 
> time, the revision history of the individual file is entirely 
> reproducible from the revision history of the tree. It's only when 
> files are renamed that we fall over -- and I think we can handle 
> renames fairly well if we just log them in the commit object.

how about the following structure:

- tree_new --->
- tree_old ---> rename_commit -> blob

the rename_commit object just contains a pointer to the file content 
blob. If a rename happens then the old tree references the rename_commit 
object (instead of the blob), and the new tree references it too. This 
way there's no need to list the rename via namespace means: if a tree 
entry points to a rename_commit object then a rename happened and the 
rename_commit object is looked up in the old tree to get the old name.

there's no redundancy caused by this method: only renames (which are 
rare) go through the rename_commit redirection. (to speed up the lookup 
the rename_commit object could cache the offset of the two names within 
their tree objects.)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] git: fix memory leak #2 in read-cache.c

2005-04-14 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> In other words, if the common case is that we update a couple of 
> entries in the active cache, we actually saved 1.6MB (+ malloc 
> overhead for the 17 _thousand_ allocations) by my approach.
> 
> And the leak? There's none. We never actually update an existing entry 
> that was allocated with malloc(), unless the user does something 
> stupid.  In other words, the only case where there is a "leak" is when 
> the user does something like
> 
>   update-cache file file file file file file .. 
> 
> with the same file listed several times.

fair enough - as long as this is only used in a scripted environment, 
and not via some library and not within a repository server, web 
backend, etc.

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] git: fix memory leak #3 in update-cache.c

2005-04-14 Thread Ingo Molnar

the patch below fixes a third memory leak in update-cache.c. (the 
mmap-ed file needs to be unmapped too) Ontop of the previous 
update-cache.c patches.

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- update-cache.c.orig
+++ update-cache.c
@@ -32,6 +32,8 @@ static int index_fd(const char *path, in
if (!out || (int)(long)in == -1) {
free(out);
free(metadata);
+   if ((int)(long)in != -1 && size)
+   munmap(in, size);
return -1;
}
 
@@ -66,6 +68,8 @@ static int index_fd(const char *path, in

free(out);
free(metadata);
+   if (size)
+   munmap(in, size);
 
return ret;
 }
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [patch] git: fix memory leak #2 in read-cache.c

2005-04-14 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> this patch fixes a memory leak in read-cache.c: when there's cache 
> entry collision we should free the previous one.

> + free(active_cache[pos]);
>   active_cache[pos] = ce;

i'm having second thoughs about this one: active_cache entries are not 
always malloc()-ed - e.g. read_cache() will construct them from the 
mmap() of the index file. Which must not be free()d!

one safe solution would be to malloc() all these entries and copy them 
over from the index file? Slightly slower but safer and free()-able when 
update-cache.c notices a collision. The (tested) patch below does this.

this would also make Martin Schlemmer's update-cache.c fix safe.

(without this second patch, free(active_cache[pos]) might crash, and 
that crash is would possibly be remote exploitable via a special 
repository that tricks the index file to look in a certain way.)

    Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- read-cache.c.orig
+++ read-cache.c
@@ -453,10 +453,17 @@ int read_cache(void)
 
offset = sizeof(*hdr);
for (i = 0; i < hdr->entries; i++) {
-   struct cache_entry *ce = map + offset;
+   struct cache_entry *ce = map + offset, *tmp;
offset = offset + ce_size(ce);
-   active_cache[i] = ce;
+
+   tmp = malloc(ce_size(ce));
+   if (!tmp)
+   return error("malloc failed");
+   memcpy(tmp, ce, ce_size(ce));
+   active_cache[i] = tmp;
}
+   munmap(map, size);
+
return active_nr;
 
 unmap:

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] git: fix memory leak in write-tree.c

2005-04-14 Thread Ingo Molnar

this patch fixes a memory leak in write-tree.c's write_tree() function - 
we didnt free the temporary output buffer. Depending on the size of the 
tree written out this could leak a significant amount of RAM.

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- write-tree.c.orig
+++ write-tree.c
@@ -33,12 +33,12 @@ static int write_tree(struct cache_entry
 {
unsigned char subdir_sha1[20];
unsigned long size, offset;
-   char *buffer;
+   char *buffer0, *buffer;
int i, nr;
 
/* Guess at some random initial size */
size = 8192;
-   buffer = malloc(size);
+   buffer0 = buffer = malloc(size);
offset = ORIG_OFFSET;
 
nr = 0;
@@ -97,6 +97,8 @@ static int write_tree(struct cache_entry
offset -= i;
 
write_sha1_file(buffer, offset, returnsha1);
+   free(buffer0);
+
return nr;
 }
 
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] git: clean up add_file_to_cache() in update-cache.c

2005-04-14 Thread Ingo Molnar

this patch cleans up add_file_to_cache() to free up all memory it 
allocates. This has no significance right now as the only user of 
add_file_to_cache() die()s immediately in the 'leak' paths - but if the 
function is ever used without dying then this uncleanliness could lead 
to a memory leak.

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- update-cache.c.orig
+++ update-cache.c
@@ -120,10 +120,17 @@ static int add_file_to_cache(char *path)
ce->st_size = st.st_size;
ce->namelen = namelen;
 
-   if (index_fd(path, namelen, ce, fd, &st) < 0)
+   if (index_fd(path, namelen, ce, fd, &st) < 0) {
+   free(ce);
return -1;
+   }
 
-   return add_cache_entry(ce, allow_add);
+   if (add_cache_entry(ce, allow_add)) {
+   free(ce);
+   return -1;
+   }
+
+   return 0;
 }
 
 static int match_data(int fd, void *buffer, unsigned long size)

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] git: fix memory leaks in update-cache.c

2005-04-14 Thread Ingo Molnar

this patch fixes two memory leaks in update-cache.c: we didnt free the 
temporary input and output buffers used for compression.

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- update-cache.c.orig
+++ update-cache.c
@@ -23,13 +23,17 @@ static int index_fd(const char *path, in
void *metadata = malloc(namelen + 200);
void *in;
SHA_CTX c;
+   int ret;
 
in = "";
if (size)
in = mmap(NULL, size, PROT_READ, MAP_PRIVATE, fd, 0);
close(fd);
-   if (!out || (int)(long)in == -1)
+   if (!out || (int)(long)in == -1) {
+   free(out);
+   free(metadata);
return -1;
+   }
 
memset(&stream, 0, sizeof(stream));
deflateInit(&stream, Z_BEST_COMPRESSION);
@@ -58,7 +62,12 @@ static int index_fd(const char *path, in
SHA1_Update(&c, out, stream.total_out);
SHA1_Final(ce->sha1, &c);
 
-   return write_sha1_buffer(ce->sha1, out, stream.total_out);
+   ret = write_sha1_buffer(ce->sha1, out, stream.total_out);
+   
+   free(out);
+   free(metadata);
+
+   return ret;
 }
 
 /*
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] git: fix 1-byte overflow in show-files.c

2005-04-14 Thread Ingo Molnar

this patch fixes a 1-byte overflow in show-files.c (looks narrow is is 
probably not exploitable). A specially crafted db object (tree) might 
trigger this overflow.

'fullname' is an array of 4096+1 bytes, and we do readdir(), which 
produces entries that have strings with a length of 0-255 bytes. With a 
long enough 'base', it's possible to construct a tree with a name in it 
that has directory whose name ends precisely at offset 4095. At that 
point this code:

case DT_DIR:
memcpy(fullname + baselen + len, "/", 2);

will attempt to append a "/" string to the directory name - resulting in 
a 1-byte overflow (a zero byte is written to offset 4097, which is 
outside the array).

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- show-files.c.orig
+++ show-files.c
@@ -49,7 +49,7 @@ static void read_directory(const char *p
 
if (dir) {
struct dirent *de;
-   char fullname[MAXPATHLEN + 1];
+   char fullname[MAXPATHLEN + 2]; // +1 byte for trailing slash
memcpy(fullname, base, baselen);
 
while ((de = readdir(dir)) != NULL) {

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] git: fix memory leak #2 in read-cache.c

2005-04-14 Thread Ingo Molnar

this patch fixes a memory leak in read-cache.c: when there's cache entry 
collision we should free the previous one.

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- read-cache.c.orig
+++ read-cache.c
@@ -369,6 +369,7 @@ int add_cache_entry(struct cache_entry *
 
/* existing match? Just replace it */
if (pos >= 0) {
+   free(active_cache[pos]);
active_cache[pos] = ce;
return 0;
}
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] git: fix memory leaks in read-cache.c

2005-04-14 Thread Ingo Molnar

this patch fixes a common and a rare memory leak in read-cache.c.

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- read-cache.c.orig
+++ read-cache.c
@@ -226,8 +226,11 @@ int write_sha1_file(char *buf, unsigned 
SHA1_Update(&c, compressed, size);
SHA1_Final(sha1, &c);
 
-   if (write_sha1_buffer(sha1, compressed, size) < 0)
+   if (write_sha1_buffer(sha1, compressed, size) < 0) {
+   free(compressed);
return -1;
+   }
+   free(compressed);
if (returnsha1)
memcpy(returnsha1, sha1, 20);
return 0;
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] cleanup: read_sha1_file() -> malloc_read_sha1_file()

2005-04-14 Thread Ingo Molnar

this patch renames read_sha1_file() to malloc_read_sha1_file().

There were a handful of memory-leaks related to read_sha1_file(), and 
some of those could possibly have been found sooner if the name 
indicated that an implicit malloc() is occurs within read_sha1_file().  
This patch is ontop of the previous patches i sent.

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- cat-file.c.orig
+++ cat-file.c
@@ -14,7 +14,7 @@ int main(int argc, char **argv)
 
if (argc != 3 || get_sha1_hex(argv[2], sha1))
usage("cat-file [-t | tagname] ");
-   buf = read_sha1_file(sha1, type, &size);
+   buf = malloc_read_sha1_file(sha1, type, &size);
if (!buf)
die("cat-file %s: bad file", argv[2]);
if (!strcmp("-t", argv[1])) {
--- merge-tree.c.orig
+++ merge-tree.c
@@ -11,7 +11,7 @@ static struct tree_entry *read_tree(unsi
 {
char type[20];
unsigned long size;
-   void *buf = read_sha1_file(sha1, type, &size);
+   void *buf = malloc_read_sha1_file(sha1, type, &size);
struct tree_entry *ret = NULL, **tp = &ret;
 
if (!buf || strcmp(type, "tree"))
--- diff-tree.c.orig
+++ diff-tree.c
@@ -62,7 +62,7 @@ static void show_file(const char *prefix
char *newbase = malloc_base(base, path, strlen(path));
void *tree;
 
-   tree = read_sha1_file(sha1, type, &size);
+   tree = malloc_read_sha1_file(sha1, type, &size);
if (!tree || strcmp(type, "tree"))
die("corrupt tree sha %s", sha1_to_hex(sha1));
 
@@ -164,10 +164,10 @@ static int diff_tree_sha1(const unsigned
char type[20];
int retval;
 
-   tree1 = read_sha1_file(old, type, &size1);
+   tree1 = malloc_read_sha1_file(old, type, &size1);
if (!tree1 || strcmp(type, "tree"))
die("unable to read source tree (%s)", sha1_to_hex(old));
-   tree2 = read_sha1_file(new, type, &size2);
+   tree2 = malloc_read_sha1_file(new, type, &size2);
if (!tree2 || strcmp(type, "tree"))
die("unable to read destination tree (%s)", sha1_to_hex(new));
retval = diff_tree(tree1, size1, tree2, size2, base);
--- show-diff.c.orig
+++ show-diff.c
@@ -25,7 +25,7 @@ static void show_diff_empty(struct cache
int lines=0;
unsigned char type[20], *p, *end;
 
-   old = read_sha1_file(ce->sha1, type, &size);
+   old = malloc_read_sha1_file(ce->sha1, type, &size);
if (size > 0) {
int startline = 1;
int c = 0;
@@ -99,7 +99,7 @@ int main(int argc, char **argv)
if (silent)
continue;
 
-   new = read_sha1_file(ce->sha1, type, &size);
+   new = malloc_read_sha1_file(ce->sha1, type, &size);
show_differences(ce->name, new, size);
free(new);
}
--- read-cache.c.orig
+++ read-cache.c
@@ -183,7 +183,7 @@ void * unpack_sha1_file(void *map, unsig
return buf;
 }
 
-void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long 
*size)
+void * malloc_read_sha1_file(const unsigned char *sha1, char *type, unsigned 
long *size)
 {
unsigned long mapsize;
void *map, *buf;
--- update-cache.c.orig
+++ update-cache.c
@@ -141,7 +141,7 @@ static int compare_data(struct cache_ent
unsigned long size;
char type[20];
 
-   buffer = read_sha1_file(ce->sha1, type, &size);
+   buffer = malloc_read_sha1_file(ce->sha1, type, &size);
if (buffer) {
if (size == expected_size && !strcmp(type, "blob"))
match = match_data(fd, buffer, size);
--- ls-tree.c.orig
+++ ls-tree.c
@@ -11,7 +11,7 @@ static int list(unsigned char *sha1)
unsigned long size;
char type[20];
 
-   buffer = read_sha1_file(sha1, type, &size);
+   buffer = malloc_read_sha1_file(sha1, type, &size);
if (!buffer)
die("unable to read sha1 file");
if (strcmp(type, "tree"))
--- checkout-cache.c.orig
+++ checkout-cache.c
@@ -72,7 +72,7 @@ static int write_entry(struct cache_entr
long wrote;
char type[20];
 
-   new = read_sha1_file(ce->sha1, type, &size);
+   new = malloc_read_sha1_file(ce->sha1, type, &size);
if (!new || strcmp(type, "blob")) {
if (new)
free(new);
--- cache.h.orig
+++ cache.h
@@ -95,7 +95,7 @@ extern int write_sha1_buffer(const unsig
 /* Read and unpack a sha1 file into memory, write memory to a sha1 file */
 extern void * map_sha1_file(const unsigned char *sha1, unsigned 

[patch] git: fix overflow in update-cache.c

2005-04-14 Thread Ingo Molnar

this patch fixes a 1-byte overflow in update-cache.c (probably not 
exploitable). A specially crafted db object might trigger this overflow.

the bug is that normally the 'type' field is parsed by read_sha1_file(), 
via:

if (sscanf(buffer, "%10s %lu", type, size) != 2)

i.e. 0-10 long strings, which take 1-11 bytes of space. Normally the 
type strings are stored in char [20] arrays, but in update-cache.c that 
is char [10], so a 1 byte overflow might occur.

This should not happen with a 'friendly' DB, as the longest type string 
("commit") is 7 bytes long. The fix is to use the customary char [20].

(someone might want to clean those open-coded constants up with a 
TYPE_LEN define, they do tend to cause problems like this. I'm not 
against open-coded constants (they make code much more readable), but 
for fields that get filled in from possibly hostile objects this is 
playing with fire.)

hey, this might be the first true security fix for GIT? ;-)

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- update-cache.c.orig
+++ update-cache.c
@@ -139,7 +139,7 @@ static int compare_data(struct cache_ent
if (fd >= 0) {
void *buffer;
unsigned long size;
-   char type[10];
+   char type[20];
 
buffer = read_sha1_file(ce->sha1, type, &size);
if (buffer) {
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] git: fix memory leak in show-diff.c

2005-04-14 Thread Ingo Molnar

this patch fixes a memory leak in show-diff.c.

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- show-diff.c.orig
+++ show-diff.c
@@ -53,6 +53,7 @@ static void show_diff_empty(struct cache
printf("\n");
fflush(stdout);
}
+   free(old);
 }
 
 int main(int argc, char **argv)
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] git: report parse_commit() errors in rev-tree.c

2005-04-14 Thread Ingo Molnar

make actual use of the parse_commit() return value and print a warning, 
instead of silently ignoring it. Should never trigger on a valid DB.

(alternatively we might use a die() in the sanity check place and could 
remove all the return code handling?)

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- rev-tree.c.orig
+++ rev-tree.c
@@ -64,6 +64,7 @@ static unsigned long parse_commit_date(c
 static int parse_commit(unsigned char *sha1)
 {
struct revision *rev = lookup_rev(sha1);
+   int ret = 0;
 
if (!(rev->flags & SEEN)) {
void *buffer, *bufptr;
@@ -81,13 +82,13 @@ static int parse_commit(unsigned char *s
bufptr += 46; /* "tree " + "hex sha1" + "\n" */
while (!memcmp(bufptr, "parent ", 7) && !get_sha1_hex(bufptr+7, 
parent)) {
add_relationship(rev, parent);
-   parse_commit(parent);
+   ret |= parse_commit(parent);
bufptr += 48;   /* "parent " + "hex sha1" + "\n" */
}
rev->date = parse_commit_date(bufptr);
free(buffer);
}
-   return 0;   
+   return ret;
 }
 
 static void read_cache_file(const char *path)
@@ -208,7 +209,8 @@ int main(int argc, char **argv)
}
if (nr >= MAX_COMMITS || get_sha1_hex(arg, sha1[nr]))
usage("rev-tree [--edges] [--cache ] 
 []");
-   parse_commit(sha1[nr]);
+   if (parse_commit(sha1[nr]))
+   fprintf(stderr, "warning: rev-tree: bad commit!\n");
nr++;
}
 
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] git: fix rare memory leak in rev-tree.c

2005-04-14 Thread Ingo Molnar

this patch fixes a rare memory leak in rev-tree.c.

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- rev-tree.c.orig
+++ rev-tree.c
@@ -73,8 +73,11 @@ static int parse_commit(unsigned char *s
 
rev->flags |= SEEN;
buffer = bufptr = read_sha1_file(sha1, type, &size);
-   if (!buffer || strcmp(type, "commit"))
+   if (!buffer || strcmp(type, "commit")) {
+   if (buffer)
+   free(buffer);
return -1;
+   }
bufptr += 46; /* "tree " + "hex sha1" + "\n" */
while (!memcmp(bufptr, "parent ", 7) && !get_sha1_hex(bufptr+7, 
parent)) {
add_relationship(rev, parent);
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] git: fix memory leaks in read-tree.c

2005-04-14 Thread Ingo Molnar

this patch fixes one common, and 4 rare memory leaks in read-tree.c.

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- read-tree.c.orig
+++ read-tree.c
@@ -23,23 +23,27 @@ static int read_one_entry(unsigned char 
 
 static int read_tree(unsigned char *sha1, const char *base, int baselen)
 {
-   void *buffer;
+   void *buffer0, *buffer;
unsigned long size;
char type[20];
 
-   buffer = read_sha1_file(sha1, type, &size);
+   buffer0 = buffer = read_sha1_file(sha1, type, &size);
if (!buffer)
return -1;
-   if (strcmp(type, "tree"))
+   if (strcmp(type, "tree")) {
+   free(buffer);
return -1;
+   }
while (size) {
int len = strlen(buffer)+1;
unsigned char *sha1 = buffer + len;
char *path = strchr(buffer, ' ')+1;
unsigned int mode;
 
-   if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1)
+   if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1) {
+   free(buffer0);
return -1;
+   }
 
buffer = sha1 + 20;
size -= len + 20;
@@ -53,13 +57,19 @@ static int read_tree(unsigned char *sha1
newbase[baselen + pathlen] = '/';
retval = read_tree(sha1, newbase, baselen + pathlen + 
1);
free(newbase);
-   if (retval)
+   if (retval) {
+   free(buffer0);
return -1;
+   }
continue;
}
-   if (read_one_entry(sha1, base, baselen, path, mode) < 0)
+   if (read_one_entry(sha1, base, baselen, path, mode) < 0) {
+   free(buffer0);
return -1;
+   }
}
+   free(buffer0);
+
return 0;
 }
 
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] git: cleanup in ls-tree.c

2005-04-14 Thread Ingo Molnar

cleanup: this patch adds a free() to ls-tree.c.

(Technically it's not a memory leak yet because the buffer is allocated 
once by the function and then the utility exits - but it's a tad cleaner 
to not leave such assumptions in the code, so that if someone reuses the 
function (or extends the utility to include a loop) the uncleanliness 
doesnt develop into a real memory leak.)

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- ls-tree.c.orig
+++ ls-tree.c
@@ -33,6 +33,8 @@ static int list(unsigned char *sha1)
type = S_ISDIR(mode) ? "tree" : "blob";
printf("%03o\t%s\t%s\t%s\n", mode, type, sha1_to_hex(sha1), 
path);
}
+   free(buffer);
+
return 0;
 }
 
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] git: fix memory leak #2 in checkout-cache.c

2005-04-14 Thread Ingo Molnar

this patch fixes another (very rare) memory leak in checkout-cache.

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- checkout-cache.c.orig
+++ checkout-cache.c
@@ -74,6 +74,8 @@ static int write_entry(struct cache_entr
 
new = read_sha1_file(ce->sha1, type, &size);
if (!new || strcmp(type, "blob")) {
+   if (new)
+   free(new);
return error("checkout-cache: unable to read sha1 file of %s 
(%s)",
ce->name, sha1_to_hex(ce->sha1));
}
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[patch] git: fix memory leak in checkout-cache.c

2005-04-14 Thread Ingo Molnar

this patch fixes a memory leak in checkout-cache.

Ingo

Signed-off-by: Ingo Molnar <[EMAIL PROTECTED]>

--- checkout-cache.c.orig
+++ checkout-cache.c
@@ -48,6 +48,7 @@ static void create_directories(const cha
buf[len] = 0;
mkdir(buf, 0755);
}
+   free(buf);
 }
 
 static int create_file(const char *path, unsigned int mode)
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


cache-cold repository performance

2005-04-14 Thread Ingo Molnar

* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> i'd be surprised if it was twice as fast - cache-cold linear checkouts 
> are _seek_ limited, and it doesnt matter whether after a 1-2 msec 
> track-to-track disk seek the DMA engine spends another 30 microseconds 
> DMA-ing 60K uncompressed data instead of 30K compressed... (there are 
> other factors, but this is the main thing.)

i've benchmarked cache-cold compressed vs. uncompressed performance, to 
shed some more light on the performance differences between flat and 
compressed repositories.

i did alot of testing, and i primarily concentrated on being able to 
_trust_ the benchmark results, not to generate some quick numbers. The 
major problem was that the timing of the reads associated with 'checking 
out a large tree' is very unstable, even on a completely isolated 
testsystem with very common (and predictable) IO hardware.

the content i tested was a vanilla 2.6.10 kernel tree, with 19042 files 
in it, taking 246 MB uncompressed, and 110 MB compressed (via gzip -9).  
Average file size is 13.2 KB uncompressed, 5.9 KB compressed.

Firstly, the timings are very sensitive to the way the tree was created.  
To have a 'fair' on-disk layout the trees have to be created in an 
identical fashion: e.g. it is not valid to copy the uncompressed tree 
and run gzip over it - that will create a 'sparse' on-disk layout 
penalizing the compressed layout and making it 30% slower than the 
uncompressed layout! I first created the two trees, then i "cp -a"-ed 
them over into a new directory one after each other, so that they get on 
similar on-disk positions as well. I also created 2 more pairs of such 
trees to make sure disk layout is fair.

all timings were taken fresh after reboot, on a UP 1 GB RAM Athlon64 
3200+, using a large, top of the line IDE disk. The kernel was 
2.6.12-rc2, the filesystem was ext3 with enough free space to not be 
fragmented, both noatime and nodiratime was specified so that no write 
activities whatever occur during the 'checkout'.

the operation timed was a simple:

time find . -type f | xargs cat > /dev/null

done in the root of the given tree. This generates the very same 
readonly IO pattern for each test. I've run the tests 10 times (i.e.  
have done 10 fresh reboots), but after every reboot i permutated the 
order of trees tested - to make sure there is no interaction between 
trees. (there was no interaction)

here are the raw numbers, elapsed real time in seconds:

 flat-1:  29.7 29.5 29.4 29.4 29.5 29.5 29.7 29.6 29.4 29.6 29.5 29.4:  29.5
 gzip-1:  41.2 40.9 40.7 40.7 40.5 41.7 41.0 40.3 40.6 40.8 40.8 40.9:  40.8

 flat-2:  28.0 28.2 27.7 27.9 27.8 27.9 27.7 27.9 27.9 28.1 27.9 28.0:  27.9
 gzip-2:  27.2 27.4 27.4 27.2 27.2 27.2 27.2 27.2 27.1 27.3 27.2 27.4:  27.2
 flat-3:  27.0 27.8 27.6 27.7 27.8 27.8 27.8 27.7 27.8 27.6 27.8 27.8:  27.6
 gzip-3:  25.8 26.8 26.6 26.5 26.5 26.5 26.6 26.4 26.5 26.7 26.6 26.7:  26.5

The final column is the average. (Standard deviation is below 0.1 sec, 
less than 0.3%.)

flat-1 is the original tree, created via tar. gzip-1 is a cp -a copy of 
it, per-file compressed afterwards. flat-2 is a cp -a copy of flat-1, 
gzip-2 is a cp -a copy of gzip-1. flat-3/gzip-3 are cp -a copies of 
flat-2/gzip-2.

note that gzip-1 is ~40% slower due to the 'sparse layout', so its 
results approximate a repository with 'bad' file layout. I'd not expect 
GIT repositories to have such a layout normally, so we can disregard it.

flat-2/3 and gzip-2/3 can be directly compared. Firstly, the results 
show that the on-disk layout cannot be constructed reliably - there's a 
1% systematic difference between flat-2 and flat-3, and a 3% systematic 
difference between gzip-2 and gzip-3 - both systematic errors are larger 
than the 0.5% standard deviation, so they are not measurement errors but 
real layout properties of these trees.

the most interesting result is that gzip-2 is 2.5% faster than flat-2, 
and gzip-3 is 4% faster than flat-3. These differences are close to the 
layout-related systematic error, but slightly above it, so i'd conclude 
that a compressed repository is 3% faster on this hardware.

(since these results were in line with my expectations i double-checked 
everything again and did another 10 reboot tests - same results.)

conclusion [*]: there's a negligible cache-cold performance hit from 
using an uncompressed repository, because cache-cold performance is 
dominated by number of seeks, which is almost identical in the two 
cases.

Ingo

[*] lots of conditionals apply: these werent flat/compressed GIT 
repositories (although they were quite similar to it), nor was the GIT 
workload measured (although the one measured should be quite close to 
it).

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Index/hash order

2005-04-14 Thread Ingo Molnar

* Linus Torvalds <[EMAIL PROTECTED]> wrote:

> > I've run a few tests, just to get a few numbers of the overhead
> > involved.  I used the last ~8,000 changesets from the BKCVS kernel
> > repository.  With cold cache, a checkout from cold cache takes about
> > 250 seconds on my laptop.  I don't have git numbers, but a mere copy
> > of the kernel tree needs 40 seconds.
> 
> I will bet you that a git checkout is _faster_ than a kernel source 
> tree copy. The time will be dominated by the IO costs (in particular 
> the read costs), and the IO costs are lower thanks to compression. So 
> I think that the cold-cache case will beat your 40 seconds by a clear 
> margin. It generally compresses to half the size, so 20 seconds is not 
> impossible (although seek costs would tend to stay constant, so I'd 
> expect it to be somewhere in between the two).

i'd be surprised if it was twice as fast - cache-cold linear checkouts 
are _seek_ limited, and it doesnt matter whether after a 1-2 msec 
track-to-track disk seek the DMA engine spends another 30 microseconds 
DMA-ing 60K uncompressed data instead of 30K compressed... (there are 
other factors, but this is the main thing.)

Ingo
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html