Re: full kernel history, in patchset format

2005-04-19 Thread Catalin Marinas
David Mansfield [EMAIL PROTECTED] wrote:
 Catalin Marinas wrote:
 AFAIK, cvsps uses the date/time to create the changesets. There is a
 problem with the BKCVS export since some files in the same commit can
 have a different time (by an hour). I posted a mail some time ago
 about this -
 http://marc.theaimsgroup.com/?l=linux-kernel&m=110026570201544&w=2
 I read that the old history won't be merged into the new repository
 but, if you are interested, I have a script that can do this based on
 the (Logical change ...) string in the file commit logs and it is
 quite fast at generating the patches.


 Hmmm.  I read that message just now.  Is it a matter of 'perfection'
 that is the issue here, or actual correctness when applying the
 patches in order?

I see it as a matter of correctness since in a given BKCVS changeset
(i.e. revision in the ChangeSet,v file) you may miss files. You would
eventually get them, with the same log, but in a different patch. If
you don't care about this, you can call it 'perfection'.

At that time I thought about modifying cvsps to use the (Logical
change ...) string instead of time/date for grouping the files but I
realised it is easier with a shell script.

 (perhaps this has now been fixed).

There was no reply to this e-mail. It might have been fixed in the
meantime but I don't think the history was fixed as well.

-- 
Catalin

-
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
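
The grouping idea described in the message above (using the "(Logical change ...)" string from the per-file commit logs rather than date/time) can be sketched roughly as follows. This is only an illustration, not Catalin's shell script: the exact form of the identifier inside the parentheses is not shown in the thread, so the regular expression, the sample logs and the function name are all assumptions.

```python
import re
from collections import defaultdict

# Assumption: the marker looks like "(Logical change <id>)". The thread only
# shows "(Logical change ...)", so the id format used below is hypothetical.
LOGICAL_CHANGE = re.compile(r"\(Logical change ([^)]+)\)")


def group_by_logical_change(file_revisions):
    """Group (path, revision, log) tuples by the marker found in the log."""
    groups = defaultdict(list)
    for path, revision, log in file_revisions:
        match = LOGICAL_CHANGE.search(log)
        key = match.group(1) if match else None  # None: log carries no marker
        groups[key].append((path, revision))
    return dict(groups)


# Made-up sample data: two files of one changeset, one file of another.
revisions = [
    ("fs/inode.c", "1.10", "fix locking\n(Logical change 1.100)"),
    ("fs/namei.c", "1.7", "fix locking\n(Logical change 1.100)"),
    ("mm/slab.c", "1.3", "cleanup\n(Logical change 1.101)"),
]
changesets = group_by_logical_change(revisions)
```

Because the key is taken from the log text, two files of one commit end up in the same group even if their timestamps differ by an hour.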


Re: full kernel history, in patchset format

2005-04-17 Thread Petr Baudis
Dear diary, on Mon, Apr 18, 2005 at 01:31:36AM CEST, I got a letter
where David Woodhouse [EMAIL PROTECTED] told me that...
 Note that any given copy of a tree doesn't _need_ to keep all the
 history back to the beginning of time. It's OK if the oldest commit object
 in your tree actually refers back to a parent which doesn't exist
 locally. I can well imagine that some people will want to keep their
 trees pruned to keep only a few weeks of history, while other copies of
 the tree will keep everything.

I think this is bad, bad, bad. If you don't keep around all the
_commits_, you get into all sorts of trouble - when merging, when doing
git log, etc. And the commits themselves are probably actually a pretty
small portion of the thing. I didn't do any actual measurement but I
would be pretty surprised if it were much more than a few megabytes of
data for the kernel history.

Of course an entirely different thing are _trees_ associated with those
commits. As long as you stay with a simple three-way merge, you
basically never want to look at trees which aren't heads and which you
don't specifically request to look at. And the trees and what they carry
inside is the main bulk of data.

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
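
To make the concern above concrete, here is a minimal sketch of a history walk over a pruned set of commits. It is not git code: the dictionary layout, the commit names and the function name are all hypothetical. It only shows that a log-style walk has to stop wherever a parent commit is missing locally, and that it never needs to open a tree.

```python
def walk_history(commits, head):
    """Return (reachable commit ids, parent ids that are missing locally)."""
    seen, missing, order = set(), set(), []
    pending = [head]
    while pending:
        commit_id = pending.pop()
        if commit_id in seen:
            continue
        if commit_id not in commits:
            # The parent was pruned: the walk cannot go any further here.
            missing.add(commit_id)
            continue
        seen.add(commit_id)
        order.append(commit_id)
        pending.extend(commits[commit_id]["parents"])
    return order, missing


# Hypothetical pruned repository: commit "c1" is no longer stored locally.
commits = {
    "c3": {"parents": ["c2"], "tree": "t3"},
    "c2": {"parents": ["c1"], "tree": "t2"},
}
order, missing = walk_history(commits, "c3")
```

The "tree" fields are never dereferenced by the walk, which is the reason keeping every commit while dropping old trees leaves this kind of operation intact.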


Re: full kernel history, in patchset format

2005-04-17 Thread David Woodhouse
On Mon, 2005-04-18 at 02:50 +0200, Petr Baudis wrote:
 I think I will make git-pasky's default behaviour (when we get
 http-pull, that is) to keep the complete commit history but only trees
 you need/want; togglable to both sides.

I think the default behaviour should probably be to fetch everything.

-- 
dwmw2



Re: full kernel history, in patchset format

2005-04-16 Thread Ingo Molnar

* Ingo Molnar [EMAIL PROTECTED] wrote:

 the patches contain all the existing metadata, dates, log messages and 
 revision history. (What i think is missing is the BK tree merge 
 information, but i'm not sure we want/need to convert them to GIT.)

author names are abbreviated, e.g. 'viro' instead of 
[EMAIL PROTECTED], and no committer information is 
included (although the committer ought to be Linus in most cases). These are 
limitations of the BK-CVS gateway, i think.

Ingo


Re: full kernel history, in patchset format

2005-04-16 Thread Francois Romieu
Ingo Molnar [EMAIL PROTECTED] :
[...]
 the history data starts at 2.4.0 and ends at 2.6.12-rc2. I've included a 
 script that will apply all the patches in order and will create a 
 pristine 2.6.12-rc2 tree.

127 weeks of bk-commit mail for the 2.6 branch alone since October 2002
provides more than 44000 messages here. That figure is surprisingly
different from the 28237 patches of the cvsps export.

 it needed many hours to finish, on a very fast server with tons of RAM, 
 and it also needed a fair amount of manual work to extract it and to 
 make it usable, so i guessed others might want to use the end result as 
 well, to try and generate large GIT repositories from them (or to run 
 analysis over the patches, etc.).

Has anyone already compared the (split/digested) content of the ChangeLog
file with the commit messages ? It raises the interesting question of
inserting the merge messages/patches in the sequence at the right place
but I'd like to know if someone met other issues.

--
Ueimor


Re: full kernel history, in patchset format

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Ingo Molnar wrote:
 
 i've converted the Linux kernel CVS tree into 'flat patchset' format, 
 which gave a series of 28237 separate patches. (Each patch represents a 
 changeset, in the order they were applied. I've used the cvsps utility.)
 
 the history data starts at 2.4.0 and ends at 2.6.12-rc2. I've included a 
 script that will apply all the patches in order and will create a 
 pristine 2.6.12-rc2 tree.

Hey, that's great. I got the CVS repo too, and I was looking at it, but 
the more I looked at it, the more I felt that the main reason I want to 
import it into git ends up being to validate that my size estimates are at 
all realistic.

I see that Thomas Gleixner seems to have done that already, and come to a 
figure of 3.2GB for the last three years, which I'm very happy with, 
mainly because it seems to match my estimates to a tee. Which means that I 
just feel that much more confident about git actually being able to handle 
the kernel long-term, and not just as a stop-gap measure.

But I wonder if we actually want to populate the whole history.. 
Now that my size estimates have been verified, I have little actual real 
reason to put the history into git. There are no visualization tools done 
for git yet, and no helpers to actually find problems, and by the time 
there will be, we'll have new history.

So I'd _almost_ suggest just starting from a clean slate after all.  
Keeping the old history around, of course, but not necessarily putting it
into git now. It would just force everybody who is getting used to git in 
the first place to work with a 3GB archive from day one, rather than 
getting into it a bit more gradually.

What do people think? I'm not so much worried about the data itself: the
git architecture is _so_ damn simple that, now that the size estimate has
been confirmed, I don't think it would be a problem per se to put
3.2GB into the archive. But it will bog down rsync horribly, so it will
actually hurt synchronization until somebody writes the rev-tree-like
stuff to communicate changes more efficiently..

IOW, it smells to me like we don't have the infrastructure to really work 
with 3GB archives, and that if we start from scratch (2.6.12-rc2), we can 
build up the infrastructure in parallel with starting to really need it.

But it's _great_ to have the history in this format, especially since 
looking at CVS just reminded me how much I hated it.

Comments?

Linus


Re: full kernel history, in patchset format

2005-04-16 Thread Thomas Gleixner
On Sat, 2005-04-16 at 10:04 -0700, Linus Torvalds wrote:

 So I'd _almost_ suggest just starting from a clean slate after all.  
 Keeping the old history around, of course, but not necessarily putting it
 into git now. It would just force everybody who is getting used to git in 
 the first place to work with a 3GB archive from day one, rather than 
 getting into it a bit more gradually.

Sure. We can export the 2.6.12-rc2 version of the git'ed history tree
and start from there. Then the first changeset has a parent, which just
lives in a different place.
That's the only difference to your repository, but it would change the
sha1 sums of all your changesets.

 What do people think? I'm not so much worried about the data itself: the
 git architecture is _so_ damn simple that, now that the size estimate has
 been confirmed, I don't think it would be a problem per se to put
 3.2GB into the archive. But it will bog down rsync horribly, so it will
 actually hurt synchronization until somebody writes the rev-tree-like
 stuff to communicate changes more efficiently..

We have all the tracking information in SQL and we will post the
database dump soon, so people interested in revision tracking can use this
as an information base.

 But it's _great_ to have the history in this format, especially since 
 looking at CVS just reminded me how much I hated it.

:)

One remark on the tree blob storage format.
The binary storage of the sha1sum of the referred object is a PITA for
scripting.
Converting the ASCII -> binary for the sha1sum comparison should not
take much longer than the binary -> ASCII conversion for the file
reference. Can this be changed?

tglx




Re: Re: full kernel history, in patchset format

2005-04-16 Thread Petr Baudis
Dear diary, on Sat, Apr 16, 2005 at 09:23:40PM CEST, I got a letter
where Thomas Gleixner [EMAIL PROTECTED] told me that...
 One remark on the tree blob storage format.
 The binary storage of the sha1sum of the referred object is a PITA for
 scripting.
 Converting the ASCII -> binary for the sha1sum comparison should not
 take much longer than the binary -> ASCII conversion for the file
 reference. Can this be changed?

Huh, you aren't supposed to peek into trees directly. What's wrong with
ls-tree?

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: full kernel history, in patchset format

2005-04-16 Thread Mike Taht

 * A script git-archive-tar is used to create a base tarball
   that roughly corresponds to linux-*.tar.gz.  This works as
   follows:
$ git-archive-tar C [B1 B2...]
   This reads the named commit C, grabs the associated tree
   (i.e.  its sub-tree objects and the blob they refer to), and
   makes a tarball of ??/??
   files.  The tarball does not have to contain any extra
   information to reproduce any ancestor of the named commit.

Alternatively, a git-archive-torrent could create a list of files for a 
bittorrent feed.

--
Mike Taht


Re: Re: Re: full kernel history, in patchset format

2005-04-16 Thread Petr Baudis
Dear diary, on Sat, Apr 16, 2005 at 08:32:32PM CEST, I got a letter
where Petr Baudis [EMAIL PROTECTED] told me that...
 Dear diary, on Sat, Apr 16, 2005 at 09:23:40PM CEST, I got a letter
 where Thomas Gleixner [EMAIL PROTECTED] told me that...
  One remark on the tree blob storage format.
  The binary storage of the sha1sum of the referred object is a PITA for
  scripting.
  Converting the ASCII -> binary for the sha1sum comparison should not
  take much longer than the binary -> ASCII conversion for the file
  reference. Can this be changed?
 
 Huh, you aren't supposed to peek into trees directly. What's wrong with
 ls-tree?

(I meant, you aren't supposed to peek into trees from scripts. Or well,
not that you're not supposed to, but it does not make much sense when you have
ls-tree.)

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: Re: full kernel history, in patchset format

2005-04-16 Thread Thomas Gleixner
On Sat, 2005-04-16 at 20:32 +0200, Petr Baudis wrote:
 Dear diary, on Sat, Apr 16, 2005 at 09:23:40PM CEST, I got a letter
 where Thomas Gleixner [EMAIL PROTECTED] told me that...
  One remark on the tree blob storage format.
  The binary storage of the sha1sum of the referred object is a PITA for
  scripting.
  Converting the ASCII -> binary for the sha1sum comparison should not
  take much longer than the binary -> ASCII conversion for the file
  reference. Can this be changed?
 
 Huh, you aren't supposed to peek into trees directly. What's wrong with
 ls-tree?

Why am I not supposed to? Is this evil?

My export script has all the data available, so I write the tree refs
directly. The full export runs ~1 hour. That's long enough :) I tried the
git way and it slows me down by factor BIG (I don't remember the
number).

Also, for reference tracking, all the information might already be available,
e.g. in a database. Why should the revtool then use some tool to retrieve
information which is already there?

tglx




Re: full kernel history, in patchset format

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Thomas Gleixner wrote:
 
 One remark on the tree blob storage format.
 The binary storage of the sha1sum of the referred object is a PITA for
 scripting.
 Converting the ASCII -> binary for the sha1sum comparison should not
 take much longer than the binary -> ASCII conversion for the file
 reference. Can this be changed?

I'd really rather not. Why don't you just use ls-tree for scripting? 
That's why it exists in the first place. 

It might make sense to have some simple selection capabilities built into 
ls-tree (ie ls-tree --match drivers/char/ -z treesha1 to get just a 
subtree out), but that depends entirely on how you end up using it.

The fact is, there should _never_ be any reason to look at the objects
themselves directly. cat-file is a debugging aid, it shouldn't be
scripted (with the possible exception of cat-file blob  to just
extract the blob contents, since that object doesn't have any internal
structure).

That level of abstraction (we never look directly at the objects) is 
what allows us to change the object structure later. For example, we 
already changed the commit date thing once, and the tree object has 
obviously evolved a bit, and if we ever change the hash, the objects will 
change too, but if you always just script them using nice helper tools, 
you won't ever need to _care_. And that's how it should be.

If there's a tool missing, holler. THAT is the part I've been trying to
write: all the plumbing so that you _can_ script the thing sanely, and not
worry about how objects are created and worked with. 

For example, that index file format likely _will_ change. I ended up
doing the new stage flags in a way that kept the index file compatible
with old ones, but I did that mainly because it also happened to be the
easiest way to enforce the rule I wanted to enforce (ie the stage really
_is_ a part of the filename from a "compare filenames" standpoint, in
order to make sure that the stages are always ordered).

So if the index file change hadn't had that property, I'd have just said
"I'll change the format", and anybody who tried to parse the index file
would have been _broken_.

Linus


Re: full kernel history, in patchset format

2005-04-16 Thread Junio C Hamano
 JCH == Junio C Hamano [EMAIL PROTECTED] writes:

JCH I have been cooking this idea before I dove into the merge stuff
JCH and did not have time to implement it myself (Hint Hint), but I
JCH think something along the following lines would work nicely:

It should be fairly obvious from the context what I meant to
say, but in case somebody gets confused by my inaccurate
description of small details (or, before somebody nitpicks ;-),
I'd add some clarifications and corrections.

JCH  * Run diff-tree between neighboring commits [*1*] to find out
JCH    the set of blobs that are related.  Extract those related
JCH    blobs and run diff [*2*] between them to see if it produces
JCH    a patch smaller than the whole thing when compressed.  If
JCH    diff+patch is a win, then we do not have to transmit the blob
JCH    that we could reproduce by sending the diff.  Note that fact.

I talked only about blobs here, but I really mean all types:
commits, trees and blobs here.  Nothing prevents us from
extracting the raw data for trees and commits and run diff
between them.  We can use cat-file to do that today.

What we do not have is the reverse of $ cat-file type rawdata
(i.e. $ write-file type rawdata), but that is trivial to
write.  The raw data for related tree objects should delta well.
I do not think it is worth the effort to attempt delta for
commit objects.  Anything that git-archive-tar decides not to
send in diff+patch form, be it blob or tree or commit, should be
noted here, not just blob as my previous message incorrectly
implies.

JCH Given the above, the operation of git-archive-patch is also
JCH quite obvious.  Extract the diff package tarball into the
JCH objects/ directory that has (at least) the full Bn, uncompress
JCH the patch file part, and run patch on it. 

Of course, after you run patch to reproduce the raw data for the
blob or tree, we need the reverse of cat-file to register such
data under the objects/ hierarchy.
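
The "is diff+patch a win" test described above can be sketched as below. This is only a rough illustration under stated assumptions: Python's difflib and zlib stand in for the external diff and whatever compression the package would use, and the sample blobs and the function name are made up.

```python
import difflib
import hashlib
import zlib


def compressed_sizes(old_text, new_text):
    """Return (compressed size of the diff, compressed size of the new blob)."""
    diff = "".join(difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile="old", tofile="new"))
    diff_size = len(zlib.compress(diff.encode("utf-8")))
    whole_size = len(zlib.compress(new_text.encode("utf-8")))
    return diff_size, whole_size


# Made-up blobs: 200 poorly compressible lines, one of which changes.
lines = [hashlib.sha1(str(i).encode()).hexdigest() + "\n" for i in range(200)]
old_blob = "".join(lines)
lines[100] = "one changed line\n"
new_blob = "".join(lines)

diff_size, whole_size = compressed_sizes(old_blob, new_blob)
send_as_diff = diff_size < whole_size  # "note that fact" for this blob
```

For two unrelated blobs the diff would contain both texts in full, so the comparison would normally come out the other way and the whole object would be sent.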



Re: full kernel history, in patchset format

2005-04-16 Thread Thomas Gleixner
On Sat, 2005-04-16 at 11:44 -0700, Linus Torvalds wrote:

 That level of abstraction (we never look directly at the objects) is 
 what allows us to change the object structure later. For example, we 
 already changed the commit date thing once, and the tree object has 
 obviously evolved a bit, and if we ever change the hash, the objects will 
 change too, but if you always just script them using nice helper tools, 
 you won't ever need to _care_. And that's how it should be.

For the export stuff it's terribly slow. :(

I agree that using common tools is good. But we are also talking about an open
format, so using a script to speed up certain tasks is not bad at all.

tglx





Re: Re: full kernel history, in patchset format

2005-04-16 Thread Christopher Li
On Sat, Apr 16, 2005 at 07:43:27PM +0200, Petr Baudis wrote:
 Dear diary, on Sat, Apr 16, 2005 at 07:04:31PM CEST, I got a letter
 where Linus Torvalds [EMAIL PROTECTED] told me that...
  So I'd _almost_ suggest just starting from a clean slate after all.  
  Keeping the old history around, of course, but not necessarily putting it
  into git now. It would just force everybody who is getting used to git in 
  the first place to work with a 3GB archive from day one, rather than 
  getting into it a bit more gradually.
  
  Comments?
 
 FWIW, it looks pretty reasonable to me. Perhaps we should have a
 separate GIT repository with the previous history though, and in the
 first new commit the parent could point to the last commit from the
 other repository.
 
 Just if it isn't too much work, though. :-)

I think we can make git use stackable repositories. When it fails
to find an object, it will try to read it from the parent repository.
That is useful for slicing the history.

I can have a local repository where all the new objects created by me are
stored in my tree instead of the official one. Cleaning up the objects in
my local tree will be much easier, since it only needs to work on a much
smaller repository. Once all my changes are merged into the official tree,
I simply empty my local repository.

About the kernel git repository: I think it is much easier to just put
everything in one tree, so I don't need to worry that "if I need to see
pre-2.6.12, I need to do this". And the full repository needs to be
stored on a server somewhere anyway.

However, I totally agree that people should not have to deal with unnecessary
history when they start using the git tools. We should just make the tools
not download all the history by default, and only get it when the user
specifically asks for it.

Why 2.6.12-rc2? When the kernel grows to 2.6.15, a new user might not even need
pre-2.6.13 most of the time. If we make it very easy for people to get
history when they need it, they will be less motivated to store unnecessary
history locally ("just in case I need it").

I think we should not advise using rsync to sync the whole git tree as
the way to get updates. We need to get used to having only a slice of the
history and getting more if we need it.
The server should provide some small metadata file, like the
rev-tool cache, so the SCM tools can download it to figure out which files
need to be downloaded to get to a certain revision, instead of downloading the
whole repository to figure out what is new.

We can even slice that metadata information into smaller pieces based on major
release points.

Chris
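
The stackable-repository lookup proposed above might look roughly like this. It is a toy model, not a proposal for the actual on-disk layout: the class and method names are hypothetical, objects live in a dictionary, and the id is a plain SHA-1 of the data without any object header.

```python
import hashlib


class Repository:
    """Toy object store that falls back to a parent store on a miss."""

    def __init__(self, parent=None):
        self.objects = {}     # hex sha1 -> raw bytes
        self.parent = parent  # next repository in the stack, or None

    def write(self, data):
        object_id = hashlib.sha1(data).hexdigest()
        self.objects[object_id] = data
        return object_id

    def read(self, object_id):
        repo = self
        while repo is not None:
            if object_id in repo.objects:
                return repo.objects[object_id]
            repo = repo.parent  # not found here: try the parent repository
        raise KeyError(object_id)


official = Repository()
local = Repository(parent=official)

old_id = official.write(b"old history")
new_id = local.write(b"my new change")

found_in_parent = local.read(old_id)  # served by the official repository
local.objects.clear()                 # "empty my local repository"
```

After the clear, objects from the official repository are still readable through the local one, while the local-only object is gone unless it was merged upstream first.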
 


Re: full kernel history, in patchset format

2005-04-16 Thread Jan-Benedict Glaw
On Sat, 2005-04-16 10:04:31 -0700, Linus Torvalds [EMAIL PROTECTED]
wrote in message [EMAIL PROTECTED]:

 What do people think? I'm not so much worried about the data itself: the
 git architecture is _so_ damn simple that, now that the size estimate has
 been confirmed, I don't think it would be a problem per se to put
 3.2GB into the archive. But it will bog down rsync horribly, so it will
 actually hurt synchronization until somebody writes the rev-tree-like
 stuff to communicate changes more efficiently..
 
 IOW, it smells to me like we don't have the infrastructure to really work 
 with 3GB archives, and that if we start from scratch (2.6.12-rc2), we can 
 build up the infrastructure in parallel with starting to really need it.

3GB is quite some data, but I'd accept and prefer to download it from
somewhere. I think that it's worth it.

I accept that there are people out there who would love to get a
smaller archive, but at least most developers that would actually use it
for day-to-day work *do* have the bandwidth to download it. Maybe we could
also prepare (from time to time) bzip'ed tarballs, which I expect to be
a tad smaller.

MfG, JBG

-- 
Jan-Benedict Glaw   [EMAIL PROTECTED]. +49-172-7608481
A free opinion in a free head | Against censorship | Against war
for a free state full of free citizens | on the Internet! | in Iraq!
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));




Re: full kernel history, in patchset format

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Thomas Gleixner wrote:
 
 For the export stuff it's terribly slow. :(

I don't really see your point.

If you already know what the tree is, like you say, you don't care about
the tree object. And if you don't know what the tree is, what _are_ you
doing?

In other words, show us what you're complaining about. If you're looking
into the trees yourself, then the binary representation of the sha1 is
already what you want. That _is_ the hash. So why do you want it in ASCII?  
And if you're not looking into the tree directly, but using cat-file
tree and you were hoping to see ASCII data, then that's certainly not
going to be any faster than just doing ls-tree instead.

In other words, I don't see your point. Either you want ascii output for 
scripting, or you don't. First you claimed that you did, and that you 
would want the tree object to change in order to do so. Now you claim that 
you can't use ls-tree because it's too slow. 

That just isn't making any sense. You're mixing two totally different
levels, and complaining about performance when scripting things. Yet
you're talking about a 20-byte data structure that is trivial to convert
to any format you want.

What kind of _strange_ scripting architecture is so fast that there's a
difference between cat-file and ls-tree and can handle 17,000 files in
60,000 revisions, yet so slow that you can't trivially convert 20 bytes of 
data?

Linus
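
For readers who want to see what "trivially convert 20 bytes of data" amounts to, here is a minimal sketch of the binary and ASCII forms of the hash inside a tree entry. The entry layout used here (octal mode, space, name, NUL byte, then 20 raw bytes) is an assumption about the uncompressed tree body and, as the message above stresses, may change; scripts are meant to use ls-tree instead. The helper names are hypothetical.

```python
import hashlib

SHA1_LEN = 20  # a SHA-1 digest is 20 raw bytes, i.e. 40 hex characters


def parse_tree(raw):
    """Split an (assumed) tree body into (mode, name, hex sha1) tuples."""
    entries = []
    pos = 0
    while pos < len(raw):
        space = raw.index(b" ", pos)   # end of the octal mode
        nul = raw.index(b"\0", space)  # end of the file name
        mode = raw[pos:space].decode("ascii")
        name = raw[space + 1:nul].decode("utf-8")
        sha1_bin = raw[nul + 1:nul + 1 + SHA1_LEN]
        entries.append((mode, name, sha1_bin.hex()))  # binary -> ASCII
        pos = nul + 1 + SHA1_LEN
    return entries


def make_entry(mode, name, sha1_hex):
    """Build one entry; bytes.fromhex() does the ASCII -> binary step."""
    return (mode.encode("ascii") + b" " + name.encode("utf-8") + b"\0"
            + bytes.fromhex(sha1_hex))


# Stand-in id for illustration only, not a real git object id.
blob_id = hashlib.sha1(b"hello").hexdigest()
raw_tree = (make_entry("100644", "README", blob_id)
            + make_entry("100644", "Makefile", blob_id))
entries = parse_tree(raw_tree)
```

Either direction is a single library call, which is the point being made about the cost of the conversion.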


Re: full kernel history, in patchset format

2005-04-16 Thread Ingo Molnar

* David Mansfield [EMAIL PROTECTED] wrote:

 Ingo Molnar wrote:
 * Ingo Molnar [EMAIL PROTECTED] wrote:
 
 
 the patches contain all the existing metadata, dates, log messages and 
 revision history. (What i think is missing is the BK tree merge 
 information, but i'm not sure we want/need to convert them to GIT.)
 
 
 author names are abbreviated, e.g. 'viro' instead of 
 [EMAIL PROTECTED], and no committer information is 
 included (although the committer ought to be Linus in most cases). These are 
 limitations of the BK-CVS gateway, i think.
 
 
 Glad to hear cvsps made it through!  I'm curious what the manual 
 fixups required were, except for the binary file issue (logo.gif).

--cvs-direct was needed to speed it up from 'several days to finish' to 
'several hours to finish', but it crashed on a handful of patches [i 
used the latest devel snapshot so this isn't a complaint]. (one of the 
crashes was when generating 1860.patch.) Also, 'cvs rdiff' apparently 
emits an empty patch for diffs that remove a file that ends without 
a newline character - but this isn't cvsps's problem.  (grep for 
+++ in the patchset to find those cases.)

 As to the actual email addresses, for more recent patches, the 
 Signed-off should help.  For earlier ones, isn't there some script 
 which 'knows' a bunch of canonical author-email mappings? (the 
 shortlog script or something)?

yeah, that's not that much of a problem, most of the names are unique, 
and the rest can be fixed up too.

 Is the full committer email address actually in the changeset in BK?  
 If so, given that we have the unique id (immutable I believe) of the 
 changeset, could it be extracted directly from BK?

i think it's included in BK.

Ingo
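
Applying a flat patchset such as the one discussed in this thread "in order" needs a numeric sort, since a plain lexical sort would put 10.patch before 2.patch. The sketch below is not the script shipped with the patchset. It assumes files named like the 1860.patch mentioned above, and the `patch -p1` invocation (including the strip level) is an assumption too.

```python
import os
import re
import subprocess


def patch_order(filenames):
    """Return the N.patch names sorted by their number; ignore other files."""
    numbered = []
    for name in filenames:
        match = re.fullmatch(r"(\d+)\.patch", name)
        if match:
            numbered.append((int(match.group(1)), name))
    return [name for _, name in sorted(numbered)]


def apply_all(patch_dir, tree_dir):
    """Apply every patch in order and stop at the first failure."""
    for name in patch_order(os.listdir(patch_dir)):
        patch_path = os.path.abspath(os.path.join(patch_dir, name))
        # Assumption: the patches apply with -p1 from the top of the tree.
        subprocess.run(["patch", "-p1", "-i", patch_path],
                       cwd=tree_dir, check=True)


ordered = patch_order(["10.patch", "2.patch", "1.patch", "README"])
```

Here `ordered` lists 1.patch, 2.patch and 10.patch in that sequence; `apply_all` is defined but not called, since it needs a real tree and the patch utility.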


Re: full kernel history, in patchset format

2005-04-16 Thread Ingo Molnar

* Linus Torvalds [EMAIL PROTECTED] wrote:

  the history data starts at 2.4.0 and ends at 2.6.12-rc2. I've included a 
  script that will apply all the patches in order and will create a 
  pristine 2.6.12-rc2 tree.
 
 Hey, that's great. I got the CVS repo too, and I was looking at it, 
 but the more I looked at it, the more I felt that the main reason I 
 want to import it into git ends up being to validate that my size 
 estimates are at all realistic.
 
 I see that Thomas Gleixner seems to have done that already, and come 
 to a figure of 3.2GB for the last three years, which I'm very happy 
 with, mainly because it seems to match my estimates to a tee. [...]

(yeah, we apparently worked in parallel - i only learned about his 
efforts after i sent my mail. He was using BK to extract info, i was 
using the CVS tree alone and no BK code whatsoever. (I don't think there 
will be any argument about who owns what, but i wanted to be on the safe 
side, and i also wanted to see how complete and usable the CVS metadata 
is - it's close to perfect i'd say, for the purposes i care about.))

 But I wonder if we actually want to populate the whole 
 history..

yeah, it definitely feels a bit brave to import 28,000 changesets into a 
source-code database project that will be a whopping 2 weeks old in 2 
days ;) Even if we felt 100% confident about all the basics (which we do 
of course ;), it's just simply too young to tie things down via a 3.2GB 
database. It feels much more natural to grow it gradually, 28,000 
changesets i'm afraid would just suffocate the 'project growth 
dynamics'. Not going too fast is just as important as not going too 
slow.

I didn't generate the patchset to get it added into some central 
repository right now, i generated it to check that we _do_ have all the 
revision history in an easy to understand format which does generate 
today's kernel tree, so that we can lean back and worry about the full 
database once things get a bit more settled down (in a couple of months 
or so). It's also an easy testbed for GIT itself.

but the revision history was one of the main reasons i used BK myself, 
so we'll need a merged database eventually. Occasionally i needed to 
check who was the one who touched a particular piece of code - was that 
fantastic new line of code written by me, or was that buggy piece of 
crap written by someone else? ;) Also, looking at a change and then 
going to the changeset that did it, and then looking at the full picture 
was pretty useful too. So that sort of annotation, and generally 
navigating around _quickly_ and looking at the 'flow' of changes going 
into a particular file was really useful (for me).

Ingo


Re: full kernel history, in patchset format

2005-04-16 Thread David Lang
On Sat, 16 Apr 2005, Thomas Gleixner wrote:
 On Sat, 2005-04-16 at 10:04 -0700, Linus Torvalds wrote:
  So I'd _almost_ suggest just starting from a clean slate after all.
  Keeping the old history around, of course, but not necessarily putting it
  into git now. It would just force everybody who is getting used to git in
  the first place to work with a 3GB archive from day one, rather than
  getting into it a bit more gradually.
 
 Sure. We can export the 2.6.12-rc2 version of the git'ed history tree
 and start from there. Then the first changeset has a parent, which just
 lives in a different place.
 That's the only difference to your repository, but it would change the
 sha1 sums of all your changesets.

At least start with a full release, say 2.6.11.
The history won't be blank, but it's far more likely that people will care 
about the details between 2.6.11 and 2.6.12 and will want to go back 
before -rc2.

David Lang
--
There are two ways of constructing a software design. One way is to make it so 
simple that there are obviously no deficiencies. And the other way is to make 
it so complicated that there are no obvious deficiencies.
 -- C.A.R. Hoare