[EMAIL PROTECTED] (H. Peter Anvin) wrote on 11.04.05 in <[EMAIL PROTECTED]>:
> Followup to: <[EMAIL PROTECTED]>
> By author:Christopher Li <[EMAIL PROTECTED]>
> In newsgroup: linux.dev.kernel
> >
> > There is one problem though. How about the SHA1 hash collision?
> > Even the chance is very
On Thu, Apr 14, 2005 at 01:42:11AM +0200, Krzysztof Halasa wrote:
> Matt Mackall <[EMAIL PROTECTED]> writes:
>
> > Now if you can assume that blobs never change and are never deleted,
> > you can simply append them all onto a log, and then index them with a
> > separate file containing an htree of
Matt Mackall <[EMAIL PROTECTED]> writes:
> Now if you can assume that blobs never change and are never deleted,
> you can simply append them all onto a log, and then index them with a
> separate file containing an htree of (sha1, offset, length) or the
> like.
That means a problem with rsync, thou
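
The append-only layout quoted above can be sketched roughly as follows. This is only an illustration under stated assumptions: an in-memory buffer stands in for the log file, a plain dict stands in for the htree index, and the names `BlobLog`, `put` and `get` are hypothetical, not anything from the thread.

```python
import hashlib
import io


class BlobLog:
    """Append-only blob store: one log plus an index of sha1 -> (offset, length)."""

    def __init__(self):
        self.log = io.BytesIO()  # stands in for the append-only log file
        self.index = {}          # stands in for the separate htree index file

    def put(self, data: bytes) -> str:
        """Append a blob to the log (once) and return its SHA1 name."""
        key = hashlib.sha1(data).hexdigest()
        if key not in self.index:  # blobs never change, so store each only once
            offset = self.log.seek(0, io.SEEK_END)
            self.log.write(data)
            self.index[key] = (offset, len(data))
        return key

    def get(self, key: str) -> bytes:
        """Look up (offset, length) in the index and read the blob back."""
        offset, length = self.index[key]
        self.log.seek(offset)
        return self.log.read(length)
```

Because the log only ever grows at the end, a lookup never has to scan it: the index alone says where each blob lives.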
On Tue, Apr 12, 2005 at 06:10:27PM -0700, Linus Torvalds wrote:
>
>
> On Wed, 13 Apr 2005, Andrea Arcangeli wrote:
> >
> > I wasn't suggesting to use CVS. I meant that for a newly developed SCM,
> > the CVS/SCCS format as storage may be more appealing than the current
> > git format.
>
> Go wil
On Wed, 13 Apr 2005, Russell King wrote:
>
> And my entire 2.6.12-rc2 BK tree, unchecked out, is about 220MB, which
> is more dense than CVS.
>
> BK is also a lot better than CVS. So _your_ point is?
Hey, anybody who wants to argue that BK is better than GIT won't be
getting any counter-argu
On Tue, Apr 12, 2005 at 06:10:27PM -0700, Linus Torvalds wrote:
> Go wild. I did mine in six days, and you've been whining about other
> people's SCMs for three years.
Even if I spent 6 days doing git, you'd never have thrown away BK in
exchange for git.
> In other words - go and _do_ something
On Wed, Apr 13, 2005 at 10:30:52AM +0100, Russell King wrote:
> And my entire 2.6.12-rc2 BK tree, unchecked out, is about 220MB, which
> is more dense than CVS.
Yep, this is why I mentioned SCCS format too, I didn't know it was even
smaller, but I expected a similar density from SCCS.
> Note: I'm
On Tue, Apr 12, 2005 at 04:45:07PM -0700, Linus Torvalds wrote:
> On Wed, 13 Apr 2005, Andrea Arcangeli wrote:
> > At the rate of 9M for every 198 changeset checkins, that means I'll have
> > to download 2.7G _uncompressible_ (i.e. already compressed with a bad
> > per-file ratio due to the too-small
Hi, Linus Torvalds wrote on Tue, 12 Apr 2005 15:49:07 -0700:
>> Have you tried to import it?
>
> It would take days.
You can always import it later and then graft it into the commit tree.
That would of course change *every* commit node, but so what? They're
small, and you can delete the old o
On Wed, 13 Apr 2005, Andrea Arcangeli wrote:
>
> I wasn't suggesting to use CVS. I meant that for a newly developed SCM,
> the CVS/SCCS format as storage may be more appealing than the current
> git format.
Go wild. I did mine in six days, and you've been whining about other
people's SCMs for
On Tue, Apr 12, 2005 at 04:45:07PM -0700, Linus Torvalds wrote:
> Yes. CVS is much denser.
>
> CVS is also total crap. So your point is?
I wasn't suggesting to use CVS. I meant that for a newly developed SCM,
the CVS/SCCS format as storage may be more appealing than the current
git format. I guess
On Tue, Apr 12, 2005 at 02:21:58PM -0700, Linus Torvalds wrote:
> The full .git archive for 199 versions of the kernel (the 2.6.12-rc2 one
> and a test-run of 198 patches from Andrew) is 111MB. In other words,
> adding 198 "full" new kernels only grew the archive by 9MB (that's all
> "actual disk u
Hi David,
On Tue, Apr 12, 2005 at 06:36:23PM -0400, David Eger wrote:
> > No. A tree is not the full data. A tree contains enough information to
> > _recreate_ the full data, but the tree itself just tells you _how_ to do
> > that. It doesn't contain very much of the data itself at all.
On Wed, 13 Apr 2005, Andrea Arcangeli wrote:
>
> At the rate of 9M for every 198 changeset checkins, that means I'll have
> to download 2.7G _uncompressible_ (i.e. already compressed with a bad
> per-file ratio due to the too-small files) for a whole pack including all
> changesets without accounti
On Wed, 13 Apr 2005, Krzysztof Halasa wrote:
>
> Does that mean that the 64 K changes imported from bk would take ~ 3 GB?
> Is that real?
That's a _guess_.
> Have you tried to import it?
It would take days.
> I'm going to import the CVS data (with cvsps) - as the CVS "misses" half
> the chan
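
The size estimates traded in this part of the thread are straight linear extrapolations from "9MB per 198 changesets". A minimal sketch of that arithmetic, assuming growth stays linear and ignoring the size of the initial tree; the function name is hypothetical, and reading "64 K" as 64 * 1024 is an assumption:

```python
MB_PER_BATCH = 9            # archive growth quoted in the thread
CHANGESETS_PER_BATCH = 198  # test run of patches that produced that growth


def estimated_growth_mb(changesets: int) -> float:
    """Linear extrapolation of archive growth, in megabytes."""
    return changesets * MB_PER_BATCH / CHANGESETS_PER_BATCH


# 60,000 changesets gives roughly the "2.7G" figure;
# 64 * 1024 changesets gives roughly the "~ 3 GB" figure.
for count in (60_000, 64 * 1024):
    print(count, round(estimated_growth_mb(count)), "MB")
```

As the reply says, this is a guess: nothing guarantees that the first 198 patches are representative of the rest of the history.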
Linus Torvalds <[EMAIL PROTECTED]> writes:
> The full .git archive for 199 versions of the kernel (the 2.6.12-rc2 one
> and a test-run of 198 patches from Andrew) is 111MB. In other words,
> adding 198 "full" new kernels only grew the archive by 9MB (that's all
> "actual disk usage" btw - the file
On Tue, Apr 12, 2005 at 02:21:58PM -0700, Linus Torvalds wrote:
>
> Yes. A tree is defined by the blobs it references (and the subtrees) but
> it doesn't _contain_ them. It just contains a pointer to them.
A pointer to them? You mean a SHA1 hash of them? or what?
Where is the *real* data stored
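
The "pointer" being asked about can be pictured with a toy content-addressed store. This is a simplified sketch, not git's actual object format: it hashes raw content with no header or compression, and `objects`, `store` and `write_tree` are hypothetical names.

```python
import hashlib

objects = {}  # SHA1 hex name -> stored bytes; stands in for the object directory


def store(data: bytes) -> str:
    """Store data under the SHA1 of its content and return that name."""
    key = hashlib.sha1(data).hexdigest()
    objects[key] = data
    return key


def write_tree(files: dict) -> str:
    """Store each file as a blob, then store a tree listing 'sha1 name' lines.

    The tree object holds only names and hashes; the file contents live
    in the separate blob objects those hashes refer to.
    """
    lines = [f"{store(content)} {name}" for name, content in sorted(files.items())]
    return store("\n".join(lines).encode())
```

Writing two versions of a tree that differ in one file adds one new blob and one new tree; the unchanged files are already present under the same names, so nothing is stored twice.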
On Tue, 12 Apr 2005, David Eger wrote:
>
> The reason I am questioning this point is the GIT README file.
>
> Linus makes explicit that a "blob" is just the "file contents," and that
> really, a "blob" is not just the SHA1 of the "blob":
>
> > In particular, the "current directory cache" certa
The reason I am questioning this point is the GIT README file.
Linus makes explicit that a "blob" is just the "file contents," and that
really, a "blob" is not just the SHA1 of the "blob":
> In particular, the "current directory cache" certainly does not need to
> be consistent with the current
On Sun, Apr 10, 2005 at 09:01:22AM -0700, Linus Torvalds wrote:
>
> So I was for a while debating having a totally flat directory space, but
> since there are _some_ downsides (linear lookup for cold-cache, and just
> that "ls -l" ends up being O(n**2) and things), I decided that a single
> fan
Dear diary, on Tue, Apr 12, 2005 at 06:05:19AM CEST, I got a letter
where David Eger <[EMAIL PROTECTED]> told me that...
> So with git, *every* changeset is an entire (compressed) copy of the
> kernel. Really? Every patch you accept adds 37 MB to your hard disk?
>
> Am I missing something here?
On Mon, Apr 11, 2005 at 10:14:13PM -0700, David Lang wrote:
> I've been reading this and have another thought for you guys to keep in
> mind for this tool.
>
> version control of system config files on linux systems.
I've been thinking about this too. (I won't have time to implement this
however
So with git, *every* changeset is an entire (compressed) copy of the
kernel. Really? Every patch you accept adds 37 MB to your hard disk?
Am I missing something here?
-dte
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to [EMAIL PROTECTED]
Mor
David wrote:
> and version control your entire system
Yeah - that works. That's how I back up my system. Not
git actually, but a similar sort of store (no compression,
a line oriented ascii 'index' file).
See my post on "Kernel SCM saga..", Sat, 9 Apr 2005 08:15:53 -0700,
Message-Id: <[EMAIL PR
Subject: Re: more git updates..
Dear diary, on Mon, Apr 11, 2005 at 05:49:31PM CEST, I got a letter
where "Randy.Dunlap" <[EMAIL PROTECTED]> told me that...
> On Sun, 10 Apr 2005 16:38:00 -0700 (PDT) Linus Torvalds wrote:
..snip..
> | Yes. Crappy old tree, but it can still read my git.git directory, so you
> | can use it to upda
On Sat, Apr 09, 2005 at 12:45:52PM -0700, Linus Torvalds wrote:
> Can you guys re-send the scripts you wrote? They probably need some
> updating for the new semantics. Sorry about that ;(
I've been off email this weekend, so have fallen a bit behind here.
I'll forgo updating my stuff, since it lo
On Sun, 10 Apr 2005 16:38:00 -0700 (PDT) Linus Torvalds wrote:
|
|
| On Sun, 10 Apr 2005, Paul Jackson wrote:
| >
| > Useful explanation - thanks, Linus.
|
| Hey. You're welcome. Especially when you create good documentation for
| this thing.
|
| Because:
|
| > Is this picture and descriptio
Followup to: <[EMAIL PROTECTED]>
By author:Christopher Li <[EMAIL PROTECTED]>
In newsgroup: linux.dev.kernel
>
> There is one problem though. How about the SHA1 hash collision?
> Even the chance is very remote, you don't want to lose some data due
> to "software" error. I think it is OK th
On Mon, 2005-04-11 at 01:04 +0200, Bernd Eckenfels wrote:
> In article <[EMAIL PROTECTED]> you wrote:
> > (I repeat the xxx in the leaf name - easier to code.)
>
> It is a bit OT, but just a note: there are file systems (hash functions) out
> there that don't like a lot of files named the same way.
bert hubert <[EMAIL PROTECTED]> writes:
> On Sun, Apr 10, 2005 at 03:38:39PM -0700, Linus Torvalds wrote:
>
> > compressed with zlib, they are all named by the sha1 file, and they all
>
> Now I know this is a conscious decision, but recent zlib allows you to write
> out gzip content, at a cost o
On Sun, Apr 10, 2005 at 03:38:39PM -0700, Linus Torvalds wrote:
> compressed with zlib, they are all named by the sha1 file, and they all
Now I know this is a conscious decision, but recent zlib allows you to write
out gzip content, at a cost of 14 bytes I think per file, by adding 32 to
the wind
I see. It just needs some basic set operations (+, -, and)
and some way to select a set:
sha5--->
/
/
sha1-->sha2-->sha3--
\/
\ /
>sha4
list sha1 # all the file list in changeset sha1
# {sha1}
list sha1,sha1
Linus writes:
> Hey. You're welcome. Especially when you create good documentation for
> this thing.
Glad to be of service. Sounds like the umbrella in your foofy
drink will come in handy - keeping off the rain.
--
I won't rest till it's the best ...
P
Dear diary, on Mon, Apr 11, 2005 at 01:14:57AM CEST, I got a letter
where Paul Jackson <[EMAIL PROTECTED]> told me that...
> Useful explanation - thanks, Linus.
>
> Is this picture and description accurate:
>
> ==
>
>
> <
On Sun, 10 Apr 2005, Paul Jackson wrote:
>
> Useful explanation - thanks, Linus.
Hey. You're welcome. Especially when you create good documentation for
this thing.
Because:
> Is this picture and description accurate:
[ deleted, but I'll probably try to put it in an explanation file
somewh
On Sun, 10 Apr 2005, Christopher Li wrote:
>
> How about deleting trees from the caches? I don't need to delete stuff from
> the official tree. It is more for my local version control.
I have a plan. Namely to have a "list-needed" command, which you give one
commit, and a flag implying how much
Useful explanation - thanks, Linus.
Is this picture and description accurate:
==
< working directory files (foo.c) >
^
^|
| upward ops|downwar
In article <[EMAIL PROTECTED]> you wrote:
> (I repeat the xxx in the leaf name - easier to code.)
It is a bit OT, but just a note: there are file systems (hash functions) out
there that don't like a lot of files named the same way. For example NTFS with
the 8.3 short names.
Greetings
Bernd
On Sun, Apr 10, 2005 at 03:38:39PM -0700, Linus Torvalds wrote:
>
>
> On Sun, 10 Apr 2005, Christopher Li wrote:
> >
> > BTW, one thing I learned from ext3 is that it is very useful to have some
> > compatibility flag for future development. I think if we want to reserve some
> > room in the file for
On Sun, 10 Apr 2005, Christopher Li wrote:
>
> BTW, one thing I learned from ext3 is that it is very useful to have some
> compatibility flag for future development. I think if we want to reserve some
> room in the file format for further development of git
Way ahead of you.
This is (one reason) wh
Dear diary, on Sun, Apr 10, 2005 at 08:42:53PM CEST, I got a letter
where Christopher Li <[EMAIL PROTECTED]> told me that...
> I totally agree that the odds are really, really small.
> That is why it is not worth handling the case. People who hit it
> can just add a new line or something to avoid it, if
On Sun, Apr 10, 2005 at 01:57:33PM -0700, Linus Torvalds wrote:
>
> > That way of thinking really doesn't work well here.
> >
> > I will have to look more closely at pasky's GIT toolkit
> > if I want to see an SCM style interface.
>
> Yes. You really should think of GIT as a filesystem, and of m
Dear diary, on Mon, Apr 11, 2005 at 12:07:37AM CEST, I got a letter
where "Luck, Tony" <[EMAIL PROTECTED]> told me that...
..snip..
> >Hey, I may end up being wrong, and yes, maybe I should have done a
> >two-level one. The good news is that we can trivially fix it later (even
> >dynamically - we
>Also, I did actually debate that issue with myself, and decided that even
>if we do have tons of files per directory, git doesn't much care. The
>reason? Git never _searches_ for them. Assuming you have enough memory to
>cache the tree, you just end up doing a "lookup", and inside the kernel
>that
I totally agree that the odds are really, really small.
That is why it is not worth handling the case. People who hit it
can just add a new line or something to avoid it, if
it happens after all.
It is a little peace of mind to know for sure that it did
not happen. I am just paranoid.
Chris
On Sun, Ap
On Sun, 10 Apr 2005, Paul Jackson wrote:
>
> Ah ha - that explains the read-tree and write-tree names.
>
> The read-tree pulls stuff out of this file system into
> your working files, clobbering local edits. This is like
> the read(2) system call, which clobbers stuff in your
> read buffer.
Y
> Something like the following patch, maybe with a way to turn it off.
Take out an old envelope and compute on it the odds of this
happening.
Say we have 10,000 kernel hackers, each producing one
new file every minute, for 100 hours a week. And we've
cloned a small army of Andrew Morton's to integrate
the
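
The back-of-the-envelope estimate can be sketched with the usual birthday approximation, p ≈ n(n-1)/2 / 2^160 for n distinct files and a 160-bit hash. The rates below come from the message; the time span `YEARS` is purely an assumption, since the snippet is cut off before giving one, and the function names are hypothetical.

```python
HACKERS = 10_000        # from the message
FILES_PER_HOUR = 60     # "one new file every minute"
HOURS_PER_WEEK = 100    # from the message
YEARS = 100             # assumption: the snippet does not say how long


def files_per_week() -> int:
    """New files produced per week by the whole imagined workforce."""
    return HACKERS * FILES_PER_HOUR * HOURS_PER_WEEK


def total_files(years: int = YEARS) -> int:
    """Total files produced over the assumed time span."""
    return files_per_week() * 52 * years


def collision_probability(n: int, bits: int = 160) -> float:
    """Birthday-bound approximation for any two of n items sharing a hash."""
    return n * (n - 1) / 2 / 2**bits


print(files_per_week(), total_files(), collision_probability(total_files()))
```

The approximation is only valid while the result is much smaller than 1, which is comfortably the case for these inputs. It says nothing about deliberately constructed collisions.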
Linus wrote:
> It's a filesystem - although a
> fairly strange one.
Ah ha - that explains the read-tree and write-tree names.
The read-tree pulls stuff out of this file system into
your working files, clobbering local edits. This is like
the read(2) system call, which clobbers stuff in your
rea
Tony wrote:
> Or maybe the files should be named objects/xx/yy/?
I tend to size these things with the square root of the number of
leaf nodes. If I have 2,560,000 leaves (your 10,000 files in each
of 16*16 directories), then I will aim for 1600 directories of
1600 leaves each.
My
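
The square-root rule of thumb described above is easy to check. A small sketch, with a hypothetical function name, that rounds up when the leaf count is not a perfect square:

```python
import math


def square_root_fanout(leaves: int) -> tuple:
    """Split `leaves` files into about sqrt(leaves) directories of equal size."""
    dirs = math.isqrt(leaves)
    if dirs * dirs < leaves:       # not a perfect square: round up
        dirs += 1
    per_dir = -(-leaves // dirs)   # ceiling division
    return dirs, per_dir


# 10,000 files in each of 16*16 directories is 2,560,000 leaves,
# which the rule turns into 1600 directories of 1600 leaves each.
print(square_root_fanout(10_000 * 16 * 16))
```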
* Rik van Riel <[EMAIL PROTECTED]> wrote:
> GCC 4 isn't very happy. Mostly sign changes, but also something that
> looks like a real error:
>
> gcc -g -O3 -Wall -c -o fsck-cache.o fsck-cache.c
> fsck-cache.c: In function 'main':
> fsck-cache.c:59: warning: control may reach end of non-void f
Ralph wrote:
> but good enough for
> most uses that people will get caught out when it fails.
Exactly.
If Linus persists in this diff-tree output format, using two lines for
changed files, then I will have to add the following sed script to my
arsenal:
sed '/^/ / }'
It collapses pairs of line
On Sat, 9 Apr 2005, Linus Torvalds wrote:
> I've rsync'ed the new git repository to kernel.org, it should all be there
> in /pub/linux/kernel/people/torvalds/git.git/ (and it looks like the
> mirror scripts already picked it up on the public side too).
GCC 4 isn't very happy. Mostly sign changes
On Sun, Apr 10, 2005 at 08:44:56AM -0700, Linus Torvalds wrote:
>
>
> On Sun, 10 Apr 2005, Junio C Hamano wrote:
> >
> > But I am wondering what your plans are to handle renames---or
> > does git already represent them?
>
> You can represent renames on top of git - git itself really doesn't car
On Sat, 9 Apr 2005 [EMAIL PROTECTED] wrote:
>
> With 60,000 changesets in the current tree, we will start out our git
> repository with about 600,000 files. Assuming the first byte of the
> SHA1 hash is random, that means an average of 2343 files in each of the
> objects/xx directories. Give it
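
The per-directory figure quoted above follows from spreading files evenly over the 256 possible values of the first hash byte. A minimal sketch, assuming a uniform hash; the exact leaf naming inside `objects/xx/` shown here (the remaining 38 hex digits) is an assumption for illustration, and the function names are hypothetical:

```python
import hashlib


def object_path(data: bytes) -> str:
    """Map content to objects/xx/<rest>, fanning out on the first hash byte."""
    digest = hashlib.sha1(data).hexdigest()
    return f"objects/{digest[:2]}/{digest[2:]}"


def average_files_per_directory(total_files: int, directories: int = 256) -> float:
    """Expected directory size if hashes are uniformly distributed."""
    return total_files / directories


print(average_files_per_directory(600_000))  # 2343.75, the "2343" in the message
print(object_path(b"example content"))
```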
On Sun, 10 Apr 2005, Junio C Hamano wrote:
>
> But I am wondering what your plans are to handle renames---or
> does git already represent them?
You can represent renames on top of git - git itself really doesn't care.
In many ways you can just see git as a filesystem - it's content-
addressab
>In other words, each "commit" file is very small and cheap, but since
>almost every commit will also imply a totally new tree-file, "git" is
>going to have an overhead of half a megabyte per commit. Oops.
>
>Damn, that's painful. I suspect I will have to change the format somehow.
Having dodged
Hi,
Christopher Li wrote:
> On Sat, Apr 09, 2005 at 04:31:10PM -0700, Linus Torvalds wrote:
> > NOTE! This means that each "tree" file basically tracks just a
> > single directory. The old style of "every file in one tree file"
> > still works, but fsck-cache will warn about it. Happily, the git
>handle by pure rename only plus the extra delta. The current git doesn't
>have per-file change history. From git's point of view, some file was deleted
>and another file appeared with the same content.
>
>It is up to the top-level SCM to handle that correctly.
>Renaming a directory will be even more fun.
But from
On Sun, Apr 10, 2005 at 12:51:59AM -0700, Junio C Hamano wrote:
> Listing the file paths and their sigs included in a tree to make
> a snapshot of a tree state sounds fine, and diffing two trees by
> looking at the sigs between two such files sounds fine as well.
>
> But I am wondering what your p
Hi Paul,
> Ralph wrote:
> > Watch out for when xargs invokes do_something more than once and the
> > `<' is parsed by a different one than the `>'.
>
> It will take a pretty long list to do that. It seems that GNU xargs
> on top of a Linux kernel has a 128 KByte ARG_MAX.
I didn't realise it wa
On Sun, Apr 10, 2005 at 11:41:53AM +0200, Petr Baudis wrote:
> Dear diary, on Sun, Apr 10, 2005 at 07:53:40AM CEST, I got a letter
> where Christopher Li <[EMAIL PROTECTED]> told me that...
> > On Sun, Apr 10, 2005 at 12:51:59AM -0700, Junio C Hamano wrote:
> > >
> > > But I am wondering what your
On Sun, Apr 10, 2005 at 02:28:54AM -0700, Junio C Hamano wrote:
> > "CL" == Christopher Li <[EMAIL PROTECTED]> writes:
>
> CL> On Sun, Apr 10, 2005 at 12:51:59AM -0700, Junio C Hamano wrote:
> >>
> >> But I am wondering what your plans are to handle renames---or
> >> does git already represen
On Sat, Apr 09, 2005 at 04:31:10PM -0700, Linus Torvalds wrote:
>
> Done, and pushed out. The current git.git repository seems to do all of
> this correctly.
>
> NOTE! This means that each "tree" file basically tracks just a single
> directory. The old style of "every file in one tree file" stil
Dear diary, on Sun, Apr 10, 2005 at 11:28:54AM CEST, I got a letter
where Junio C Hamano <[EMAIL PROTECTED]> told me that...
> > "CL" == Christopher Li <[EMAIL PROTECTED]> writes:
>
> CL> On Sun, Apr 10, 2005 at 12:51:59AM -0700, Junio C Hamano wrote:
> >>
> >> But I am wondering what your pl
Dear diary, on Sun, Apr 10, 2005 at 07:53:40AM CEST, I got a letter
where Christopher Li <[EMAIL PROTECTED]> told me that...
> On Sun, Apr 10, 2005 at 12:51:59AM -0700, Junio C Hamano wrote:
> >
> > But I am wondering what your plans are to handle renames---or
> > does git already represent them?
Previously Christopher Li wrote:
> Rename should just work. It will create a new tree object and you
> will notice that in the entry that changed, the hash for the blob
> object is the same.
What if you rename and change a file within a changeset?
Wichert.
--
Wichert Akkerman <[EMAIL PROTECTED
> "CL" == Christopher Li <[EMAIL PROTECTED]> writes:
CL> On Sun, Apr 10, 2005 at 12:51:59AM -0700, Junio C Hamano wrote:
>>
>> But I am wondering what your plans are to handle renames---or
>> does git already represent them?
>>
CL> Rename should just work. It will create a new tree object
On Sun, Apr 10, 2005 at 12:51:59AM -0700, Junio C Hamano wrote:
>
> But I am wondering what your plans are to handle renames---or
> does git already represent them?
>
Rename should just work. It will create a new tree object and you
will notice that in the entry that changed, the hash for the bl
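
The observation that a pure rename leaves the blob hash unchanged can be turned into a small detector layered on top of two tree listings. This is a sketch under assumptions: trees are modelled as plain dicts from path to hash, the hash is a bare SHA1 of the content rather than git's real object name, and `blob_id` and `find_renames` are hypothetical names.

```python
import hashlib


def blob_id(content: bytes) -> str:
    """Simplified blob name: SHA1 of the raw content."""
    return hashlib.sha1(content).hexdigest()


def find_renames(old_tree: dict, new_tree: dict) -> list:
    """Pair each removed path with an added path that has the same blob hash."""
    removed = {p: h for p, h in old_tree.items() if p not in new_tree}
    added = {p: h for p, h in new_tree.items() if p not in old_tree}

    by_hash = {}
    for path, digest in sorted(removed.items()):
        by_hash.setdefault(digest, []).append(path)

    renames = []
    for path, digest in sorted(added.items()):
        if by_hash.get(digest):  # identical content vanished under another name
            renames.append((by_hash[digest].pop(0), path))
    return renames
```

This only catches pure renames. If a file is renamed and edited in the same changeset, its hash changes as well, so an exact-match comparison like this one sees an unrelated delete and add.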
Listing the file paths and their sigs included in a tree to make
a snapshot of a tree state sounds fine, and diffing two trees by
looking at the sigs between two such files sounds fine as well.
But I am wondering what your plans are to handle renames---or
does git already represent them?
Dear diary, on Sun, Apr 10, 2005 at 01:31:10AM CEST, I got a letter
where Linus Torvalds <[EMAIL PROTECTED]> told me that...
> On Sat, 9 Apr 2005, Linus Torvalds wrote:
> >
> > Actually, I guess I wouldn't have to change the format. I could just
> > extend the existing "tree" object to be able to
From before:
The sha1 (ascii) digests for 16817 files take:
689497 bytes before compression
397475 bytes after minigzip
New numbers:
The sha1 (binary) digests for 16817 files take:
336340 bytes before compression
334943 bytes after minigzip
So compressing bina
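
The uncompressed sizes above are consistent with 41 bytes per ASCII digest (40 hex digits plus a newline, which is an assumption about the listing format) and 20 bytes per binary digest. The sketch below reproduces those two numbers and shows the compression effect on synthetic digests; the compressed sizes will not match the measured ones, since the inputs are made up and zlib is used in place of minigzip. Function names are hypothetical.

```python
import hashlib
import zlib


def sample_digests(n: int) -> list:
    """Synthetic stand-ins for real file digests: SHA1 of the numbers 0..n-1."""
    return [hashlib.sha1(str(i).encode()).digest() for i in range(n)]


def listing_sizes(digests: list) -> tuple:
    """Return (ascii, ascii compressed, binary, binary compressed) byte counts."""
    ascii_form = b"".join(d.hex().encode() + b"\n" for d in digests)
    binary_form = b"".join(digests)
    return (
        len(ascii_form),
        len(zlib.compress(ascii_form)),
        len(binary_form),
        len(zlib.compress(binary_form)),
    )


print(listing_sizes(sample_digests(16817)))
```

Hex text uses only 16 of 256 byte values, so compression recovers much of the wasted space; binary digests already look random and leave almost nothing to squeeze out.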
> Then a "tree" object would point to a "directory" object,
Ah - light bulb flickers - in _separate_ files.
Yes, that obviously makes a difference.
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[EMAIL PR
Linus wrote:
> Damn, that's painful. I suspect I will have to change the format somehow.
The sha1 (ascii) digests for 16817 files take:
689497 bytes before compression
397475 bytes after minigzip
The pathnames, relative to top of tree, for these 16817
files take:
503983
Bernd wrote:
> more parser friendly to have single records for diffs.
good point
[looks like you trimmed the cc list - folks around here don't like that ;)]
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson <[
In article <[EMAIL PROTECTED]> you wrote:
> Ralph wrote:
>> Watch out for when xargs invokes do_something more than once and the `<'
>> is parsed by a different one than the `>'.
> It will take a pretty long list to do that. It seems that
> GNU xargs on top of a Linux kernel has a 128 KByte ARG_MA
Ralph wrote:
> Watch out for when xargs invokes do_something more than once and the `<'
> is parsed by a different one than the `>'.
It will take a pretty long list to do that. It seems that
GNU xargs on top of a Linux kernel has a 128 KByte ARG_MAX.
In the old days, with 4 KByte ARG_MAX limits,
On Sat, 9 Apr 2005, Linus Torvalds wrote:
>
> Actually, I guess I wouldn't have to change the format. I could just
> extend the existing "tree" object to be able to point to other trees, and
> that's it.
Done, and pushed out. The current git.git repository seems to do all of
this correctly.
Hi Linus,
> Btw, the NUL-termination makes this really easy to use even in shell
> scripts, ie you can do
>
> diff-tree | xargs -0 do_something
>
> and you'll get each line as one nice argument to your "do_something"
> script. So a do_diff could be based on something like
>
> #!/
Linus wrote:
> the NUL-termination makes this really easy to use even in shell
grumble ...
> I still use the old tools I learnt to use fifteen years ago
newcomer ;)
--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul
On Sat, 9 Apr 2005, Linus Torvalds wrote:
>
> I suspect that I have to change the file format. Maybe make the "tree"
> object a two-level thing, and have a "directory" object.
>
> Then a "tree" object would point to a "directory" object, which would in
> turn point to the individual files (and
On Sat, 9 Apr 2005, Petr Baudis wrote:
>
> > Also, I wrote the "diff-tree" thing I talked about:
> ..snip..
>
> Hmm, I wonder, is this better done in C instead of a simple shell
> script, like my gitdiff.sh?
With 17,000 files in the kernel, and most commits just changing a small
number of th
Hello,
Dear diary, on Sat, Apr 09, 2005 at 09:45:52PM CEST, I got a letter
where Linus Torvalds <[EMAIL PROTECTED]> told me that...
> The good news is, the data structures/indexes haven't changed, but many of
> the tools to interface with them have new (and improved!) semantics:
>
> In particula
On Sat, 9 Apr 2005, Linus Torvalds wrote:
>
> To actually change the working directory, you'd first get the index file
> setup, and then you do a "checkout-cache -a" to update the files in your
> working directory with the files from the sha1 database.
Btw, this will not overwrite any old files