Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread David Woodhouse
On Wed, 2005-04-20 at 07:59 -0700, Linus Torvalds wrote: > external-parent > comment for this parent > > and the nice thing about that is that now that information allows you to > add external parents at any point. > > Why do it like this? First off, I think that the "

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds
On Wed, 20 Apr 2005, Linus Torvalds wrote: > > It would be nicer for the cache to make the index file "header" be a > "footer", and write it out last - that way we'd be able to do the SHA1 as > we write rather than doing a two-pass thing. That's for another time. That other time was now. The
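The header-as-footer idea can be sketched roughly as follows. This is a minimal illustration, not git's actual index layout: the length-prefixed record format and the function names are assumptions made for the example. The point is only that, with the checksum at the end, the SHA1 can be updated as each record is written, so a single pass suffices.

```python
import hashlib
import io
import struct

SHA1_LEN = 20  # size of a raw SHA1 digest in bytes


def write_with_trailing_sha1(out, entries):
    """Write all entries in one pass, then append the SHA1 of what was written.

    The record layout (4-byte big-endian length + payload) is hypothetical.
    """
    sha = hashlib.sha1()
    for entry in entries:
        record = struct.pack(">I", len(entry)) + entry
        sha.update(record)   # hash while writing: no second pass needed
        out.write(record)
    out.write(sha.digest())  # the checksum goes last, as a footer
    return sha.hexdigest()


def verify_trailing_sha1(data):
    """Check that the last 20 bytes are the SHA1 of everything before them."""
    body, footer = data[:-SHA1_LEN], data[-SHA1_LEN:]
    return hashlib.sha1(body).digest() == footer


buf = io.BytesIO()
write_with_trailing_sha1(buf, [b"Makefile", b"fs/ext3/inode.c"])
print(verify_trailing_sha1(buf.getvalue()))
```

With a leading checksum, the writer would have to hash everything first and then write it, or seek back and patch the header afterwards.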

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds
On Wed, 20 Apr 2005, Chris Mason wrote: > > Well, the difference there should be pretty hard to see with any benchmark. > But I was being lazy...new patch attached. This one gets the same perf > numbers, if this is still wrong then I really need some more coffee. I did my preferred version. M

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Chris Mason
On Wednesday 20 April 2005 13:52, Linus Torvalds wrote: > On Wed, 20 Apr 2005, Chris Mason wrote: > > The patch below with your current tree brings my 100 patch test down to > > 22 seconds again. > > If you ever have a cache_entry bigger than 16384, your code will write > things out in the wrong or

Re: [PATCH] write-tree performance problems

2005-04-20 Thread David S. Miller
On Wed, 20 Apr 2005 10:06:15 -0700 (PDT) Linus Torvalds <[EMAIL PROTECTED]> wrote: > I bet your SHA1 implementation is done with hand-optimized and scheduled > x86 MMX code or something, while my poor G5 is probably using some slow > generic routine. As a result, it only improved by 33% for me sin

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds
On Wed, 20 Apr 2005, Chris Mason wrote: > > The patch below with your current tree brings my 100 patch test down to 22 > seconds again. If you ever have a cache_entry bigger than 16384, your code will write things out in the wrong order (write the new cache without flushing the old buffer).
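The ordering problem described here can be illustrated with a small buffered writer. This is a sketch under assumptions, not the patch under discussion: the class name and interface are invented, and only the 16384-byte buffer size comes from the message. An entry too large for the buffer may bypass it, but only after the pending buffer has been flushed, otherwise the output ends up out of order.

```python
import io

BUFFER_SIZE = 16384  # the size mentioned in the message


class BufferedEntryWriter:
    """Hypothetical write buffer that keeps entries in order."""

    def __init__(self, out, size=BUFFER_SIZE):
        self.out = out
        self.size = size
        self.buf = bytearray()

    def write(self, data):
        # Flush first whenever the new entry would not fit, so older
        # bytes always reach the output before newer ones.
        if len(self.buf) + len(data) > self.size:
            self.flush()
        if len(data) >= self.size:
            # The buffer is empty at this point, so a direct write is safe.
            self.out.write(data)
        else:
            self.buf += data

    def flush(self):
        if self.buf:
            self.out.write(bytes(self.buf))
            self.buf.clear()


out = io.BytesIO()
writer = BufferedEntryWriter(out, size=8)
writer.write(b"abc")
writer.write(b"X" * 20)  # larger than the buffer
writer.write(b"de")
writer.flush()
print(out.getvalue() == b"abc" + b"X" * 20 + b"de")
```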

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Chris Mason
On Wednesday 20 April 2005 13:06, Linus Torvalds wrote: > On Wed, 20 Apr 2005, Chris Mason wrote: > > At any rate, the time for a single write-tree is pretty consistent. > > Before it was around .5 seconds, and with this change it goes down to > > .128s. > > Oh, wow. > > I bet your SHA1 implementa

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds
On Wed, 20 Apr 2005, Chris Mason wrote: > > At any rate, the time for a single write-tree is pretty consistent. Before > it > was around .5 seconds, and with this change it goes down to .128s. Oh, wow. I bet your SHA1 implementation is done with hand-optimized and scheduled x86 MMX code or

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds
On Wed, 20 Apr 2005, Linus Torvalds wrote: > > NO! Don't see if this works. For the "sha1 file already exists" file, it > forgot to return the SHA1 value in "returnsha1", and would thus corrupt > the trees it wrote. Proper version with fixes checked in. For me, it brings down the time to writ

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Chris Mason
On Wednesday 20 April 2005 11:40, Linus Torvalds wrote: > On Wed, 20 Apr 2005, Chris Mason wrote: > > Thanks for looking at this. Your new tree is faster, it gets the commit > > 100 patches time down from 1m5s to 50s. > > It really _shouldn't_ be faster. It still does the compression, and throws >

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Martin Uecker
On Wed, Apr 20, 2005 at 05:57:34PM +0200, Martin Uecker wrote: > On Wed, Apr 20, 2005 at 11:28:20AM -0400, C. Scott Ananian wrote: > > > Yes, I guess this is the detail I was going to abandon. =) > > > > I viewed the fact that the top-level hash was dependent on the exact chunk > > makeup a 'mis

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds
On Wed, 20 Apr 2005, Linus Torvalds wrote: > > To actually go faster, it _should_ need this patch. Untested. See if it > works.. NO! Don't see if this works. For the "sha1 file already exists" file, it forgot to return the SHA1 value in "returnsha1", and would thus corrupt the trees it wrote
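The bug described here has a simple shape, sketched below. This is not git's code: the function name, the directory layout, and hashing the raw bytes without any object header are assumptions for the illustration. The early-exit path for an object that already exists skips the compression, which is the speedup, but it must still hand the SHA1 back to the caller.

```python
import hashlib
import os
import zlib


def write_object(objdir, data):
    """Store data under its SHA1 and return the hex SHA1 in every case."""
    sha1 = hashlib.sha1(data).hexdigest()  # hash the uncompressed bytes
    path = os.path.join(objdir, sha1[:2], sha1[2:])
    if os.path.exists(path):
        # Already stored: skip the expensive compression, but still
        # return the name. Forgetting this return value is the kind of
        # mistake that would corrupt the trees built from it.
        return sha1
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "wb") as f:
        f.write(zlib.compress(data))
    return sha1
```

A caller building a tree would use the returned name for both new and already-present objects, so both paths have to agree.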

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds
On Wed, 20 Apr 2005, C. Scott Ananian wrote: > > OK, sure. But how 'bout chunking trees? Are you grown happy with the new > trees-reference-other-trees paradigm, or is there a deep longing in your > heart for the simplicity of 'trees-reference-blobs-period'? I'm pretty sure we do better chu

Re: [PATCH] write-tree performance problems

2005-04-20 Thread David Willmore
On 4/20/05, Linus Torvalds <[EMAIL PROTECTED]> wrote: > It really _shouldn't_ be faster. It still does the compression, and throws > the end result away. Am I misunderstanding or is the problem that doing: -> compress -> sha1 -> compare with existing hash is expensive? What about doing: -> unc

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Martin Uecker
On Wed, Apr 20, 2005 at 11:28:20AM -0400, C. Scott Ananian wrote: Hi, > A merkle-tree (which I think you initially pointed me at) makes the hash > of the internal nodes be a hash of the chunk's hashes; ie not a straight > content hash. This is roughly what my current implementation does, but
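To make the distinction concrete, here is a minimal Merkle-style sketch. It is an illustration only, not either poster's implementation: fixed-size chunking, pairwise combination, and the function names are all assumptions. Internal nodes hash the concatenated hashes of their children, so the root depends on how the file was chunked and is not a straight content hash.

```python
import hashlib


def chunk(data, size):
    """Split data into fixed-size chunks (a simplifying assumption)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def merkle_root(chunks):
    """Hash each chunk, then hash pairs of hashes until one root remains."""
    level = [hashlib.sha1(c).digest() for c in chunks]
    if not level:
        return hashlib.sha1(b"").digest()
    while len(level) > 1:
        level = [
            hashlib.sha1(b"".join(level[i:i + 2])).digest()
            for i in range(0, len(level), 2)
        ]
    return level[0]


data = b"abcdefgh"
whole_file = hashlib.sha1(data).digest()
print(merkle_root(chunk(data, 4)) == whole_file)                   # False
print(merkle_root(chunk(data, 4)) == merkle_root(chunk(data, 2)))  # False
```

The same bytes give different roots under different chunkings, which is the property the thread is debating.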

Re: [PATCH] write-tree performance problems

2005-04-20 Thread C. Scott Ananian
On Wed, 20 Apr 2005, Linus Torvalds wrote: I was considering using a chunked representation for *all* files (not just blobs), which would avoid the original 'trees must reference other trees or they become too large' issue -- and maybe the performance issue you're referring to, as well? No. The mos

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds
On Wed, 20 Apr 2005, C. Scott Ananian wrote: > > Hmm. Are our index files too large, or is there some other factor? They _are_ pretty large, but they have to be, For the kernel, the index file is about 1.6MB. That's - 17,000+ files and filenames - stat information for all of them - the s

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Linus Torvalds
On Wed, 20 Apr 2005, Chris Mason wrote: > > Thanks for looking at this. Your new tree is faster, it gets the commit 100 > patches time down from 1m5s to 50s. It really _shouldn't_ be faster. It still does the compression, and throws the end result away. To actually go faster, it _should_ nee

Re: [PATCH] write-tree performance problems

2005-04-20 Thread C. Scott Ananian
On Wed, 20 Apr 2005, Chris Mason wrote: With the basic changes I described before, the 100 patch time only goes down to 40s. Certainly not fast enough to justify the changes. In this case, the bulk of the extra time comes from write-tree writing the index file, so I split write-tree.c up into li

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread C. Scott Ananian
On Wed, 20 Apr 2005, Martin Uecker wrote: You can (and my code demonstrates/will demonstrate) still use a whole-file hash to use chunking. With content prefixes, this takes O(N ln M) time (where N is the file size and M is the number of chunks) to compute all hashes; if subtrees can share the same

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Chris Mason
On Wednesday 20 April 2005 02:43, Linus Torvalds wrote: > On Tue, 19 Apr 2005, Chris Mason wrote: > > I'll finish off the patch once you ok the basics below. My current code > > works like this: > > Chris, before you do anything further, let me re-consider. > > Assuming that the real cost of write

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Martin Uecker
On Wed, Apr 20, 2005 at 10:30:15AM -0400, C. Scott Ananian wrote: Hi, your code looks pretty cool. thank you! > On Wed, 20 Apr 2005, Martin Uecker wrote: > > >The other thing I don't like is the use of a sha1 > >for a complete file. Switching to some kind of hash > >tree would allow to introduc

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Linus Torvalds
On Thu, 21 Apr 2005, David Woodhouse wrote: > > The reason for doing this is that without it, we can't ever have a full > history actually connected to the current trees. There'd always be a > break at 2.6.12-rc2, at which point you'd have to switch to an entirely > different git repository. Qu

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread C. Scott Ananian
On Wed, 20 Apr 2005, Linus Torvalds wrote: - _keep_ the same compression format, but notice that we already have an object by looking at the uncompressed one. With a chunked file, you can also skip writing certain *subtrees* of the file as soon as you notice it's already present on disk. I can

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread C. Scott Ananian
On Wed, 20 Apr 2005, Martin Uecker wrote: The other thing I don't like is the use of a sha1 for a complete file. Switching to some kind of hash tree would allow to introduce chunks later. This has two advantages: You can (and my code demonstrates/will demonstrate) still use a whole-file hash to us

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Linus Torvalds
On Wed, 20 Apr 2005, Jon Seymour wrote: > > Am I correct to understand that with this change, all the objects in the > database are still being compressed (so no net performance benefit), but by > doing the SHA1 calculations before compression you are keeping open the > possibility that at so

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread David Woodhouse
On Wed, 2005-04-20 at 02:08 -0700, Linus Torvalds wrote: > I converted my git archives (kernel and git itself) to do the SHA1 > hash _before_ the compression phase. I'm happy to see that -- because I'm going to be asking you to make another change which will also require a simple repository conver

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Jon Seymour
> The main point is not about trying different compression > techniques but that you don't need to compress at all just > to calculate the hash of some data. (to know if it is > unchanged for example) > Ah, ok, I didn't understand that there were extra compresses being performed for that reason.

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Morten Welinder
On 4/20/05, Martin Uecker <[EMAIL PROTECTED]> wrote: > The storage method of the database of a collection of > files in the underlying file system. Because of the > random nature of the hashes this leads to a horrible > amount of seeking for all operations which walk the > logical structure of som

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Martin Uecker
On Wed, Apr 20, 2005 at 10:11:10PM +1000, Jon Seymour wrote: > On 4/20/05, Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > > > > I converted my git archives (kernel and git itself) to do the SHA1 hash > > _before_ the compression phase. > > > > Linus, > > Am I correct to understand that with

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Jon Seymour
On 4/20/05, Linus Torvalds <[EMAIL PROTECTED]> wrote: > > > I converted my git archives (kernel and git itself) to do the SHA1 hash > _before_ the compression phase. > Linus, Am I correct to understand that with this change, all the objects in the database are still being compressed (so no n

Re: WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Ingo Molnar
* Linus Torvalds <[EMAIL PROTECTED]> wrote: > So to convert your old git setup to a new git setup, do the following: > [...] did this for two repositories (git and kernel-git), it works as advertised. Ingo

WARNING! Object DB conversion (was Re: [PATCH] write-tree performance problems)

2005-04-20 Thread Linus Torvalds
I converted my git archives (kernel and git itself) to do the SHA1 hash _before_ the compression phase. So I'll just have to publicly admit that everybody who complained about that particular design decision was right. Oh, well. On Wed, 20 Apr 2005, H. Peter Anvin wrote: > Linus Torvalds wr

Re: [PATCH] write-tree performance problems

2005-04-20 Thread H. Peter Anvin
Linus Torvalds wrote: So I'll see if I can turn the current fsck into a "convert into uncompressed format", and do a nice clean format conversion. Just let me know what you want to do, and I can trivially change the conversion scripts I've already written to do what you want. -hpa

Re: [PATCH] write-tree performance problems

2005-04-19 Thread Linus Torvalds
On Tue, 19 Apr 2005, Chris Mason wrote: > > I'll finish off the patch once you ok the basics below. My current code > works > like this: Chris, before you do anything further, let me re-consider. Assuming that the real cost of write-tree is the compression (and I think it is), I really susp

Re: [PATCH] write-tree performance problems

2005-04-19 Thread Linus Torvalds
On Tue, 19 Apr 2005, Chris Mason wrote: > > 5) right before exiting, write-tree updates the index if it made any changes. This part won't work. It needs to do the proper locking, which means that it needs to create "index.lock" _before_ it reads the index file, and write everything to that on
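The locking order described here can be sketched as below. This is an assumption-laden illustration rather than git's implementation: the function name and the `transform` callback are hypothetical, and only the "index.lock" naming and the create-before-read ordering come from the message. The lock file is created exclusively before the index is read, the new contents are written to the lock file, and the lock file is then renamed over the index.

```python
import os


def update_index(index_path, transform):
    """Rewrite index_path via an exclusively created '<index>.lock' file."""
    lock_path = index_path + ".lock"
    # O_EXCL makes this fail if the lock file already exists, i.e. if
    # another updater is active. Taking the lock happens before the read.
    fd = os.open(lock_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o644)
    try:
        with os.fdopen(fd, "wb") as lock:
            with open(index_path, "rb") as f:
                old = f.read()
            lock.write(transform(old))
        # Renaming over the old index publishes the update in one step.
        os.replace(lock_path, index_path)
    except BaseException:
        # On any failure, drop the lock and leave the old index untouched.
        try:
            os.unlink(lock_path)
        except FileNotFoundError:
            pass
        raise
```

Reading the index before taking the lock would let two writers each start from the same old contents and lose one of the updates.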

Re: [PATCH] write-tree performance problems

2005-04-19 Thread Christopher Li
On Tue, Apr 19, 2005 at 04:59:18PM -0700, Linus Torvalds wrote: > > However, it definitely wouldn't be useful for _me_. The whole thing that > I'm after is to allow painless merging of distributed work. If I have to > merge one patch at a time, I'd much rather see people send me patches > directly

Re: [PATCH] write-tree performance problems

2005-04-19 Thread Chris Mason
On Tuesday 19 April 2005 17:23, Linus Torvalds wrote: > On Tue, 19 Apr 2005, Chris Mason wrote: > > Regardless, putting it into the index somehow should be fastest, I'll see > > what I can do. > > Start by putting it in at "read-tree" time, and adding the code to > invalidate all parent directory i

Re: [PATCH] write-tree performance problems

2005-04-19 Thread Linus Torvalds
On Tue, 19 Apr 2005, David Lang wrote: > > > > If so, he should set up one repository per quilt patch. > > a tool to do this automatically is what I was trying to suggest (and asking > if it would be useful) Heh. It's certainly possible. Especially with the object sharing, you could create a g

Re: [PATCH] write-tree performance problems

2005-04-19 Thread David Lang
On Tue, 19 Apr 2005, Linus Torvalds wrote: On Tue, 19 Apr 2005, David Lang wrote: if you are using quilt for locally developed patches I fully agree with you, but I was thinking of the case where Andrew is receiving independent patches from lots of people and storing them in quilt for testing, and

Re: [PATCH] write-tree performance problems

2005-04-19 Thread Linus Torvalds
On Tue, 19 Apr 2005, David Lang wrote: > > if you are using quilt for locally developed patches I fully agree with > you, but I was thinking of the case where Andrew is receiving independent > patches from lots of people and storing them in quilt for testing, and > then sending them on to yo

Re: [PATCH] write-tree performance problems

2005-04-19 Thread David Lang
On Tue, 19 Apr 2005, Linus Torvalds wrote: On Tue, 19 Apr 2005, David Lang wrote: what if you turned the forest of quilt patches into a forest of git trees? (essentially applying each patch against the baseline separately) would this make sense or be useful? It has a certain charm, but the fact is,

Re: [PATCH] write-tree performance problems

2005-04-19 Thread C. Scott Ananian
On Tue, 19 Apr 2005, Linus Torvalds wrote: (*) Actually, I think it's the compression that ends up being the most expensive part. You're also using the equivalent of '-9', too -- and *that's slow*. Changing to Z_NORMAL_COMPRESSION would probably help a lot (but would break all existing repositories
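The cost of the compression level is easy to measure. The sketch below uses Python's zlib binding; the helper name is made up for the example. Note that the zlib library itself names its default level `Z_DEFAULT_COMPRESSION` (and its '-9' equivalent `Z_BEST_COMPRESSION`), so the "Z_NORMAL_COMPRESSION" in the message is presumably meant as the former. Timings depend on the machine and the data, so no figures are claimed here.

```python
import time
import zlib


def compare_levels(data, levels=(zlib.Z_BEST_COMPRESSION,
                                 zlib.Z_DEFAULT_COMPRESSION)):
    """Return {level: (compressed_size, seconds)} for each zlib level."""
    results = {}
    for level in levels:
        start = time.perf_counter()
        compressed = zlib.compress(data, level)
        elapsed = time.perf_counter() - start
        results[level] = (len(compressed), elapsed)
    return results


sample = b"static int write_tree(struct cache_entry **cachep);\n" * 2000
for level, (size, seconds) in compare_levels(sample).items():
    print(level, size, f"{seconds:.6f}s")
```

If object names are computed over the compressed bytes, a different level may produce different bytes and therefore different names, which would explain the remark about breaking existing repositories.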

Re: [PATCH] write-tree performance problems

2005-04-19 Thread Linus Torvalds
On Tue, 19 Apr 2005, David Lang wrote: > > what if you turned the forest of quilt patches into a forest of git trees? > (essentially applying each patch against the baseline separately) would > this make sense or be useful? It has a certain charm, but the fact is, it gets really messy to sort

Re: [PATCH] write-tree performance problems

2005-04-19 Thread David Lang
On Tue, 19 Apr 2005, Linus Torvalds wrote: On Tue, 19 Apr 2005, Chris Mason wrote: Very true, you can't replace quilt with git without ruining both of them. But it would be nice to take a quilt tree and turn it into a git tree for merging purposes, or to make use of whatever visualization tools m

Re: [PATCH] write-tree performance problems

2005-04-19 Thread Linus Torvalds
On Tue, 19 Apr 2005, Chris Mason wrote: > > Regardless, putting it into the index somehow should be fastest, I'll see > what > I can do. Start by putting it in at "read-tree" time, and adding the code to invalidate all parent directory indexes when somebody changes a file in the index (ie "up
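The invalidation step suggested here can be sketched as follows. The data structure is an assumption for illustration: a plain dict mapping directory paths to cached tree hashes, with the empty string standing for the top-level tree. When a file changes, every directory on the path from that file up to the root loses its cached hash, while unrelated directories keep theirs.

```python
import posixpath


def invalidate_parents(tree_cache, path):
    """Drop cached tree hashes for every directory above path.

    tree_cache is a hypothetical {directory: tree_sha1} mapping in which
    '' stands for the top-level tree.
    """
    directory = posixpath.dirname(path)
    while True:
        tree_cache.pop(directory, None)
        if directory == "":
            break
        directory = posixpath.dirname(directory)


cache = {"": "root", "fs": "t1", "fs/ext3": "t2", "mm": "t3"}
invalidate_parents(cache, "fs/ext3/inode.c")
print(cache)  # only the unrelated 'mm' entry remains
```

A later write-tree would then recompute only the trees whose cache entries are missing.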

Re: [PATCH] write-tree performance problems

2005-04-19 Thread Chris Mason
On Tuesday 19 April 2005 15:03, Linus Torvalds wrote: > On Tue, 19 Apr 2005, Chris Mason wrote: > > Very true, you can't replace quilt with git without ruining both of them. > > But it would be nice to take a quilt tree and turn it into a git tree > > for merging purposes, or to make use of whatev

Re: [PATCH] write-tree performance problems

2005-04-19 Thread Linus Torvalds
On Tue, 19 Apr 2005, Chris Mason wrote: > > Very true, you can't replace quilt with git without ruining both of them. > But > it would be nice to take a quilt tree and turn it into a git tree for merging > purposes, or to make use of whatever visualization tools might exist someday. > Fa

Re: [PATCH] write-tree performance problems

2005-04-19 Thread Olivier Galibert
On Tue, Apr 19, 2005 at 10:36:06AM -0700, Linus Torvalds wrote: > In fact, git has all the same issues that BK had, and for the same > fundamental reason: if you do distributed work, you have to always > "append" stuff, and that means that you can never re-order anything after > the fact. You c

Re: [PATCH] write-tree performance problems

2005-04-19 Thread Chris Mason
On Tuesday 19 April 2005 13:36, Linus Torvalds wrote: > On Tue, 19 Apr 2005, Chris Mason wrote: > > I did a quick experiment with applying/commit 100 patches from the suse > > kernel into a kernel git tree, which quilt can do in 2 seconds. git > > needs 1m5s. > > Note that I don't think you want t

Re: [PATCH] write-tree performance problems

2005-04-19 Thread Linus Torvalds
On Tue, 19 Apr 2005, Chris Mason wrote: > > I did a quick experiment with applying/commit 100 patches from the suse > kernel > into a kernel git tree, which quilt can do in 2 seconds. git needs 1m5s. Note that I don't think you want to replace quilt with git. The approaches are totally diff