Re: [PATCH] enable core.fsyncObjectFiles by default

2018-01-21 Thread Chris Mason

On 01/18/2018 11:27 AM, Christoph Hellwig wrote:

[adding Chris to the Cc list - this is about the awful ext3 data=ordered
behavior of syncing the whole file system data and metadata on each
fsync]

On Wed, Jan 17, 2018 at 03:57:53PM -0800, Linus Torvalds wrote:

On Wed, Jan 17, 2018 at 3:52 PM, Theodore Ts'o wrote:


Well, let's be fair; this is something *ext3* got wrong, and it was
the default file system back then.


I'm pretty sure reiserfs and btrfs did too..


I'm pretty sure btrfs never did, and reiserfs at least looks like
it currently doesn't, but I'd have to dig into history to check if
it ever did.



Christoph has this right: btrfs only fsyncs the one file that you've 
asked for, and unrelated data/metadata won't be included.


We've seen big fsync stalls on ext4 caused by data=ordered, so it's 
still possible to trigger there, but it's much better than ext3.


I do share Ted's concern about the perf impact of the fsyncs on tons of 
loose files, but the defaults should be safety first.
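
For reference, "safety first" per loose object means roughly the pattern 
below: fsync the object file, then fsync its directory so the new name is 
durable too.  This is a minimal sketch, not git's actual code; the helper 
name and flags are mine.

#include <fcntl.h>
#include <unistd.h>

static int write_object_durably(const char *dir, const char *path,
				const void *buf, size_t len)
{
	int fd, dfd, ret;

	fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0444);
	if (fd < 0)
		return -1;
	/* On ext3 data=ordered this fsync can drag in unrelated dirty
	 * data; on btrfs it stays local to the file. */
	if (write(fd, buf, len) != (ssize_t)len || fsync(fd) < 0) {
		close(fd);
		return -1;
	}
	close(fd);

	/* Make the directory entry durable as well. */
	dfd = open(dir, O_RDONLY);
	if (dfd < 0)
		return -1;
	ret = fsync(dfd);
	close(dfd);
	return ret;
}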


-chris


Re: [PATCH] multi item packed files

2005-04-22 Thread Chris Mason
On Friday 22 April 2005 16:32, Chris Mason wrote:

> If I pack every 64k (uncompressed), the checkout-tree time goes down to
> 3m14s. That's a very big difference considering how stupid my code is.  .git
> was only 20% smaller with 64k chunks.  I should be able to do better...I'll
> do one more run.
>

This run also packed tree files together (everything produced by write-tree 
went into a packed file), but not the commits.  I estimate I could save about 
another 168m by packing the tree files and commits into the same file with 
the blobs, but this wouldn't make any of the times below faster.

git                original (28k commits)   packed
FS size            2,675,408k               1,723,820k
read-tree          24.45s                   18.9s
checkout-cache     4m30s                    3m5s
patch time         2h30m                    1h55m

The format for the packed files could be smarter, such that it didn't require 
decompressing the whole packed file to read one item.  I would guess I could 
get another 20% checkout-cache performance out of it via more tuning, and 
probably another 10% of space savings.
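
For instance, a smarter format could deflate each object separately and keep 
a small offset table up front, so reading one item is a seek plus one 
inflate.  This is just a sketch of the idea; none of these names or fields 
are what my patch currently writes.

#include <string.h>

struct pack_entry {
	unsigned char sha1[20];	/* object name */
	unsigned long offset;	/* start of this object's compressed bytes */
	unsigned long clen;	/* length of the compressed bytes */
};

struct pack_header {
	unsigned long nr_entries;
	/* followed by nr_entries pack_entry records, then the
	 * individually deflated objects, back to back */
};

/* Linear scan; sorting the table by sha1 would allow binary search. */
static struct pack_entry *find_entry(struct pack_header *hdr,
				     struct pack_entry *entries,
				     const unsigned char *sha1)
{
	unsigned long i;
	for (i = 0; i < hdr->nr_entries; i++)
		if (!memcmp(entries[i].sha1, sha1, 20))
			return entries + i;
	return NULL;
}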

Of course, none of this is likely to convince you ;)  If you decide later on 
it's worthwhile, I don't think it would be difficult to add then.

-chris


Re: [PATCH] multi item packed files

2005-04-21 Thread Chris Mason
On Thursday 21 April 2005 18:47, Linus Torvalds wrote:
> On Thu, 21 Apr 2005, Chris Mason wrote:
> > Shrug, we shouldn't need help from the kernel for something like this. 
> > git as a database hits worst case scenarios for almost every FS.

[ ... ]

We somewhat agree on most of this, I snipped out the parts that aren't worth 
nitpicking over.  git is really fast right now, and I'm all for throwing 
drive space at things to solve problems.  I just don't think we have to throw 
as much space at it as we are.

> The _seek_ issue is real, but git actually has a very nice architecture
> even there: not only does it cache really really well (and you can do a
> simple "ls-tree $(cat .git/HEAD)" and populate the cache from the results),
> but the low level of indirection in a git archive means that it's almost
> totally prefetchable with near-perfect access patterns.

We can sort the files before reading them in, but even if we order things 
perfectly, we're spreading the I/O out too much across the drive.  It works 
right now because the git archive is relatively dense.  At a few hundred MB, 
when we order things properly the drive head isn't moving that much.

At 3-6 GB this hurts more.  The data gets farther apart as things age, and 
drive performance rots away.  I'll never convince you without numbers, which 
means I'll have to wait for the full load of old history and try it out ;)

-chris


Re: [PATCH] multi item packed files

2005-04-21 Thread Chris Mason
On Thursday 21 April 2005 15:28, Krzysztof Halasa wrote:
> Linus Torvalds <[EMAIL PROTECTED]> writes:
> > Wrong. You most definitely _can_ lose: you end up having to optimize for
> > one particular filesystem blocking size, and you'll lose on any other
> > filesystem. And you'll lose on the special filesystem of "network
> > traffic", which is byte-granular.
>
> If someone needs better on-disk ratio, (s)he can go with 1 KB filesystem
> or something like that, without all the added complexity of packing.
>
> If we want to optimize that further, I would try doing it at the
> underlying filesystem level. For example, loop-mounted one.

Shrug, we shouldn't need help from the kernel for something like this.  git as 
a database hits worst case scenarios for almost every FS.

We've got:

1) subdirectories with lots of files
2) wasted space for tiny files
3) files that are likely to be accessed together spread across the whole disk

One compromise for SCM use would be one packed file per commit, with an index 
that lets us quickly figure out which commit has a particular version of a 
given file.  My hack gets something close to that (broken into 32k chunks for 
no good reason), and the index to find a given file is just the git directory 
tree.

But my code does hide the fact that we're packing things from most of the git 
interfaces.  So I can almost keep a straight face while claiming to be true 
to the original git design...almost.  The whole setup is far from perfect, 
but it is one option for addressing points 2 & 3 above.

-chris


Re: [PATCH] multi item packed files

2005-04-21 Thread Chris Mason
On Thursday 21 April 2005 11:41, Linus Torvalds wrote:
> On Thu, 21 Apr 2005, Chris Mason wrote:
> > There have been a few threads on making git more space efficient, and
> > eventually someone mentions tiny files and space fragmentation.  Now that
> > git object names are decoupled from their compression, it's easier to
> consider a variety of compression algorithms.  I whipped up a really
> > silly "pack files together" compression.
>
> Careful.
>
> This is something that needs history to tell whether it's effective. In
> particular, if one file changes and another one does not, your packed
> archive now ends up being a new blob, so while you "saved space" by having
> just one blob for the object, in reality you didn't save any space at all
> because with the N files changing, you just guaranteed that the packed
> blob changes N times more often.

The packed blob lives in git but never makes it into a tree.  Let's say that I 
have a packed blob with files "a, b, c", and another packed blob with files 
"x, y, z".  Someone changes files b and z, and then runs update-cache b z.

Now we have 2 unchanged packed blobs: "a, b, c", "x, y, z",  and one new 
packed blob: "b_new, z_new".  This means that in order for the packing to 
help, we have to change more than one file at a time.  That's why it would be 
good to have update-cache include the write-tree and commit-tree.

>
> See? Your "packing in space" ends up also resulting in "packing in time",
> and you didn't actually win anything.
>
> (If you did a good job of packing, you hopefully didn't _lose_ anything
> either - you needed 1:N the number of objects that took 1:N the space if
> the packing ended up perfect - but since you needed N times more of
> these objects unless they all change together, you end up with exactly the
> same space usage).
>
> So the argument is: you can't lose with the method, and you _can_ win.
> Right?
>
> Wrong. You most definitely _can_ lose: you end up having to optimize for
> one particular filesystem blocking size, and you'll lose on any other
> filesystem. And you'll lose on the special filesystem of "network
> traffic", which is byte-granular.
>
The patch does have one extra directory entry (for the packed blob), but from 
a network point of view roughly the same number of bytes should be copied.  
The hardlinks won't play nice with rsync though; soft links might be better.

Packing isn't just about filesystem block sizes; it's about locality.  All the 
hashing means pretty much every access in git is random.  With packing we can 
at least try to put a single changeset together on disk.  Right now it 
doesn't matter much, but when the git tree is 6GB in two years we'll feel the 
pain.

> I don't want to pee on people's parades, and I'm all for gathering numbers,
> but the thing is, the current git isn't actually all that bad, and I
> guarantee that it's hard to make it better without using delta
> representation. And the current thing is really really simple.
>

Grin, if I thought you wanted the patch I might have tried to pretty it up a 
little.  The point is that all the discussions about ways to make git use 
less space end up stuck in "but wait, that'll make a bunch of tiny files and 
filesystems aren't good at that".  So I believe some kind of packing is a 
required building block for any kind of delta storage.

-chris


[PATCH] multi item packed files

2005-04-21 Thread Chris Mason
Hello,

There have been a few threads on making git more space efficient, and 
eventually someone mentions tiny files and space fragmentation.  Now that git 
object names are decoupled from their compression, it's easier to consider 
a variety of compression algorithms.  I whipped up a really silly "pack files 
together" compression.

This would maintain the write-once semantics but allow a simple mechanism 
where objects are combined together.  Choosing which objects to combine is 
easy: things put into update-cache together go together.  This gives us more 
space efficiency and no seeks when reading that packed file off disk.

A natural extension to this is to make update-cache --commit-tree, which 
includes the files produced by write-tree and commit-tree into the same 
packed file.  (I haven't coded this).

The layout works like this:

1) a new object type "packed" is added.
2) new objects are buffered into a packed object, until it gets to around 32k 
in size.  This is completely arbitrary but felt about right.
3) The packed object is written to git storage and then hard links are made 
to the packed object from the sha1 filename of each object inside.
4) read_sha1_file is changed to recognize the packed object and search inside 
(a rough sketch follows).
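
To make step 4 concrete: once the packed object is inflated, the lookup is a 
scan for the matching sha1.  The record layout below is made up for 
illustration (20-byte sha1, ascii length, NUL, then the data); the patch's 
real format may differ.

#include <stdlib.h>
#include <string.h>

static void *find_in_packed(char *buf, unsigned long size,
			    const unsigned char *sha1,
			    unsigned long *out_len)
{
	char *p = buf;

	while (p < buf + size) {
		unsigned long len = strtoul(p + 20, NULL, 10);
		char *data = p + 20 + strlen(p + 20) + 1;
		if (!memcmp(p, sha1, 20)) {
			*out_len = len;
			return data;
		}
		p = data + len;	/* skip to the next record */
	}
	return NULL;
}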

I did a simple test on the 2.6.11 tree with my 100 patches applied.  Without 
packing, .git is 99MB.  With packing it only needs 62MB.

Read speeds don't suffer with this; the time for read-tree ; checkout-cache -a 
-f from a cold cache was the same.  I could get the times lower with the patch 
by caching the uncompressed data, since in theory I should be faster here.

Using this on data you care about would be a really bad idea right now.  I'm 
only posting the patch to get the basic idea across for benchmarking and 
discussion.

-chris
diff -ur linus.back/cache.h linus/cache.h
--- linus.back/cache.h	2005-04-21 11:05:27.971607944 -0400
+++ linus/cache.h	2005-04-21 09:35:47.173613576 -0400
@@ -109,7 +109,7 @@
 
 /* Read and unpack a sha1 file into memory, write memory to a sha1 file */
 extern void * map_sha1_file(const unsigned char *sha1, unsigned long *size);
-extern void * unpack_sha1_file(void *map, unsigned long mapsize, char *type, unsigned long *size);
+extern void * unpack_sha1_file(const unsigned char *sha1, void *map, unsigned long mapsize, char *type, unsigned long *size);
 extern void * read_sha1_file(const unsigned char *sha1, char *type, unsigned long *size);
 extern int write_sha1_file(char *buf, unsigned len, unsigned char *return_sha1);
 extern int check_sha1_signature(unsigned char *sha1, void *buf, unsigned long size, const char *type);
@@ -117,6 +117,10 @@
 /* Convert to/from hex/sha1 representation */
 extern int get_sha1_hex(const char *hex, unsigned char *sha1);
 extern char *sha1_to_hex(const unsigned char *sha1);	/* static buffer result! */
+extern int pack_sha1_buffer(void *buf, unsigned long buf_len, 
+		 unsigned char *returnsha1, char **dest, 
+		 unsigned long *dest_size);
+int write_packed_buffer(void *buf, unsigned long len);
 
 /* General helper functions */
 extern void usage(const char *err);
diff -ur linus.back/cat-file.c linus/cat-file.c
--- linus.back/cat-file.c	2005-04-21 11:05:27.971607944 -0400
+++ linus/cat-file.c	2005-04-21 10:04:29.871723656 -0400
@@ -23,7 +23,7 @@
 		type[size] = '\n';
 		size++;
 	} else if (strcmp(type, argv[1])) {
-		die("cat-file %s: bad tag", argv[2]);
+		die("cat-file %s: bad tag (%s: %s)", argv[2], type, argv[1]);
 	}
 
 	while (size > 0) {
diff -ur linus.back/fsck-cache.c linus/fsck-cache.c
--- linus.back/fsck-cache.c	2005-04-21 11:05:27.974607488 -0400
+++ linus/fsck-cache.c	2005-04-21 09:14:03.139856840 -0400
@@ -85,7 +85,7 @@
 		if (map) {
 			char type[100];
 			unsigned long size;
-			void *buffer = unpack_sha1_file(map, mapsize, type, &size);
+			void *buffer = unpack_sha1_file(sha1, map, mapsize, type, &size);
 			if (!buffer)
 				return -1;
 			if (check_sha1_signature(sha1, buffer, size, type) < 0)
diff -ur linus.back/sha1_file.c linus/sha1_file.c
--- linus.back/sha1_file.c	2005-04-21 11:05:27.978606880 -0400
+++ linus/sha1_file.c	2005-04-21 10:41:51.280977656 -0400
@@ -116,7 +116,8 @@
 	return map;
 }
 
-void * unpack_sha1_file(void *map, unsigned long mapsize, char *type, unsigned long *size)
+void * unpack_sha1_file(const unsigned char *sha1, void *map, 
+			unsigned long mapsize, char *type, unsigned long *size)
 {
 	int ret, bytes;
 	z_stream stream;
@@ -134,12 +135,12 @@
 	ret = inflate(&stream, 0);
 	if (sscanf(buffer, "%10s %lu", type, size) != 2)
 		return NULL;
-
 	bytes = strlen(buffer) + 1;
 	buf = malloc(*size);
-	if (!buf)
+	if (!buf) {
+		perror("malloc");
 		return NULL;
-
+	}
 	memcpy(buf, buffer + bytes, stream.total_out - bytes);
 	bytes = stream.total_out - bytes;
 	if (bytes < *size && ret == Z_OK) {
@@ -149,6 +150,36 @@
 			/* nothing */;
 	}
 	inflateEnd(&stream);
+
+	/* we've found a packed object */
+	if (strcmp(type, "packed") == 0) {
+		char *p = buf;
+		if (!sha1

Re: [PATCH] write-tree performance problems

2005-04-20 Thread Chris Mason
On Wednesday 20 April 2005 13:52, Linus Torvalds wrote:
> On Wed, 20 Apr 2005, Chris Mason wrote:
> > The patch below with your current tree brings my 100 patch test down to
> > 22 seconds again.
>
> If you ever have a cache_entry bigger than 16384, your code will write
> things out in the wrong order (write the new cache without flushing the
> old buffer).

Whoops

> Finally, if you really want to go fast, you should really try to make your
> writes powers-of-two, ie fill up the buffer entirely rather than saying
> "if I were to overflow, flush it now". It doesn't matter that much for
> some filesystems (especially local and append-only like the patterns are
> here), but it can definitely matter for the stupid ones.

Well, the difference there should be pretty hard to see with any benchmark.
But I was being lazy...new patch attached.  This one gets the same perf 
numbers; if this is still wrong, then I really need some more coffee.

-chris

--- linus.back/read-cache.c	2005-04-20 10:14:23.26831 -0400
+++ linus/read-cache.c	2005-04-20 14:54:28.554518320 -0400
@@ -232,11 +232,13 @@
 	SHA_CTX c;
 	struct cache_header hdr;
 	int i;
+	#define BUFLEN 16384
+	static char buf[BUFLEN];
+	int len = 0;
 
 	hdr.hdr_signature = htonl(CACHE_SIGNATURE);
 	hdr.hdr_version = htonl(1);
 	hdr.hdr_entries = htonl(entries);
-
 	SHA1_Init(&c);
 	SHA1_Update(&c, &hdr, offsetof(struct cache_header, sha1));
 	for (i = 0; i < entries; i++) {
@@ -246,13 +248,37 @@
 	}
 	SHA1_Final(hdr.sha1, &c);
 
-	if (write(newfd, &hdr, sizeof(hdr)) != sizeof(hdr))
-		return -1;
-
+	/* hdr is small right now, but just
+	 * in case someone changes that...
+	 */
+	if (sizeof(hdr) < BUFLEN) {
+		memcpy(buf, &hdr, sizeof(hdr));
+		len += sizeof(hdr);
+	} else {
+		if (write(newfd, &hdr, sizeof(hdr)) != sizeof(hdr))
+			return -1;
+	}
 	for (i = 0; i < entries; i++) {
 		struct cache_entry *ce = cache[i];
 		int size = ce_size(ce);
-		if (write(newfd, ce, size) != size)
+		char *p = (char *)ce;
+		while(size > 0) {
+			int count = size;
+			if (count > BUFLEN - len)
+				count = BUFLEN - len;
+			memcpy(buf + len, p, count);
+			size -= count;
+			len += count;
+			p += count;
+			if (len == BUFLEN) {
+				if (write(newfd, buf, len) != len)
+					return -1;
+				len = 0;
+			}
+		}
+	}
+	if (len) {
+		if (write(newfd, buf, len) != len)
 			return -1;
 	}
 	return 0;


Re: [PATCH] write-tree performance problems

2005-04-20 Thread Chris Mason
On Wednesday 20 April 2005 13:06, Linus Torvalds wrote:
> On Wed, 20 Apr 2005, Chris Mason wrote:
> > At any rate, the time for a single write-tree is pretty consistent. 
> > Before it was around .5 seconds, and with this change it goes down to
> > .128s.
>
> Oh, wow.
>
> I bet your SHA1 implementation is done with hand-optimized and scheduled
> x86 MMX code or something, while my poor G5 is probably using some slow
> generic routine. As a result, it only improved by 33% for me since the
> compression was just part of the picture, but with your cheap SHA1 the
> compression costs really dominated, and so it's almost four times faster
> for you.

Aha, I was wondering why your write-tree speeds sounded so bad...this athlon 
machine is ~2 years old now.

Your comments about costs for writing the index file got me thinking, so I 
benchmarked how long the update-cache takes if we don't do the sha1 of the 
index file.  There was almost no difference at all.  update-cache currently 
takes about .152 seconds.

The code to write the cache calls write() for every cache entry, writing just 
a few bytes at a time.  I changed it to collect these into a 16k buffer, 
which brings me down to .044s.  This might not help as much on ext2/3, since 
they are faster than reiser for tiny writes.

The patch below with your current tree brings my 100 patch test down to 22 
seconds again.

-chris
--- linus.back/read-cache.c	2005-04-20 10:14:23.26831 -0400
+++ linus/read-cache.c	2005-04-20 13:05:13.200083672 -0400
@@ -232,11 +232,12 @@
 	SHA_CTX c;
 	struct cache_header hdr;
 	int i;
+	char *buf;
+	int len = 0;
 
 	hdr.hdr_signature = htonl(CACHE_SIGNATURE);
 	hdr.hdr_version = htonl(1);
 	hdr.hdr_entries = htonl(entries);
-
 	SHA1_Init(&c);
 	SHA1_Update(&c, &hdr, offsetof(struct cache_header, sha1));
 	for (i = 0; i < entries; i++) {
@@ -246,13 +247,31 @@
 	}
 	SHA1_Final(hdr.sha1, &c);
 
+	buf = malloc(16384);
+	if (!buf) {
+		return -1;
+	}
 	if (write(newfd, &hdr, sizeof(hdr)) != sizeof(hdr))
 		return -1;
 
 	for (i = 0; i < entries; i++) {
 		struct cache_entry *ce = cache[i];
 		int size = ce_size(ce);
-		if (write(newfd, ce, size) != size)
+		if (size > 16384) {
+			if (write(newfd, ce, size) != size)
+				return -1;
+			continue;
+		}
+		if (len + size > 16384) {
+			if (write(newfd, buf, len) != len)
+				return -1;
+			len = 0;
+		}
+		memcpy(buf + len, ce, size);
+		len += size;
+	}
+	if (len) {
+		if (write(newfd, buf, len) != len)
 			return -1;
 	}
 	return 0;


Re: [PATCH] write-tree performance problems

2005-04-20 Thread Chris Mason
On Wednesday 20 April 2005 11:40, Linus Torvalds wrote:
> On Wed, 20 Apr 2005, Chris Mason wrote:
> > Thanks for looking at this.  Your new tree is faster, it gets the commit
> > 100 patches time down from 1m5s to 50s.
>
> It really _shouldn't_ be faster. It still does the compression, and throws
> the end result away.

Well, that's a little odd.  I had thought about making sure you did this 
change and then forgot.  One-minute benchmarks are a horrible idea since they 
run into noise with cache writebacks.  I should know better...

At any rate, the time for a single write-tree is pretty consistent.  Before it 
was around .5 seconds, and with this change it goes down to .128s.  My patch 
was .024.

The 100 patch time is down to 32s (3 run average).  This is close enough that 
I don't think my patch is worth it if no other part of git can benefit from 
having trees in the index.

>
> To actually go faster, it _should_ need this patch. Untested. See if it
> works..

Thanks.  This one missed filling in the returnsha1.  New patch attached.

-chris
diff -u linus.back/sha1_file.c linus/sha1_file.c
--- linus.back/sha1_file.c	2005-04-20 12:31:00.240181016 -0400
+++ linus/sha1_file.c	2005-04-20 12:13:56.339837528 -0400
@@ -173,12 +173,27 @@
 	z_stream stream;
 	unsigned char sha1[20];
 	SHA_CTX c;
+	char *filename;
+	int fd;
 
 	/* Sha1.. */
 	SHA1_Init(&c);
 	SHA1_Update(&c, buf, len);
 	SHA1_Final(sha1, &c);
 
+	filename = sha1_file_name(sha1);
+	fd = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);
+	if (fd < 0) {
+		if (errno != EEXIST)
+			return -1;
+
+		/*
+		 * We might do collision checking here, but we'd need to
+		 * uncompress the old file and check it. Later.
+		 */
+		goto out;
+	}
+
 	/* Set it up */
 	memset(&stream, 0, sizeof(stream));
 	deflateInit(&stream, Z_BEST_COMPRESSION);
@@ -195,8 +210,10 @@
 	deflateEnd(&stream);
 	size = stream.total_out;
 
-	if (write_sha1_buffer(sha1, compressed, size) < 0)
-		return -1;
+	if (write(fd, compressed, size) != size)
+		die("unable to write file");
+	close(fd);
+out:		
 	if (returnsha1)
 		memcpy(returnsha1, sha1, 20);
 	return 0;


Re: [PATCH] write-tree performance problems

2005-04-20 Thread Chris Mason
On Wednesday 20 April 2005 02:43, Linus Torvalds wrote:
> On Tue, 19 Apr 2005, Chris Mason wrote:
> > I'll finish off the patch once you ok the basics below.  My current code
> > works like this:
>
> Chris, before you do anything further, let me re-consider.
>
> Assuming that the real cost of write-tree is the compression (and I think
> it is), I really suspect that this ends up being the death-knell to my
> "use the sha1 of the _compressed_ object" approach. 

Thanks for looking at this.  Your new tree is faster, it gets the commit 100 
patches time down from 1m5s to 50s.  I've attached my patch from last night, 
which is mostly a rough guess of the changes we would need, I haven't 
validated or cleaned things up.

With the basic changes I described before, the 100 patch time only goes down 
to 40s.  Certainly not fast enough to justify the changes.  In this case, the 
bulk of the extra time comes from write-tree writing the index file, so I 
split write-tree.c up into libwrite-tree.c, and created update-cache 
--write-tree.

This gets our time back down to 21s.

The attached patch is not against your latest revs.  After updating I would 
need to sprinkle a few S_ISDIR checks into diff-cache.c and checkout-cache.c, 
but the changes should be small.

-chris
Index: Makefile
===
--- dbeacafeb442bcfd39dfdc90c360d47d4215c185/Makefile  (mode:100644 sha1:6a04941a337ec50da06cf4cf52aa58f3b1435776)
+++ 27e71cd40ff1dccfbbd996427833fd7bac714dde/Makefile  (mode:100644 sha1:2ba6d49196e8a2335cfcd77ec0dbe9cda3e402dd)
@@ -29,7 +29,7 @@
 
 VERSION= VERSION
 
-LIB_OBJS=read-cache.o sha1_file.o usage.o object.o commit.o tree.o blob.o
+LIB_OBJS=read-cache.o sha1_file.o usage.o object.o commit.o tree.o blob.o libwrite-tree.o
 LIB_FILE=libgit.a
 LIB_H=cache.h object.h
 
Index: cache.h
===
--- dbeacafeb442bcfd39dfdc90c360d47d4215c185/cache.h  (mode:100644 sha1:c182ea0c5c1def37d899f9a05f8884ebe17c9d92)
+++ 27e71cd40ff1dccfbbd996427833fd7bac714dde/cache.h  (mode:100644 sha1:0882b713222b71e67c9dab5d58ab6f15c3c49ed6)
@@ -74,7 +74,7 @@
 #define ce_stage(ce) ((CE_STAGEMASK & ntohs((ce)->ce_flags)) >> CE_STAGESHIFT)
 
 #define ce_permissions(mode) (((mode) & 0100) ? 0755 : 0644)
-#define create_ce_mode(mode) htonl(S_IFREG | ce_permissions(mode))
+#define create_ce_mode(mode) htonl((mode & (S_IFREG|S_IFDIR)) | ce_permissions(mode))
 
 #define cache_entry_size(len) ((offsetof(struct cache_entry,name) + (len) + 8) & ~7)
 
Index: libwrite-tree.c
===
--- /dev/null  (tree:dbeacafeb442bcfd39dfdc90c360d47d4215c185)
+++ 27e71cd40ff1dccfbbd996427833fd7bac714dde/libwrite-tree.c  (mode:100644 sha1:52202930d02b3721f5a388ae1178c5a4d99ec1b4)
@@ -0,0 +1,174 @@
+/*
+ * GIT - The information manager from hell
+ *
+ * Copyright (C) Linus Torvalds, 2005
+ */
+#include "cache.h"
+
+struct new_ce {
+	struct new_ce *next;
+	struct cache_entry ce;
+};
+
+static struct new_ce *add_list = NULL;
+
+static int check_valid_sha1(unsigned char *sha1)
+{
+	char *filename = sha1_file_name(sha1);
+	int ret;
+
+	/* If we were anal, we'd check that the sha1 of the contents actually matches */
+	ret = access(filename, R_OK);
+	if (ret)
+		perror(filename);
+	return ret;
+}
+
+static int prepend_integer(char *buffer, unsigned val, int i)
+{
+	buffer[--i] = '\0';
+	do {
+		buffer[--i] = '0' + (val % 10);
+		val /= 10;
+	} while (val);
+	return i;
+}
+
+#define ORIG_OFFSET (40)	/* Enough space to add the header of "tree \0" */
+
+static int write_tree(struct cache_entry **cachep, int maxentries, const char *base, int baselen, unsigned char *returnsha1)
+{
+	unsigned char subdir_sha1[20];
+	unsigned long size, offset;
+	char *buffer;
+	int i, nr;
+
+	/* Guess at some random initial size */
+	size = 8192;
+	buffer = malloc(size);
+	offset = ORIG_OFFSET;
+
+	nr = 0;
+	do {
+		struct cache_entry *ce = cachep[nr];
+		const char *pathname = ce->name, *filename, *dirname;
+		int pathlen = ce_namelen(ce), entrylen;
+		unsigned char *sha1;
+		unsigned int mode;
+
+		/* Did we hit the end of the directory? Return how many we wrote */
+		if (baselen >= pathlen || memcmp(base, pathname, baselen))
+			break;
+
+		sha1 = ce->sha1;
+		mode = ntohl(ce->ce_mode);
+
+		/* Do we have _further_ subdirectories? */
+		filename = pathname + baselen;
+		dirname = strchr(filename, '/');
+		if (dirname) {
+			int subdir_written;
+			int len = dirname - pathname;
+			unsigned int size = cache_entry_size(len);
+			struct new_ce *new_ce = malloc(size + sizeof(struct new_ce *));
+			struct cache_entry *c = &new_ce->ce;
+			subdir_written = write_tree(cachep + nr, maxentries - nr, pathname, dirname-pathname+1, subdir_sha1);
+			nr += subdir_written 

Re: [PATCH] write-tree performance problems

2005-04-19 Thread Chris Mason
On Tuesday 19 April 2005 17:23, Linus Torvalds wrote:
> On Tue, 19 Apr 2005, Chris Mason wrote:
> > Regardless, putting it into the index somehow should be fastest, I'll see
> > what I can do.
>
> Start by putting it in at "read-tree" time, and adding the code to
> invalidate all parent directory indexes when somebody changes a file in
> the index (ie "update-cache" for anything but a "--refresh").
>
> That would be needed anyway, since those two are the ones that already
> change the index file.
>
> Once you're sure that you can correctly invalidate the entries (so that
> you could never use a stale tree entry by mistake), the second stage would
> be to update it at "write-tree" time.

This was much easier than I expected, and it seems to be working here.  It 
does slow down the write-tree slightly because we have to write out the index 
file, but I can get around that with the index-file-on-tmpfs change.

The original write-tree needs .54 seconds to run.

write-tree with the index speedup gets that down to .024s (same as my first 
patch) when nothing has changed.  When it has to rewrite the index file 
because something changed, it's .167s.

I'll finish off the patch once you ok the basics below.  My current code works 
like this:

1) read-tree will insert index entries for directories.  There is no index 
entry for the root.

2) update-cache removes index entries for all parents of the file you're 
updating.  So, if you update-cache fs/ext3/inode.c, I remove the entries for 
fs and fs/ext3 (sketched below).

3) If write-tree finds a directory in the index, it uses the sha1 in the cache 
entry and skips all files/dirs under that directory.

4) If write-tree detects a subdir with no directory in the index, it calls 
write_tree the same way it used to.  It then inserts a new cache object with 
the calculated sha1.

5) right before exiting, write-tree updates the index if it made any changes.

The downside to this setup is that I've got to change other index users to 
deal with directory entries that are there sometimes and missing other times.  
The nice part is that I don't have to "invalidate" the directory entry: if it 
is present, it is valid.
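
To illustrate step 2, the invalidation is just a walk up the path, dropping 
each parent directory's entry.  A hypothetical helper, not the actual patch; 
it assumes remove_file_from_cache() drops a single index entry:

#include <string.h>

extern int remove_file_from_cache(char *path);

static void invalidate_parents(const char *path)
{
	char dir[4096];
	char *slash;

	strncpy(dir, path, sizeof(dir) - 1);
	dir[sizeof(dir) - 1] = '\0';
	/* fs/ext3/inode.c invalidates fs/ext3, then fs */
	while ((slash = strrchr(dir, '/')) != NULL) {
		*slash = '\0';
		remove_file_from_cache(dir);
	}
}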

-chris


Re: [PATCH] write-tree performance problems

2005-04-19 Thread Chris Mason
On Tuesday 19 April 2005 15:03, Linus Torvalds wrote:
> On Tue, 19 Apr 2005, Chris Mason wrote:
> > Very true, you can't replace quilt with git without ruining both of them.
> >  But it would be nice to take a quilt tree and turn it into a git tree
> > for merging purposes, or to make use of whatever visualization tools
> > might exist someday.
>
> Fair enough. The thing is, going from quilt->git really is a pretty "big
> decision", since it's the decision that says "I will now really commit all
> this quilt changes forever and ever".
>
> Which is also why I think it's actually ok to take a minute to do 100
> quilt patches. This is not something you do on a whim. It's something
> you'd better think about. It's turning a very fluid environment into a
> unchangable, final thing.
>

It's only final when someone pulls from you...for me, all the trees would be 
temporary.

[ ... subtree tree hashes in the index file ... ]

> I'll think about it. I'd love to speed up write-tree, and keeping track of
> it in the index is a nice little trick, but it's not quite high enough up
> on my worries for me to act on it right now.
>
> But if you want to try to see how nasty it would be to add tree index
> entries to the index file at "write-tree" time automatically, hey...
>

Makes sense, I'll let the merge development frenzy die down and give it a try 
one weekend.  I might look into making it a special case of the merging index 
changes, since some of the concepts seem similar.

Regardless, putting it into the index somehow should be fastest, I'll see what 
I can do.

-chris


Re: [PATCH] write-tree performance problems

2005-04-19 Thread Chris Mason
On Tuesday 19 April 2005 13:36, Linus Torvalds wrote:
> On Tue, 19 Apr 2005, Chris Mason wrote:
> > I did a quick experiment with applying/committing 100 patches from the suse
> > kernel into a kernel git tree, which quilt can do in 2 seconds.  git
> > needs 1m5s.
>
> Note that I don't think you want to replace quilt with git. The approaches
> are totally different, and git does _not_ obviate the need for the quilt
> kind of "patch testing".
>
> In fact, git has all the same issues that BK had, and for the same
> fundamental reason: if you do distributed work, you have to always
> "append" stuff, and that means that you can never re-order anything after
> the fact.

Very true, you can't replace quilt with git without ruining both of them.  But 
it would be nice to take a quilt tree and turn it into a git tree for merging 
purposes, or to make use of whatever visualization tools might exist someday.  

> What I _would_ like is the ability to re-use an old tree, though. What you
> really want to do is not pass in a set of directory names and just trust
> that they are correct, but just pass in a directory to compare with, and
> if the contents match, you don't need to write out a new one.
>
> I'll try to whip up something that does what you want done, but doesn't
> need (or take) any untrusted information from the user in the form "trust
> me, it hasn't changed".

We already have a "trust me, it hasn't changed" via update-cache.  If it gets 
called wrong the tree won't reflect reality.  The patch doesn't change the 
write-tree default, but does enable you to give write-tree better information 
about the parts of the tree you want written back to git.

With that said, I hate the patch too.  I didn't see how to compare against the 
old tree without reading each tree object from the old tree, and that should 
be slower than what write-tree does now.  So I wimped out and made the quick 
patch that demonstrates the cause of the performance hit.

The "move .git/index to a tmpfs file" change should be easier though, and has 
a real benefit.  How do you feel about s|.git/index|.git/index_dir/index| in 
the sources?  This gives us the flexibility to link it wherever it's needed.

-chris


[PATCH] write-tree performance problems

2005-04-19 Thread Chris Mason
Hello everyone,

I did a quick experiment with applying/committing 100 patches from the suse 
kernel into a kernel git tree, which quilt can do in 2 seconds.  git needs 1m5s.

The primary performance problem during each commit is write-tree recalculating 
the hash of each directory, even though the contents of most directories are 
not changing.  I've attached a very quick and dirty mod of write-tree.c, it 
takes an optional tree id (sha1) and list of directories.  The hash of any 
directories not in the list are read in from existing files instead of being 
recalculated.

You have to pass each sub dir with a modified file.  So, if you change 
fs/super.c and fs/ext3/super.c, you would call "write-tree sha1 fs fs/ext3"
With this patch, the time to apply 100 commits goes down to 22 seconds.  It 
could be faster (and easier to use) if the index stored the hash of trees 
instead of just blobs, but that would be a larger change.

I was able to get the commit time down to 13 seconds by changing read-tree.c, 
update-cache.c and read-cache.c to store/read the index in tmpfs instead of 
on the local filesystem.  I haven't attached the patch for that, but it seems 
easiest to move .git/index into .git/index_dir/index, and let the user decide 
where to put index_dir.

Quick speed summary, apply/commit 100 patches:
quilt push -a:                  2s
git (unmodified):               1m5s
git (tree hash reduction):      22s
git (tree hash, tmpfs index):   13s

This patch is against pasky's tree from this morning, but also applies to 
linus' tree.  It's nasty stuff, but will hopefully get some discussion 
started on speeding things up.

-chris
--- a/write-tree.c
+++ b/write-tree.c
@@ -4,6 +4,8 @@
  * Copyright (C) Linus Torvalds, 2005
  */
 #include "cache.h"
+static char **dirs;
+static int num_dirs = 0;
 
 static int check_valid_sha1(unsigned char *sha1)
 {
@@ -27,15 +29,47 @@ static int prepend_integer(char *buffer,
 	return i;
 }
 
+static int find_sha(char *buffer, int size, const char *base, int baselen, char *returnsha1)
+{
+	while(size) {
+		int len = strlen(buffer)+1;
+		unsigned char *sha1 = buffer + len;
+		char *path = strchr(buffer, ' ')+1;
+		unsigned int mode;
+
+		if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1)
+			die("corrupt 'tree' file");
+		buffer = sha1 + 20;
+		size -= len + 20;
+		if (strncmp(path, base, baselen) == 0 &&
+		strlen(path) == baselen) {
+			memcpy(returnsha1, sha1, 20);
+			return 0;
+		}
+	}
+	return -1;
+}
+
 #define ORIG_OFFSET (40)	/* Enough space to add the header of "tree \0" */
 
-static int write_tree(struct cache_entry **cachep, int maxentries, const char *base, int baselen, unsigned char *returnsha1)
+static int write_tree(struct cache_entry **cachep, int maxentries, const char *base, int baselen, unsigned char *returnsha1, char *treesha)
 {
 	unsigned char subdir_sha1[20];
 	unsigned long size, offset;
 	char *buffer;
 	int i, nr;
-
+	char *tree = NULL;
+	unsigned long tree_size;
+	char type[20];
+	if (treesha) {
+		tree = read_sha1_file(treesha, type, &tree_size);
+		if (!tree) {
+			die("unable to read sha1 file");
+		} else {
+		}
+		if (strcmp(type, "tree"))
+			die("expected a tree node");
+	}
 	/* Guess at some random initial size */
 	size = 8192;
 	buffer = malloc(size);
@@ -55,15 +89,60 @@ static int write_tree(struct cache_entry
 
 		sha1 = ce->sha1;
 		mode = ntohl(ce->ce_mode);
-
 		/* Do we have _further_ subdirectories? */
 		filename = pathname + baselen;
 		dirname = strchr(filename, '/');
 		if (dirname) {
 			int subdir_written;
-
-			subdir_written = write_tree(cachep + nr, maxentries - nr, pathname, dirname-pathname+1, subdir_sha1);
-			nr += subdir_written;
+			int dirlen = dirname - pathname;
+			int dirmatch = 1;
+			if (tree && num_dirs > 0) {
+				dirmatch = 0;
+				for(i = 0 ; i < num_dirs; i++) {
+					int len = strlen(dirs[i]);
+					if (len <= baselen)
+						continue;
+					if (memcmp(dirs[i], pathname, dirlen)==0 &&
+					    pathname[dirlen] == '/') {
+						dirmatch = 1;
+						break;
+					}
+				}
+				if (!dirmatch && find_sha(tree, tree_size, 
+							  filename, 
+							  dirname-filename, 
+							  subdir_sha1)) {
+					dirmatch = 1;
+				}
+			}
+			if (!dirmatch) {
+				/* eat all the entries in this dir */
+				while(++nr < maxentries) {
+					char *p;
+					ce = cachep[nr];
+					p = strchr(ce->name + baselen, '/');
+					if (!p)
+						break;
+					if (p - ce->name != dirname-pathname)
+						break;
+					if (memcmp(pathname, ce->name, p-ce->name))
+						break;
+				}
+			} else {
+				unsigned char thisdir_sha1[20];
+				char *p = thisdir_sha1;
+				if (num_dirs && tree) {
+					if (find_sha(tree, tree_size, filename, 
+						     dirname-filename, p)) {
+						num_dirs = 0;
+						p = NULL;
+					}
+				} else {
+					p = NULL;
+				}
+				subdir_written = write_tree(cachep + nr, maxentries - nr, pathname, dirname-pathname+1, subdir_sha1, p);
+				nr += subdir_written;
+			}
 
 			/* Now we need to w