Re: [PATCH 3/2] merge-trees script for Linus git

2005-04-16 Thread Junio C Hamano
 LT == Linus Torvalds [EMAIL PROTECTED] writes:

LT Damn, my cunning plan is some good stuff. 

I really like this a lot.  It is *so* *simple*, clear, flexible
and an example of elegance.  This is one of the things I would
happily say "Sheesh!  Why didn't *I* think of *THAT*
first!!!" to.

LT NOTE NOTE NOTE! I could make read-tree do some of these nontrivial 
LT merges, but I ended up deciding that only the "matches in all three 
LT states" thing collapses by default.

 * Understood and agreed.

LT Damn, I'm good.

 * Agreed ;-). Wholeheartedly.

So what's next?  Certainly I'd immediately drop (and I would
imagine you would as well) both the C and Perl versions of
merge-tree(s).

The userland merge policies need ways to extract the stage
information and manipulate it.  Am I correct that by
"ls-files -l" you mean the extracting part?

LT I should make ls-files have a -l format, which shows the
LT index and the mode for each file too.

You probably meant ls-tree.  You used the word "mode", but it
already shows the mode, so I take it to mean "stage".  Perhaps
something like this?

$ ls-tree -l -r 49c200191ba2e3cd61978672a59c90e392f54b8b
100644 blob fe2a4177a760fd110e78788734f167bd633be8de    COPYING
100644 blob b39b4ea37586693dd707d1d0750a9b580350ec50:1  man/frotz.6
100644 blob b39b4ea37586693dd707d1d0750a9b580350ec50:2  man/frotz.6
100664 blob eeed997e557fb079f38961354473113ca0d0b115:3  man/frotz.6
 ...

The above example shows that COPYING has merged successfully,
and O and A have the same contents and B has something different
at man/frotz.6.

Assuming that you would be working on that, I'd like to take the
dircache manipulation part.  Let's think about the minimally
necessary set of operations:

 * The merge policy decides to take one of the existing stages.

   In this case we need a way to register a known mode/sha1 at a
   path.  We already have this as update-cache --cacheinfo.
   We just need to make sure that when update-cache puts
   things at stage 0 it clears other stages as well.

 * The merge policy comes up with a desired blob somewhere on
   the filesystem (perhaps by running an external merge
   program).  It wants to register it as the result of the
   merge.

   We could do this today by first storing the desired blob
   in a temporary file somewhere in the path the dircache
   controls, update-cache --add the temporary file, ls-tree to
   find its mode/sha1, update-cache --remove the temporary
   file and finally update-cache --cacheinfo the mode/sha1.
   This is workable but clumsy.  How about:

   $ update-cache --graft [--add] desired-blob path

   to say "I want to register the mode/sha1 from desired-blob,
   whose name may not satisfy verify_path(), at path in the
   dircache"?

 * The merge policy decides to delete the path.

   We could do this today by first stashing away the file at the
   path if it exists, update-cache --remove it, and restore
   if necessary.  This is again workable but clumsy.  How about:

   $ update-cache --force-remove path

   to mean "I want to remove the path from the dircache even
   though it may exist in my working tree"?

So it all boils down to update-cache.  The new things to be
introduced are:

 * An explicit update-cache always removes stage 1/2/3 entries
   associated with the named path.

 * update-cache --graft

 * update-cache --force-remove

Am I on the right track?
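
To make that concrete, here is a rough sketch of how a merge-policy
script might drive this interface.  Note that only --cacheinfo exists
today; --graft and --force-remove are the proposals above, so treat
the exact flags as hypothetical:

    case "$decision" in
    take-stage)
        # keep a known mode/sha1 at the path; writing stage 0 would
        # also clear stages 1/2/3 under the new rule
        update-cache --cacheinfo "$mode" "$sha1" "$path" ;;
    use-merged-blob)
        # register the result produced by an external merge program
        update-cache --graft --add /tmp/merge-result "$path" ;;
    delete)
        # drop the path from the dircache even if the working tree
        # still has it
        update-cache --force-remove "$path" ;;
    esac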

You might want to go even lower level by letting them say
something like:

 * update-cache --register-stage mode sha1 stage path

   Registers the mode/sha1 at stage for path.  Does not look at
   the working tree.  stage is [0-3]
 
 * update-cache --delete-stage stage-list path

   Removes the entry at the named stages for path.  Does not look
   at the working tree.  stage-list is either [0-3](,[0-3])+ or a
   bitmask, i.e. (1 << stage-number) ORed together (e.g. stages 1
   and 3 would be 2|8 = 10).  The former would probably be easier
   for scripts to work with.

 * write-blob path

   Hashes and registers the file at path (regardless of what
   verify_path() says) and writes the resulting blob's mode/sha1
   to the standard output.

If you take this lower-level approach, an explicit update-cache
would not clear stages 1/2/3.
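
For illustration, usage of the lower-level interface might look like
this (purely hypothetical; none of these options exist today):

    $ update-cache --register-stage 100644 fe2a4177a760fd110e78788734f167bd633be8de 2 COPYING
    $ update-cache --delete-stage 1,3 COPYING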

My preference is the former, not-so-low-level interface.
Guidance?



git-pasky file mode handling

2005-04-16 Thread Russell King
Hi,

It seems that there's something weird going on with the file mode
handling.  Firstly, some files in the git-pasky repository have mode
0664 while others have 0644.

Having pulled from git-pasky a number of times, with Petr's being the
tracked repository, I now find that when I do an update-cache --refresh,
it complains that the files need updating, despite show-diff showing no
differences.  Investigating, this appears to be because the file modes
are wrong for a number of the files.  All my files do not have group
write.

I notice in the changelog what appears to be a dependence on the umask.
If this is so, please note that git appears to track the file modes,
and any dependence upon the umask is likely to screw with this tracking.

-- 
Russell King



[PATCH 1/2] Add --stage to show-files for new stage dircache.

2005-04-16 Thread Junio C Hamano
 JNH == Junio C Hamano [EMAIL PROTECTED] writes:
 LT == Linus Torvalds [EMAIL PROTECTED] writes:

LT I should make ls-files have a -l format, which shows the
LT index and the mode for each file too.

JNH You probably meant ls-tree.  You used the word mode but it
JNH already shows the mode so I take it to mean stage.

I was *wrong*.  Of course you meant show-files.

Instead of sending you an apology, I am sending you the one I
wrote myself.  Please find it in the next message ;-).

Here is its sample output.  It shows file-mode, SHA1, stage and
pathname.  I am attaching this one because this is a
verification that your read-tree -m passed the test.

$ ../show-files --stage
100664 578cc900ed980b72acfbdd1eea63e688a893c458 2 AA
100664 f355077379fce072c210628691da232b59b6f25c 3 AA
100664 d698ebc45d0edfe6e5b95aebb5983cb5c760960b 2 AN
100664 0fa6a8e41814531679e1c76e968a9066fceb689d 1 DD
100664 aff448a9467a4d83b164ef969cfe92ff18eb96be 1 DM
100664 4bfe111723f11cb4a4deec7c837e12601030285f 3 DM
100664 9b0f86e5cded99b9de3bd9d234747ec2d1a4cddd 1 DN
100664 9b0f86e5cded99b9de3bd9d234747ec2d1a4cddd 3 DN
100664 a6772f2a2c15bac796d8c7bb55885891956534cf 1 MD
100664 dc2088ce13f659f2bd554b2c1b343f4966143b9b 2 MD
100664 e4310204563a9059828644464779874c3a406fee 1 MM
100664 fe5ddcd7618d26384cf98c6fcd15780c7125e6d6 2 MM
100664 53a9d14868dbe346a9f0cf01fcda742545b55987 3 MM
100664 f48f37ea0205a7e5591777b4d3ae0d153d3ef131 1 MN
100664 d7600381b69b92f61bad50c5f8408e831b622ef0 2 MN
100664 f48f37ea0205a7e5591777b4d3ae0d153d3ef131 3 MN
100664 67fb1517ea8d59949a8e4f5f07f0422b212f64dc 3 NA
100664 0e5842253af8881b2c9f579029d7b50a8e03d7f6 1 ND
100664 0e5842253af8881b2c9f579029d7b50a8e03d7f6 2 ND
100664 0d45c04c9d05fa9c21edf95fc2c1a43519a8c440 1 NM
100664 0d45c04c9d05fa9c21edf95fc2c1a43519a8c440 2 NM
100664 849bfa41d15951f5e97cb93e22cbcc2924ce4517 3 NM
100664 83d94b8fd056921f22ad2ca0122dd7f64974be7c 0 NN

This is taken from the dircache after I ran

$ read-tree -m O A B

using the merge testcase I prepared earlier.  It is a very trivial
case: a single ancestor O, with two branches A & B to merge.  This
covers all possible patterns, except file vs directory
conflicts.  The filenames are all two letters, the first letter
encoding what the first branch does to that file while the second
one encodes what the second branch does to it.  The actions are:

 - A means Added in this branch --- did not exist in the ancestor.

 - N means No change in this branch.

 - D means Deleted in this branch.

 - M means Modified in this branch.

So, for example, the first branch modified file MN while the
second one did not touch it.  Of course it existed in the
ancestor.  You can see that read-tree did the right thing
because SHA1 for stage 1 and stage 3 match, and stage 2 is
different.

100664 f48f37ea0205a7e5591777b4d3ae0d153d3ef131 1 MN
100664 d7600381b69b92f61bad50c5f8408e831b622ef0 2 MN
100664 f48f37ea0205a7e5591777b4d3ae0d153d3ef131 3 MN

I verified all of the above results, and they show your algorithm
is doing exactly what is expected.

For those of you who are interested, this is the recipe to
reproduce this merge testcase.  NOTE! NOTE! NOTE!  Do not run
this in your working tree, because it trashes .git in its
working directory.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

--- /dev/null
+++ generate-merge-test.sh
@@ -0,0 +1,163 @@
+#!/bin/sh
+
+: Skip execution up to \End_of_Commentary
+
+This directory is to hold a test case for merges.
+
+There is one ancestor (called O for Original) and two branches A
+and B derived from it.  We want to do 3-way merge between A and
+B, using O as the common ancestor.
+
+merge A O B
+diff3 A O B
+
+Decisions are made by comparing contents of O, A and B pathname
+by pathname.  The result is determined by the following guiding
+principle:
+
+ - If only A does something to it and B does not touch it, take
+   whatever A does.
+
+ - If only B does something to it and A does not touch it, take
+   whatever B does.
+
+ - If both A and B does something but in the same way, take
+   whatever they do.
+
+ - If A and B does something but different things, we need a
+   3-way merge:
+
+   - We cannot do anything about the following cases:
+
+ * O does not have it.  A and B both must be adding to the
+   same path independently.
+
+ * A deletes it.  B must be modifying.
+
+   - Otherwise, A and B are modifying.  Run 3-way merge.
+
+
+First, the case matrix.
+
+ - Vertical axis is for A's actions.
+ - Horizontal axis is for B's actions.
+
+..
+| AB | No Action  |   Delete   |   Modify   |Add |
+|++++|
+| No Action  |||||
+|| select O   | delete | select B   | select B   |
+||||||
+|++++|
+| 

[PATCH 2/2] Add --stage to show-files for new stage dircache.

2005-04-16 Thread Junio C Hamano
This adds the --stage option to the show-files command.  It shows
file-mode, SHA1, stage and pathname.  The record separator follows
the usual convention of the -z option, as before.

The patch is on top of the byte order fix for create_ce_flags in my
previous message.
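
Typical invocations, for illustration, would be:

    $ show-files --stage          # newline-terminated records
    $ show-files --stage -z       # NUL-terminated records, for scripts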

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 cache.h  |   12 +++-
 show-files.c |   22 ++
 2 files changed, 25 insertions(+), 9 deletions(-)

--- cache.h 2005-04-16 03:02:36.0 -0700
+++ cache.h=show-files-stage-flags  2005-04-16 02:48:47.0 -0700
@@ -65,8 +65,14 @@
 
 #define CE_NAMEMASK  (0x0fff)
 #define CE_STAGEMASK (0x3000)
+#define CE_STAGESHIFT 12
 
-#define create_ce_flags(len, stage) htons((len) | ((stage) << 12))
+#define create_ce_flags(len, stage) htons((len) | ((stage) << CE_STAGESHIFT))
+#define ce_namelen(ce) (CE_NAMEMASK & ntohs((ce)->ce_flags))
+#define ce_size(ce) cache_entry_size(ce_namelen(ce))
+#define ce_stage(ce) ((CE_STAGEMASK & ntohs((ce)->ce_flags)) >> CE_STAGESHIFT)
+
+#define cache_entry_size(len) ((offsetof(struct cache_entry,name) + (len) + 8) & ~7)
 
 const char *sha1_file_directory;
 struct cache_entry **active_cache;
@@ -75,10 +81,6 @@
 #define DB_ENVIRONMENT "SHA1_FILE_DIRECTORY"
 #define DEFAULT_DB_ENVIRONMENT ".git/objects"
 
-#define cache_entry_size(len) ((offsetof(struct cache_entry,name) + (len) + 8) & ~7)
-#define ce_namelen(ce) (CE_NAMEMASK & ntohs((ce)->ce_flags))
-#define ce_size(ce) cache_entry_size(ce_namelen(ce))
-
 #define alloc_nr(x) (((x)+16)*3/2)
 
 /* Initialize and use the cache information */



--- show-files.c
+++ show-files.c2005-04-16 02:58:32.0 -0700
@@ -14,6 +14,7 @@
 static int show_cached = 0;
 static int show_others = 0;
 static int show_ignored = 0;
+static int show_stage = 0;
 static int line_terminator = '\n';
 
 static const char **dir;
@@ -108,10 +109,19 @@
 	for (i = 0; i < nr_dir; i++)
 		printf("%s%c", dir[i], line_terminator);
 	}
-	if (show_cached) {
+	if (show_cached | show_stage) {
 		for (i = 0; i < active_nr; i++) {
 			struct cache_entry *ce = active_cache[i];
-			printf("%s%c", ce->name, line_terminator);
+			if (!show_stage)
+				printf("%s%c", ce->name, line_terminator);
+			else
+				printf(/* "%06o %s %d %10d %s%c", */
+				       "%06o %s %d %s%c",
+				       ntohl(ce->ce_mode),
+				       sha1_to_hex(ce->sha1),
+				       ce_stage(ce),
+				       /* ntohl(ce->ce_size), */
+				       ce->name, line_terminator);
 		}
 	}
if (show_deleted) {
@@ -156,12 +166,16 @@
 			show_ignored = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--stage")) {
+			show_stage = 1;
+			continue;
+		}
 
-		usage("show-files (--[cached|deleted|others|ignored])*");
+		usage("show-files [-z] (--[cached|deleted|others|ignored|stage])*");
}
 
/* With no flags, we default to showing the cached files */
-	if (!(show_cached | show_deleted | show_others | show_ignored))
+	if (!(show_stage | show_deleted | show_others | show_ignored))
 		show_cached = 1;
 
read_cache();
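
(As an aside for readers following the flag packing above, the layout
can be checked from the shell -- values shown in host byte order,
before the htons() the macro applies; this is only an illustration,
not part of the patch:

    $ namelen=11 stage=2
    $ echo $(( namelen | (stage << 12) ))       # 8203 == 0x200b
    $ echo $(( (0x200b & 0x3000) >> 12 ))       # stage: 2
    $ echo $(( 0x200b & 0x0fff ))               # name length: 11
)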



Re: Re: Re: Add clone support to lntree

2005-04-16 Thread Petr Baudis
Dear diary, on Sat, Apr 16, 2005 at 05:16:12AM CEST, I got a letter
where Linus Torvalds [EMAIL PROTECTED] told me that...
 On Sat, 16 Apr 2005, Petr Baudis wrote:
  
  I'm wondering, whether each tree should be fixed to a certain branch.
 
 I'm wondering why you talk about branches at all.
 
 No such thing should exist. There are no branches. There are just 
 repositories. You can track somebody elses repository, but you should 
 track it by location, not by any branch name.
 
 And you track it by just merging it.
 
 Yeah, we don't have really usable merges yet, but..

First, this level of branches concerns multiple working directories
tied to a single repository. It seems like a sensible thing to do; and
you agreed with it too (IIRC). And when you do that, git-pasky just
saves some work for you. For git-pasky, a branch is really just a symbolic
name for a commit ID, which gets updated every time you commit in some
repository. Nothing more.

So the whole point of this is to have a symbolic name for some other
working directory. When you want to merge, you don't need to go over to
the other directory, do commit-id, cut'n'paste, and feed that to git
merge. You just do

git merge myotherbranch
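
(Without the symbolic name that would be something like the following,
hedging on the exact git-pasky syntax; it is exactly the cut'n'paste
dance the branch name avoids:

    $ other=$(cd ../other-tree && commit-id)
    $ git merge "$other"
)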


Now, about remote repositories. When you pull a remote repository, that
does not mean it has to be immediately merged somewhere. It is very
useful to have another branch you do *not* want to merge, but you want
to do diffs to it, or even check it out / export it later to some
separate directory. Again, the branch is just a symbolic name for the
head commit ID of what you pulled, and the pointer gets updated every
time you pull again - that's the whole point of it.

The last concept is tracking working directories. If you pull the
tracked branch to this directory, it also automerges it. This is useful
when you have a single canonical branch for this directory, which it
should always mirror. That would be the case e.g. for the gazillions of
Linux users who would just like to have your latest bleeding-edge
kernel, and who expect to use git just like a different CVS. Basically,
they will just do

git pull

instead of

cvs update

:-).

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: Merge with git-pasky II.

2005-04-16 Thread David Lang
On Fri, Apr 15, 2005 at 08:32:46AM -0700, Linus Torvalds wrote:
In other words, I'm right. I'm always right, but sometimes I'm more 
right
than other times. And dammit, when I say files don't matter, I'm 
really
really Right(tm).
You're right, of course (All Hail Linus!), if you can make it work
efficiently enough.
Just to put something else on the table, here's how I'd go about
tracking renames and the like, in another world where Linus /does/
make the odd mistake - it's basically a unique id for files in the
repository, added when the file is first recognised and updated when
update-cache adds a new version to the cache. Renames copy the id
across to the new name, and add it into the cache.
This gives you an O(n) way to tell what file was what across
renames, and it might even be useful in Linus' world, or if someone
wanted to build a traditional SCM on top of a git-a-like.
Attached is a patch, and a rename-file.c to use it.
Simon
given that you have multiple machines creating files, how do you deal with 
the idea of the same 'unique id' being assigned to different files by 
different machines?

David Lang

--
There are two ways of constructing a software design. One way is to make it so 
simple that there are obviously no deficiencies. And the other way is to make 
it so complicated that there are no obvious deficiencies.
 -- C.A.R. Hoare


Re: space compression (again)

2005-04-16 Thread David Lang
we already have the concept of objects that contain objects and therefore 
don't need to be re-checked (directories); the chunks inside a file could 
be the same type of thing.

currently we say that if the hash on the directory is the same we don't 
need to re-check each of the files in that directory; this would mean that 
if the hash on the file hasn't changed we don't need to re-check the 
chunks inside that file.

David Lang
 On Fri, 15 Apr 2005, Ray Heasman wrote:
Date: Fri, 15 Apr 2005 12:33:03 -0700
From: Ray Heasman [EMAIL PROTECTED]
To: git@vger.kernel.org
Subject: Re: space compression (again)
Sorry for this email not threading properly; I have been lurking on the
mailing list archives and just had to reply to this message.
I was planning to ask exactly this question, and Scott beat me to it. I
even wanted to call them chunks too. :-)
It's probably worthwhile for anyone discussing this subject to read this
link: http://www.cs.bell-labs.com/sys/doc/venti/venti.pdf . I know it's
been posted before, but it really is worth reading. :-)
On Fri, 15 Apr 2005, Linus Torvalds wrote:
On Fri, 15 Apr 2005, C. Scott Ananian wrote:
Why are blobs per-file?  [After all, Linus insists that files are an
illusion.]  Why not just have 'chunks', and assemble *these*
into blobs (read, 'files')?  A good chunk size would fit evenly into some
number of disk blocks (no wasted space!).
I actually considered that. I ended up not doing it, because it's not
obvious how to block things up (and even more so because while I like
the notion, it flies in the face of the other issues I had: performance
and simplicity).
I don't think it's as bad as you think.
Let's conceptually have two types of files - Pobs (Proxy Objects, or
Pointer Objects), and chunks. Both are stored and referenced by their
content hash, as usual. Pobs just contain a list of hashes referencing
the chunks in a file. When a file is initially stored, we chunk it so
each chunk fits comfortably in a block, but otherwise we aren't too
critical about sizes. When a file is changed (say, a single line edit),
we update the chunk that contains that line, hash it and store it with
its new name, and update the Pob, which we rehash and restore. If a
chunk grows to be very large (say > 2 disk blocks), we can rechunk it
and update the Pob to include the new chunks.
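
To make that concrete, a back-of-the-envelope sketch of the pob/chunk
idea (nothing like this exists in git; plain coreutils and a fixed
4096-byte chunk size, purely for illustration):

    $ split -b 4096 somefile.c chunk.
    $ for c in chunk.*; do sha1sum "$c"; done | awk '{print $1}' > somefile.c.pob
    $ sha1sum somefile.c.pob     # the pob itself is content-addressed too
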
The problem with chunking is:
 - it complicates a lot of the routines. Things like is this file
   unchanged suddenly become is this file still the same set of chunks,
   which is just a _lot_ more code and a lot more likely to have bugs.
You're half right; it will be more complex, but I don't think it's as
bad as you think. Pobs are stored by hash just like anything else. If
some chunks are different, the pob is different, which means it has a
different hash. It's exactly the same as dealing with changed file now.
Sure, when you have to fetch the data, you have to read the pob and get
a list of chunks to concatenate and return, but your example given
doesn't change.
 - you have to find a blocking factor. I thought of just doing it in fixed
   chunks, and that just doesn't help at all.
Just use the block size of the filesystem. Some filesystems do tail
packing, so space isn't an issue, though speed can be. We don't actually
care how big a chunk is, except to make it easy on the filesystem.
Individual chunks can be any size.
 - we already have wasted space due to the low-level filesystem (as
   opposed to git) usually being block-based, which means that space
   utilization for small objects tends to suck. So you really want to
   prefer objects that are several kB (compressed), and a small block just
   wastes tons of space.
If a chunk is smaller than a disk block, this is true. However, if we
size it right this is no worse than any other file. Small files (less
than a block) can't be made any larger, so they waste space anyway.
Large files end up wasting space in one block unless they are a perfect
multiple of the block size.
When we increase the size of a chunk, it will waste space, but we would
have created an entire new file, so we win there too.
Admittedly, Pobs will be wasting space too.
On the other hand, I use ReiserFS, so I don't care. ;-)
 - there _is_ a natural blocking factor already. That's what a file
   boundary really is within the project, and finding any other is really
   quite hard.
Nah. I think I've made a good case it isn't.
So I'm personally 100% sure that it's not worth it. But I'm not opposed to
the _concept_: it makes total sense in the filesystem view, and is 100%
equivalent to having an inode with pointers to blocks. I just don't think
the concept plays out well in reality.
Well, the reason I think this would be worth it is that you really win
when you have multiple parallel copies of a source tree, and changes are
cheaper too. If you store all the chunks for all your git repositories
in one place, and otherwise treat your trees of Pobs as the real
repository, your copied trees only cost you space 

Re: SHA1 hash safety

2005-04-16 Thread Ingo Molnar

* David Lang [EMAIL PROTECTED] wrote:

 this issue was raised a few days ago in the context of someone 
 tampering with the files and it was decided that the extra checks were 
 good enough to prevent this (at least for now), but what about 
 accidental collisions?
 
 if I am understanding things right the objects get saved in the 
 filesystem in filenames that are the SHA1 hash. If two legitimate 
 files have the same hash I don't see any way for both of them to 
 exist.
 
 yes the risk of any two files having the same hash is low, but in the 
 earlier thread someone chimed in and said that they had two files on 
 their system that had the same hash..

you can add -DCOLLISION_CHECK to Makefile:CFLAGS to turn on collision 
checking (disabled currently). If there indeed exist two files that have 
different content but the same hash, could someone send those two files?
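
(For the curious, one way to rebuild with the check enabled, assuming
the Makefile picks up a CFLAGS override from the command line --
otherwise edit Makefile:CFLAGS directly as described:

    $ make clean
    $ make CFLAGS="-g -O3 -Wall -DCOLLISION_CHECK"
)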

Ingo


Re: SHA1 hash safety

2005-04-16 Thread Brian O'Mahoney
Three points:
(1) I _have_ seen real-life collisions with MD5, in the context of
Document management systems containing ~10^6 ms-WORD documents.
(2) The HMAC (ethernet-hardware-address) of any interface _should_
help to make a unique Id.
(3) While I haven't looked at the details of the plumbing, this is
the time to make sure we can, easily, drop in SHA-160, SHA-256
(or whatever comes from NIST) when needed.


David Lang wrote:
 On Sat, 16 Apr 2005, Ingo Molnar wrote:
 
 * David Lang [EMAIL PROTECTED] wrote:

 this issue was raised a few days ago in the context of someone
 tampering with the files and it was decided that the extra checks were
 good enough to prevent this (at least for now), but what about
 accidental collisions?

  if I am understanding things right the objects get saved in the
  filesystem in filenames that are the SHA1 hash. If two legitimate
  files have the same hash I don't see any way for both of them to
  exist.
 
  yes the risk of any two files having the same hash is low, but in the
  earlier thread someone chimed in and said that they had two files on
  their system that had the same hash..


 you can add -DCOLLISION_CHECK to Makefile:CFLAGS to turn on collision
 checking (disabled currently). If there indeed exist two files that have
 different content but the same hash, could someone send those two files?
 
 
  remember that the flap over SHA1 being 'broken' a couple weeks ago was
  not from researchers finding multiple files with the same hash, but
  finding that it was more likely than expected that files would have the
  same hash.
  
  there was a discussion on LKML within the last year about using MD5
  hashes for identifying unique filesystem blocks (with the idea of being
  able to merge identical blocks) and in that discussion it was pointed
  out that collisions are a known real-life issue.
 
 so if collision detection is turned on in git, does that make it error
 out if it runs into a second file with the same hash, or does it do
 something else?
 
 David Lang
 

-- 
Brian


Proposal for simplification and improvement of the git model

2005-04-16 Thread Luca Barbieri
This message presents a method to simplify the git abstraction and, at
the same time, make it more powerful.

I believe that the enhancements I propose make git adhere even more to
its spirit and make it more intuitive.

The proposal makes it much easier to build an SCM over git, obtaining in
particular the following advantages:

- Blob and tree objects become symmetric

- Commit objects are removed (their data is put inside tree objects)

- Commit comments are per-file

- A tree in a repository looks like a repository itself, with full
version information (now only the one mentioned in the commit object has
version information)

- File and directory renames are tracked

- Renames are tracked regardless of the way they are made (even with cp
and rm)

- Commit comments can be updated at any time by whoever made the change

- Doing the blame operation is trivial

- Minimizing disk space usage (at the expense of speed) by storing diffs
is easily doable



The basic idea is that rather than having single blob or tree revisions
as the base concept, the abstract base unit is the whole set of
modifications, with comments, leading to that state.

Of course, tracking that would be extremely space-inefficient, so we
instead track the current file contents, plus the public key of the
author and the hashes of all parents.


This is implemented with the following changes to git:


- The commit object is removed


- Each tree must have a .git-commit file that contains the information
previously in the commit object (only for immediate children, thus
having a .git-commit file in each directory), but with the author
public key instead of the comments


- Each blob will be hashed as the blob contents plus a header in a
canonical format that contains data similar to the data in the
.git-commit file


- When checked out, the blob header is put in a C/C++ comment, a #
comment, or if the file format is unknown, in an extended attribute or a
separate file

An example of a C/C++ file with metadata is the following:

// @parentSHA1_OF_PARENT1 @parentSHA1_OF_PARENT2
// @authorFINGERPRINT_OF_AUTHOR_PUBLIC_KEY
#include <stdlib.h>

int main(int argc, char** argv)
{
	printf("Hello, world!\n");
	return 0;
}

Note that @parent and @author in checked out files are NOT the same
as the ones in the repository but are crafted so that there is a single
@parent pointing to the repository file and @author is taken from
$HOME/.gitrc


- When the file is checked in, the header is parsed and removed.

*  If there is a single parent, its header is added and the resulting
buffer is hashed and compared with the parent's hash. If equal, the file
is unchanged and not committed.

* Otherwise, the header data is added in a canonical format and the
buffer is hashed and committed


- A new class of objects is added that are not named by their hash, but
rather by a public key (or fingerprint of it), a timestamp and a name.

The object is correct if and only if the contents plus name and
timestamp are signed with the private key corresponding to public key in
the name.

Object names are formatted as id/name/args where id is a
uuid or url that makes the id/name unique, name is the name, and
args is additional data.

File names formatted like git/c/sha1 are interpreted as commit
comments for object sha1.


- For storage or network transmission purposes, a binary diff against
the parents can be stored instead of the contents of an object. This
will of course require walking the whole history to rebuild it, but
smarter schemes are possible (e.g. keyframes, jump diffs, etc.).



Re: full kernel history, in patchset format

2005-04-16 Thread Ingo Molnar

* Ingo Molnar [EMAIL PROTECTED] wrote:

 the patches contain all the existing metadata, dates, log messages and 
 revision history. (What i think is missing is the BK tree merge 
 information, but i'm not sure we want/need to convert them to GIT.)

author names are abbreviated, e.g. 'viro' instead of 
[EMAIL PROTECTED], and no committer information is 
included (albeit the committer ought to be Linus in most cases). These are 
limitations of the BK-CVS gateway i think.

Ingo


Re: git-pasky file mode handling

2005-04-16 Thread Petr Baudis
Dear diary, on Sat, Apr 16, 2005 at 11:45:59AM CEST, I got a letter
where Russell King [EMAIL PROTECTED] told me that...
 Hi,

Hello,

 It seems that there's something weird going on with the file mode
 handling.  Firstly, some files in the git-pasky repository have mode
 0664 while others have 0644.
 
 Having pulled from git-pasky a number of times, with Petr's being the
 tracked repository, I now find that when I do an update-cache --refresh,
 it complains that the files need updating, despite show-diff showing no
 differences.  Investigating, this appears to be because the file modes
 are wrong for a number of the files.  All my files do not have group
 write.

this was a problem with git apply, which did not apply mode changes
correctly until recently. If you have no local changes,

checkout-cache -f -a

should fix this. Hopefully.

 I notice in the changelog what appears to be a dependence on the umask.
 If this is so, please note that git appears to track the file modes,
 and any dependence upon the umask is likely to screw with this tracking.

I personally don't think I like the mode tracking at all. Some people
(Linus?) may want to have group +w. Other people (me) have their default
group as 'users', and I definitely don't want everyone to be able to
write to the files. :-)

I think we should track only whether the file is executable or not.
Linus?

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: full kernel history, in patchset format

2005-04-16 Thread Francois Romieu
Ingo Molnar [EMAIL PROTECTED] :
[...]
 the history data starts at 2.4.0 and ends at 2.6.12-rc2. I've included a 
 script that will apply all the patches in order and will create a 
 pristine 2.6.12-rc2 tree.

127 weeks of bk-commit mail for the 2.6 branch alone since october 2002
provides more than 44000 messages here. The figures are surprisingly
different.

 it needed many hours to finish, on a very fast server with tons of RAM, 
 and it also needed a fair amount of manual work to extract it and to 
 make it usable, so i guessed others might want to use the end result as 
 well, to try and generate large GIT repositories from them (or to run 
 analysis over the patches, etc.).

Has anyone already compared the (split/digested) content of the ChangeLog
file with the commit messages ? It raises the interesting question of
inserting the merge messages/patches in the sequence at the right place
but I'd like to know if someone met other issues.

--
Ueimor


Re: Re: SHA1 hash safety

2005-04-16 Thread Petr Baudis
Dear diary, on Sat, Apr 16, 2005 at 04:58:15PM CEST, I got a letter
where C. Scott Ananian [EMAIL PROTECTED] told me that...
 On Sat, 16 Apr 2005, Brian O'Mahoney wrote:
 
 (1) I _have_ seen real-life collisions with MD5, in the context of
Document management systems containing ~10^6 ms-WORD documents.
 
 Dude!  You could have been *famous*!  Why the 
 aitch-ee-double-hockey-sticks didn't you publish this when you found it?
 Seriously, man.
 
 Even given the known weaknesses in MD5, it would take much more than a 
 million documents to find MD5 collisions.  I can only conclude that the 
 hash was being used incorrectly; most likely truncated (my wild-ass guess 
  would be to 32 bits; a collision is likely with > 50% probability in a 
 million document store for a hash of less than 40 bits).
 
 I know the current state of the art here.  It's going to take more than 
 just hearsay to convince me that full 128-bit MD5 collisions are likely. 
 I believe there are only two or so known to exist so far, and those were 
 found by a research team in China (which, yes, is fairly famous among the 
 cryptographic community now after publishing a paper consisting of little 
 apart from the two collisions themselves).

http://cryptography.hyperlink.cz/MD5_collisions.html

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: [PATCH 3/2] merge-trees script for Linus git

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Junio C Hamano wrote:
 
 LT NOTE NOTE NOTE! I could make read-tree do some of these nontrivial 
 LT merges, but I ended up deciding that only the "matches in all three 
 LT states" thing collapses by default.
 
  * Understood and agreed.

Having slept on it, I think I'll merge all the trivial cases that don't 
involve a file going away or being added. Ie if the file is in all three 
trees, but it's the same in two of them, we know what to do.

That way we'll leave things where the tree itself changed (files added or 
removed at any point) and/or cases where you actually need a 3-way merge.

 The userland merge policies need ways to extract the stage
 information and manipulate it.  Am I correct that by
 "ls-files -l" you mean the extracting part?

No, I meant show-files, since we need to show the index, not a tree (no 
valid tree can ever have the modes information, since (a) it doesn't 
have the space for it anyway and (b) we refuse to write out a dirty index 
file).



 
 LT I should make ls-files have a -l format, which shows the
 LT index and the mode for each file too.
 
 You probably meant ls-tree.  You used the word "mode", but it
 already shows the mode, so I take it to mean "stage".  Perhaps
 something like this?
 
 $ ls-tree -l -r 49c200191ba2e3cd61978672a59c90e392f54b8b
 100644 blob fe2a4177a760fd110e78788734f167bd633be8de    COPYING
 100644 blob b39b4ea37586693dd707d1d0750a9b580350ec50:1  man/frotz.6
 100644 blob b39b4ea37586693dd707d1d0750a9b580350ec50:2  man/frotz.6
 100664 blob eeed997e557fb079f38961354473113ca0d0b115:3  man/frotz.6

Apart from the fact that it would be

show-files -l

since there are no tree objects that can have anything but fully merged
state, yes.

 Assuming that you would be working on that, I'd like to take the
 dircache manipulation part.  Let's think about the minimally
 necessary set of operations:
 
  * The merge policy decides to take one of the existing stages.
 
In this case we need a way to register a known mode/sha1 at a
path.  We already have this as update-cache --cacheinfo.
We just need to make sure that when update-cache puts
things at stage 0 it clears other stages as well.
 
  * The merge policy comes up with a desired blob somewhere on
the filesystem (perhaps by running an external merge
program).  It wants to register it as the result of the
merge.
 
We could do this today by first storing the desired blob
in a temporary file somewhere in the path the dircache
controls, update-cache --add the temporary file, ls-tree to
find its mode/sha1, update-cache --remove the temporary
file and finally update-cache --cacheinfo the mode/sha1.
This is workable but clumsy.  How about:
 
$ update-cache --graft [--add] desired-blob path
 
to say "I want to register the mode/sha1 from desired-blob,
whose name may not satisfy verify_path(), at path in the
dircache"?
 
  * The merge policy decides to delete the path.
 
We could do this today by first stashing away the file at the
path if it exists, update-cache --remove it, and restore
if necessary.  This is again workable but clumsy.  How about:
 
$ update-cache --force-remove path
 
to mean "I want to remove the path from the dircache even though
it may exist in my working tree"?

Yes.

 Am I on the right track?

Exactly.

 You might want to go even lower level by letting them say
 something like:
 
  * update-cache --register-stage mode sha1 stage path
 
Registers the mode/sha1 at stage for path.  Does not look at
the working tree.  stage is [0-3]

I'd prefer not. I'd avoid playing games with the stages at any other level
than the full tree level until we show a real need for it.

Let's go with the known-needed minimal cases that are high-level enough to
make the scripting simple, and see if there is any reason to ever touch
the tree any other way.

Linus


Re: Merge with git-pasky II.

2005-04-16 Thread Johannes Schindelin
Hi,

On Fri, 15 Apr 2005, David Woodhouse wrote:

 But if it can be done cheaply enough at a later date even though we end
 up repeating ourselves, and if it can be done _well_ enough that we
 shouldn't have just asked the user in the first place, then yes, OK I
 agree.

The repetition could be helped by using a cache.

Ciao,
Dscho


Re: Re: Merge with git-pasky II.

2005-04-16 Thread Simon Fowler
On Sat, Apr 16, 2005 at 06:03:33PM +0200, Petr Baudis wrote:
 Dear diary, on Sat, Apr 16, 2005 at 05:55:37PM CEST, I got a letter
 where Simon Fowler [EMAIL PROTECTED] told me that...
  On Sat, Apr 16, 2005 at 05:19:24AM -0700, David Lang wrote:
   Simon
   
   given that you have multiple machines creating files, how do you deal 
   with 
   the idea of the same 'unique id' being assigned to different files by 
   different machines?
   
  The id is a sha1 hash of the current time and the full path of the
  file being added - the chances of that being replicated without
  malicious intent is extremely small. There are other things that
  could be used, like the hostname, username of the person running the
  program, etc, but I don't really see them being necessary.
 
 Why not just use UUID?
 
Hey, everything else in git seems to use sha1, so I just copied
Linus' sha1 code ;-)

All I wanted was something that had a good chance of being unique
across any potential set of distributed repositories, to avoid the
chance of accidental clashes. A sha1 hash of something that's not
likely to be replicated is a simple way to do that.
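
(Roughly, the scheme amounts to something like this -- a sketch, not
the actual rename-file.c code, and date +%N is a GNU-ism:

    $ id=$(printf '%s %s' "$(date +%s.%N)" "$PWD/newfile.c" | sha1sum | cut -d' ' -f1)
)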

Simon

-- 
PGP public key Id 0x144A991C, or http://himi.org/stuff/himi.asc
(crappy) Homepage: http://himi.org
doe #237 (see http://www.lemuria.org/DeCSS) 
My DeCSS mirror: ftp://himi.org/pub/mirrors/css/ 




Re: [PATCH 3/2] merge-trees script for Linus git

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Linus Torvalds wrote:
 
 Having slept on it, I think I'll merge all the trivial cases that don't 
 involve a file going away or being added. Ie if the file is in all three 
 trees, but it's the same in two of them, we know what to do.

Junio, I pushed this out, along with the two patches from you. It's still
more anal than my original tree-diff algorithm, in that it refuses to
touch anything where the name isn't the same in all three versions
(original, new1 and new2), but now it does the "if two of them match, just
select the result directly" trivial merges.

I really cannot see any sane case where user policy might dictate doing
anything else, but if somebody can come up with an argument for a merge
algorithm that wouldn't do what that trivial merge does, we can make a
flag for "don't merge at all".

The reason I do want to merge at all in read-tree is that I want to
avoid having to write out a huge index-file (it's 1.6MB on the kernel, so
if you don't do _any_ trivial merges, it would be 4.8MB after reading
three trees) and then having people read it and parse it just to do stuff
that is obvious. Touching 5MB of data isn't cheap, even if you don't do a 
whole lot to it.

Anyway, with the modified read-tree, as far as I can tell it will now 
merge all the cases where one side has done something to a file, and the 
other side has left it alone (or where both sides have done the exact same 
modification). That should _really_ cut down the cases to just a few files 
for most of the kernel merges I can think of. 

Does it do the right thing for your tests?

Linus


Re: full kernel history, in patchset format

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Ingo Molnar wrote:
 
 i've converted the Linux kernel CVS tree into 'flat patchset' format, 
 which gave a series of 28237 separate patches. (Each patch represents a 
 changeset, in the order they were applied. I've used the cvsps utility.)
 
 the history data starts at 2.4.0 and ends at 2.6.12-rc2. I've included a 
 script that will apply all the patches in order and will create a 
 pristine 2.6.12-rc2 tree.

Hey, that's great. I got the CVS repo too, and I was looking at it, but 
the more I looked at it, the more I felt that the main reason I want to 
import it into git ends up being to validate that my size estimates are at 
all realistic.

I see that Thomas Gleixner seems to have done that already, and come to a 
figure of 3.2GB for the last three years, which I'm very happy with, 
mainly because it seems to match my estimates to a tee. Which means that I 
just feel that much more confident about git actually being able to handle 
the kernel long-term, and not just as a stop-gap measure.

But I wonder if we actually want to actually populate the whole history.. 
Now that my size estimates have been verified, I have little actual real 
reason to put the history into git. There are no visualization tools done 
for git yet, and no helpers to actually find problems, and by the time 
there will be, we'll have new history.

So I'd _almost_ suggest just starting from a clean slate after all.  
Keeping the old history around, of course, but not necessarily putting it
into git now. It would just force everybody who is getting used to git in 
the first place to work with a 3GB archive from day one, rather than 
getting into it a bit more gradually.

What do people think? I'm not so much worried about the data itself: the
git architecture is _so_ damn simple that now that the size estimate has
been confirmed, that I don't think it would be a problem per se to put
3.2GB into the archive. But it will bog down rsync horribly, so it will
actually hurt synchronization until somebody writes the rev-tree-like
stuff to communicate changes more efficiently..

IOW, it smells to me like we don't have the infrastructure to really work 
with 3GB archives, and that if we start from scratch (2.6.12-rc2), we can 
build up the infrastructure in parallel with starting to really need it.

But it's _great_ to have the history in this format, especially since 
looking at CVS just reminded me how much I hated it.

Comments?

Linus


Re: full kernel history, in patchset format

2005-04-16 Thread Thomas Gleixner
On Sat, 2005-04-16 at 10:04 -0700, Linus Torvalds wrote:

 So I'd _almost_ suggest just starting from a clean slate after all.  
 Keeping the old history around, of course, but not necessarily putting it
 into git now. It would just force everybody who is getting used to git in 
 the first place to work with a 3GB archive from day one, rather than 
 getting into it a bit more gradually.

Sure. We can export the 2.6.12-rc2 version of the git'ed history tree
and start from there. Then the first changeset has a parent, which just
lives in a different place. 
That's the only difference to your repository, but it would change the
sha1 sums of all your changesets.

 What do people think? I'm not so much worried about the data itself: the
 git architecture is _so_ damn simple that now that the size estimate has
 been confirmed, that I don't think it would be a problem per se to put
 3.2GB into the archive. But it will bog down rsync horribly, so it will
  actually hurt synchronization until somebody writes the rev-tree-like
 stuff to communicate changes more efficiently..

We have all the tracking information in SQL and we will post the data
base dump soon, so people interested in revision tracking can use this
as an information base.

 But it's _great_ to have the history in this format, especially since 
 looking at CVS just reminded me how much I hated it.

:)

One remark on the tree blob storage format. 
The binary storage of the sha1sum of the referred object is a PITA for
scripting. 
Converting the ASCII -> binary for the sha1sum comparison should not
take much longer than the binary -> ASCII conversion for the file
reference. Can this be changed ?

tglx




Re: Re: full kernel history, in patchset format

2005-04-16 Thread Petr Baudis
Dear diary, on Sat, Apr 16, 2005 at 09:23:40PM CEST, I got a letter
where Thomas Gleixner [EMAIL PROTECTED] told me that...
 One remark on the tree blob storage format. 
 The binary storage of the sha1sum of the referred object is a PITA for
 scripting. 
 Converting the ASCII -> binary for the sha1sum comparison should not
 take much longer than the binary -> ASCII conversion for the file
 reference. Can this be changed ?

Huh, you aren't supposed to peek into trees directly. What's wrong with
ls-tree?

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: full kernel history, in patchset format

2005-04-16 Thread Mike Taht

 * A script git-archive-tar is used to create a base tarball
   that roughly corresponds to linux-*.tar.gz.  This works as
   follows:
$ git-archive-tar C [B1 B2...]
   This reads the named commit C, grabs the associated tree
   (i.e.  its sub-tree objects and the blob they refer to), and
   makes a tarball of ??/??
   files.  The tarball does not have to contain any extra
   information to reproduce any ancestor of the named commit.
alternatively, git-archive-torrent to create a list of files for a 
bittorrent feed

--
Mike Taht


Re: Re: Re: full kernel history, in patchset format

2005-04-16 Thread Petr Baudis
Dear diary, on Sat, Apr 16, 2005 at 08:32:32PM CEST, I got a letter
where Petr Baudis [EMAIL PROTECTED] told me that...
 Dear diary, on Sat, Apr 16, 2005 at 09:23:40PM CEST, I got a letter
 where Thomas Gleixner [EMAIL PROTECTED] told me that...
  One remark on the tree blob storage format. 
  The binary storage of the sha1sum of the referred object is a PITA for
  scripting. 
  Converting the ASCII -> binary for the sha1sum comparison should not
  take much longer than the binary -> ASCII conversion for the file
  reference. Can this be changed ?
 
 Huh, you aren't supposed to peek into trees directly. What's wrong with
 ls-tree?

(I meant, you aren't supposed to peek into trees from scripts. Or well,
not "not supposed", but it does not make much sense when you have
ls-tree.)

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: Re: full kernel history, in patchset format

2005-04-16 Thread Thomas Gleixner
On Sat, 2005-04-16 at 20:32 +0200, Petr Baudis wrote:
 Dear diary, on Sat, Apr 16, 2005 at 09:23:40PM CEST, I got a letter
 where Thomas Gleixner [EMAIL PROTECTED] told me that...
  One remark on the tree blob storage format. 
  The binary storage of the sha1sum of the referred object is a PITA for
  scripting. 
  Converting the ASCII -> binary for the sha1sum comparison should not
  take much longer than the binary -> ASCII conversion for the file
  reference. Can this be changed ?
 
 Huh, you aren't supposed to peek into trees directly. What's wrong with
 ls-tree?

Why I'm not supposed ? Is this evil ?

My export script has all the data available, so I write the tree refs
directly. The full export runs ~1 hour. That's long enough :) I tried the
git way and it slows me down by a factor of BIG (I don't remember the
number)

Also for reference tracking all the information might be available e.g.
by a database. Why should the revtool then use some tool to retrieve
information which is already there ?

tglx




Re: full kernel history, in patchset format

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Thomas Gleixner wrote:
 
 One remark on the tree blob storage format. 
 The binary storage of the sha1sum of the referred object is a PITA for
 scripting. 
 Converting the ASCII -> binary for the sha1sum comparison should not
 take much longer than the binary -> ASCII conversion for the file
 reference. Can this be changed ?

I'd really rather not. Why don't you just use ls-tree for scripting? 
That's why it exists in the first place. 

It might make sense to have some simple selection capabilities built into 
ls-tree (ie ls-tree --match drivers/char/ -z treesha1 to get just a 
subtree out), but that depends entirely on how you end up using it.

The fact is, there should _never_ be any reason to look at the objects
themselves directly. cat-file is a debugging aid, it shouldn't be
scripted (with the possible exception of cat-file blob  to just
extract the blob contents, since that object doesn't have any internal
structure).

That level of abstraction (we never look directly at the objects) is 
what allows us to change the object structure later. For example, we 
already changed the commit date thing once, and the tree object has 
obviously evolved a bit, and if we ever change the hash, the objects will 
change too, but if you always just script them using nice helper tools, 
you won't ever need to _care_. And that's how it should be.

If there's a tool missing, holler. THAT is the part I've been trying to
write: all the plumbing so that you _can_ script the thing sanely, and not
worry about how objects are created and worked with. 

For example, that index file format likely _will_ change. I ended up
doing the new stage flags in a way that kept the index file compatible
with old ones, but I did that mainly because it also happened to be the
easiest way to enforce the rule I wanted to enforce (ie the stage really
_is_ a part of the filename from a "compare filenames" standpoint, in
order to make sure that the stages are always ordered).

So if the index file change hadn't had that property, I'd have just said
"I'll change the format", and anybody who tried to parse the index file
would have been _broken_.

Linus


Re: full kernel history, in patchset format

2005-04-16 Thread Junio C Hamano
 JCH == Junio C Hamano [EMAIL PROTECTED] writes:

JCH I have been cooking this idea before I dove into the merge stuff
JCH and did not have time to implement it myself (Hint Hint), but I
JCH think something along the following lines would work nicely:

It should be fairly obvious from the context what I meant to
say, but in case somebody gets confused by my inaccurate
description of small details (or, before somebody nitpicks ;-),
I'd add some clarifications and corrections.

JCH  * Run diff-tree between neighboring commits [*1*] to find out
JCHthe set of blobs that are related.  Extract those related
JCHblobs and run diff [*2*] between them to see if it produces
JCHa patch smaller than the whole thing when compressed.  If
JCHdiff+patch is a win, then we do not have to transmit the blob
JCHthat we could reproduce by sending the diff.  Note that fact.

I talked only about blobs here, but I really mean all types:
commits, trees and blobs here.  Nothing prevents us from
extracting the raw data for trees and commits and run diff
between them.  We can use cat-file to do that today.

What we do not have is the reverse of "$ cat-file type >rawdata"
(i.e. "$ write-file type <rawdata"), but that is trivial to
write.  The raw data for related tree objects should delta well.
I do not think it is worth the effort to attempt delta for
commit objects.  Anything that git-archive-tar decides not to
send in diff+patch form, be it blob or tree or commit, should be
noted here, not just blob as my previous message incorrectly
implies.
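
As a concrete illustration of the "send the diff only when it wins"
test (a sketch: "old" and "new" are two related raw objects already
extracted with cat-file, and gzip merely stands in for whatever
compression the archive format ends up using):

    $ diff_size=$(diff old new | gzip -c | wc -c)
    $ full_size=$(gzip -c < new | wc -c)
    $ [ "$diff_size" -lt "$full_size" ] && echo "ship the diff" || echo "ship the whole object"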

JCH Given the above, the operation of git-archive-patch is also
JCH quite obvious.  Extract the diff package tarball into the
JCH objects/ directory that has (at least) the full Bn, uncompress
JCH the patch file part, and run patch on it. 

Of course after you ran patch to reproduce the raw data for the
blob or tree, we need the reverse of cat-file to register such
data under object/ hierarchy.



Re: full kernel history, in patchset format

2005-04-16 Thread Thomas Gleixner
On Sat, 2005-04-16 at 11:44 -0700, Linus Torvalds wrote:

 That level of abstraction (we never look directly at the objects) is 
 what allows us to change the object structure later. For example, we 
 already changed the commit date thing once, and the tree object has 
 obviously evolved a bit, and if we ever change the hash, the objects will 
 change too, but if you always just script them using nice helper tools, 
 you won't ever need to _care_. And that's how it should be.

For the export stuff it's terribly slow. :(

I agree that using common tools is good. But we are also talking about an open
format, so using a script to speed up certain tasks is not bad at all.

tglx





Re: Re: full kernel history, in patchset format

2005-04-16 Thread Christopher Li
On Sat, Apr 16, 2005 at 07:43:27PM +0200, Petr Baudis wrote:
 Dear diary, on Sat, Apr 16, 2005 at 07:04:31PM CEST, I got a letter
 where Linus Torvalds [EMAIL PROTECTED] told me that...
  So I'd _almost_ suggest just starting from a clean slate after all.  
  Keeping the old history around, of course, but not necessarily putting it
  into git now. It would just force everybody who is getting used to git in 
  the first place to work with a 3GB archive from day one, rather than 
  getting into it a bit more gradually.
  
  Comments?
 
 FWIW, it looks pretty reasonable to me. Perhaps we should have a
 separate GIT repository with the previous history though, and in the
 first new commit the parent could point to the last commit from the
 other repository.
 
 Just if it isn't too much work, though. :-)

I think we can make git use stackable repositories. When it fails
to find an object, it will try to read it from a parent repository.
That is useful for slicing the history.

I can have a local repository so that all the new objects created by me are
stored in my tree instead of the official one. Cleaning up the objects in my
local tree will be much easier since it only needs to work on a much smaller
repository. Once all my changes are merged into the official tree, I simply
empty my local repository.
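
A hedged sketch of that lookup (mine, not an existing tool): check the local
repository first and fall back to one or more parent repositories when the
object is missing.  It assumes the objects/XX/<remaining hex digits> layout
described elsewhere in this archive; the parent-repository arguments are
purely hypothetical.

    #!/bin/sh
    # find_object <sha1> <local-repo> [<parent-repo>...]
    find_object () {
        obj=$1; shift
        sub=$(echo "$obj" | cut -c1-2)
        rest=$(echo "$obj" | cut -c3-)
        for repo in "$@"; do               # local repository first, then parents
            f="$repo/objects/$sub/$rest"
            [ -f "$f" ] && { echo "$f"; return 0; }
        done
        return 1
    }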

About the kernel git repository: I think it is much easier to just put
everything in one tree.  Then I don't need to worry about "if I need to see
pre-2.6.12 history, I need to do this".  And the full repository needs to
be stored on a server somewhere anyway.

However, I totally agree that people should not have to deal with unnecessary
history when they start using the git tools. We should just make the tools
not download all the history by default, and only get it when the user
specifically asks for it.

Why 2.6.12-rc2? When the kernel grows to 2.6.15, a new user might not even need
pre-2.6.13 history most of the time. If we make it very easy for people to get
history when they need it, they will be less motivated to store unnecessary
history locally ("just in case I need it").

I think we should not advise using rsync to sync the whole git tree as
the way to get updates. We need to get used to having only a slice of the
history and fetching more as we need it.
The server should provide a small metadata file, like the rev-tool cache,
so the SCM tools can download it and figure out which files are needed to
get to a certain revision, instead of downloading the whole repository to
figure out what is new.
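
A sketch of that idea with made-up details: the server publishes a plain list
of the object names reachable from a revision, and the client fetches only the
ones it does not already have.  The objects-<rev>.list file, its URL and the
server name are all hypothetical.

    #!/bin/sh
    rev=$1
    server=http://example.org/linux-2.6
    wget -q -O /tmp/objs.$$ "$server/meta/objects-$rev.list" || exit 1
    while read obj; do
        sub=$(echo "$obj" | cut -c1-2); rest=$(echo "$obj" | cut -c3-)
        f=".git/objects/$sub/$rest"
        [ -f "$f" ] && continue                       # already have it
        mkdir -p ".git/objects/$sub"
        wget -q -O "$f" "$server/objects/$sub/$rest"  # fetch only what is missing
    done < /tmp/objs.$$
    rm -f /tmp/objs.$$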

We can even slice that metadata information into smaller pieces based on major 
release points.

Chris
 


Re: full kernel history, in patchset format

2005-04-16 Thread Jan-Benedict Glaw
On Sat, 2005-04-16 10:04:31 -0700, Linus Torvalds [EMAIL PROTECTED]
wrote in message [EMAIL PROTECTED]:

 What do people think? I'm not so much worried about the data itself: the
 git architecture is _so_ damn simple that now that the size estimate has
 been confirmed, that I don't think it would be a problem per se to put
 3.2GB into the archive. But it will bog down rsync horribly, so it will
 actually hurt synchronization until somebody writes the rev-tree-like
 stuff to communicate changes more efficiently..
 
 IOW, it smells to me like we don't have the infrastructure to really work 
 with 3GB archives, and that if we start from scratch (2.6.12-rc2), we can 
 build up the infrastructure in parallel with starting to really need it.

3GB is quite some data, but I'd accept and prefer to download it from
somewhere. I think that it's worth it.

I accept that there are people out there who would love to get a
smaller archive, but at least most developers that would actually use it
for day-to-day work *do* have the bandwidth to download it. Maybe we'd
also prepare (from time to time) bzip'ed tarballs, which I expect to be
a tad smaller.

MfG, JBG

-- 
Jan-Benedict Glaw   [EMAIL PROTECTED]. +49-172-7608481 _ O _
Eine Freie Meinung in  einem Freien Kopf| Gegen Zensur | Gegen Krieg  _ _ O
 fuer einen Freien Staat voll Freier Bürger | im Internet! |   im Irak!   O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));




Re: full kernel history, in patchset format

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Thomas Gleixner wrote:
 
 For the export stuff its terrible slow. :(

I don't really see your point.

If you already know what the tree is (like you say), you don't care about
the tree object. And if you don't know what the tree is, what _are_ you
doing?

In other words, show us what you're complaining about. If you're looking
into the trees yourself, then the binary representation of the sha1 is
already what you want. That _is_ the hash. So why do you want it in ASCII?  
And if you're not looking into the tree directly, but using "cat-file
tree" and you were hoping to see ASCII data, then that's certainly not
going to be any faster than just doing ls-tree instead.

In other words, I don't see your point. Either you want ascii output for 
scripting, or you don't. First you claimed that you did, and that you 
would want the tree object to change in order to do so. Now you claim that 
you can't use ls-tree because it's too slow. 

That just isn't making any sense. You're mixing two totally different
levels, and complaining about performance when scripting things. Yet
you're talking about a 20-byte data structure that is trivial to convert
to any format you want.

What kind of _strange_ scripting architecture is so fast that there's a
difference between cat-file and ls-tree and can handle 17,000 files in
60,000 revisions, yet so slow that you can't trivially convert 20 bytes of 
data?
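
For what it is worth (my illustration, not Linus's), the conversion really is
trivial from a script too; given 20 raw bytes, the hex form is a one-liner:

    # some-raw-sha1 is a placeholder file holding the 20 binary bytes
    head -c 20 some-raw-sha1 | od -An -vtx1 | tr -d ' \n'; echo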

Linus


Re: full kernel history, in patchset format

2005-04-16 Thread Ingo Molnar

* David Mansfield [EMAIL PROTECTED] wrote:

 Ingo Molnar wrote:
 * Ingo Molnar [EMAIL PROTECTED] wrote:
 
 
 the patches contain all the existing metadata, dates, log messages and 
 revision history. (What i think is missing is the BK tree merge 
 information, but i'm not sure we want/need to convert them to GIT.)
 
 
 author names are abbreviated, e.g. 'viro' instead of 
 [EMAIL PROTECTED], and no committer information is 
 included (albeit commiter ought to be Linus in most cases). These are 
 limitations of the BK-CVS gateway i think.
 
 
 Glad to hear cvsps made it through!  I'm curious what the manual 
 fixups required were, except for the binary file issue (logo.gif).

--cvs-direct was needed to speed it up from 'several days to finish' to 
'several hours to finish', but it crashed on a handful of patches [i 
used the latest devel snapshot so this isn't a complaint]. (one of the 
crashes was when generating 1860.patch.) Also, 'cvs rdiff' apparently 
emits an empty patch for diffs that remove a file that ends without 
having a newline character - but this isn't cvsps's problem.  (grep for 
+++ in the patchset to find those cases.)

 As to the actual email addresses, for more recent patches, the 
 Signed-off should help.  For earlier ones, isn't their some script 
 which 'knows' a bunch of canonical author-email mappings? (the 
 shortlog script or something)?

yeah, that's not that much of a problem, most of the names are unique, 
and the rest can be fixed up too.

 Is the full committer email address actually in the changeset in BK?  
 If so, given that we have the unique id (immutable I believe) of the 
 changset, could it be extracted directly from BK?

i think it's included in BK.

Ingo


Re: full kernel history, in patchset format

2005-04-16 Thread Ingo Molnar

* Linus Torvalds [EMAIL PROTECTED] wrote:

  the history data starts at 2.4.0 and ends at 2.6.12-rc2. I've included a 
  script that will apply all the patches in order and will create a 
  pristine 2.6.12-rc2 tree.
 
 Hey, that's great. I got the CVS repo too, and I was looking at it, 
 but the more I looked at it, the more I felt that the main reason I 
 want to import it into git ends up being to validate that my size 
 estimates are at all realistic.
 
 I see that Thomas Gleixner seems to have done that already, and come 
 to a figure of 3.2GB for the last three years, which I'm very happy 
 with, mainly because it seems to match my estimates to a tee. [...]

(yeah, we apparently worked in parallel - i only learned about his 
efforts after i sent my mail. He was using BK to extract info, i was 
using the CVS tree alone and no BK code whatsoever. (I don't think there 
will be any argument about who owns what, but i wanted to be on the safe 
side, and i also wanted to see how complete and usable the CVS metadata 
is - it's close to perfect i'd say, for the purposes i care about.))

 But I wonder if we actually want to actually populate the whole 
 history..

yeah, it definitely feels a bit brave to import 28,000 changesets into a 
source-code database project that will be a whopping 2 weeks old in 2 
days ;) Even if we felt 100% confident about all the basics (which we do 
of course ;), it's just simply too young to tie things down via a 3.2GB 
database. It feels much more natural to grow it gradually, 28,000 
changesets i'm afraid would just suffocate the 'project growth 
dynamics'. Not going too fast is just as important as not going too 
slow.

I didn't generate the patchset to get it added into some central 
repository right now, i generated it to check that we _do_ have all the 
revision history in an easy to understand format which does generate 
today's kernel tree, so that we can lean back and worry about the full 
database once things get a bit more settled down (in a couple of months 
or so). It's also an easy testbed for GIT itself.

but the revision history was one of the main reasons i used BK myself, 
so we'll need a merged database eventually. Occasionally i needed to 
check who was the one who touched a particular piece of code - was that 
fantastic new line of code written by me, or was that buggy piece of 
crap written by someone else? ;) Also, looking at a change and then 
going to the changeset that did it, and then looking at the full picture 
was pretty useful too. So that sort of annotation, and generally 
navigating around _quickly_ and looking at the 'flow' of changes going 
into a particular file was really useful (for me).

Ingo


Re: [PATCH] Get commits from remote repositories by HTTP

2005-04-16 Thread Martin Mares
Hello!

 This adds a program to download a commit, the trees, and the blobs in them
 from a remote repository using HTTP. It skips anything you already have.

Is it really necessary to write your own HTTP downloader? If so, is it
necessary to forget basic stuff like the Host: header? ;-)

If you feel that it should be optimized for speed, then at least use
persistent connections.

 +	if (memcmp(target, "http://", 7))
 +		return -1;

Can crash if the string is too short.

 +	entry = gethostbyname(name);
 +	memcpy(&sockad.sin_addr.s_addr,
 +	       ((struct in_addr *)entry->h_addr)->s_addr, 4);

Can crash if the host doesn't exist or if you feed it with an URL containing
port number.

 +static int get_connection()

(void)

 + local = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);

What if it fails?

Have a nice fortnight
-- 
Martin `MJ' Mares   [EMAIL PROTECTED]   http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
A student who changes the course of history is probably taking an exam.


Re: [PATCH] Get commits from remote repositories by HTTP

2005-04-16 Thread Daniel Barkalow
On Sat, 16 Apr 2005, Tony Luck wrote:

 On 4/16/05, Daniel Barkalow [EMAIL PROTECTED] wrote:
   + buffer = read_sha1_file(sha1, type, &size);
 
 You never free this buffer.

Ideally, this should all be rearranged to share the code with
read-tree, and it should be fixed in common.

 It would also be nice if you saved tree objects in some temporary file
 and did not install them until after you had fetched all the blobs and
 trees that this tree references.  Then if your connection is interrupted
 you can just restart it.

It looks over everything relevant, even if it doesn't need to download
anything, so it should work to continue if it stops in between.

-Daniel
*This .sig left intentionally blank*



Re: [PATCH] Get commits from remote repositories by HTTP

2005-04-16 Thread Adam Kropelin
Tony Luck wrote:
Otherwise this looks really nice.  I was going to script something
 similar using wget ... but that would have made zillions of separate
connections.  Not so kind to the server.
How about building a file list and doing a batch download via 'wget -i 
/tmp/foo'? A quick test (on my ancient wget-1.7) indicates that it reuses 
connections when successive URLs point to the same server.

Writing yet another http client does seem a bit pointless, what with wget 
and curl available. The real win lies in creating the smarts to get the 
minimum number of files.
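
A sketch of the batch approach being discussed (the URLs are invented): build
one list, run wget once, and let it reuse the connection when successive URLs
hit the same server.

    printf '%s\n' \
        http://example.org/repo/objects/c2/b1cbae8bbd1f2d45d55cf60b59a0f7a43d5412 \
        http://example.org/repo/objects/4d/90f21e1c4fbf7f9ad8bab7c0930de723adeaed \
        > /tmp/foo
    wget -i /tmp/foo
    # the downloaded files still have to be moved into .git/objects/XX/ afterwards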

--Adam


Re: [PATCH] Get commits from remote repositories by HTTP

2005-04-16 Thread Daniel Barkalow
On Sun, 17 Apr 2005, Martin Mares wrote:

 Hello!
 
  This adds a program to download a commit, the trees, and the blobs in them
  from a remote repository using HTTP. It skips anything you already have.
 
 Is it really necessary to write your own HTTP downloader? If so, is it
 necessary to forget basic stuff like the Host: header? ;-)

I wanted to get something hacked quickly; can you suggest a good one to
use?

 If you feel that it should be optimized for speed, then at least use
 persistent connections.

That's the next step.

-Daniel
*This .sig left intentionally blank*



Re: [PATCH] Get commits from remote repositories by HTTP

2005-04-16 Thread Daniel Barkalow
On Sat, 16 Apr 2005, Adam Kropelin wrote:

 Tony Luck wrote:
  Otherwise this looks really nice.  I was going to script something
  similar using wget ... but that would have made zillions of seperate
  connections.  Not so kind to the server.
 
 How about building a file list and doing a batch download via 'wget -i 
 /tmp/foo'? A quick test (on my ancient wget-1.7) indicates that it reuses 
 connectionss when successive URLs point to the same server.

You need to look at some of the files before you know what other files to
get. You could do it in waves, but that would be excessively complicated
to code and not the most efficient anyway.

-Daniel
*This .sig left intentionally blank*



Re: SHA1 hash safety

2005-04-16 Thread David Lang
that's the difference between CS researchers and sysadmins.
sysadmins realize that there are an infinite number of files that map to 
the same hash value and plan accordingly (because we KNOW we will run 
across them eventually), and don't see it as a big deal when we finally 
do.

CS researchers quote statistics that show how hard it is to intentionally 
create two files with the same hash and insist it just doesn't happen 
until presented with the proof, at which point it is a big deal.

a difference in viewpoints.
David Lang
 On Sat, 16 Apr 2005, C. Scott Ananian wrote:
Date: Sat, 16 Apr 2005 10:58:15 -0400 (EDT)
From: C. Scott Ananian [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Cc: David Lang [EMAIL PROTECTED], Ingo Molnar [EMAIL PROTECTED],
git@vger.kernel.org
Subject: Re: SHA1 hash safety
On Sat, 16 Apr 2005, Brian O'Mahoney wrote:
(1) I _have_ seen real-life collisions with MD5, in the context of
   Document management systems containing ~10^6 ms-WORD documents.
Dude!  You could have been *famous*!  Why the aitch-ee-double-hockey-sticks 
didn't you publish this when you found it?
Seriously, man.

Even given the known weaknesses in MD5, it would take much more than a 
million documents to find MD5 collisions.  I can only conclude that the hash 
was being used incorrectly; most likely truncated (my wild-ass guess would be 
to 32 bits; a collision is likely with > 50% probability in a million 
document store for a hash of less than 40 bits).
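
For reference, the standard birthday approximation (my addition, not from the
mail) for the chance of at least one collision among $n$ random documents
under a $b$-bit hash is

    p \approx 1 - \exp\!\left( -\frac{n(n-1)}{2^{\,b+1}} \right)

With $n = 10^6$ this gives roughly 0.37 at $b = 40$, is essentially 1 at
$b = 32$, and is about $10^{-27}$ at $b = 128$ (full MD5) -- consistent with
the "less than 40 bits" estimate above.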

I know the current state of the art here.  It's going to take more than just 
hearsay to convince me that full 128-bit MD5 collisions are likely. I believe 
there are only two or so known to exist so far, and those were found by a 
research team in China (which, yes, is fairly famous among the cryptographic 
community now after publishing a paper consisting of little apart from the 
two collisions themselves).

Please, let's talk about hash collisions responsibly.  I posted earlier about 
the *actual computed probability* of finding two files with an SHA-1 
collision before the sun goes supernova.  It's 10^28 to 1 against.
The recent cryptographic work has shown that there are certain situations 
where a decent amount of computer work (2^69 operations) can produce two 
sequences with the same hash, but these sequences are not freely chosen; 
they've got very specific structure.  This attack does not apply to 
(effectively) random files sitting in a SCM.
 http://www.schneier.com/blog/archives/2005/02/sha1_broken.html

That said, Linux's widespread use means that it may not be unimaginable for 
an attacker to devote this amount of resources to an attack, which would 
probably involve first committing some specially structured file to the SCM 
(but would Linus accept it?) and then silently corrupting said file via a 
SHA1 collision to toggle some bits (which would presumably Do Evil).  Thus 
hashes other than SHA1 really ought to be considered...

...but the cryptographic community has not yet come to a conclusion on what 
the replacement ought to be.  These attacks are so new that we don't really 
understand what it is about the structure of SHA1 which makes them possible, 
which makes it hard to determine which other hashes are similarly vulnerable. 
It will take time.

I believe Linus has already stated on this list that his plan is to 
eventually provide a tool for bulk migration of an existing SHA1 git 
repository to a new hash type.   Basically munging through the repository in 
bulk, replacing all the hashes.  This seems a perfectly adequate strategy at 
the moment.
--scott

WASHTUB Panama Minister Moscow explosives KUGOWN hack Marxist LPMEDLEY 
genetic immediate radar SCRANTON COBRA JANE KGB Shoal Bay atomic Bejing
( http://cscott.net/ )

--
There are two ways of constructing a software design. One way is to make it so 
simple that there are obviously no deficiencies. And the other way is to make 
it so complicated that there are no obvious deficiencies.
 -- C.A.R. Hoare


[PATCH] update-cache --refresh cache entry leak

2005-04-16 Thread Junio C Hamano
When update-cache --refresh replaces an existing cache entry
with a new one, it forgets to free the original.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---
update-cache.c:  61d2b93a751f35ba24f479cd4fc151188916f02a
--- update-cache.c
+++ update-cache.c  2005-04-16 15:49:03.0 -0700
@@ -203,6 +203,8 @@
 			printf("%s: needs update\n", ce->name);
continue;
}
+   if (new != ce)
+   free(ce);
active_cache[i] = new;
}
 }



Full history

2005-04-16 Thread Thomas Gleixner
Hi,

I can publish the stuff on monday from a university nearby.

---

total blob objects   = 228384
total tree objects   = 172507 
total commit objects =  55877

The empty changesets which just note merges are omitted at the
moment. Is it of interest to include them ??

It might also be interesting to export/merge the various
subsystem/maintainer trees including 2.4 into this archive. This would
cover the complete history 

Disk space according to # du -sh
blobs ~ 2GiB
tree and commit objects ~ 1.3GiB

I looked at the spread of the 450k+ objects over the 256 subdirectories
in my exported git repository:

total 456768
max per XX subdir = 1936
avg per XX subdir = 1784
min per XX subdir = 1646
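
One way (mine, not tglx's) to produce that per-subdirectory spread from an
objects/XX/... layout:

    cd objects || exit 1
    for d in ??; do ls "$d" | wc -l; done | sort -n |
        awk 'NR==1 {min=$1} {sum+=$1; max=$1}
             END {printf "total %d\nmax %d\navg %d\nmin %d\n", sum, max, sum/NR, min}'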

tglx




Re: Re: Add clone support to lntree

2005-04-16 Thread Daniel Barkalow
On Sun, 17 Apr 2005, Petr Baudis wrote:

 Dear diary, on Sat, Apr 16, 2005 at 05:06:54AM CEST, I got a letter
 where Daniel Barkalow [EMAIL PROTECTED] told me that...

  I think fork is as good as anything for describing the operation. I had
  thought about clone because it seemed to fill the role that bk
  clone had (although I never used BK, so I'm not sure). It doesn't seem
  useful to me to try cloning multiple remote repositories, since you'd get
  a copy of anything common from each; you just want to suck everything into
  the same .git/objects and split off working directories.
 
 Actually, what about if git pull outside of repository did what git
 clone does now? I'd kinda like clone instead of fork too.

This seems like the best solution to me, too. Although that would make
pull take a URL when making a new repository and not otherwise, which
might be confusing. init-remote perhaps, or maybe just have init do it
if given a URL?

-Daniel
*This .sig left intentionally blank*



Re: full kernel history, in patchset format

2005-04-16 Thread David Lang
On Sat, 16 Apr 2005, Thomas Gleixner wrote:
On Sat, 2005-04-16 at 10:04 -0700, Linus Torvalds wrote:
So I'd _almost_ suggest just starting from a clean slate after all.
Keeping the old history around, of course, but not necessarily putting it
into git now. It would just force everybody who is getting used to git in
the first place to work with a 3GB archive from day one, rather than
getting into it a bit more gradually.
Sure. We can export the 2.6.12-rc2 version of the git'ed history tree
and start from there. Then the first changeset has a parent, which just
lives in a different place.
Thats the only difference to your repository, but it would change the
sha1 sums of all your changesets.
at least start with a full release, say 2.6.11.
the history won't be blank, but it's far more likely that people will care 
about the details between 2.6.11 and 2.6.12 and will want to go back 
before -rc2

David Lang
--
There are two ways of constructing a software design. One way is to make it so 
simple that there are obviously no deficiencies. And the other way is to make 
it so complicated that there are no obvious deficiencies.
 -- C.A.R. Hoare


Re: SHA1 hash safety

2005-04-16 Thread Paul Jackson
 what I'm talking about is the chance that somewhere, sometime there will 
 be two different documents that end up with the same hash

I have a vastly greater chance of a file getting corrupted by a hardware or
software glitch than of a random message digest collision of two legitimate
documents.

I've lost quite a few files in 25 years of computing to just
such glitches, sometimes without knowing it until months or years
later.

We've already computed the chances of a random pure hash collision
with SHA1 - it's something like an average of 1 collision every
10 billion years if we have 10,000 coders generating 1 new file
version every minute, non-stop, 24 hours a day, 365 days a year.

Get real.  There are _many_ sources of random error in our
tools.  When some sources are billions of billions times
more likely to occur, it makes sense to worry about them first.

Reminds me of the drunk looking under the lamp post for the
house keys he dropped - because that's where the light is.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: SHA1 hash safety

2005-04-16 Thread Paul Jackson
 sysadmins realize that there are an infinante number of files that map to 

Sysadmins know that there are an infinite number of ways for their
systems to crap out, and try to cover for the ones that
have a snowball's chance in Hades of being seen in
their lifetime.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


[PATCH] Optionally tell show-diff to show only named files

2005-04-16 Thread Junio C Hamano
SCMs have ways to say "I want diff only this particular file",
or "I want diff files under this directory".  This patch teaches
show-diff to do something similar.  Without command line
arguments, it still examines everything in the dircache as
before.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 show-diff.c |   38 ++
 1 files changed, 30 insertions(+), 8 deletions(-)

show-diff.c:  5f3d4699566843a5448260e5da286ed65d90e397
--- show-diff.c
+++ show-diff.c 2005-04-16 16:07:07.0 -0700
@@ -55,6 +55,23 @@
}
 }
 
+static const char *show_diff_usage = "show-diff [-s] [-q] [paths...]";
+
+static int matches_pathspec(struct cache_entry *ce, char **spec, int cnt)
+{
+   int i;
+   int namelen = ce_namelen(ce);
+	for (i = 0; i < cnt; i++) {
+		int speclen = strlen(spec[i]);
+		if (! strncmp(spec[i], ce->name, speclen) &&
+		    speclen <= namelen &&
+		    (ce->name[speclen] == 0 ||
+		     ce->name[speclen] == '/'))
+   return 1;
+   }
+   return 0;
+}
+
 int main(int argc, char **argv)
 {
int silent = 0;
@@ -62,18 +79,19 @@
int entries = read_cache();
int i;
 
-	for (i = 1; i < argc; i++) {
-		if (!strcmp(argv[i], "-s")) {
+	while (1 < argc && argv[1][0] == '-') {
+		if (!strcmp(argv[1], "-s"))
 			silent_on_nonexisting_files = silent = 1;
-			continue;
-		}
-		if (!strcmp(argv[i], "-q")) {
+		else if (!strcmp(argv[1], "-q"))
 			silent_on_nonexisting_files = 1;
-			continue;
-		}
-		usage("show-diff [-s] [-q]");
+		else
+			usage(show_diff_usage);
+		argv++; argc--;
 	}
 
+	/* At this point, if argc == 1, then we are doing everything.
+	 * Otherwise argv[1] .. argv[argc-1] have the explicit paths.
+	 */
 	if (entries < 0) {
perror(read_cache);
exit(1);
@@ -86,6 +104,10 @@
char type[20];
void *new;
 
+		if (1 < argc &&
+		    ! matches_pathspec(ce, argv+1, argc-1))
+			continue;
+
 		if (stat(ce->name, &st) < 0) {
 			if (errno == ENOENT && silent_on_nonexisting_files)
continue;




Re: SHA1 hash safety

2005-04-16 Thread Martin Mares
Hi!

 We've already computed the chances of a random pure hash collision
 with SHA1 - it's something like an average of 1 collision every
 10 billion years if we have 10,000 coders generating 1 new file
 version every minute, non-stop, 24 hours a day, 365 days a year.

GIT is safe even for the millions of monkeys writing Shakespeare :-)

Have a nice fortnight
-- 
Martin `MJ' Mares   [EMAIL PROTECTED]   http://atrey.karlin.mff.cuni.cz/~mj/
Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth
Homo homini lupus, frater fratri lupior, bohemus bohemo lupissimus.


[PATCH] optimize gitdiff-do script

2005-04-16 Thread Paul Jackson
Rewrite gitdiff-do so that it works with arbitrary
whitespace (space, tab, newline, ...) in filenames.

Reduce number of subcommands execv'd by a
third, by only calling 'rm' once, at end, not each
loop.

Avoid using shell arrays; perhaps more portable.

Avoid 'echo -e' when displaying names; don't expand escape
  sequences in names.

Use shell noglob (-f) to minimize getdents() calls.

Simplify argument parsing and tmp file management.

Comment the nastier shell patterns.

This reduces the time by about 1/3 of what it was.

Signed-off-by: Paul Jackson [EMAIL PROTECTED]

Index: git-pasky-0.4/gitdiff-do
===
--- git-pasky-0.4.orig/gitdiff-do   2005-04-16 13:19:07.0 -0700
+++ git-pasky-0.4/gitdiff-do2005-04-16 15:33:28.0 -0700
@@ -2,19 +2,22 @@
 #
 # Make a diff between two GIT trees.
 # Copyright (c) Petr Baudis, 2005
+# Copyright (c) Paul Jackson, 2005
 #
 # Takes two parameters identifying the two trees/commits to compare.
 # Empty string will be substitued to HEAD revision.
 #
 # Note that this is probably the most performance critical shell script
-# in the whole GIT suite. That's also why I resorted to bash builtin
-# features and stuff. -- [EMAIL PROTECTED]
+# in the whole GIT suite.
 #
 # Outputs a diff converting the first tree to the second one.
 
+set -f   # keep shell from scanning . to expand wildcards
 
-id1=$1; shift
-id2=$1; shift
+t=${TMPDIR:-/usr/tmp}/gitdiff.$$
+trap 'set +f; rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15
+
+id1=$1; id2=$2; shift 2
 
 # Leaves the result in $label.
 mkbanner () {
@@ -32,58 +35,55 @@ mkbanner () {
	[ "$labelapp" ] && label="$label  ($labelapp)"
 }
 
-t=${TMPDIR:-/usr/tmp}/gitdiff.$$
-trap 'rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15
-diffdir=$t.1
-diffdir1=$diffdir/$id1
-diffdir2=$diffdir/$id2
-mkdir -p $diffdir1 $diffdir2
-
-while [ $1 ]; do
-   declare -a param
-   param=($1);
-   op=${param[0]:0:1}
-   mode=${param[0]:1}
-   type=${param[1]}
-   sha=${param[2]}
-   name=${param[3]}
-
-   echo -e Index: 
$name\n===
-
-   if [ $type = tree ]; then
-   # diff-tree will kindly diff the subdirs for us
-   # XXX: What about modes?
-   shift; continue
-   fi
-
-   loc1=$diffdir1/$name; dir1=${loc1%/*}
-   loc2=$diffdir2/$name; dir2=${loc2%/*}
-   ([ -d $dir1 ]  [ -d $dir2 ]) || mkdir -p $dir1 $dir2
-
-   case $op in
-   +)
-   mkbanner $loc2 $id2 $name $mode $sha
-   diff -L /dev/null  (tree:$id1) -L $label -u /dev/null 
$loc2
-   ;;
-   -)
-   mkbanner $loc1 $id1 $name $mode $sha
-   diff -L $label -L /dev/null  (tree:$id2) -u $loc1 
/dev/null
-   ;;
-   *)
-   modes=(${mode/-/ });
-   mode1=${modes[0]}; mode2=${modes[1]}
-   shas=(${sha/-/ });
-   sha1=${shas[0]}; sha2=${shas[1]}
-   mkbanner $loc1 $id1 $name $mode1 $sha1; label1=$label
-   mkbanner $loc2 $id2 $name $mode2 $sha2; label2=$label
-   diff -L $label1 -L $label2 -u $loc1 $loc2
-   ;;
-   *)
-   echo Unknown operator $op, ignoring delta: $1;;
-   esac
-
-   rm -f $loc1 $loc2
-   shift
+for arg
+do
+  IFS=''
+  set X$arg# X: don't let shell set see leading '+' in $arg
+  op=$1
+  mode=${op#X?}# trim leading X? 1st two chars
+  type=$2
+  sha=$3
+  # if 4+ tabs, trim 1st 3 fields on 1st line with sed
+  case $arg in
+  *\   *\  *\  *\  *)
+name=$(echo $arg |
+  /bin/sed '1s/[^  ]*  [^  ]*  [^  ]*  //')
+;;
+  *)
+name=$4
+;;
+  esac
+
+  echo Index: $name
+  echo ===
+
+  test $type = tree && continue
+
+  loc1=$t.1
+  loc2=$t.2
+
+  case $op in
+  X+*)
+mkbanner $loc2 $id2 $name $mode $sha
+diff -L /dev/null  (tree:$id1) -L $label -u /dev/null $loc2
+;;
+  X-*)
+mkbanner $loc1 $id1 $name $mode $sha
+diff -L $label -L /dev/null  (tree:$id2) -u $loc1 /dev/null
+;;
+  X\**)
+mode1=${mode%-*}  # trim '-' and after
+mode2=${mode#*-}  # trim up to and including '-'
+sha1=${sha%-*}# trim '-' and after
+sha2=${sha#*-}# trim up to and including '-'
+
+mkbanner $loc1 $id1 $name $mode1 $sha1; label1=$label
+mkbanner $loc2 $id2 $name $mode2 $sha2; label2=$label
+diff -L $label1 -L $label2 -u $loc1 $loc2
+;;
+  *)
+badop=$(echo $op | sed 's/.\(.\).*/\1/')
+echo Unknown operator $badop, ignoring delta: $1
+;;
+  esac
 done
-
-rm -rf $diffdir

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401

[PATCH] missing mkdir -p flag in gitdiff-do

2005-04-16 Thread Paul Jackson
First mkdir in gitdiff-do missing -p, so useless error

Signed-off-by: Paul Jackson [EMAIL PROTECTED]

Index: git-pasky-0.4/gitdiff-do
===
--- git-pasky-0.4.orig/gitdiff-do   2005-04-16 13:18:29.0 -0700
+++ git-pasky-0.4/gitdiff-do2005-04-16 13:19:07.0 -0700
@@ -37,7 +37,7 @@ trap 'rm -fr $t.?; trap 0; exit 0' 0 1 2
 diffdir=$t.1
 diffdir1=$diffdir/$id1
 diffdir2=$diffdir/$id2
-mkdir $diffdir1 $diffdir2
+mkdir -p $diffdir1 $diffdir2
 
 while [ $1 ]; do
declare -a param

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


[PATCH] show-diff -z option for machine readable output.

2005-04-16 Thread Junio C Hamano
This patch adds the -z option to the show-diff command,
primarily for use by scripts.  The information emitted is
similar to that of the -q option, but in a more machine-readable 
form.  Records are terminated with NUL instead of LF, so that
the scripts can deal with pathnames with embedded newlines.

To be applied on top of my previous patch:

[PATCH] Optionally tell show-diff to show only named files.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 show-diff.c |   28 +++-
 1 files changed, 19 insertions(+), 9 deletions(-)

show-diff.c: 0c5fb1a381a6c6689dca3f52d0c66bb591cadb39
--- show-diff.c
+++ show-diff.c 2005-04-16 16:23:40.0 -0700
@@ -55,7 +55,7 @@
}
 }
 
-static const char *show_diff_usage = "show-diff [-s] [-q] [paths...]";
+static const char *show_diff_usage = "show-diff [-s] [-q] [-z] [paths...]";
 
 static int matches_pathspec(struct cache_entry *ce, char **spec, int cnt)
 {
@@ -76,6 +76,7 @@
 {
int silent = 0;
int silent_on_nonexisting_files = 0;
+   int machine_readable = 0;
int entries = read_cache();
int i;
 
@@ -84,6 +85,9 @@
silent_on_nonexisting_files = silent = 1;
 		else if (!strcmp(argv[1], "-q"))
 			silent_on_nonexisting_files = 1;
+		else if (!strcmp(argv[1], "-z")) {
+   machine_readable = 1;
+   }
else
usage(show_diff_usage);
argv++; argc--;
@@ -99,7 +103,7 @@
for (i = 0; i  entries; i++) {
struct stat st;
struct cache_entry *ce = active_cache[i];
-   int n, changed;
+   int changed;
unsigned long size;
char type[20];
void *new;
@@ -111,18 +115,24 @@
 		if (stat(ce->name, &st) < 0) {
 			if (errno == ENOENT && silent_on_nonexisting_files)
 				continue;
-			printf("%s: %s\n", ce->name, strerror(errno));
-			if (errno == ENOENT)
-				show_diff_empty(ce);
+			if (machine_readable)
+				printf("X %s%c", ce->name, 0);
+			else {
+				printf("%s: %s\n", ce->name, strerror(errno));
+				if (errno == ENOENT)
+					show_diff_empty(ce);
+			}
continue;
}
 		changed = cache_match_stat(ce, &st);
 		if (!changed)
 			continue;
-		printf("%s:  ", ce->name);
-		for (n = 0; n < 20; n++)
-			printf("%02x", ce->sha1[n]);
-		printf("\n");
+		if (!machine_readable)
+			printf("%s: %s\n", ce->name, sha1_to_hex(ce->sha1));
+		else {
+			printf("%s %s%c", sha1_to_hex(ce->sha1), ce->name, 0);
+   continue;
+   }
fflush(stdout);
if (silent)
continue;



Re: Re: Re: Add clone support to lntree

2005-04-16 Thread Petr Baudis
Dear diary, on Sat, Apr 16, 2005 at 05:17:00AM CEST, I got a letter
where Daniel Barkalow [EMAIL PROTECTED] told me that...
 On Sat, 16 Apr 2005, Petr Baudis wrote:
 
  Dear diary, on Sat, Apr 16, 2005 at 04:47:55AM CEST, I got a letter
  where Petr Baudis [EMAIL PROTECTED] told me that...
 git branch --- creates a branch from a given commit
 (when passed empty commit, creates a branch
 from the current commit and sets the working
 tree to that branch)
   Note that there is a bug in current git update - it will allow you to
   bring several of your trees to follow the same branch, or even a remote
   branch. This is not even supposed to work, and will be fixed when I get
   some sleep. You will be able to do git pull even on local branches, and
   the proper solution for this will be just tracking the branch you want
   to follow.
  
  I must admit that I'm not entirely decided yet, so I'd love to hear your
  opinion.
  
  I'm wondering, whether each tree should be fixed to a certain branch.
  That is, you decide a name when you do git fork, and then the tree
  always follows that branch. (It always has to follow [be bound to]
  *some* branch, and each branch can be followed by only a single tree at
  a time.)
 
 I don't think I'm following the use of branches. Currently, what I do is
 have a git-pasky and a git-linus, and fork off a working directory from
 one of these for each thing I want to work on. I do some work, commit as I
 make progress, and then do a diff against the remote head to get a patch
 to send off. If I want to do a series of patches which depend on each
 other, I fork my next directory off of my previous one rather than off of
 a remote base. I haven't done much rebasing, so I haven't worked out how I
 would do that most effectively.

Yes. And that's exactly what the branches allow you to do. You just do

git fork myhttpclient ~/myhttpclientdir

then you do some hacking, and when you have something usable, you can
go back to your main working directory and do

git merge -b when_you_started myhttpclient

Since you consider the code perfect, you can now just rm -rf
~/myhttpclient.

Suddenly, you get a mail from mj pointing out some bugs, and it looks
like there are more to come. What to do?

git fork myhttpclient ~/myhttpclientdir

(Ok, this does not work, but that's a bug, will fix tomorrow.) This will
let you take off when you left in your work on the branch.

git update for seeking between commits is probably extremely important
for any kind of binary search when you are wondering when a bug first
appeared, or when you are exploring how a certain branch evolved
over time. Doing git fork for each successive iteration sounds horrible.
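
A hedged sketch of that manual binary search (command names taken from this
thread; pick_commit_between and test.sh are hypothetical stand-ins for "choose
a commit roughly halfway between the two" and "does the bug reproduce?"):

    #!/bin/sh
    good=$1 bad=$2
    while mid=$(pick_commit_between "$good" "$bad"); do   # fails when none are left
        git update "$mid"                                 # check out that revision
        if sh test.sh; then good=$mid; else bad=$mid; fi
    done
    echo "first bad commit: $bad"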


Now, what about git branch and git update for switching between
branches? I think this is the most controversial part; these are
basically just shortcuts for not having to do git fork, and I wouldn't
mind so much removing them, if you people really consider them too ugly
a wart for the soft clean git skin. I admit that they both come from a
hidden prejudice that git fork is going to be slow and eat a lot of
disk.

The idea for git branch is to mark a commit as this is a branch but I
don't want to git fork (because I'm lazy or short on disk space or
whatever). Let's say you are tracking a branch, do some local commits
and then want to untrack. This will get you back to HEAD.local, but you
want to keep a reference for your local commits, and possibly work on
them more later - so you mark them as a branch. But thinking about it, I
couldn't come up with another usage case than this, and I think that now
that we have git fork, I will modify git track behaviour heavily so that
tracking/untracking won't really switch you to the other branch
completely, but really only tell git pull that you want the pulled
updates applied. So git branch command will likely go.

The idea for git update for switching between branches is that
especially when you have two rather similar branches and mostly do stuff
on one of them, but sometimes you want to do something on the other one,
you can do just quick git update, do stuff, and git update back, without
any forking.


Note that this all is *absolutely* subject to change, provided you can
convince me about some better way. ;-) My mindset on this is pretty
open. This is just what seems to me a pretty flexible and elegant way to
do stuff, while giving you enough freedom to pick your own style.

 I think I can make this space efficient by hardlinking unmodified blobs to
 a directory of cached expanded blobs.

I don't know but I really feel *very* unsafe when doing that. What if
something screws up and corrupts my base... way too easy. And it gets
pretty inconvenient and even more dangerous when you get the idea to do
some modifications on your tree by something else than your favorite
editor (which you've already checked does the right thing).

Re: [PATCH] fix mktemp (remove mktemp ;)

2005-04-16 Thread Jan-Benedict Glaw
On Sat, 2005-04-16 16:27:43 -0700, Paul Jackson [EMAIL PROTECTED]
wrote in message [EMAIL PROTECTED]:
 Index: git-pasky-0.4/gitcommit.sh
 ===
 --- git-pasky-0.4.orig/gitcommit.sh   2005-04-12 10:39:14.0 -0700
 +++ git-pasky-0.4/gitcommit.sh2005-04-16 13:17:49.0 -0700
 @@ -60,7 +60,9 @@ for file in $commitfiles; do
   echo $file;
  done
  echo Enter commit message, terminated by ctrl-D on a separate line:
 -LOGMSG=`mktemp -t gitci.XX`
 +t=${TMPDIR:-/usr/tmp}/gitapply.$$

/usr/tmp/ ??? Hey, /usr may be mounted read-only!  Why not just use /tmp ?

MfG, JBG

-- 
Jan-Benedict Glaw   [EMAIL PROTECTED]. +49-172-7608481 _ O _
Eine Freie Meinung in  einem Freien Kopf| Gegen Zensur | Gegen Krieg  _ _ O
 fuer einen Freien Staat voll Freier Bürger | im Internet! |   im Irak!   O O O
ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));


Re: fix mktemp (remove mktemp ;)

2005-04-16 Thread Petr Baudis
Dear diary, on Sun, Apr 17, 2005 at 01:27:43AM CEST, I got a letter
where Paul Jackson [EMAIL PROTECTED] told me that...
 Remove mktemp usage - it doesn't work on
 some Mandrakes, nor on my SuSE 8.2 with
 mktemp-1.5-531.
 
 Replace with simple use of $$ (pid).
 I've been using this same pattern for
 20 years on many production scripts;
 it's fast, solid and simple.

And racy. And not guaranteed to come up with fresh new files.

 More robust tmp file removal, using trap,
 so that scripts interrupted by signals
 HUP, INT, QUIT or PIPE will cleanup.

But I like this!

I'm deferring those changes to the introduction of a git shell library,
which several people volunteered to do so far, but no one sent me any
patches for (the last one was probably Martin Mares, only a few hours ago
though).

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: Storing permissions

2005-04-16 Thread Junio C Hamano
 PJ == Paul Jackson [EMAIL PROTECTED] writes:

PJ That matches my experience - store 1 bit of mode state - executable or not.

Sounds like svn ;-).



Re: Re: Re: Add clone support to lntree

2005-04-16 Thread Petr Baudis
Dear diary, on Sun, Apr 17, 2005 at 01:07:35AM CEST, I got a letter
where Daniel Barkalow [EMAIL PROTECTED] told me that...
  Actually, what about if git pull outside of repository did what git
  clone does now? I'd kinda like clone instead of fork too.
 
 This seems like the best solution to me, too. Although that would make
 pull take a URL when making a new repository and not otherwise, which
 might be confusing. init-remote perhaps, or maybe just have init do it
 if given a URL?

Yes, init taking URL optionally sounds ideal. Thanks.

-- 
Petr Pasky Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor


Re: [PATCH] Get commits from remote repositories by HTTP

2005-04-16 Thread Adam Kropelin
Daniel Barkalow wrote:
On Sat, 16 Apr 2005, Adam Kropelin wrote:
How about building a file list and doing a batch download via 'wget
-i /tmp/foo'? A quick test (on my ancient wget-1.7) indicates that
it reuses connectionss when successive URLs point to the same server.
You need to look at some of the files before you know what other
files to get. You could do it in waves, but that would be excessively
complicated to code and not the most efficient anyway.
Ah, yes. Makes sense. How about libcurl or another http client library, 
then? Minimizing dependencies on external libraries is good, but writing a 
really robust http client is a tricky business. (Not that you aren't up to 
it; I just wonder if it's the best way to spend your time.)

--Adam


Re: fix mktemp (remove mktemp ;)

2005-04-16 Thread Paul Jackson
 And racy. And not guaranteed to come up with fresh new files.

In theory perhaps.  In practice no.

Even mktemp(1) can collide, in theory, since there is no practical way
in shell scripts to hold the file open and locked from the instant it
is determined to be a unique name.

The window of vulnerability for shell script tmp files is the lifetime
of the script - while the file sits there unlocked.  Anyone else with
permissions can mess with it.

More people will fail, and are already failing, using mktemp than I have
ever seen using $$ (I've never seen a documented case, and since such
files are not writable to other user accounts, such a collision would
typically not go hidden.)

Fast, simple portable solutions that work win over solutions with some
theoretical advantage that don't matter in practice, but also that are
less portable or less efficient.
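
For concreteness, the pattern being defended here, reduced to its core (the
same shape as the gitdiff-do and gitcommit.sh hunks earlier in this archive):

    t=${TMPDIR:-/tmp}/gitfoo.$$                     # per-process name, no mktemp
    trap 'rm -f $t.?; trap 0; exit 0' 0 1 2 3 15    # clean up on exit and signals
    echo "scratch data" > $t.1
    # ... use $t.1, $t.2, ...; the trap removes them even if we are interrupted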

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: Storing permissions

2005-04-16 Thread Paul Jackson
Junio wrote:
 Sounds like svn 

I have no idea what svn is.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: Re: Re: Add clone support to lntree

2005-04-16 Thread Daniel Barkalow
On Sun, 17 Apr 2005, Petr Baudis wrote:

 Dear diary, on Sat, Apr 16, 2005 at 05:17:00AM CEST, I got a letter
 where Daniel Barkalow [EMAIL PROTECTED] told me that...
  On Sat, 16 Apr 2005, Petr Baudis wrote:
  
   Dear diary, on Sat, Apr 16, 2005 at 04:47:55AM CEST, I got a letter
   where Petr Baudis [EMAIL PROTECTED] told me that...
git branch --- creates a branch from a given commit
(when passed empty commit, creates a branch
from the current commit and sets the working
tree to that branch)
Note that there is a bug in current git update - it will allow you to
bring several of your trees to follow the same branch, or even a remote
branch. This is not even supposed to work, and will be fixed when I get
some sleep. You will be able to do git pull even on local branches, and
the proper solution for this will be just tracking the branch you want
to follow.
   
   I must admit that I'm not entirely decided yet, so I'd love to hear your
   opinion.
   
   I'm wondering, whether each tree should be fixed to a certain branch.
   That is, you decide a name when you do git fork, and then the tree
   always follows that branch. (It always has to follow [be bound to]
   *some* branch, and each branch can be followed by only a single tree at
   a time.)
  
  I don't think I'm following the use of branches. Currently, what I do is
  have a git-pasky and a git-linus, and fork off a working directory from
  one of these for each thing I want to work on. I do some work, commit as I
  make progress, and then do a diff against the remote head to get a patch
  to send off. If I want to do a series of patches which depend on each
  other, I fork my next directory off of my previous one rather than off of
  a remote base. I haven't done much rebasing, so I haven't worked out how I
  would do that most effectively.
 
 Yes. And that's exactly what the branches allow you to do. You just do
 
   git fork myhttpclient ~/myhttpclientdir
 
 then you do some hacking, and when you have something usable, you can
 go back to your main working directory and do
 
   git merge -b when_you_started myhttpclient
 
 Since you consider the code perfect, you can now just rm -rf
 ~/myhttpclient.
 
 Suddenly, you get a mail from mj pointing out some bugs, and it looks
 like there are more to come. What to do?
 
   git fork myhttpclient ~/myhttpclientdir
 
 (Ok, this does not work, but that's a bug, will fix tomorrow.) This will
 let you take off when you left in your work on the branch.

Ah, I think that's what made me think I wasn't understanding branches; the
first thing I tried hit this bug.

 git update for seeking between commits is probably extremely important
 for any kind of binary search when you are wondering when did this bug
 appeared first, or when you are exploring how certain branch evolved
 over time. Doing git fork for each successive iteration sounds horrible.

Even if there isn't a performance hit, it's semantically wrong, because
you're looking at different versions that were in the same place at
different times.

 Now, what about git branch and git update for switching between
 branches? I think this is the most controversial part; these are
 basically just shortcuts for not having to do git fork, and I wouldn't
 mind so much removing them, if you people really consider them too ugly
 a wart for the soft clean git skin. I admit that they both come from a
 hidden prejudice that git fork is going to be slow and eat a lot of
 disk.

I think that this just confuses matters.

 The idea for git update for switching between branches is that
 especially when you have two rather similar branches and mostly do stuff
 on one of them, but sometimes you want to do something on the other one,
 you can do just quick git update, do stuff, and git update back, without
 any forking.

I still think that fork should be quick enough, or you could leave the
extra tree around. I'm not against having such a command, but I think it
should be a separate command rather than a different use of update, since
it would be used by people working in different ways.

  I think I can make this space efficient by hardlinking unmodified blobs to
  a directory of cached expanded blobs.
 
 I don't know but I really feel *very* unsafe when doing that. What if
 something screws up and corrupts my base... way too easy. And it gets
 pretty inconvenient and even more dangerous when you get the idea to do
 some modifications on your tree by something else than your favorite
 editor (which you've already checked does the right thing).

It should only be an option, not required and maybe not even
default. I think it should be possible to prevent stuff from screwing up,
since we really don't want anything to ever modify those inodes (as
opposed to some cases, where you want to modify inodes only in certain
ways). For that matter, relatively 

Re: optimize gitdiff-do script

2005-04-16 Thread Paul Jackson
Petr wrote:
 Please don't reindent the scripts. It violates the current coding style
 and the patch is unreviewable.

Sorry - I had not realized that there was a style in this case.

I am all in favor of such coding styles, and will gladly fit this one.

Do you want the patch resent, or a patch to restore indent on top of
this one?

 the patch is unreviewable.

The section that I indented the wrong way was such a total rewrite, that
you aren't going to be able to review it line by line compared to the
old anyway.  So in this case, it wasn't that I was modifying and
reindenting, rather that I was rewriting a page of code from scratch.

But that's a nit.  Honoring the coding style is necessary in any case.

 The idea behind that was that diffing could take a significant portion
 of disk space,

Here I don't understand, or don't agree, not sure which.

This won't eat more disk space, because the same tmp files are reused,
over and over.  Instead of unlinking them just before reopening them
truncating (O_WRONLY|O_CREAT|O_TRUNC), I just reopen them truncating.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


[PATCH] Use libcurl to use HTTP to get repositories

2005-04-16 Thread Daniel Barkalow
This enables the use of HTTP to download commits and associated objects
from remote repositories. It now uses libcurl instead of local hack code.

Still causes warnings for fsck-cache and rev-tree, due to unshared code.

Still leaks a bit of memory due to bug copied from read-tree.

Needs libcurl post 7.7 or so.

Signed-Off-By: Daniel Barkalow [EMAIL PROTECTED]

Index: Makefile
===
--- ed4f6e454b40650b904ab72048b2f93a068dccc3/Makefile  (mode:100644 
sha1:b39b4ea37586693dd707d1d0750a9b580350ec50)
+++ d332a8ddffb50c1247491181af458970bf639942/Makefile  (mode:100644 
sha1:ca5dfd41b750cb1339128e4431afbbbc21bf57bb)
@@ -14,7 +14,7 @@
 
 PROG=   update-cache show-diff init-db write-tree read-tree commit-tree \
cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
-   check-files ls-tree merge-tree
+   check-files ls-tree merge-tree http-get
 
 all: $(PROG)
 
@@ -23,6 +23,11 @@
 
 LIBS= -lssl -lz
 
+http-get: LIBS += -lcurl
+
+http-get:%:%.o read-cache.o
+   $(CC) $(CFLAGS) -o $@ $^ $(LIBS)
+
 init-db: init-db.o
 
 update-cache: update-cache.o read-cache.o
Index: http-get.c
===
--- /dev/null  (tree:ed4f6e454b40650b904ab72048b2f93a068dccc3)
+++ d332a8ddffb50c1247491181af458970bf639942/http-get.c  (mode:100644 
sha1:106ca31239e6afe6784e7c592234406f5c149e44)
@@ -0,0 +1,126 @@
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdlib.h>
+#include "cache.h"
+#include "revision.h"
+#include <errno.h>
+#include <stdio.h>
+
+#include <curl/curl.h>
+#include <curl/easy.h>
+
+static CURL *curl;
+
+static char *base;
+
+static int fetch(unsigned char *sha1)
+{
+   char *hex = sha1_to_hex(sha1);
+   char *filename = sha1_file_name(sha1);
+
+   char *url;
+   char *posn;
+   FILE *local;
+   struct stat st;
+
+   if (!stat(filename, &st)) {
+   return 0;
+   }
+
+   local = fopen(filename, "w");
+
+   if (!local) {
+   fprintf(stderr, "Couldn't open %s\n", filename);
+   return -1;
+   }
+
+   curl_easy_setopt(curl, CURLOPT_FILE, local);
+
+   url = malloc(strlen(base) + 50);
+   strcpy(url, base);
+   posn = url + strlen(base);
+   strcpy(posn, "objects/");
+   posn += 8;
+   memcpy(posn, hex, 2);
+   posn += 2;
+   *(posn++) = '/';
+   strcpy(posn, hex + 2);
+
+   curl_easy_setopt(curl, CURLOPT_URL, url);
+
+   curl_easy_perform(curl);
+
+   fclose(local);
+   
+   return 0;
+}
+
+static int process_tree(unsigned char *sha1)
+{
+   void *buffer;
+   unsigned long size;
+   char type[20];
+
+   buffer = read_sha1_file(sha1, type, &size);
+   if (!buffer)
+   return -1;
+   if (strcmp(type, "tree"))
+   return -1;
+   while (size) {
+   int len = strlen(buffer) + 1;
+   unsigned char *sha1 = buffer + len;
+   unsigned int mode;
+   int retval;
+
+   if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1)
+   return -1;
+
+   buffer = sha1 + 20;
+   size -= len + 20;
+
+   retval = fetch(sha1);
+   if (retval)
+   return -1;
+
+   if (S_ISDIR(mode)) {
+   retval = process_tree(sha1);
+   if (retval)
+   return -1;
+   }
+   }
+   return 0;
+}
+
+static int process_commit(unsigned char *sha1)
+{
+   struct revision *rev = lookup_rev(sha1);
+   if (parse_commit_object(rev))
+   return -1;
+   
+   fetch(rev->tree);
+   process_tree(rev->tree);
+   return 0;
+}
+
+int main(int argc, char **argv)
+{
+   char *commit_id = argv[1];
+   char *url = argv[2];
+
+   unsigned char sha1[20];
+
+   get_sha1_hex(commit_id, sha1);
+
+   curl_global_init(CURL_GLOBAL_ALL);
+
+   curl = curl_easy_init();
+
+   base = url;
+
+   fetch(sha1);
+   process_commit(sha1);
+
+   curl_global_cleanup();
+   return 0;
+}
Index: revision.h
===
--- ed4f6e454b40650b904ab72048b2f93a068dccc3/revision.h  (mode:100664 
sha1:28d0de3261a61f68e4e0948a25a416a515cd2e83)
+++ d332a8ddffb50c1247491181af458970bf639942/revision.h  (mode:100664 
sha1:523bde6e14e18bb0ecbded8f83ad4df93fc467ab)
@@ -24,6 +24,7 @@
unsigned int flags;
unsigned char sha1[20];
unsigned long date;
+   unsigned char tree[20];
struct parent *parent;
 };
 
@@ -111,4 +112,29 @@
}
 }
 
+static int parse_commit_object(struct revision *rev)
+{
+   if (!(rev->flags & SEEN)) {
+   void *buffer, *bufptr;
+   unsigned long size;
+   char type[20];
+   unsigned char parent[20];
+
+   

Re: [PATCH] Use libcurl to use HTTP to get repositories

2005-04-16 Thread Paul Jackson
 Needs libcurl post 7.7 or so.

That could be mentioned in the README, which has a list of 'Software
requirements.'  Actually, zlib-devel and openssl should be on this list
as well.  My laziness got in the way of my sending in a patch for that.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: fix mktemp (remove mktemp ;)

2005-04-16 Thread Dave Jones
On Sat, Apr 16, 2005 at 05:02:21PM -0700, Paul Jackson wrote:
   And racy. And not guaranteed to come up with fresh new files.
  
  In theory perhaps.  In practice no.
  
  Even mktemp(1) can collide, in theory, since there is no practical way
  in shell scripts to hold open and locked the file from the instant of it
  is determined to be a unique name.

Using the pid as a 'random' number is a bad idea. All an attacker
has to do is create 65535 symlinks in /usr/tmp, and he can now
overwrite any file you own.

mktemp is being used here to provide randomness in the filename,
not just uniqueness.
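
For C callers the randomness and the atomic creation come in a single
call, mkstemp(3); a rough sketch, with an illustrative template path:

#include <stdio.h>
#include <stdlib.h>

int main(void)
{
	/*
	 * mkstemp() fills in the XXXXXX with random characters and creates
	 * the file with O_CREAT|O_EXCL, so the name can be neither guessed
	 * nor raced the way a bare $$ suffix can.
	 */
	char name[] = "/usr/tmp/gitdiff.XXXXXX";
	int fd = mkstemp(name);

	if (fd < 0) {
		perror("mkstemp");
		return 1;
	}
	printf("using %s\n", name);
	return 0;
}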

  The window of vulnerability for shell script tmp files is the lifetime
  of the script - while the file sits there unlocked.  Anyone else with
  permissions can mess with it.

Attacker doesn't need to touch the script. Just take advantage of
flaws in it, and wait for someone to run it.

  More people will fail, and are already failing, using mktemp than I have
  ever seen using $$ (I've never seen a documented case, and since such
  files are not writable to other user accounts, such a collision would
  typically not go hidden.)
  
  Fast, simple portable solutions that work win over solutions with some
  theoretical advantage that don't matter in practice, but also that are
  less portable or less efficient.

I'd suggest fixing your distribution's mktemp over going with an
inferior solution.

Dave



Re: fix mktemp (remove mktemp ;)

2005-04-16 Thread Erik van Konijnenburg
On Sat, Apr 16, 2005 at 08:33:25PM -0400, Dave Jones wrote:
 On Sat, Apr 16, 2005 at 05:02:21PM -0700, Paul Jackson wrote:
And racy. And not guaranteed to come up with fresh new files.
   
   In theory perhaps.  In practice no.
   
   Even mktemp(1) can collide, in theory, since there is no practical way
   in shell scripts to hold open and locked the file from the instant of it
   is determined to be a unique name.
 
 Using the pid as a 'random' number is a bad idea. all an attacker
 has to do is create 65535 symlinks in /usr/tmp, and he can now
 overwrite any file you own.
 
 mktemp is being used here to provide randomness in the filename,
 not just a uniqueness.

How about using .git/tmp.$$ or similar as the tempfile?

This should satisfy both the portability and security requirements,
since the warnings against using $$ only apply to public directories.

Regards,
Erik


Re: [PATCH] show-diff shell safety

2005-04-16 Thread Paul Jackson
Junio wrote:
 The command line for running diff command is built without
 taking shell metacharacters into account. 

Ack - you're right.  One should avoid popen and system
in all but personal hacking code.  There are many ways,
beyond just embedded shell redirection, to cause problems
with these calls.

One should directly code execve(), execv(), or execl().

Search for "popen system IFS PATH"
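
A minimal sketch of the fork/exec pattern being recommended; the diff
arguments below are placeholders, not show-diff's actual command line:

#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run diff without a shell in between: the file names travel as plain
 * argv entries, so quotes, semicolons and other metacharacters in them
 * are harmless.
 */
static int run_diff(const char *old_name, const char *new_name)
{
	int status;
	pid_t pid = fork();

	if (pid < 0)
		return -1;
	if (!pid) {
		/* execlp() still searches PATH; use execl() with an
		 * absolute path if that is a concern. */
		execlp("diff", "diff", "-u", old_name, new_name, (char *)NULL);
		_exit(127);
	}
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return status;
}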

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: Storing permissions

2005-04-16 Thread Morten Welinder
 Does it really make sense to store full permissions in the trees? I think
 that remembering the x-bit should be good enough for almost all purposes
 and the other permissions should be left to the local environment.

It makes some sense in principle, but without storing what they mean
(i.e., group==?) it certainly makes no sense.  It's a bit like unpacking a
tar file.

I suspect a non-readable file would cause a bit of a problem in the low-level
commands.

Morten


[PATCH] Rename confusing variable in show-diff

2005-04-16 Thread Junio C Hamano
The show-diff command uses a variable "new" but it is always
used to point at the original data recorded in the dircache
before the user started editing in the working file.  Rename it
to "old" to avoid confusion.

To be applied on top of my previous patches:

[PATCH] Optionally tell show-diff to show only named files.
[PATCH] show-diff -z option for machine readable output.
[PATCH] show-diff shell safety.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 show-diff.c |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

show-diff.c: e52eee21c2f682bef2dba06445699cca8e34c63a
--- show-diff.c
+++ show-diff.c 2005-04-16 18:05:55.0 -0700
@@ -162,7 +162,7 @@
int changed;
unsigned long size;
char type[20];
-   void *new;
+   void *old;
 
if (1 <argc &&
! matches_pathspec(ce, argv+1, argc-1))
@@ -193,8 +193,8 @@
if (silent)
continue;
 
-   new = read_sha1_file(ce->sha1, type, &size);
-   show_differences(ce->name, new, size);
+   old = read_sha1_file(ce->sha1, type, &size);
+   show_differences(ce->name, old, size);
free(new);
}
return 0;



Re: fix mktemp (remove mktemp ;)

2005-04-16 Thread Paul Jackson
Dave wrote:
 http://www.linuxsecurity.com/content/view/115462/151/

Nice - thanks.

Pasky - would you be interested in a patch that used a more robust tmp
file creation, along the lines of replacing

t=${TMPDIR:-/usr/tmp}/gitdiff.$$
trap 'set +f; rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15

with:

tmp=${TMPDIR-/tmp}
tmp=$tmp/gitdiff-do.$RANDOM.$RANDOM.$RANDOM.$$
(umask 077 && mkdir $tmp) || {
echo "Could not create temporary directory! Exiting." 1>&2
exit 1
}
t=$tmp/tmp
trap 'rm -fr $tmp; trap 0; exit 0' 0 1 2 3 15

If interested, would you want it instead of my previous mktemp removal
patch, or on top of it?

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: fix mktemp (remove mktemp ;)

2005-04-16 Thread Paul Jackson
Erik wrote:
 How about using .git/tmp.$$ or similar as the tempfile?

One could, but best to normally honor the user's TMPDIR setting.

Could one 'git diff' a readonly git repository?

Perhaps someone has a reason for putting their tmp files where
they choose - say a local file system in a heavy NFS environment.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


[PATCH] (take 2) Rename confusing variable in show-diff

2005-04-16 Thread Junio C Hamano
Oops, sorry I screwed up and sent a wrong patch.  Please discard
the previous one.

The show-diff command uses a variable "new" but it is always
used to point at the original data recorded in the dircache
before the user started editing in the working file.  Rename it
to "old" to avoid confusion.

To be applied on top of my previous patches:

[PATCH] Optionally tell show-diff to show only named files.
[PATCH] show-diff -z option for machine readable output.
[PATCH] show-diff shell safety.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 show-diff.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

show-diff.c: e52eee21c2f682bef2dba06445699cca8e34c63a
--- show-diff.c
+++ show-diff.c 2005-04-16 18:23:57.0 -0700
@@ -162,7 +162,7 @@
int changed;
unsigned long size;
char type[20];
-   void *new;
+   void *old;
 
if (1 <argc &&
! matches_pathspec(ce, argv+1, argc-1))
@@ -193,9 +193,9 @@
if (silent)
continue;
 
-   new = read_sha1_file(ce->sha1, type, &size);
-   show_differences(ce->name, new, size);
-   free(new);
+   old = read_sha1_file(ce->sha1, type, &size);
+   show_differences(ce->name, old, size);
+   free(old);
}
return 0;
 }




Re: Storing permissions

2005-04-16 Thread Paul Jackson
Morten wrote:
 It makes some sense in principle, but without storing what they mean
 (i.e., group==?) it certainly makes no sense. 

There's no "they" there.

I think Martin's proposal, to which I agreed, was to store a _single_
bit.  If any of the execute permissions of the incoming file are set,
then the bit is stored ON, else it is stored OFF.  On 'checkout', if the
bit is ON, then the file permission is set mode 0777 (modulo umask),
else it is set mode 0666 (modulo umask).

You might disagree that this is a good idea, but it certainly does
'make sense' (as in 'is sensibly well defined').

 I suspect a non-readable file would cause a bit of a problem in the low-level
 commands.

Probably so.  If someone sets their umask 0333 or less, then they are
either fools or QA (software quality assurance, or test) engineers.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


[PATCH] show-diff style fix.

2005-04-16 Thread Junio C Hamano
This fixes some stylistic problems introduced by my previous set
of patches.  I'll be sending my last patch to show-diff next,
which depends on this cleanup.

To be applied on top of my previous patches:

[PATCH] Optionally tell show-diff to show only named files.
[PATCH] show-diff -z option for machine readable output.
[PATCH] show-diff shell safety.
[PATCH] (take 2) Rename confusing variable in show-diff.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 show-diff.c |7 +++
 1 files changed, 3 insertions(+), 4 deletions(-)

--- ./show-diff.c   2005-04-16 18:59:09.0 -0700
+++ ./show-diff.c   2005-04-16 19:01:28.0 -0700
@@ -111,7 +111,7 @@
}
 }
 
-static const char *show_diff_usage = "show-diff [-s] [-q] [-z] [paths...]";
+static const char *show_diff_usage = "show-diff [-q] [-s] [-z] [paths...]";
 
 static int matches_pathspec(struct cache_entry *ce, char **spec, int cnt)
 {
@@ -141,9 +141,8 @@
silent_on_nonexisting_files = silent = 1;
else if (!strcmp(argv[1], "-q"))
silent_on_nonexisting_files = 1;
-   else if (!strcmp(argv[1], "-z")) {
+   else if (!strcmp(argv[1], -z))
machine_readable = 1;
-   }
else
usage(show_diff_usage);
argv++; argc--;
@@ -164,7 +163,7 @@
char type[20];
void *old;
 
-   if (1 <argc &&
+   if (1 < argc &&
! matches_pathspec(ce, argv+1, argc-1))
continue;
 



Re: fix mktemp (remove mktemp ;)

2005-04-16 Thread Brian O'Mahoney
No, you have to:
(a) create a unique, pid specific file name /var/tmp/myapp.$$.xyzzy
(b) create it in O_EXCL mode, so you won't smash another's held lock

(b-1) It worked, OK

(b-2) open failed, try ...xyzzz

repeat until (b-1)

There are thousands of examples of how to do this with bash.
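
In C terms the (a)/(b) loop is open(2) with O_CREAT|O_EXCL; a rough sketch
with made-up names:

#include <fcntl.h>
#include <stdio.h>

/*
 * Try candidate names until an exclusive create succeeds, so an existing
 * file (or someone else's held lock) is never clobbered.
 */
static int create_private_tmp(char *buf, int len, long pid)
{
	int i, fd;

	for (i = 0; i < 1000; i++) {
		snprintf(buf, len, "/var/tmp/myapp.%ld.%d", pid, i);
		fd = open(buf, O_WRONLY | O_CREAT | O_EXCL, 0600);
		if (fd >= 0)
			return fd;	/* (b-1): it worked */
		/* (b-2): name taken, try the next suffix */
	}
	return -1;
}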

Paul Jackson wrote:
 Dave wrote:
 
mktemp is being used here to provide randomness in the filename,
not just a uniqueness.
 
 
 Ok - useful point.
 
 How about:
 
   t=${TMPDIR:-/usr/tmp}/gitdiff.$$.$RANDOM
 
 
all an attacker has to do is create 65535 symlinks in /usr/tmp

the point of the xyzzy seed is to make creating all possible files
in-feasable.

 
 
 And how about if I removed the tmp files at the top:
 
   t=${TMPDIR:-/usr/tmp}/gitdiff.$$.$RANDOM
   trap 'rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15
   rm -fr $t.?
 
   ... rest of script ...
 
 How close does that come to providing the same level of safety, while
 remaining portable over a wider range of systems, and not requiring that
 a separate command be forked?
 
 
I'd suggest fixing your distributions ...
 
 
 It's not just my distro; it's the distros of all git users.
 
 If apps can avoid depending on inessential details of their
 environment, that's friendlier to all concerned.
 
 And actually my distro is fine - it's just that I am running an old
 version of it on one of my systems.  Newer versions of mktemp have
 the -t option.
 

-- 
mit freundlichen Grüßen, Brian.

Dr. Brian O'Mahoney
Mobile +41 (0)79 334 8035 Email: [EMAIL PROTECTED]
Bleicherstrasse 25, CH-8953 Dietikon, Switzerland
PGP Key fingerprint = 33 41 A2 DE 35 7C CE 5D  F5 14 39 C9 6D 38 56 D5


Re: fix mktemp (remove mktemp ;)

2005-04-16 Thread Paul Jackson
 No, you have to:

How does this compare with the one I posted about 1 hour 30
minutes ago:

tmp=${TMPDIR-/tmp}
tmp=$tmp/gitdiff-do.$RANDOM.$RANDOM.$RANDOM.$$
(umask 077 && mkdir $tmp) || {
echo "Could not create temporary directory! Exiting." 1>&2
exit 1
}
t=$tmp/tmp
trap 'rm -fr $tmp; trap 0; exit 0' 0 1 2 3 15

derived from the reference that Dave Jones provided?

 create it in O_EXCL mode,

How can one do that and hold that O_EXCL from within bash?

 There are thousands of examples of how to do this with bash.

Care to provide one?

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


[PATCH] libgit

2005-04-16 Thread Mike Taht
commit b0550573055abcf8ad19dcb8a036c32dd00a3be4
tree b77882b170769c07732381b9f19ff2dd5c9f1520
parent 866b4aea9313513612f2b0d66814a2f526d17f21
author Mike Taht [EMAIL PROTECTED] 1113704772 -0700
committer Mike Taht [EMAIL PROTECTED] 1113704772 -0700
looks my 1878 line patch to convert git to libgit got eaten by vger..
I put it up at http://pbx.picketwyre.com/~mtaht/libgit.patch if anyone 
wants to comment. from my log:

Converted git to libgit. Moved all the main() calls into a single 
multi-call binary - git-main.
Made extern a bunch of functions that were static. Verified it at least 
still minimally worked.
Note: this is only a first step towards creating a generic library. 
Figuring out what functions and variables *truly* need to be exported, 
renaming them to a git_function api, making it thread safe
... and not least of all, keeping up with everybody working out of the 
base tree... are problems that remain. Also - cleaning up the UI.




Re: [PATCH] Get commits from remote repositories by HTTP

2005-04-16 Thread tony . luck
How about building a file list and doing a batch download via 'wget -i 
/tmp/foo'? A quick test (on my ancient wget-1.7) indicates that it reuses 
connections when successive URLs point to the same server.

Here's a script that does just that.  So there is a burst of individual
wget commands to get HEAD, the top commit object, and all the tree
objects.  Then just one to get all the missing blobs.

Subsequent runs will do far less work as many of the tree objects will
not have changed, so we don't descend into any tree that we already have.

-Tony

Not a patch ... it is a whole file.  I called it git-wget, but it might
also want to be called git-pulltop.

Signed-off-by: Tony Luck [EMAIL PROTECTED]

-- script starts here -
#!/bin/sh

# Copyright (C) 2005 Tony Luck

REMOTE=http://www.kernel.org/pub/linux/kernel/people/torvalds/linux-2.6.git/

rm -rf .gittmp
# set up a temp git repository so that we can use cat-file and ls-tree on the
# objects we pull without installing them into our tree. This allows us to
# restart if the download is interrupted
mkdir .gittmp
cd .gittmp
init-db

wget -q $REMOTE/HEAD

if cmp -s ../.git/HEAD HEAD
then
echo "Already have HEAD = `cat ../.git/HEAD`"
cd ..
rm -rf .gittmp
exit 0
fi

sha1=`cat HEAD`
sha1file=${sha1:0:2}/${sha1:2}

if [ -f ../.git/objects/$sha1file ]
then
echo "Already have most recent commit. Update HEAD to $sha1"
cd ..
rm -rf .gittmp
exit 0
fi

wget -q $REMOTE/objects/$sha1file -O .git/objects/$sha1file

treesha1=`cat-file commit $sha1 | (read tag tree ; echo $tree)`

get_tree()
{
treesha1file=${1:0:2}/${1:2}
if [ -f ../.git/objects/$treesha1file ]
then
return
fi
wget -q $REMOTE/objects/$treesha1file -O .git/objects/$treesha1file
ls-tree $1 | while read mode tag sha1 name
do
subsha1file=${sha1:0:2}/${sha1:2}
if [  -f ../.git/objects/$subsha1file ]
then
continue
fi
if [ $mode = 4 ]
then
get_tree $sha1 `expr $2 + 1`
else
echo objects/$subsha1file >> needbloblist
fi
done
}

# get all the tree objects to our .gittmp area, and create list of needed blobs
get_tree $treesha1

# now get the blobs
cd ../.git
if [ -s ../.gittmp/needbloblist ]
then
wget -q -r -nH  --cut-dirs=6 --base=$REMOTE -i ../.gittmp/needbloblist
fi

# Now we have the blobs, move the trees and commit from .gittmp
cd ../.gittmp/.git/objects
find ?? -type f -print | while read f
do
mv $f ../../../.git/objects/$f
done

# update HEAD
cd ../..
mv HEAD ../.git

cd ..
rm -rf .gittmp
-- script ends here -


Re: SHA1 hash safety

2005-04-16 Thread Tkil
 Brian == Brian O'Mahoney [EMAIL PROTECTED] writes:

Brian (1) I _have_ seen real-life collisions with MD5, in the context
Brian of Document management systems containing ~10^6 ms-WORD
Brian documents.

Was this whole-document based, or was it blocked or otherwise chunked?

I'm wondering, because (SFAIK) the MS word on-disk format is some
serialized version of one or more containers, possibly nested.  If
you're blocks are sized so that the first block is the same across
multiple files, this could cause collisions -- but they're the good
kind, that allow us to save disk space, so they're not a problem.

Are you saying that, within 1e7 documents, you found two
documents with the same MD5 hash yet different contents?

That's not an accusation, btw; I'm just trying to get clarity on the
terminology.  I'm fascinated by the idea of using this sort of
content-addressable filesystem, but the chance of any collision at all
wigs me out.  I look at the probabilities, but still.

Thanks,
t.


Re: [PATCH] update-cache --refresh cache entry leak

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Junio C Hamano wrote:

 When update-cache --refresh replaces an existing cache entry
 with a new one, it forgets to free the original.

I've seen this patch now three times, and it's been wrong every single 
time. Maybe we should add a comment?

That active-cache entry you free()'d was not necessarily allocated with 
malloc(). Most cache-entries are just mmap'ed directly from the index 
file.

Leaking is ok. We cannot leak too much.
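
A hedged sketch of why the free() is unsafe and what a correct guard would
have to look like; this is not git's code, and both globals are assumptions:

#include <stdlib.h>

static void *index_map;			/* start of the mmap'ed index file */
static unsigned long index_size;	/* its length */

/*
 * Entries that point straight into the mmap'ed index were never
 * malloc()'ed, so free() on them is undefined behaviour; the cheap and
 * safe choice is simply to leak them.
 */
static void drop_entry(void *ce)
{
	char *p = ce, *lo = index_map;

	if (lo <= p && p < lo + index_size)
		return;		/* inside the mapping: just leak it */
	free(p);		/* genuinely heap-allocated: safe to free */
}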

Linus


Re: [PATCH] libgit

2005-04-16 Thread Randy.Dunlap
On Sat, 16 Apr 2005 20:12:56 -0700 Mike Taht wrote:

| commit b0550573055abcf8ad19dcb8a036c32dd00a3be4
| tree b77882b170769c07732381b9f19ff2dd5c9f1520
| parent 866b4aea9313513612f2b0d66814a2f526d17f21
| author Mike Taht [EMAIL PROTECTED] 1113704772 -0700
| committer Mike Taht [EMAIL PROTECTED] 1113704772 -0700
| 
| looks my 1878 line patch to convert git to libgit got eaten by vger..
| I put it up at http://pbx.picketwyre.com/~mtaht/libgit.patch if anyone 
| wants to comment. from my log:

Connection refused.

---
~Randy


Re: Yet another base64 patch

2005-04-16 Thread David A. Wheeler
Paul Jackson wrote:
Earlier, hpa wrote:
The base64 version has 2^12 subdirectories instead of 2^8 (I just used 2 
characters as the hash key just like the hex version.)
Later, hpa wrote:
Ultimately the question is: do we care about old (broken) filesystems?

I'd imagine we care a little - just not a lot.
Some people (e.g., me) would really like for git
to be more forgiving of nasty filesystems,
so that git can be used very widely.
I.E., be forgiving about case insensitivity,
poor performance or problems with a large # of files
in a directory, etc.  You're already working to make
sure git handles filenames with spaces & i18n filenames,
a common failing of many other SCM systems.
If git is used for Linux kernel development & nothing else,
it's still a success.  But it'd be even better from
my point of view if git was a useful tool for MANY
other projects.  I think there are advantages, even if you
only plan to use git for the kernel, to making git easier
to use for other projects.  By making git less
sensitive to the filesystem, you'll attract more (non-kernel-dev)
users, some of whom will become new git developers who
add cool new functionality.
As noted in my SCM survey (http://www.dwheeler.com/essays/scm.html),
I think SCM Windows support is really important to a lot of
OSS projects.  Many OSS projects, even if they start
Unix/Linux only, spin off a Windows port, and it's
painful if their SCM can't run on Windows then.
Problems running on NFS filesystems have caused problems
with GNU Arch users (there are workarounds, but now you
need to learn about workarounds instead of things
just working).  If nothing else, look at the history
of other SCM projects: all too many have undergone radical and
painful surgeries so that they can be more portable to
various filesystems.
It's a trade-off, I know.
--- David A. Wheeler


Re: Yet another base64 patch

2005-04-16 Thread Paul Jackson
David wrote:
 It's a trade-off, I know.

So where do you recommend we make that trade-off?

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: [PATCH] libgit

2005-04-16 Thread Mike Taht
Fixed.
Randy.Dunlap wrote:
On Sat, 16 Apr 2005 20:12:56 -0700 Mike Taht wrote:
| commit b0550573055abcf8ad19dcb8a036c32dd00a3be4
| tree b77882b170769c07732381b9f19ff2dd5c9f1520
| parent 866b4aea9313513612f2b0d66814a2f526d17f21
| author Mike Taht [EMAIL PROTECTED] 1113704772 -0700
| committer Mike Taht [EMAIL PROTECTED] 1113704772 -0700
| 
| looks my 1878 line patch to convert git to libgit got eaten by vger..
| I put it up at http://pbx.picketwyre.com/~mtaht/libgit.patch if anyone 
| wants to comment. from my log:

Connection refused.
---
~Randy

--
Mike Taht
  FLASH!  Intelligence of mankind decreasing.  Details at ... uh, when
the little hand is on the 


Re: SHA1 hash safety

2005-04-16 Thread Paul Jackson
 but the chance of any collision at all wigs me out.

Guess you're just going to get wigged out then.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: Yet another base64 patch

2005-04-16 Thread David Lang
On Thu, 14 Apr 2005, H. Peter Anvin wrote:
Linus Torvalds wrote:
Even something as simple as ls -l has been known to have O(n**2) 
behaviour for big directories.

For filesystems with linear directories, sure.  For sane filesystems, it 
should have O(n log n).
note that default configs of ext2 and ext3 don't qualify as sane 
filesystems by this definition.

ext3 does have an extension that you can enable to have it hash the 
directory access, but even if you enable that on a filesystem you aren't 
guaranteed that it will be active (if the directory existed before it was 
turned on, or has been accessed by a kernel that didn't understand the 
extension then the htree functionality won't be used until you manually 
tell the system to generate the tree)

David Lang
--
There are two ways of constructing a software design. One way is to make it so 
simple that there are obviously no deficiencies. And the other way is to make 
it so complicated that there are no obvious deficiencies.
 -- C.A.R. Hoare


Re: [PATCH] Use libcurl to use HTTP to get repositories

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Paul Jackson wrote:

 Daniel wrote:
  I'm working off of Linus's tree when not working on scripts, and it
  doesn't have that section at all.
 
 Ah so - nevermind my README comments then.

Well, actually, I suspect that something like this should go to Pasky. I
really see my repo as purely internal git datastructures, and when it
gets to "how do we interact with other people's web-sites", I suspect 
Pasky's tree is better.

Linus


Re: SHA1 hash safety

2005-04-16 Thread David A. Wheeler
Paul Jackson wrote:
what I'm talking about is the chance that somewhere, sometime there will 
be two different documents that end up with the same hash
I have vastly greater chance of a file colliding due to hardware or
software glitch than a random message digest collision of two legitimate
documents.
The probability of an accidental overlap for SHA-1 for two
different files is absurdly remote; it's just not worth worrying about.
However, the possibility of an INTENTIONAL overlap is a completely
different matter.  I think the hash algorithm should change in the
future; I have a proposal below.
Someone has ALREADY broken into a server to modify the Linux kernel
code, so the idea of an attack on kernel code
is not an idle fantasy. MD5 is dead, and SHA-1's work factor has
already been sufficiently broken that people have already been told to
walk to the exits (i.e., DO NOT USE SHA-1 for new programs like git).
The fact that blobs are compressed first, with a length header
in front, _may_ make it harder to attack.  But maybe not.
I haven't checked for this case, but most decompression algorithms
I know of have a don't change mode that essentially just copies the
data behind it.  If the one used in git has such a mode
(I bet it does!), an attacker could use that mode to
make it MUCH easier to create an attack vector than it would
appear at first.  Now the attacker just needs to create a collision
(hmmm, where was that paper?).  Remember, you don't need to
run a hash algorithm over an entire file; you can precompute
to near the end, and then try your iterations from there.
A little hardware (inc. FPGAs) would speed the attack.
Of course, that assumes you actually
check everything to make sure that an attacker can't slip
in something different. After each rsync, are all new files'
hash values checked?  Do they uncompress to right length?
Do they have excess data after the decompression?
I'm hoping that sort of input-checking (since the data
might be from an attacker, if indirectly!) is already going on,
though I haven't reviewed the git source code.
While the jury's still out, the current belief by most folks
I talk to is that SHA-1 variants with more bits, such as SHA-256,
are the way to go now.  The SHA-1 attack simply reduces
the work factor (it's not a COMPLETE break), so adding
more bits is believed to increase the work factor
enough to counter it.
Adding more information to the hash can make attacking even harder.
Here's one idea: whenever that hash algorithm
switch occurs, create a new hash value as this:
  SHA-256 + uncompressed-length
Where SHA-256 is computed just like SHA-1 is now, e.g.,
SHA-256(file) where file = typecode + length + compressed data.
Leave the internal format as-is (with the length embedded as well).
This means that an attacker has to come up with an attack
that creates the same length uncompressed, yet has the same hash
of the compressed result. That's harder to do.
Length is also really, really cheap to compute :-).
That also might help convince the "what happens if there's
an accidental collision" crowd: now, if the file lengths
are different, you're GUARANTEED that the hash values are different,
though that's not the best reason to do that.
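
A rough sketch of that naming scheme, assuming OpenSSL's SHA-256 and leaving
the on-disk object format untouched; none of this is an existing git
interface:

#include <openssl/sha.h>
#include <stdio.h>

/*
 * Proposed name: SHA-256 over the stored object bytes, with the
 * uncompressed length appended in plain text, so two objects can only
 * share a name if both the hash and the length collide.
 */
static void object_name(const unsigned char *stored, unsigned long stored_len,
			unsigned long uncompressed_len,
			char *out /* at least 96 bytes */)
{
	unsigned char hash[SHA256_DIGEST_LENGTH];
	int i;

	SHA256(stored, stored_len, hash);
	for (i = 0; i < SHA256_DIGEST_LENGTH; i++)
		sprintf(out + 2 * i, "%02x", hash[i]);
	sprintf(out + 2 * SHA256_DIGEST_LENGTH, "-%lu", uncompressed_len);
}
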
One reason to think about switching sooner rather than later
is that it'd be really nice if the object store also included
signatures, so that in one fell swoop you could check who signed what
(and thus you could later on CONFIRM with much more certainty who
REALLY submitted a given change... say if it was clearly malicious).
If you switch hash algorithms, the signatures might not work,
depending on how you do it.
--- David A. Wheeler


Re: [PATCH] update-cache --refresh cache entry leak

2005-04-16 Thread Junio C Hamano
 LT == Linus Torvalds [EMAIL PROTECTED] writes:

LT I've seen this patch now three times, and it's been wrong every single 
LT time. Maybe we should add a comment?

I found out the previous two just after I sent it out.  Sorry
about that.




Re: Storing permissions

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Paul Jackson wrote:

 Morten wrote:
  It makes some sense in principle, but without storing what they mean
  (i.e., group==?) it certainly makes no sense. 
 
 There's no they there.
 
 I think Martin's proposal, to which I agreed, was to store a _single_
 bit.  If any of the execute permissions of the incoming file are set,
 then the bit is stored ON, else it is stored OFF.  On 'checkout', if the
 bit is ON, then the file permission is set mode 0777 (modulo umask),
 else it is set mode 0666 (modulo umask).

I think I agree.

Anybody willing to send me a patch? One issue is that if done the obvious
way it's an incompatible change, and old tree objects won't be valid any
more. It might be ok to just change the compare cache check to only care
about a few bits, though: S_IXUSR and S_IFDIR. And then always write new 
tree objects out with mode set to one of
 - 04: we already do this for directories
 - 100644: normal files without S_IXUSR set
 - 100755: normal files _with_ S_IXUSR set

Then, at compare time, we only look at S_IXUSR matching for files (we
never compare directory modes anyway). And at file create time, we create
them with 0666 and 0777 respectively, and let the users umask sort it out
(and if the user has 0100 set in his umask, he can damn well blame
himself).
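
A minimal restatement of that rule in C; this is only a sketch, not the
patch itself:

#include <sys/stat.h>

/*
 * Only three mode values ever reach a tree object: a directory, an
 * executable file, or a plain file.  Everything else in st_mode is
 * ignored, and checkout recreates files as 0666/0777 modulo umask.
 */
static unsigned int normalize_tree_mode(unsigned int mode)
{
	if (S_ISDIR(mode))
		return S_IFDIR;			/* 040000 */
	if (mode & S_IXUSR)
		return S_IFREG | 0755;		/* 100755 */
	return S_IFREG | 0644;			/* 100644 */
}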

This would pretty much match the existing kernel tree, for example. We'd 
end up with some new trees there (and in git), but not a lot of 
incompatibility. And old trees would still work fine, they'd just get 
written out differently.

Anybody want to send a patch to do this?

Linus


Re: [PATCH] Use libcurl to use HTTP to get repositories

2005-04-16 Thread Ingo Molnar

* Daniel Barkalow [EMAIL PROTECTED] wrote:

 Still leaks a bit of memory due to bug copied from read-tree.

Linus, should i resend the 18 fixes i sent the other day? (as a GIT 
repository perhaps?) I found roughly 6 common memory leaks, 8 
theoretical memory leaks, 2 overflows and did a couple of cleanups. One 
of the patches [the cache collision related thing] we agreed was not 
needed, the rest is still very much valid i think. I did some basic 
testing with the fixes applied, nothing seemed to break in any visible 
way in these tests.

Ingo


Re: Storing permissions

2005-04-16 Thread David A. Wheeler
Paul Jackson wrote:
Junio wrote:
Sounds like svn 

I have no idea what svn is.
svn = common abbreviation for Subversion, a
widely-used centralized SCM tool intentionally
similar to CVS.
--- David A. Wheeler


Re: SHA1 hash safety

2005-04-16 Thread Paul Jackson
I have nothing further to contribute to this subtopic.
Good luck with it.

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: Issues with higher-order stages in dircache

2005-04-16 Thread Junio C Hamano
Linus,

earlier I wrote [*R1*]:

   - An explicit update-cache [--add] [--remove] path should
 be taken as a signal from the user (or Cogito) to tell the
 dircache layer the merge is done and here is the result.
 So just delete higher-order stages for the path and record
 the specified path at stage 0 (or remove it altogether).

and I think this commit of yours implements the adding half.

commit be7b1f05cea8e5213ffef8f74ebdefed2aacb6fc:1
author Linus Torvalds [EMAIL PROTECTED] 1113678345 -0700
committer Linus Torvalds [EMAIL PROTECTED] 1113678345 -0700

When inserting a index entry of stage 0, remove all old unmerged entries.

I am wondering if you have a particular reason not to do the
same for the removing half.  Without it, currently I do not see
a way for the user or Cogito to tell dircache layer that the
merge should result in removal.  That is, other than first
adding a phony entry there (which brings the entry down to stage
0) and then immediately doing a regular update-cache --remove.
That would be two readings of the 1.6MB index file instead of one,
for the kernel case.

Also do you have any comments on this one from the same message?

 * read-tree

   - When merging two trees, i.e. read-tree -m A B, shouldn't
 we collapse identical stage-1/2 into stage-0?


[References]

*R1* http://marc.theaimsgroup.com/?l=git&m=111366023126466&w=2



Re: Issues with higher-order stages in dircache

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Junio C Hamano wrote:
 
 I am wondering if you have a particular reason not to do the
 same for the removing half.

No. Except for me being silly.

Please just make it so.

 Also do you have any comments on this one from the same message?
 
  * read-tree
 
- When merging two trees, i.e. read-tree -m A B, shouldn't
  we collapse identical stage-1/2 into stage-0?

How do you actually intend to merge two trees? 

That sounds like a total special case, and better done with diff-tree.  
But regardless, since I assume the result is the later tree, why do a 
read-tree -m A B, since what you really want is read-tree B?

The real merge always needs the base tree, and I'd hate to complicate the 
real merge with some special-case that isn't relevant for that real case.

Linus


[PATCH] checkout-cache -a should not extract unmerged stages

2005-04-16 Thread Junio C Hamano
When checkout-cache -a is run, currently it attempts to extract
each existing unmerged stage to the same destination and
complains to what it itself has done.  This is nonsensical.

Presumably, the user is running checkout-cache -a in order to
verify the result of the part that has cleanly been merged.  So
check out only stage 0 entries and give warnings for paths that
are unmerged.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 checkout-cache.c |   11 +++
 1 files changed, 11 insertions(+)

checkout-cache.c: 431b7032576f40d93a08be801e26f76c168ed57b
--- checkout-cache.c
+++ checkout-cache.c2005-04-16 22:23:15.0 -0700
@@ -121,10 +121,21 @@
 
 static int checkout_all(void)
 {
+   struct cache_entry *unmerge_skipping = NULL;
int i;
 
for (i = 0; i < active_nr ; i++) {
struct cache_entry *ce = active_cache[i];
+   if (ce_stage(ce)) {
+   if (!unmerge_skipping ||
+   strcmp(unmerge_skipping->name, ce->name))
+   fprintf(stderr,
+   "checkout-cache: needs merge %s\n",
+   ce->name);
+   unmerge_skipping = ce;
+   continue;
+   }
+   unmerge_skipping = NULL;
if (checkout_entry(ce) < 0)
return -1;
}



Re: Storing permissions

2005-04-16 Thread Paul Jackson
Linus wrote:
 It might be ok to just change the compare cache check to only care
 about a few bits, though: S_IXUSR and S_IFDIR. And then ...

I think I agree.  But since I am reluctant to take enough time to
understand the code well enough to write this patch, I'll shut up now ;).

-- 
  I won't rest till it's the best ...
  Programmer, Linux Scalability
  Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 
1.925.600.0401


Re: Storing permissions

2005-04-16 Thread Linus Torvalds


On Sat, 16 Apr 2005, Linus Torvalds wrote:
 
 Anybody want to send a patch to do this?

Actually, I just did it. Seems to work for the only test-case I tried,
namely I just committed it, and checked that the permissions all ended up
being recorded as 0644 in the tree (if it has the -x bit set, they get
recorded as 0755).

When checking out, we always check out with 0666 or 0777, and just let 
umask do its thing. We only test bit 0100 when checking for differences.

Maybe I missed some case, but this does indeed seem saner than the "try to 
restore all bits" case. If somebody sees any problems, please holler.

(Btw, you may or may not need to blow away your index file by just 
re-creating it with a read-tree after you've updated to this. I _tried_ 
to make sure that the compare just ignored the ce_mode bits, but the fact 
is, your index file may be corrupt in the sense that it has permission 
sets that sparse expects to never generate in an index file any more..)

Linus


[PATCH] show-diff.c: do not include unused header file

2005-04-16 Thread Junio C Hamano
This is my bad.  I added #include <ctype.h> to the file,
which I ended up not using and failed to remove it.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

show-diff.c: d85d79b97a59342390bd34da09049dd58d56900f
--- show-diff.c
+++ show-diff.c 2005-04-16 22:37:29.0 -0700
@@ -4,7 +4,6 @@
  * Copyright (C) Linus Torvalds, 2005
  */
 #include cache.h
-#include <ctype.h>
 
 static char *diff_cmd = diff -L '%s' -u -N  - '%s';




[PATCH] Add lsremote command.

2005-04-16 Thread Steven Cole
This is a fairly trivial addition, but if users are adding remote repositories
with git addremote, then those users should be able to list out the remote
list without having to know the details of where the remotes file is kept.

Steven

Adds lsremote command to list remotes.

Signed-Off-By: Steven Cole [EMAIL PROTECTED]

-

diff -urN git-pasky-orig/git git-pasky/git
--- git-pasky-orig/git	2005-04-16 22:47:22.0 -0600
+++ git-pasky/git	2005-04-16 22:49:14.0 -0600
@@ -41,6 +41,7 @@
 	log
 	ls		[TREE_ID]
 	lsobj		[OBJTYPE]
+	lsremote
 	merge		-b BASE_ID FROM_ID
 	pull		[RNAME]
 	rm		FILE...
@@ -105,6 +106,7 @@
 log)gitlog.sh $@;;
 ls) gitls.sh $@;;
 lsobj)  gitlsobj.sh $@;;
+lsremote)   gitlsremote.sh $@;;
 merge)  gitmerge.sh $@;;
 pull)   gitpull.sh $@;;
 rm) gitrm.sh $@;;
diff -urN git-pasky-orig/gitlsremote.sh git-pasky/gitlsremote.sh
--- git-pasky-orig/gitlsremote.sh	1969-12-31 17:00:00.0 -0700
+++ git-pasky/gitlsremote.sh	2005-04-16 22:58:15.0 -0600
@@ -0,0 +1,7 @@
+#!/bin/sh
+#
+# ls remotes in GIT repository
+#
+[ -e .git/remotes ] && cat .git/remotes && exit 1
+
+echo 'List of remotes is empty. See git addremote.'


[PATCH] Fix off-by-one error in show-diff

2005-04-16 Thread Junio C Hamano
The patch to introduce shell safety to show-diff has an
off-by-one error.  Here is a fix.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

 show-diff.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

show-diff.c: 8a24ff62b85a6e23469e3f0e7a20170dfe543ebf
--- show-diff.c
+++ show-diff.c 2005-04-16 22:53:11.0 -0700
@@ -27,8 +27,8 @@
int cnt, c;
char *cp;
 
-   /* count single quote characters */ 
-   for (cnt = 0, cp = src; *cp; cnt++, cp++)
+   /* count bytes needed to store the quoted string. */ 
+   for (cnt = 1, cp = src; *cp; cnt++, cp++)
if (*cp == '\'')
cnt += 3;
 
