Re: [PATCH 3/2] merge-trees script for Linus git
"LT" == Linus Torvalds <[EMAIL PROTECTED]> writes:

LT> Damn, my cunning plan.

This is some good stuff. I really like this a lot. It is *so* *simple*,
clear, flexible, and an example of elegance. This is one of the things
I would happily say "Sheesh! Why didn't *I* think of *THAT* first!!!"
to.

LT> NOTE NOTE NOTE! I could make read-tree do some of these nontrivial
LT> merges, but I ended up deciding that only the "matches in all three
LT> states" thing collapses by default.

* Understood and agreed.

LT> Damn, I'm good.

* Agreed ;-). Wholeheartedly.

So what's next? Certainly I'd immediately drop (and I would imagine you
would as well) both the C and Perl versions of merge-tree(s).

The userland merge policies need ways to extract the stage information
and manipulate it. Am I correct to say that you mean by "ls-files -l"
the extracting part?

LT> I should make "ls-files" have a "-l" format, which shows the
LT> index and the mode for each file too.

You probably meant ls-tree. You used the word "mode" but it already
shows the mode, so I take it to mean "stage". Perhaps something like
this?

    $ ls-tree -l -r 49c200191ba2e3cd61978672a59c90e392f54b8b
    100644 blob fe2a4177a760fd110e78788734f167bd633be8de	COPYING
    100644 blob b39b4ea37586693dd707d1d0750a9b580350ec50:1	man/frotz.6
    100644 blob b39b4ea37586693dd707d1d0750a9b580350ec50:2	man/frotz.6
    100664 blob eeed997e557fb079f38961354473113ca0d0b115:3	man/frotz.6
    ...

The above example shows that COPYING has merged successfully, and that
O and A have the same contents while B has something different at
man/frotz.6.

Assuming that you would be working on that, I'd like to take the
dircache manipulation part. Let's think about the minimally necessary
set of operations:

 * The merge policy decides to take one of the existing stages. In
   this case we need a way to register a known mode/sha1 at a path.
   We already have this as "update-cache --cacheinfo". We just need to
   make sure that when update-cache puts things at stage 0 it clears
   other stages as well.
 * The merge policy comes up with a desired blob somewhere on the
   filesystem (perhaps by running an external merge program). It wants
   to register it as the result of the merge. We could do this today
   by first storing the desired blob in a temporary file somewhere in
   the path the dircache controls, "update-cache --add" the temporary
   file, ls-tree to find its mode/sha1, "update-cache --remove" the
   temporary file, and finally "update-cache --cacheinfo" the
   mode/sha1. This is workable but clumsy. How about:

       $ update-cache --graft [--add] desired-blob path

   to say "I want to register mode/sha1 from desired-blob, which may
   not have a verify_path()-satisfying name, at path in the dircache"?

 * The merge policy decides to delete the path. We could do this today
   by first stashing away the file at the path if it exists,
   "update-cache --remove" it, and restoring it if necessary. This is
   again workable but clumsy. How about:

       $ update-cache --force-remove path

   to mean "I want to remove the path from the dircache even though it
   may exist in my working tree"?

So it all boils down to update-cache. The new things to be introduced
are:

 * An explicit update-cache always removes stage 1/2/3 entries
   associated with the named path.
 * update-cache --graft
 * update-cache --force-remove

Am I on the right track? You might want to go even lower level by
letting them say something like:

 * update-cache --register-stage mode sha1 stage path

   Registers the mode/sha1 at stage for path. Does not look at the
   working tree. stage is [0-3].

 * update-cache --delete-stage stage-list path

   Removes the entry at the named stages for path. Does not look at
   the working tree. stage-list is either [0-3](,[0-3])* or a bitmask
   (i.e. (1 << stage-number) ORed together). The former would probably
   be easier for scripts to work with.

 * write-blob path

   Hashes and registers the file at path (regardless of what
   verify_path() says) and writes the resulting blob's mode/sha1 to
   the standard output.
If you take this lower-level approach, an explicit update-cache would
not clear stages 1/2/3. My preference is the former, not-so-low-level
interface. Guidance?

-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at  http://vger.kernel.org/majordomo-info.html
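The lower-level interface proposed in the message above can be modeled
as operations on a map keyed by (path, stage). A minimal Python sketch
-- purely illustrative, since neither --register-stage nor
--delete-stage exists in update-cache -- of the intended semantics:

```python
# Toy model of the proposed staged dircache (hypothetical interface).
# The index maps (path, stage) -> (mode, sha1); stage is 0-3.
index = {}

def register_stage(mode, sha1, stage, path):
    assert stage in (0, 1, 2, 3)
    if stage == 0:
        # An explicit stage-0 entry resolves the path: drop stages 1-3.
        delete_stage([1, 2, 3], path)
    index[(path, stage)] = (mode, sha1)

def delete_stage(stage_list, path):
    for stage in stage_list:
        index.pop((path, stage), None)

# A merge policy resolving a conflicted path to branch A's version
# (SHA1s abbreviated for illustration):
register_stage("100644", "b39b4ea3...", 1, "man/frotz.6")  # ancestor O
register_stage("100644", "b39b4ea3...", 2, "man/frotz.6")  # branch A
register_stage("100664", "eeed997e...", 3, "man/frotz.6")  # branch B
register_stage("100644", "b39b4ea3...", 0, "man/frotz.6")  # resolve to A
assert [s for (p, s) in index if p == "man/frotz.6"] == [0]
```

The stage-0 case is the "explicit update-cache always removes stage
1/2/3 entries" rule from the message, folded into registration.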
git-pasky file mode handling
Hi,

It seems that there's something weird going on with the file mode
handling.

Firstly, some files in the git-pasky repository have mode 0664 while
others have 0644.

Having pulled from git-pasky a number of times, with Petr's being the
tracked repository, I now find that when I do an "update-cache
--refresh", it complains that the files need updating, despite
show-diff showing no differences. Investigating, this appears to be
because the file modes are wrong for a number of the files. All my
files do not have group write.

I notice in the changelog what appears to be a dependence on the
umask. If this is so, please note that git appears to track the file
modes, and any dependence upon the umask is likely to screw with this
tracking.

-- 
Russell King
[PATCH 1/2] Add --stage to show-files for new stage dircache.
"JNH" == Junio C Hamano <[EMAIL PROTECTED]> writes:
"LT" == Linus Torvalds <[EMAIL PROTECTED]> writes:

LT> I should make "ls-files" have a "-l" format, which shows the
LT> index and the mode for each file too.

JNH> You probably meant ls-tree. You used the word "mode" but it
JNH> already shows the mode so I take it to mean "stage".

I was *wrong*. Of course you meant show-files. Instead of sending you
an apology, I am sending you the one I wrote myself. Please find it in
the next message ;-).

Here is its sample output. It shows file-mode, SHA1, stage and
pathname. I am attaching this one because this is a verification that
your "read-tree -m" passed the test.

    $ ../show-files --stage
    100664 578cc900ed980b72acfbdd1eea63e688a893c458 2	AA
    100664 f355077379fce072c210628691da232b59b6f25c 3	AA
    100664 d698ebc45d0edfe6e5b95aebb5983cb5c760960b 2	AN
    100664 0fa6a8e41814531679e1c76e968a9066fceb689d 1	DD
    100664 aff448a9467a4d83b164ef969cfe92ff18eb96be 1	DM
    100664 4bfe111723f11cb4a4deec7c837e12601030285f 3	DM
    100664 9b0f86e5cded99b9de3bd9d234747ec2d1a4cddd 1	DN
    100664 9b0f86e5cded99b9de3bd9d234747ec2d1a4cddd 3	DN
    100664 a6772f2a2c15bac796d8c7bb55885891956534cf 1	MD
    100664 dc2088ce13f659f2bd554b2c1b343f4966143b9b 2	MD
    100664 e4310204563a9059828644464779874c3a406fee 1	MM
    100664 fe5ddcd7618d26384cf98c6fcd15780c7125e6d6 2	MM
    100664 53a9d14868dbe346a9f0cf01fcda742545b55987 3	MM
    100664 f48f37ea0205a7e5591777b4d3ae0d153d3ef131 1	MN
    100664 d7600381b69b92f61bad50c5f8408e831b622ef0 2	MN
    100664 f48f37ea0205a7e5591777b4d3ae0d153d3ef131 3	MN
    100664 67fb1517ea8d59949a8e4f5f07f0422b212f64dc 3	NA
    100664 0e5842253af8881b2c9f579029d7b50a8e03d7f6 1	ND
    100664 0e5842253af8881b2c9f579029d7b50a8e03d7f6 2	ND
    100664 0d45c04c9d05fa9c21edf95fc2c1a43519a8c440 1	NM
    100664 0d45c04c9d05fa9c21edf95fc2c1a43519a8c440 2	NM
    100664 849bfa41d15951f5e97cb93e22cbcc2924ce4517 3	NM
    100664 83d94b8fd056921f22ad2ca0122dd7f64974be7c 0	NN

This is taken from the dircache after I ran

    $ read-tree -m O A B

using the merge testcase I prepared earlier.
Very trivial, single-ancestor O, with two branches A and B merge case.
This covers all possible patterns, except file vs directory conflicts.

The filenames are all two letters, the first letter being what the
first branch does to that file while the second one encodes what the
second branch does to it. The actions are:

 - A means "Added in this branch" --- did not exist in the ancestor.
 - N means "No change in this branch".
 - D means "Deleted in this branch".
 - M means "Modified in this branch".

So, for example, the first branch modified file MN while the second one
did not touch it. Of course it existed in the ancestor. You can see
that read-tree did the right thing, because the SHA1s for stage 1 and
stage 3 match, and stage 2 is different:

    100664 f48f37ea0205a7e5591777b4d3ae0d153d3ef131 1	MN
    100664 d7600381b69b92f61bad50c5f8408e831b622ef0 2	MN
    100664 f48f37ea0205a7e5591777b4d3ae0d153d3ef131 3	MN

I verified all of the above result and it shows your algorithm is doing
exactly what is expected.

For those of you who are interested, this is the recipe to reproduce
this merge testcase. NOTE! NOTE! NOTE! Do not run this in your working
tree, because it trashes .git in its working directory.

Signed-off-by: Junio C Hamano <[EMAIL PROTECTED]>
---

--- /dev/null
+++ generate-merge-test.sh
@@ -0,0 +1,163 @@
+#!/bin/sh
+
+: Skip execution up to \End_of_Commentary
+
+This directory is to hold a test case for merges.
+
+There is one ancestor (called O for Original) and two branches A
+and B derived from it. We want to do 3-way merge between A and
+B, using O as the common ancestor.
+
+merge A O B
+diff3 A O B
+
+Decisions are made by comparing contents of O, A and B pathname
+by pathname. The result is determined by the following guiding
+principle:
+
+ - If only A does something to it and B does not touch it, take
+   whatever A does.
+
+ - If only B does something to it and A does not touch it, take
+   whatever B does.
+
+ - If both A and B do something but in the same way, take
+   whatever they do.
+
+ - If A and B do something but different things, we need a
+   3-way merge:
+
+   - We cannot do anything about the following cases:
+
+     * O does not have it. A and B both must be adding to the
+       same path independently.
+
+     * A deletes it. B must be modifying.
+
+   - Otherwise, A and B are modifying. Run 3-way merge.
+
+
+First, the case matrix.
+
+ - Vertical axis is for A's actions.
+ - Horizontal axis is for B's actions.
+
+.-----------------------------------------------------------.
+| A\B       | No Action | Delete   | Modify   | Add         |
+|-----------+-----------+----------+----------+-------------|
+| No Action | select O  | delete   | select B | select B    |
+|-----------+-----------+----------+----------+-------------|
+|
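The guiding principle quoted in the testcase's commentary fits in a few
lines of code. Here is an illustrative Python sketch (not part of the
patch) of the per-path decision, with None standing for "path absent in
that tree" and everything unresolvable collapsed into "3-way":

```python
def merge_path(o, a, b):
    """Decide the merge result for one path.  o/a/b are the contents
    of the path in the ancestor and the two branches, or None if the
    path is absent there.  Returns the winning content, None for
    'delete', or the string '3-way' when a real merge is needed."""
    if a == o:          # A did not touch it: take whatever B did
        return b
    if b == o:          # B did not touch it: take whatever A did
        return a
    if a == b:          # both did the same thing: take it
        return a
    return "3-way"      # both changed it, differently

# The MN case from the sample output: A modified, B did not touch it.
assert merge_path("orig", "new", "orig") == "new"
```

Note that both "independent additions of different contents" (o is
None, a != b) and "A deletes while B modifies" fall through to the
"3-way" branch, matching the "we cannot do anything" cases above.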
[PATCH 2/2] Add --stage to show-files for new stage dircache.
This adds a --stage option to the show-files command. It shows
file-mode, SHA1, stage and pathname. The record separator follows the
usual convention of the -z option as before.

The patch is on top of the byte-order fix for create_ce_flags in my
previous message.

Signed-off-by: Junio C Hamano <[EMAIL PROTECTED]>
---
 cache.h      |   12 +++-
 show-files.c |   22 ++
 2 files changed, 25 insertions(+), 9 deletions(-)

--- cache.h	2005-04-16 03:02:36.0 -0700
+++ cache.h=show-files-stage-flags	2005-04-16 02:48:47.0 -0700
@@ -65,8 +65,14 @@
 #define CE_NAMEMASK  (0x0fff)
 #define CE_STAGEMASK (0x3000)
+#define CE_STAGESHIFT 12
 
-#define create_ce_flags(len, stage) htons((len) | ((stage) << 12))
+#define create_ce_flags(len, stage) htons((len) | ((stage) << CE_STAGESHIFT))
+#define ce_namelen(ce) (CE_NAMEMASK & ntohs((ce)->ce_flags))
+#define ce_size(ce) cache_entry_size(ce_namelen(ce))
+#define ce_stage(ce) ((CE_STAGEMASK & ntohs((ce)->ce_flags)) >> CE_STAGESHIFT)
+
+#define cache_entry_size(len) ((offsetof(struct cache_entry,name) + (len) + 8) & ~7)
 
 const char *sha1_file_directory;
 struct cache_entry **active_cache;
@@ -75,10 +81,6 @@
 #define DB_ENVIRONMENT "SHA1_FILE_DIRECTORY"
 #define DEFAULT_DB_ENVIRONMENT ".git/objects"
 
-#define cache_entry_size(len) ((offsetof(struct cache_entry,name) + (len) + 8) & ~7)
-#define ce_namelen(ce) (CE_NAMEMASK & ntohs((ce)->ce_flags))
-#define ce_size(ce) cache_entry_size(ce_namelen(ce))
-
 #define alloc_nr(x) (((x)+16)*3/2)
 
 /* Initialize and use the cache information */
--- show-files.c
+++ show-files.c	2005-04-16 02:58:32.0 -0700
@@ -14,6 +14,7 @@
 static int show_cached = 0;
 static int show_others = 0;
 static int show_ignored = 0;
+static int show_stage = 0;
 
 static int line_terminator = '\n';
 
 static const char **dir;
@@ -108,10 +109,19 @@
 		for (i = 0; i < nr_dir; i++)
 			printf("%s%c", dir[i], line_terminator);
 	}
-	if (show_cached) {
+	if (show_cached | show_stage) {
 		for (i = 0; i < active_nr; i++) {
 			struct cache_entry *ce = active_cache[i];
-			printf("%s%c", ce->name, line_terminator);
+			if (!show_stage)
+				printf("%s%c", ce->name, line_terminator);
+			else
+				printf(/* "%06o %s %d %10d %s%c", */
+				       "%06o %s %d %s%c",
+				       ntohl(ce->ce_mode),
+				       sha1_to_hex(ce->sha1),
+				       ce_stage(ce),
+				       /* ntohl(ce->ce_size), */
+				       ce->name, line_terminator);
 		}
 	}
 	if (show_deleted) {
@@ -156,12 +166,16 @@
 			show_ignored = 1;
 			continue;
 		}
+		if (!strcmp(arg, "--stage")) {
+			show_stage = 1;
+			continue;
+		}
 
-		usage("show-files (--[cached|deleted|others|ignored])*");
+		usage("show-files [-z] (--[cached|deleted|others|ignored|stage])*");
 	}
 
 	/* With no flags, we default to showing the cached files */
-	if (!(show_cached | show_deleted | show_others | show_ignored))
+	if (!(show_stage | show_deleted | show_others | show_ignored))
 		show_cached = 1;
 
 	read_cache();
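The bit layout the patch relies on is easy to check outside C. Here is
a Python sketch of the same macros (illustrative only; it leaves out
the htons/ntohs byte swapping the real code does): the low 12 bits of
ce_flags hold the name length, and bits 12-13 hold the stage.

```python
# Mirror of CE_NAMEMASK / CE_STAGEMASK / create_ce_flags / ce_stage
# from the patch, minus the network-byte-order conversion.
CE_NAMEMASK   = 0x0fff
CE_STAGEMASK  = 0x3000
CE_STAGESHIFT = 12

def create_ce_flags(namelen, stage):
    return (namelen & CE_NAMEMASK) | (stage << CE_STAGESHIFT)

def ce_namelen(flags):
    return flags & CE_NAMEMASK

def ce_stage(flags):
    return (flags & CE_STAGEMASK) >> CE_STAGESHIFT

# "man/frotz.6" is 11 characters, registered at stage 3:
flags = create_ce_flags(len("man/frotz.6"), 3)
assert flags == 0x300b
assert ce_namelen(flags) == 11 and ce_stage(flags) == 3
```

This also shows why stages only go up to 3: two bits are all the flags
word has to spare next to the 12-bit name length.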
Re: Re: Re: Add clone support to lntree
Dear diary, on Sat, Apr 16, 2005 at 05:16:12AM CEST, I got a letter
where Linus Torvalds <[EMAIL PROTECTED]> told me that...
> On Sat, 16 Apr 2005, Petr Baudis wrote:
> > I'm wondering whether each tree should be fixed to a certain
> > branch.
> 
> I'm wondering why you talk about branches at all. No such thing
> should exist. There are no branches. There are just repositories. You
> can track somebody else's repository, but you should track it by
> location, not by any branch name. And you track it by just merging
> it. Yeah, we don't have really usable merges yet, but..

First, this level of branches concerns multiple working directories
tied to a single repository. It seems like a sensible thing to do; and
you agreed with it too (IIRC). And when you do that, git-pasky just
saves some work for you.

For git-pasky, a branch is really just a symbolic name for a commit ID,
which gets updated every time you commit in some repository. Nothing
more. So the whole point of this is to have a symbolic name for some
other working directory. When you want to merge, you don't need to go
over to the other directory, do commit-id, cut'n'paste, and feed that
to "git merge". You just do

	git merge myotherbranch

Now, about remote repositories. When you pull a remote repository, that
does not mean it has to be immediately merged somewhere. It is very
useful to have another branch you do *not* want to merge, but you want
to do diffs against it, or even check it out / export it later to some
separate directory. Again, the branch is just a symbolic name for the
head commit ID of what you pulled, and the pointer gets updated every
time you pull again - that's the whole point of it.

The last concept is tracking working directories. If you pull the
tracked branch to this directory, it also automerges it. This is useful
when you have a single canonical branch for this directory, which it
should always mirror. That would be the case e.g. for the gazillions of
Linux users who would like to just have the latest bleeding kernel of
yours, and who expect to use git just like a different CVS. Basically,
they will just do "git pull" instead of "cvs update" :-).

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
Re: Merge with git-pasky II.
On Fri, Apr 15, 2005 at 08:32:46AM -0700, Linus Torvalds wrote:
> > In other words, I'm right. I'm always right, but sometimes I'm more
> > right than other times. And dammit, when I say "files don't
> > matter", I'm really really Right(tm).
>
> You're right, of course (All Hail Linus!), if you can make it work
> efficiently enough.
>
> Just to put something else on the table, here's how I'd go about
> tracking renames and the like, in another world where Linus /does/
> make the odd mistake - it's basically a unique id for files in the
> repository, added when the file is first recognised and updated when
> update-cache adds a new version to the cache. Renames copy the id
> across to the new name, and add it into the cache. This gives you an
> O(n) way to tell what file was what across renames, and it might
> even be useful in Linus' world, or if someone wanted to build a
> traditional SCM on top of a git-a-like. Attached is a patch, and a
> rename-file.c to use it.

Simon, given that you have multiple machines creating files, how do you
deal with the idea of the same 'unique id' being assigned to different
files by different machines?

David Lang

-- 
There are two ways of constructing a software design. One way is to
make it so simple that there are obviously no deficiencies. And the
other way is to make it so complicated that there are no obvious
deficiencies.
 -- C.A.R. Hoare
Re: space compression (again)
we already have the concept of objects that contain objects and
therefore don't need to be re-checked (directories); the chunks inside
a file could be the same type of thing. Currently we say that if the
hash on the directory is the same, we don't need to re-check each of
the files in that directory; this would be that if the hash on the file
hasn't changed, we don't need to re-check the chunks inside that file.

David Lang

On Fri, 15 Apr 2005, Ray Heasman wrote:

> Date: Fri, 15 Apr 2005 12:33:03 -0700
> From: Ray Heasman <[EMAIL PROTECTED]>
> To: git@vger.kernel.org
> Subject: Re: space compression (again)
>
> Sorry for this email not threading properly; I have been lurking on
> the mailing list archives and just had to reply to this message. I
> was planning to ask exactly this question, and Scott beat me to it.
> I even wanted to call them chunks too. :-)
>
> It's probably worthwhile for anyone discussing this subject to read
> this link: http://www.cs.bell-labs.com/sys/doc/venti/venti.pdf .
> I know it's been posted before, but it really is worth reading. :-)
>
> On Fri, 15 Apr 2005, Linus Torvalds wrote:
> > On Fri, 15 Apr 2005, C. Scott Ananian wrote:
> > > Why are blobs per-file? [After all, Linus insists that files are
> > > an illusion.] Why not just have 'chunks', and assemble *these*
> > > into blobs (read, 'files')? A good chunk size would fit evenly
> > > into some number of disk blocks (no wasted space!).
> >
> > I actually considered that. I ended up not doing it, because it's
> > not obvious how to block things up (and even more so because while
> > I like the notion, it flies in the face of the other issues I had:
> > performance and simplicity).
>
> I don't think it's as bad as you think. Let's conceptually have two
> types of files - Pobs (Proxy Objects, or Pointer Objects), and
> chunks. Both are stored and referenced by their content hash, as
> usual. Pobs just contain a list of hashes referencing the chunks in
> a file.
>
> When a file is initially stored, we chunk it so each chunk fits
> comfortably in a block, but otherwise we aren't too critical about
> sizes. When a file is changed (say, a single line edit), we update
> the chunk that contains that line, hash it and store it with its new
> name, and update the Pob, which we rehash and re-store. If a chunk
> grows to be very large (say 2 disk blocks), we can rechunk it and
> update the Pob to include the new chunks.
>
> > The problem with chunking is:
> >
> >  - it complicates a lot of the routines. Things like "is this file
> >    unchanged" suddenly become "is this file still the same set of
> >    chunks", which is just a _lot_ more code and a lot more likely
> >    to have bugs.
>
> You're half right; it will be more complex, but I don't think it's
> as bad as you think. Pobs are stored by hash just like anything
> else. If some chunks are different, the pob is different, which
> means it has a different hash. It's exactly the same as dealing with
> a changed file now. Sure, when you have to fetch the data, you have
> to read the pob and get a list of chunks to concatenate and return,
> but your example given doesn't change.
>
> >  - you have to find a blocking factor. I thought of just going
> >    with fixed chunks, and that just doesn't help at all.
>
> Just use the block size of the filesystem. Some filesystems do tail
> packing, so space isn't an issue, though speed can be. We don't
> actually care how big a chunk is, except to make it easy on the
> filesystem. Individual chunks can be any size.
>
> >  - we already have wasted space due to the low-level filesystem
> >    (as opposed to git) usually being block-based, which means that
> >    space utilization for small objects tends to suck. So you
> >    really want to prefer objects that are several kB (compressed),
> >    and a small block just wastes tons of space.
>
> If a chunk is smaller than a disk block, this is true. However, if
> we size it right this is no worse than any other file. Small files
> (less than a block) can't be made any larger, so they waste space
> anyway. Large files end up wasting space in one block unless they
> are a perfect multiple of the block size. When we increase the size
> of a chunk, it will waste space, but we would have created an entire
> new file, so we win there too. Admittedly, Pobs will be wasting
> space too. On the other hand, I use ReiserFS, so I don't care. ;-)
>
> >  - there _is_ a natural blocking factor already. That's what a
> >    file boundary really is within the project, and finding any
> >    other is really quite hard.
>
> Nah. I think I've made a good case it isn't.
>
> > So I'm personally 100% sure that it's not worth it. But I'm not
> > opposed to the _concept_: it makes total sense in the "filesystem"
> > view, and is 100% equivalent to having an inode with pointers to
> > blocks. I just don't think the concept plays out well in reality.
>
> Well, the reason I think this would be worth it is that you really
> win when you have multiple parallel copies of a source tree, and
> changes are cheaper too. If you store all the chunks for all your
> git repositories in one place, and otherwise treat your trees of
> Pobs as the real repository, your copied trees only cost you space
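Ray's chunk-plus-Pob scheme is easy to prototype. Here is a toy Python
sketch (purely illustrative -- git has nothing like this, and CHUNK,
put_file, etc. are made-up names) showing the claimed win: editing one
chunk of a two-chunk file adds only the new chunk and a new pob to the
shared store, while the unchanged chunk is deduplicated by its hash.

```python
# Toy content-addressed store with fixed-size chunks and "pobs"
# (lists of chunk hashes, themselves stored by hash).
import hashlib

CHUNK = 4096            # pretend disk-block-sized chunks
store = {}              # hash -> bytes, the shared object store

def put(data: bytes) -> str:
    h = hashlib.sha1(data).hexdigest()
    store[h] = data
    return h

def put_file(data: bytes) -> str:
    chunks = [put(data[i:i + CHUNK]) for i in range(0, len(data), CHUNK)]
    return put("\n".join(chunks).encode())    # store the pob, return its hash

def get_file(pob_hash: str) -> bytes:
    return b"".join(store[h] for h in store[pob_hash].decode().split("\n"))

v1 = put_file(b"a" * CHUNK + b"b" * CHUNK)    # 2 chunks + 1 pob stored
n1 = len(store)
v2 = put_file(b"a" * CHUNK + b"c" * CHUNK)    # edit only the second chunk
assert get_file(v2) == b"a" * CHUNK + b"c" * CHUNK
assert len(store) == n1 + 2                   # one new chunk + one new pob
```

This is exactly the "inode with pointers to blocks" equivalence Linus
concedes above; the open question he raises -- whether the extra
indirection is worth the complexity -- is not answered by the sketch.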
Re: SHA1 hash safety
* David Lang <[EMAIL PROTECTED]> wrote:

> this issue was raised a few days ago in the context of someone
> tampering with the files, and it was decided that the extra checks
> were good enough to prevent this (at least for now), but what about
> accidental collisions?
>
> if I am understanding things right, the objects get saved in the
> filesystem in filenames that are the SHA1 hash. if two legitimate
> files have the same hash I don't see any way for both of them to
> exist.
>
> yes the risk of any two files having the same hash is low, but in
> the earlier thread someone chimed in and said that they had two
> files on their system that had the same hash..

you can add -DCOLLISION_CHECK to Makefile:CFLAGS to turn on collision
checking (disabled currently).

If there indeed exist two files that have different content but the
same hash, could someone send those two files?

	Ingo
Re: SHA1 hash safety
Three points:

(1) I _have_ seen real-life collisions with MD5, in the context of
    document management systems containing ~10^6 MS-Word documents.

(2) The MAC (ethernet hardware address) of any interface _should_ help
    to make a unique id.

(3) While I haven't looked at the details of the plumbing, this is the
    time to make sure we can, easily, drop in SHA-160, SHA-256 (or
    whatever comes from NIST) when needed.

David Lang wrote:
> On Sat, 16 Apr 2005, Ingo Molnar wrote:
> > * David Lang <[EMAIL PROTECTED]> wrote:
> > > this issue was raised a few days ago in the context of someone
> > > tampering with the files, and it was decided that the extra
> > > checks were good enough to prevent this (at least for now), but
> > > what about accidental collisions?
> > >
> > > if I am understanding things right, the objects get saved in the
> > > filesystem in filenames that are the SHA1 hash. if two
> > > legitimate files have the same hash I don't see any way for both
> > > of them to exist.
> > >
> > > yes the risk of any two files having the same hash is low, but
> > > in the earlier thread someone chimed in and said that they had
> > > two files on their system that had the same hash..
> >
> > you can add -DCOLLISION_CHECK to Makefile:CFLAGS to turn on
> > collision checking (disabled currently). If there indeed exist two
> > files that have different content but the same hash, could someone
> > send those two files?
>
> remember that the flap over SHA1 being 'broken' a couple of weeks
> ago was not from researchers finding multiple files with the same
> hash, but finding that it was more likely than expected that files
> would have the same hash.
>
> there was a discussion on LKML within the last year about using MD5
> hashes for identifying unique filesystem blocks (with the idea of
> being able to merge identical blocks), and in that discussion it was
> pointed out that collisions are a known real-life issue.
>
> so if collision detection is turned on in git, does that make it
> error out if it runs into a second file with the same hash, or does
> it do something else?
>
> David Lang

-- 
Brian
Proposal for simplification and improvement of the git model
In this message, a method to simplify, and at the same time make more
powerful, the git abstraction is presented. I believe that the
enhancements I propose make git adhere even more to its spirit and make
it more intuitive.

The proposal makes it much easier to build an SCM over git, obtaining
in particular the following advantages:

- Blob and tree objects become symmetric
- Commit objects are removed (their data is put inside tree objects)
- Commit comments are per-file
- A tree in a repository looks like a repository itself, with full
  version information (now only the one mentioned in the commit object
  has version information)
- File and directory renames are tracked
- Renames are tracked regardless of the way they are made (even with
  cp and rm)
- Commit comments can be updated at any time by whoever made the change
- Doing the "blame" operation is trivial
- Minimizing disk space usage (at the expense of speed) by storing
  diffs is easily doable

The basic idea is that rather than having single blob or tree revisions
as the base concept, the abstract base unit is the whole set of
modifications, with comments, leading to that state. Of course,
tracking that would be extremely space-inefficient, so we instead track
the current file contents, plus the public key of the author and the
hashes of all parents.
This is implemented with the following changes to git:

- The commit object is removed

- Each tree must have a .git-commit file that contains the information
  previously in the commit object (only for immediate children, thus
  having a .git-commit file in each directory), but with the author
  public key instead of the comments

- Each blob will be hashed as the blob contents plus a header in a
  canonical format that contains data similar to the data in the
  .git-commit file

- When checked out, the blob header is put in a C/C++ comment, a "#"
  comment, or, if the file format is unknown, in an extended attribute
  or a separate file

  An example of a C/C++ file with metadata is the following:

	// @parent SHA1_OF_PARENT1 @parent SHA1_OF_PARENT2
	// @author FINGERPRINT_OF_AUTHOR_PUBLIC_KEY

	#include <stdio.h>

	int main(int argc, char** argv) {
		printf("Hello, world!\n");
		return 0;
	}

  Note that @parent and @author in checked-out files are NOT the same
  as the ones in the repository but are crafted so that there is a
  single @parent pointing to the repository file, and @author is taken
  from $HOME/.gitrc

- When the file is checked in, the header is parsed and removed.

  * If there is a single parent, its header is added and the resulting
    buffer is hashed and compared with the parent's hash. If equal,
    the file is unchanged and not committed.

  * Otherwise, the header data is added in a canonical format and the
    buffer is hashed and committed

- A new class of objects is added, not named by their hash, but rather
  by a public key (or a fingerprint of it), a timestamp and a name. The
  object is correct if and only if the contents plus name and timestamp
  are signed with the private key corresponding to the public key in
  the name. Object names are formatted as id/name/args, where id is a
  uuid or url that makes the id/name unique, name is the name, and args
  is additional data. File names formatted like git/c/sha1 are
  interpreted as commit comments for object sha1.
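The check-in rule in the proposal -- hash the contents under the single
parent's own header, and skip the commit on a match -- can be sketched
in a few lines. This is an illustration of the proposal only (nothing
git actually does), and the names (commit, store) are made up:

```python
# Sketch of the proposed hashing rule: an object's name is the SHA1 of
# a canonical "@parent/@author" header plus the contents, so a
# check-in with one parent and identical contents reproduces the
# parent's hash and is detected as "unchanged" for free.
import hashlib

store = {}   # hash -> (header bytes, content bytes)

def commit(contents: bytes, parents, author: str) -> str:
    # Single-parent unchanged check: rehash under the parent's header.
    if len(parents) == 1 and parents[0] in store:
        phdr, _ = store[parents[0]]
        if hashlib.sha1(phdr + contents).hexdigest() == parents[0]:
            return parents[0]          # unchanged: nothing to commit
    header = ("".join(f"@parent {p}\n" for p in parents)
              + f"@author {author}\n").encode()
    h = hashlib.sha1(header + contents).hexdigest()
    store[h] = (header, contents)
    return h

v1 = commit(b"hello\n", [], "KEY1")
v2 = commit(b"hello\n", [v1], "KEY2")  # same contents: no new object
assert v2 == v1
v3 = commit(b"world\n", [v1], "KEY2")  # real change: new object
assert v3 != v1
```

Note the cost this makes visible: because the header participates in
the hash, the same contents under different parents or authors are
distinct objects, which is exactly the per-file history the proposal
wants.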
- For storage or network transmission purposes, a binary diff against
  the parents can be stored instead of the contents of an object. This
  will of course require walking the whole history to rebuild it, but
  smarter schemes are possible (e.g. keyframes, jump diffs, etc.).
Re: full kernel history, in patchset format
* Ingo Molnar <[EMAIL PROTECTED]> wrote:

> the patches contain all the existing metadata, dates, log messages
> and revision history. (What i think is missing is the BK tree merge
> information, but i'm not sure we want/need to convert them to GIT.)

author names are abbreviated, e.g. 'viro' instead of
[EMAIL PROTECTED], and no committer information is included (albeit the
committer ought to be Linus in most cases). These are limitations of
the BK-CVS gateway, i think.

	Ingo
Re: git-pasky file mode handling
Dear diary, on Sat, Apr 16, 2005 at 11:45:59AM CEST, I got a letter
where Russell King <[EMAIL PROTECTED]> told me that...
> Hi,

Hello,

> It seems that there's something weird going on with the file mode
> handling.
>
> Firstly, some files in the git-pasky repository have mode 0664 while
> others have 0644.
>
> Having pulled from git-pasky a number of times, with Petr's being
> the tracked repository, I now find that when I do an "update-cache
> --refresh", it complains that the files need updating, despite
> show-diff showing no differences. Investigating, this appears to be
> because the file modes are wrong for a number of the files. All my
> files do not have group write.

this was a problem with "git apply", which did not apply mode changes
correctly until recently. If you have no local changes,

	checkout-cache -f -a

should fix this. Hopefully.

> I notice in the changelog what appears to be a dependence on the
> umask. If this is so, please note that git appears to track the file
> modes, and any dependence upon the umask is likely to screw with
> this tracking.

I personally don't think I like the mode tracking at all. Some people
(Linus?) may want to have group +w. Other people (me) have their
default group as 'users', and I definitely don't want everyone to be
able to write to the files. :-) I think we should track only whether
the file is executable or not. Linus?

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
Re: full kernel history, in patchset format
Ingo Molnar <[EMAIL PROTECTED]> :
[...]
> the history data starts at 2.4.0 and ends at 2.6.12-rc2. I've
> included a script that will apply all the patches in order and will
> create a pristine 2.6.12-rc2 tree.

127 weeks of bk-commit mail for the 2.6 branch alone since October 2002
provides more than 44000 messages here. The figures are surprisingly
different.

> it needed many hours to finish, on a very fast server with tons of
> RAM, and it also needed a fair amount of manual work to extract it
> and to make it usable, so i guessed others might want to use the end
> result as well, to try and generate large GIT repositories from them
> (or to run analysis over the patches, etc.).

Has anyone already compared the (split/digested) content of the
ChangeLog file with the commit messages? It raises the interesting
question of inserting the merge messages/patches in the sequence at the
right place, but I'd like to know if someone met other issues.

-- 
Ueimor
Re: Re: SHA1 hash safety
Dear diary, on Sat, Apr 16, 2005 at 04:58:15PM CEST, I got a letter where C. Scott Ananian [EMAIL PROTECTED] told me that... On Sat, 16 Apr 2005, Brian O'Mahoney wrote: (1) I _have_ seen real-life collisions with MD5, in the context of Document management systems containing ~10^6 ms-WORD documents. Dude! You could have been *famous*! Why the aitch-ee-double-hockey-sticks didn't you publish this when you found it? Seriously, man. Even given the known weaknesses in MD5, it would take much more than a million documents to find MD5 collisions. I can only conclude that the hash was being used incorrectly; most likely truncated (my wild-ass guess would be to 32 bits; a collision is likely with 50% probability in a million document store for a hash of less than 40 bits). I know the current state of the art here. It's going to take more than just hearsay to convince me that full 128-bit MD5 collisions are likely. I believe there are only two or so known to exist so far, and those were found by a research team in China (which, yes, is fairly famous among the cryptographic community now after publishing a paper consisting of little apart from the two collisions themselves). http://cryptography.hyperlink.cz/MD5_collisions.html -- Petr Pasky Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
Re: [PATCH 3/2] merge-trees script for Linus git
On Sat, 16 Apr 2005, Junio C Hamano wrote: LT NOTE NOTE NOTE! I could make read-tree do some of these nontrivial LT merges, but I ended up deciding that only the matches in all three LT states thing collapses by default. * Understood and agreed. Having slept on it, I think I'll merge all the trivial cases that don't involve a file going away or being added. Ie if the file is in all three trees, but it's the same in two of them, we know what to do. That way we'll leave things where the tree itself changed (files added or removed at any point) and/or cases where you actually need a 3-way merge. The userland merge policies need ways to extract the stage information and manipulate them. Am I correct to say that you mean by ls-files -l the extracting part? No, I meant show-files, since we need to show the index, not a tree (no valid tree can ever have the modes information, since (a) it doesn't have the space for it anyway and (b) we refuse to write out a dirty index file). LT I should make ls-files have a -l format, which shows the LT index and the mode for each file too. You probably meant ls-tree. You used the word mode but it already shows the mode so I take it to mean stage. Perhaps something like this? $ ls-tree -l -r 49c200191ba2e3cd61978672a59c90e392f54b8b 100644 blob fe2a4177a760fd110e78788734f167bd633be8de COPYING 100644 blob b39b4ea37586693dd707d1d0750a9b580350ec50 :1 man/frotz.6 100644 blob b39b4ea37586693dd707d1d0750a9b580350ec50 :2 man/frotz.6 100664 blob eeed997e557fb079f38961354473113ca0d0b115 :3 man/frotz.6 Apart from the fact that it would be show-files -l since there are no tree objects that can have anything but fully merged state, yes. Assuming that you would be working on that, I'd like to take the dircache manipulation part. Let's think about the minimally necessary set of operations: * The merge policy decides to take one of the existing stages. In this case we need a way to register a known mode/sha1 at a path. 
We already have this as update-cache --cacheinfo. We just need to make sure that when update-cache puts things at stage 0 it clears other stages as well. * The merge policy comes up with a desired blob somewhere on the filesystem (perhaps by running an external merge program). It wants to register it as the result of the merge. We could do this today by first storing the desired blob in a temporary file somewhere in the path the dircache controls, update-cache --add the temporary file, ls-tree to find its mode/sha1, update-cache --remove the temporary file and finally update-cache --cacheinfo the mode/sha1. This is workable but clumsy. How about: $ update-cache --graft [--add] desired-blob path to say I want to register mode/sha1 from desired-blob, whose name may not satisfy verify_path(), at path in the dircache? * The merge policy decides to delete the path. We could do this today by first stashing away the file at the path if it exists, update-cache --remove it, and restoring it if necessary. This is again workable but clumsy. How about: $ update-cache --force-remove path to mean I want to remove the path from the dircache even though it may exist in my working tree? Yes. Am I on the right track? Exactly. You might want to go even lower level by letting them say something like: * update-cache --register-stage mode sha1 stage path Registers the mode/sha1 at stage for path. Does not look at the working tree. stage is [0-3] I'd prefer not. I'd avoid playing games with the stages at any other level than the full tree level until we show a real need for it. Let's go with the known-needed minimal cases that are high-level enough to make the scripting simple, and see if there is any reason to ever touch the tree any other way. Linus
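The stage-collapsing rule agreed on above can be sketched as follows. This is illustrative Python, not git's actual C dircache code: the index is modeled as a dict mapping (path, stage) to (mode, sha1), and both helper names are made up.

```python
# A sketch, in Python rather than git's C code, of the stage rule above:
# the index maps (path, stage) -> (mode, sha1), and registering a merge
# result at stage 0 must clear any leftover stage 1-3 entries.

def register_stage0(index, path, mode, sha1):
    """Record a merged result for path, collapsing the unmerged stages."""
    for stage in (1, 2, 3):
        index.pop((path, stage), None)      # drop original/ours/theirs entries
    index[(path, 0)] = (mode, sha1)

def force_remove(index, path):
    """Drop path from the index at every stage, ignoring the working tree."""
    for stage in (0, 1, 2, 3):
        index.pop((path, stage), None)
```

With the man/frotz.6 example from the ls-tree listing, registering a stage-0 result for that path would leave exactly one entry behind.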
Re: Merge with git-pasky II.
Hi, On Fri, 15 Apr 2005, David Woodhouse wrote: But if it can be done cheaply enough at a later date even though we end up repeating ourselves, and if it can be done _well_ enough that we shouldn't have just asked the user in the first place, then yes, OK I agree. The repetition could be helped by using a cache. Ciao, Dscho
Re: Re: Merge with git-pasky II.
On Sat, Apr 16, 2005 at 06:03:33PM +0200, Petr Baudis wrote: Dear diary, on Sat, Apr 16, 2005 at 05:55:37PM CEST, I got a letter where Simon Fowler [EMAIL PROTECTED] told me that... On Sat, Apr 16, 2005 at 05:19:24AM -0700, David Lang wrote: Simon given that you have multiple machines creating files, how do you deal with the idea of the same 'unique id' being assigned to different files by different machines? The id is a sha1 hash of the current time and the full path of the file being added - the chances of that being replicated without malicious intent is extremely small. There are other things that could be used, like the hostname, username of the person running the program, etc, but I don't really see them being necessary. Why not just use UUID? Hey, everything else in git seems to use sha1, so I just copied Linus' sha1 code ;-) All I wanted was something that had a good chance of being unique across any potential set of distributed repositories, to avoid the chance of accidental clashes. A sha1 hash of something that's not likely to be replicated is a simple way to do that. Simon -- PGP public key Id 0x144A991C, or http://himi.org/stuff/himi.asc (crappy) Homepage: http://himi.org doe #237 (see http://www.lemuria.org/DeCSS) My DeCSS mirror: ftp://himi.org/pub/mirrors/css/
Re: [PATCH 3/2] merge-trees script for Linus git
On Sat, 16 Apr 2005, Linus Torvalds wrote: Having slept on it, I think I'll merge all the trivial cases that don't involve a file going away or being added. Ie if the file is in all three trees, but it's the same in two of them, we know what to do. Junio, I pushed this out, along with the two patches from you. It's still more anal than my original tree-diff algorithm, in that it refuses to touch anything where the name isn't the same in all three versions (original, new1 and new2), but now it does the if two of them match, just select the result directly trivial merges. I really cannot see any sane case where user policy might dictate doing anything else, but if somebody can come up with an argument for a merge algorithm that wouldn't do what that trivial merge does, we can make a flag for don't merge at all. The reason I do want to merge at all in read-tree is that I want to avoid having to write out a huge index-file (it's 1.6MB on the kernel, so if you don't do _any_ trivial merges, it would be 4.8MB after reading three trees) and then having people read it and parse it just to do stuff that is obvious. Touching 5MB of data isn't cheap, even if you don't do a whole lot to it. Anyway, with the modified read-tree, as far as I can tell it will now merge all the cases where one side has done something to a file, and the other side has left it alone (or where both sides have done the exact same modification). That should _really_ cut down the cases to just a few files for most of the kernel merges I can think of. Does it do the right thing for your tests? Linus
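The trivial-merge rule Linus describes — take a result directly whenever two of the three versions agree, and otherwise leave the path unmerged — can be sketched like this. This is an illustration of the rule for a path present in all three trees, not the read-tree source; `trivial_merge` is a hypothetical helper name.

```python
def trivial_merge(orig, new1, new2):
    """Return the winning entry (e.g. a (mode, sha1) pair), or None when a
    real three-way merge is needed.  Applies only to a path that exists in
    all three trees, per the rule described above."""
    if new1 == new2:
        return new1     # both sides agree (unchanged, or identically changed)
    if orig == new1:
        return new2     # only the second tree touched the file
    if orig == new2:
        return new1     # only the first tree touched the file
    return None         # both changed it differently: leave stages 1-3 unmerged
```

This covers exactly the cases the mail lists: one side changed while the other left it alone, or both made the same change.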
Re: full kernel history, in patchset format
On Sat, 16 Apr 2005, Ingo Molnar wrote: i've converted the Linux kernel CVS tree into 'flat patchset' format, which gave a series of 28237 separate patches. (Each patch represents a changeset, in the order they were applied. I've used the cvsps utility.) the history data starts at 2.4.0 and ends at 2.6.12-rc2. I've included a script that will apply all the patches in order and will create a pristine 2.6.12-rc2 tree. Hey, that's great. I got the CVS repo too, and I was looking at it, but the more I looked at it, the more I felt that the main reason I want to import it into git ends up being to validate that my size estimates are at all realistic. I see that Thomas Gleixner seems to have done that already, and come to a figure of 3.2GB for the last three years, which I'm very happy with, mainly because it seems to match my estimates to a tee. Which means that I just feel that much more confident about git actually being able to handle the kernel long-term, and not just as a stop-gap measure. But I wonder if we actually want to actually populate the whole history.. Now that my size estimates have been verified, I have little actual real reason to put the history into git. There are no visualization tools done for git yet, and no helpers to actually find problems, and by the time there will be, we'll have new history. So I'd _almost_ suggest just starting from a clean slate after all. Keeping the old history around, of course, but not necessarily putting it into git now. It would just force everybody who is getting used to git in the first place to work with a 3GB archive from day one, rather than getting into it a bit more gradually. What do people think? I'm not so much worried about the data itself: the git architecture is _so_ damn simple that now that the size estimate has been confirmed, that I don't think it would be a problem per se to put 3.2GB into the archive. 
But it will bog down rsync horribly, so it will actually hurt synchronization until somebody writes the rev-tree-like stuff to communicate changes more efficiently.. IOW, it smells to me like we don't have the infrastructure to really work with 3GB archives, and that if we start from scratch (2.6.12-rc2), we can build up the infrastructure in parallel with starting to really need it. But it's _great_ to have the history in this format, especially since looking at CVS just reminded me how much I hated it. Comments? Linus
Re: full kernel history, in patchset format
On Sat, 2005-04-16 at 10:04 -0700, Linus Torvalds wrote: So I'd _almost_ suggest just starting from a clean slate after all. Keeping the old history around, of course, but not necessarily putting it into git now. It would just force everybody who is getting used to git in the first place to work with a 3GB archive from day one, rather than getting into it a bit more gradually. Sure. We can export the 2.6.12-rc2 version of the git'ed history tree and start from there. Then the first changeset has a parent, which just lives in a different place. That's the only difference to your repository, but it would change the sha1 sums of all your changesets. What do people think? I'm not so much worried about the data itself: the git architecture is _so_ damn simple that now that the size estimate has been confirmed, that I don't think it would be a problem per se to put 3.2GB into the archive. But it will bog down rsync horribly, so it will actually hurt synchronization until somebody writes the rev-tree-like stuff to communicate changes more efficiently.. We have all the tracking information in SQL and we will post the database dump soon, so people interested in revision tracking can use this as an information base. But it's _great_ to have the history in this format, especially since looking at CVS just reminded me how much I hated it. :) One remark on the tree blob storage format. The binary storage of the sha1sum of the referred object is a PITA for scripting. Converting the ASCII -> binary for the sha1sum comparison should not take much longer than the binary -> ASCII conversion for the file reference. Can this be changed? tglx
Re: Re: full kernel history, in patchset format
Dear diary, on Sat, Apr 16, 2005 at 09:23:40PM CEST, I got a letter where Thomas Gleixner [EMAIL PROTECTED] told me that... One remark on the tree blob storage format. The binary storage of the sha1sum of the referred object is a PITA for scripting. Converting the ASCII -> binary for the sha1sum comparison should not take much longer than the binary -> ASCII conversion for the file reference. Can this be changed? Huh, you aren't supposed to peek into trees directly. What's wrong with ls-tree? -- Petr Pasky Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
Re: full kernel history, in patchset format
* A script git-archive-tar is used to create a base tarball that roughly corresponds to linux-*.tar.gz. This works as follows: $ git-archive-tar C [B1 B2...] This reads the named commit C, grabs the associated tree (i.e. its sub-tree objects and the blobs they refer to), and makes a tarball of ??/?? files. The tarball does not have to contain any extra information to reproduce any ancestor of the named commit. alternatively, git-archive-torrent to create a list of files for a bittorrent feed -- Mike Taht
Re: Re: Re: full kernel history, in patchset format
Dear diary, on Sat, Apr 16, 2005 at 08:32:32PM CEST, I got a letter where Petr Baudis [EMAIL PROTECTED] told me that... Dear diary, on Sat, Apr 16, 2005 at 09:23:40PM CEST, I got a letter where Thomas Gleixner [EMAIL PROTECTED] told me that... One remark on the tree blob storage format. The binary storage of the sha1sum of the referred object is a PITA for scripting. Converting the ASCII -> binary for the sha1sum comparison should not take much longer than the binary -> ASCII conversion for the file reference. Can this be changed? Huh, you aren't supposed to peek into trees directly. What's wrong with ls-tree? (I meant, you aren't supposed to peek into trees from scripts. Or well, not not supposed, but it does not make much sense when you have ls-tree.) -- Petr Pasky Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
Re: Re: full kernel history, in patchset format
On Sat, 2005-04-16 at 20:32 +0200, Petr Baudis wrote: Dear diary, on Sat, Apr 16, 2005 at 09:23:40PM CEST, I got a letter where Thomas Gleixner [EMAIL PROTECTED] told me that... One remark on the tree blob storage format. The binary storage of the sha1sum of the referred object is a PITA for scripting. Converting the ASCII -> binary for the sha1sum comparison should not take much longer than the binary -> ASCII conversion for the file reference. Can this be changed? Huh, you aren't supposed to peek into trees directly. What's wrong with ls-tree? Why am I not supposed to? Is this evil? My export script has all the data available, so I write the tree refs directly. The full export runs ~1 hour. That's long enough :) I tried the git way and it slows me down by factor BIG (I don't remember the number). Also, for reference tracking all the information might be available e.g. in a database. Why should the revtool then use some tool to retrieve information which is already there? tglx
Re: full kernel history, in patchset format
On Sat, 16 Apr 2005, Thomas Gleixner wrote: One remark on the tree blob storage format. The binary storage of the sha1sum of the referred object is a PITA for scripting. Converting the ASCII -> binary for the sha1sum comparison should not take much longer than the binary -> ASCII conversion for the file reference. Can this be changed? I'd really rather not. Why don't you just use ls-tree for scripting? That's why it exists in the first place. It might make sense to have some simple selection capabilities built into ls-tree (ie ls-tree --match drivers/char/ -z treesha1 to get just a subtree out), but that depends entirely on how you end up using it. The fact is, there should _never_ be any reason to look at the objects themselves directly. cat-file is a debugging aid, it shouldn't be scripted (with the possible exception of cat-file blob to just extract the blob contents, since that object doesn't have any internal structure). That level of abstraction (we never look directly at the objects) is what allows us to change the object structure later. For example, we already changed the commit date thing once, and the tree object has obviously evolved a bit, and if we ever change the hash, the objects will change too, but if you always just script them using nice helper tools, you won't ever need to _care_. And that's how it should be. If there's a tool missing, holler. THAT is the part I've been trying to write: all the plumbing so that you _can_ script the thing sanely, and not worry about how objects are created and worked with. For example, that index file format likely _will_ change. I ended up doing the new stage flags in a way that kept the index file compatible with old ones, but I did that mainly because it also happened to be the easiest way to enforce the rule I wanted to enforce (ie the stage really _is_ a part of the filename from a compare filenames standpoint, in order to make sure that the stages are always ordered). 
So if the index file change hadn't had that property, I'd have just said I'll change the format, and anybody who tried to parse the index file would have been _broken_. Linus
Re: full kernel history, in patchset format
JCH == Junio C Hamano [EMAIL PROTECTED] writes: JCH I have been cooking this idea before I dove into the merge stuff JCH and did not have time to implement it myself (Hint Hint), but I JCH think something along the following lines would work nicely: It should be fairly obvious from the context what I meant to say, but in case somebody gets confused by my inaccurate description of small details (or, before somebody nitpicks ;-), I'd add some clarifications and corrections. JCH * Run diff-tree between neighboring commits [*1*] to find out JCH the set of blobs that are related. Extract those related JCH blobs and run diff [*2*] between them to see if it produces JCH a patch smaller than the whole thing when compressed. If JCH diff+patch is a win, then we do not have to transmit the blob JCH that we could reproduce by sending the diff. Note that fact. I talked only about blobs here, but I really mean all types: commits, trees and blobs here. Nothing prevents us from extracting the raw data for trees and commits and running diff between them. We can use cat-file to do that today. What we do not have is the reverse of $ cat-file type rawdata (i.e. $ write-file type rawdata), but that is trivial to write. The raw data for related tree objects should delta well. I do not think it is worth the effort to attempt delta for commit objects. Anything that git-archive-tar decides not to send in diff+patch form, be it blob or tree or commit, should be noted here, not just blob as my previous message incorrectly implies. JCH Given the above, the operation of git-archive-patch is also JCH quite obvious. Extract the diff package tarball into the JCH objects/ directory that has (at least) the full Bn, uncompress JCH the patch file part, and run patch on it. Of course after you ran patch to reproduce the raw data for the blob or tree, we need the reverse of cat-file to register such data under the objects/ hierarchy. 
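The "send a diff only when it wins" decision Junio describes can be sketched like this. Illustrative only: it operates on text lines via difflib, while the real archiver would work on raw object data, and `worth_sending_as_diff` is a made-up name.

```python
import difflib
import zlib

def worth_sending_as_diff(old_text, new_text):
    """Ship a diff only when it compresses smaller than the whole new blob.

    A sketch of the transfer decision described above: diff the two
    versions, compress both the diff and the full new content, and
    compare sizes.
    """
    diff = "".join(difflib.unified_diff(old_text.splitlines(True),
                                        new_text.splitlines(True)))
    return (len(zlib.compress(diff.encode()))
            < len(zlib.compress(new_text.encode())))
```

For a large file with a one-line change, the compressed diff is far smaller than the compressed blob, so the diff wins.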
Re: full kernel history, in patchset format
On Sat, 2005-04-16 at 11:44 -0700, Linus Torvalds wrote: That level of abstraction (we never look directly at the objects) is what allows us to change the object structure later. For example, we already changed the commit date thing once, and the tree object has obviously evolved a bit, and if we ever change the hash, the objects will change too, but if you always just script them using nice helper tools, you won't ever need to _care_. And that's how it should be. For the export stuff it's terribly slow. :( I agree that using common tools is good. But we are also talking about an open format, so using a script to speed up certain tasks is not bad at all. tglx
Re: Re: full kernel history, in patchset format
On Sat, Apr 16, 2005 at 07:43:27PM +0200, Petr Baudis wrote: Dear diary, on Sat, Apr 16, 2005 at 07:04:31PM CEST, I got a letter where Linus Torvalds [EMAIL PROTECTED] told me that... So I'd _almost_ suggest just starting from a clean slate after all. Keeping the old history around, of course, but not necessarily putting it into git now. It would just force everybody who is getting used to git in the first place to work with a 3GB archive from day one, rather than getting into it a bit more gradually. Comments? FWIW, it looks pretty reasonable to me. Perhaps we should have a separate GIT repository with the previous history though, and in the first new commit the parent could point to the last commit from the other repository. Just if it isn't too much work, though. :-) I think we can make git use stackable repositories. When it fails to find an object, it will try to read it from the parent repository. It is useful to slice the history. I can have a local repository where all the new objects created by me are stored in my tree instead of the official one. Cleaning up the objects in my local tree will be much easier; it only needs to work on a much smaller repository. Once all my changes are merged into the official tree, I just simply empty my local repository. About the kernel git repository: I think it is much easier to just put it all in one tree, so I don't need to worry about doing something special to see pre-2.6.12 history. And the full repository needs to be stored on a server somewhere anyway. However, I totally agree that people should not have to deal with unnecessary history when they start using the git tools. We should just make the tools by default not download all the history. Only get it when the user specifically asks for it. Why 2.6.12-rc2? When the kernel grows to 2.6.15, a new user might not even need pre-2.6.13 most of the time. 
If we make it very easy for people to get history when they need it, it will make them less motivated to store unnecessary history locally (just in case I need it). I think we should not advise using rsync to sync the whole git tree as the way to get updates. We need to get used to having only a slice of the history and getting more if we need it. The server should provide some small metadata file, like the rev-tool cache, so the SCM tools can download it to figure out which files need to be downloaded to get to a certain revision, instead of downloading the whole repository to figure out what is new. We can even slice that metadata information into smaller pieces based on major release points. Chris
Re: full kernel history, in patchset format
On Sat, 2005-04-16 10:04:31 -0700, Linus Torvalds [EMAIL PROTECTED] wrote in message [EMAIL PROTECTED]: What do people think? I'm not so much worried about the data itself: the git architecture is _so_ damn simple that now that the size estimate has been confirmed, that I don't think it would be a problem per se to put 3.2GB into the archive. But it will bog down rsync horribly, so it will actually hurt synchronization until somebody writes the rev-tree-like stuff to communicate changes more efficiently.. IOW, it smells to me like we don't have the infrastructure to really work with 3GB archives, and that if we start from scratch (2.6.12-rc2), we can build up the infrastructure in parallel with starting to really need it. 3GB is quite some data, but I'd accept and prefer to download it from somewhere. I think that it's worth it. I accept that there are people out there which would love to get a smaller archive, but at least most developers that would actually use it for day-to-day work *do* have the bandwidth to download it. Maybe we'd also prepare (from time to time) bzip'ed tarballs, which I expect to be a tad smaller. MfG, JBG -- Jan-Benedict Glaw [EMAIL PROTECTED]. +49-172-7608481
Re: full kernel history, in patchset format
On Sat, 16 Apr 2005, Thomas Gleixner wrote: For the export stuff its terrible slow. :( I don't really see your point. If you already know what the tree is like you say, you don't care about the tree object. And if you don't know what the tree is, what _are_ you doing? In other words, show us what you're complaining about. If you're looking into the trees yourself, then the binary representation of the sha1 is already what you want. That _is_ the hash. So why do you want it in ASCII? And if you're not looking into the tree directly, but using cat-file tree and you were hoping to see ASCII data, then that's certainly not going to be any faster than just doing ls-tree instead. In other words, I don't see your point. Either you want ascii output for scripting, or you don't. First you claimed that you did, and that you would want the tree object to change in order to do so. Now you claim that you can't use ls-tree because it's too slow. That just isn't making any sense. You're mixing two totally different levels, and complaining about performance when scripting things. Yet you're talking about a 20-byte data structure that is trivial to convert to any format you want. What kind of _strange_ scripting architecture is so fast that there's a difference between cat-file and ls-tree and can handle 17,000 files in 60,000 revisions, yet so slow that you can't trivially convert 20 bytes of data? Linus
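Linus's point that the 20-byte binary hash is trivial to convert holds in any language; in Python, for instance, going between the binary form stored in a tree entry and the 40-character hex form printed by ls-tree is a one-liner each way (the `b"blob 0\0"` header below is the header git prepends to an empty blob, so this digest is git's well-known empty-blob object name):

```python
import binascii
import hashlib

# The 20 bytes stored in a tree entry and the 40-char hex form printed
# by ls-tree are the same hash; converting between them is trivial.
raw = hashlib.sha1(b"blob 0\0").digest()    # 20 binary bytes, as in a tree
hex_form = raw.hex()                        # 40 ASCII chars, as in ls-tree output
assert binascii.unhexlify(hex_form) == raw  # round-trips exactly
```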
Re: full kernel history, in patchset format
* David Mansfield [EMAIL PROTECTED] wrote: Ingo Molnar wrote: * Ingo Molnar [EMAIL PROTECTED] wrote: the patches contain all the existing metadata, dates, log messages and revision history. (What i think is missing is the BK tree merge information, but i'm not sure we want/need to convert them to GIT.) author names are abbreviated, e.g. 'viro' instead of [EMAIL PROTECTED], and no committer information is included (albeit the committer ought to be Linus in most cases). These are limitations of the BK-CVS gateway i think. Glad to hear cvsps made it through! I'm curious what the manual fixups required were, except for the binary file issue (logo.gif). --cvs-direct was needed to speed it up from 'several days to finish' to 'several hours to finish', but it crashed on a handful of patches [i used the latest devel snapshot so this isnt a complaint]. (one of the crashes was when generating 1860.patch.) Also, 'cvs rdiff' apparently emits an empty patch for diffs that remove a file that ends without having a newline character - but this isnt cvsps's problem. (grep for +++ in the patchset to find those cases.) As to the actual email addresses, for more recent patches, the Signed-off should help. For earlier ones, isn't there some script which 'knows' a bunch of canonical author-email mappings? (the shortlog script or something)? yeah, that's not that much of a problem, most of the names are unique, and the rest can be fixed up too. Is the full committer email address actually in the changeset in BK? If so, given that we have the unique id (immutable I believe) of the changeset, could it be extracted directly from BK? i think it's included in BK. Ingo
Re: full kernel history, in patchset format
* Linus Torvalds [EMAIL PROTECTED] wrote: the history data starts at 2.4.0 and ends at 2.6.12-rc2. I've included a script that will apply all the patches in order and will create a pristine 2.6.12-rc2 tree. Hey, that's great. I got the CVS repo too, and I was looking at it, but the more I looked at it, the more I felt that the main reason I want to import it into git ends up being to validate that my size estimates are at all realistic. I see that Thomas Gleixner seems to have done that already, and come to a figure of 3.2GB for the last three years, which I'm very happy with, mainly because it seems to match my estimates to a tee. [...] (yeah, we apparently worked in parallel - i only learned about his efforts after i sent my mail. He was using BK to extract info, i was using the CVS tree alone and no BK code whatsoever. (I dont think there will be any argument about who owns what, but i wanted to be on the safe side, and i also wanted to see how complete and usable the CVS metadata is - it's close to perfect i'd say, for the purposes i care about.)) But I wonder if we actually want to actually populate the whole history.. yeah, it definitely feels a bit brave to import 28,000 changesets into a source-code database project that will be a whopping 2 weeks old in 2 days ;) Even if we felt 100% confident about all the basics (which we do of course ;), it's just simply too young to tie things down via a 3.2GB database. It feels much more natural to grow it gradually, 28,000 changesets i'm afraid would just suffocate the 'project growth dynamics'. Not going too fast is just as important as not going too slow. I didnt generate the patchset to get it added into some central repository right now, i generated it to check that we _do_ have all the revision history in an easy to understand format which does generate today's kernel tree, so that we can lean back and worry about the full database once things get a bit more settled down (in a couple of months or so). 
It's also an easy testbed for GIT itself. but the revision history was one of the main reasons i used BK myself, so we'll need a merged database eventually. Occasionally i needed to check who was the one who touched a particular piece of code - was that fantastic new line of code written by me, or was that buggy piece of crap written by someone else? ;) Also, looking at a change and then going to the changeset that did it, and then looking at the full picture was pretty useful too. So that sort of annotation, and generally navigating around _quickly_ and looking at the 'flow' of changes going into a particular file was really useful (for me). Ingo - To unsubscribe from this list: send the line unsubscribe git in the body of a message to [EMAIL PROTECTED] More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] Get commits from remote repositories by HTTP
Hello! This adds a program to download a commit, the trees, and the blobs in them from a remote repository using HTTP. It skips anything you already have. Is it really necessary to write your own HTTP downloader? If so, is it necessary to forget basic stuff like the Host: header? ;-) If you feel that it should be optimized for speed, then at least use persistent connections.

+	if (memcmp(target, "http://", 7))
+		return -1;

Can crash if the string is too short.

+	entry = gethostbyname(name);
+	memcpy(&sockad.sin_addr.s_addr,
+	       &((struct in_addr *)entry->h_addr)->s_addr, 4);

Can crash if the host doesn't exist or if you feed it a URL containing a port number.

+static int get_connection() (void)

+	local = open(filename, O_WRONLY | O_CREAT | O_EXCL, 0666);

What if it fails? Have a nice fortnight -- Martin `MJ' Mares [EMAIL PROTECTED] http://atrey.karlin.mff.cuni.cz/~mj/ Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth A student who changes the course of history is probably taking an exam.
Re: [PATCH] Get commits from remote repositories by HTTP
On Sat, 16 Apr 2005, Tony Luck wrote: On 4/16/05, Daniel Barkalow [EMAIL PROTECTED] wrote: +	buffer = read_sha1_file(sha1, type, &size); You never free this buffer. Ideally, this should all be rearranged to share the code with read-tree, and it should be fixed in common. It would also be nice if you saved tree objects in some temporary file and did not install them until after you had fetched all the blobs and trees that this tree references. Then if your connection is interrupted you can just restart it. It looks over everything relevant, even if it doesn't need to download anything, so it should work to continue if it stops in between. -Daniel *This .sig left intentionally blank*
Re: [PATCH] Get commits from remote repositories by HTTP
Tony Luck wrote: Otherwise this looks really nice. I was going to script something similar using wget ... but that would have made zillions of separate connections. Not so kind to the server. How about building a file list and doing a batch download via 'wget -i /tmp/foo'? A quick test (on my ancient wget-1.7) indicates that it reuses connections when successive URLs point to the same server. Writing yet another http client does seem a bit pointless, what with wget and curl available. The real win lies in creating the smarts to get the minimum number of files. --Adam
Re: [PATCH] Get commits from remote repositories by HTTP
On Sun, 17 Apr 2005, Martin Mares wrote: Hello! This adds a program to download a commit, the trees, and the blobs in them from a remote repository using HTTP. It skips anything you already have. Is it really necessary to write your own HTTP downloader? If so, is it necessary to forget basic stuff like the Host: header? ;-) I wanted to get something hacked quickly; can you suggest a good one to use? If you feel that it should be optimized for speed, then at least use persistent connections. That's the next step. -Daniel *This .sig left intentionally blank*
Re: [PATCH] Get commits from remote repositories by HTTP
On Sat, 16 Apr 2005, Adam Kropelin wrote: Tony Luck wrote: Otherwise this looks really nice. I was going to script something similar using wget ... but that would have made zillions of separate connections. Not so kind to the server. How about building a file list and doing a batch download via 'wget -i /tmp/foo'? A quick test (on my ancient wget-1.7) indicates that it reuses connections when successive URLs point to the same server. You need to look at some of the files before you know what other files to get. You could do it in waves, but that would be excessively complicated to code and not the most efficient anyway. -Daniel *This .sig left intentionally blank*
Re: SHA1 hash safety
that's the difference between CS researchers and sysadmins. sysadmins realize that there are an infinite number of files that map to the same hash value and plan accordingly (because we KNOW we will run across them eventually), and don't see it as a big deal when we finally do. CS researchers quote statistics that show how hard it is to intentionally create two files with the same hash and insist it just doesn't happen until presented with the proof, at which point it is a big deal. a difference in viewpoints. David Lang On Sat, 16 Apr 2005, C. Scott Ananian wrote: Date: Sat, 16 Apr 2005 10:58:15 -0400 (EDT) From: C. Scott Ananian [EMAIL PROTECTED] To: [EMAIL PROTECTED] Cc: David Lang [EMAIL PROTECTED], Ingo Molnar [EMAIL PROTECTED], git@vger.kernel.org Subject: Re: SHA1 hash safety On Sat, 16 Apr 2005, Brian O'Mahoney wrote: (1) I _have_ seen real-life collisions with MD5, in the context of Document management systems containing ~10^6 ms-WORD documents. Dude! You could have been *famous*! Why the aitch-ee-double-hockey-sticks didn't you publish this when you found it? Seriously, man. Even given the known weaknesses in MD5, it would take much more than a million documents to find MD5 collisions. I can only conclude that the hash was being used incorrectly; most likely truncated (my wild-ass guess would be to 32 bits; a collision is likely with 50% probability in a million-document store for a hash of less than 40 bits). I know the current state of the art here. It's going to take more than just hearsay to convince me that full 128-bit MD5 collisions are likely. I believe there are only two or so known to exist so far, and those were found by a research team in China (which, yes, is fairly famous among the cryptographic community now after publishing a paper consisting of little apart from the two collisions themselves). Please, let's talk about hash collisions responsibly.
I posted earlier about the *actual computed probability* of finding two files with an SHA-1 collision before the sun goes supernova. It's 10^28 to 1 against. The recent cryptographic work has shown that there are certain situations where a decent amount of computer work (2^69 operations) can produce two sequences with the same hash, but these sequences are not freely chosen; they've got very specific structure. This attack does not apply to (effectively) random files sitting in an SCM. http://www.schneier.com/blog/archives/2005/02/sha1_broken.html That said, Linux's widespread use means that it may not be unimaginable for an attacker to devote this amount of resources to an attack, which would probably involve first committing some specially structured file to the SCM (but would Linus accept it?) and then silently corrupting said file via a SHA1 collision to toggle some bits (which would presumably Do Evil). Thus hashes other than SHA1 really ought to be considered... ...but the cryptographic community has not yet come to a conclusion on what the replacement ought to be. These attacks are so new that we don't really understand what it is about the structure of SHA1 that makes them possible, which makes it hard to determine which other hashes are similarly vulnerable. It will take time. I believe Linus has already stated on this list that his plan is to eventually provide a tool for bulk migration of an existing SHA1 git repository to a new hash type. Basically munging through the repository in bulk, replacing all the hashes. This seems a perfectly adequate strategy at the moment. --scott WASHTUB Panama Minister Moscow explosives KUGOWN hack Marxist LPMEDLEY genetic immediate radar SCRANTON COBRA JANE KGB Shoal Bay atomic Bejing ( http://cscott.net/ ) -- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies.
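The "10^28 to 1" figure quoted above is consistent with the standard birthday bound. As a rough sketch (the object count n = 10^10 here is my illustrative assumption, not a number from the thread): for a 160-bit hash, the probability of any accidental collision among n random objects is approximately

```latex
p_{\text{collision}}(n) \;\approx\; \frac{n(n-1)}{2\cdot 2^{160}} \;\approx\; \frac{n^2}{2^{161}},
\qquad
p\!\left(10^{10}\right) \;\approx\; \frac{10^{20}}{2.9\times 10^{48}} \;\approx\; 3\times 10^{-29}.
```

i.e. roughly 10^28 to 1 against even for ten billion objects, matching the order of magnitude claimed in the thread. The 2^69-operation attack is a deliberate-collision cost, a different quantity entirely.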
-- C.A.R. Hoare
[PATCH] update-cache --refresh cache entry leak
When update-cache --refresh replaces an existing cache entry with a new one, it forgets to free the original. Signed-off-by: Junio C Hamano [EMAIL PROTECTED] ---

update-cache.c: 61d2b93a751f35ba24f479cd4fc151188916f02a
--- update-cache.c
+++ update-cache.c	2005-04-16 15:49:03.0 -0700
@@ -203,6 +203,8 @@
 			printf("%s: needs update\n", ce->name);
 			continue;
 		}
+		if (new != ce)
+			free(ce);
 		active_cache[i] = new;
 	}
 }
Full history
Hi, I can publish the stuff on Monday from a university nearby. --- total blob objects = 228384 total tree objects = 172507 total commit objects = 55877 The empty changesets which are noting merges are omitted at the moment. Is it of interest to include them? It might also be interesting to export/merge the various subsystem/maintainer trees including 2.4 into this archive. This would cover the complete history. Disk space according to # du -sh: blobs ~ 2GiB, tree and commit objects ~ 1.3GiB. I looked at the spread of the 450k+ objects over the 256 subdirectories in my exported git repository: total 456768 min per XX subdir = 1646 avg per XX subdir = 1784 max per XX subdir = 1936 tglx
Re: Re: Add clone support to lntree
On Sun, 17 Apr 2005, Petr Baudis wrote: Dear diary, on Sat, Apr 16, 2005 at 05:06:54AM CEST, I got a letter where Daniel Barkalow [EMAIL PROTECTED] told me that... I think fork is as good as anything for describing the operation. I had thought about clone because it seemed to fill the role that bk clone had (although I never used BK, so I'm not sure). It doesn't seem useful to me to try cloning multiple remote repositories, since you'd get a copy of anything common from each; you just want to suck everything into the same .git/objects and split off working directories. Actually, what about if git pull outside of repository did what git clone does now? I'd kinda like clone instead of fork too. This seems like the best solution to me, too. Although that would make pull take a URL when making a new repository and not otherwise, which might be confusing. init-remote perhaps, or maybe just have init do it if given a URL? -Daniel *This .sig left intentionally blank*
Re: full kernel history, in patchset format
On Sat, 16 Apr 2005, Thomas Gleixner wrote: On Sat, 2005-04-16 at 10:04 -0700, Linus Torvalds wrote: So I'd _almost_ suggest just starting from a clean slate after all. Keeping the old history around, of course, but not necessarily putting it into git now. It would just force everybody who is getting used to git in the first place to work with a 3GB archive from day one, rather than getting into it a bit more gradually. Sure. We can export the 2.6.12-rc2 version of the git'ed history tree and start from there. Then the first changeset has a parent, which just lives in a different place. That's the only difference to your repository, but it would change the sha1 sums of all your changesets. at least start with a full release, say 2.6.11. the history won't be blank, but it's far more likely that people will care about the details between 2.6.11 and 2.6.12 and will want to go back before -rc2 David Lang -- There are two ways of constructing a software design. One way is to make it so simple that there are obviously no deficiencies. And the other way is to make it so complicated that there are no obvious deficiencies. -- C.A.R. Hoare
Re: SHA1 hash safety
what I'm talking about is the chance that somewhere, sometime there will be two different documents that end up with the same hash I have vastly greater chance of a file colliding due to hardware or software glitch than a random message digest collision of two legitimate documents. I've lost quite a few files in 25 years of computing to just such glitches, sometimes without knowing it until months or years later. We've already computed the chances of a random pure hash collision with SHA1 - it's something like an average of 1 collision every 10 billion years if we have 10,000 coders generating 1 new file version every minute, non-stop, 24 hours a day, 365 days a year. Get real. There are _many_ sources of random error in our tools. When some sources are billions of billions times more likely to occur, it makes sense to worry about them first. Reminds me of the drunk looking under the lamp post for the house keys he dropped - because that's where the light is. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401
Re: SHA1 hash safety
sysadmins realize that there are an infinite number of files that map to Sysadmins know that there are an infinite number of ways for their systems to crap out, and try to cover for the ones that there is a snowball's chance in Hades of them seeing in their lifetime. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401
[PATCH] Optionally tell show-diff to show only named files
SCMs have ways to say "I want diff only this particular file", or "I want diff files under this directory". This patch teaches show-diff to do something similar. Without command line arguments, it still examines everything in the dircache as before. Signed-off-by: Junio C Hamano [EMAIL PROTECTED] --- show-diff.c | 38 ++ 1 files changed, 30 insertions(+), 8 deletions(-)

show-diff.c: 5f3d4699566843a5448260e5da286ed65d90e397
--- show-diff.c
+++ show-diff.c	2005-04-16 16:07:07.0 -0700
@@ -55,6 +55,23 @@
 	}
 }
 
+static const char *show_diff_usage = "show-diff [-s] [-q] [paths...]";
+
+static int matches_pathspec(struct cache_entry *ce, char **spec, int cnt)
+{
+	int i;
+	int namelen = ce_namelen(ce);
+	for (i = 0; i < cnt; i++) {
+		int speclen = strlen(spec[i]);
+		if (! strncmp(spec[i], ce->name, speclen) &&
+		    speclen <= namelen &&
+		    (ce->name[speclen] == 0 ||
+		     ce->name[speclen] == '/'))
+			return 1;
+	}
+	return 0;
+}
+
 int main(int argc, char **argv)
 {
 	int silent = 0;
@@ -62,18 +79,19 @@
 	int entries = read_cache();
 	int i;
 
-	for (i = 1; i < argc; i++) {
-		if (!strcmp(argv[i], "-s")) {
+	while (1 < argc && argv[1][0] == '-') {
+		if (!strcmp(argv[1], "-s"))
 			silent_on_nonexisting_files = silent = 1;
-			continue;
-		}
-		if (!strcmp(argv[i], "-q")) {
+		else if (!strcmp(argv[1], "-q"))
 			silent_on_nonexisting_files = 1;
-			continue;
-		}
-		usage("show-diff [-s] [-q]");
+		else
+			usage(show_diff_usage);
+		argv++; argc--;
 	}
 
+	/* At this point, if argc == 1, then we are doing everything.
+	 * Otherwise argv[1] .. argv[argc-1] have the explicit paths.
+	 */
 	if (entries < 0) {
 		perror("read_cache");
 		exit(1);
@@ -86,6 +104,10 @@
 		char type[20];
 		void *new;
 
+		if (1 < argc &&
+		    ! matches_pathspec(ce, argv+1, argc-1))
+			continue;
+
 		if (stat(ce->name, &st) < 0) {
 			if (errno == ENOENT && silent_on_nonexisting_files)
 				continue;
Re: SHA1 hash safety
Hi! We've already computed the chances of a random pure hash collision with SHA1 - it's something like an average of 1 collision every 10 billion years if we have 10,000 coders generating 1 new file version every minute, non-stop, 24 hours a day, 365 days a year. GIT is safe even for the millions of monkeys writing Shakespeare :-) Have a nice fortnight -- Martin `MJ' Mares [EMAIL PROTECTED] http://atrey.karlin.mff.cuni.cz/~mj/ Faculty of Math and Physics, Charles University, Prague, Czech Rep., Earth Homo homini lupus, frater fratri lupior, bohemus bohemo lupissimus.
[PATCH] optimize gitdiff-do script
Rewrite gitdiff-do so that it works with arbitrary whitespace (space, tab, newline, ...) in filenames. Reduce number of subcommands execv'd by a third, by only calling 'rm' once, at end, not each loop. Avoid using shell arrays; perhaps more portable. Avoid 'echo -e' when displaying names; dont expand escape sequences in names. Use shell noglob (-f) to minimize getdents() calls. Simplify argument parsing and tmp file management. Comment the nastier shell patterns. This reduces the time by about 1/3 of what it was. Signed-off-by: Paul Jackson [EMAIL PROTECTED] Index: git-pasky-0.4/gitdiff-do === --- git-pasky-0.4.orig/gitdiff-do 2005-04-16 13:19:07.0 -0700 +++ git-pasky-0.4/gitdiff-do2005-04-16 15:33:28.0 -0700 @@ -2,19 +2,22 @@ # # Make a diff between two GIT trees. # Copyright (c) Petr Baudis, 2005 +# Copyright (c) Paul Jackson, 2005 # # Takes two parameters identifying the two trees/commits to compare. # Empty string will be substitued to HEAD revision. # # Note that this is probably the most performance critical shell script -# in the whole GIT suite. That's also why I resorted to bash builtin -# features and stuff. -- [EMAIL PROTECTED] +# in the whole GIT suite. # # Outputs a diff converting the first tree to the second one. +set -f # keep shell from scanning . to expand wildcards -id1=$1; shift -id2=$1; shift +t=${TMPDIR:-/usr/tmp}/gitdiff.$$ +trap 'set +f; rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15 + +id1=$1; id2=$2; shift 2 # Leaves the result in $label. 
mkbanner () { @@ -32,58 +35,55 @@ mkbanner () { [ $labelapp ] label=$label ($labelapp) } -t=${TMPDIR:-/usr/tmp}/gitdiff.$$ -trap 'rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15 -diffdir=$t.1 -diffdir1=$diffdir/$id1 -diffdir2=$diffdir/$id2 -mkdir -p $diffdir1 $diffdir2 - -while [ $1 ]; do - declare -a param - param=($1); - op=${param[0]:0:1} - mode=${param[0]:1} - type=${param[1]} - sha=${param[2]} - name=${param[3]} - - echo -e Index: $name\n=== - - if [ $type = tree ]; then - # diff-tree will kindly diff the subdirs for us - # XXX: What about modes? - shift; continue - fi - - loc1=$diffdir1/$name; dir1=${loc1%/*} - loc2=$diffdir2/$name; dir2=${loc2%/*} - ([ -d $dir1 ] [ -d $dir2 ]) || mkdir -p $dir1 $dir2 - - case $op in - +) - mkbanner $loc2 $id2 $name $mode $sha - diff -L /dev/null (tree:$id1) -L $label -u /dev/null $loc2 - ;; - -) - mkbanner $loc1 $id1 $name $mode $sha - diff -L $label -L /dev/null (tree:$id2) -u $loc1 /dev/null - ;; - *) - modes=(${mode/-/ }); - mode1=${modes[0]}; mode2=${modes[1]} - shas=(${sha/-/ }); - sha1=${shas[0]}; sha2=${shas[1]} - mkbanner $loc1 $id1 $name $mode1 $sha1; label1=$label - mkbanner $loc2 $id2 $name $mode2 $sha2; label2=$label - diff -L $label1 -L $label2 -u $loc1 $loc2 - ;; - *) - echo Unknown operator $op, ignoring delta: $1;; - esac - - rm -f $loc1 $loc2 - shift +for arg +do + IFS='' + set X$arg# X: don't let shell set see leading '+' in $arg + op=$1 + mode=${op#X?}# trim leading X? 
1st two chars + type=$2 + sha=$3 + # if 4+ tabs, trim 1st 3 fields on 1st line with sed + case $arg in + *\ *\ *\ *\ *) +name=$(echo $arg | + /bin/sed '1s/[^ ]* [^ ]* [^ ]* //') +;; + *) +name=$4 +;; + esac + + echo Index: $name + echo === + + test $type = tree continue + + loc1=$t.1 + loc2=$t.2 + + case $op in + X+*) +mkbanner $loc2 $id2 $name $mode $sha +diff -L /dev/null (tree:$id1) -L $label -u /dev/null $loc2 +;; + X-*) +mkbanner $loc1 $id1 $name $mode $sha +diff -L $label -L /dev/null (tree:$id2) -u $loc1 /dev/null +;; + X\**) +mode1=${mode%-*} # trim '-' and after +mode2=${mode#*-} # trim up to and including '-' +sha1=${sha%-*}# trim '-' and after +sha2=${sha#*-}# trim up to and including '-' + +mkbanner $loc1 $id1 $name $mode1 $sha1; label1=$label +mkbanner $loc2 $id2 $name $mode2 $sha2; label2=$label +diff -L $label1 -L $label2 -u $loc1 $loc2 +;; + *) +badop=$(echo $op | sed 's/.\(.\).*/\1/') +echo Unknown operator $badop, ignoring delta: $1 +;; + esac done - -rm -rf $diffdir -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401 - To unsubscribe from this list: send the
[PATCH] missing mkdir -p flag in gitdiff-do
First mkdir in gitdiff-do is missing -p, so it emits a useless error. Signed-off-by: Paul Jackson [EMAIL PROTECTED]

Index: git-pasky-0.4/gitdiff-do
===
--- git-pasky-0.4.orig/gitdiff-do	2005-04-16 13:18:29.0 -0700
+++ git-pasky-0.4/gitdiff-do	2005-04-16 13:19:07.0 -0700
@@ -37,7 +37,7 @@ trap 'rm -fr $t.?; trap 0; exit 0' 0 1 2
 diffdir=$t.1
 diffdir1=$diffdir/$id1
 diffdir2=$diffdir/$id2
-mkdir $diffdir1 $diffdir2
+mkdir -p $diffdir1 $diffdir2
 
 while [ $1 ]; do
 	declare -a param

-- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401
[PATCH] show-diff -z option for machine readable output.
This patch adds the -z option to the show-diff command, primarily for use by scripts. The information emitted is similar to that of the -q option, but in a more machine-readable form. Records are terminated with NUL instead of LF, so that the scripts can deal with pathnames with embedded newlines. To be applied on top of my previous patch: [PATCH] Optionally tell show-diff to show only named files. Signed-off-by: Junio C Hamano [EMAIL PROTECTED] --- show-diff.c | 28 +++- 1 files changed, 19 insertions(+), 9 deletions(-)

show-diff.c: 0c5fb1a381a6c6689dca3f52d0c66bb591cadb39
--- show-diff.c
+++ show-diff.c	2005-04-16 16:23:40.0 -0700
@@ -55,7 +55,7 @@
 	}
 }
 
-static const char *show_diff_usage = "show-diff [-s] [-q] [paths...]";
+static const char *show_diff_usage = "show-diff [-s] [-q] [-z] [paths...]";
 
 static int matches_pathspec(struct cache_entry *ce, char **spec, int cnt)
 {
@@ -76,6 +76,7 @@
 {
 	int silent = 0;
 	int silent_on_nonexisting_files = 0;
+	int machine_readable = 0;
 	int entries = read_cache();
 	int i;
@@ -84,6 +85,9 @@
 			silent_on_nonexisting_files = silent = 1;
 		else if (!strcmp(argv[1], "-q"))
 			silent_on_nonexisting_files = 1;
+		else if (!strcmp(argv[1], "-z")) {
+			machine_readable = 1;
+		}
 		else
 			usage(show_diff_usage);
 		argv++; argc--;
@@ -99,7 +103,7 @@
 	for (i = 0; i < entries; i++) {
 		struct stat st;
 		struct cache_entry *ce = active_cache[i];
-		int n, changed;
+		int changed;
 		unsigned long size;
 		char type[20];
 		void *new;
@@ -111,18 +115,24 @@
 		if (stat(ce->name, &st) < 0) {
 			if (errno == ENOENT && silent_on_nonexisting_files)
 				continue;
-			printf("%s: %s\n", ce->name, strerror(errno));
-			if (errno == ENOENT)
-				show_diff_empty(ce);
+			if (machine_readable)
+				printf("X %s%c", ce->name, 0);
+			else {
+				printf("%s: %s\n", ce->name, strerror(errno));
+				if (errno == ENOENT)
+					show_diff_empty(ce);
+			}
 			continue;
 		}
 		changed = cache_match_stat(ce, &st);
 		if (!changed)
 			continue;
-		printf("%s: ", ce->name);
-		for (n = 0; n < 20; n++)
-			printf("%02x", ce->sha1[n]);
-		printf("\n");
+		if (!machine_readable)
+			printf("%s: %s\n",
+			       ce->name, sha1_to_hex(ce->sha1));
+		else {
+			printf("%s %s%c", sha1_to_hex(ce->sha1), ce->name, 0);
+			continue;
+		}
 		fflush(stdout);
 		if (silent)
 			continue;
Re: Re: Re: Add clone support to lntree
Dear diary, on Sat, Apr 16, 2005 at 05:17:00AM CEST, I got a letter where Daniel Barkalow [EMAIL PROTECTED] told me that... On Sat, 16 Apr 2005, Petr Baudis wrote: Dear diary, on Sat, Apr 16, 2005 at 04:47:55AM CEST, I got a letter where Petr Baudis [EMAIL PROTECTED] told me that... git branch --- creates a branch from a given commit (when passed empty commit, creates a branch from the current commit and sets the working tree to that branch) Note that there is a bug in current git update - it will allow you to bring several of your trees to follow the same branch, or even a remote branch. This is not even supposed to work, and will be fixed when I get some sleep. You will be able to do git pull even on local branches, and the proper solution for this will be just tracking the branch you want to follow. I must admit that I'm not entirely decided yet, so I'd love to hear your opinion. I'm wondering, whether each tree should be fixed to a certain branch. That is, you decide a name when you do git fork, and then the tree always follows that branch. (It always has to follow [be bound to] *some* branch, and each branch can be followed by only a single tree at a time.) I don't think I'm following the use of branches. Currently, what I do is have a git-pasky and a git-linus, and fork off a working directory from one of these for each thing I want to work on. I do some work, commit as I make progress, and then do a diff against the remote head to get a patch to send off. If I want to do a series of patches which depend on each other, I fork my next directory off of my previous one rather than off of a remote base. I haven't done much rebasing, so I haven't worked out how I would do that most effectively. Yes. And that's exactly what the branches allow you to do. 
You just do git fork myhttpclient ~/myhttpclientdir then you do some hacking, and when you have something usable, you can go back to your main working directory and do git merge -b when_you_started myhttpclient Since you consider the code perfect, you can now just rm -rf ~/myhttpclient. Suddenly, you get a mail from mj pointing out some bugs, and it looks like there are more to come. What to do? git fork myhttpclient ~/myhttpclientdir (Ok, this does not work, but that's a bug, will fix tomorrow.) This will let you take off when you left in your work on the branch. git update for seeking between commits is probably extremely important for any kind of binary search when you are wondering when did this bug appeared first, or when you are exploring how certain branch evolved over time. Doing git fork for each successive iteration sounds horrible. Now, what about git branch and git update for switching between branches? I think this is the most controversial part; these are basically just shortcuts for not having to do git fork, and I wouldn't mind so much removing them, if you people really consider them too ugly a wart for the soft clean git skin. I admit that they both come from a hidden prejudice that git fork is going to be slow and eat a lot of disk. The idea for git branch is to mark a commit as this is a branch but I don't want to git fork (because I'm lazy or short on disk space or whatever). Let's say you are tracking a branch, do some local commits and then want to untrack. This will get you back to HEAD.local, but you want to keep a reference for your local commits, and possibly work on them more later - so you mark them as a branch. But thinking about it, I couldn't come up with another usage case than this, and I think that now that we have git fork, I will modify git track behaviour heavily so that tracking/untracking won't really switch you to the other branch completely, but really only tell git pull that you want the pulled updates applied. 
So git branch command will likely go. The idea for git update for switching between branches is that especially when you have two rather similar branches and mostly do stuff on one of them, but sometimes you want to do something on the other one, you can do just quick git update, do stuff, and git update back, without any forking. Note that this all is *absolutely* subject to change, provided you can convince me about some better way. ;-) My mindset on this is pretty open. This is just what seems to me as a pretty flexible and elegant to do stuff, while giving you enough freedom to pick your own style. I think I can make this space efficient by hardlinking unmodified blobs to a directory of cached expanded blobs. I don't know but I really feel *very* unsafe when doing that. What if something screws up and corrupts my base... way too easy. And it gets pretty inconvenient and even more dangerous when you get the idea to do some modifications on your tree by something else than your favorite editor (which you've already checked does the right
Re: [PATCH] fix mktemp (remove mktemp ;)
On Sat, 2005-04-16 16:27:43 -0700, Paul Jackson [EMAIL PROTECTED] wrote in message [EMAIL PROTECTED]: Index: git-pasky-0.4/gitcommit.sh === --- git-pasky-0.4.orig/gitcommit.sh 2005-04-12 10:39:14.0 -0700 +++ git-pasky-0.4/gitcommit.sh 2005-04-16 13:17:49.0 -0700 @@ -60,7 +60,9 @@ for file in $commitfiles; do echo $file; done echo Enter commit message, terminated by ctrl-D on a separate line: -LOGMSG=`mktemp -t gitci.XX` +t=${TMPDIR:-/usr/tmp}/gitapply.$$ /usr/tmp/ ??? Hey, /usr may be mounted read-only! Why not just use /tmp ? MfG, JBG -- Jan-Benedict Glaw [EMAIL PROTECTED]. +49-172-7608481 _ O _ Eine Freie Meinung in einem Freien Kopf | Gegen Zensur | Gegen Krieg _ _ O fuer einen Freien Staat voll Freier Bürger | im Internet! | im Irak! O O O ret = do_actions((curr | FREE_SPEECH) & ~(NEW_COPYRIGHT_LAW | DRM | TCPA));
Re: fix mktemp (remove mktemp ;)
Dear diary, on Sun, Apr 17, 2005 at 01:27:43AM CEST, I got a letter where Paul Jackson [EMAIL PROTECTED] told me that... Remove mktemp usage - it doesn't work on some Mandrakes, nor on my SuSE 8.2 with mktemp-1.5-531. Replace with simple use of $$ (pid). I've been using this same pattern for 20 years on many production scripts; it's fast, solid and simple. And racy. And not guaranteed to come up with fresh new files. More robust tmp file removal, using trap, so that scripts interrupted by signals HUP, INT, QUIT or PIPE will clean up. But I like this! I'm deferring those changes to the introduction of a git shell library, which several people have volunteered to do so far, but no one has sent me any patches for (the last one was probably Martin Mares, only a few hours ago though). -- Petr Pasky Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
Re: Storing permissions
PJ == Paul Jackson [EMAIL PROTECTED] writes: PJ That matches my experience - store 1 bit of mode state - executable or not. Sounds like svn ;-).
Re: Re: Re: Add clone support to lntree
Dear diary, on Sun, Apr 17, 2005 at 01:07:35AM CEST, I got a letter where Daniel Barkalow [EMAIL PROTECTED] told me that... Actually, what about if git pull outside of repository did what git clone does now? I'd kinda like clone instead of fork too. This seems like the best solution to me, too. Although that would make pull take a URL when making a new repository and not otherwise, which might be confusing. init-remote perhaps, or maybe just have init do it if given a URL? Yes, init taking URL optionally sounds ideal. Thanks. -- Petr "Pasky" Baudis Stuff: http://pasky.or.cz/ C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
Re: [PATCH] Get commits from remote repositories by HTTP
Daniel Barkalow wrote: On Sat, 16 Apr 2005, Adam Kropelin wrote:

How about building a file list and doing a batch download via 'wget -i /tmp/foo'? A quick test (on my ancient wget-1.7) indicates that it reuses connections when successive URLs point to the same server.

You need to look at some of the files before you know what other files to get. You could do it in waves, but that would be excessively complicated to code and not the most efficient anyway.

Ah, yes. Makes sense. How about libcurl or another HTTP client library, then? Minimizing dependencies on external libraries is good, but writing a really robust HTTP client is a tricky business. (Not that you aren't up to it; I just wonder if it's the best way to spend your time.)

--Adam
Re: fix mktemp (remove mktemp ;)
And racy. And not guaranteed to come up with fresh new files.

In theory perhaps. In practice no. Even mktemp(1) can collide, in theory, since there is no practical way in shell scripts to hold open and locked the file from the instant it is determined to be a unique name.

The window of vulnerability for shell script tmp files is the lifetime of the script - while the file sits there unlocked. Anyone else with permissions can mess with it.

More people will fail, and are already failing, using mktemp than I have ever seen failing with $$ (I've never seen a documented case, and since such files are not writable by other user accounts, such a collision would typically not go hidden). Fast, simple, portable solutions that work win over solutions with some theoretical advantage that doesn't matter in practice and that are less portable or less efficient.

-- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401
Re: Storing permissions
Junio wrote: Sounds like svn I have no idea what svn is. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401
Re: Re: Re: Add clone support to lntree
On Sun, 17 Apr 2005, Petr Baudis wrote: Dear diary, on Sat, Apr 16, 2005 at 05:17:00AM CEST, I got a letter where Daniel Barkalow [EMAIL PROTECTED] told me that... On Sat, 16 Apr 2005, Petr Baudis wrote: Dear diary, on Sat, Apr 16, 2005 at 04:47:55AM CEST, I got a letter where Petr Baudis [EMAIL PROTECTED] told me that... git branch --- creates a branch from a given commit (when passed empty commit, creates a branch from the current commit and sets the working tree to that branch) Note that there is a bug in current git update - it will allow you to bring several of your trees to follow the same branch, or even a remote branch. This is not even supposed to work, and will be fixed when I get some sleep. You will be able to do git pull even on local branches, and the proper solution for this will be just tracking the branch you want to follow. I must admit that I'm not entirely decided yet, so I'd love to hear your opinion. I'm wondering, whether each tree should be fixed to a certain branch. That is, you decide a name when you do git fork, and then the tree always follows that branch. (It always has to follow [be bound to] *some* branch, and each branch can be followed by only a single tree at a time.) I don't think I'm following the use of branches. Currently, what I do is have a git-pasky and a git-linus, and fork off a working directory from one of these for each thing I want to work on. I do some work, commit as I make progress, and then do a diff against the remote head to get a patch to send off. If I want to do a series of patches which depend on each other, I fork my next directory off of my previous one rather than off of a remote base. I haven't done much rebasing, so I haven't worked out how I would do that most effectively. Yes. And that's exactly what the branches allow you to do. 
You just do git fork myhttpclient ~/myhttpclientdir then you do some hacking, and when you have something usable, you can go back to your main working directory and do git merge -b when_you_started myhttpclient Since you consider the code perfect, you can now just rm -rf ~/myhttpclient.

Suddenly, you get a mail from mj pointing out some bugs, and it looks like there are more to come. What to do? git fork myhttpclient ~/myhttpclientdir (Ok, this does not work, but that's a bug, will fix tomorrow.) This will let you take off where you left off in your work on the branch.

Ah, I think that's what made me think I wasn't understanding branches; the first thing I tried hit this bug.

git update for seeking between commits is probably extremely important for any kind of binary search when you are wondering when this bug first appeared, or when you are exploring how a certain branch evolved over time.

Doing git fork for each successive iteration sounds horrible. Even if there isn't a performance hit, it's semantically wrong, because you're looking at different versions that were in the same place at different times.

Now, what about git branch and git update for switching between branches? I think this is the most controversial part; these are basically just shortcuts for not having to do git fork, and I wouldn't mind so much removing them, if you people really consider them too ugly a wart for the soft clean git skin. I admit that they both come from a hidden prejudice that git fork is going to be slow and eat a lot of disk.

I think that this just confuses matters.

The idea for git update for switching between branches is that especially when you have two rather similar branches and mostly do stuff on one of them, but sometimes you want to do something on the other one, you can just do a quick git update, do stuff, and git update back, without any forking.

I still think that fork should be quick enough, or you could leave the extra tree around.
I'm not against having such a command, but I think it should be a separate command rather than a different use of update, since it would be used by people working in different ways.

I think I can make this space efficient by hardlinking unmodified blobs to a directory of cached expanded blobs.

I don't know, but I really feel *very* unsafe when doing that. What if something screws up and corrupts my base... way too easy. And it gets pretty inconvenient and even more dangerous when you get the idea to do some modifications on your tree by something else than your favorite editor (which you've already checked does the right thing). It should only be an option, not required and maybe not even default.

I think it should be possible to prevent stuff from screwing up, since we really don't want anything to ever modify those inodes (as opposed to some cases, where you want to modify inodes only in certain ways). For that matter, relatively
Re: optimize gitdiff-do script
Petr wrote: Please don't reindent the scripts. It violates the current coding style and the patch is unreviewable.

Sorry - I had not realized that there was a style in this case. I am all in favor of such coding styles, and will gladly fit this one. Do you want the patch resent, or a patch to restore the indent on top of this one?

the patch is unreviewable.

The section that I indented the wrong way was such a total rewrite that you aren't going to be able to review it line by line compared to the old anyway. So in this case, it wasn't that I was modifying and reindenting, rather that I was rewriting a page of code from scratch. But that's a nit. Honoring the coding style is necessary in any case.

The idea behind that was that diffing could take a significant portion of disk space,

Here I don't understand, or don't agree, not sure which. This won't eat more disk space, because the same tmp files are reused, over and over. Instead of unlinking them just before reopening them truncated (O_WRONLY|O_CREAT|O_TRUNC), I just reopen them truncated.

-- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401
[PATCH] Use libcurl to use HTTP to get repositories
This enables the use of HTTP to download commits and associated objects from remote repositories. It now uses libcurl instead of local hack code. Still causes warnings for fsck-cache and rev-tree, due to unshared code. Still leaks a bit of memory due to a bug copied from read-tree. Needs libcurl post 7.7 or so.

Signed-Off-By: Daniel Barkalow [EMAIL PROTECTED]

Index: Makefile
===
--- ed4f6e454b40650b904ab72048b2f93a068dccc3/Makefile (mode:100644 sha1:b39b4ea37586693dd707d1d0750a9b580350ec50)
+++ d332a8ddffb50c1247491181af458970bf639942/Makefile (mode:100644 sha1:ca5dfd41b750cb1339128e4431afbbbc21bf57bb)
@@ -14,7 +14,7 @@
 PROG= update-cache show-diff init-db write-tree read-tree commit-tree \
 	cat-file fsck-cache checkout-cache diff-tree rev-tree show-files \
-	check-files ls-tree merge-tree
+	check-files ls-tree merge-tree http-get
 
 all: $(PROG)
 
@@ -23,6 +23,11 @@
 LIBS= -lssl -lz
 
+http-get: LIBS += -lcurl
+
+http-get:%:%.o read-cache.o
+	$(CC) $(CFLAGS) -o $@ $^ $(LIBS)
+
 init-db: init-db.o
 
 update-cache: update-cache.o read-cache.o
Index: http-get.c
===
--- /dev/null (tree:ed4f6e454b40650b904ab72048b2f93a068dccc3)
+++ d332a8ddffb50c1247491181af458970bf639942/http-get.c (mode:100644 sha1:106ca31239e6afe6784e7c592234406f5c149e44)
@@ -0,0 +1,126 @@
+#include <fcntl.h>
+#include <unistd.h>
+#include <string.h>
+#include <stdlib.h>
+#include "cache.h"
+#include "revision.h"
+#include <errno.h>
+#include <stdio.h>
+
+#include <curl/curl.h>
+#include <curl/easy.h>
+
+static CURL *curl;
+
+static char *base;
+
+static int fetch(unsigned char *sha1)
+{
+	char *hex = sha1_to_hex(sha1);
+	char *filename = sha1_file_name(sha1);
+
+	char *url;
+	char *posn;
+	FILE *local;
+	struct stat st;
+
+	if (!stat(filename, &st)) {
+		return 0;
+	}
+
+	local = fopen(filename, "w");
+
+	if (!local) {
+		fprintf(stderr, "Couldn't open %s\n", filename);
+		return -1;
+	}
+
+	curl_easy_setopt(curl, CURLOPT_FILE, local);
+
+	url = malloc(strlen(base) + 50);
+	strcpy(url, base);
+	posn = url + strlen(base);
+	strcpy(posn, "objects/");
+	posn += 8;
+	memcpy(posn, hex, 2);
+	posn += 2;
+	*(posn++) = '/';
+	strcpy(posn, hex + 2);
+
+	curl_easy_setopt(curl, CURLOPT_URL, url);
+
+	curl_easy_perform(curl);
+
+	fclose(local);
+
+	return 0;
+}
+
+static int process_tree(unsigned char *sha1)
+{
+	void *buffer;
+	unsigned long size;
+	char type[20];
+
+	buffer = read_sha1_file(sha1, type, &size);
+	if (!buffer)
+		return -1;
+	if (strcmp(type, "tree"))
+		return -1;
+	while (size) {
+		int len = strlen(buffer) + 1;
+		unsigned char *sha1 = buffer + len;
+		unsigned int mode;
+		int retval;
+
+		if (size < len + 20 || sscanf(buffer, "%o", &mode) != 1)
+			return -1;
+
+		buffer = sha1 + 20;
+		size -= len + 20;
+
+		retval = fetch(sha1);
+		if (retval)
+			return -1;
+
+		if (S_ISDIR(mode)) {
+			retval = process_tree(sha1);
+			if (retval)
+				return -1;
+		}
+	}
+	return 0;
+}
+
+static int process_commit(unsigned char *sha1)
+{
+	struct revision *rev = lookup_rev(sha1);
+	if (parse_commit_object(rev))
+		return -1;
+
+	fetch(rev->tree);
+	process_tree(rev->tree);
+	return 0;
+}
+
+int main(int argc, char **argv)
+{
+	char *commit_id = argv[1];
+	char *url = argv[2];
+
+	unsigned char sha1[20];
+
+	get_sha1_hex(commit_id, sha1);
+
+	curl_global_init(CURL_GLOBAL_ALL);
+
+	curl = curl_easy_init();
+
+	base = url;
+
+	fetch(sha1);
+	process_commit(sha1);
+
+	curl_global_cleanup();
+	return 0;
+}
Index: revision.h
===
--- ed4f6e454b40650b904ab72048b2f93a068dccc3/revision.h (mode:100664 sha1:28d0de3261a61f68e4e0948a25a416a515cd2e83)
+++ d332a8ddffb50c1247491181af458970bf639942/revision.h (mode:100664 sha1:523bde6e14e18bb0ecbded8f83ad4df93fc467ab)
@@ -24,6 +24,7 @@
 	unsigned int flags;
 	unsigned char sha1[20];
 	unsigned long date;
+	unsigned char tree[20];
 	struct parent *parent;
 };
 
@@ -111,4 +112,29 @@
 	}
 }
 
+static int parse_commit_object(struct revision *rev)
+{
+	if (!(rev->flags & SEEN)) {
+		void *buffer, *bufptr;
+		unsigned long size;
+		char type[20];
+		unsigned char parent[20];
+
Re: [PATCH] Use libcurl to use HTTP to get repositories
Needs libcurl post 7.7 or so. That could be mentioned in the README, which has a list of 'Software requirements.' Actually, zlib-devel and openssl should be on this list as well. My laziness got in the way of my sending in a patch for that. -- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401
Re: fix mktemp (remove mktemp ;)
On Sat, Apr 16, 2005 at 05:02:21PM -0700, Paul Jackson wrote:

And racy. And not guaranteed to come up with fresh new files. In theory perhaps. In practice no. Even mktemp(1) can collide, in theory, since there is no practical way in shell scripts to hold open and locked the file from the instant it is determined to be a unique name.

Using the pid as a 'random' number is a bad idea. All an attacker has to do is create 65535 symlinks in /usr/tmp, and he can now overwrite any file you own. mktemp is being used here to provide randomness in the filename, not just uniqueness.

The window of vulnerability for shell script tmp files is the lifetime of the script - while the file sits there unlocked. Anyone else with permissions can mess with it.

The attacker doesn't need to touch the script. Just take advantage of flaws in it, and wait for someone to run it.

More people will fail, and are already failing, using mktemp than I have ever seen using $$ (I've never seen a documented case, and since such files are not writable to other user accounts, such a collision would typically not go hidden.) Fast, simple portable solutions that work win over solutions with some theoretical advantage that don't matter in practice, but also that are less portable or less efficient.

I'd suggest fixing your distribution's mktemp over going with an inferior solution.

Dave
Re: fix mktemp (remove mktemp ;)
On Sat, Apr 16, 2005 at 08:33:25PM -0400, Dave Jones wrote: On Sat, Apr 16, 2005 at 05:02:21PM -0700, Paul Jackson wrote: And racy. And not guaranteed to come up with fresh new files. In theory perhaps. In practice no. Even mktemp(1) can collide, in theory, since there is no practical way in shell scripts to hold open and locked the file from the instant it is determined to be a unique name. Using the pid as a 'random' number is a bad idea. All an attacker has to do is create 65535 symlinks in /usr/tmp, and he can now overwrite any file you own. mktemp is being used here to provide randomness in the filename, not just uniqueness.

How about using .git/tmp.$$ or similar as the tempfile? This should satisfy both the portability and security requirements, since the warnings against using $$ only apply to public directories.

Regards, Erik
Re: [PATCH] show-diff shell safety
Junio wrote: The command line for running the diff command is built without taking shell metacharacters into account.

Ack - you're right. One should avoid popen and system in all but personal hacking code. There are many ways, beyond just embedded shell redirection, to cause problems with these calls. One should directly code execve(), execv(), or execl(). Search for: popen, system, IFS, PATH.

-- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401
Re: Storing permissions
Does it really make sense to store full permissions in the trees? I think that remembering the x-bit should be good enough for almost all purposes and the other permissions should be left to the local environment. It makes some sense in principle, but without storing what they mean (i.e., group==?) it certainly makes no sense. It's a bit like unpacking a tar file. I suspect a non-readable file would cause a bit of a problem in the low-level commands. Morten
[PATCH] Rename confusing variable in show-diff
The show-diff command uses a variable new but it is always used to point at the original data recorded in the dircache before the user started editing the working file. Rename it to old to avoid confusion.

To be applied on top of my previous patches:
[PATCH] Optionally tell show-diff to show only named files.
[PATCH] show-diff -z option for machine readable output.
[PATCH] show-diff shell safety.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---
 show-diff.c | 6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

show-diff.c: e52eee21c2f682bef2dba06445699cca8e34c63a
--- show-diff.c
+++ show-diff.c	2005-04-16 18:05:55.0 -0700
@@ -162,7 +162,7 @@
 	int changed;
 	unsigned long size;
 	char type[20];
-	void *new;
+	void *old;
 
 	if (1 < argc &&
 	    ! matches_pathspec(ce, argv+1, argc-1))
@@ -193,8 +193,8 @@
 		if (silent)
 			continue;
-		new = read_sha1_file(ce->sha1, type, &size);
-		show_differences(ce->name, new, size);
+		old = read_sha1_file(ce->sha1, type, &size);
+		show_differences(ce->name, old, size);
 		free(new);
 	}
 	return 0;
Re: fix mktemp (remove mktemp ;)
Dave wrote: http://www.linuxsecurity.com/content/view/115462/151/

Nice - thanks. Pasky - would you be interested in a patch that used a more robust tmp file creation, along the lines of replacing

t=${TMPDIR:-/usr/tmp}/gitdiff.$$
trap 'set +f; rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15

with:

tmp=${TMPDIR-/tmp}
tmp=$tmp/gitdiff-do.$RANDOM.$RANDOM.$RANDOM.$$
(umask 077 && mkdir $tmp) || {
	echo "Could not create temporary directory! Exiting." 1>&2
	exit 1
}
t=$tmp/tmp
trap 'rm -fr $tmp; trap 0; exit 0' 0 1 2 3 15

If interested, would you want it instead of my previous mktemp removal patch, or on top of it?

-- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401
Re: fix mktemp (remove mktemp ;)
Erik wrote: How about using .git/tmp.$$ or similar as the tempfile?

One could, but it's best to normally honor the user's TMPDIR setting. Could one 'git diff' a readonly git repository? Perhaps someone has a reason for putting their tmp files where they choose - say a local file system in a heavy NFS environment.

-- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401
[PATCH] (take 2) Rename confusing variable in show-diff
Oops, sorry, I screwed up and sent a wrong patch. Please discard the previous one.

The show-diff command uses a variable new but it is always used to point at the original data recorded in the dircache before the user started editing the working file. Rename it to old to avoid confusion.

To be applied on top of my previous patches:
[PATCH] Optionally tell show-diff to show only named files.
[PATCH] show-diff -z option for machine readable output.
[PATCH] show-diff shell safety.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---
 show-diff.c | 8 ++++----
 1 files changed, 4 insertions(+), 4 deletions(-)

show-diff.c: e52eee21c2f682bef2dba06445699cca8e34c63a
--- show-diff.c
+++ show-diff.c	2005-04-16 18:23:57.0 -0700
@@ -162,7 +162,7 @@
 	int changed;
 	unsigned long size;
 	char type[20];
-	void *new;
+	void *old;
 
 	if (1 < argc &&
 	    ! matches_pathspec(ce, argv+1, argc-1))
@@ -193,9 +193,9 @@
 		if (silent)
 			continue;
-		new = read_sha1_file(ce->sha1, type, &size);
-		show_differences(ce->name, new, size);
-		free(new);
+		old = read_sha1_file(ce->sha1, type, &size);
+		show_differences(ce->name, old, size);
+		free(old);
 	}
 	return 0;
 }
Re: Storing permissions
Morten wrote: It makes some sense in principle, but without storing what they mean (i.e., group==?) it certainly makes no sense.

There's no "they" there. I think Martin's proposal, to which I agreed, was to store a _single_ bit. If any of the execute permissions of the incoming file are set, then the bit is stored ON, else it is stored OFF. On 'checkout', if the bit is ON, then the file permission is set mode 0777 (modulo umask), else it is set mode 0666 (modulo umask). You might disagree that this is a good idea, but it certainly does 'make sense' (as in 'is sensibly well defined').

I suspect a non-readable file would cause a bit of a problem in the low-level commands.

Probably so. If someone sets their umask 0333 or less, then they are either fools or QA (software quality assurance, or test) engineers.

-- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401
[PATCH] show-diff style fix.
This fixes some stylistic problems introduced by my previous set of patches. I'll be sending my last patch to show-diff next, which depends on this cleanup.

To be applied on top of my previous patches:
[PATCH] Optionally tell show-diff to show only named files.
[PATCH] show-diff -z option for machine readable output.
[PATCH] show-diff shell safety.
[PATCH] (take 2) Rename confusing variable in show-diff.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---
 show-diff.c | 7 +++----
 1 files changed, 3 insertions(+), 4 deletions(-)

--- ./show-diff.c	2005-04-16 18:59:09.0 -0700
+++ ./show-diff.c	2005-04-16 19:01:28.0 -0700
@@ -111,7 +111,7 @@
 	}
 }
 
-static const char *show_diff_usage = "show-diff [-s] [-q] [-z] [paths...]";
+static const char *show_diff_usage = "show-diff [-q] [-s] [-z] [paths...]";
 
 static int matches_pathspec(struct cache_entry *ce, char **spec, int cnt)
 {
@@ -141,9 +141,8 @@
 			silent_on_nonexisting_files = silent = 1;
 		else if (!strcmp(argv[1], "-q"))
 			silent_on_nonexisting_files = 1;
-		else if (!strcmp(argv[1], "-z")) {
+		else if (!strcmp(argv[1], "-z"))
 			machine_readable = 1;
-		}
 		else
 			usage(show_diff_usage);
 		argv++; argc--;
@@ -164,7 +163,7 @@
 	char type[20];
 	void *old;
 
-	if (1 < argc &&
+	if (1 < argc &&
 	    ! matches_pathspec(ce, argv+1, argc-1))
 		continue;
Re: fix mktemp (remove mktemp ;)
No, you have to:

(a) create a unique, pid-specific file name /var/tmp/myapp.$$.xyzzy
(b) create it in O_EXCL mode, so you won't smash another's held lock
    (b-1) it worked, OK
    (b-2) open failed; try ...xyzzz, repeat until (b-1)

There are thousands of examples of how to do this with bash.

Paul Jackson wrote: Dave wrote: mktemp is being used here to provide randomness in the filename, not just uniqueness.

Ok - useful point. How about:

t=${TMPDIR:-/usr/tmp}/gitdiff.$$.$RANDOM

all an attacker has to do is create 65535 symlinks in /usr/tmp

The point of the xyzzy seed is to make creating all possible files infeasible.

And how about if I removed the tmp files at the top:

t=${TMPDIR:-/usr/tmp}/gitdiff.$$.$RANDOM
trap 'rm -fr $t.?; trap 0; exit 0' 0 1 2 3 15
rm -fr $t.?
... rest of script ...

How close does that come to providing the same level of safety, while remaining portable over a wider range of systems, and not requiring that a separate command be forked?

I'd suggest fixing your distribution's mktemp ...

It's not just my distro; it's the distros of all git users. If apps can avoid depending on inessential details of their environment, that's friendlier to all concerned. And actually my distro is fine - it's just that I am running an old version of it on one of my systems. Newer versions of mktemp have the -t option.

-- mit freundlichen Grüßen, Brian. Dr. Brian O'Mahoney Mobile +41 (0)79 334 8035 Email: [EMAIL PROTECTED] Bleicherstrasse 25, CH-8953 Dietikon, Switzerland PGP Key fingerprint = 33 41 A2 DE 35 7C CE 5D F5 14 39 C9 6D 38 56 D5
Re: fix mktemp (remove mktemp ;)
No, you have to:

How does this compare with the one I posted about 1 hour 30 minutes ago:

tmp=${TMPDIR-/tmp}
tmp=$tmp/gitdiff-do.$RANDOM.$RANDOM.$RANDOM.$$
(umask 077 && mkdir $tmp) || {
	echo "Could not create temporary directory! Exiting." 1>&2
	exit 1
}
t=$tmp/tmp
trap 'rm -fr $tmp; trap 0; exit 0' 0 1 2 3 15

derived from the reference that Dave Jones provided?

create it in O_EXCL mode,

How can one do that, and hold that O_EXCL, from within bash?

There are thousands of examples of how to do this with bash.

Care to provide one?

-- I won't rest till it's the best ... Programmer, Linux Scalability Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401
[PATCH] libgit
commit b0550573055abcf8ad19dcb8a036c32dd00a3be4
tree b77882b170769c07732381b9f19ff2dd5c9f1520
parent 866b4aea9313513612f2b0d66814a2f526d17f21
author Mike Taht [EMAIL PROTECTED] 1113704772 -0700
committer Mike Taht [EMAIL PROTECTED] 1113704772 -0700

Looks like my 1878-line patch to convert git to libgit got eaten by vger. I put it up at http://pbx.picketwyre.com/~mtaht/libgit.patch if anyone wants to comment. From my log:

Converted git to libgit. Moved all the main() calls into a single multi-call binary - git-main. Made extern a bunch of functions that were static. Verified it at least still minimally worked.

Note: this is only a first step towards creating a generic library. Figuring out what functions and variables *truly* need to be exported, renaming them to a git_function API, making it thread safe ... and not least of all, keeping up with everybody working out of the base tree ... are problems that remain. Also - cleaning up the UI.
Re: [PATCH] Get commits from remote repositories by HTTP
How about building a file list and doing a batch download via 'wget -i /tmp/foo'? A quick test (on my ancient wget-1.7) indicates that it reuses connections when successive URLs point to the same server.

Here's a script that does just that. So there is a burst of individual wget commands to get HEAD, the top commit object, and all the tree objects. Then just one to get all the missing blobs. Subsequent runs will do far less work as many of the tree objects will not have changed, so we don't descend into any tree that we already have.

-Tony

Not a patch ... it is a whole file. I called it git-wget, but it might also want to be called git-pulltop.

Signed-off-by: Tony Luck [EMAIL PROTECTED]

-- script starts here -
#!/bin/sh
# Copyright (C) 2005 Tony Luck

REMOTE=http://www.kernel.org/pub/linux/kernel/people/torvalds/linux-2.6.git/

rm -rf .gittmp

# set up a temp git repository so that we can use cat-file and ls-tree on the
# objects we pull without installing them into our tree. This allows us to
# restart if the download is interrupted
mkdir .gittmp
cd .gittmp
init-db

wget -q $REMOTE/HEAD
if cmp -s ../.git/HEAD HEAD
then
	echo "Already have HEAD = `cat ../.git/HEAD`"
	cd ..
	rm -rf .gittmp
	exit 0
fi

sha1=`cat HEAD`
sha1file=${sha1:0:2}/${sha1:2}
if [ -f ../.git/objects/$sha1file ]
then
	echo "Already have most recent commit. Update HEAD to $sha1"
	cd ..
	rm -rf .gittmp
	exit 0
fi

wget -q $REMOTE/objects/$sha1file -O .git/objects/$sha1file

treesha1=`cat-file commit $sha1 | (read tag tree ; echo $tree)`

get_tree() {
	treesha1file=${1:0:2}/${1:2}
	if [ -f ../.git/objects/$treesha1file ]
	then
		return
	fi
	wget -q $REMOTE/objects/$treesha1file -O .git/objects/$treesha1file
	ls-tree $1 | while read mode tag sha1 name
	do
		subsha1file=${sha1:0:2}/${sha1:2}
		if [ -f ../.git/objects/$subsha1file ]
		then
			continue
		fi
		if [ $mode = 4 ]
		then
			get_tree $sha1 `expr $2 + 1`
		else
			echo objects/$subsha1file >> needbloblist
		fi
	done
}

# get all the tree objects to our .gittmp area, and create list of needed blobs
get_tree $treesha1

# now get the blobs
cd ../.git
if [ -s ../.gittmp/needbloblist ]
then
	wget -q -r -nH --cut-dirs=6 --base=$REMOTE -i ../.gittmp/needbloblist
fi

# Now we have the blobs, move the trees and commit from .gittmp
cd ../.gittmp/.git/objects
find ?? -type f -print | while read f
do
	mv $f ../../../.git/objects/$f
done

# update HEAD
cd ../..
mv HEAD ../.git
cd ..
rm -rf .gittmp
-- script ends here -
Re: SHA1 hash safety
Brian == Brian O'Mahoney [EMAIL PROTECTED] writes: Brian (1) I _have_ seen real-life collisions with MD5, in the context Brian of Document management systems containing ~10^6 ms-WORD Brian documents.

Was this whole-document based, or was it blocked or otherwise chunked? I'm wondering, because (AFAIK) the MS Word on-disk format is some serialized version of one or more containers, possibly nested. If your blocks are sized so that the first block is the same across multiple files, this could cause collisions -- but they're the good kind, that allow us to save disk space, so they're not a problem.

Are you saying that, within 1e7 documents, you found two documents with the same MD5 hash yet different contents? That's not an accusation, btw; I'm just trying to get clarity on the terminology.

I'm fascinated by the idea of using this sort of content-addressable filesystem, but the chance of any collision at all wigs me out. I look at the probabilities, but still.

Thanks, t.
Re: [PATCH] update-cache --refresh cache entry leak
On Sat, 16 Apr 2005, Junio C Hamano wrote:
>
> When update-cache --refresh replaces an existing cache entry with a
> new one, it forgets to free the original.

I've seen this patch now three times, and it's been wrong every single
time.  Maybe we should add a comment?

That active-cache entry you free()'d was not necessarily allocated with
malloc().  Most cache-entries are just mmap'ed directly from the index
file.  Leaking is ok.  We cannot leak too much.

		Linus
Re: [PATCH] libgit
On Sat, 16 Apr 2005 20:12:56 -0700 Mike Taht wrote:

| commit b0550573055abcf8ad19dcb8a036c32dd00a3be4
| tree b77882b170769c07732381b9f19ff2dd5c9f1520
| parent 866b4aea9313513612f2b0d66814a2f526d17f21
| author Mike Taht [EMAIL PROTECTED] 1113704772 -0700
| committer Mike Taht [EMAIL PROTECTED] 1113704772 -0700
|
| looks my 1878 line patch to convert git to libgit got eaten by vger..
| I put it up at http://pbx.picketwyre.com/~mtaht/libgit.patch if anyone
| wants to comment.

from my log:  Connection refused.

---
~Randy
Re: Yet another base64 patch
Paul Jackson wrote:
> Earlier, hpa wrote:
>> The base64 version has 2^12 subdirectories instead of 2^8 (I just
>> used 2 characters as the hash key just like the hex version.)
>
> Later, hpa wrote:
>> Ultimately the question is: do we care about old (broken) filesystems?
>
> I'd imagine we care a little - just not a lot.

Some people (e.g., me) would really like for git to be more forgiving
of nasty filesystems, so that git can be used very widely.  I.e., be
forgiving about case insensitivity, poor performance or problems with a
large # of files in a directory, etc.  You're already working to make
sure git handles filenames with spaces and i18n filenames, a common
failing of many other SCM systems.

If git is used for Linux kernel development and nothing else, it's
still a success.  But it'd be even better from my point of view if git
was a useful tool for MANY other projects.  I think there are
advantages, even if you only plan to use git for the kernel, to making
git easier to use for other projects.  By making git less sensitive to
the filesystem, you'll attract more (non-kernel-dev) users, some of
whom will become new git developers who add cool new functionality.

As noted in my SCM survey (http://www.dwheeler.com/essays/scm.html), I
think SCM Windows support is really important to a lot of OSS projects.
Many OSS projects, even if they start Unix/Linux only, spin off a
Windows port, and it's painful if their SCM can't run on Windows then.
Problems running on NFS filesystems have caused problems with GNU Arch
users (there are workarounds, but now you need to learn about
workarounds instead of things just working).

If nothing else, look at the history of other SCM projects: all too
many have undergone radical and painful surgeries so that they can be
more portable to various filesystems.

It's a trade-off, I know.

--- David A. Wheeler
Re: Yet another base64 patch
David wrote:
> It's a trade-off, I know.

So where do you recommend we make that trade-off?

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401
Re: [PATCH] libgit
Fixed.

Randy.Dunlap wrote:
> On Sat, 16 Apr 2005 20:12:56 -0700 Mike Taht wrote:
>
> | commit b0550573055abcf8ad19dcb8a036c32dd00a3be4
> | tree b77882b170769c07732381b9f19ff2dd5c9f1520
> | parent 866b4aea9313513612f2b0d66814a2f526d17f21
> | author Mike Taht [EMAIL PROTECTED] 1113704772 -0700
> | committer Mike Taht [EMAIL PROTECTED] 1113704772 -0700
> |
> | looks my 1878 line patch to convert git to libgit got eaten by vger..
> | I put it up at http://pbx.picketwyre.com/~mtaht/libgit.patch if anyone
> | wants to comment.
>
> from my log:  Connection refused.
>
> ---
> ~Randy

--
Mike Taht
FLASH! Intelligence of mankind decreasing. Details at ... uh, when the
little hand is on the
Re: SHA1 hash safety
> but the chance of any collision at all wigs me out.

Guess you're just going to get wigged out then.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401
Re: Yet another base64 patch
On Thu, 14 Apr 2005, H. Peter Anvin wrote:
> Linus Torvalds wrote:
>> Even something as simple as "ls -l" has been known to have O(n**2)
>> behaviour for big directories.
>
> For filesystems with linear directories, sure.  For sane filesystems,
> it should have O(n log n).

note that default configs of ext2 and ext3 don't qualify as sane
filesystems by this definition.  ext3 does have an extension that you
can enable to have it hash the directory access, but even if you enable
that on a filesystem you aren't guaranteed that it will be active (if
the directory existed before it was turned on, or has been accessed by
a kernel that didn't understand the extension, then the htree
functionality won't be used until you manually tell the system to
generate the tree)

David Lang

--
There are two ways of constructing a software design. One way is to
make it so simple that there are obviously no deficiencies. And the
other way is to make it so complicated that there are no obvious
deficiencies.
 -- C.A.R. Hoare
Re: [PATCH] Use libcurl to use HTTP to get repositories
On Sat, 16 Apr 2005, Paul Jackson wrote:
> Daniel wrote:
>> I'm working off of Linus's tree when not working on scripts, and it
>> doesn't have that section at all.
>
> Ah so - nevermind my README comments then.

Well, actually, I suspect that something like this should go to Pasky.
I really see my repo as purely "internal git datastructures", and when
it gets to "how do we interact with other peoples' web-sites", I
suspect Pasky's tree is better.

		Linus
Re: SHA1 hash safety
Paul Jackson wrote:
> what I'm talking about is the chance that somewhere, sometime there
> will be two different documents that end up with the same hash
>
> I have vastly greater chance of a file colliding due to hardware or
> software glitch than a random message digest collision of two
> legitimate documents.

The probability of an accidental overlap for SHA-1 for two different
files is absurdly remote; it's just not worth worrying about.

However, the possibility of an INTENTIONAL overlap is a completely
different matter.  I think the hash algorithm should change in the
future; I have a proposal below.  Someone has ALREADY broken into a
server to modify the Linux kernel code, so the idea of an attack on
kernel code is not an idle fantasy.  MD5 is dead, and SHA-1's work
factor has already been sufficiently broken that people have already
been told to walk to the exits (i.e., DO NOT USE SHA-1 for new programs
like git).

The fact that blobs are compressed first, with a length header in
front, _may_ make it harder to attack.  But maybe not.  I haven't
checked for this case, but most decompression algorithms I know of have
a "don't change" mode that essentially just copies the data behind it.
If the one used in git has such a mode (I bet it does!), an attacker
could use that mode to make it MUCH easier to create an attack vector
than it would appear at first.  Now the attacker just needs to create a
collision (hmmm, where was that paper?).  Remember, you don't need to
run a hash algorithm over an entire file; you can precompute to near
the end, and then try your iterations from there.  A little hardware
(inc. FPGAs) would speed the attack.

Of course, that assumes you actually check everything to make sure that
an attacker can't slip in something different.  After each rsync, are
all new files' hash values checked?  Do they uncompress to the right
length?  Do they have excess data after the decompression?  I'm hoping
that sort of input-checking (since the data might be from an attacker,
if indirectly!) is already going on, though I haven't reviewed the git
source code.

While the jury's still out, the current belief by most folks I talk to
is that SHA-1 variants with more bits, such as SHA-256, are the way to
go now.  The SHA-1 attack simply reduces the work factor (it's not a
COMPLETE break), so adding more bits is believed to increase the work
factor enough to counter it.

Adding more information to the hash can make attacking even harder.
Here's one idea: whenever that hash algorithm switch occurs, create a
new hash value as this:

  SHA-256 + uncompressed-length

where SHA-256 is computed just like SHA-1 is now, e.g., SHA-256(file)
where file = typecode + length + compressed data.  Leave the internal
format as-is (with the length embedded as well).  This means that an
attacker has to come up with an attack that creates the same length
uncompressed, yet has the same hash of the compressed result.  That's
harder to do.  Length is also really, really cheap to compute :-).

That also might help convince the "what happens if there's an
accidental collision" crowd: now, if the file lengths are different,
you're GUARANTEED that the hash values are different, though that's not
the best reason to do that.

One reason to think about switching sooner rather than later is that
it'd be really nice if the object store also included signatures, so
that in one fell swoop you could check who signed what (and thus you
could later on CONFIRM with much more certainty who REALLY submitted a
given change... say if it was clearly malicious).  If you switch hash
algorithms, the signatures might not work, depending on how you do it.

--- David A. Wheeler
Re: [PATCH] update-cache --refresh cache entry leak
LT == Linus Torvalds [EMAIL PROTECTED] writes:

LT> I've seen this patch now three times, and it's been wrong every
LT> single time.  Maybe we should add a comment?

I found out the previous two just after I sent it out.  Sorry about
that.
Re: Storing permissions
On Sat, 16 Apr 2005, Paul Jackson wrote:
> Morten wrote:
>> It makes some sense in principle, but without storing what they mean
>> (i.e., group==?) it certainly makes no sense.
>
> There's no "they" there.  I think Martin's proposal, to which I
> agreed, was to store a _single_ bit.  If any of the execute
> permissions of the incoming file are set, then the bit is stored ON,
> else it is stored OFF.  On 'checkout', if the bit is ON, then the file
> permission is set mode 0777 (modulo umask), else it is set mode 0666
> (modulo umask).

I think I agree.  Anybody willing to send me a patch?

One issue is that if done the obvious way it's an incompatible change,
and old tree objects won't be valid any more.  It might be ok to just
change the compare cache check to only care about a few bits, though:
S_IXUSR and S_IFDIR.  And then always write new tree objects out with
mode set to one of

 - 04: we already do this for directories
 - 100644: normal files without S_IXUSR set
 - 100755: normal files _with_ S_IXUSR set

Then, at compare time, we only look at S_IXUSR matching for files (we
never compare directory modes anyway).  And at file create time, we
create them with 0666 and 0777 respectively, and let the user's umask
sort it out (and if the user has 0100 set in his umask, he can damn
well blame himself).

This would pretty much match the existing kernel tree, for example.
We'd end up with some new trees there (and in git), but not a lot of
incompatibility.  And old trees would still work fine, they'd just get
written out differently.

Anybody want to send a patch to do this?

		Linus
Re: [PATCH] Use libcurl to use HTTP to get repositories
* Daniel Barkalow [EMAIL PROTECTED] wrote:

> Still leaks a bit of memory due to bug copied from read-tree.

Linus, should i resend the 18 fixes i sent the other day?  (as a GIT
repository perhaps?)  I found roughly 6 common memory leaks, 8
theoretical memory leaks, 2 overflows and did a couple of cleanups.
One of the patches [the cache collision related thing] we agreed was
not needed, the rest is still very much valid i think.

I did some basic testing with the fixes applied, nothing seemed to
break in any visible way in these tests.

	Ingo
Re: Storing permissions
Paul Jackson wrote:
> Junio wrote:
>> Sounds like svn
>
> I have no idea what svn is.

svn = common abbreviation for Subversion, a widely-used centralized SCM
tool intentionally similar to CVS.

--- David A. Wheeler
Re: SHA1 hash safety
I have nothing further to contribute to this subtopic.  Good luck with
it.

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401
Re: Issues with higher-order stages in dircache
Linus, earlier I wrote [*R1*]:

> - An explicit "update-cache [--add] [--remove] path" should be taken
>   as a signal from the user (or Cogito) to tell the dircache layer
>   "the merge is done and here is the result".  So just delete
>   higher-order stages for the path and record the specified path at
>   stage 0 (or remove it altogether).

and I think this commit of yours implements the "adding" half:

    commit be7b1f05cea8e5213ffef8f74ebdefed2aacb6fc
    author Linus Torvalds [EMAIL PROTECTED] 1113678345 -0700
    committer Linus Torvalds [EMAIL PROTECTED] 1113678345 -0700

        When inserting a index entry of stage 0, remove all old
        unmerged entries.

I am wondering if you have a particular reason not to do the same for
the "removing" half.  Without it, currently I do not see a way for the
user or Cogito to tell the dircache layer that the merge should result
in removal.  That is, other than first adding a phony entry there
(which brings the entry down to stage 0) and then immediately doing a
regular "update-cache --remove".  That is two instead of one reading of
the 1.6MB index file for the kernel case.

Also do you have any comments on this one from the same message?

> * read-tree
>
>   - When merging two trees, i.e. "read-tree -m A B", shouldn't we
>     collapse identical stage-1/2 into stage-0?

[References]

*R1* http://marc.theaimsgroup.com/?l=git&m=111366023126466&w=2
Re: Issues with higher-order stages in dircache
On Sat, 16 Apr 2005, Junio C Hamano wrote:
>
> I am wondering if you have a particular reason not to do the same for
> the "removing" half.

No.  Except for me being silly.  Please just make it so.

> Also do you have any comments on this one from the same message?
>
> * read-tree
>
>   - When merging two trees, i.e. "read-tree -m A B", shouldn't we
>     collapse identical stage-1/2 into stage-0?

How do you actually intend to merge two trees?  That sounds like a
total special case, and better done with diff-tree.

But regardless, since I assume the result is the later tree, why do a
"read-tree -m A B", since what you really want is "read-tree B"?  The
real merge always needs the base tree, and I'd hate to complicate the
real merge with some special case that isn't relevant for that real
case.

		Linus
[PATCH] checkout-cache -a should not extract unmerged stages
When "checkout-cache -a" is run, currently it attempts to extract each
existing unmerged stage to the same destination and complains about
what it itself has done.  This is nonsensical.  Presumably, the user is
running "checkout-cache -a" in order to verify the result of the part
that has cleanly been merged.  So check out only stage 0 entries and
give warnings for paths that are unmerged.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---
 checkout-cache.c | 11 +++
 1 files changed, 11 insertions(+)

checkout-cache.c: 431b7032576f40d93a08be801e26f76c168ed57b
--- checkout-cache.c
+++ checkout-cache.c	2005-04-16 22:23:15.0 -0700
@@ -121,10 +121,21 @@
 static int checkout_all(void)
 {
+	struct cache_entry *unmerge_skipping = NULL;
 	int i;
 
 	for (i = 0; i < active_nr ; i++) {
 		struct cache_entry *ce = active_cache[i];
+		if (ce_stage(ce)) {
+			if (!unmerge_skipping ||
+			    strcmp(unmerge_skipping->name, ce->name))
+				fprintf(stderr,
+					"checkout-cache: needs merge %s\n",
+					ce->name);
+			unmerge_skipping = ce;
+			continue;
+		}
+		unmerge_skipping = NULL;
 		if (checkout_entry(ce) < 0)
 			return -1;
 	}
Re: Storing permissions
Linus wrote:
> It might be ok to just change the compare cache check to only care
> about a few bits, though: S_IXUSR and S_IFDIR.  And then ...

I think I agree.  But since I am reluctant to take enough time to
understand the code well enough to write this patch, I'll shut up
now ;).

--
I won't rest till it's the best ...
Programmer, Linux Scalability
Paul Jackson [EMAIL PROTECTED] 1.650.933.1373, 1.925.600.0401
Re: Storing permissions
On Sat, 16 Apr 2005, Linus Torvalds wrote:
>
> Anybody want to send a patch to do this?

Actually, I just did it.  Seems to work for the only test-case I tried,
namely I just committed it, and checked that the permissions all ended
up being recorded as 0644 in the tree (if it has the -x bit set, they
get recorded as 0755).

When checking out, we always check out with 0666 or 0777, and just let
umask do its thing.  We only test bit 0100 when checking for
differences.

Maybe I missed some case, but this does indeed seem saner than the "try
to restore all bits" case.  If somebody sees any problems, please
holler.

(Btw, you may or may not need to blow away your index file by just
re-creating it with a "read-tree" after you've updated to this.  I
_tried_ to make sure that the compare just ignored the ce_mode bits,
but the fact is, your index file may be "corrupt" in the sense that it
has permission sets that sparse expects to never generate in an index
file any more..)

		Linus
[PATCH] show-diff.c: do not include unused header file
This is my bad.  I added "#include <ctype.h>" to the file, which I
ended up not using, and failed to remove it.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---

show-diff.c: d85d79b97a59342390bd34da09049dd58d56900f
--- show-diff.c
+++ show-diff.c	2005-04-16 22:37:29.0 -0700
@@ -4,7 +4,6 @@
  * Copyright (C) Linus Torvalds, 2005
  */
 #include "cache.h"
-#include <ctype.h>
 
 static char *diff_cmd = "diff -L '%s' -u -N - '%s'";
[PATCH] Add lsremote command.
This is a fairly trivial addition, but if users are adding remote
repositories with "git addremote", then those users should be able to
list out the remote list without having to know the details of where
the remotes file is kept.

Steven

Adds lsremote command to list remotes.

Signed-Off-By: Steven Cole [EMAIL PROTECTED]
---
diff -urN git-pasky-orig/git git-pasky/git
--- git-pasky-orig/git	2005-04-16 22:47:22.0 -0600
+++ git-pasky/git	2005-04-16 22:49:14.0 -0600
@@ -41,6 +41,7 @@
 	log
 	ls [TREE_ID]
 	lsobj [OBJTYPE]
+	lsremote
 	merge -b BASE_ID FROM_ID
 	pull [RNAME]
 	rm FILE...
@@ -105,6 +106,7 @@
 	log)	gitlog.sh $@;;
 	ls)	gitls.sh $@;;
 	lsobj)	gitlsobj.sh $@;;
+	lsremote)	gitlsremote.sh $@;;
 	merge)	gitmerge.sh $@;;
 	pull)	gitpull.sh $@;;
 	rm)	gitrm.sh $@;;
diff -urN git-pasky-orig/gitlsremote.sh git-pasky/gitlsremote.sh
--- git-pasky-orig/gitlsremote.sh	1969-12-31 17:00:00.0 -0700
+++ git-pasky/gitlsremote.sh	2005-04-16 22:58:15.0 -0600
@@ -0,0 +1,7 @@
+#!/bin/sh
+#
+# ls remotes in GIT repository
+#
+[ -e .git/remotes ] && cat .git/remotes && exit 1
+
+echo 'List of remotes is empty. See git addremote.'
[PATCH] Fix off-by-one error in show-diff
The patch to introduce shell safety to show-diff has an off-by-one
error.  Here is a fix.

Signed-off-by: Junio C Hamano [EMAIL PROTECTED]
---
 show-diff.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

show-diff.c: 8a24ff62b85a6e23469e3f0e7a20170dfe543ebf
--- show-diff.c
+++ show-diff.c	2005-04-16 22:53:11.0 -0700
@@ -27,8 +27,8 @@
 	int cnt, c;
 	char *cp;
 
-	/* count single quote characters */
-	for (cnt = 0, cp = src; *cp; cnt++, cp++)
+	/* count bytes needed to store the quoted string. */
+	for (cnt = 1, cp = src; *cp; cnt++, cp++)
 		if (*cp == '\'')
 			cnt += 3;