Re: Mozilla SHA1 implementation
Linus Torvalds writes:

> I've just integrated the Mozilla SHA1 library implementation that Adgar
> Toernig sent me into the standard git archive (but I did the integration
> differently).

Here is a new PPC SHA1 patch that integrates better with this...

> Interestingly, the Mozilla SHA1 code is about twice as fast as the openssl
> code on my G5, and judging by the disassembly, it's because it's much
> simpler. I think the openssl people have unrolled all the loops totally,
> which tends to be a disaster on any half-way modern CPU. But hey, it could
> be something as simple as optimization flags too.

Very interesting. On my G4 powerbook (since I am at LCA), for a
fsck-cache on a linux-2.6 tree, it takes 6.6 seconds with the openssl
SHA1, 10.7 seconds with the Mozilla SHA1, and ~5.8 seconds with my
SHA1. I'll test it on a G5 tonight, hopefully.

Paul.

diff -urN git.orig/Makefile git/Makefile
--- git.orig/Makefile	2005-04-22 16:23:44.0 +1000
+++ git/Makefile	2005-04-22 16:43:31.0 +1000
@@ -34,9 +34,14 @@
   SHA1_HEADER="mozilla-sha1/sha1.h"
   LIB_OBJS += mozilla-sha1/sha1.o
 else
+ifdef PPC_SHA1
+  SHA1_HEADER="ppc/sha1.h"
+  LIB_OBJS += ppc/sha1.o ppc/sha1ppc.o
+else
   SHA1_HEADER=<openssl/sha.h>
   LIBS += -lssl
 endif
+endif
 
 CFLAGS += '-DSHA1_HEADER=$(SHA1_HEADER)'
@@ -77,7 +82,7 @@
 write-tree.o: $(LIB_H)
 
 clean:
-	rm -f *.o mozilla-sha1/*.o $(PROG) $(LIB_FILE)
+	rm -f *.o mozilla-sha1/*.o ppc/*.o $(PROG) $(LIB_FILE)
 
 backup: clean
 	cd .. ; tar czvf dircache.tar.gz dir-cache
diff -urN git.orig/ppc/sha1.c git/ppc/sha1.c
--- /dev/null	2005-04-04 12:56:19.0 +1000
+++ git/ppc/sha1.c	2005-04-22 16:29:19.0 +1000
@@ -0,0 +1,72 @@
+/*
+ * SHA-1 implementation.
+ *
+ * Copyright (C) 2005 Paul Mackerras [EMAIL PROTECTED]
+ *
+ * This version assumes we are running on a big-endian machine.
+ * It calls an external sha1_core() to process blocks of 64 bytes.
+ */
+#include <stdio.h>
+#include <string.h>
+#include "sha1.h"
+
+extern void sha1_core(uint32_t *hash, const unsigned char *p,
+		      unsigned int nblocks);
+
+int SHA1_Init(SHA_CTX *c)
+{
+	c->hash[0] = 0x67452301;
+	c->hash[1] = 0xEFCDAB89;
+	c->hash[2] = 0x98BADCFE;
+	c->hash[3] = 0x10325476;
+	c->hash[4] = 0xC3D2E1F0;
+	c->len = 0;
+	c->cnt = 0;
+	return 0;
+}
+
+int SHA1_Update(SHA_CTX *c, const void *ptr, unsigned long n)
+{
+	unsigned long nb;
+	const unsigned char *p = ptr;
+
+	c->len += n << 3;
+	while (n != 0) {
+		if (c->cnt || n < 64) {
+			nb = 64 - c->cnt;
+			if (nb > n)
+				nb = n;
+			memcpy(&c->buf.b[c->cnt], p, nb);
+			if ((c->cnt += nb) == 64) {
+				sha1_core(c->hash, c->buf.b, 1);
+				c->cnt = 0;
+			}
+		} else {
+			nb = n >> 6;
+			sha1_core(c->hash, p, nb);
+			nb <<= 6;
+		}
+		n -= nb;
+		p += nb;
+	}
+	return 0;
+}
+
+int SHA1_Final(unsigned char *hash, SHA_CTX *c)
+{
+	unsigned int cnt = c->cnt;
+
+	c->buf.b[cnt++] = 0x80;
+	if (cnt > 56) {
+		if (cnt < 64)
+			memset(&c->buf.b[cnt], 0, 64 - cnt);
+		sha1_core(c->hash, c->buf.b, 1);
+		cnt = 0;
+	}
+	if (cnt < 56)
+		memset(&c->buf.b[cnt], 0, 56 - cnt);
+	c->buf.l[7] = c->len;
+	sha1_core(c->hash, c->buf.b, 1);
+	memcpy(hash, c->hash, 20);
+	return 0;
+}
diff -urN git.orig/ppc/sha1.h git/ppc/sha1.h
--- /dev/null	2005-04-04 12:56:19.0 +1000
+++ git/ppc/sha1.h	2005-04-22 16:45:28.0 +1000
@@ -0,0 +1,20 @@
+/*
+ * SHA-1 implementation.
+ *
+ * Copyright (C) 2005 Paul Mackerras [EMAIL PROTECTED]
+ */
+#include <stdint.h>
+
+typedef struct sha_context {
+	uint32_t hash[5];
+	uint32_t cnt;
+	uint64_t len;
+	union {
+		unsigned char b[64];
+		uint64_t l[8];
+	} buf;
+} SHA_CTX;
+
+int SHA1_Init(SHA_CTX *c);
+int SHA1_Update(SHA_CTX *c, const void *p, unsigned long n);
+int SHA1_Final(unsigned char *hash, SHA_CTX *c);
diff -urN git.orig/ppc/sha1ppc.S git/ppc/sha1ppc.S
--- /dev/null	2005-04-04 12:56:19.0 +1000
+++ git/ppc/sha1ppc.S	2005-04-22 16:29:19.0 +1000
@@ -0,0 +1,185 @@
+/*
+ * SHA-1 implementation for PowerPC.
+ *
+ * Copyright (C) 2005 Paul Mackerras.
+ */
+#define FS	80
+
+/*
+ * We roll the registers for T, A, B, C, D, E around on each
+ * iteration; T on iteration t is A on iteration t+1, and so on.
+ * We use registers 7 - 12 for this.
+ */
+#define RT(t)	((((t)+5)%6)+7)
+#define RA(t)	((((t)+4)%6)+7)
+#define RB(t)	((((t)+3)%6)+7)
+#define RC(t)	((((t)+2)%6)+7)
+#define RD(t)	((((t)+1)%6)+7)
+#define RE(t)
Re: [ANNOUNCE] git-pasky-0.6.3 request for testing
On Fri, Apr 22, 2005 at 05:09:31AM +0200, Petr Baudis wrote:

> Hello,
>
> FYI, I've released git-pasky-0.6.3 earlier in the night.

Hm, fun thing to try:

	go into a kernel git tree.
	rm Makefile
	git diff

Watch it as it thinks that every Makefile in the kernel tree is now
gone...

thanks,

greg k-h
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [ANNOUNCE] git-pasky-0.6.3 request for testing
With git-pasky 0.6.3, git log is unusable on my Mandrake 10.1 system.
Basically I get a neverending flood of these until I press 'q' to quit
less:

/home/barryn/softbag/git-pasky-0.6.3/gitlog.sh: line 73: 7598 Segmentation fault
sed -re ' / *Signed-off-by.*/Is//'$colsignoff''$coldefault'/ s/^// '
/home/barryn/softbag/git-pasky-0.6.3/gitlog.sh: line 73: 7609 Segmentation fault
sed -re ' / *Signed-off-by.*/Is//'$colsignoff''$coldefault'/ s/^// '
/home/barryn/softbag/git-pasky-0.6.3/gitlog.sh: line 73: 7620 Segmentation fault
sed -re ' / *Signed-off-by.*/Is//'$colsignoff''$coldefault'/ s/^// '

git-pasky-0.6.2 works fine.

I'm not sure if I have time tonight (or tomorrow) to troubleshoot this
further, but I'll see if I can.

-Barry K. Nathan [EMAIL PROTECTED]
Re: Mozilla SHA1 implementation
Linus Torvalds writes:

> Interestingly, the Mozilla SHA1 code is about twice as fast as the openssl
> code on my G5, and judging by the disassembly, it's because it's much
> simpler. I think the openssl people have unrolled all the loops totally,
> which tends to be a disaster on any half-way modern CPU. But hey, it could
> be something as simple as optimization flags too.

Which gcc version are you using?

I get the opposite result on my 2GHz G5: the Mozilla version does
45MB/s, the openssl version does 135MB/s, and my version does 218MB/s.
The time for a fsck-cache on a linux-2.6 tree (cache hot) is 8.0
seconds for the Mozilla version, 5.2 seconds for the openssl version,
and 4.4 seconds for my version.

Paul.
proposal: delta based git archival
I noticed people on this mailing list start talking about using blob
deltas for compression, and the basic issue that the resulting files
are too small for efficient filesystem storage. I thought about this a
little and decided I should send out my ideas for discussion.

In my proposal, the current git object storage model (one compressed
object per file) remains as the primary storage mechanism, however
there would be some kind of backup mechanism based on multiple deltas
grouped in one file.

For example, suppose you're looking for an object with a hash of
eab75ce51622aa312bb0b03572d43769f420c347

First you'd look at
.git/objects/ea/b75ce51622aa312bb0b03572d43769f420c347 - if the file
exists, that's your object.

If the file does not exist, you'd then look for .git/deltas/ea/b,
.git/deltas/ea/b7, .git/deltas/ea/b75, .git/deltas/ea/b75c, ... up to
some maximum search path length. You stop at the first file you can
find.

Supposing that file is .git/deltas/ea/b7, it would contain a diff
(let's assume unified format for now, though ideally it'd be better to
have something that allows binary file deltas too) of many archived
objects with hashes starting with eab7, compared to a different object
(presumably some direct or indirect ancestor):

diff -u 8f5ba0203e31204c5c052d995a5b4449226bcfb5 eab75ce51622aa312bb0b03572d43769f420c347
--- 8f5ba0203e31204c5c052d995a5b4449226bcfb5
+++ eab75ce51622aa312bb0b03572d43769f420c347
@@ -522,7 +522,7 @@
diff -u 77dc2cb94930017f62b55b9706cbadda8c90f650 eab71c51dbc62797d6c903203de44cc6a734c05c
--- 77dc2cb94930017f62b55b9706cbadda8c90f650
+++ eab71c51dbc62797d6c903203de44cc6a734c05c
@@ -560,13 +563,17 @@
...

Based on this delta file, we'd then look for the object
8f5ba0203e31204c5c052d995a5b4449226bcfb5 (this process could require
recursively rebuilding that object) and try to build
eab75ce51622aa312bb0b03572d43769f420c347 by applying the delta and then
double checking the hash.

To me the strengths of this proposal would be:

* It does not muddy the git object model - it just acts independently
  of it, as a way to rebuild git objects from deltas

* Old objects can be compressed by creating a delta with a close
  ancestor, then erasing the original file storage for that object.
  The object delta can be appended to an existing delta file (which
  avoids the small-file storage issue), or if the delta file gets too
  big, it can be split off into 16 smaller files based on the hashes
  of the objects this file stores deltas for.

* The system is flexible enough to explore different delta strategies.
  For example one could decide to keep one object every 10 in the
  database and store the other 9 as deltas based on the immediate
  object ancestor, or any other tradeoff - and the system would still
  work the same (with different performance tradeoffs though).

Does this sound insane ? Too complicated maybe ?

Is there any kind of semi-standard binary-capable multiple-file diff
format that could be used for this application instead of unified
diffs ?

--
Michel "Walken" Lespinasse
"Bill Gates is a monocle and a Persian cat away from being the villain
in a James Bond movie." -- Dennis Miller
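The lookup order in the proposal above can be sketched in C. This is only a rough illustration under stated assumptions: the function names (object_path, delta_path, locate) and the exists callback are hypothetical, not part of git, and the .git/deltas layout is the one proposed in this message, not something git implements.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HEX_LEN 40	/* length of a hex-encoded SHA1 */

/* Path in the primary store: one compressed object per file. */
int object_path(const char *hex, char *out, size_t outlen)
{
	int n;

	assert(hex != NULL && out != NULL);
	if (strlen(hex) != HEX_LEN)
		return -1;
	n = snprintf(out, outlen, ".git/objects/%.2s/%s", hex, hex + 2);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/*
 * Path of the proposed delta file: the first two hex digits name the
 * directory, the next `digits` hex digits name the file. digits = 1
 * gives .git/deltas/ea/b, digits = 2 gives .git/deltas/ea/b7, etc.
 */
int delta_path(const char *hex, int digits, char *out, size_t outlen)
{
	int n;

	assert(hex != NULL && out != NULL);
	if (strlen(hex) != HEX_LEN || digits < 1 || digits > HEX_LEN - 2)
		return -1;
	n = snprintf(out, outlen, ".git/deltas/%.2s/%.*s",
		     hex, digits, hex + 2);
	return (n < 0 || (size_t)n >= outlen) ? -1 : 0;
}

/* Hypothetical callback: returns nonzero if the path exists. */
typedef int (*exists_fn)(const char *path);

/*
 * Search order from the proposal: the plain object file first, then
 * delta files with progressively longer names, stopping at the first
 * hit. Returns 0 for a plain object, the number of name digits for a
 * delta file, or -1 if nothing was found.
 */
int locate(const char *hex, int max_digits, exists_fn exists,
	   char *out, size_t outlen)
{
	int d;

	if (object_path(hex, out, outlen) == 0 && exists(out))
		return 0;
	for (d = 1; d <= max_digits; d++)
		if (delta_path(hex, d, out, outlen) == 0 && exists(out))
			return d;
	return -1;
}
```

A caller that gets a positive result from locate() would still have to find the base object named in the delta file, possibly recursively, apply the delta, and check the resulting hash, as the proposal describes.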
Re: GIT_INDEX_FILE environment variable
Howdy,

Linus Torvalds wrote:
> On Thu, 21 Apr 2005, Junio C Hamano wrote:
>
>> I am thinking about an alternative way of doing the above by some
>> modifications to the git core. I think the root of this problem is
>> that there is no equivalent to GIT_INDEX_FILE and SHA1_FILE_DIRECTORY
>> that tells the core git where the project top directory (i.e. the
>> root of the working tree that corresponds to what $GIT_INDEX_FILE
>> describes) is.
>
> I'd _really_ prefer to just try to teach people to work from the "top"
> directory instead.

Would it be okay if that were settable on a per-repository basis? :)
Or do you have specific subset of operations you want restricted?

>> - A new environment variable GIT_WORKING_TREE points at the root of
>>   the working tree.

[snip]

> I really don't like it that much, but to some degree it obviously is
> exactly what "--prefix=" does to checkout-cache. It's basically saying
> that all normal file operations have to be prefixed with a magic
> string.

I'm going to script it one way or the other, but the environment route
allows me to set things up after a fork and before exec in Perl. This
works regardless of what git command I'm running, and should work even
with ithreads.

This ease of use would not be the case with the '--prefix' solution, as
scripting the commands would require passing arguments to those
commands that need/support them at a higher level than is desirable.

At present, I have implemented Yogi to support being able to run
commands from a different working directory than the root of the
repository, and that behavior might be per-repository settable
(someday).

If I had my way, I would like to see git support the following
variables:

GIT_WORKING_DIRECTORY - defaults to '.'
GIT_CACHE_DIRECTORY   - defaults to ${GIT_WORKING_DIRECTORY}/.git
GIT_OBJECT_DIRECTORY  - defaults to ${GIT_CACHE_DIRECTORY}/objects

The reasoning is simple: One object repository can be shared among
numerous working caches, which can be shared among multiple working
directories (e.g. any directories under the project root, but maybe
also import/exports, or other magic...).

There are two layers of one to many relationships between the three
classes of directories, and my scripts want to make use of that
flexibility to the hilt.

Also, do you really think git will only ever have the index file, and
not someday possibly other related bits? (You may have said that
elsewhere, but I missed it.) If that's ever the case, the directory
variable is the way to go; scripts can be forward compatible and won't
risk accidentally mingling repository data when their scripts have only
set GIT_INDEX_FILE and not GIT_SOME_OTHER_FILE.

That said, I think GIT_INDEX_FILE would supplement the above scheme
nicely, overriding a default of ${GIT_CACHE_DIRECTORY}/index, because
of use cases you've described.

Cheers,

Zach
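The defaulting chain proposed above can be sketched in C. This is a minimal sketch, assuming the three variable names from this message (they are a proposal, not existing git behavior); struct git_dirs and both function names are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define DIR_MAX 4096

/* Hypothetical container for the three proposed directory settings. */
struct git_dirs {
	char working[DIR_MAX];
	char cache[DIR_MAX];
	char objects[DIR_MAX];
};

/* Copy src into dst, failing with -1 rather than truncating. */
static int copy_dir(char *dst, const char *src)
{
	size_t len = strlen(src);

	if (len >= DIR_MAX)
		return -1;
	memcpy(dst, src, len + 1);
	return 0;
}

/* Build "base/leaf" in dst, failing with -1 rather than truncating. */
static int join_dir(char *dst, const char *base, const char *leaf)
{
	int n = snprintf(dst, DIR_MAX, "%s/%s", base, leaf);

	return (n < 0 || n >= DIR_MAX) ? -1 : 0;
}

/*
 * Apply the proposed defaults. A NULL argument means "not set":
 *   working -> "."
 *   cache   -> <working>/.git
 *   objects -> <cache>/objects
 */
int resolve_git_dirs(const char *working, const char *cache,
		     const char *objects, struct git_dirs *out)
{
	assert(out != NULL);
	if (copy_dir(out->working, working ? working : ".") < 0)
		return -1;
	if ((cache ? copy_dir(out->cache, cache)
		   : join_dir(out->cache, out->working, ".git")) < 0)
		return -1;
	if ((objects ? copy_dir(out->objects, objects)
		     : join_dir(out->objects, out->cache, "objects")) < 0)
		return -1;
	return 0;
}

/* Same, reading the proposed environment variables. */
int resolve_git_dirs_from_env(struct git_dirs *out)
{
	return resolve_git_dirs(getenv("GIT_WORKING_DIRECTORY"),
				getenv("GIT_CACHE_DIRECTORY"),
				getenv("GIT_OBJECT_DIRECTORY"), out);
}
```

Because each default is derived from the level above it, setting only GIT_OBJECT_DIRECTORY shares one object store between several caches without touching the other two settings, which is the one-to-many layering the message argues for.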
Re: [PATCH] multi item packed files
Linus Torvalds [EMAIL PROTECTED] writes:

> And dammit, if I'm the original author and likely biggest power-user,
> and _I_ can't be bothered to use special filesystems, then who can?
> Nobody.

If someone is motivated enough, and if the task is quite trivial (as it
seems to be) someone may try it. I can see nothing wrong with it as
long as it doesn't affect other people.

> This is why I absolutely do not believe in arguments like "if your
> filesystem doesn't do tail packing, you shouldn't use it" or "if you
> don't have name hashing enabled in your filesystem it's broken".

Of course. But one may consider using a filesystem with, say, different
settings. Or a special filesystem for this task, such as CNFS used by
news servers (it seems news servers do quite the same what git does,
except they also purge old contents, i.e., container files don't grow
up).

> I'm perfectly willing to optimize for the common case, but that's as
> far as it goes. I do not want to make fundamental design decisions
> that depend on the target filesystem having some particular feature.

The optimization would be (in) the underlying filesystem (i.e., the OS
thing, or possibly a shared preloaded library?), not git itself.

--
Krzysztof Halasa
Re: proposal: delta based git archival
On 4/22/05, Michel Lespinasse [EMAIL PROTECTED] wrote:

> I noticed people on this mailing list start talking about using blob
> deltas for compression, and the basic issue that the resulting files
> are too small for efficient filesystem storage. I thought about this
> a little and decided I should send out my ideas for discussion.

I've been thinking about another, simpler approach.

The main benefit of using deltas is reducing the bandwidth use in
pull/push. My idea is leaving the blob storage as it is for now and
adding a new kind of object (remote) that acts as a link to an object
in another repository.

So that, when you rsync, you don't have to get all the blobs (which can
be a lot of data), but only the sha1 of the new objects created. Then a
remote object is created for each new object in the local repository
pointing to its location in the external repository.

Once the rsync is done, when git has to access any of the new objects
they can be fetched from the original location, so that only necessary
objects are transferred.

This way, the cost of a sync in terms of bandwidth is nearly zero.

I've been working on this, so if you think it to be a good idea, I can
send a patch when I get it fully working.

Regards,
Jaime Medrano.
http://jmedrano.sl-form.com
Re: [ANNOUNCE] git-pasky-0.6.3 request for testing
Dear diary, on Fri, Apr 22, 2005 at 09:24:37AM CEST, I got a letter
where Barry K. Nathan [EMAIL PROTECTED] told me that...

> On Fri, Apr 22, 2005 at 12:16:26AM -0700, Barry K. Nathan wrote:
>> With git-pasky 0.6.3, git log is unusable on my Mandrake 10.1 system.
>> Basically I get a neverending flood of these until I press 'q' to
>> quit less:
> [snip sed segmentation faults which happen with 0.6.3 but not 0.6.2]
>> I'm not sure if I have time tonight (or tomorrow) to troubleshoot
>> this further, but I'll see if I can.
>
> I had sed-4.1.1-2mdk. I downloaded sed-4.1.4-2mdk (from Mandriva 2005
> Limited Edition) and updated to that, and the problem went away.
>
> FWIW this is the second package I've had to update to the Mandriva
> 2005 LE level (the first was mktemp). I don't mind however.

Duh, segfaulting sed! Could you please check which of the sed
invocations actually segfault for you?

Thanks,

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
First web interface and service API draft
Hi,

me again after a couple of hours of sleep ;-)

This probably gets a bit longer so if you are not interested in a web
service api or the web interface now is your chance to get off the
train. I'm probably making a complete git of myself but that's not
uncalled for in this context ;-)

For those that are still with me let me start by iterating again that I
_do_ care for URIs as the primary API for web service applications
_and_ humans. I probably don't have to tell Linux people anything about
the importance to get the API right ;-)

As it's fairly early in the web service interface cycle I'd like to
change things around a little bit and start by getting the API
straight.

The following considerations should be pretty implementation agnostic
and not specific to wit. The interface should be flexible enough to be
used as a kind of web command line.

---
/project

Ok. The URI should start by stating the project name e.g. /linux-2.6.
This does bloat the URI slightly but I don't think that we want to have
one root namespace per git archive in the long run. Additionally you
can always put rewriting or redirecting rules at the root level for
additional convenience when there's an obvious default project.

Should provide some meta data, stats, etc. if available.

---
/project/blob/blob-sha1
/project/commit/commit-sha1

These are the easy ones: the web interface should be able to spit out
the plain text data of a blob and a commit at these URIs. Users would
be probably scripts and other downloads.

Open questions:
* Blob data should be probably binary ?
* Should it be commit or changeset ? Linus seems to have changed
  nomenclature in the README
* If we serve the pristine commit objects we will put the email
  addresses in plain sight. If we remove or change the email addresses
  it's not the original commit object anymore. Thoughts ?

---
/project/tree/tree-sha1

Tree objects are served in binary form. Primary audience are scripts,
etc. Human beings will probably get a heart attack when they
accidentally visit this URI.

---
/project/blob/blob-sha1.html
/project/commit/commit-sha1.html
/project/tree/tree-sha1.html

A HTML version of blob, commit and tree fully linked aimed at human
beings.

---
/project/tree/tree-sha1.tar.bz2
/project/tree/tree-sha1.tar.gz
/project/commit/commit-sha1.tar.bz2
/project/commit/commit-sha1.tar.gz

Tarballs of the specified commits or trees. Note that these can be
individual subtrees too.

---
/project/tree/tree-sha1/diff/ancestor-tree-sha1

Unified plain text recursive diff of the given trees. I guess the user
could specify any two tree ids but the relevance of the results would
vary greatly ;-)

* Possibly a DoS issue
* does something like /project/tree/tree-sha1/diff/ make sense,
  producing a full diff from scratch ?

---
/project/tree/tree-sha1/diff/ancestor-tree-sha1/html

Non recursive HTML view of the objects which are contained in the diff
fully linked with the individual HTML views.

---
/project/blob/blob-sha1/diff/ancestor-sha1

Unified plain text diff of the given blobs.

* again /project/blob/blob-sha1/diff/ sensible ?

---
/project/blob/blob-sha1/diff/ancestor-sha1/html

HTML view (probably colorized) of a single blob diff.

---
/project/changelog/time-spec

HTML changelog for the given time-spec. I think valid values for
timespec should be number of days nnnd, number of entries nnn and the
keyword 'all'.

* perhaps additionally number of hours nnnh, number of months nnnm,
  number of years nnny. Combinations shouldn't be allowed
* time ranges are probably overkill
* is a plain text version needed /project/changelog/time-spec/plain?

---
/project/changelog/time-spec/search/regexp

HTML changelog for the given time-spec filtered by the regexp.

* again plain version needed ?

---
/project/changelog/time-spec/search/author/regexp
/project/changelog/time-spec/search/committer/regexp
/project/changelog/time-spec/search/signedoffby/regexp

convenience wrappers for generic search restricted to these fields.

---
open questions:

* how to generate and publish additional merge information ?
* how to generate and publish tree and blob history information ? This
  is probably expensive with git.
* how to represent branches ? should we code up the branches in the
  project id like linux-2.6-mm or whatever ?

Comments ? Ideas ? Other feedback ?

Christian

--
Christian Meder, email: [EMAIL PROTECTED]

The Way-Seeking Mind of a tenzo is actualized
by rolling up your sleeves.

(Eihei Dogen Zenji)
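The time-spec rule in the draft above (nnnd for days, nnn for a number of entries, or the keyword 'all', with no combinations) is small enough to sketch as a parser. This is only an illustration: the type and function names are hypothetical, and the hour/month/year suffixes are left out because the draft lists them as open questions.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>

enum time_spec_kind { TS_INVALID, TS_ALL, TS_ENTRIES, TS_DAYS };

struct time_spec {
	enum time_spec_kind kind;
	unsigned long count;	/* days or entries; 0 for all/invalid */
};

/* Parse one path component such as "30d", "100" or "all". */
struct time_spec parse_time_spec(const char *s)
{
	struct time_spec ts = { TS_INVALID, 0 };
	char *end;

	assert(s != NULL);
	if (strcmp(s, "all") == 0) {
		ts.kind = TS_ALL;
		return ts;
	}
	/* Require a leading digit: rejects "", "-5", "+5" and " 5". */
	if (!isdigit((unsigned char)s[0]))
		return ts;

	errno = 0;
	ts.count = strtoul(s, &end, 10);
	if (errno == ERANGE) {
		ts.count = 0;
		return ts;
	}
	if (*end == '\0')
		ts.kind = TS_ENTRIES;		/* "nnn"  */
	else if (end[0] == 'd' && end[1] == '\0')
		ts.kind = TS_DAYS;		/* "nnnd" */
	else
		ts.count = 0;			/* combinations not allowed */
	return ts;
}
```

Rejecting anything after the single suffix is what enforces the "combinations shouldn't be allowed" rule; adding h, m or y later would only need further branches next to the 'd' case.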
Re: First web interface and service API draft
On 4/22/05, Christian Meder [EMAIL PROTECTED] wrote:

> Comments ? Ideas ? Other feedback ?

I'd suggest serving XML rather than HTML and using client side XSLT to
transform it into HTML. Client-side XSLT works well in IE 6 and all
versions of Firefox, so there is no question that it is a mature
technology. Provide a fall back via server transformed HTML if need be,
but that is trivial to do once you have the client-side XSLT
stylesheets.

Serving XML is as easy as serving HTML and gives you a much more
flexible outcome.

jon.
Re: First web interface and service API draft
Dear diary, on Fri, Apr 22, 2005 at 12:41:56PM CEST, I got a letter
where Christian Meder [EMAIL PROTECTED] told me that...

> Hi,

Hi,

> /project
>
> Ok. The URI should start by stating the project name e.g. /linux-2.6.
> This does bloat the URI slightly but I don't think that we want to
> have one root namespace per git archive in the long run. Additionally
> you can always put rewriting or redirecting rules at the root level
> for additional convenience when there's an obvious default project.
>
> Should provide some meta data, stats, etc. if available.

I don't think this makes much sense. I think you should just apply -p1
to all the directories, and define that there should be some / page
which should contain some metadata regarding the repository you are
accessing (probably branches, tags, and such).

> ---
> /project/blob/blob-sha1
> /project/commit/commit-sha1
>
> These are the easy ones: the web interface should be able to spit out
> the plain text data of a blob and a commit at these URIs. Users would
> be probably scripts and other downloads.
>
> Open questions:
> * Blob data should be probably binary ?

What do you mean by binary?

> * Should it be commit or changeset ? Linus seems to have changed
>   nomenclature in the README

We call it commit everywhere but in the README. :-) The changeset name
is bad anyway. It is a commit of a complete tree state, diff against
one of its parent commits is the set of changes.

> ---
> /project/tree/tree-sha1
>
> Tree objects are served in binary form. Primary audience are scripts,
> etc. Human beings will probably get a heart attack when they
> accidentally visit this URI.

Binary form is unusable for scripts. Anything wrong with putting
ls-tree output there?

We should also have /gitobj/sha1 for fetching the raw git objects.

> ---
> /project/blob/blob-sha1.html
> /project/commit/commit-sha1.html
> /project/tree/tree-sha1.html
>
> A HTML version of blob, commit and tree fully linked aimed at human
> beings.

How can I imagine an HTML version of blob?

> ---
> /project/tree/tree-sha1/diff/ancestor-tree-sha1/html
>
> Non recursive HTML view of the objects which are contained in the
> diff fully linked with the individual HTML views.

Why not .html?

> ---
> /project/changelog/time-spec

I'd personally prefer /log/, but whatever.

For consistency, I'd stay with the plaintext output by default, .html
if requested.

And I think abusing directories for this is bad. Query string seems
much more appropriate, since this is something that changes dynamically
a lot, not a permanent resource identifier.

OTOH, I'd use /log/commit to specify what commit to start at. It just
does not make sense otherwise, you would not know where to start. I
think the commit should follow the same or similar rules as Cogito id
decoding. E.g. to get latest Linus' changelog, you'd do /log/linus

> ---
> /project/changelog/time-spec/search/regexp
>
> HTML changelog for the given time-spec filtered by the regexp.
>
> * again plain version needed ?
>
> ---
> /project/changelog/time-spec/search/author/regexp
> /project/changelog/time-spec/search/committer/regexp
> /project/changelog/time-spec/search/signedoffby/regexp
>
> convenience wrappers for generic search restricted to these fields.

Same here. just ?author=...&committer=...&signedoffby=... etc. You can
even combine several criteria.

> ---
> open questions:
>
> * how to generate and publish additional merge information ?

I don't understand

> * how to generate and publish tree and blob history information ?
>   This is probably expensive with git.

...this either.

> * how to represent branches ? should we code up the branches in the
>   project id like linux-2.6-mm or whatever ?

See above.

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
Re: First web interface and service API draft
Dear diary, on Fri, Apr 22, 2005 at 01:34:45PM CEST, I got a letter
where Jon Seymour [EMAIL PROTECTED] told me that...

> On 4/22/05, Christian Meder [EMAIL PROTECTED] wrote:
>> Comments ? Ideas ? Other feedback ?
>
> I'd suggest serving XML rather than HTML and using client side XSLT
> to transform it into HTML. Client-side XSLT works well in IE 6 and
> all versions of Firefox, so there is no question that it is a mature
> technology. Provide a fall back via server transformed HTML if need
> be, but that is trivial to do once you have the client-side XSLT
> stylesheets.
>
> Serving XML is as easy as serving HTML and gives you a much more
> flexible outcome.

Why "rather than"? Why not "in addition to"? You just append either
.html or .xml, based on what you want.

--
Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
Re: First web interface and service API draft
On 4/22/05, Petr Baudis [EMAIL PROTECTED] wrote:

> Dear diary, on Fri, Apr 22, 2005 at 01:34:45PM CEST, I got a letter
> where Jon Seymour [EMAIL PROTECTED] told me that...
>> On 4/22/05, Christian Meder [EMAIL PROTECTED] wrote:
>>> Comments ? Ideas ? Other feedback ?
>>
>> I'd suggest serving XML rather than HTML and using client side XSLT
>> to transform it into HTML.
> ...
> Why "rather than"? Why not "in addition to"? You just append either
> .html or .xml, based on what you want.

You are right - there is no good reason that an implementation should
not support both.

From the point of view of a specification, though, I think it would be
useful to focus on an XML content model rather than the details of one
particular HTML model - get the XML model right and you can do whatever
you like with the HTML model at any time after that.

jon.

--
homepage: http://www.zeta.org.au/~jon/
blog: http://orwelliantremors.blogspot.com/
Re: [PATCH] git-pasky spec file
Chris Wright wrote:

> Here's a simple spec file to do rpm builds.
(snip)
> Creates a package named git, which seems fine since Linus' isn't
> likely to be packaged directly.

Um. Really? I can't imagine why Linus's git wouldn't be packaged
directly. He has strongly indicated that folks who want to build on top
of it should not expect to see libgit any time soon, so git will be an
important independent tool.

But presumably you'll change the name of this package to cogito soon
anyway, as soon as git-pasky itself is renamed.

Kevin
[patch] fixup GECOS handling
Hi,

This still applies - any reason for not doing this?

Thanks,

The GECOS is delimited by ',' or ';', so we should only use whatever is
before the first ',' or ';' for the full name, rather than just
stripping those.

Signed-off-by: Martin Schlemmer [EMAIL PROTECTED]

commit-tree.c: ec53a4565ec0033aaf6df2a48d233ccf4823e8b0
--- 1/commit-tree.c
+++ 2/commit-tree.c	2005-04-18 12:22:18.0 +0200
@@ -96,21 +96,6 @@
 		if (!c)
 			break;
 	}
-
-	/*
-	 * Go back, and remove crud from the end: some people
-	 * have commas etc in their gecos field
-	 */
-	dst--;
-	while (--dst >= p) {
-		unsigned char c = *dst;
-		switch (c) {
-		case ',': case ';': case '.':
-			*dst = 0;
-			continue;
-		}
-		break;
-	}
 }
 
 static const char *month_names[] = {
@@ -313,6 +298,11 @@
 	if (!pw)
 		die("You don't exist. Go away!");
 	realgecos = pw->pw_gecos;
+	/* The name is seperated from the room no., tel no, etc via [,;] */
+	if (strchr(realgecos, ','))
+		*strchr(realgecos, ',') = 0;
+	else if (strchr(realgecos, ';'))
+		*strchr(realgecos, ';') = 0;
 	len = strlen(pw->pw_name);
 	memcpy(realemail, pw->pw_name, len);
 	realemail[len] = '@';

--
Martin Schlemmer
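For comparison, "whatever is before the first ',' or ';'" can also be written with strcspn(), which finds the first occurrence of either delimiter in one pass. This is a minimal sketch, not the code from the patch above; the helper name gecos_full_name is hypothetical, and it copies into a caller-supplied buffer instead of editing pw->pw_gecos in place.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy the full-name part of a GECOS field (everything before the
 * first ',' or ';') into out, truncating to fit. Returns the number
 * of characters copied, not counting the terminating NUL.
 */
size_t gecos_full_name(const char *gecos, char *out, size_t outlen)
{
	size_t len;

	assert(gecos != NULL && out != NULL && outlen > 0);
	len = strcspn(gecos, ",;");	/* length up to first delimiter */
	if (len >= outlen)
		len = outlen - 1;
	memcpy(out, gecos, len);
	out[len] = '\0';
	return len;
}
```

One difference from the patch as posted: the patch checks for ',' before ';', so a field such as "A. User;x,y" would be cut at the comma and keep ";x", whereas the strcspn() form stops at whichever delimiter comes first. Neither version strips '.', so initials like "A." survive.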
[git pasky] tarball question
Hi,

I understand why you have the git-pasky-0.6.x.tar.bz2 tarballs with the .git database included as well (btw, great stuff renaming it to something more distributable), but it's going to be a pita for users of source-based distros like us (Gentoo), as well as our mirrors, if it gets much bigger. (Already asked r3pek to add it to portage.)

How about ripping the .git directory out of the next release, and just having an unnumbered tarball (like you used to) that has the latest snapshot of the .git directory for those who want to do git-pasky development? It should even make things easier on your side, as you could just set up a cron job to update it once a day or whatever.

Thanks,

-- 
Martin Schlemmer
Re: git pull on ia64 linux tree
On Fri, 22 Apr 2005 [EMAIL PROTECTED] wrote:
> git log seems to have problems interpreting the dates ... looking at the commit entries, the time is right ... but it appears that git log applies the timezone correction twice, so the changes I just applied at 14:46 PDT look like I made them at quarter to five tomorrow morning (+14 hours from when I did).

Looks like you are right. The seconds are already in UTC format, so I think git log is wrong to pass the UTC seconds in to date, and then tell date that it was done in the original timezone.

I think it would be nice to use the TZ data to show the thing in the timezone of the committer, though. Dunno how to do that, maybe something like

	TZ=$tz date -d "1970-01-01 + $sec sec"

or whatever. Sadly, it looks like date doesn't understand timezone syntax like that - looks like TZ has to be in the long machine-unreadable format like US/Pacific etc. Stupid (either TZ or me - maybe I just don't know what the right format is).

		Linus
-
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
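One way to show the committer's wall-clock time without going through TZ at all is to add the offset to the UTC seconds once and format the result with gmtime(). The sketch below assumes the commit records a numeric offset such as -0700 next to the seconds; that storage format and the name format_commit_date are assumptions for illustration, not something stated in this thread.

```c
#include <stdio.h>
#include <time.h>

/*
 * Hypothetical helper: format `sec` (seconds since the epoch, UTC) as
 * wall-clock time in the zone given by `tz`, a signed decimal offset in
 * hours and minutes (for example -700 for -0700, 530 for +0530).
 * Returns 0 on success, -1 on failure.
 */
int format_commit_date(long sec, int tz, char *out, size_t outlen)
{
	int minutes = (tz / 100) * 60 + (tz % 100);	/* offset in minutes */
	time_t shifted = (time_t)sec + minutes * 60;	/* apply the offset exactly once */
	struct tm *tm = gmtime(&shifted);		/* no further zone correction */
	char buf[64];

	if (!tm)
		return -1;
	if (strftime(buf, sizeof(buf), "%a %b %d %H:%M:%S %Y", tm) == 0)
		return -1;
	snprintf(out, outlen, "%s %+05d", buf, tz);
	return 0;
}
```

Because the shifted value is handed to gmtime() rather than localtime(), the viewer's own timezone never enters the calculation, which avoids the double correction described above.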
Re: [git pasky] tarball question
On Sat, 2005-04-23 at 00:42 +0200, Petr Baudis wrote:
> Dear diary, on Fri, Apr 22, 2005 at 04:31:43PM CEST, I got a letter where Martin Schlemmer [EMAIL PROTECTED] told me that...
> > Hi,
>
> Hi,
>
> > I understand why you have the git-pasky-0.6.x.tar.bz2 tarballs with the .git database included as well (btw, great stuff renaming it to something more distributable), but it's going to be a pita for users of source-based distros like us (Gentoo), as well as our mirrors, if it gets much bigger. (Already asked r3pek to add it to portage.)
>
> yes; that was actually the plan, it's just that my memory is so volatile...

Yep, saw before you posted about the change in URL, thanks.

> > How about ripping the .git directory out of the next release, and just having an unnumbered tarball (like you used to) that has the latest snapshot of the .git directory for those who want to do git-pasky development? It should even make things easier on your side, as you could just set up a cron job to update it once a day or whatever.
>
> Does it actually make sense to keep a tarball with history? Just build git-pasky and do git init. (Or rsync it manually.)

Well, I did not know about kernel.org hosting it, so I thought it might help, given your reasons for initially tarballing the whole thing =)

Thanks,

-- 
Martin Schlemmer
RE: [3/5] Add http-pull
> But if you download 1000 files of the 1010 you need, and then your network goes down, you will need to download those 1000 again when it comes back, because you can't save them unless you have the full history.

So you could make the temporary object repository persistent between pulls to avoid reloading them across the wire. Something like:

get_commit(sha1)
{
	if (sha1 in real_repo) -> done
	if (!(sha1 in tmp_repo))
		load sha1 to tmp_repo
	get_tree(sha1->tree)
	for each parent
		get_commit(sha1->parent)
	move sha1 from tmp_repo to real_repo
}

get_tree(sha1)
{
	if (sha1 in real_repo) -> done
	if (!(sha1 in tmp_repo))
		load sha1 to tmp_repo
	for_each (sha1->entry) {
	case blob:
		if (!(sha1 in real_repo))
			load to real_repo
	case tree:
		get_tree()
	}
	move sha1 from tmp_repo to real_repo
}

The "load sha1 to xxx_repo" step needs to be smarter than my dumb wget based script ... it must confirm the sha1 of the object being loaded before installing (even into the tmp_repo).

-Tony
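The pseudocode above can be modelled with a toy in-memory object table to check that an interrupted pull resumes without refetching anything. This is only a sketch under stated assumptions: struct object, pull_object, fetch and the fetch_budget counter are all hypothetical, the "network" is simulated, and SHA-1 verification is reduced to a comment.

```c
#include <stdbool.h>
#include <stddef.h>

enum kind { BLOB, TREE, COMMIT };

struct object {
	enum kind kind;
	int deps[4];	/* indices of the objects this one refers to */
	int ndeps;
	bool in_tmp;	/* fetched, but its dependencies may be missing */
	bool in_real;	/* installed; everything it needs is present */
};

static int fetch_count;		/* number of simulated network loads */
static int fetch_budget = -1;	/* loads left before the link "drops"; -1 = unlimited */

/* Simulated download; a real loader would verify the SHA-1 here. */
static bool fetch(void)
{
	if (fetch_budget == 0)
		return false;
	if (fetch_budget > 0)
		fetch_budget--;
	fetch_count++;
	return true;
}

/* Returns true once objs[id] and all it depends on are in the real repo. */
bool pull_object(struct object *objs, int id)
{
	struct object *o = &objs[id];
	int i;

	if (o->in_real)
		return true;
	if (o->kind == BLOB) {		/* blobs have no dependencies */
		if (!fetch())
			return false;
		o->in_real = true;
		return true;
	}
	if (!o->in_tmp) {		/* trees and commits wait in tmp_repo */
		if (!fetch())
			return false;
		o->in_tmp = true;
	}
	for (i = 0; i < o->ndeps; i++)
		if (!pull_object(objs, o->deps[i]))
			return false;	/* keep what we have, resume later */
	o->in_tmp = false;		/* move from tmp_repo to real_repo */
	o->in_real = true;
	return true;
}
```

The recursion mirrors the pseudocode directly; for a history as deep as the kernel's, an explicit work list would probably be needed instead.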
[PATCH] git-pasky: Add .gitrc directory to allow command defaults like with .cvsrc
Hi,

one thing I liked about CVS was its way to configure default parameters for commands. And as I really like the colored log output, I wanted it as default.

While .cvsrc parsing would be quite expensive, using a directory + files should be fairly cheap and result in just one additional stat call.

So I added -c to ~/.gitrc/log and some code to parse this.

Index: git
===
--- 0a9ee5a4d947b998a7ce489242800b39f985/git (mode:100755 sha1:39969debd59ed51c57973c819cdcc3ca8a7da819)
+++ uncommitted/git (mode:100755)
@@ -67,6 +67,7 @@
 	exit 1
 fi
 
+[ -e $HOME/.gitrc/$cmd ] && set -- $(cat $HOME/.gitrc/$cmd) "$@"
 case "$cmd" in
 add)	gitadd.sh "$@";;

cu
Fabian

PS: Should the command-line parsing be cleaned up, or do you want to do that after the first release of cogito? And if yes, do you want to use getopts, or would this not be supported on some systems?

PPS: I'm fairly new to git; how do I create a diff with the signed-by fields, and with what do I need to sign it?
Re: GIT_INDEX_FILE environment variable
Dear diary, on Sat, Apr 23, 2005 at 12:14:16AM CEST, I got a letter where Linus Torvalds [EMAIL PROTECTED] told me that...
> (And I personally think that show-diff is really part of the wrapper scripts around git. I wrote it originally just because I needed something to verify the index file handling, not because it's core like the other programs. I do _not_ consider show-diff to be part of the core git code, really. Same goes for git-export, btw - for the same reasons. It's not fundamental).

Note that Cogito actually almost does not use show-diff anymore. I'm doing diff-cache now, since that is what matters to me.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
Re: [3/5] Add http-pull
Dear diary, on Sat, Apr 23, 2005 at 01:00:33AM CEST, I got a letter where Daniel Barkalow [EMAIL PROTECTED] told me that...
> On Sat, 23 Apr 2005, Petr Baudis wrote:
> > Dear diary, on Fri, Apr 22, 2005 at 09:46:35PM CEST, I got a letter where Daniel Barkalow [EMAIL PROTECTED] told me that...
> >
> > Huh. Why? You just go back in history until you find a commit you already have. If you did it the way Tony described, then if you have that commit, you can be sure that you have everything it depends on too.
>
> But if you download 1000 files of the 1010 you need, and then your network goes down, you will need to download those 1000 again when it comes back, because you can't save them unless you have the full history.

Why can't I? I think I can do that perfectly fine. The worst thing that can happen is that fsck-cache will complain a bit.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
Re: [PATCH] multi item packed files
On Thu, 21 Apr 2005, Chris Mason wrote:
> We can sort by the files before reading them in, but even if we order things perfectly, we're spreading the io out too much across the drive.

No we don't.

It's easy to just copy the repository in a way where this just isn't true: you sort the objects by how far they are from the current HEAD, and you just copy the repository in that order (furthest objects first - commits last).

That's what I meant by defragmentation - you can actually do this on your own, even if your filesystem doesn't support it. Do it twice a year, and I pretty much guarantee that your performance will stay pretty constant over time. The one exception is fsck, which doesn't seek in history order.

And this works exactly because:
 - we don't do no steenking delta's, and don't have deep chains of data to follow. The longest chain we ever have is just a few deep, and it's trivial to just encourage the filesystem to have recent things together.
 - we have an append-only mentality.

In fact, it works for exactly the same reason that makes us able to drop old history if we want to. We essentially drop the history to another part of the disk.

		Linus
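The "sort by distance from HEAD" step could look roughly like the sketch below: a breadth-first walk assigns each object its distance from HEAD, and the copy order is the object list sorted by decreasing distance. The struct node layout, the MAX_OBJS limit and the name copy_order are assumptions for illustration; a real tool would read the object graph from the repository.

```c
#include <stdlib.h>

#define MAX_OBJS 64	/* toy limit; n must not exceed this */

struct node {
	int refs[4];	/* objects this one points to (parents, tree, entries) */
	int nrefs;
};

static const int *sort_dist;	/* distances used by the qsort comparator */

static int farthest_first(const void *a, const void *b)
{
	int ia = *(const int *)a, ib = *(const int *)b;

	if (sort_dist[ia] != sort_dist[ib])
		return sort_dist[ib] - sort_dist[ia];	/* larger distance first */
	return ia - ib;					/* deterministic tie-break */
}

/*
 * Fill order[0..n-1] with object indices, furthest from `head` first.
 * Objects not reachable from head keep distance -1 and sort last.
 */
void copy_order(const struct node *objs, int n, int head, int *order)
{
	int dist[MAX_OBJS], queue[MAX_OBJS];
	int qh = 0, qt = 0, i;

	for (i = 0; i < n; i++) {
		dist[i] = -1;
		order[i] = i;
	}
	dist[head] = 0;
	queue[qt++] = head;
	while (qh < qt) {		/* breadth-first walk from HEAD */
		int cur = queue[qh++];

		for (i = 0; i < objs[cur].nrefs; i++) {
			int next = objs[cur].refs[i];

			if (dist[next] < 0) {	/* first visit gives the shortest distance */
				dist[next] = dist[cur] + 1;
				queue[qt++] = next;
			}
		}
	}
	sort_dist = dist;
	qsort(order, (size_t)n, sizeof(order[0]), farthest_first);
}
```

Copying the objects in the resulting order puts HEAD and its nearest objects at the end of the sequence, matching the "furthest objects first" rule described above.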
Re: [ANNOUNCE] git-pasky-0.6.3 request for testing
On Sat, 23 Apr 2005, Petr Baudis wrote:
> Just FYI, this is a bug in core git's diff-cache;

Nice find. Yes, I told you guys I hadn't tested it well ;)

diff-cache does the same "diff trees in lockstep" thing that diff-tree does, but it's actually more complex, since the _tree_ part always needs to be recursively followed, while the _cache_ part is this linear list that is already expanded. Which just made the whole algorithm very messy.

Once I found out how nasty it was to do that compare, I was actually planning to re-write the thing using the same approach that read-tree -m tree does - ie move the tree information _into_ the in-memory cache, at which point it should be absolutely trivial to compare the two. But since the horrid algorithm seemed to end up working, I never did.

I'm not even going to debug this bug. I'm just going to rewrite diff-cache to do what I should have done originally, ie use the power of the in-memory cache. That's also automatically going to properly warn about unmerged files.

Give me five minutes ;)

		Linus
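Once both sides are flat, path-sorted lists, the comparison reduces to a merge-style walk. The sketch below shows that walk on two plain arrays; it is not the diff-cache code, and struct entry, diff_sorted and the "-", "+", "*" output markers are hypothetical names chosen for the example.

```c
#include <stdio.h>
#include <string.h>

struct entry {
	const char *path;
	const char *sha1;	/* hex object name */
};

/*
 * Walk two path-sorted lists in lockstep and print one line per difference:
 * "-path" if only in a, "+path" if only in b, "*path" if present in both
 * with a different sha1. Returns the number of differences.
 */
int diff_sorted(const struct entry *a, int na,
		const struct entry *b, int nb, FILE *out)
{
	int i = 0, j = 0, changes = 0;

	while (i < na || j < nb) {
		int cmp;

		if (i == na)
			cmp = 1;	/* a is exhausted: rest of b is new */
		else if (j == nb)
			cmp = -1;	/* b is exhausted: rest of a is gone */
		else
			cmp = strcmp(a[i].path, b[j].path);

		if (cmp < 0) {
			fprintf(out, "-%s\n", a[i++].path);
			changes++;
		} else if (cmp > 0) {
			fprintf(out, "+%s\n", b[j++].path);
			changes++;
		} else {
			if (strcmp(a[i].sha1, b[j].sha1) != 0) {
				fprintf(out, "*%s\n", a[i].path);
				changes++;
			}
			i++;
			j++;
		}
	}
	return changes;
}
```

Each entry is visited once, so the cost is linear in the combined length of the two lists; both lists must be sorted with the same ordering that strcmp() uses here.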
Re: GIT_INDEX_FILE environment variable
On Fri, 22 Apr 2005, Junio C Hamano wrote:
> Almost, with a counter-example. Please try this yourself:

I agree that what git outputs is always based on the archive base. But that's an independent issue from "where is the working directory". That's the issue of "how do you want me to print out the results".

To see just how independent that is, think about how git-pasky (and, indeed, standard show-diff) already prints out the results in a _different_ base than the working directory _or_ the base. Ie the way we already do

	--- a/Makefile
	+++ b/Makefile
	... patch ...

for a patch to Makefile in the top-level directory.

IOW, showing pathnames is different from _using_ them. And if you were planning on using the same logic for both, you'd have been making a mistake in the first place.

To _use_ pathnames, you use pwd. To _show_ them, you use some other mechanism. You must not mix up those two issues, or you'd always get show-diff wrong.

I actually think that showing the pathnames is up to the wrapper scripts. Git core really always just works on the canonical format.

(And I personally think that show-diff is really part of the wrapper scripts around git. I wrote it originally just because I needed something to verify the index file handling, not because it's core like the other programs. I do _not_ consider show-diff to be part of the core git code, really. Same goes for git-export, btw - for the same reasons. It's not fundamental).

		Linus
git remote repositories
Hi,

It wasn't that long ago that the pasky git tree was relocated. This required a modification to the .git directory in a local pull. A DNS-based system could be built to provide the following:

A) quick, easy lookup of archive locations
B) handling of changes in repository location
C) mirror support

So here's the plan... I do a lot of work in the SIP/VoIP field, and our approach to handling backup proxies and routers is to use a DNS SRV record. Here's how it works for VoIP/SIP:

_{protocol}._{transport}.{name}.hostname.org

A sample lookup:

dig SRV _sip._udp.proxy-dca.broadvoice.com

;; QUESTION SECTION:
;_sip._udp.proxy-dca.broadvoice.com. IN SRV

;; ANSWER SECTION:
_sip._udp.proxy-dca.broadvoice.com. 86400 IN SRV 1 0 5060 proxy.mia.broadvoice.com.
_sip._udp.proxy-dca.broadvoice.com. 86400 IN SRV 0 0 5060 proxy.dca.broadvoice.com.

Now of course we could null out some of those fields and swap sip for git and udp for rsync, then replace proxy.foo with rsync://host/path/to/git. Since we're using rsync, mirroring is simplified to just rsyncing the trees.

Dan
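On the client side, the four fields after "SRV" in the answer section are priority, weight, port and target, and a client tries the lowest priority value first. The sketch below picks one record from an already-parsed answer; struct srv_record and srv_pick are hypothetical names, and breaking ties by the highest weight is a simplification (the SRV specification describes a weighted random choice among records of equal priority).

```c
#include <stddef.h>

struct srv_record {
	unsigned priority;	/* lower value is tried first */
	unsigned weight;	/* relative share among equal priorities */
	unsigned port;
	const char *target;	/* host name of the server */
};

/*
 * Return the preferred record, or NULL if the list is empty.
 * Simplification: ties on priority go to the highest weight.
 */
const struct srv_record *srv_pick(const struct srv_record *recs, size_t n)
{
	const struct srv_record *best = NULL;
	size_t i;

	for (i = 0; i < n; i++) {
		if (!best ||
		    recs[i].priority < best->priority ||
		    (recs[i].priority == best->priority &&
		     recs[i].weight > best->weight))
			best = &recs[i];
	}
	return best;
}
```

With the two records from the dig output, this selects proxy.dca.broadvoice.com. (priority 0) and leaves proxy.mia.broadvoice.com. (priority 1) as the fallback. Note that an SRV target carries only a host name and port, so the repository path in a scheme like this would presumably have to come from somewhere else.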
Re: [3/5] Add http-pull
Dear diary, on Fri, Apr 22, 2005 at 09:46:35PM CEST, I got a letter where Daniel Barkalow [EMAIL PROTECTED] told me that...
> On Thu, 21 Apr 2005 [EMAIL PROTECTED] wrote:
> > On Wed, 20 Apr 2005, Brad Roberts wrote:
> > > How about fetching in the inverse order. Ie, deepest parents up towards current. With that method the repository is always self consistent, even if not yet current.
> >
> > Daniel Barkalow replied:
> > > You don't know the deepest parents to fetch until you've read everything more recent, since the history you'd have to walk is the history you're downloading.
> >
> > You just need to defer adding tree/commit objects to the repository until after you have inserted all objects on which they depend. That's what my wget based version does ... it's very crude, in that it loads all tree & commit objects into a temporary repository (.gittmp) ... since you can only use cat-file and ls-tree on things if they live in objects/xx/xxx..xxx. The blobs can go directly into the real repo (but to be really safe you'd have to ensure that the whole blob had been pulled from the network before inserting it ... it's probably a good move to validate everything that you pull from the outside world too).
>
> The problem with this general scheme is that it means that you have to start over if something goes wrong, rather than resuming from where you left off (and being able to use what you got until then).

Huh. Why? You just go back in history until you find a commit you already have. If you did it the way Tony described, then if you have that commit, you can be sure that you have everything it depends on too.

-- 
				Petr "Pasky" Baudis
Stuff: http://pasky.or.cz/
C++: an octopus made by nailing extra legs onto a dog. -- Steve Taylor
wit suggestion
Hi Christian,

Can I suggest a 'summary diff' option?

It's basically a diff between the tree of the commit and the tree of the parent commit. It would show which files have changed rather than the diff of the files that have changed.

(kinda like diffstat without the for now)
(or maybe just do a diffstat if it's easier)

Of course you could click through to a per-file diff eventually...

David