Re: [PATCH] Undefine strlcpy if needed.
On 25/08/14 02:54, tsuna wrote:
> On Sun, Aug 24, 2014 at 5:32 PM, Ramsay Jones ram...@ramsay1.demon.co.uk wrote:
>> Again, I don't have access to an OS X system, so I don't know which
>> package provides libintl/gettext, but it seems to be missing on your
>> system.
>
> Probably yeah, those libraries don't seem to be provided standard
> with OS X or OS X's development tools, so maybe the Makefile should
> also default to having NO_GETTEXT=YesPlease when on OS X.
>
>> You can avoid the build failure, without running configure, by
>> setting NO_GETTEXT=YesPlease in your config.mak file.
>
> I need to run configure first:
>
>     $ make configure
>     GEN configure
>     $ ./configure
>     configure: Setting lib to 'lib' (the default)
>     [...]
>
>> So, presumably, configure has set NO_GETTEXT=YesPlease in your
>> config.mak.autogen file.
>
> Yes it did. I don't mind running configure, but so far Git has
> compiled fine without doing it. Should we fix the default values of
> NO_STRLCPY / NO_GETTEXT on OS X?

Is NO_STRLCPY still a problem with a fresh clone (and putting
NO_GETTEXT=YesPlease in your config.mak)? I still do not understand why
you were getting those warnings; AFAICT it should not be happening!
Also, Torsten could not reproduce.

As far as NO_GETTEXT is concerned, I have to defer to someone who has
experience on that platform (I have _zero_ experience on OS X).

ATB,
Ramsay Jones
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong()
On Sun, Aug 24, 2014 at 06:07:44PM +0200, Steffen Prohaska wrote:

> diff --git a/wrapper.c b/wrapper.c
> index bc1bfb8..69d1c9b 100644
> --- a/wrapper.c
> +++ b/wrapper.c
> @@ -11,14 +11,18 @@ static void (*try_to_free_routine)(size_t size) = do_nothing;
>
>  static void memory_limit_check(size_t size)
>  {
> -	static int limit = -1;
> -	if (limit == -1) {
> -		const char *env = getenv("GIT_ALLOC_LIMIT");
> -		limit = env ? atoi(env) * 1024 : 0;
> +	static size_t limit = SIZE_MAX;
> +	if (limit == SIZE_MAX) {

You use SIZE_MAX as the sentinel for "not set", and 0 as the sentinel
for "no limit". That seems kind of backwards. I guess you are inheriting
this from the existing code, which lets GIT_ALLOC_LIMIT=0 mean "no
limit". I'm not sure if we want to keep that or not (it would be
backwards-incompatible to change it, but we are already breaking
compatibility here by assuming bytes rather than kilobytes; I think
that's OK because this is not a documented feature, or one intended to
be used externally).

> +		const char *var = "GIT_ALLOC_LIMIT";
> +		unsigned long val = 0;
> +		const char *env = getenv(var);
> +		if (env && !git_parse_ulong(env, &val))
> +			die("Failed to parse %s", var);
> +		limit = val;
>  	}

This and the next patch both look OK to me, but I notice this part is
largely duplicated between the two. We already have git_env_bool to do a
similar thing for boolean environment variables. Should we do something
similar like:

diff --git a/config.c b/config.c
index 058505c..11919eb 100644
--- a/config.c
+++ b/config.c
@@ -1122,6 +1122,14 @@ int git_env_bool(const char *k, int def)
 	return v ? git_config_bool(k, v) : def;
 }

+unsigned long git_env_ulong(const char *k, unsigned long val)
+{
+	const char *v = getenv(k);
+	if (v && !git_parse_ulong(v, &val))
+		die("failed to parse %s", k);
+	return val;
+}
+
 int git_config_system(void)
 {
 	return !git_env_bool("GIT_CONFIG_NOSYSTEM", 0);

It's not a lot of code, but I think the callers end up being much easier
to read:

	if (limit == SIZE_MAX)
		limit = git_env_ulong("GIT_ALLOC_LIMIT", 0);

> 	if (limit && size > limit)
> -		die("attempting to allocate %"PRIuMAX" over limit %d",
> -		    (intmax_t)size, limit);
> +		die("attempting to allocate %"PRIuMAX" over limit %"PRIuMAX,
> +		    (uintmax_t)size, (uintmax_t)limit);

This part is duplicated, too, though I do not know if the infrastructure
to avoid that is worth the trouble. Unless you wanted to do a whole:

	check_limit(limit, "GIT_ALLOC_LIMIT", size);

or something, but I am also not convinced that is not just obfuscating
things.

-Peff
Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
On Sun, Aug 24, 2014 at 06:07:46PM +0200, Steffen Prohaska wrote:

> The data is streamed to the filter process anyway. Better avoid
> mapping the file if possible. This is especially useful if a clean
> filter reduces the size, for example if it computes a sha1 for binary
> data, like git media. The file size that the previous implementation
> could handle was limited by the available address space; large files
> for example could not be handled with (32-bit) msysgit. The new
> implementation can filter files of any size as long as the filter
> output is small enough.
>
> The new code path is only taken if the filter is required. The filter
> consumes data directly from the fd. The original data is not available
> to git, so it must fail if the filter fails.

Can you clarify this second paragraph a bit more? If I understand
correctly, we handle a non-required filter failing by just reading the
data again (which we can do because we either read it into memory
ourselves, or mmap it).

With the streaming approach, we will read the whole file through our
stream; if that fails we would then want to read the stream from the
start. Couldn't we do that with an lseek (or even an mmap with offset
0)? That obviously would not work for non-file inputs, but I think we
address that already in index_fd: we push non-seekable things off to
index_pipe, where we spool them to memory.

So it seems like the ideal strategy would be:

  1. If it's seekable, try streaming. If not, fall back to lseek/mmap.

  2. If it's not seekable and the filter is required, try streaming. We
     die anyway if we fail.

  3. If it's not seekable and the filter is not required, decide based
     on file size:

     a. If it's small, spool to memory and proceed as we do now.

     b. If it's big, spool to a seekable tempfile.

Your patch implements part 2. But I would think part 1 is the most
common case. And while part 3b seems unpleasant, it is better than the
current code (with or without your patch), which will do 3a on a large
file.

Hmm. Though I guess in (3) we do not have the size up front, so it's
complicated (we could spool N bytes to memory, then start dumping to a
file after that). I do not think we necessarily need to implement that
part, though. It seems like (1) is the thing I would expect to hit the
most (i.e., people do not always mark their filters as required).

> -	write_err = (write_in_full(child_process.in, params->src, params->size) < 0);
> +	if (params->src) {
> +		write_err = (write_in_full(child_process.in, params->src, params->size) < 0);

Style: 4-space indentation (rather than a tab). There's more of it in
this function (and in would_convert...) that I didn't mark.

> +	} else {
> +		/* dup(), because copy_fd() closes the input fd. */
> +		fd = dup(params->fd);

Not a problem you are introducing, but this seems kind of like a
misfeature in copy_fd. Is it worth fixing? The function only has two
existing callers.

> +	/* Apply a filter to an fd only if the filter is required to succeed.
> +	 * We must die if the filter fails, because the original data before
> +	 * filtering is not available.
> +	 */

Style nit: we have a blank line at the top of our multi-line comments:

	/*
	 * We have a blank line at the top of our
	 * multi-line comments.
	 */

-Peff
Re: [PATCH 1/5] git-prompt: do not look for refs/stash in $GIT_DIR
On Sun, Aug 24, 2014 at 08:22:41PM +0700, Gábor Szeder wrote:
> On Aug 23, 2014 12:26 PM, Jeff King p...@peff.net wrote:
>> Since dd0b72c (bash prompt: use bash builtins to check stash state,
>> 2011-04-01), git-prompt checks whether we have a stash by looking for
>> $GIT_DIR/refs/stash. Generally external programs should never do
>> this, because they would miss packed-refs.
>
> Not sure whether the prompt script is an external program or not, but
> it doesn't matter; this is the right thing to do.

Yeah, by "external" I just meant "nothing outside of refs.c should make
this assumption".

>> That commit claims that packed-refs does not pack refs/stash, but
>> that is not quite true. It does pack the ref, but due to a bug, fails
>> to prune the ref. When we fix that bug, we would want to be doing the
>> right thing here.
>>
>> Signed-off-by: Jeff King p...@peff.net
>> ---
>> I know we are pretty sensitive to forks in the prompt code (after
>> all, that was the point of dd0b72c). This patch is essentially a
>> reversion of this hunk of dd0b72c, and is definitely safe.
>
> I'm not sure, but if I remember correctly (don't have the means to
> check it at the moment, sorry) in that commit I also added a 'git
> pack-refs' invocation to the relevant test(s?) to guard us against
> breakages due to changes in 'git pack-refs'. If that is so, then I
> think those invocations should be removed as well, as they'll become
> useless.

It did add that change (that's actually how I noticed the problem!
Thank you for being thorough in dd0b72c).

My inclination is to leave the pack-refs invocations, as they protect
against a certain class of errors (we are not doing the risky behavior
now, but the purpose of the test suite is to detect regressions; the
next person to touch that code may not be so careful as you were). I
don't feel too strongly, though, so if we want them gone, I'm OK with
that.

-Peff
Re: [PATCH] bisect: save heap memory. allocate only the required amount
On Sun, Aug 24, 2014 at 04:39:37PM -0700, Junio C Hamano wrote:
> On Sun, Aug 24, 2014 at 8:10 AM, Stefan Beller stefanbel...@gmail.com wrote:
>>> 	for (p = list, i = 0; i < cnt; i++) {
>>> -		struct name_decoration *r = xmalloc(sizeof(*r) + 100);
>>> +		char name[100];
>>
>> Would it make sense to convert 'name' into a git strbuf? Please have
>> a look at Documentation/technical/api-strbuf.txt
>
> Why not go one step further and format it twice, once only to measure
> the necessary size to allocate, allocate and then format into it for
> real? Then you do not need to print into a temporary strbuf, copy the
> result and free the strbuf, no?
>
> BUT. The string will always be "dist=" followed by the decimal
> representation of a count that fits in an int anyway, so I actually
> think use of strbuf is way overkill (and formatting it twice also is);
> the patch as posted should be just fine.

I think you are right, and the patch is the right direction (assuming we
want to do this; I question whether there are enough elements in the
list for us to care about the size, and if there are, we are probably
better off storing the int and formatting the strings on the fly).

I wonder if there is a way we could get rid of the magic "100" here,
though. Its meaning is "enough to hold 'dist=' and any integer". But you
have to read carefully to see that this call to sprintf is not a buffer
overflow.

A strbuf is one way to get rid of it, though it is awkward because we
then have to copy the result into a flex-array structure. It would be
nice if there was some way to abstract the idea of "formatting a buffer
directly into a flex-array". That would involve the double-format you
mention, but we could use it in lots of places to make the code nicer.
Maybe like:

	void *fmt_flex_array(size_t base, const char *fmt, ...)
	{
		va_list ap;
		size_t flex;
		unsigned char *ret;

		va_start(ap, fmt);
		flex = vsnprintf(NULL, 0, fmt, ap);
		va_end(ap);

		ret = xmalloc(base + flex + 1);
		va_start(ap, fmt); /* Eek, see below */
		vsnprintf(ret + base, flex + 1, fmt, ap);
		va_end(ap);
		return ret;
	}

and you'd call it like:

	struct name_decoration *r = fmt_flex_array(sizeof(*r), "dist=%d", x);

Except that I don't think we are guaranteed that offsetof(mystruct,
flex_member) is equal to sizeof(mystruct). If FLEX_ARRAY is 0, it should
be, but some platforms use FLEX_ARRAY=1. So you'd have to pass in the
offset like:

	struct name_decoration *r = fmt_flex_array(sizeof(*r),
			offsetof(struct name_decoration, name),
			"dist=%d", x);

which is a little less nice. You could make it nicer with a macro, but
we don't assume variadic macros. *sigh*

-Peff
Re: [PATCH] bisect: save heap memory. allocate only the required amount
On Sun, Aug 24, 2014 at 07:47:24PM +0530, Arjun Sreedharan wrote:

> diff --git a/bisect.c b/bisect.c
> index d6e851d..c96aab0 100644
> --- a/bisect.c
> +++ b/bisect.c
> @@ -215,10 +215,13 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
>  	}
>  	qsort(array, cnt, sizeof(*array), compare_commit_dist);
>  	for (p = list, i = 0; i < cnt; i++) {
> -		struct name_decoration *r = xmalloc(sizeof(*r) + 100);
> +		char name[100];
> +		sprintf(name, "dist=%d", array[i].distance);
> +		int name_len = strlen(name);
> +		struct name_decoration *r = xmalloc(sizeof(*r) + name_len);

This allocation should be name_len + 1 for the NUL-terminator, no?

It looks like add_name_decoration in log-tree already handles half of
what you are adding here. Can we just make that available globally (it
is manipulating the already-global "struct decoration name_decoration")?

I also notice that we do not set r->type at all, meaning the decoration
lookup code in log-tree will access uninitialized memory (worse, it will
use it as a pointer offset into the color list; I got a segfault when I
tried to run "git rev-list --bisect-all v1.8.0..v1.9.0"). I think we
need this:

diff --git a/bisect.c b/bisect.c
index d6e851d..e2a7682 100644
--- a/bisect.c
+++ b/bisect.c
@@ -219,6 +219,7 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
 		struct object *obj = &(array[i].commit->object);

 		sprintf(r->name, "dist=%d", array[i].distance);
+		r->type = 0;
 		r->next = add_decoration(&name_decoration, obj, r);
 		p->item = array[i].commit;
 		p = p->next;

at a minimum.

It looks like this was a regression caused by eb3005e (commit.h: add
'type' to struct name_decoration, 2010-06-19). Which makes me wonder if
anybody actually _uses_ --bisect-all (which AFAICT is the only way to
trigger the problem), but since it's public, I guess we should keep it.

I think the sane thing here is to stop advertising name_decoration as a
global, and make all callers use add_name_decoration. That makes it
easier for callers like this one, and would have caught the regression
caused by eb3005e (the compiler would have noticed that we were not
passing a "type" parameter to the function).

-Peff
Re: Relative submodule URLs
On Fri, Aug 22, 2014 at 11:00 AM, Marc Branchaud marcn...@xiplink.com wrote:

A couple of years ago I started to work on such a thing ([1] [2] [3]),
mainly because when we tried to change to relative submodules we got
bitten when someone used clone's -o option so that his super-repo had no
origin remote *and* it was checked out on a detached HEAD. So
get_default_remote() failed for him. I didn't have time to complete the
work -- it ended up being quite involved. But Junio did come up with an
excellent transition plan [4] for adopting a default remote setting.

[1] (v0) http://thread.gmane.org/gmane.comp.version-control.git/200145
[2] (v1) http://thread.gmane.org/gmane.comp.version-control.git/201065
[3] (v2) http://thread.gmane.org/gmane.comp.version-control.git/201306
[4] http://article.gmane.org/gmane.comp.version-control.git/201332

I think you're on the right path. However I'd suggest something like the
following:

	[submodule]
		remote = <remote_for_relative_submodules>   (e.g. 'upstream')

I think remote.default would be more generally useful, especially when
working with detached checkouts.

Honestly speaking, I don't use remote.default, even now that I know
about it thanks to the discussion ongoing here. The reason is that
sometimes I push my branches to origin, and sometimes I push them to my
fork. I like explicit control as to which one I push to. I also sync my
git config file to dropbox and I use it on multiple projects and
platforms; I don't use the same push-destination workflow on all
projects. It seems to get in the way of my workflow more than it helps.
I really only ever have two needs:

1. Push explicitly to my remote (e.g. `git push fork` or `git push origin`)
2. Push to the tracked branch (e.g. `git push`)

I'm also not sure how `push.default = simple` conflicts with the usage
of `remote.default`, since in the tracked-repo case, you must explicitly
specify the source ref to push. Is this behavior documented somewhere?

(For the record, I would also be happy if clone got rid of its -o option
and "origin" became the sacred, reserved remote name (perhaps translated
into other languages as needed) that clone always uses no matter what.)

	[branch "<name>"]
		submoduleRemote = <remote_for_relative_submodule>

If I understand correctly, you want this so that your branch can be a
fork of only the super-repo while the submodules are not forked, and so
they should keep tracking their original repo.

That's correct. But this is case-by-case. Sometimes I make a change
where I want the submodule forked (rare); most times I don't. Sometimes
I can get away with pushing changes to the submodule and worrying about
it later, since I know the submodule ref won't move forward unless
someone does update --remote (which isn't often, or is only done as
needed).

To me this seems to be going in the opposite direction of having
branches recursively apply to submodules, which I think most of us want.
A branch should fork the entire repo, including its submodules. The
implication is that if you want to push that branch somewhere, that
somewhere needs to be able to accept the forks of the submodules *even
if those submodules aren't changed in your branch*, because at the very
least the branch ref has to exist in the submodules' repositories.

There are many levels on which this can apply. When it comes to
checkouts and such, I agree. However, how will this impact *creating*
branches? What about forking? Do you expect submodule forking/branching
to be automatic as well? Based on your description, it seems so
(although a new branch doesn't necessarily have to correspond to a new
fork, unless I'm misunderstanding you). This seems difficult to do,
especially the forking part, since you would need an API for this
(Github, Atlassian Stash, etc), unless you are thinking of something
clever like local/relative forks. However, the inconvenience of forking
manually isn't the main reason why I avoid forking submodules. It's the
complication of pull requests. There is no uniformity there, which is
unfortunate. Recursive pull requests are something outside the scope of
git, I realize that, but it would still be nice. However, the suggestion
you make here lays the foundation for that, I think.

With absolute-path submodules, the push is as simple as creating the
branch ref in the submodules' home repositories -- even if the main
"somewhere" you're pushing to isn't one of those repositories. With
relative-path submodules, the push's target repo *must* also have the
submodules in their proper places, so that they can get updated.
Furthermore, if you clone a repo that has relative-path submodules you
*must* also clone the submodules.

Robert, I think what you'll say to this is that you still want your
branch to track the latest submodule updates from their home repository.
(BTW, I'm confused with how you're using the terms upstream and origin.
I'll use "home" to refer to the repository where everything
Re: [PATCH] bisect: save heap memory. allocate only the required amount
On Mon, Aug 25, 2014 at 3:35 PM, Jeff King p...@peff.net wrote:
> On Sun, Aug 24, 2014 at 07:47:24PM +0530, Arjun Sreedharan wrote:
>> 	for (p = list, i = 0; i < cnt; i++) {
>> -		struct name_decoration *r = xmalloc(sizeof(*r) + 100);
>> +		char name[100];
>> +		sprintf(name, "dist=%d", array[i].distance);
>> +		int name_len = strlen(name);
>> +		struct name_decoration *r = xmalloc(sizeof(*r) + name_len);
>
> This allocation should be name_len + 1 for the NUL-terminator, no?

I wondered about that too, but as struct name_decoration is defined like
this:

	struct name_decoration {
		struct name_decoration *next;
		int type;
		char name[1];
	};

the .name field of this struct already has one char, so the allocation
above should be ok.

> It looks like add_name_decoration in log-tree already handles half of
> what you are adding here. Can we just make that available globally (it
> is manipulating the already-global "struct decoration name_decoration")?

Yeah, it looks like that would be better. Note that
add_name_decoration() does:

	int nlen = strlen(name);
	struct name_decoration *res = xmalloc(sizeof(struct name_decoration) + nlen);

so it also relies on the fact that .name contains one char.

> I also notice that we do not set r->type at all, meaning the decoration
> lookup code in log-tree will access uninitialized memory (worse, it
> will use it as a pointer offset into the color list; I got a segfault
> when I tried to run "git rev-list --bisect-all v1.8.0..v1.9.0").
>
> I think we need this:
>
> 		struct object *obj = &(array[i].commit->object);
>
> 		sprintf(r->name, "dist=%d", array[i].distance);
> +		r->type = 0;
> 		r->next = add_decoration(&name_decoration, obj, r);
> 		p->item = array[i].commit;
> 		p = p->next;
>
> at a minimum.

Yeah, if we don't use add_name_decoration() we would need that. Thanks
for noticing.

> It looks like this was a regression caused by eb3005e (commit.h: add
> 'type' to struct name_decoration, 2010-06-19). Which makes me wonder if
> anybody actually _uses_ --bisect-all (which AFAICT is the only way to
> trigger the problem), but since it's public, I guess we should keep it.

Yeah, we should probably keep it.

> I think the sane thing here is to stop advertising name_decoration as a
> global, and make all callers use add_name_decoration. That makes it
> easier for callers like this one, and would have caught the regression
> caused by eb3005e (the compiler would have noticed that we were not
> passing a "type" parameter to the function).

I agree.

Thanks,
Christian.
Re: Re: Relative submodule URLs
On Sun, Aug 24, 2014 at 8:34 AM, Heiko Voigt hvo...@hvoigt.net wrote:
> New --with-remote parameter for 'git submodule'
>
> While having said all that about submodule settings, I think a much
> much simpler start is to go ahead with a commandline setting, like
> Robert proposed here[2]. For that we do not have to worry about how it
> can be stored, transported, defined per submodule or on a branch,
> since answers to this are given at the commandline (and current
> repository state). There are still open questions about this though:
>
>  * Should the name in the submodule be 'origin' even though you
>    specified --with-remote=somewhere? For me it is always confusing to
>    have the same/similar remotes named differently in different
>    repositories. That is why I try to keep the names the same in all
>    my clones of repositories (i.e. for my private, github, upstream
>    remotes).
>
>  * When you do a 'git submodule sync --with-remote=somewhere', should
>    the remote be added or replaced?
>
> My opinion on these is: the remote should be named as in the
> superproject, so --with-remote=somewhere adds/replaces the remote
> 'somewhere' in the submodules named on the commandline (or all, in
> case no submodule is specified). In case of a fresh clone of the
> submodule, there would be no origin but only a remote under the new
> name.
>
> Would the --with-remote feature I describe be a feasible start for you
> Robert? What do others think? Is the naming of the parameter
> '--with-remote' alright?
>
> Cheers Heiko
>
> [1] http://article.gmane.org/gmane.comp.version-control.git/255512
> [2] http://article.gmane.org/gmane.comp.version-control.git/255512
> [3] https://github.com/jlehmann/git-submod-enhancements/wiki#special-ref-overriding-gitmodules-values

Hi Heiko,

My last email response was in violation of your request to keep the two
topics separate, sorry about that. I started typing it this weekend and
completed the draft this morning, without having read this response from
you first. At this point my only intention was to start discussion on a
possible short-term solution. I realize the Git developers are working
hard on improving submodule workflow for the long term. In addition, I
do not have the domain expertise to properly make suggestions in regards
to longer-term solutions, so I leave that to you :-)

The --with-remote feature would allow me to begin using relative
submodules because, on a per-submodule basis, I can specify the remote
it will use. When I fork a submodule and need to start tracking it, I
can run `git submodule sync --with-remote fork`, which will take my
super repo's 'fork' remote, REPLACE 'origin' in the submodule with that
URL, and also redo the relative URL calculation. This is ideal since I
use HTTP at home (so I can use my proxy server to access git behind the
firewall at work) and at work I physically use SSH for performance (to
avoid HTTP protocol overhead). I also like the idea of never having to
update my submodule URLs again if the git server moves, the domain name
changes, or whatever else.

Here is what I think would make the feature most usable. I think you
went over some of these ideas, but I just want to clarify to make sure
we're on the same page. Please correct me as needed.

1. Running `git submodule update --with-remote <name>` shall fail the
command unconditionally.

2. Using the `--with-remote` option on submodule `update` or `sync`
will fail if it detects absolute submodule URLs in .gitmodules.

3. Running `git submodule update --init --with-remote <name>` shall
fail the command ONLY if a submodule is being processed that is NOT
also being initialized.

4. The behavior of git submodule's `update` or `sync` commands combined
with `--with-remote` will REPLACE or CREATE the 'origin' remote in each
submodule it is run in. We will not allow the user to configure what
the submodule remote name will end up being (I think this is current
behavior and forces good practice; I consider `origin` an adopted
standard for git, and actually wish it was more enforced for super
projects as well!).

Let me know if I've missed anything. Once we clarify requirements I'll
attempt to start work on this during my free time. I'll start by testing
this through msysgit, since I do not have linux installed, but I have
Linux Mint running in a Virtual Machine so I can test on both platforms
as needed (I don't have a lot of experience on Linux though). I hope you
won't mind me reaching out for questions as needed; however, I will
attempt to be as resourceful as possible since I know you're all busy.

Thanks.
Re: Re: Relative submodule URLs
On Mon, Aug 25, 2014 at 9:29 AM, Robert Dailey rcdailey.li...@gmail.com wrote:
> [previous message quoted in full; trimmed]

Thought of a few more:

5. If `--with-remote` is unspecified, behavior will continue as it
currently does (I'm not clear on the precedence here of various options,
but I assume: `remote.default` first, then `branch.<name>.remote`)

6. `--with-remote` will take
Re: [PATCH] bisect: save heap memory. allocate only the required amount
On Mon, Aug 25, 2014 at 04:06:52PM +0200, Christian Couder wrote: This allocation should be name_len + 1 for the NUL-terminator, no? I wondered about that too, but as struct name_decoration is defined like this: struct name_decoration { struct name_decoration *next; int type; char name[1]; }; the .name field of this struct already has one char, so the allocation above should be ok. Yeah, you're right. I would argue it should just be FLEX_ARRAY for consistency with other spots, though (in which case add_name_decoration needs to be updated with a +1). Running git grep '^char [^ ]*\[[01]]' -- '*.[ch]' shows that this is one of only two spots that don't use FLEX_ARRAY (and the other has a comment explaining why not). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
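The two allocation idioms discussed above can be shown side by side. This is an illustrative sketch of the pattern only, not git's actual struct name_decoration or its FLEX_ARRAY macro; the struct and function names here are made up for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* pre-C99 trick: one char of name[] is already counted in sizeof() */
struct decoration_c89 {
	struct decoration_c89 *next;
	int type;
	char name[1];
};

/* C99 flexible array member, what FLEX_ARRAY expands to on C99 compilers */
struct decoration_flex {
	struct decoration_flex *next;
	int type;
	char name[];
};

static struct decoration_c89 *make_c89(const char *name)
{
	size_t len = strlen(name);
	/* sizeof(*d) already includes 1 byte of name[], enough for the NUL */
	struct decoration_c89 *d = malloc(sizeof(*d) + len);
	memcpy(d->name, name, len + 1);
	return d;
}

static struct decoration_flex *make_flex(const char *name)
{
	size_t len = strlen(name);
	/* the flexible member contributes 0 bytes, so the +1 is needed here */
	struct decoration_flex *d = malloc(sizeof(*d) + len + 1);
	memcpy(d->name, name, len + 1);
	return d;
}
```

This is exactly the "+1" difference Peff mentions: switching a `char name[1]` site to FLEX_ARRAY means the allocation must grow by one byte for the NUL terminator.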
Re: [PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong()
On Aug 25, 2014, at 1:38 PM, Jeff King p...@peff.net wrote: On Sun, Aug 24, 2014 at 06:07:44PM +0200, Steffen Prohaska wrote: diff --git a/wrapper.c b/wrapper.c index bc1bfb8..69d1c9b 100644 --- a/wrapper.c +++ b/wrapper.c @@ -11,14 +11,18 @@ static void (*try_to_free_routine)(size_t size) = do_nothing; static void memory_limit_check(size_t size) { -static int limit = -1; -if (limit == -1) { -const char *env = getenv("GIT_ALLOC_LIMIT"); -limit = env ? atoi(env) * 1024 : 0; +static size_t limit = SIZE_MAX; +if (limit == SIZE_MAX) { You use SIZE_MAX as the sentinel for not set, and 0 as the sentinel for no limit. That seems kind of backwards. I guess you are inheriting this from the existing code, which lets GIT_ALLOC_LIMIT=0 mean no limit. I'm not sure if we want to keep that or not (it would be backwards incompatible to change it, but we are already breaking compatibility here by assuming bytes rather than kilobytes; I think that's OK because this is not a documented feature, or one intended to be used externally). I think it's reasonable that GIT_ALLOC_LIMIT=0 means no limit, so that the limit can easily be disabled temporarily. But I could change the sentinel and handle 0 like: if (git_parse_ulong(env, &val)) { if (!val) { val = SIZE_MAX; } } Maybe we should do this. +const char *var = "GIT_ALLOC_LIMIT"; +unsigned long val = 0; +const char *env = getenv(var); +if (env && !git_parse_ulong(env, &val)) +die("Failed to parse %s", var); +limit = val; } This and the next patch both look OK to me, but I notice this part is largely duplicated between the two. We already have git_env_bool to do a similar thing for boolean environment variables. Should we do something similar like: diff --git a/config.c b/config.c index 058505c..11919eb 100644 --- a/config.c +++ b/config.c @@ -1122,6 +1122,14 @@ int git_env_bool(const char *k, int def) return v ? 
git_config_bool(k, v) : def; } +unsigned long git_env_ulong(const char *k, unsigned long val) +{ + const char *v = getenv(k); + if (v && !git_parse_ulong(v, &val)) + die("failed to parse %s", k); + return val; +} + int git_config_system(void) { return !git_env_bool("GIT_CONFIG_NOSYSTEM", 0); It's not a lot of code, but I think the callers end up being much easier to read: if (limit == SIZE_MAX) limit = git_env_ulong("GIT_ALLOC_LIMIT", 0); I think you're right. I'll change it. Steffen
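The helper discussed above can be sketched as a stand-alone program. This is an assumption-laden mock-up, not git's code: `env_ulong()` and `parse_ulong()` stand in for the proposed git_env_ulong() and for git_parse_ulong(), and the real git_parse_ulong() additionally accepts size suffixes like "1k" that this sketch does not handle.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Stand-in for git_parse_ulong(): strict base-10 parse, rejects "" and junk. */
static int parse_ulong(const char *s, unsigned long *out)
{
	char *end;
	unsigned long v;
	if (!*s)
		return 0;	/* no digits at all: reject the empty string */
	errno = 0;
	v = strtoul(s, &end, 10);
	if (errno || *end)
		return 0;	/* overflow, or trailing garbage */
	*out = v;
	return 1;
}

/* Stand-in for the proposed git_env_ulong(): unset -> default, bad -> bail. */
static unsigned long env_ulong(const char *k, unsigned long val)
{
	const char *v = getenv(k);
	if (v && !parse_ulong(v, &val))
		exit(1);	/* the real helper would die("failed to parse %s", k) */
	return val;
}
```

The caller-side win Peff describes is that each user of an environment limit shrinks to a single `env_ulong(name, default)` call instead of repeating the getenv/parse/die dance.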
Re: [PATCH v5 2/4] Change GIT_ALLOC_LIMIT check to use git_parse_ulong()
On Mon, Aug 25, 2014 at 05:06:22PM +0200, Steffen Prohaska wrote: I think it's reasonable that GIT_ALLOC_LIMIT=0 means no limit, so that the limit can easily be disabled temporarily. IMHO, GIT_ALLOC_LIMIT= (i.e., the empty string) would be a good way to say that (and I guess that even works currently, due to the way atoi works, but I suspect git_parse_ulong might complain). It is probably not worth worrying about too much. This is not even a user-facing interface, and the test scripts just set it to 0. So I'd be OK going that direction, or just leaving it as-is. -Peff 
Help improving the future of debugging
Dear Git Developers, since 1974, researchers and software developers have tried to ease software debugging. Over the last years they have created many new tools and formalized methods. We are interested in whether these advancements have reached professional software developers and how they have influenced their approach. To find this out, we are conducting an online survey for software developers. From the results we expect new insights into debugging practice that will help us suggest new directions for future research. So if you are a software developer or know any software developers, you can really help us. The survey is, of course, fully anonymous and will take about 15 minutes to fill out. Feel free to redistribute this message to anyone who you think might be interested. The survey can be reached at: http://www.uni-potsdam.de/skopie-up/index.php/689349 Thank you for your interest, Benjamin Siegmund 
Re: [BUG] resolved deltas
Am 23.08.2014 um 13:18 schrieb Jeff King: On Sat, Aug 23, 2014 at 07:04:59AM -0400, Jeff King wrote: On Sat, Aug 23, 2014 at 06:56:40AM -0400, Jeff King wrote: So I think your patch is doing the right thing. By the way, if you want to add a test to your patch, there is infrastructure in t5308 to create packs with duplicate objects. If I understand the problem correctly, you could trigger this by having a delta object whose base is duplicated, even without the missing object. This actually turned out to be really easy. The test below fails without your patch and passes with it. Please feel free to squash it in. diff --git a/t/t5308-pack-detect-duplicates.sh b/t/t5308-pack-detect-duplicates.sh index 9c5a876..50f7a69 100755 --- a/t/t5308-pack-detect-duplicates.sh +++ b/t/t5308-pack-detect-duplicates.sh @@ -77,4 +77,19 @@ test_expect_success 'index-pack can reject packs with duplicates' ' test_expect_code 1 git cat-file -e $LO_SHA1 ' +test_expect_success 'duplicated delta base does not trigger assert' ' + clear_packs && + { + A=01d7713666f4de822776c7622c10f1b07de280dc && + B=e68fe8129b546b101aee9510c5328e7f21ca1d18 && + pack_header 3 && + pack_obj $A $B && + pack_obj $B && + pack_obj $B + } >dups.pack && + pack_trailer dups.pack && + git index-pack --stdin <dups.pack && + test_must_fail git index-pack --stdin --strict <dups.pack +' + test_done Thanks, that looks good. But while preparing the patch I noticed that the added test sometimes fails. Helgrind pointed out a race condition. It is not caused by the patch to turn the asserts into regular ifs, however -- here's a Helgrind report for the original code with the new test: ==34949== Helgrind, a thread error detector ==34949== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al. 
==34949== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info ==34949== Command: /home/lsr/src/git/t/../bin-wrappers/git index-pack --stdin ==34949== ==34949== Helgrind, a thread error detector ==34949== Copyright (C) 2007-2013, and GNU GPL'd, by OpenWorks LLP et al. ==34949== Using Valgrind-3.9.0 and LibVEX; rerun with -h for copyright info ==34949== Command: /home/lsr/src/git/git index-pack --stdin ==34949== ==34949== ---Thread-Announcement-- ==34949== ==34949== Thread #3 was created ==34949==at 0x594DF7E: clone (clone.S:74) ==34949==by 0x544A2B9: do_clone.constprop.3 (createthread.c:75) ==34949==by 0x544B762: pthread_create@@GLIBC_2.2.5 (createthread.c:245) ==34949==by 0x4C2D55D: pthread_create_WRK (hg_intercepts.c:269) ==34949==by 0x43ABB8: cmd_index_pack (index-pack.c:1097) ==34949==by 0x405B6A: handle_builtin (git.c:351) ==34949==by 0x404CE8: main (git.c:575) ==34949== ==34949== ---Thread-Announcement-- ==34949== ==34949== Thread #2 was created ==34949==at 0x594DF7E: clone (clone.S:74) ==34949==by 0x544A2B9: do_clone.constprop.3 (createthread.c:75) ==34949==by 0x544B762: pthread_create@@GLIBC_2.2.5 (createthread.c:245) ==34949==by 0x4C2D55D: pthread_create_WRK (hg_intercepts.c:269) ==34949==by 0x43ABB8: cmd_index_pack (index-pack.c:1097) ==34949==by 0x405B6A: handle_builtin (git.c:351) ==34949==by 0x404CE8: main (git.c:575) ==34949== ==34949== ==34949== ==34949== Possible data race during read of size 4 at 0x5E15910 by thread #3 ==34949== Locks held: none ==34949==at 0x439327: find_unresolved_deltas (index-pack.c:918) ==34949==by 0x439666: threaded_second_pass (index-pack.c:1002) ==34949==by 0x4C2D6F6: mythread_wrapper (hg_intercepts.c:233) ==34949==by 0x544B0A3: start_thread (pthread_create.c:309) ==34949== ==34949== This conflicts with a previous write of size 4 by thread #2 ==34949== Locks held: none ==34949==at 0x4390E2: resolve_delta (index-pack.c:865) ==34949==by 0x439340: find_unresolved_deltas (index-pack.c:919) ==34949==by 0x439666: 
threaded_second_pass (index-pack.c:1002) ==34949==by 0x4C2D6F6: mythread_wrapper (hg_intercepts.c:233) ==34949==by 0x544B0A3: start_thread (pthread_create.c:309) ==34949== ==34949== Address 0x5E15910 is 48 bytes inside a block of size 256 alloc'd ==34949==at 0x4C2A7D0: calloc (vg_replace_malloc.c:618) ==34949==by 0x50CA83: xcalloc (wrapper.c:119) ==34949==by 0x439AF6: cmd_index_pack (index-pack.c:1643) ==34949==by 0x405B6A: handle_builtin (git.c:351) ==34949==by 0x404CE8: main (git.c:575) ==34949== git: builtin/index-pack.c:918: find_unresolved_deltas_1: Assertion `child-real_type == OBJ_REF_DELTA' failed. ==34949== ==34949== For counts of detected and suppressed errors, rerun with: -v ==34949== Use --history-level=approx or =none to gain increased speed, at ==34949== the cost of reduced accuracy of conflicting-access
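The Helgrind report above flags two threads touching the same field with no lock held. As a minimal stand-alone illustration of the class of fix such a report calls for (nothing here is index-pack's real code; the variable and function names are invented for the example), the shared field is only ever touched with a mutex held:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/*
 * Illustrative only: one writer thread and one reader, both taking
 * the same lock around the shared field, which is what silences a
 * Helgrind "possible data race" report like the one above.
 */
static int shared_type;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;

static void *writer(void *arg)
{
	pthread_mutex_lock(&lock);
	shared_type = 42;		/* the resolve_delta side of the race */
	pthread_mutex_unlock(&lock);
	return NULL;
}

static int read_shared(void)
{
	int v;
	pthread_mutex_lock(&lock);	/* the find_unresolved_deltas side */
	v = shared_type;
	pthread_mutex_unlock(&lock);
	return v;
}

static int run_demo(void)
{
	pthread_t t;
	pthread_create(&t, NULL, writer, NULL);
	pthread_join(t, NULL);
	return read_shared();
}
```

Whether index-pack should take a lock here or restructure the work so only one thread owns the field is exactly the design question the report leaves open.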
Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
On Aug 25, 2014, at 2:43 PM, Jeff King p...@peff.net wrote: On Sun, Aug 24, 2014 at 06:07:46PM +0200, Steffen Prohaska wrote: The data is streamed to the filter process anyway. Better avoid mapping the file if possible. This is especially useful if a clean filter reduces the size, for example if it computes a sha1 for binary data, like git media. The file size that the previous implementation could handle was limited by the available address space; large files for example could not be handled with (32-bit) msysgit. The new implementation can filter files of any size as long as the filter output is small enough. The new code path is only taken if the filter is required. The filter consumes data directly from the fd. The original data is not available to git, so it must fail if the filter fails. Can you clarify this second paragraph a bit more? If I understand correctly, we handle a non-required filter failing by just reading the data again (which we can do because we either read it into memory ourselves, or mmap it). We don't read the data again. convert_to_git() assumes that it is already in memory and simply keeps the original buffer if the filter fails. With the streaming approach, we will read the whole file through our stream; if that fails we would then want to read the stream from the start. Couldn't we do that with an lseek (or even an mmap with offset 0)? That obviously would not work for non-file inputs, but I think we address that already in index_fd: we push non-seekable things off to index_pipe, where we spool them to memory. It could be handled that way, but we would be back to the original problem that 32-bit git fails for large files. The convert code path currently assumes that all data is available in a single buffer at some point to apply crlf and ident filters. If the initial filter, which is assumed to reduce the file size, fails, we could seek to 0 and read the entire file. But git would then fail for large files with out-of-memory. 
We would not gain anything for the use case that I describe in the commit message's first paragraph. To implement something like the ideal strategy below, the entire convert machinery for crlf and ident would have to be converted to a streaming approach. Another option would be to detect that only the clean filter would be applied and not crlf and ident. Maybe we could get away with something simpler then. But I think that if the clean filter's purpose is to reduce file size, it does not make sense to try to handle the case of a failing filter with a fallback plan. The filter should simply be marked required, because any sane operation requires it. So it seems like the ideal strategy would be: 1. If it's seekable, try streaming. If not, fall back to lseek/mmap. 2. If it's not seekable and the filter is required, try streaming. We die anyway if we fail. 3. If it's not seekable and the filter is not required, decide based on file size: a. If it's small, spool to memory and proceed as we do now. b. If it's big, spool to a seekable tempfile. Your patch implements part 2. But I would think part 1 is the most common case. And while part 3b seems unpleasant, it is better than the current code (with or without your patch), which will do 3a on a large file. Hmm. Though I guess in (3) we do not have the size up front, so it's complicated (we could spool N bytes to memory, then start dumping to a file after that). I do not think we necessarily need to implement that part, though. It seems like (1) is the thing I would expect to hit the most (i.e., people do not always mark their filters as required). Well, I think they have to mark it if the filter's purpose is to reduce size. I'll add a bit of the discussion to the commit message. I'm not convinced that we should do more at this point. +} else { +/* dup(), because copy_fd() closes the input fd. */ +fd = dup(params->fd); Not a problem you are introducing, but this seems kind of like a misfeature in copy_fd. Is it worth fixing? 
The function only has two existing callers. I found it confusing. I think it's worth fixing. Steffen 
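Point (1) of the strategy above hinges on being able to rewind a seekable fd and re-read it after a streaming attempt fails. A minimal demonstration of that property (not git's convert code; the function name is invented for the example):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Sketch of the seekable fallback: "stream" the whole fd once, then
 * rewind with lseek() and read it again.  On a pipe, lseek() fails
 * with ESPIPE, which is the case that needs spooling instead.
 */
static int rewind_and_reread(int fd, char *buf, size_t len)
{
	char scratch[64];
	/* first pass: pretend a filter consumed the stream and then failed */
	while (read(fd, scratch, sizeof(scratch)) > 0)
		;
	if (lseek(fd, 0, SEEK_SET) < 0)
		return -1;		/* not seekable: a fallback is needed */
	return (int)read(fd, buf, len);	/* second pass starts from offset 0 */
}
```

This is why a regular-file fd never needs the in-memory copy at all: the file itself is the replayable buffer.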
Re: [PATCH 4/5] fast-import: fix buffer overflow in dump_tags
Jeff, We have a fix like this in the next set of transaction updates. https://code-review.googlesource.com/#/c/1012/13/fast-import.c However, if your concerns are the integrity of the servers and not taking any chances, you might not want to wait for my patches to graduate. ronnie sahlberg On Fri, Aug 22, 2014 at 10:32 PM, Jeff King p...@peff.net wrote: When creating a new annotated tag, we sprintf the refname into a static-sized buffer. If we have an absurdly long tagname, like: git init repo && cd repo && git commit --allow-empty -m foo && git tag -m message mytag && git fast-export mytag | perl -lpe '/^tag/ and s/mytag/"a" x 8192/e' | git fast-import <input we'll overflow the buffer. We can fix it by using a strbuf. Signed-off-by: Jeff King p...@peff.net --- I'm not sure how easily exploitable this is. The buffer is on the stack, and we definitely demolish the return address. But we never actually return from the function, since lock_ref_sha1 will fail in such a case and die (it cannot succeed because the name is longer than PATH_MAX, which we check when concatenating it with $GIT_DIR). Still, there is no limit to the size of buffer you can feed it, so it's entirely possible you can overwrite something else and cause some mischief. So I wouldn't call it trivially exploitable, but I would not bet my servers that it is not (and of course it is easy to trigger if you can convince somebody to run fast-import on a stream, so any remote helpers produce a potentially vulnerable situation). 
fast-import.c | 10 ++++++---- 1 file changed, 6 insertions(+), 4 deletions(-) diff --git a/fast-import.c b/fast-import.c index f25a4ae..a1479e9 100644 --- a/fast-import.c +++ b/fast-import.c @@ -1734,14 +1734,16 @@ static void dump_tags(void) static const char *msg = "fast-import"; struct tag *t; struct ref_lock *lock; - char ref_name[PATH_MAX]; + struct strbuf ref_name = STRBUF_INIT; for (t = first_tag; t; t = t->next_tag) { - sprintf(ref_name, "tags/%s", t->name); - lock = lock_ref_sha1(ref_name, NULL); + strbuf_reset(&ref_name); + strbuf_addf(&ref_name, "tags/%s", t->name); + lock = lock_ref_sha1(ref_name.buf, NULL); if (!lock || write_ref_sha1(lock, t->sha1, msg) < 0) - failure |= error("Unable to update %s", ref_name); + failure |= error("Unable to update %s", ref_name.buf); } + strbuf_release(&ref_name); } static void dump_marks_helper(FILE *f, -- 2.1.0.346.ga0367b9 
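The overflow fixed above comes from sprintf() into a fixed `char[PATH_MAX]`. The core idea of the fix can be shown stand-alone: size the buffer to the input instead of trusting a compile-time constant. This is a sketch of the principle only, not git's strbuf API; `format_ref_name()` is an invented name.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Instead of sprintf() into char buf[PATH_MAX], compute the exact
 * size the formatted name needs and allocate that.  An 8192-char
 * tag name then simply yields an 8197-byte buffer, no overflow.
 */
static char *format_ref_name(const char *prefix, const char *tag)
{
	size_t n = strlen(prefix) + strlen(tag) + 2;	/* '/' plus NUL */
	char *buf = malloc(n);
	snprintf(buf, n, "%s/%s", prefix, tag);
	return buf;					/* caller frees */
}
```

git's strbuf does the same thing with amortized growth so the buffer can be reused across loop iterations, which is why the patch also hoists `ref_name` out of the loop and calls strbuf_reset() each time.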
Re: [PATCH 2/2] Loop index increases monotonically when reading unmerged entries
Jaime Soriano Pastor jsorianopas...@gmail.com writes: I think this line is dangerous, if add_cache_entry is not able to remove higher-stages it will be looping forever, as happens in the case of this thread. I cannot see why it's even needed, and removing it doesn't break any test. On Sun, Aug 24, 2014 at 7:57 PM, Jaime Soriano Pastor jsorianopas...@gmail.com wrote: Signed-off-by: Jaime Soriano Pastor jsorianopas...@gmail.com --- read-cache.c | 1 - 1 file changed, 1 deletion(-) diff --git a/read-cache.c b/read-cache.c index c1a9619..3d70386 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1971,7 +1971,6 @@ int read_index_unmerged(struct index_state *istate) if (add_index_entry(istate, new_ce, 0)) return error("%s: cannot drop to stage #0", new_ce->name); - i = index_name_pos(istate, new_ce->name, len); I think the original idea was that regardless of how many entries with the same name were removed because of the replacement (or addition) of new_ce, by making i point at the newly added new_ce, we would make sure that the loop will continue from the next entry. The if/return expected that add_index_entry() will get rid of all the other entries with the same name as new_ce has, or it will return an error. Without the bug in add_index_entry(), because new_ce always has the same name as ce, the entry we found at i by the loop, we know that index_name_pos() will give the same i we already have, so removing this line should be a no-op. Now, add_index_entry() in your case did not notice that it failed to remove all other entries with the same name as new_ce, resulting in your looping forever. Among the "merged and unmerged entries with the same name exists in the index file" class of index file corruption, we could treat the "merged and unmerged entries with the same name not just exists but next to each other, unmerged ones coming immediately after merged one" case specially (i.e. 
declaring that it is more likely for a broken software to leave both next to each other than otherwise) and try to accommodate it as your original patch did. I am not absolutely sure if such a special case is worth it, and with your updated [1/2] "read_index_from(): check order of entries when reading index" we will not be doing so, which is good. With that safety in place, the bug in add_index_entry() will disappear; it is safe not to adjust i by calling index_name_pos(), and this patch, [2/2] "read_index_unmerged(): remove unnecessary loop index adjustment", will be a good thing to do. Thanks. } return unmerged; } -- 2.0.4.1.g0b8a4f9.dirty 
Re: [PATCH 3/5] fast-import: clean up pack_data pointer in end_packfile
Print an error before returning when pack_data is NULL? Otherwise, LGTM. On Fri, Aug 22, 2014 at 10:27 PM, Jeff King p...@peff.net wrote: We have a global pointer pack_data pointing to the current pack we have open. Inside end_packfile we have two new pointers, old_p and new_p. The latter points to pack_data, and the former points to the newly installed version of the packfile we get when we hand the file off to the regular sha1_file machinery. We then free old_p. Presumably the extra old_p pointer was there so that we could overwrite pack_data with new_p and still free old_p, but we don't do that. We just leave pack_data pointing to bogus memory, and don't overwrite it until we call start_packfile again (if ever). This can cause problems for our die routine, which calls end_packfile to clean things up. If we die at the wrong moment, we can end up looking at invalid memory in pack_data left after the last end_packfile(). Instead, let's make sure we set pack_data to NULL after we free it, and make calling end_packfile() again with a NULL pack_data a noop (there is nothing to end). We can further make things less confusing by dropping old_p entirely, and moving new_p closer to its point of use. Signed-off-by: Jeff King p...@peff.net --- Noticed while running fast-import under valgrind to debug the next commit. 
:) fast-import.c | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/fast-import.c b/fast-import.c index d73f58c..f25a4ae 100644 --- a/fast-import.c +++ b/fast-import.c @@ -946,10 +946,12 @@ static void unkeep_all_packs(void) static void end_packfile(void) { - struct packed_git *old_p = pack_data, *new_p; + if (!pack_data) + return; clear_delta_base_cache(); if (object_count) { + struct packed_git *new_p; unsigned char cur_pack_sha1[20]; char *idx_name; int i; @@ -991,10 +993,11 @@ static void end_packfile(void) pack_id++; } else { - close(old_p->pack_fd); - unlink_or_warn(old_p->pack_name); + close(pack_data->pack_fd); + unlink_or_warn(pack_data->pack_name); } - free(old_p); + free(pack_data); + pack_data = NULL; /* We can't carry a delta across packfiles. */ strbuf_release(&last_blob.data); -- 2.1.0.346.ga0367b9 
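The pattern the patch above adopts is worth naming: cleanup frees the global, resets it to NULL, and becomes a no-op on a second call (e.g. from a die() handler). A stand-alone sketch of that idiom, with invented names rather than fast-import's real types:

```c
#include <assert.h>
#include <stdlib.h>

/*
 * free-and-NULL cleanup: safe to call end_state() any number of
 * times, and no dangling pointer is left for later observers.
 */
static int *pack_state;
static int cleanup_calls;

static void open_state(void)
{
	pack_state = calloc(1, sizeof(*pack_state));
}

static void end_state(void)
{
	if (!pack_state)
		return;		/* already ended: nothing to do */
	free(pack_state);
	pack_state = NULL;	/* don't leave a pointer to freed memory */
	cleanup_calls++;
}
```

Without the NULL reset, a die-handler calling cleanup after a normal shutdown would free (or inspect) already-freed memory, which is exactly the bug the commit message describes.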
Re: [BUG] resolved deltas
On Sat, Aug 23, 2014 at 3:56 AM, Jeff King p...@peff.net wrote: [+cc spearce, as I think this is a bug in code.google.com's sending side, and he can probably get the attention of the right folks] ... My guess is a bug on the sending side. We have seen duplicate pack objects before, but never (AFAIK) combined with a missing object. This really seems like the sender just sent the wrong data for one object. IIRC, code.google.com is backed by their custom Dulwich implementation which runs on BigTable. We'll see if Shawn can get this to the right people as a bug report. :) Thanks. This is a bug in the code.google.com implementation that is running on Bigtable. I forwarded the report to the team that manages that service so they can investigate further. 
Re: [PATCH 1/2] Check order when reading index
Jaime Soriano Pastor jsorianopas...@gmail.com writes: Subject: Re: [PATCH 1/2] Check order when reading index Please be careful when crafting the commit title. This single line will be the only one that readers will have to identify the change among hundreds of entries in "git shortlog" output when trying to see what kind of change went into the project during the given period. Something like: "read_index_from(): catch out of order entries while reading an index file" perhaps? Signed-off-by: Jaime Soriano Pastor jsorianopas...@gmail.com --- read-cache.c | 18 ++++++++++++++++++ 1 file changed, 18 insertions(+) diff --git a/read-cache.c b/read-cache.c index 7f5645e..c1a9619 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1438,6 +1438,21 @@ static struct cache_entry *create_from_disk(struct ondisk_cache_entry *ondisk, return ce; } +void check_ce_order(struct cache_entry *ce, struct cache_entry *next_ce) Does this have to be global, i.e. not "static void ..."? +{ + int name_compare = strcmp(ce->name, next_ce->name); + if (0 < name_compare) + die("Unordered stage entries in index"); + if (!name_compare) { + if (!ce_stage(ce)) + die("Multiple stage entries for merged file '%s'", + ce->name); OK. If ce is at stage #0, no other entry can have the same name regardless of the stage, and next_ce having the same name violates that rule. + if (ce_stage(ce) >= ce_stage(next_ce)) + die("Unordered stage entries for '%s'", + ce->name); Not quite. We do allow multiple higher stage entries; having two or more stage #1 entries is perfectly fine during a merge resolution, and both ce and next_ce may be pointing at the stage #1 entries of the same path. Replacing the >= comparison with > is sufficient, I think. Thanks. + } +} + /* remember to discard_cache() before reading a different cache! 
*/ int read_index_from(struct index_state *istate, const char *path) { @@ -1499,6 +1514,9 @@ int read_index_from(struct index_state *istate, const char *path) ce = create_from_disk(disk_ce, &consumed, previous_name); set_index_entry(istate, i, ce); + if (i > 0) + check_ce_order(istate->cache[i - 1], ce); + src_offset += consumed; } strbuf_release(&previous_name_buf); 
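The ordering rule being checked above, with Junio's suggested relaxation applied (equal stages for the same path are allowed), can be isolated into a tiny predicate. This is an illustrative stand-alone sketch with invented types, not read-cache.c's actual code:

```c
#include <assert.h>
#include <string.h>

/* A minimal stand-in for a cache entry: path name plus merge stage. */
struct entry {
	const char *name;
	int stage;
};

/*
 * Is the adjacent pair (a, b) in valid index order?  Names must be
 * sorted; for equal names, a stage #0 entry may not be duplicated,
 * and stages may repeat (two stage #1 entries during a merge are
 * fine) but must not decrease.
 */
static int pair_ok(const struct entry *a, const struct entry *b)
{
	int cmp = strcmp(a->name, b->name);
	if (cmp > 0)
		return 0;		/* names out of order */
	if (!cmp) {
		if (!a->stage)
			return 0;	/* merged entry duplicated */
		if (a->stage > b->stage)
			return 0;	/* stages decreasing */
	}
	return 1;
}
```

Applying this check to every adjacent pair while reading, as the patch does, turns the infinite loop from the other thread into an immediate, diagnosable die() on the corrupt index.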
Re: [PATCH 5/5] fast-import: stop using lock_ref_sha1
The next ref transaction series does a similar change and ends up removing the function lock_ref_sha1() completely. https://code-review.googlesource.com/#/c/1017/19/refs.c So I think we can drop this patch. ronnie sahlberg On Fri, Aug 22, 2014 at 10:33 PM, Jeff King p...@peff.net wrote: We can use lock_any_ref_for_update instead. Besides being more flexible, the only difference between the two is that lock_ref_sha1 does not allow top-level refs like refs/foo to be updated. However, we know that we do not have such a ref, because we explicitly add the tags/ prefix ourselves. Note that we now must feed the whole name refs/tags/X instead of just tags/X to the function. As a result, our failure error message uses the longer name. This is probably a good thing, though. As an interesting side note, if we forgot to switch this input to the function, the tests do not currently catch it. We import a tag X and then check that we can access it at tags/X. If we accidentally created tags/X at the top-level $GIT_DIR instead of under refs/, we would still find it due to our ref lookup procedure! We do not make such a mistake in this patch, of course, but while we're thinking about it, let's make the fast-import tests more robust by checking for fully qualified refnames. Signed-off-by: Jeff King p...@peff.net --- As I mentioned, I'd be OK with dropping this one in favor of just waiting for Ronnie's transaction patches to graduate. 
fast-import.c | 4 ++-- t/t9300-fast-import.sh | 8 ++++---- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/fast-import.c b/fast-import.c index a1479e9..04a85a4 100644 --- a/fast-import.c +++ b/fast-import.c @@ -1738,8 +1738,8 @@ static void dump_tags(void) for (t = first_tag; t; t = t->next_tag) { strbuf_reset(&ref_name); - strbuf_addf(&ref_name, "tags/%s", t->name); - lock = lock_ref_sha1(ref_name.buf, NULL); + strbuf_addf(&ref_name, "refs/tags/%s", t->name); + lock = lock_any_ref_for_update(ref_name.buf, NULL, 0, NULL); if (!lock || write_ref_sha1(lock, t->sha1, msg) < 0) failure |= error("Unable to update %s", ref_name.buf); } diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh index 5fc9ef2..f4c6673 100755 --- a/t/t9300-fast-import.sh +++ b/t/t9300-fast-import.sh @@ -153,7 +153,7 @@ tag series-A An annotated tag without a tagger EOF test_expect_success 'A: verify tag/series-A' ' - git cat-file tag tags/series-A >actual && + git cat-file tag refs/tags/series-A >actual && test_cmp expect actual ' @@ -165,7 +165,7 @@ tag series-A-blob An annotated tag that annotates a blob. EOF test_expect_success 'A: verify tag/series-A-blob' ' - git cat-file tag tags/series-A-blob >actual && + git cat-file tag refs/tags/series-A-blob >actual && test_cmp expect actual ' @@ -232,8 +232,8 @@ EOF test_expect_success \ 'A: tag blob by sha1' \ 'git fast-import <input && - git cat-file tag tags/series-A-blob-2 >actual && - git cat-file tag tags/series-A-blob-3 >>actual && + git cat-file tag refs/tags/series-A-blob-2 >actual && + git cat-file tag refs/tags/series-A-blob-3 >>actual && test_cmp expect actual' test_tick -- 2.1.0.346.ga0367b9 
Re: [PATCH 0/5] fix pack-refs pruning of refs/foo
On Fri, Aug 22, 2014 at 10:23 PM, Jeff King p...@peff.net wrote: I noticed that git pack-refs --all will pack a top-level ref like refs/foo, but will not actually prune $GIT_DIR/refs/foo. I do not see the point in having a policy not to pack refs/foo if --all is given. But even if we did have such a policy, this seems broken; we should either pack and prune, or leave them untouched. I don't see any indication that the existing behavior was intentional. The problem is that pack-refs's prune_ref calls lock_ref_sha1, which enforces this no toplevel behavior. I am not sure there is any real point to this, given that most callers use lock_any_ref_for_update, which is exactly equivalent but without the toplevel check. The first two patches deal with this by switching pack-refs to use lock_any_ref_for_update. This will conflict slightly with Ronnie's ref-transaction work, as he gets rid of lock_ref_sha1 entirely, and moves the code directly into prune_ref. This can be trivially resolved in favor of my patch, I think. The third patch is a cleanup I noticed while looking around, and I think should not conflict with anyone (and is a good thing to do). The last two are trickier. I wondered if we could get rid of lock_ref_sha1 entirely. After pack-refs, there are two callers: fast-import.c and walker.c. After converting the first, it occurred to me that Ronnie might be touching the same areas, and I see that yes, indeed, there's quite a bit of conflict (and he reaches the same end goal of dropping it entirely). So in that sense I do not mind dropping the final two patches. Ronnie's endpoint is much better, moving to a ref_transaction. However, there is actually a buffer overflow in the existing code. Ronnie's series fixes it in a similar way (moving to a strbuf), and I'm fine with that endpoint. But given that the ref transaction code is not yet merged (and would certainly never be maint-track), I think it is worth applying the buffer overflow fix separately. 
I think the final patch can probably be dropped, then. It is a clean-up, but one that we can just get when Ronnie's series is merged. [1/5]: git-prompt: do not look for refs/stash in $GIT_DIR [2/5]: pack-refs: prune top-level refs like refs/foo [3/5]: fast-import: clean up pack_data pointer in end_packfile [4/5]: fast-import: fix buffer overflow in dump_tags [5/5]: fast-import: stop using lock_ref_sha1 +1 on 1-3 +1 on 4. While I have a similar fix in the transaction series, you should not need to wait for that series to address a security concern. 5: I think this one is not as urgent as the others, so I would prefer if it is dropped, just so it doesn't cause more merge conflicts than are already present in the transaction series. 1-4: Reviewed-by: Ronnie Sahlberg sahlb...@google.com -Peff 
Re: [PATCH v2 1/3] Push the NATIVE_CRLF Makefile variable to C and added a test for native.
Torsten Bögershausen tbo...@web.de writes: On 2014-08-23 00.54, Junio C Hamano wrote: Torsten Bögershausen tbo...@web.de writes: Commit 95f31e9a correctly points out that the NATIVE_CRLF setting is incorrectly set on Mingw git. However, the Makefile variable is not propagated to the C preprocessor and results in no change. This patch pushes the definition to the C code and adds a test to validate that when core.eol as native is crlf, we actually normalize text files to this line ending convention when core.autocrlf is false. Signed-off-by: Pat Thoyts pattho...@users.sourceforge.net Signed-off-by: Stepan Kasal ka...@ucw.cz Signed-off-by: Torsten Bögershausen tbo...@web.de --- Who should I record as the author of this patch? Sorry for missing this, here is what Mingw says:

commit 0caba2cacbb9d8e6a31783b45f1a13e52dec6ce8
Author: Pat Thoyts pattho...@users.sourceforge.net
Date: Mon Nov 26 00:24:00 2012 +

    Push the NATIVE_CRLF Makefile variable to C and added a test for native.

    Commit 95f31e9a correctly points out that the NATIVE_CRLF setting is [...]

When forwarding somebody else's patch, please start the *body* of your message with the in-body header to force the author, followed by a blank line and then the message, i.e.

From: Pat Thoyts pattho...@users.sourceforge.net

Commit 95f31e9a correctly points out that the NATIVE_CRLF setting is incorrectly set on Mingw git. However, the Makefile variable is not...
...

The request applies to other patches in the series as well. I suspect that using send-email on format-patch output may do the right thing automatically. Thanks.
Re: check-ref-format: include refs/ in the argument or to strip it?
On Fri, Aug 22, 2014 at 10:46 PM, Jeff King p...@peff.net wrote: On Fri, Aug 22, 2014 at 11:45:15AM -0700, Jonathan Nieder wrote: Junio C Hamano wrote: implication of which is that the 'at least one slash' rule was to expect things are 'refs/anything' so there will be at least one. Even back then, that 'anything' alone had at least one slash (e.g. heads/master), but the intention was *never* that we would forbid anything that does not have a slash by feeding the 'anything' part alone to check-ref-format, i.e. things like refs/stash were designed to be allowed. Now I'm more confused. Until 5f7b202a (2008-01-01), there was a comment

	if (level < 2)
		return -2; /* at least of form "heads/blah" */

and that behavior has been preserved since the beginning. Why do most old callers pass a string that doesn't start with refs/ (e.g., see the callers in 03feddd6, 2005-10-13)? Has the intent been to relax the requirement since then? Yeah, this weird "do not allow refs/foo" behavior has continually confused me. Coincidentally I just noticed a case today where pack-refs treats refs/foo specially for no good reason: http://thread.gmane.org/gmane.comp.version-control.git/255729 After much head scratching over the years, I am of the opinion that nobody ever really _meant_ to prevent refs/foo, and that code comments like the one you quote above were an attempt to document existing buggy behavior that was really trying to differentiate HEAD from refs/*. That's just my opinion, though. :) I'd be happy if all of the special-treatment of refs/foo went away and check_refname_format always got the full ref. There are also a lot of places where we assume that a ref will start with refs/heads/ and not just refs/: for_each_branch_ref(), log_ref_setup() (so no reflogs), is_branch(), to name a few. This makes the api a bit confusing and hard to predict. Which functions allow refs/foo and which will ignore it? Are there any compelling reasons why refs/foo should be allowed?
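For illustration, the historical "at least one level under refs/" rule being discussed can be sketched like this (helper names are made up; the real check_refname_format also rejects funny characters, "..", and so on):

```python
def levels(refname):
    """Number of slash-separated components, e.g. "heads/master" -> 2."""
    return len(refname.split("/"))

def old_style_check(refname):
    # callers historically passed the part *after* "refs/", and the old
    # code rejected anything with fewer than two levels:
    #     if (level < 2) return -2; /* at least of form "heads/blah" */
    return levels(refname) >= 2

# "refs/heads/master" -> argument "heads/master" passes...
assert old_style_check("heads/master")
# ...but "refs/stash" -> argument "stash" is rejected, which is the
# surprising "do not allow refs/foo" behavior under discussion
assert not old_style_check("stash")
```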
Re: [PATCH v2 3/3] t0026: Add missing &&
Torsten Bögershausen tbo...@web.de writes: Fix the broken && chain Reported-By: Eric Sunshine sunsh...@sunshineco.com Signed-off-by: Torsten Bögershausen tbo...@web.de --- Please fold this kind of "oops" fix for breakages discovered in a version that hasn't reached 'next' into the patch that introduces the breakage, with a Helped-by:. Thanks.

 t/t0026-eol-config.sh | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/t/t0026-eol-config.sh b/t/t0026-eol-config.sh
index 43a580a..4806969 100755
--- a/t/t0026-eol-config.sh
+++ b/t/t0026-eol-config.sh
@@ -84,9 +84,9 @@ test_expect_success NATIVE_CRLF 'eol native is crlf' '
 	rm -rf native_eol
 	mkdir native_eol
 	(
 		cd native_eol
-		printf "*.txt text\n" >.gitattributes
-		printf "one\r\ntwo\r\nthree\r\n" >filedos.txt
-		printf "one\ntwo\nthree\n" >fileunix.txt
+		printf "*.txt text\n" >.gitattributes &&
+		printf "one\r\ntwo\r\nthree\r\n" >filedos.txt &&
+		printf "one\ntwo\nthree\n" >fileunix.txt &&
 		git init
 		git config core.autocrlf false
 		git config core.eol native
Re: [PATCH 18/18] signed push: final protocol update
Shawn Pearce spea...@spearce.org writes: A stateless nonce could look like: nonce = HMAC_SHA1( SHA1(site+path) + '.' + now, site_key ) where site_key is a private key known to the server. It doesn't have to be per-repo. receive-pack would then be willing to accept any nonce whose timestamp is within a window, e.g. 10 minutes, of the current time, and whose signature verifies in the HMAC. The 10 minute window is important to allow clients time to generate the object list, perform delta compression, and begin transmitting to the server. Hmph, don't you finally tell the other end the sequence of "update this ref from old to new" and send the packdata separately? I think we have a FLUSH in between, and the push certificate is given before the FLUSH, so you do not have to wait for 10 minutes.
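The stateless-nonce idea can be sketched in a few lines (a rough illustration, not git's actual implementation; the exact layout of the MAC input here is an assumption, and Shawn's formula additionally hashes site+path first):

```python
import hashlib
import hmac
import time

WINDOW = 600  # accept nonces up to 10 minutes old

def make_nonce(site_path, site_key, now=None):
    # nonce = timestamp "." HMAC-SHA1(site_key, site_path ":" timestamp)
    ts = str(int(time.time() if now is None else now))
    mac = hmac.new(site_key, ("%s:%s" % (site_path, ts)).encode(), hashlib.sha1)
    return ts + "." + mac.hexdigest()

def check_nonce(nonce, site_path, site_key, now):
    ts = nonce.split(".", 1)[0]
    if not ts.isdigit():
        return False
    # recompute with the claimed timestamp; constant-time comparison
    if not hmac.compare_digest(nonce, make_nonce(site_path, site_key, int(ts))):
        return False
    return abs(now - int(ts)) <= WINDOW
```

The point of the construction is that the server keeps no per-client state: any receive-pack that shares site_key can validate the nonce, and the timestamp bound limits how long a captured nonce can be replayed.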
Re: [PATCH 00/18] Signed push
Stefan Beller stefanbel...@gmail.com writes: burden is not an issue, as I'll be signing the push certificate anyway when I push. A signed tag or a signed commit and signed push certificate solves two completely separate and orthogonal issues. What happens if you break into GitHub or k.org and did $ git tag maint_2014_08_22 master_2014_08_22 Ok, I personally haven't used tags a lot. I just tried to

$ git tag -s testbreaktag v2.1.0
$ git show testbreaktag
# However it would still read:
tag v2.1.0
Tagger: Junio C Hamano gits...@pobox.com
Date: Fri Aug 15 15:09:28 2014 -0700

So as I do not possess your private key I could not create signed tags even if I were to break into github/k.org The point was that after I push to 'maint', you break into the site and copy or move that tag as if I pushed to 'master'. You could argue that I could create a signed tag 'maint-2014-08-25', push it, and if you moved it to tags/master-2014-08-25 as an attacker, the refname would not match the tag line in the signed tag object. While that is true, nobody other than fsck checks the contents on the tag line in practice. But more importantly, I may deem a commit a sensible version for the 'master' branch of one repository but it would not be sensible for another repository's 'master' branch. Imagine a world just like the kernel development during the 2.6 era used to be, where there was a separate tree 2.4 maintained with its own 'master' branch. What is appropriate for the tip of 'master' to one repository is not good for the other one, and your timestamped tag line may say for which branch the push was for but does not say for which repository. The exact problem is also shared with the desire to have a branch object expressed elsewhere; as there is no identity for a branch in a distributed world, trying to name a branch as if it is a global entity without mentioning what repository will lead to tears.
Besides, these tags/maint-2014-08-25 tags will be interesting only for those who are auditing and not for general public, and we do not have a good way to hide uninteresting refs until those with narrow niche interest ask yet, which is something we may want to add soon, but I do not want auditable push taken hostage to that unrelated feature.
Re: check-ref-format: include refs/ in the argument or to strip it?
Ronnie Sahlberg wrote: On Fri, Aug 22, 2014 at 10:46 PM, Jeff King p...@peff.net wrote: Yeah, this weird "do not allow refs/foo" behavior has continually confused me. Coincidentally I just noticed a case today where pack-refs treats refs/foo specially for no good reason: http://thread.gmane.org/gmane.comp.version-control.git/255729 After much head scratching over the years, I am of the opinion that nobody ever really _meant_ to prevent refs/foo, and that code comments like the one you quote above were an attempt to document existing buggy behavior that was really trying to differentiate HEAD from refs/*. That's just my opinion, though. :) It's still very puzzling to me. The comment came at the same time as the behavior, in v0.99.9~120 (git-check-ref-format: reject funny ref names, 2005-10-13). Before that, the behavior was even stranger --- it checked that there was exactly one slash in the argument. I'm willing to believe we might not want that check any more, though. [...] There are also a lot of places where we assume that a ref will start with refs/heads/ and not just refs/: for_each_branch_ref(), log_ref_setup() (so no reflogs), is_branch(), to name a few. for_each_branch_ref is for iterating over local branches, which are defined as refs that start with refs/heads/*. Likewise, the only point of is_branch is to check whether a ref is under refs/heads/*. That's not an assumption about all refs. log_ref_setup implements the policy that there are only reflogs for:

* refs where the reflog was explicitly created (git branch --create-reflog does this, but for some reason there's no corresponding git update-ref --create-reflog so people have to use mkdir directly for other refs), plus

* if the '[core] logallrefupdates' configuration is enabled (and it is by default for non-bare repositories), then HEAD, refs/heads/*, refs/notes/*, and refs/remotes/*.

This is documented in git-config(1) --- see core.logAllRefUpdates.
That way, when tools internally use other refs (e.g., FETCH_HEAD), git doesn't have to automatically incur the cost of maintaining the reflog for those. What other refs should there be reflogs for? I haven't thought carefully about this. It definitely isn't an assumption that *all* refs will match that pattern. But it might be worth changing for other reasons. Jonathan
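The policy Jonathan describes boils down to a small predicate (an illustrative sketch, not git's code; the function name is made up):

```python
def reflog_auto_created(refname, log_all_ref_updates=True):
    # with '[core] logallrefupdates' off, no reflog is created implicitly;
    # an explicitly created logs/<ref> file is still honored either way
    if not log_all_ref_updates:
        return False
    # only HEAD and a few well-known hierarchies get a reflog by default
    return refname == "HEAD" or refname.startswith(
        ("refs/heads/", "refs/remotes/", "refs/notes/"))
```

So refs/tags/* and ad-hoc refs like refs/foo fall through to "no reflog", which is exactly the cost-avoidance trade-off being questioned in this thread.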
Re: [PATCH] bisect: save heap memory. allocate only the required amount
Jeff King p...@peff.net writes: On Mon, Aug 25, 2014 at 04:06:52PM +0200, Christian Couder wrote: This allocation should be name_len + 1 for the NUL-terminator, no? I wondered about that too, but as struct name_decoration is defined like this:

struct name_decoration {
	struct name_decoration *next;
	int type;
	char name[1];
};

the .name field of this struct already has one char, so the allocation above should be ok. Yeah, you're right. I would argue it should just be FLEX_ARRAY for consistency with other spots, though (in which case add_name_decoration needs to be updated with a +1). Running git grep '^ char [^ ]*\[[01]]' -- '*.[ch]' shows that this is one of only two spots that don't use FLEX_ARRAY (and the other has a comment explaining why not). Good digging, and I agree that it should use the FLEX_ARRAY for consistency.
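The allocation arithmetic being debated works out as follows (a plain-numbers sketch; the struct size is made up, and real C allocations also involve padding):

```python
def bytes_to_allocate(sizeof_struct, chars_already_in_struct, name_len):
    # we need room for name_len characters plus a NUL terminator;
    # subtract whatever the declared trailing array already contributes
    return sizeof_struct + max(0, name_len + 1 - chars_already_in_struct)

SIZEOF = 16  # hypothetical sizeof(struct name_decoration)

# with 'char name[1]', one byte of the name is already inside
# sizeof(struct), so sizeof(struct) + name_len covers name + NUL:
assert bytes_to_allocate(SIZEOF, 1, 5) == SIZEOF + 5
# with 'char name[FLEX_ARRAY]' (size 0), the +1 must be explicit:
assert bytes_to_allocate(SIZEOF, 0, 5) == SIZEOF + 5 + 1
```

This is why switching the struct to FLEX_ARRAY requires updating add_name_decoration with a +1.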
Re: [PATCH v5 4/4] convert: Stream from fd to required clean filter instead of mmap
Steffen Prohaska proha...@zib.de writes: Couldn't we do that with an lseek (or even an mmap with offset 0)? That obviously would not work for non-file inputs, but I think we address that already in index_fd: we push non-seekable things off to index_pipe, where we spool them to memory. It could be handled that way, but we would be back to the original problem that 32-bit git fails for large files. Correct, and you are making an incremental improvement so that such a large blob can be handled _when_ the filters can successfully munge it back and forth. If we fail due to out of memory when the filters cannot, that would be the same as without your improvement, so you are still making progress. To implement something like the ideal strategy below, the entire convert machinery for crlf and ident would have to be converted to a streaming approach. Yes, that has always been the longer term vision since the day the streaming infrastructure was introduced. So it seems like the ideal strategy would be:

1. If it's seekable, try streaming. If not, fall back to lseek/mmap.

2. If it's not seekable and the filter is required, try streaming. We die anyway if we fail.

Puzzled... Is it assumed that any content for which the filters tell us to "use the contents from the db as-is" by exiting with non-zero status will always be too large to fit in-core? For small contents, isn't this ideal strategy a regression?

3. If it's not seekable and the filter is not required, decide based on file size:

   a. If it's small, spool to memory and proceed as we do now.
   b. If it's big, spool to a seekable tempfile.
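The seekable vs. non-seekable distinction driving that strategy can be probed like this (a sketch; the size cutoff and the returned strategy names are assumptions, not git's actual code):

```python
import io
import os
import tempfile

def is_seekable(f):
    try:
        f.seek(0, os.SEEK_CUR)  # no-op seek, like lseek(fd, 0, SEEK_CUR)
        return True
    except (OSError, io.UnsupportedOperation):
        return False

BIG = 32 * 1024 * 1024  # arbitrary "large file" cutoff for the sketch

def conversion_strategy(f, filter_required, size):
    if is_seekable(f):
        return "stream, rewind and fall back on filter failure"
    if filter_required:
        return "stream; a filter failure dies anyway"
    return "spool to memory" if size < BIG else "spool to seekable tempfile"
```

A pipe (the index_pipe case) fails the probe with ESPIPE, while a regular file or tempfile passes it, which is what makes the rewind-and-retry fallback possible only in the latter case.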
RE: [PATCH 00/18] Signed push
-Original Message- From: Junio C Hamano Sent: Monday, August 25, 2014 13:55 Stefan Beller stefanbel...@gmail.com writes: burden is not an issue, as I'll be signing the push certificate anyway when I push. A signed tag or a signed commit and signed push certificate solves two completely separate and orthogonal issues. What happens if you break into GitHub or k.org and did $ git tag maint_2014_08_22 master_2014_08_22 Ok, I personally haven't used tags a lot. I just tried to git tag -s testbreaktag v2.1.0 git show testbreaktag # However it would still read: tag v2.1.0 Tagger: Junio C Hamano gits...@pobox.com Date: Fri Aug 15 15:09:28 2014 -0700 So as I do not posess your private key I could not create signed tags even if I were to break into github/k.org The point was that after I push to 'maint', you break into the site and copy or move that tag as if I pushed to 'master'. What is needed is not a signed push per se, but rather a need for a set of signed HEADS ... You could argue that I could create a signed tag 'maint-2014-08-25', push it, and if you moved it to tags/master-2014-08-25 as an attacker, the refname would not match the tag line in the signed tag object. While that is true, nobody other thaan fsck checks the contents on the tag line in practice. But more importantly. I may deem a commit a sensible version for the 'master' branch of one repository but it would not be sensible for another repository's 'master' branch. Imagine a world just like the kernel development during 2.6 era used to be, where there was a separate tree 2.4 maintained with its own 'master' branch. What is appropriate for the tip of 'master' to one repository is not good for the other one, ... and these signed HEADS need to be tied to a particular repository instance. AFAIK git does not have any unique identifier per repository instance to leverage. If you were to make a repository instance id you could take that and the branch name as input to a signed hash for verification later. 
But this leads to deeper issues about new workflow, new configuration storage mechanisms, etc. and your timestamped tag line may say for which branch the push was for but does not say for which repository. The exact problem is also shared with the desire to have a branch object expressed elsewhere; as there is no identity for a branch in a distributed world, trying to name a branch as if it is a global entity without mentioning what repository will lead to tears. Besides, these tags/maint-2014-08-25 tags will be interesting only for those who are auditing and not for general public, and we do not have a good way to hide uninteresting refs until those with narrow niche interest ask yet, which is something we may want to add soon, but I do not want auditable push taken hostage to that unrelated feature.
Re: [PATCH 3/5] fast-import: clean up pack_data pointer in end_packfile
On Mon, Aug 25, 2014 at 10:16:52AM -0700, Ronnie Sahlberg wrote: Print an error before returning when pack_data is NULL ? I don't think so. We call end_packfile in some code paths (like the die handler) as "close if it's open". So I think it makes sense for it to be a noop if nothing is open. -Peff
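The "close if it's open" behavior Peff describes is the usual idempotent-cleanup pattern, sketched here generically (not fast-import's actual code):

```python
class PackState:
    def __init__(self):
        self.pack_data = None  # stand-in for fast-import's pack_data pointer
        self.closes = 0

    def start_packfile(self):
        self.pack_data = object()  # pretend we opened a pack

    def end_packfile(self):
        # safe to call from any cleanup path (e.g. a die handler):
        # a silent no-op when no packfile is open
        if self.pack_data is None:
            return
        self.closes += 1
        self.pack_data = None  # clearing the pointer prevents double-close
```

Clearing the pointer at the end is the actual fix in patch 3/5: without it, a second call would operate on an already-closed pack.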
Re: [PATCH 0/5] fix pack-refs pruning of refs/foo
On Mon, Aug 25, 2014 at 10:38:56AM -0700, Ronnie Sahlberg wrote:

[1/5]: git-prompt: do not look for refs/stash in $GIT_DIR
[2/5]: pack-refs: prune top-level refs like refs/foo
[3/5]: fast-import: clean up pack_data pointer in end_packfile
[4/5]: fast-import: fix buffer overflow in dump_tags
[5/5]: fast-import: stop using lock_ref_sha1

+1 on 1-3
+1 on 4. While I have a similar fix in the transaction series, you should not need to wait for that series to address a security concern.
5: I think this one is not as urgent as the others so would prefer if it is dropped, just so it doesn't cause more merge conflicts than is already present in the transaction series.

OK, I think we're in agreement then. Thanks for looking them over. -Peff
Re: check-ref-format: include refs/ in the argument or to strip it?
Ronnie Sahlberg sahlb...@google.com writes: There are also a lot of places where we assume that a ref will start with refs/heads/ and not just refs/: for_each_branch_ref(), log_ref_setup() (so no reflogs), is_branch(), to name a few. for-each-BRANCH-ref and is-BRANCH are explicitly about branches and it is natural that they will work only on refs that start with refs/heads/, no? I am not sure about log-ref-setup (git grep pu does not find it). This makes the api a bit confusing and hard to predict. Which functions allow refs/foo and which will ignore it? Are there any compelling reasons why refs/foo should be allowed? The only one I can think of offhand that we internally use is the stash, but that does not give us any assurance that no third-party tool is using refs/frotz for its own use X-<.
Re: check-ref-format: include refs/ in the argument or to strip it?
On Mon, Aug 25, 2014 at 11:26:36AM -0700, Jonathan Nieder wrote: It's still very puzzling to me. The comment came at the same time as the behavior, in v0.99.9~120 (git-check-ref-format: reject funny ref names, 2005-10-13). Before that, the behavior was even stranger --- it checked that there was exactly one slash in the argument. I'm willing to believe we might not want that check any more, though. Yeah, given that among experienced gits, nobody can figure out a motivation or a history for the feature (and given that it causes problems), I do not see any problem with loosening it. That way, when tools internally use other refs (e.g., FETCH_HEAD), git doesn't have to automatically incur the cost of maintaining the reflog for those. What other refs should there be reflogs for? I haven't thought carefully about this. I think you'd in theory want a reflog for anything. The refs in refs/tags are not meant to change, but if they do (e.g., somebody force-pushes a tag to a server) it is nice to have a log of what happened. I think the same argument could apply to anything in refs/. I think more ephemeral things (like MERGE_HEAD) tend to be in the root, outside of refs. Reflogging those _could_ be useful, but probably isn't (and in the case of something like FETCH_HEAD, would not record all of the information anyway). I wrote the patch below over a year ago and very nearly submitted it. At GitHub we use reflogs frequently for auditing and forensics, and I wanted to have such an audit trail for everything. However, we ended up doing something a little more invasive that I do not think would be that interesting to upstream (though I am happy to submit a patch if people think otherwise): we maintain a separate audit log for all refs that is never pruned, and lives on even when refs are deleted.
-- >8 --
Subject: [PATCH] teach core.logallrefupdates an "always" mode

When core.logallrefupdates is true, we only create a new reflog for refs that are under certain well-known hierarchies. The reason is that we know that some hierarchies (like refs/tags) do not typically change, and that unknown hierarchies might not want reflogs at all (e.g., a hypothetical refs/foo might be meant to change often and drop old history immediately).

However, sometimes it is useful to override this decision and simply log for all refs, because the safety and audit trail is more important than the performance implications of keeping the log around.

This patch introduces a new "always" mode for the core.logallrefupdates option which will log updates to everything under refs/, regardless where in the hierarchy it is (we still will not log things like ORIG_HEAD and FETCH_HEAD, which are known to be transient).

Signed-off-by: Jeff King p...@peff.net
---
Looking over the code, I am not sure that it actually works as advertised with respect to ORIG_HEAD, etc. That would be easy enough to fix, though.

 Documentation/config.txt |  8 +---
 branch.c                 |  2 +-
 builtin/checkout.c       |  2 +-
 builtin/init-db.c        |  2 +-
 cache.h                  |  9 -
 config.c                 |  7 ++-
 environment.c            |  2 +-
 refs.c                   | 23 +--
 t/t1400-update-ref.sh    | 31 +++
 9 files changed, 71 insertions(+), 15 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index c55c22a..27629df 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -405,10 +405,12 @@ core.logAllRefUpdates::
 	$GIT_DIR/logs/ref, by appending the new and old
 	SHA-1, the date/time and the reason of the update, but
 	only when the file exists. If this configuration
-	variable is set to true, missing $GIT_DIR/logs/ref
+	variable is set to `true`, a missing $GIT_DIR/logs/ref
 	file is automatically created for branch heads (i.e. under
-	refs/heads/), remote refs (i.e. under refs/remotes/),
-	note refs (i.e. under refs/notes/), and the symbolic ref HEAD.
+	`refs/heads/`), remote refs (i.e. under `refs/remotes/`),
+	note refs (i.e. under `refs/notes/`), and the symbolic ref `HEAD`.
+	If it is set to `always`, then a missing reflog is automatically
+	created for any ref under `refs/`.
 +
 	This information can be used to determine what commit
 	was the tip of a branch 2 days ago.
diff --git a/branch.c b/branch.c
index 46e8aa8..1d140b7 100644
--- a/branch.c
+++ b/branch.c
@@ -292,7 +292,7 @@ void create_branch(const char *head,
 	}
 	if (reflog)
-		log_all_ref_updates = 1;
+		log_all_ref_updates = LOG_REFS_NORMAL;
 	if (forcing)
 		snprintf(msg, sizeof msg, "branch: Reset to %s",
diff --git a/builtin/checkout.c b/builtin/checkout.c
index f71e745..65bc066 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -586,7 +586,7 @@ static void update_refs_for_switch(const struct checkout_opts *opts,
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
Jeff King p...@peff.net writes: Since we do not use the value $(LIB_H) unless either COMPUTE_HEADER_DEPENDENCIES is turned on or the user is building po/git.pot (where it comes in via $(LOCALIZED_C)), make is smart enough to not even run this find in most cases. However, we do need to stop using the immediate variable assignment := for $(LOCALIZED_C). That's OK, because it was not otherwise useful here. Signed-off-by: Jeff King p...@peff.net --- I cannot see any reason for the :=, but maybe I am missing something. If the right-hand-side were something like $(shell find ...) that was heavy-weight then it might have made sense, but I do not think it is that. It has stayed to be := ever since it was introduced by cd5513a7 (i18n: Makefile: pot target to extract messages marked for translation, 2011-02-22). And now you use LIB_H only once ;-). Also interestingly, I notice that it is very clear that it is not LIB_H but ANY_H ;-)

 Makefile | 140 ---
 1 file changed, 8 insertions(+), 132 deletions(-)

diff --git a/Makefile b/Makefile
index cf0ccdf..f2b85c9 100644
--- a/Makefile
+++ b/Makefile
...
@@ -2128,9 +2004,9 @@ XGETTEXT_FLAGS_C = $(XGETTEXT_FLAGS) --language=C \
 XGETTEXT_FLAGS_SH = $(XGETTEXT_FLAGS) --language=Shell \
 	--keyword=gettextln --keyword=eval_gettextln
 XGETTEXT_FLAGS_PERL = $(XGETTEXT_FLAGS) --keyword=__ --language=Perl
-LOCALIZED_C := $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H)
-LOCALIZED_SH := $(SCRIPT_SH)
-LOCALIZED_PERL := $(SCRIPT_PERL)
+LOCALIZED_C = $(C_OBJ:o=c) $(GENERATED_H)
+LOCALIZED_SH = $(SCRIPT_SH)
+LOCALIZED_PERL = $(SCRIPT_PERL)
 ifdef XGETTEXT_INCLUDE_TESTS
 LOCALIZED_C += t/t0200/test.c
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
On Mon, Aug 25, 2014 at 12:30:51PM -0700, Junio C Hamano wrote: Also interestingly, I notice that it is very clear that it is not LIB_H but ANY_H ;-) Yeah, it has been that way for quite a while. I don't know if it is that big a deal, but it would not be unreasonable to do a patch to rename on top (I am not sure what the new one would be; ANY_H is probably OK). -Peff
Re: [PATCH] bisect: save heap memory. allocate only the required amount
On Mon, Aug 25, 2014 at 11:26:39AM -0700, Junio C Hamano wrote: Good digging, and I agree that it should use the FLEX_ARRAY for consistency. I can produce a patch, but I did not want to steal Arjun's thunder. Arjun, did my proposal make sense? Do you want to try implementing that? -Peff
Re: [PATCH 1/2] Check order when reading index
On Mon, Aug 25, 2014 at 10:21:58AM -0700, Junio C Hamano wrote:

+	if (ce_stage(ce) >= ce_stage(next_ce))
+		die("Unordered stage entries for '%s'",
+		    ce->name);

Not quite. We do allow multiple higher stage entries; having two or more stage #1 entries is perfectly fine during a merge resolution, and both ce and next_ce may be pointing at the stage #1 entries of the same path. Replacing the comparison with > is sufficient, I think. For my own curiosity, how do you get into this situation, and what does it mean to have multiple stage#1 entries for the same path? What would git cat-file :1:path output? -Peff
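The corrected ordering rule can be sketched over (name, stage) pairs (illustrative only; the real check operates on cache entries in read_index):

```python
def check_index_order(entries):
    # entries: (name, stage) pairs in on-disk index order
    for (name, stage), (nname, nstage) in zip(entries, entries[1:]):
        if name > nname:
            raise ValueError("unordered name entries: %r" % nname)
        if name == nname and stage > nstage:
            # '>' rather than '>=': several consecutive entries may share
            # the same higher stage (e.g. multiple stage #1 entries of the
            # same path during a merge resolution)
            raise ValueError("unordered stage entries for %r" % name)

# duplicate stage #1 entries for the same path are accepted:
check_index_order([("a", 1), ("a", 1), ("a", 2), ("b", 0)])
```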
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
Jeff King wrote:

-LOCALIZED_C := $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H)
+LOCALIZED_C = $(C_OBJ:o=c) $(GENERATED_H)

Why is LIB_H dropped here? This would mean that po/git.pot stops including strings from macros and static inline functions in headers (e.g., in parse-options.h). The rest looks good. Thanks, Jonathan
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
On Mon, Aug 25, 2014 at 12:46:41PM -0700, Jonathan Nieder wrote: Jeff King wrote:

-LOCALIZED_C := $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H)
+LOCALIZED_C = $(C_OBJ:o=c) $(GENERATED_H)

Why is LIB_H dropped here? This would mean that po/git.pot stops including strings from macros and static inline functions in headers (e.g., in parse-options.h). Ick, this is an accidental leftover from the earlier iteration of the patch, which moved that part to a separate line (inside my gross GIT_REAL_POT conditional). The extra line went away, but I forgot to add $(LIB_H) back in here. Thanks for noticing. Here's a revised version of the patch which fixes it (and I double-checked to make sure it continues to not execute the find unless you are doing a make pot).

-- >8 --
Subject: Makefile: use `find` to determine static header dependencies

Most modern platforms will use automatically computed header dependencies to figure out when a C file needs rebuilt due to a header changing. With old compilers, however, we fall back to a static list of header files. If any of them changes, we recompile everything. This is overly conservative, but the best we can do on older platforms. It is unfortunately easy for our static header list to grow stale, as none of the regular developers make use of it. Instead of trying to keep it up to date, let's invoke find to generate the list dynamically. Since we do not use the value $(LIB_H) unless either COMPUTE_HEADER_DEPENDENCIES is turned on or the user is building po/git.pot (where it comes in via $(LOCALIZED_C)), make is smart enough to not even run this find in most cases. However, we do need to stop using the immediate variable assignment := for $(LOCALIZED_C). That's OK, because it was not otherwise useful here.
Signed-off-by: Jeff King p...@peff.net
---
 Makefile | 140 ---
 1 file changed, 8 insertions(+), 132 deletions(-)

diff --git a/Makefile b/Makefile
index cf0ccdf..a4fc440 100644
--- a/Makefile
+++ b/Makefile
@@ -432,7 +432,6 @@
 XDIFF_OBJS =
 VCSSVN_OBJS =
 GENERATED_H =
 EXTRA_CPPFLAGS =
-LIB_H =
 LIB_OBJS =
 PROGRAM_OBJS =
 PROGRAMS =
@@ -631,131 +630,11 @@
 VCSSVN_LIB = vcs-svn/lib.a

 GENERATED_H += common-cmds.h

-LIB_H += advice.h
-LIB_H += archive.h
-LIB_H += argv-array.h
-LIB_H += attr.h
-LIB_H += bisect.h
-LIB_H += blob.h
-LIB_H += branch.h
-LIB_H += builtin.h
-LIB_H += bulk-checkin.h
-LIB_H += bundle.h
-LIB_H += cache-tree.h
-LIB_H += cache.h
-LIB_H += color.h
-LIB_H += column.h
-LIB_H += commit.h
-LIB_H += compat/bswap.h
-LIB_H += compat/mingw.h
-LIB_H += compat/obstack.h
-LIB_H += compat/poll/poll.h
-LIB_H += compat/precompose_utf8.h
-LIB_H += compat/terminal.h
-LIB_H += compat/win32/dirent.h
-LIB_H += compat/win32/pthread.h
-LIB_H += compat/win32/syslog.h
-LIB_H += connected.h
-LIB_H += convert.h
-LIB_H += credential.h
-LIB_H += csum-file.h
-LIB_H += decorate.h
-LIB_H += delta.h
-LIB_H += diff.h
-LIB_H += diffcore.h
-LIB_H += dir.h
-LIB_H += exec_cmd.h
-LIB_H += ewah/ewok.h
-LIB_H += ewah/ewok_rlw.h
-LIB_H += fetch-pack.h
-LIB_H += fmt-merge-msg.h
-LIB_H += fsck.h
-LIB_H += gettext.h
-LIB_H += git-compat-util.h
-LIB_H += gpg-interface.h
-LIB_H += graph.h
-LIB_H += grep.h
-LIB_H += hashmap.h
-LIB_H += help.h
-LIB_H += http.h
-LIB_H += kwset.h
-LIB_H += levenshtein.h
-LIB_H += line-log.h
-LIB_H += line-range.h
-LIB_H += list-objects.h
-LIB_H += ll-merge.h
-LIB_H += log-tree.h
-LIB_H += mailmap.h
-LIB_H += merge-blobs.h
-LIB_H += merge-recursive.h
-LIB_H += mergesort.h
-LIB_H += notes-cache.h
-LIB_H += notes-merge.h
-LIB_H += notes-utils.h
-LIB_H += notes.h
-LIB_H += object.h
-LIB_H += pack-objects.h
-LIB_H += pack-revindex.h
-LIB_H += pack.h
-LIB_H += pack-bitmap.h
-LIB_H += parse-options.h
-LIB_H += patch-ids.h
-LIB_H += pathspec.h
-LIB_H += pkt-line.h
-LIB_H += prio-queue.h
-LIB_H += progress.h
-LIB_H += prompt.h
-LIB_H += quote.h
-LIB_H += reachable.h
-LIB_H += reflog-walk.h
-LIB_H += refs.h
-LIB_H += remote.h
-LIB_H += rerere.h
-LIB_H += resolve-undo.h
-LIB_H += revision.h
-LIB_H += run-command.h
-LIB_H += send-pack.h
-LIB_H += sequencer.h
-LIB_H += sha1-array.h
-LIB_H += sha1-lookup.h
-LIB_H += shortlog.h
-LIB_H += sideband.h
-LIB_H += sigchain.h
-LIB_H += strbuf.h
-LIB_H += streaming.h
-LIB_H += string-list.h
-LIB_H += submodule.h
-LIB_H += tag.h
-LIB_H += tar.h
-LIB_H += thread-utils.h
-LIB_H += transport.h
-LIB_H += tree-walk.h
-LIB_H += tree.h
-LIB_H += unpack-trees.h
-LIB_H += unicode_width.h
-LIB_H += url.h
-LIB_H += urlmatch.h
-LIB_H += userdiff.h
-LIB_H += utf8.h
-LIB_H += varint.h
-LIB_H += vcs-svn/fast_export.h
-LIB_H += vcs-svn/line_buffer.h
-LIB_H += vcs-svn/repo_tree.h
-LIB_H += vcs-svn/sliding_window.h
-LIB_H += vcs-svn/svndiff.h
-LIB_H += vcs-svn/svndump.h
-LIB_H += walker.h
-LIB_H += wildmatch.h
-LIB_H += wt-status.h
-LIB_H += xdiff-interface.h
-LIB_H += xdiff/xdiff.h
-LIB_H += xdiff/xdiffi.h
-LIB_H += xdiff/xemit.h
-LIB_H += xdiff/xinclude.h
-LIB_H += xdiff/xmacros.h
-LIB_H +=
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
On Mon, Aug 25, 2014 at 04:00:42PM -0400, Jeff King wrote: On Mon, Aug 25, 2014 at 12:46:41PM -0700, Jonathan Nieder wrote: Jeff King wrote: -LOCALIZED_C := $(C_OBJ:o=c) $(LIB_H) $(GENERATED_H) +LOCALIZED_C = $(C_OBJ:o=c) $(GENERATED_H) Why is LIB_H dropped here? This would mean that po/git.pot stops including strings from macros and static inline functions in headers (e.g., in parse-options.h). Ick, this is an accidental leftover from the earlier iteration of the patch, which moved that part to a separate line (inside my gross GIT_REAL_POT conditional). The extra line went away, but I forgot to add $(LIB_H) back in here. Thanks for noticing. As an aside, this makes an interesting case study for our git.git workflow. In some workflows, I would have made the original unacceptable patch, you would have reviewed it, and then I would have made the followup patch to adjust it, and you would have reviewed that. But it's quite hard to see my mistake in just the followup patch; the fact that $(LIB_H) was originally part of $(LOCALIZED_C) does not appear in that hunk at all. But in our workflow, we squash out the unacceptable diff, and you review from scratch the movement from the original working state (assuming the status quo was working :) ) to the newly proposed state. And there the mistake is quite a bit more obvious. Just an interesting observation. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC] git-imap-send: use libcurl for implementation
On 2014-08-19 19:51, Junio C Hamano wrote: This looks strange and stands out like a sore thumb. Do any of our other sources do this kind of macro tweaking inside C source before including git-compat-util.h (or its equivalent like cache.h)? I haven't checked, but I agree that it's desirable to avoid. I think what you are trying to do is to change the meaning of NO_OPENSSL, which merely means "we do not have the OpenSSL library and do not want to link with it", locally to "we may or may not have and use the OpenSSL library elsewhere in Git, but in the code below we do not want to use the code written to work on top of OpenSSL and instead use libcurl". ...and we don't want to link to OpenSSL in that case. Yeah. Because of that, you define NO_OPENSSL before including any of our headers to cause us not to include the headers, and typedef away SSL, for example. The SSL un-typedef'ing was there before, but it's true that I'm defining NO_OPENSSL at the very top so the included headers don't require OpenSSL (and so we don't have to link to it later).

 #include "cache.h"
 #include "credential.h"
 #include "exec_cmd.h"
@@ -29,6 +33,9 @@
 #ifdef NO_OPENSSL
 typedef void *SSL;
 #endif
+#ifdef USE_CURL_FOR_IMAP_SEND
+#include "http.h"
+#endif

But does it have to be that way? For one thing, doing it this way, the user has to make a compilation-time choice, but if you separate this compilation-time macro into two, one for "can we even link with and use OpenSSL?" (which is what NO_OPENSSL is about) and another for "do we want an ability to talk to imap via libcurl?", wouldn't it make it possible for you to switch between them at runtime (e.g. you might want to go over the direct connection when tunneling, while letting libcurl do the heavy lifting in non-tunneled operation)? Yeah, but I still need to wrap the non-tunneled operation in #ifdefs in case we don't USE_CURL_FOR_IMAP_SEND, in which case we fall back to the handwritten IMAP code, don't I?
Maybe I'm not getting you entirely right, but if I don't define NO_OPENSSL when USE_CURL_FOR_IMAP_SEND is defined, I don't see any way to not link to OpenSSL, even if it's not required. I'm including a slightly modified patch which does that (hoping that I've finally managed to send a usable patch). Sorry it's nothing breathtakingly better than before; even after giving this some thought I didn't arrive at a very elegant new solution... (see below for more on that)

@@ -1417,18 +1476,48 @@ int main(int argc, char **argv)
 	}

 	/* write it to the imap server */
+
+#ifdef USE_CURL_FOR_IMAP_SEND
+	if (!server.tunnel) {
+		curl = setup_curl(&server);
+		curl_easy_setopt(curl, CURLOPT_READDATA, &msgbuf);
+	} else {
+#endif
 	ctx = imap_open_store(&server);
 	if (!ctx) {
 		fprintf(stderr, "failed to open store\n");
 		return 1;
 	}
+	ctx->name = server.folder;
+#ifdef USE_CURL_FOR_IMAP_SEND
+	}
+#endif

 	fprintf(stderr, "sending %d message%s\n", total, (total != 1) ? "s" : "");
-	ctx->name = imap_folder;
 	while (1) {
 		unsigned percent = n * 100 / total;
 		fprintf(stderr, "%4u%% (%d/%d) done\r", percent, n, total);
+#ifdef USE_CURL_FOR_IMAP_SEND
+		if (!server.tunnel) {
...
+		}
+		} else {
+#endif
 		if (!split_msg(&all_msgs, &msg, &ofs))
 			break;
 		if (server.use_html)
@@ -1436,11 +1525,19 @@ int main(int argc, char **argv)
 		r = imap_store_msg(ctx, &msg);
 		if (r != DRV_OK)
 			break;
+#ifdef USE_CURL_FOR_IMAP_SEND
+		}
+#endif
 		n++;
 	}
 	fprintf(stderr, "\n");
+#ifdef USE_CURL_FOR_IMAP_SEND
+	curl_easy_cleanup(curl);
+	curl_global_cleanup();
+#else
 	imap_close_store(ctx);
+#endif

 	return 0;
 }

Ugly. Can we do this with less #ifdef/#else/#endif in the primary code path? It is ugly, but as much as I'd love to put e.g.

+#ifdef USE_CURL_FOR_IMAP_SEND
+	if (!server.tunnel) {
+		curl = setup_curl(&server);

etc. into imap_open_store (and similarly for imap_store_msg etc.), I don't see any easy way to do it; imap_open_store's return type would still be struct imap_store* in the non-tunneling case, and CURL* otherwise.
So the best I can come up with here is merging some of the #ifdef blocks, but that means duplicating the code that applies in both cases. But that isn't any better, is it? If we were to keep these two modes as a choice the users have to make at compilation time, that is. As stated above, I'm not sure how to do entirely without at least those two compile-time switches (NO_OPENSSL and USE_CURL_FOR_IMAP_SEND). Sorry, perhaps I missed something obvious; grateful for any hints on how to do it better. Bernhard

 Documentation/git-imap-send.txt |  3 +-
 INSTALL                         | 15 ---
 Makefile
Re: [PATCH 1/2] Check order when reading index
On Mon, Aug 25, 2014 at 7:21 PM, Junio C Hamano gits...@pobox.com wrote: Jaime Soriano Pastor jsorianopas...@gmail.com writes: Subject: Re: [PATCH 1/2] Check order when reading index Please be careful when crafting the commit title. This single line will be the only one that readers will have to identify the change among hundreds of entries in "git shortlog" output when trying to see what kind of change went into the project during the given period. Something like: "read_index_from(): catch out of order entries while reading an index file" perhaps? Ok, rephrasing it.

+void check_ce_order(struct cache_entry *ce, struct cache_entry *next_ce)

Does this have to be global, i.e. not "static void ..."? Not really, changing it to static.

+	if (ce_stage(ce) >= ce_stage(next_ce))
+		die("Unordered stage entries for '%s'",
+		    ce->name);

Not quite. We do allow multiple higher stage entries; having two or more stage #1 entries is perfectly fine during a merge resolution, and both ce and next_ce may be pointing at the stage #1 entries of the same path. Replacing the comparison with > is sufficient, I think. Ok, but like Jeff, I'm also curious about how to have multiple stage #1 entries for the same path. Thanks.
Re: [PATCH] bisect: save heap memory. allocate only the required amount
On 26 August 2014 01:05, Jeff King p...@peff.net wrote: On Mon, Aug 25, 2014 at 11:26:39AM -0700, Junio C Hamano wrote: Good digging, and I agree that it should use the FLEX_ARRAY for consistency. I can produce a patch, but I did not want to steal Arjun's thunder. Please feel free to do so. Arjun, did my proposal make sense? Do you want to try implementing that? -Peff
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
Jeff King wrote: It is unfortunately easy for our static header list to grow stale, as none of the regular developers make use of it. Instead of trying to keep it up to date, let's invoke find to generate the list dynamically. Yep, I like this. It does mean that people who run make pot have to be a little more vigilant about not keeping around extra .h files by mistake. But I trust them. [...]

+LIB_H = $(shell $(FIND) . \
+	-name .git -prune -o \
+	-name t -prune -o \
+	-name Documentation -prune -o \
+	-name '*.h' -print)

Tiny nit: I might use

	$(shell $(FIND) * \
		-name . -o -name '.*' -prune -o \
		-name t -prune -o \
		-name Documentation -prune -o \
		-name '*.h' -print)

or

	$(shell $(FIND) * \
		-name '.?*' -prune -o \
		-name t -prune -o \
		-name Documentation -prune -o \
		-name '*.h' -print)

to avoid spending time looking in other dot-directories like .svn, .hg, or .snapshot. With or without such a change, Reviewed-by: Jonathan Nieder jrnie...@gmail.com
Re: [PATCH 1/2] Check order when reading index
On Mon, Aug 25, 2014 at 12:44 PM, Jeff King p...@peff.net wrote: For my own curiosity, how do you get into this situation, and what does it mean to have multiple stage #1 entries for the same path? What would git cat-file :1:path output? That is how we natively (read: not with the funky virtual stuff merge-recursive does) express a merge with multiple merge bases. You should also be able to see this in how git merge invokes merge strategies (one or more bases, double-dash and then current HEAD and the other branches). I think there are some tests among the 3-way merge tests that check what should happen when our HEAD matches one of the stage #1 entries while their branch matches a different one of the stage #1 entries, too.
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
Jonathan Nieder jrnie...@gmail.com writes: Jeff King wrote: It is unfortunately easy for our static header list to grow stale, as none of the regular developers make use of it. Instead of trying to keep it up to date, let's invoke find to generate the list dynamically. Yep, I like this. It does mean that people who run make pot have to be a little more vigilant about not keeping around extra .h files by mistake. But I trust them. [...]

+LIB_H = $(shell $(FIND) . \
+	-name .git -prune -o \
+	-name t -prune -o \
+	-name Documentation -prune -o \
+	-name '*.h' -print)

Tiny nit: I might use

	$(shell $(FIND) * \
		-name . -o -name '.*' -prune -o \
		-name t -prune -o \
		-name Documentation -prune -o \
		-name '*.h' -print)

or

	$(shell $(FIND) * \
		-name '.?*' -prune -o \
		-name t -prune -o \
		-name Documentation -prune -o \
		-name '*.h' -print)

to avoid spending time looking in other dot-directories like .svn, .hg, or .snapshot. Wouldn't it be sufficient to start digging not from * but from ??*? That is, something like

	find ??* \( -name Documentation -o -name .\?\* \) -prune -o -name \*.h

;-) With or without such a change, Reviewed-by: Jonathan Nieder jrnie...@gmail.com Thanks.
Re: [PATCH/RFC] git-imap-send: use libcurl for implementation
Bernhard Reiter ock...@raz.or.at writes: Yeah, but I still need to wrap the non-tunneled operation in #ifdefs in case we don't USE_CURL_FOR_IMAP_SEND in which case we fall back to the handwritten IMAP code, don't I? We do not mind multiple implementations of the same helper function that are guarded with #ifdef/#endif, and we do use that style quite a lot. Would it help?
Re: [PATCH] bisect: save heap memory. allocate only the required amount
Jeff King p...@peff.net writes: On Mon, Aug 25, 2014 at 11:26:39AM -0700, Junio C Hamano wrote: Good digging, and I agree that it should use the FLEX_ARRAY for consistency. I can produce a patch, but I did not want to steal Arjun's thunder. Hmph, would it have to overlap? I think we can queue Arjun's patch with +1 fix and FLEX_ARRAY thing separately, and they can go in in any order, no? Arjun, did my proposal make sense? Do you want to try implementing that?
Re: [PATCH] bisect: save heap memory. allocate only the required amount
Arjun Sreedharan arjun...@gmail.com writes: Find and allocate the required amount instead of allocating extra 100 bytes Signed-off-by: Arjun Sreedharan arjun...@gmail.com --- Interesting. How much memory do we typically waste (in other words, how big did cnt grow in your repository where you noticed this)?

 bisect.c | 7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/bisect.c b/bisect.c
index d6e851d..c96aab0 100644
--- a/bisect.c
+++ b/bisect.c
@@ -215,10 +215,13 @@ static struct commit_list *best_bisection_sorted(struct commit_list *list, int n
 	}
 	qsort(array, cnt, sizeof(*array), compare_commit_dist);
 	for (p = list, i = 0; i < cnt; i++) {
-		struct name_decoration *r = xmalloc(sizeof(*r) + 100);
+		char name[100];
+		sprintf(name, "dist=%d", array[i].distance);
+		int name_len = strlen(name);

Decl-after-stmt. You do not have to run a separate strlen() for this. The return value from sprintf() should tell you how much you need to allocate, I think.

+		struct name_decoration *r = xmalloc(sizeof(*r) + name_len);
 		struct object *obj = &(array[i].commit->object);
-		sprintf(r->name, "dist=%d", array[i].distance);
+		memcpy(r->name, name, name_len + 1);
 		r->next = add_decoration(&name_decoration, obj, r);
 		p->item = array[i].commit;
 		p = p->next;
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
Junio C Hamano wrote: Jonathan Nieder jrnie...@gmail.com writes: Tiny nit: I might use $(shell $(FIND) * \ -name . -o -name '.*' -prune -o \ -name t -prune -o \ -name Documentation -prune -o \ -name '*.h' -print) or $(shell $(FIND) * \ -name '.?*' -prune -o \ -name t -prune -o \ -name Documentation -prune -o \ -name '*.h' -print) to avoid spending time looking in other dot-directories like .svn, .hg, or .snapshot. Wouldn't it be sufficient to start digging not from * but from ??*? Gah, the * was supposed to be . in my examples (though it doesn't hurt). find ??* \( -name Documentation -o -name .\?\* \) -prune -o -name \*.h Heh. Yeah, that would work. ;-)
Re: [PATCH] bisect: save heap memory. allocate only the required amount
Jeff King p...@peff.net writes: The string will always be "dist=" followed by a decimal representation of a count that fits in int anyway, so I actually think use of strbuf is way overkill (and formatting it twice also is); the patch as posted should be just fine. I think you are right, and the patch is the right direction (assuming we want to do this; I question whether there are enough elements in the list for us to care about the size, and if there are, we are probably better off storing the int and formatting the strings on the fly). ;-) It would be nice if there was some way to abstract the idea of formatting a buffer directly into a flex-array. That would involve the double-format you mention, but we could use it in lots of places to make the code nicer ...

	struct name_decoration *r = fmt_flex_array(sizeof(*r),
						   offsetof(*r, name),
						   "dist=%d", x);

which is a little less nice. You could make it nicer with a macro, but we don't assume variadic macros. *sigh* At first I thought "Yuck. A helper only to format into the flex member that holds a string?", and I tried to change my mind, but I couldn't quite convince myself. At least not yet. Among the flex arrays we use, some are arrays of bools, some others are arrays of object names, and there are many arrays of even more esoteric types. Only a small number of them are "We want a struct with a constant string, and we do not want to do two allocations and to pay an extra dereference cost by using 'const char *'." For them, by the time we allocate a struct, by definition we should have sufficient information to compute how large to make that structure, and a printf-format plus its args would be the preferred form of that sufficient information, I would think. The name fmt_flex_array(), which stresses too much on the formatting side without implying that it is the way to allocate the thing, may be horrible, and I agree with you that without variadic macros the end result may not read very well.
Unless we have a great many places where we can use the helper to make the code that creates these objects look nicer, I am afraid that the pros-and-cons may not be very favourable. Thanks for an interesting tangent.
Re: [PATCH 2/3] Makefile: use `find` to determine static header dependencies
Jonathan Nieder jrnie...@gmail.com writes: Wouldn't it be sufficient to start digging not from * but from ??*? Gah, the * was supposed to be . in my examples (though it doesn't hurt). find ??* \( -name Documentation -o -name .\?\* \) -prune -o -name \*.h Heh. Yeah, that would work. ;-) Continuing useless discussion... Actually as you are not excluding CVS, RCS, etc., and using ??* as the starting point will exclude .git, .hg, etc. at the top, I think we can shorten it even further and say

	find ??* -name Documentation -prune -o -name \*.h

or something. ...and time to go back to something more serious and practical. Don't we want to exclude contrib/ by the way?
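Junio's shortened one-liner can be tried in a scratch directory. The layout below is a made-up miniature of git.git, and an explicit -print is added (without it, find's implicit print would emit the pruned Documentation directory itself). Note how ??* also drops the one-character t/ directory for free, while contrib/ indeed survives:

```shell
# Made-up miniature tree: headers in several places, plus dirs that
# the real rule wants to skip.
mkdir -p scratch/.git scratch/Documentation scratch/ewah scratch/contrib scratch/t
touch scratch/cache.h scratch/ewah/ewok.h scratch/contrib/foo.h \
      scratch/Documentation/tech.h scratch/.git/junk.h scratch/t/helper.h
cd scratch
# ??* never matches dotfiles or single-character names, so .git and t
# are excluded before find even starts; Documentation is pruned.
find ??* -name Documentation -prune -o -name '*.h' -print | sort
# prints: cache.h, contrib/foo.h, ewah/ewok.h (one per line)
cd ..
```

contrib/foo.h showing up is exactly the "don't we want to exclude contrib/?" question at the end of the message.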
Re: [GIT PULL] l10n updates for 2.1.0 round 1
Jiang Xin worldhello@gmail.com writes: The following changes since commit 49f1cb93a2f11845cfa2723611a729d3d7f02f0d: Git 2.1.0-rc0 (2014-07-27 15:22:22 -0700) are available in the git repository at: git://github.com/git-l10n/git-po for you to fetch changes up to f7fbc357f863ecc5323f3fcf2fc9cbf2aa2a8587: l10n: fr.po (2257t) update for version 2.1.0 (2014-08-07 09:07:18 +0200) Thanks. I finally got around to pulling this one. Sorry for the delay.
Re: [PATCH v5 1/4] convert: Refactor would_convert_to_git() to single arg 'path'
Steffen Prohaska proha...@zib.de writes: It is only the path that matters in the decision whether to filter or not. Clarify this by making path the single argument of would_convert_to_git(). Signed-off-by: Steffen Prohaska proha...@zib.de --- I've retitled this as: "convert: drop arguments other than 'path' from would_convert_to_git()" to match the output from "git shortlog --since=3.months --no-merges" by using lowercase 'd' after the "convert:" area name, and also, more importantly, to avoid calling this change a "refactor", which it is not. Thanks.

 convert.h   | 5 ++---
 sha1_file.c | 2 +-
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/convert.h b/convert.h
index 0c2143c..c638b33 100644
--- a/convert.h
+++ b/convert.h
@@ -40,10 +40,9 @@ extern int convert_to_working_tree(const char *path, const char *src,
 				   size_t len, struct strbuf *dst);
 extern int renormalize_buffer(const char *path, const char *src,
 			      size_t len, struct strbuf *dst);
-static inline int would_convert_to_git(const char *path, const char *src,
-				       size_t len, enum safe_crlf checksafe)
+static inline int would_convert_to_git(const char *path)
 {
-	return convert_to_git(path, src, len, NULL, checksafe);
+	return convert_to_git(path, NULL, 0, NULL, 0);
 }

 /*
diff --git a/sha1_file.c b/sha1_file.c
index 3f70b1d..00c07f2 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3144,7 +3144,7 @@ int index_fd(unsigned char *sha1, int fd, struct stat *st,
 	if (!S_ISREG(st->st_mode))
 		ret = index_pipe(sha1, fd, type, path, flags);
 	else if (size <= big_file_threshold || type != OBJ_BLOB ||
-		 (path && would_convert_to_git(path, NULL, 0, 0)))
+		 (path && would_convert_to_git(path)))
 		ret = index_core(sha1, fd, size, type, path, flags);
 	else
 		ret = index_stream(sha1, fd, size, type, path, flags);
Re: Show containing branches in log?
On Thu, Jul 3, 2014 at 2:41 PM, Øyvind A. Holm su...@sunbase.org wrote: On 2 July 2014 16:50, Robert Dailey rcdailey.li...@gmail.com wrote: I know that with the `git branch` command I can determine which branches contain a commit. Is there a way to represent this graphically with `git log`? Sometimes I just have a commit, and I need to find out what branch contains that commit. The reason why `git branch --contains` doesn't solve this problem for me is that it names almost all branches because of merge commits. Too much ancestry has been built since this commit, so there is no way to find the closest branch that contains that commit. Is there a way to graphically see what is the nearest named ref to the specified commit in the logs? I have created a script for just this functionality which I use very often, and have created a gist with the files at https://gist.github.com/sunny256/2eb583f21e0ffcfe994f, I think it should solve your problem. It contains these files: git-wn ("wn" means "What's New") will create a visual graph of all commits which have a specified ref as ancestor. It also needs the following script, just put it into your $PATH somewhere: git-lc ("lc" means "List branches Containing this commit") generates a list of all branches containing a specified ref. The files originate from https://github.com/sunny256/utils, but I've modified them in the gist to make your life easier. :) Hope that helps, Øyvind I'm finally getting around to trying this out but it isn't working on Windows because there is no fmt command in msysgit. Do you have a workaround I can use? Thanks.
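One possible workaround, not knowing the scripts' exact fmt invocation: approximate fmt(1) with fold(1), assuming fold is available in your msysGit install (an assumption worth checking). Something like this near the top of git-wn/git-lc might do; it is a sketch, not code from the gist:

```shell
# Hypothetical fallback: if fmt(1) is missing, emulate it with
# fold(1). fold -s breaks lines at blanks; 75 mirrors fmt's default
# output width.
if ! command -v fmt >/dev/null 2>&1; then
	fmt() { fold -s -w 75; }
fi

printf 'a rather long line that would need wrapping somewhere sensible\n' | fmt
```

fold does not re-fill short lines the way fmt does, so the result is cruder, but for breaking overlong graph lines it may be close enough.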
Re: Transaction patch series overview
Jonathan Nieder wrote: Ronnie Sahlberg wrote:

ref-transaction-1 (2014-07-16) 20 commits
- Second batch of ref transactions
- refs.c: make delete_ref use a transaction
- refs.c: make prune_ref use a transaction to delete the ref
- refs.c: remove lock_ref_sha1
- refs.c: remove the update_ref_write function
- refs.c: remove the update_ref_lock function
- refs.c: make lock_ref_sha1 static
- walker.c: use ref transaction for ref updates
- fast-import.c: use a ref transaction when dumping tags
- receive-pack.c: use a reference transaction for updating the refs
- refs.c: change update_ref to use a transaction
- branch.c: use ref transaction for all ref updates
- fast-import.c: change update_branch to use ref transactions
- sequencer.c: use ref transactions for all ref updates
- commit.c: use ref transactions for updates
- replace.c: use the ref transaction functions for updates
- tag.c: use ref transactions when doing updates
- refs.c: add transaction.status and track OPEN/CLOSED/ERROR
- refs.c: make ref_transaction_begin take an err argument
- refs.c: update ref_transaction_delete to check for error and return status
- refs.c: change ref_transaction_create to do error checking and return status

(this branch is used by rs/ref-transaction, rs/ref-transaction-multi, rs/ref-transaction-reflog and rs/ref-transaction-rename.) [...]
I'm having trouble keeping track of how patches change, which patches have been reviewed and which haven't, unaddressed comments, and so on, so as an experiment I've pushed this part of the series to the Gerrit server at https://code-review.googlesource.com/#/q/topic:ref-transaction-1 Outcome of the experiment: patches gained some minor changes and then

 1-12	Reviewed-by: Jonathan Nieder jrnie...@gmail.com
 13	Reviewed-by: Michael Haggerty mhag...@alum.mit.edu
	Reviewed-by: Jonathan Nieder jrnie...@gmail.com
 14	Reviewed-by: Jonathan Nieder jrnie...@gmail.com
 15-16	Reviewed-by: Michael Haggerty mhag...@alum.mit.edu
	Reviewed-by: Jonathan Nieder jrnie...@gmail.com
 17	Reviewed-by: Michael Haggerty mhag...@alum.mit.edu
 18	Reviewed-by: Michael Haggerty mhag...@alum.mit.edu
	Reviewed-by: Jonathan Nieder jrnie...@gmail.com
 19	Reviewed-by: Jonathan Nieder jrnie...@gmail.com
 20	Reviewed-by: Michael Haggerty mhag...@alum.mit.edu
	Reviewed-by: Jonathan Nieder jrnie...@gmail.com

I found the web UI helpful in seeing how each patch evolved since I last looked at it. Interdiff relative to the version in pu is below. I'm still hoping for a tweak in response to a minor comment and then I can put up a copy of the updated series to pull. The next series from Ronnie's collection is available at https://code-review.googlesource.com/#/q/topic:ref-transaction in case someone wants a fresh series to look at.
diff --git a/branch.c b/branch.c
index c1eae00..37ac555 100644
--- a/branch.c
+++ b/branch.c
@@ -305,6 +305,7 @@ void create_branch(const char *head,
 		    ref_transaction_commit(transaction, msg, &err))
 			die("%s", err.buf);
 		ref_transaction_free(transaction);
+		strbuf_release(&err);
 	}

 	if (real_ref && track)
diff --git a/builtin/commit.c b/builtin/commit.c
index 668fa6a..9bf1003 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1765,8 +1765,8 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	transaction = ref_transaction_begin(&err);
 	if (!transaction ||
 	    ref_transaction_update(transaction, "HEAD", sha1,
-				   current_head ?
-				   current_head->object.sha1 : NULL,
+				   current_head
+				   ? current_head->object.sha1 : NULL,
 				   0, !!current_head, &err) ||
 	    ref_transaction_commit(transaction, sb.buf, &err)) {
 		rollback_index_files();
@@ -1801,5 +1801,6 @@ int cmd_commit(int argc, const char **argv, const char *prefix)
 	if (!quiet)
 		print_summary(prefix, sha1, !current_head);

+	strbuf_release(&err);
 	return 0;
 }
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 91099ad..224fadc 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -194,7 +194,7 @@ static void write_head_info(void)

 struct command {
 	struct command *next;
-	char *error_string;
+	const char *error_string;
 	unsigned int skip_update:1,
 		     did_not_exist:1;
 	int index;
@@ -468,7 +468,7 @@ static int update_shallow_ref(struct command *cmd, struct shallow_info *si)
 	return 0;
 }

-static char *update(struct command *cmd, struct shallow_info *si)
+static const char *update(struct command *cmd, struct