Re: Troubleshoot clone issue to NFS.
On Fri, May 22, 2015 at 2:12 PM, Jeff King p...@peff.net wrote: So I think there are two possibilities for improving this: 1. Find places where we expect the object will not exist (like the collision_test check you pointed out) and use a has_sha1_file_fast that accepts that it may very occasionally erroneously return false. In this case it would mean potentially skipping a collision check, but I think that is OK. That could have security implications, but only if an attacker: a. has broken sha1 to generate a colliding object b. can manipulate the victim into repacking in a loop c. can manipulate the victim into fetching (or receiving a push) simultaneously with (b) at which point they can try to race the repack procedure to add their colliding object to the repository. It seems rather unlikely (especially part a). In case you want to back away from option 2 because it starts to leak raciness, which your old commit tried to fix in the first place. I think the only other place that tests for lots of non-existent loose objects is write_sha1_file (e.g. tar -xf bigtarball.tar.gz; cd bigtarball; git init; git add .). But the number of calls should be much smaller compared to index-pack and it does not use has_sha1_file, it uses check_and_freshen_file() instead. There are other places where has_sha1_file() may return 0, but I think the number of calls is even smaller to bother (shallow.c, fetch-pack.c, apply.c, buik-checkin.c) -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 2/2] git-p4: fix handling of multi-word P4EDITOR
On 07/05/15 23:16, Junio C Hamano wrote: Luke Diamand l...@diamand.org writes: [Resurrecting old thread] Looking at run-command.c, GIT_WINDOES_NATIVE and POSIX seems to use pretty much the same construct, except that they use SHELL_PATH instead of sh. I think the state of git on Windows is a bit shaky (I'm happy to be proved wrong of course), but I think the only seriously active port is the msys one. That, as far as I can tell, uses an msys version of 'sh', so it will be perfectly happy with the sh -c ... construct. There may be a native windows port in existence, but I can't find how to build this, and I assume it's going to need Visual Studio, which makes it a lot more complex to get going. The code you were looking at in run-command.c says this: #ifndef GIT_WINDOWS_NATIVE nargv[nargc++] = SHELL_PATH; !GIT_WINDOWS_NATIVE #else nargv[nargc++] = sh; GIT_WINDOWS_NATIVE #endif nargv[nargc++] = -c; To me, that seems to imply that for GIT_WINDOWS_NATIVE, we take the *second* branch and use sh, so again, the the code as it stands will be fine. msysgit uses that path. (The next line, trying to use -c has no chance of working if Cmd is being used). So something like this may be sufficient, perhaps? Makefile | 1 + git-p4.py | 2 +- 2 files changed, 2 insertions(+), 1 deletion(-) diff --git a/Makefile b/Makefile index 20058f1..fda44bf 100644 --- a/Makefile +++ b/Makefile @@ -1776,6 +1776,7 @@ $(SCRIPT_PYTHON_GEN): GIT-CFLAGS GIT-PREFIX GIT-PYTHON-VARS $(SCRIPT_PYTHON_GEN): % : %.py $(QUIET_GEN)$(RM) $@ $@+ \ sed -e '1s|#!.*python|#!$(PYTHON_PATH_SQ)|' \ + -e 's|SHELL_PATH|$(SHELL_PATH_SQ)|g' \ $ $@+ \ chmod +x $@+ \ mv $@+ $@ diff --git a/git-p4.py b/git-p4.py index de06046..eb6d4b1 100755 --- a/git-p4.py +++ b/git-p4.py @@ -1220,7 +1220,7 @@ class P4Submit(Command, P4UserMap): editor = os.environ.get(P4EDITOR) else: editor = read_pipe(git var GIT_EDITOR).strip() -system([sh, -c, ('%s $@' % editor), editor, template_file]) +system(['''SHELL_PATH''', -c, ('%s $@' % editor), editor, template_file]) This seems to be expanded to '''sh''' which doesn't then work at all. I didn't take the time to investigate further though. # If the file was not saved, prompt to see if this patch should # be skipped. But skip this verification step if configured so. I don't think we need to do anything. msysgit works fine with the origin sh, -c, ... code. Thanks! Luke -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: recovering from unordered stage entries in index error
On Sat, May 23, 2015 at 9:47 AM, McHenry, Matt mmche...@carnegielearning.com wrote: So maybe you can do GIT_TRACE=2 git svn fetch and post the output. I'd expect to see something like git read-tree sha1 before fatal: unorder You can then use git ls-tree sha1 to examine this tree, try to sort the file list with LANG=C sort and compare with the original list. There is no read-tree in the output (below). The sha1 that is mentioned, 74332b7, is the one for the current trunk: Hm.. neither read-tree nor update-index is in the output. I can see git-svn closing stderr sometimes, but not sure if that's why we don't see these commands, or something else. Could you try again with GIT_TRACE=/absolute/path/to/some/where instead of GIT_TRACE=2 and post the content of /abso../some/where? -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCHv3 3/3] git-p4: tests: use test-chmtime in place of touch
Luke Diamand l...@diamand.org writes: diff --git a/t/t9813-git-p4-preserve-users.sh b/t/t9813-git-p4-preserve-users.sh index 166b840..fe65788 100755 --- a/t/t9813-git-p4-preserve-users.sh +++ b/t/t9813-git-p4-preserve-users.sh @@ -53,7 +53,8 @@ test_expect_success 'preserve users' ' git commit --author Alice al...@example.com -m a change by alice file1 git commit --author Bob b...@example.com -m a change by bob file2 git config git-p4.skipSubmitEditCheck true - P4EDITOR=touch P4USER=alice P4PASSWD=secret git p4 commit --preserve-user + P4EDITOR=test-chmtime +5 P4USER=alice P4PASSWD=secret + git p4 commit --preserve-user I think this hunk is wrong; we need to either change to \ to make it a single logical line that exports three environment variables only to git p4 commit --preserve-user, or insert export P4EDITOR P4USER P4PASSWD between these two lines. The latter seems to be what the remainder of the test is doing, so I'd fix this up to mimick them. Sorry for not catching it in the earlier review. Thanks. @@ -69,7 +70,7 @@ test_expect_success 'refuse to preserve users without perms' ' git config git-p4.skipSubmitEditCheck true echo username-noperms: a change by alice file1 git commit --author Alice al...@example.com -m perms: a change by alice file1 - P4EDITOR=touch P4USER=bob P4PASSWD=secret + P4EDITOR=test-chmtime +5 P4USER=bob P4PASSWD=secret export P4EDITOR P4USER P4PASSWD test_must_fail git p4 commit --preserve-user ! git diff --exit-code HEAD..p4/master -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] clone: add `--seed` shorthand
Jeff King p...@peff.net writes: Having slept on it, I really think --seed should be fetch from the seed into temp refs, and not what I posted earlier. Yeah, I think that is the right way to do it. And it happens to mesh well with the (not so well advertised but not so well hidden) plan to allow loaded server side to advise --seed from bundle hosted elsewhere, e.g. * clone connects to upload-pack * upload-pack advertises --seed=http://cdn.github.com/project.bundle via capability * clone disconnects from upload-pack, and runs (resumable) wget to the seed to grab bundle. * clone then fetches refs/*:refs/remotes/origin/* from the bundle * clone then continues to fetch into +refs/remotes/origin/* as usual, but does an equivalent of using --prune for this fetch to drop anything extra/stale that the seed bundle may have had. * optionally clone can immediately repack. ... which I wanted to see happen in a near future. And that --seed thing that can name a local bundle file is a very good first step toward the direction, I think. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC 0/3] --seed as an alias for --dissociate --reference
Thanks for working on this. I have one little bikeshedding comment... On 05/21/2015 06:14 AM, Jeff King wrote: [...] There are a few open issues with this series: 1. Assuming that seed is a reasonable verb for this concept, is --seed=repo OK for the option? Would --seed-from=repo be better? (Also, the response bleh, seed is a terrible name is fine, too, but only if accompanied by your own suggestion :) ). I think seed is a pretty good name. The only downside is that seed suggests that the process injects just a few seeds that are much smaller than the whole. But in fact, (hopefully) this option causes the bulk of the needed objects to be pre-fetched. I can't think of any name that is clearly better. Some brainstorming: * prepare -- meh * prime (as in prime the pump). But prime alone could have too many meanings, so... * prime-from * pre-fetch or pre-fetch-from * pre-load or pre-load-from BTW I think --seed-from=repo is more self-explanatory than --seed=repo. Michael -- Michael Haggerty mhag...@alum.mit.edu -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: 'Minimal' diff-algorithm producing a different result than 'myers', 'patience' and 'histogram' ones
On Tue, May 12, 2015 at 03:07:46PM +0200, Dmitry Malikov wrote: I'm trying to compare 4 different git-diff algorithms and the 'minimal' one is the most vague and non-obvious. The documentation says Spend extra time to make sure the smallest possible diff is produced. - that's all. By any chance, is there any example of diff when 'minimal' algorithm produces a different result than a 'myers', 'patience' and 'histogram' ones? I don't know of a simple example offhand, but you can easily generate all of the patches for a repository with each type by doing: for i in myers minimal patience histogram; do git log --diff-algorithm=$i -p $i done In git.git, the output for each type is distinct. You might also find this thread interesting: http://thread.gmane.org/gmane.comp.version-control.git/143426 It mentions 717b83117 from git.git, whose minimal diff is over 300 lines shorter than the non-minimal one. It's not exactly a simple example, but I suspect you won't find a short case; minimal only matters in the complicated ones. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH/RFC 0/3] using stat() to avoid re-scanning pack dir
On Sat, May 23, 2015 at 08:19:03AM +0700, Duy Nguyen wrote: On Sat, May 23, 2015 at 6:51 AM, Jeff King p...@peff.net wrote: The other problem is that I'm not sure stat data is enough to notice when a directory changes. Certainly the mtime should change, but if you have only one-second resolution on your mtimes, we can be fooled. mtime may or may not change. I based my untracked-cache series entirely on this directory mtime and did some research about it. For UNIXy filesystems on UNIXy OSes, mtime should work as you expect. FAT on Windows does not (but FAT on Linux does..). NTFS works fine according to some MS document. No idea about CIFS. But people often just do open operation of a time and this racy is not an issue. It is only an issue on the busy server side, and you can make sure you run on the right fs+OS. Even on Linux+ext4, I think there is some raciness. For instance, the program at the end of this mail will loop forever, running stat on an open directory fd, then enumerating the entries in the directory. If we ever get the same stat data as the last iteration but different contents, then it complains. If you run it alongside something simple, like touching 10,000 files in the directory, it fails pretty quickly. This is because we have no atomicity guarantees with directories. We can stat() them, and then while we are reading them, somebody else can be changing the entries. Whether we see the before or after state depends on the timing. I'm not 100% sure this translates into real-world problems for packfiles. If you notice that an object is missing and re-scan the pack directory (stat-ing it during the re-scan), then the change that made the object go missing must have happened _before_ the stat, and would presumably be reflected in it (modulo mtime resolution issues). But I haven't thought that hard about it yet, and I have a nagging feeling that there will be problem cases. It might be that you could get an atomic read of the directory by doing a stat before and after and making sure they're the same (and if not, repeating the readdir() calls). But I think that suffers from the same mtime-resolution problems. Linux+ext4 _does_ have nanosecond mtimes, which perhaps is enough to assume that any change will be reflected. I dunno. I guess the most interesting test would be something closer to the real world: one process repeatedly making sure the object pointed to by master exists, and another one constantly rewriting master and then repacking the object. -- 8 -- #include cache.h #include string-list.h static void get_data(const char *path, struct string_list *entries, struct stat_validity *validity) { DIR *dir = opendir(path); struct dirent *de; stat_validity_update(validity, dirfd(dir)); while ((de = readdir(dir))) string_list_insert(entries, de-d_name); closedir(dir); } static int list_eq(const struct string_list *a, const struct string_list *b) { int i; if (a-nr != b-nr) return 0; for (i = 0; i a-nr; i++) if (strcmp(a-items[i].string, b-items[i].string)) return 0; return 1; } static void monitor(const char *path) { struct string_list last_entries = STRING_LIST_INIT_DUP; struct stat_validity last_validity; get_data(path, last_entries, last_validity); while (1) { struct string_list cur_entries = STRING_LIST_INIT_DUP; struct stat_validity cur_validity; get_data(path, cur_entries, cur_validity); if (!memcmp(last_validity, cur_validity, sizeof(last_validity)) !list_eq(cur_entries, last_entries)) die(mismatch); string_list_clear(last_entries, 0); memcpy(last_entries, cur_entries, sizeof(last_entries)); stat_validity_clear(last_validity); memcpy(last_validity, cur_validity, sizeof(last_validity)); } } int main(int argc, const char **argv) { monitor(*++argv); return 0; } -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/3] stat_validity: handle non-regular files
On Sat, May 23, 2015 at 01:00:06PM +0200, Michael Haggerty wrote: I don't have any insight about whether mtimes are reliable change indicators for directories. But if you make this change, you are changing the contract of the stat_validity functions: * Have you verified that no callers rely on the old stat_validity's check that the file is a regular file? I think they are OK if a directory appears (they notice the error in reading and call stat_validity_clear to throw away the result). But it would be a problem, I suppose, if you put a FIFO or something into $GIT_DIR/packed-refs. OTOH, that would suffer from the stat data changing constantly, so we would fall back to safe behavior. I don't know how careful we want to be. I guess the conservative choice would be to barf if it's not a file or directory, both of which can be handled pretty sanely. * The docstrings in cache.h need to be updated. Thanks, will fix if I re-roll (I'm still not convinced this is a robust thing to be doing overall). struct stat_validity { - struct stat_data *sd; + struct stat_data sd; + unsigned mode; }; Could the mode be stored directly in stat_data? I'd rather not. stat_data is shared with the cache_entry code, which holds the mode separately. I'd like to avoid impacting the index code at all. By comparing modes, you are not only comparing file type (S_ISREG vs S_ISDIR etc.) but also all of the file permissions. That seems OK with me but you might want to document that fact. Plus, I don't know whether POSIX allows additional, implementation-defined information to be stored in st_mode. If so, you would be comparing that, too. Yeah, I considered that. My feeling is that testing _more_ information is probably OK. Even if there is extra data there, it presumably doesn't change from call to call of stat() unless the directory changes. And if we are wrong, we fail safely (e.g., if the permissions change we don't technically need to re-read the pack directory, but it is OK to do so). -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html